java - Checking for normal distribution hypothesis of discrete dataset -

September 15, 2011

I am a newbie in statistics, so I guess I might be missing something obvious here.

Basically, I want to examine whether a double array of integer values (a histogram) conforms to a normal distribution (with the mean and standard deviation specified) at a given significance level, based on the statistical tests in Apache Commons Math.

What I understand is that the common way is to calculate a p-value and then decide whether the null hypothesis is true or not.

My first "baby" step is to check whether two arrays come from the same distribution using the one-way ANOVA test (the second part is taken from the example in the documentation):

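    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;
    // Package name assumes Commons Math 3.x
    import org.apache.commons.math3.stat.inference.TestUtils;

    double[] samples1 = new double[100];
    double[] samples2 = new double[100];

    Random rand = new Random();
    for (int i = 0; i < 100000; i++) {
        // Draw from N(50, 5^2) and truncate to an integer bin
        int index1 = (int) (rand.nextGaussian() * 5 + 50);
        int index2 = (int) (rand.nextGaussian() * 5 + 50);
        // Count the draw in its bin; out-of-range draws are silently dropped
        try {
            samples1[index1 - 1]++;
        } catch (ArrayIndexOutOfBoundsException e) {}
        try {
            samples2[index2 - 1]++;
        } catch (ArrayIndexOutOfBoundsException e) {}
    }

    List<double[]> classes = new ArrayList<>();
    classes.add(samples1);
    classes.add(samples2);

    double pvalue = TestUtils.oneWayAnovaPValue(classes);
    boolean fail = TestUtils.oneWayAnovaTest(classes, 0.05);

    System.out.println(pvalue);
    System.out.println(fail);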

The result is:

    1.0
    false

Assuming a significance level of 0.05, I can deduce that the hypothesis is true (i.e. both arrays come from the same distribution), since p > 0.05.
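One thing that may be worth checking before trusting this deduction: the arrays handed to the test are histograms, so each of the 100 bin counts is treated as one observation, and one-way ANOVA compares group means. The plain-Java sketch below (no Commons Math; the class and method names are made up for illustration, and the uniform data set is my own addition) shows that the mean of the bin counts is simply "draws that landed in range / number of bins", whatever the shape of the histogram.

```java
import java.util.Random;

final class BinCountMeans {
    // Build a histogram over the integer bins 1..bins; out-of-range values are dropped,
    // mirroring the try/catch in the code above.
    static double[] histogram(int[] values, int bins) {
        double[] counts = new double[bins];
        for (int v : values) {
            if (v >= 1 && v <= bins) {
                counts[v - 1]++;
            }
        }
        return counts;
    }

    // Arithmetic mean of the array entries (here: mean of the bin counts).
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    public static void main(String[] args) {
        Random rand = new Random(42);
        int n = 100000;
        int[] gaussian = new int[n];
        int[] uniform = new int[n];
        for (int i = 0; i < n; i++) {
            gaussian[i] = (int) (rand.nextGaussian() * 5 + 50);
            uniform[i] = 1 + rand.nextInt(100); // a clearly non-normal shape
        }
        // Both means should come out at, or extremely close to, n / 100.
        System.out.println(mean(histogram(gaussian, 100)));
        System.out.println(mean(histogram(uniform, 100)));
    }
}
```

If that reading is right, a p-value of 1.0 here says that the two sets of bin counts have nearly identical means, which would also hold for two histograms of very different shape built from the same number of draws.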

Now let's take the Kolmogorov-Smirnov test. The example code in the documentation shows how to check a single array against a NormalDistribution object (that is my goal). It also allows checking two arrays against each other. I cannot get a proper result in either case. For example, let's adapt the above example to K-S:

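    import java.util.Random;
    // Package name assumes Commons Math 3.x
    import org.apache.commons.math3.stat.inference.TestUtils;

    double[] samples1 = new double[100];
    double[] samples2 = new double[100];

    Random rand = new Random();
    for (int i = 0; i < 100000; i++) {
        // Draw from N(50, 5^2) and truncate to an integer bin
        int index1 = (int) (rand.nextGaussian() * 5 + 50);
        int index2 = (int) (rand.nextGaussian() * 5 + 50);
        // Count the draw in its bin; out-of-range draws are silently dropped
        try {
            samples1[index1 - 1]++;
        } catch (ArrayIndexOutOfBoundsException e) {}
        try {
            samples2[index2 - 1]++;
        } catch (ArrayIndexOutOfBoundsException e) {}
    }

    double pvalue = TestUtils.kolmogorovSmirnovTest(samples1, samples2);
    boolean fail = pvalue < 0.05;

    System.out.println(pvalue);
    System.out.println(fail);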

The result is:

    7.475142727031425E-11
    true

My question is: why is the p-value for the same data so small? Does it mean this test is not suited to this type of data?
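For reference while reading these numbers: the two-sample K-S statistic is the largest vertical gap between the two empirical CDFs. Below is a minimal plain-Java sketch of that statistic only (no p-value). It is not the Commons Math implementation, and the class and method names are made up for illustration. Whatever array is passed in is treated as a list of observations, so histogram arrays like samples1 above are read as 100 observations that happen to be bin counts.

```java
import java.util.Arrays;

final class KsSketch {
    // Two-sample K-S statistic: max over x of |F1(x) - F2(x)|,
    // where F1 and F2 are the empirical CDFs of the two samples.
    static double ksStatistic(double[] x, double[] y) {
        double[] a = x.clone();
        double[] b = y.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        int i = 0;
        int j = 0;
        double d = 0.0;
        while (i < a.length && j < b.length) {
            double v = Math.min(a[i], b[j]);
            // Step past every value equal to v in both samples so ties are handled together.
            while (i < a.length && a[i] == v) i++;
            while (j < b.length && b[j] == v) j++;
            double gap = Math.abs((double) i / a.length - (double) j / b.length);
            d = Math.max(d, gap);
        }
        return d;
    }

    public static void main(String[] args) {
        double[] first = {1, 2, 3, 4};
        double[] second = {3, 4, 5, 6};
        System.out.println(ksStatistic(first, first));
        System.out.println(ksStatistic(first, second));
    }
}
```

Running this sketch on the raw draws and on the histogram arrays would at least show which of the two inputs the statistic considers "close".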

Should I:

generate a reference array from a NormalDistribution (that is, with the specified mean and standard deviation) and compare it with my array using the one-way ANOVA test (or another test), or
somehow adapt my data and then use K-S to compare a single array against a NormalDistribution object?
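For the second option, one possible way to "adapt" the data (an assumption on my part, not something the documentation prescribes) would be to expand the histogram back into one value per observation, so that the array has the shape of a sample rather than a list of bin counts. A minimal sketch, with a hypothetical class and method name, assuming bin k of the array holds the count for the integer value k + 1 as in the code above:

```java
final class HistogramSamples {
    // Expand bin counts into individual observations:
    // the value (bin + 1) is repeated counts[bin] times.
    static double[] expand(double[] counts) {
        int total = 0;
        for (double c : counts) {
            total += (int) c;
        }
        double[] samples = new double[total];
        int k = 0;
        for (int bin = 0; bin < counts.length; bin++) {
            int repeats = (int) counts[bin];
            for (int r = 0; r < repeats; r++) {
                samples[k++] = bin + 1;
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        double[] counts = {2, 0, 1}; // two observations of 1, none of 2, one of 3
        for (double s : expand(counts)) {
            System.out.println(s);
        }
    }
}
```

The expanded array still contains only integers with many repeated values, and K-S is designed for continuous distributions, so I am not sure how reliable the resulting p-value would be.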
