java - Checking for normal distribution hypothesis of discrete dataset
I am a newbie in statistics, so I guess I might be missing something obvious here.
Basically, I want to examine whether a double array of integer values (a histogram) conforms to a normal distribution (with the mean and standard deviation specified) at some significance level, based on the statistical tests in Apache Commons Math.
As I understand it, the common way is to calculate a p-value and then decide whether the null hypothesis is true or not.
My first "baby" step is to check whether two arrays come from the same distribution using the one-way ANOVA test (the second part is taken from the example in the documentation):
    // Two histograms with 100 bins each; bin (v - 1) counts occurrences of the integer value v
    double samples1[] = new double[100];
    double samples2[] = new double[100];
    Random rand = new Random();
    for (int i = 0; i < 100000; i++) {
        // Gaussian draws with mean 50 and standard deviation 5, truncated to int
        int index1 = (int) (rand.nextGaussian()*5 + 50);
        int index2 = (int) (rand.nextGaussian()*5 + 50);
        // Values that fall outside the 100 bins are silently dropped
        try { samples1[index1-1]++; } catch (ArrayIndexOutOfBoundsException e) {}
        try { samples2[index2-1]++; } catch (ArrayIndexOutOfBoundsException e) {}
    }
    List<double[]> classes = new ArrayList<>();
    classes.add(samples1);
    classes.add(samples2);
    double pvalue = TestUtils.oneWayAnovaPValue(classes);
    boolean fail = TestUtils.oneWayAnovaTest(classes, 0.05);
    System.out.println(pvalue);
    System.out.println(fail);
The result is:
    1.0
    false
Assuming a significance level of 0.05, I can deduce that the hypothesis is true (i.e. both arrays come from the same distribution), since p > 0.05.
Now let's take the Kolmogorov-Smirnov test. The example code in the documentation shows how to check a single array against a NormalDistribution object (that is my goal). It also allows checking two arrays against each other. I cannot get a proper result in either case. For example, let's adapt the above example to K-S:
    // Same two histograms as before
    double samples1[] = new double[100];
    double samples2[] = new double[100];
    Random rand = new Random();
    for (int i = 0; i < 100000; i++) {
        int index1 = (int) (rand.nextGaussian()*5 + 50);
        int index2 = (int) (rand.nextGaussian()*5 + 50);
        try { samples1[index1-1]++; } catch (ArrayIndexOutOfBoundsException e) {}
        try { samples2[index2-1]++; } catch (ArrayIndexOutOfBoundsException e) {}
    }
    // Two-sample K-S test applied directly to the two arrays of bin counts
    double pvalue = TestUtils.kolmogorovSmirnovTest(samples1, samples2);
    boolean fail = pvalue < 0.05;
    System.out.println(pvalue);
    System.out.println(fail);
The result is:
    7.475142727031425E-11
    true
My question is: why is the p-value for the same data so small? Does it mean this test is not suited to this type of data?
Should I:
- generate a reference array from a NormalDistribution (that is, with the specified mean and standard deviation) and then compare it with my array using the one-way ANOVA test (or another test), or
- somehow adapt my data and then use K-S to compare a single array against a NormalDistribution object?
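One possible reading of "adapt my data" in the second option is to expand the histogram back into individual observations, so that a test expecting raw samples receives values rather than bin counts. The following is only a minimal sketch under that assumption: the class name HistogramExpand and the helper expandHistogram are hypothetical (not part of Apache Commons Math), and bin i is assumed to hold the count of the value i + 1, matching the samples1[index1-1]++ bookkeeping above. Whether the expanded, discretized values are then suitable input for K-S is still the open question here.

```java
import java.util.Arrays;

class HistogramExpand {

    // Hypothetical helper: turns bin counts into one observation per count.
    // Assumes counts[i] is the number of times the value (i + 1) was seen.
    public static double[] expandHistogram(double[] counts) {
        // First pass: total number of observations stored in the histogram
        int total = 0;
        for (double c : counts) {
            total += (int) c;
        }
        // Second pass: emit the value (i + 1) once for every count in bin i
        double[] values = new double[total];
        int pos = 0;
        for (int i = 0; i < counts.length; i++) {
            int n = (int) counts[i];
            for (int k = 0; k < n; k++) {
                values[pos++] = i + 1;
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Two observations of the value 1, none of 2, one of 3
        double[] counts = {2, 0, 1};
        System.out.println(Arrays.toString(expandHistogram(counts)));
        // prints [1.0, 1.0, 3.0]
    }
}
```

The expanded array has one entry per recorded draw (close to 100000 entries for the histograms above), so it describes the sampled values themselves rather than the 100 bin counts.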