
Was this a freak accident?

So we see that the distribution for the sum of $n$ independent random binary variables becomes a Gaussian for large $n$. Was this some weird mathematical curiosity that gets filed away in your head, right next to what you had for breakfast last year? No, it's a lot more general than that. If you take Bob's blades of grass, no matter what funky distribution each one's length follows, the sum of their lengths, $X$, will also be well approximated by a Gaussian distribution for large $n$.

Under a fairly broad set of conditions, the distribution of the sum of a large number of independent random variables will look like a Gaussian. Since the mean $X/n$ differs from $X$ only by the scaling factor $1/n$, the same is true of the mean: the distribution of the average of a large number of independent random variables is well approximated by a Gaussian distribution. The only thing that you need to do to get the complete probability distribution is fit the Gaussian with the correct mean and variance.
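Here is a small numerical check of this claim, offered as a sketch rather than a proof. The choice of an exponential distribution (mean 1, variance 1), the values of `n` and `trials`, and the function name `sum_of_draws` are all my own assumptions for illustration. The code sums `n` strongly skewed variables, fits a Gaussian using the mean $n$ and variance $n$, and compares the fraction of sums within one standard deviation of the mean to the Gaussian value.

```python
import math
import random


def sum_of_draws(n, rng):
    """Sum of n independent draws from an exponential distribution
    with mean 1 and variance 1 (an arbitrary skewed example)."""
    return sum(rng.expovariate(1.0) for _ in range(n))


n = 50          # number of variables in each sum (illustrative choice)
trials = 20000  # number of sums we generate
rng = random.Random(1)  # fixed seed so the run is repeatable

sums = [sum_of_draws(n, rng) for _ in range(trials)]

# "Fit" the Gaussian: the mean and variance of the sum are n times
# the mean and variance of a single variable.
mu = n * 1.0
sigma = math.sqrt(n * 1.0)

# Fraction of the simulated sums lying within one sigma of the mean.
within_one_sigma = sum(abs(s - mu) < sigma for s in sums) / trials

# For an exact Gaussian this fraction is erf(1/sqrt(2)), about 0.683.
gaussian_fraction = math.erf(1 / math.sqrt(2))

print("simulated fraction within one sigma:", within_one_sigma)
print("Gaussian fraction within one sigma: ", gaussian_fraction)
```

Even though a single exponential variable looks nothing like a Gaussian, the two fractions should come out close for a sum of 50 of them. Trying a small `n` such as 2 shows the approximation getting worse.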

This is important because the mean is something you calculate all the time with experimental data. So without knowing much about your data, you know that this mean will approximately follow a Gaussian distribution, provided it averages over enough independent measurements. That has a lot of important consequences for statistics.

If you know the variance of each variable, $Var(x)$, you can then also obtain the variance of the sum $X$ of $n$ of these independent variables: it's just $n Var(x)$, as we found for the binomial distribution. Because the mean $\bar x$ just divides the sum by $n$, you arrive at the important conclusion

\begin{displaymath}
Var({\bar x}) = Var({X\over n}) = {Var(x) \over n}
\end{displaymath} (1.57)
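
To spell out the intermediate step (a short sketch that uses only the rule that multiplying a variable by a constant $c$ multiplies its variance by $c^2$, together with the independence of the $n$ variables):

\begin{displaymath}
Var({X\over n}) = {1\over n^2} Var(X) = {1\over n^2} \, n Var(x) = {Var(x) \over n}
\end{displaymath}

So the standard deviation of the mean falls off as $1/\sqrt{n}$.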

This means that if you toss an unbiased coin 100 times and measure the proportion of heads, you'll get $0.5$ with some deviation. To get that deviation, you calculate the error of the mean. The variance for one flip, as we saw before, was $1/4$. So the variance in the mean for 100 flips is $1/400$. To get the standard deviation, you take the square root, which gives $1/20$. So if you do the experiment once you might get a mean of $0.53$, or $0.44$, or $0.51$. Every time you do the experiment you get a different result, but it typically differs from $0.5$ by only about one standard deviation, $0.05$.
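The coin example is easy to check by simulation. The following is a minimal sketch, not part of the original argument: the function names `mean_of_flips` and `simulate_means`, the seed, and the number of repeated experiments (10000) are assumptions chosen for illustration. It repeats the 100-flip experiment many times and compares the spread of the measured means with the predicted $1/20$.

```python
import random
import statistics


def mean_of_flips(n_flips, rng):
    """Proportion of heads in n_flips tosses of an unbiased coin."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


def simulate_means(n_flips=100, n_experiments=10000, seed=0):
    """Repeat the n_flips experiment n_experiments times and
    return the list of measured proportions of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [mean_of_flips(n_flips, rng) for _ in range(n_experiments)]


means = simulate_means()

# Predicted standard deviation of the mean: sqrt(Var(x) / n) = sqrt((1/4) / 100)
predicted_sd = (0.25 / 100) ** 0.5

print("average of the measured means:", statistics.mean(means))
print("spread of the measured means: ", statistics.pstdev(means))
print("predicted spread:             ", predicted_sd)
```

The measured spread should land close to $0.05$. Changing `n_flips` to 400 should roughly halve it, since the standard deviation of the mean scales as $1/\sqrt{n}$.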

This is a simplified discussion of the central limit theorem. I should also state that even though the Gaussian distribution usually approximates the true distribution of a mean (or a sum) well, there are circumstances where the difference between the two can be crucial. Fortunately, it's normally easy to figure out when the theorem can't be used.

josh 2010-10-20