The Central Limit Theorem
“I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the "Law of Frequency of Error". The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshaled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along.” (Francis Galton, Natural Inheritance, 1889)
Suppose you are a 15-year-old girl in your fourth year of secondary school at a co-ed college. There are 30 students in your class. During an amazing Statistics class, you start looking around and notice that there are 11 girls and 19 boys. Assuming that the probability of a newborn in Quebec being a girl is roughly 50%, how come girls make up only 37% of your class?
You text your friend sitting in the class across the hall and ask her how many girls there are in her class: 20 out of 30 (about 67%).
You start a Facebook page and ask all 15-year-old students across Quebec what proportion of their class is made up of girls. You get 500 answers and start plotting the distribution of the percentage of girls per class. On the x-axis you have the proportion of girls in a given class, and on the y-axis you have the fraction of the 500 answers reporting that proportion.
You start putting everything together and finally figure out that the variable "being a girl" is a discrete random variable with only two possible outcomes: yes or no (or 1 or 0). And it is a yes with 50% probability. It is called a Bernoulli(p) variable, where p is 50%. Whoa! In other words, if you randomly pick a 15-year-old in the population, you have 1 chance in 2 of picking a girl. Big news.
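A minimal Python sketch of a Bernoulli(p) variable, assuming independent draws from the standard `random` module; the helper name `bernoulli` is hypothetical. It also computes the theoretical mean p and variance p(1 - p) and compares them with 100,000 simulated draws. That variance, 0.5 * (1 - 0.5) = 0.25, is the numerator that shows up in the Central Limit Theorem formula later on.

```python
import random
import statistics

def bernoulli(p: float, rng: random.Random) -> int:
    """Return 1 ("girl") with probability p, else 0. Hypothetical helper."""
    return 1 if rng.random() < p else 0

p = 0.5
rng = random.Random(42)  # fixed seed so the run is reproducible
draws = [bernoulli(p, rng) for _ in range(100_000)]

# Theoretical moments of a Bernoulli(p) variable
theoretical_mean = p
theoretical_var = p * (1 - p)

print("empirical mean:", statistics.fmean(draws), "theoretical:", theoretical_mean)
print("empirical variance:", statistics.pvariance(draws), "theoretical:", theoretical_var)
```

The empirical values should land very close to 0.5 and 0.25; changing `p` shows how the variance shrinks as p moves away from 0.5.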
You continue your investigation and find out that if you add up n independent Bernoulli(p) random variables, you end up with a variable that follows a Binomial(n, p) distribution. For example, the number of girls in a random sample of thirty 15-year-old kiddos follows a Binomial(30, 0.5) distribution. The expected value of the number of girls in such a sample is 30 * 0.5 = 15 (or 50% in percentages)!
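A small sketch that computes the Binomial(30, 0.5) probabilities exactly with Python's `math.comb` (the helper `binomial_pmf` is a hypothetical name). It confirms the expected value of 15 and puts a number on the opening question: how likely is a class of 30 to contain 11 girls or fewer?

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p). Hypothetical helper."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 30, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

expected = sum(k * prob for k, prob in enumerate(pmf))  # equals n * p = 15
p_at_most_11 = sum(pmf[: 11 + 1])  # chance of 11 or fewer girls in a class of 30

print(f"E[X] = {expected:.2f}")
print(f"P(X <= 11) = {p_at_most_11:.3f}")
```

With these parameters P(X <= 11) comes out at about 0.10, so roughly one class in ten would have 11 girls or fewer by chance alone. This assumes students end up in classes independently of their sex, which real school assignments need not satisfy.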
Now, let's assume for the moment that the 500 answers you got earlier come from 500 classes of 30 students each, and that each class behaves like a random sample. The number of girls in each class is then a draw from the same Binomial(30, 0.5) distribution, and each answer is that count divided by 30.
It starts becoming really interesting when you look at the distribution of those 500 proportions. Each one is a realization of the mean of 30 independent random variables coming from a Bernoulli(0.5) distribution.
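The survey can be mimicked with a small simulation, assuming 500 classes of exactly 30 students in which each student is independently a girl with probability 0.5. Real classes are not random samples, so this is an idealization. Each class proportion is computed as the mean of 30 Bernoulli(0.5) draws; the function name `simulate_class_proportions` and its parameters are hypothetical.

```python
import random
import statistics
from collections import Counter

def simulate_class_proportions(n_classes: int, class_size: int, p: float, seed: int = 0) -> list[float]:
    """Proportion of girls in each simulated class (mean of class_size Bernoulli(p) draws)."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(n_classes):
        girls = sum(1 for _ in range(class_size) if rng.random() < p)
        proportions.append(girls / class_size)
    return proportions

props = simulate_class_proportions(n_classes=500, class_size=30, p=0.5)

# Crude text histogram: x = proportion of girls, bar length = number of classes
counts = Counter(round(x, 2) for x in props)
for value in sorted(counts):
    print(f"{value:.2f} {'#' * counts[value]}")

print("mean of the 500 proportions:", round(statistics.fmean(props), 3))
```

The bars pile up around 0.50 and thin out symmetrically on both sides, which is the bell shape the Central Limit Theorem predicts.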
The Central Limit Theorem tells us that the distribution of those 500 means is approximately a Normal distribution with mean 0.5 and variance 0.5 * (1 - 0.5) / 30 ≈ 0.0083, that is, a standard deviation of about 0.091 (9.1 percentage points).
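Plugging the two classes from the opening story into that Normal approximation gives a quick worked calculation. It uses only p = 0.5 and n = 30 from the text, and expresses each class as a z-score (its distance from 0.5 in standard deviations):

```latex
\begin{aligned}
\sigma_{\bar X} &= \sqrt{\frac{0.5\,(1-0.5)}{30}} = \sqrt{0.008\overline{3}} \approx 0.0913 \\[4pt]
z_{\text{your class}} &= \frac{11/30 - 0.5}{0.0913} \approx \frac{0.3667 - 0.5}{0.0913} \approx -1.46 \\[4pt]
z_{\text{across the hall}} &= \frac{20/30 - 0.5}{0.0913} \approx \frac{0.6667 - 0.5}{0.0913} \approx +1.46
\end{aligned}
```

Both classes sit within about one and a half standard deviations of 50%, so under this approximation neither 37% nor 67% is a surprising result.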
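The Central Limit Theorem basically states that, whatever the underlying distribution of the observations (as long as they are independent, identically distributed, and have a finite variance), the sample mean is approximately Normally distributed once the sample is large enough. Isn't that a powerful theorem!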
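To see the "whatever the underlying distribution" part in action, here is a hedged sketch that swaps the Bernoulli draws for a strongly right-skewed Exponential(1) distribution (mean 1, variance 1). This distribution is an illustrative choice, not part of the original example, and so are the sample size `n` and the number of repetitions `reps`. If the Normal approximation holds, the sample means should center on 1, have variance close to 1/n, and fall within 1.96 standard errors of 1 about 95% of the time.

```python
import math
import random
import statistics

rng = random.Random(1)
n = 30          # observations per sample
reps = 10_000   # number of samples (illustrative choice)

# Exponential(1) is strongly right-skewed: mean 1, variance 1.
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

se = math.sqrt(1.0 / n)  # CLT standard error: sqrt(variance / n)
inside = sum(abs(m - 1.0) <= 1.96 * se for m in means) / reps

print("mean of sample means:", round(statistics.fmean(means), 3))
print("variance of sample means:", round(statistics.pvariance(means), 4), "vs 1/n =", round(1 / n, 4))
print("share within 1.96 SE:", round(inside, 3))  # near 0.95 if the Normal approximation holds
```

Lowering `n` to 2 or 3 makes the skew of the sample means visible again, which is a reminder that the theorem is about large samples.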