The Normal Curve and Galton's Board
by Paul Trow
The normal curve - also known as the bell curve - is a statistical model that is used throughout the physical and social sciences. For example, the graph below - called a histogram - shows the distribution of combined scores for the Scholastic Aptitude Test (SAT) in 2010.
In the graph, the range of possible scores is displayed on the horizontal axis. For each score, the height of the vertical bar at that point shows the number of students who received that score. The tops of the bars trace out a familiar shape - the normal curve.
Why does the normal curve show up so often in many apparently unrelated fields of study? In the nineteenth century, Sir Francis Galton, one of the pioneers of statistical theory, invented a mechanical device that illustrates how the normal curve arises naturally from the combination of a large number of independent random events or factors. A re-creation of Galton's device is shown below.
The device consists of an array of pins mounted on a vertical board. When the board is operated, a sequence of balls drops onto the top of the array. When a ball hits a pin, it bounces to the left or right with equal probability, and then falls down to the next level. When the ball reaches the last level, it falls into one of the bins at the bottom. As the balls stack up in the bins, they form a shape that resembles a normal curve.
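The behavior of the board can be imitated with a few lines of Python. The following is only a minimal sketch, not a model of the physical device: the function names `drop_ball` and `simulate_board` are made up for this illustration, the bins are assumed to be numbered from 0 on the left, and each bounce is assumed to be a fair, independent left-or-right choice.

```python
import random
from collections import Counter

def drop_ball(levels, rng):
    """Drop one ball and return its bin: the number of rightward bounces."""
    # Assumption: at every level the ball goes right with probability 0.5.
    return sum(rng.random() < 0.5 for _ in range(levels))

def simulate_board(levels, balls, seed=0):
    """Drop `balls` balls through `levels` rows of pins; return the count in each bin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(drop_ball(levels, rng) for _ in range(balls))
    return [counts.get(b, 0) for b in range(levels + 1)]

counts = simulate_board(levels=6, balls=10_000)
for bin_number, count in enumerate(counts):
    # One '#' per 100 balls gives a rough text histogram.
    print(f"bin {bin_number}: {'#' * (count // 100)} {count}")
```

With enough balls, the printed bars should pile up in the middle bins and thin out toward both ends, much as the balls do on the real board.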
To see a video of a small computer simulation of Galton's board, with only 6 levels of pins, click here.
The picture below shows the final frame of the animation.
As you might expect, it is more likely that a ball will land in one of the bins near the middle than in a bin near either end (0 or 6). The reason for this is that there are more paths a ball can take that lead to bins near the middle than to bins near the sides. To see why, think of the following experiment that simulates a random path for a ball. Toss a coin 6 times and record the sequence of heads and tails that you get. If the i'th coin toss is heads, the ball bounces to the right at level i - if the i'th toss is tails, the ball bounces to the left. For example, if the sequence of coin tosses is H, H, T, H, H, T, the ball takes the path shown in the diagram below.
Notice that the number of heads in this example is 4, and that the ball lands in bin number 4. If you try a few examples, you can convince yourself that for any sequence of coin tosses, the number of the bin the ball lands in always equals the number of times the coin comes up heads. Consequently, the number of paths to bin number 4 is the number of sequences of 6 coin tosses that have exactly 4 heads. It turns out that there are 15 such sequences. (If you don't believe this, try writing them down.)
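If writing the sequences down by hand seems tedious, a short Python sketch can do the counting. The helper name `bin_for` is invented for this example, and it assumes that the ball starts above bin 0 and moves one bin to the right for every head.

```python
from itertools import product

def bin_for(tosses):
    """Follow a sequence of tosses and return the bin the ball lands in."""
    position = 0  # assumption: all tails leads to bin 0
    for toss in tosses:
        if toss == "H":  # heads: bounce right, one bin further along
            position += 1
    return position

# Every possible sequence of 6 tosses, e.g. ('H', 'H', 'T', 'H', 'H', 'T').
sequences = list(product("HT", repeat=6))
paths_to_bin_4 = [s for s in sequences if bin_for(s) == 4]

print(bin_for("HHTHHT"))     # 4
print(len(paths_to_bin_4))   # 15
```

The same loop with `== 4` replaced by another bin number counts the paths to that bin.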
On the other hand, there is only one path to bin 0, corresponding to the sequence of all tails, which means that the ball bounces to the left at each level. Since every path is equally likely, a ball is 15 times as likely to land in bin 4 as in bin 0.
To find the probability that a ball will land in bin i, divide the number of paths to bin i by the total number of paths to all bins. The latter number is 2^6 = 64, because the ball can bounce in either of 2 ways at each level and there are 6 levels. The table below shows the number of paths to each bin, and the probability that a ball will land in that bin.
|Bin number||0||1||2||3||4||5||6|
|Number of paths to bin||1||6||15||20||15||6||1|
|Probability of landing in bin||0.0156||0.0938||0.2344||0.3125||0.2344||0.0938||0.0156|
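The entries in the table can be reproduced with a short calculation. The sketch below assumes that the number of paths to bin k is the number of ways to choose which k of the 6 tosses are heads, which Python's `math.comb` computes; the variable names are chosen just for this example.

```python
from math import comb

LEVELS = 6
total_paths = 2 ** LEVELS  # 2 choices at each of 6 levels = 64

# Number of 6-toss sequences with exactly k heads, for k = 0..6.
paths = [comb(LEVELS, k) for k in range(LEVELS + 1)]
probabilities = [count / total_paths for count in paths]

print("bin  paths  probability")
for k in range(LEVELS + 1):
    print(f"{k:>3}  {paths[k]:>5}  {probabilities[k]:.4f}")
```

The path counts add up to 64, so the probabilities add up to 1, as they must.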
The figure below shows the probabilities graphically.
The Binomial Distribution and the Normal Approximation
The distribution of probabilities shown in the table and graph above is a special case of the binomial distribution. The general binomial distribution has two parameters: n, the number of coin tosses, and p, the probability that a single toss comes up heads.
The binomial distribution gives the probability of getting exactly k heads in n coin tosses, for each integer k between 0 and n. In the example above, n = 6 and the probability p = 0.5 - in other words, the coin is fair.
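For readers who want to experiment with other values of n and p, here is a small sketch of the standard binomial formula: the probability of exactly k heads is the number of ways to place the k heads, times p for each head, times (1 - p) for each tail. The function name `binomial_pmf` is just a label chosen for this example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n tosses, each heads with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# The Galton board example: n = 6 levels, fair bounces.
print(binomial_pmf(4, 6, 0.5))  # 0.234375, i.e. 15/64

# A biased coin shifts the distribution toward one side.
print([round(binomial_pmf(k, 6, 0.7), 4) for k in range(7)])
```

With p = 0.5 the factor p^k (1 - p)^(n - k) is the same for every k, which is why counting paths was enough in the example above.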
The examples above show why the distribution of balls in Galton's board has the shape of a binomial distribution. But what does this have to do with the normal curve? The answer is that, for large n, a binomial random variable with parameter p = 0.5 has approximately the same distribution as the normal random variable with the same "shape" - that is, the same mean and standard deviation. This is a special case of a famous result called the Central Limit Theorem.
It turns out that a binomial random variable with parameter p = 0.5 has mean n / 2 and standard deviation √n / 2. For example, if n = 36, the mean is 18 and the standard deviation is 3. The following graph shows the binomial distribution with these parameters, together with a normal curve with mean 18 and standard deviation 3.
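The closeness of the two curves can also be checked numerically. The sketch below is only an illustration: it uses the standard formulas for the binomial probabilities and for the normal density, takes the standard deviation for p = 0.5 to be the square root of n divided by 2, and the names `binomial_pmf` and `normal_pdf` are chosen for this example.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mean, sd):
    """Height of the normal curve with the given mean and standard deviation at x."""
    return exp(-((x - mean) ** 2) / (2 * sd**2)) / (sd * sqrt(2 * pi))

n = 36
mean = n / 2       # 18.0
sd = sqrt(n) / 2   # 3.0

# Largest difference between a binomial probability and the curve's height.
max_gap = max(
    abs(binomial_pmf(k, n) - normal_pdf(k, mean, sd)) for k in range(n + 1)
)

for k in range(12, 25, 3):
    print(f"k={k}: binomial {binomial_pmf(k, n):.4f}, normal {normal_pdf(k, mean, sd):.4f}")
print(f"largest gap: {max_gap:.4f}")
```

Trying larger values of n in place of 36 should make the largest gap shrink, which is the approximation at work.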
Galton's board and the Central Limit Theorem explain why so many common variables have an approximately normal distribution. A typical example is a person's height, which is determined by a combination of many independent factors, both genetic and environmental. Each of these factors may tend to increase or decrease a person's height, just as a ball in Galton's board may bounce to the right or the left at each level. As Galton's board shows, when you combine many such chance factors, the resulting distribution is binomial. By the Central Limit Theorem, when the number of independent factors is very large, the binomial distribution is approximated by a normal curve.
Finally, here is a picture of the actual board Galton built, which still exists in University College, London. The balls that dropped through the pins were actually lead shot.
Copyright 2007 by Paul Trow