The Normal Curve and Galton's Board
by Paul Trow

The normal curve models the distribution of many common variables in the physical and social sciences. The graph below shows the standard normal curve.

In the nineteenth century, Sir Francis Galton, one of the pioneers of statistical theory, invented a mechanical device that illustrates how the normal distribution arises from the contributions of many independent random events.

Galton's device consists of an array of pins mounted on a vertical board, as shown by the blue markers in the diagram below.


The diagram has 6 horizontal rows of pins, or levels. Galton's real board, which is still kept at University College London, has many more levels. When the board is operated, a sequence of balls drops onto the pin at the top of the array. When a ball hits a pin, it bounces to the left or right with equal probability, and then falls down one level, where it hits one of the two closest pins. After the ball passes the last level, it falls into one of the bins below the bottom row.

Click here to see an animation of Galton's board.

The picture below shows the final frame of the animation.

[Figure: final frame of the Galton board animation]

As more and more balls stack up in the bins, the stacks start to take on a predictable shape - that of a normal curve. It is a remarkable fact that, as you increase the number of levels of pins and the number of balls, the distribution of the balls across the bins comes closer and closer to a normal distribution.

Here is a picture of Galton's actual board, built in 1873. The "balls" dropped in the board were actually lead shot.

You can see a working model based on Galton's board on display at the Boston Museum of Science.

The Binomial Distribution

The mathematical model for Galton's board is called the binomial distribution. The binomial distribution describes the number of heads you get when you toss a coin n times, assuming that the tosses are independent and that each toss has probability p of coming up heads. In Galton's board, instead of a coin toss, the ball hits a pin and bounces either to the left or right with equal probability - that is, p = 1/2. The number of bounces, n, is the number of levels of pins.

The diagram below shows a typical path that a ball might follow. The number to the left of each level is 1 if the ball bounces to the right and 0 if it bounces to the left.

Notice that the number of the bin that a ball lands in is exactly equal to the number of times it bounces to the right. In the example above, the ball bounces to the right 4 times and so lands in bin 4. As a result, the bin number that a ball lands in has a binomial distribution with parameters n = 6 (the number of levels) and p = 1/2.
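The rule that a ball's bin number equals its number of rightward bounces makes the board easy to simulate. The following is a minimal sketch, not a model of Galton's physical device: the function names `drop_ball` and `simulate_board`, the fixed seed, and the text histogram are illustrative choices, and each bounce is assumed to be an independent fair coin flip.

```python
import random
from collections import Counter


def drop_ball(levels, rng):
    """Return the bin a single ball lands in: its number of rightward bounces."""
    # Each level contributes 1 (bounce right) or 0 (bounce left) with equal probability.
    return sum(1 if rng.random() < 0.5 else 0 for _ in range(levels))


def simulate_board(levels, balls, seed=0):
    """Drop `balls` balls through `levels` rows of pins; return the count in each bin."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same counts
    counts = Counter(drop_ball(levels, rng) for _ in range(balls))
    return [counts.get(bin_number, 0) for bin_number in range(levels + 1)]


if __name__ == "__main__":
    # Print a rough text histogram: one '#' per 10 balls in the bin.
    for bin_number, count in enumerate(simulate_board(levels=6, balls=1000)):
        print(f"bin {bin_number}: {'#' * (count // 10)} {count}")
```

With 6 levels there are 7 bins, numbered 0 through 6, and the middle bins should collect noticeably more balls than the end bins.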

A ball is more likely to land in one of the middle bins than in one of the bins at the ends. This is because there are many more paths a ball can follow to get to a middle bin. In fact, there are 15 possible paths a ball can take to bin 4, corresponding to all possible sequences of ones and zeros that contain exactly 4 ones and 2 zeros. On the other hand, there is only one path to bin 0, which occurs when the ball bounces to the left at each pin.
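The path counts can be checked by brute force. This sketch lists every sequence of six zeros and ones and keeps those with the required number of ones; the helper name `paths_to_bin` is hypothetical and chosen only for illustration.

```python
from itertools import product


def paths_to_bin(levels, bin_number):
    """List every left/right sequence (0 = left, 1 = right) that ends in the given bin."""
    return [
        path
        for path in product((0, 1), repeat=levels)
        if sum(path) == bin_number  # bin number = number of rightward bounces
    ]


print(len(paths_to_bin(6, 4)))  # 15
print(len(paths_to_bin(6, 0)))  # 1
```

Enumerating paths this way is only practical for small boards, because the number of sequences doubles with each added level.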

The probability that a ball lands in a particular bin is the number of paths that lead to that bin divided by the total number of paths that a ball can take. The total number of paths is $2^6 = 64$, because the ball can go either of 2 ways at each level and there are 6 levels. So the probability that a ball lands in bin 4 is 15/64, which is approximately 0.234. By comparison, the probability that a ball lands in bin 0 is 1/64, because there is only one path to that bin. So a ball is 15 times as likely to land in bin 4 as in bin 0.

The following table shows the number of paths that lead to each bin.

Bin Number                0    1    2    3    4    5    6
Number of Paths to Bin    1    6   15   20   15    6    1

The bar graph below shows the probabilities for each bin.
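The path counts and bin probabilities can also be computed directly. This is a small sketch that uses the binomial coefficient (`math.comb` in Python's standard library) to count the paths to each bin and divides by the total number of paths; the function name `bin_probabilities` is an illustrative choice.

```python
from fractions import Fraction
from math import comb


def bin_probabilities(levels):
    """Return the exact probability of each bin for a board with p = 1/2."""
    total_paths = 2 ** levels  # 2 choices at each level
    return [Fraction(comb(levels, k), total_paths) for k in range(levels + 1)]


for k, prob in enumerate(bin_probabilities(6)):
    print(f"bin {k}: {comb(6, k):2d} paths, probability {float(prob):.3f}")
```

Exact fractions are used so that the probabilities add up to exactly 1 and can be compared with values such as 15/64 without rounding error.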


The Normal Approximation to the Binomial Distribution

The examples above show why the distribution of balls in Galton's board has the shape of a binomial distribution. But what does this have to do with the normal curve? The answer is that, for large n, a binomial random variable X, with parameters n and p = 0.5, has approximately the same distribution as a normal random variable with the same mean and standard deviation as X. This is a special case of a famous result called the Central Limit Theorem.

A binomial random variable X with parameters n and p = 0.5 has mean n/2 and standard deviation $\sqrt{n}/2$. For example, if n = 36, the mean is 18 and the standard deviation is 3. The following graph shows the binomial distribution with n = 36 and p = 0.5, together with a normal distribution that has mean 18 and standard deviation 3.
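The comparison for n = 36 can be made numerically as well as graphically. The sketch below evaluates the binomial probabilities and the normal density with mean n/2 and standard deviation sqrt(n)/2 at a few whole numbers; the helper names `binomial_pmf` and `normal_pdf` are illustrative, and the normal density formula is the standard one rather than anything specific to this article.

```python
from math import comb, exp, pi, sqrt


def binomial_pmf(n, k):
    """Probability of exactly k rightward bounces in n levels when p = 1/2."""
    return comb(n, k) / 2 ** n


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * sqrt(2 * pi))


n = 36
mean, sd = n / 2, sqrt(n) / 2  # 18 and 3 for n = 36

# Compare the two at a few bins around the mean.
for k in range(12, 25, 3):
    print(k, round(binomial_pmf(n, k), 4), round(normal_pdf(k, mean, sd), 4))
```

At each of these points the two values should agree closely, which is what the overlapping curves in the graph express.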

This helps explain why so many common variables, such as a person's height, have an approximately normal distribution. A person's height is the result of many factors, both genetic and environmental. Each of these factors can increase or decrease a person's height, just as each ball in Galton's board can bounce to the right or the left. If the factors act independently and their effects add up, the Central Limit Theorem implies that the sum of these contributions has approximately a normal distribution.

Copyright 2007 by Paul Trow
