Jeff Atwood, over at Coding Horror, posed an interesting little puzzle about probability:

Let’s say, hypothetically speaking, you met someone who told you they had two children, and one of them is a girl.

What are the odds that person has a boy *and* a girl?

To put it in more precise language, so we can concentrate on probability rather than on nuances of word choice: the person means that *at least one* child is a girl. At first I was tempted to say what a lot of people came up with in the comments: 50%. If one is a girl, my thought process goes, then we’re just looking at the probabilities for the other child, and surely those are not affected by the child we know about.

This is, of course, wrong. That argument fails to take into account that the two children are distinguishable, and we don’t know which child the person was talking about when they said one was a girl. Once I actually wrote it down, the solution became clearer. Having two children gives 4 possibilities in terms of their gender (B for boy, G for girl):

BB, BG, GB, and GG

In learning that at least one is a girl, we can eliminate BB. We cannot eliminate BG or GB because we’re not told which child was being referred to when we were told one is a girl. Of the 3 remaining, 2 have one boy and one girl, so the solution is 2/3 or about 67%.
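The counting argument is small enough to check by brute force. Here’s a minimal Python sketch that enumerates the ordered outcomes, throws out BB, and counts what’s left. It assumes, as the puzzle implicitly does, that each child is independently a boy or a girl with probability 1/2, so all four ordered outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# Every ordered pair (first child, second child), each child B or G.
# Assumption: each child is independently B or G with probability 1/2,
# so these four ordered outcomes are equally likely.
outcomes = ["".join(pair) for pair in product("BG", repeat=2)]

# Condition on "at least one child is a girl": this removes only BB.
at_least_one_girl = [o for o in outcomes if "G" in o]

# Among the survivors, keep those with exactly one boy and one girl.
mixed = [o for o in at_least_one_girl if sorted(o) == ["B", "G"]]

p_mixed = Fraction(len(mixed), len(at_least_one_girl))
print(at_least_one_girl)  # ['BG', 'GB', 'GG']
print(p_mixed)            # 2/3
```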

But wait! Why should order matter? As expressed by one commenter:

All the children learned probability theory and forgot how to think normally! Why would you care if the first one is a boy or a girl..they didn’t tell that their first child was a Girl, now did they? So, you have three choices: [BB, GB, GG]

To a certain extent, one is entirely justified in formulating the problem in terms that don’t include the ordering. Order wasn’t asked for in the question or mentioned in the problem. However, if you formulate the problem this way, you are forced to abandon an implicit assumption we made in the previous reasoning: that *all possibilities are equally likely*. If we leave out order, we can simplify our notation and just count the number of boys, knowing that the rest are girls (leaving aside the relatively rare occurrence of gender ambiguity). So our possible cases are [2, 1, 0] boys. However, the respective probabilities for these cases are [25%, 50%, 25%]. That is to say, having one boy and one girl is twice as likely as having two girls. With this in mind, it’s easy to see that the solution should be 2/3: ruling out two boys leaves 50% + 25% = 75% of the probability, and the one-boy-one-girl case accounts for 50/75 = 2/3 of it.
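The unordered description gives the same answer as long as the three cases carry their unequal weights. A short Python sketch, again assuming each child is independently a boy or a girl with probability 1/2; the dictionary `p_boys` is just a name chosen for this example:

```python
from fractions import Fraction

# Unordered description: only the number of boys is recorded.
# Weights assume each child is independently a boy with probability 1/2.
p_boys = {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1, 4)}

# "At least one girl" rules out the 2-boy case.
remaining = {k: p for k, p in p_boys.items() if k < 2}

# Renormalize over what is left: P(1 boy | at least one girl).
p_one_boy = remaining[1] / sum(remaining.values())
print(p_one_boy)  # 2/3

# Wrongly treating the surviving unordered cases as equally likely gives 1/2.
naive = Fraction(1, len(remaining))
print(naive)  # 1/2
```

The only difference between the right and wrong answers is whether the weights are kept when renormalizing.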

But why are the probabilities equal when you include order, and not equal when you don’t? Maybe you don’t even believe me. The answer has a very deep connection to physics, and so my advice to any doubter is to try it out with a pair of coins! Get two coins, flip them, and record the number of heads. Repeat this 20 or 30 times and you’ll handily see that exactly 1 head comes up roughly twice as often as either 2 heads or 0 heads. It doesn’t even matter whether you flip them at the same time or whether the coins are easily distinguishable! Even seemingly identical coins are distinguishable *in principle*. No two coins are exactly alike at the molecular level, and even if they were, it would be possible to track them individually through the air during a flip. By only recording the number of heads we are throwing out some information which is, *in principle*, available to us.
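If you don’t have coins handy, the experiment is easy to mimic in software. This is a minimal Python sketch that assumes Python’s pseudo-random generator is a fair stand-in for a real coin; the function name `flip_two_coins`, the seed, and the 100,000 trials are arbitrary choices for illustration. With that many trials the 1-head count lands very close to twice each of the others; with only 20 or 30 real flips, expect noticeably more scatter.

```python
import random
from collections import Counter

def flip_two_coins(trials, seed=0):
    """Flip two simulated fair coins `trials` times; tally the number of heads."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    tally = Counter()
    for _ in range(trials):
        # Each coin is heads when a uniform draw falls below 0.5.
        heads = sum(rng.random() < 0.5 for _ in range(2))
        tally[heads] += 1
    return tally

counts = flip_two_coins(100_000)
for heads in (0, 1, 2):
    # Fractions come out near 0.25, 0.5 and 0.25 respectively.
    print(heads, counts[heads] / 100_000)
```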

Any time we leave out information that exists in principle, the cases in our coarser description are generally no longer equally likely. However, we can still work out the probabilities of our incomplete description. In thermodynamics, an incomplete description (in this case, the number of heads) is called a *macrostate*, and a complete description that uses all the information available in principle is called a *microstate*. To find the probability of each macrostate, we have to weight it by the number of different microstates that give that macrostate. The macrostate of exactly 1 head has two microstates (HT and TH), while the other macrostates each have only one (HH and TT), so exactly 1 head is twice as likely as either 2 heads or 0.
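The same weighting works for any number of coins. Here’s a minimal sketch, assuming fair, independent coins so that every microstate is equally likely; `macrostate_probabilities` is a name made up for this example. It literally enumerates all 2^n microstates, so it is only practical for small n.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def macrostate_probabilities(n_coins):
    """Group all 2**n equally likely microstates by their number of heads."""
    microstates = list(product("HT", repeat=n_coins))
    # Multiplicity: how many microstates produce each macrostate (head count).
    multiplicity = Counter(state.count("H") for state in microstates)
    total = len(microstates)
    return {heads: Fraction(m, total) for heads, m in sorted(multiplicity.items())}

print(macrostate_probabilities(2))
# {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```

With three coins the weights become 1/8, 3/8, 3/8, 1/8: the middle macrostates have three microstates each.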

The number of different microstates that correspond to a particular macrostate is a measure of our lack of information. When we get a macrostate of 2 heads, we know exactly which microstate we’re in: we have complete knowledge. But imagine that we had 100 coins instead of 2. There is only one microstate that has 0 heads, but there are 100,891,344,545,564,193,334,812,497,256 different microstates for 50 heads. 50 heads is *astronomically* more likely than 0 heads. But just knowing that there are 50 heads leaves us without much knowledge of the microstate: there are over 100 thousand trillion *trillion* of them to choose from! The measure of this is called *entropy* (technically, it is proportional to the logarithm of the number of microstates). In our boy-girl example, having one boy and one girl has a higher entropy because we don’t know the order. Entropy is sometimes called a measure of *disorder*.
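The microstate counts for 100 coins are binomial coefficients, so they are easy to check. This sketch uses `math.comb` (available since Python 3.8) and takes the entropy as the natural logarithm of the count, i.e. in units where Boltzmann’s constant is 1, which is a convenience assumption rather than a physical unit.

```python
import math

n = 100
w_zero = math.comb(n, 0)   # microstates with 0 heads
w_half = math.comb(n, 50)  # microstates with exactly 50 heads

print(w_zero)  # 1
print(w_half)  # 100891344545564193334812497256

# Entropy with Boltzmann's constant set to 1: S = ln(number of microstates).
print(math.log(w_zero))  # 0.0
print(math.log(w_half))  # about 66.78

# Probability of each macrostate: its microstate count over all 2**n microstates.
print(w_half / 2**n)  # about 0.08
print(w_zero / 2**n)  # about 7.9e-31
```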

In thermodynamics, the macrostate of a system is given by things like overall temperature, volume and pressure, whereas microstates would have to be given in terms of the positions and velocities of each molecule. That information is present, in principle (at least up to a quantum-mechanical limit), so it has a real effect on the probability. Just as in coin tossing, the probabilities of the macrostates are weighted by the number of corresponding microstates, so the more likely macrostates are the ones with higher entropy. This is the origin of the famous 2nd Law of Thermodynamics. Since macrostates of high entropy are so much more likely, random processes overwhelmingly tend to end up there; the more elements in the system, the closer this probability comes to certainty, until it is effectively a simple fact.
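The coins show how "more likely" hardens into "effectively certain" as the system grows. This sketch computes, exactly, the probability that the fraction of heads among n fair coins lands within 0.1 of one half; the function name `prob_near_half` and the 0.1 tolerance are arbitrary choices for illustration, and exact `Fraction` comparisons are used to avoid floating-point edge cases at the boundary.

```python
import math
from fractions import Fraction

def prob_near_half(n, tolerance=Fraction(1, 10)):
    """Exact probability that the fraction of heads among n fair coins
    lies within `tolerance` of 1/2."""
    hits = sum(math.comb(n, k) for k in range(n + 1)
               if abs(Fraction(k, n) - Fraction(1, 2)) <= tolerance)
    return hits / 2**n

for n in (10, 100, 1000):
    print(n, prob_near_half(n))
# n=10 gives exactly 672/1024 = 0.65625; larger n gets ever closer to 1.
```

With 10 coins, landing near one half is merely likely; with 1000 it fails only a few times in ten billion, and a real gas has vastly more than 1000 molecules.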