The Two-daughter Problem

Several weeks ago, I posted a brain teaser that shows how, contrary to common sense, adding more descriptive details actually reduces the probability of an event. (Here’s a link to that post, in case you missed it.)

While reading The Drunkard’s Walk on my way back from a business trip last week, I came across another probability puzzle that defies common sense – only this time it appears that specifying additional (and seemingly irrelevant) detail actually increases the probability of a situation!

Consider the following question first: Suppose I have two children. What are the chances of both children being girls?

This one is simple. There are only four equally likely possibilities: {Girl, Girl}, {Girl, Boy}, {Boy, Girl} and {Boy, Boy}. Since only one of these four contains the favorable outcome (i.e. both girls), we have our answer: 1 out of 4 = 25%.

Now, let’s take it to the next level: What are the chances, given that one of the children is a girl, that both children are girls?

One might think that since one child is a girl, the chances of the other being a girl as well are 50/50, so the answer is 50%. But that’s incorrect: although we’re told that one child is a girl, we are not told which one. Going back to the possible outcomes outlined above, we must eliminate the {Boy, Boy} option, since we know one of the children is a girl. That leaves three equally likely options, of which only {Girl, Girl} is favorable. Hence, the probability that both of them are girls is 1 out of 3 ≈ 33%.
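The two answers so far can be checked by listing the sample space and counting. Below is a minimal Python sketch, assuming boys and girls are equally likely and the two births are independent; the helper name `probability` is illustrative and not from the book.

```python
from fractions import Fraction
from itertools import product

# All ordered (older, younger) pairs; each is assumed equally likely.
families = list(product(["Girl", "Boy"], repeat=2))

def probability(event, given=lambda family: True):
    """P(event | given), computed by counting equally likely outcomes."""
    space = [f for f in families if given(f)]
    favorable = [f for f in space if event(f)]
    return Fraction(len(favorable), len(space))

def both_girls(family):
    return family == ("Girl", "Girl")

def at_least_one_girl(family):
    return "Girl" in family

p_unconditional = probability(both_girls)
p_given_girl = probability(both_girls, given=at_least_one_girl)
print(p_unconditional, p_given_girl)  # 1/4 1/3
```

Keeping {Girl, Boy} and {Boy, Girl} as separate outcomes is what makes the four cases equally likely, which is why the count gives 1/3 rather than 1/2.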

Now, here’s the final question:

Suppose I have two children. One of them is a girl who was born on a Friday. What are the chances of both children being girls?

Do you think the mention of Friday would impact the probability in any way? Well, actually it does! With the same basic approach used above, the answer to this question comes out to be 6/13.

I will explain the solution if anyone is interested, but what’s intriguing to me is how this seemingly irrelevant information changes the probability from 1/3 to 6/13. What does ‘born on a Friday’ have to do with the probabilities in question? Apparently, everything! Once you calculate the probabilities, the answer becomes obvious, but at first glance it seems so counter-intuitive!

This is one more rather simple demonstration that shows how the human intuition is ill-suited for solving problems that involve uncertainty and probabilities. As Leonard Mlodinow writes in his delightfully entertaining book mentioned above: “[T]he human mind is built to identify for each event a definite cause and can therefore have a hard time accepting the influence of unrelated or random factors.”

No wonder pigeons beat humans at solving the Monty Hall problem!

Update: Per Ramanand’s request, the solution is posted in the comments section.

19 responses to “The Two-daughter Problem”

  1. Could you explain the last (Friday) one? 🙂

    • Vishal

      Sure! The problem can be solved using the same approach I used for the simpler versions. The concept/approach is called the law of the sample space which states “Suppose a random process has many equally likely outcomes, some favorable, some unfavorable. Then the probability of obtaining a favorable outcome is equal to the proportion of outcomes that are favorable.” Once you lay out the sample space (i.e. all possible outcomes) the solution is easy.

      For the Boy-Girl pairs, there are three possibilities: {Boy, Girl}, {Girl, Boy} and {Girl, Girl}. We eliminate the {Boy, Boy} option because we know that at least one of them is a girl.

      Now, for each of these three options, let’s count possible pairs of born-on days:

      * {Girl, Girl} = {Mon, Fri}, {Tue, Fri}, {Wed, Fri}, {Thu, Fri}, {Sat, Fri}, {Sun, Fri}, {Fri, Mon}, {Fri, Tue}, {Fri, Wed}, {Fri, Thu}, {Fri, Sat} and {Fri, Sun} = Total 12 possibilities
      * {Boy, Girl} = {Mon, Fri}, {Tue, Fri}, {Wed, Fri}, {Thu, Fri}, {Fri, Fri}, {Sat, Fri}, {Sun, Fri} = Total 7 possibilities
      * And similarly, we have 7 more possibilities for {Girl, Boy} = {Fri, Mon}, {Fri, Tue} and so on…

      Note that for {Girl, Girl}, {Fri, Fri} is not a valid option because we know that one – and only one – girl is born on a Fri. And for the {Girl, Boy} and {Boy, Girl} options, since the girl is born on a Fri we eliminate all other options where the girl is not born on a Fri.

      This gives us a total of 12+7+7=26 options. Of those, 12 are favorable outcomes (i.e. all {Girl, Girl} possibilities), so the probability is 12 out of 26 = 6/13.

      PS: If we interpret “one of them is a girl” as “at least one of them is a girl” then our answer would change to 13/27 = 48%, slightly higher than 6/13 = 46%.
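      Both readings can be verified by brute force over all 196 combinations of sex and weekday. The following is a minimal Python sketch, assuming each of the 14 (sex, weekday) combinations is equally likely for each child; the function names are illustrative only.

```python
from fractions import Fraction
from itertools import product

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
# One child = (sex, weekday); 14 equally likely kinds of child are assumed.
children = list(product(["Girl", "Boy"], DAYS))
families = list(product(children, repeat=2))  # 196 ordered (older, younger) pairs

def count_friday_girls(family):
    return sum(child == ("Girl", "Fri") for child in family)

def both_girls(family):
    return all(sex == "Girl" for sex, _ in family)

def conditional(given):
    """P(both girls | given), by counting equally likely families."""
    space = [f for f in families if given(f)]
    return Fraction(sum(both_girls(f) for f in space), len(space))

p_exactly_one = conditional(lambda f: count_friday_girls(f) == 1)
p_at_least_one = conditional(lambda f: count_friday_girls(f) >= 1)
print(p_exactly_one, p_at_least_one)  # 6/13 13/27
```

      The only difference between the two results is the single {Fri, Fri} two-girl family, which the "one and only one" reading excludes.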

  2. Kamlesh

    Put this question in CAT exams and you sure would have a very low cut-off. Amazingly complex.

  3. “Suppose I have two children. One of them is a girl who was born on a Friday. What are the chances of both children being girls?”

    It does not give you the information that “one – and only one – girl is born on a Fri.” It just suppresses ANY information about the second child by not using words/phrases like “at least” or “only”. The other child can well be a girl born on a Friday.
    On the other hand, if you want to interpret “one of them was born on a Friday” as “ONLY one of them was born on a Friday”, then “one of them is a girl” should also be interpreted as “ONLY one of them is a girl.”

    • You’re absolutely correct.

      I should have been more clear about the distinction between “one and only one” and “at least one”.

      But in both situations — as mentioned in the solution I provided as a response to Ramanand above — the probability (of both children being girls) increases.

  4. Vishal, actually I have issues with your answer even if we drop the Friday part of the question. I will try to explain it in two ways. Hope they make sense.

    1. One child can be a boy or a girl irrespective of what sex the other one is. They are two independent events. This is exactly the same as a coin toss. Your question is similar to this: [You have two coin toss results. Given that one is a head, what is the probability that the other one is also a head?] Similar to the boy and girl combinations, you can have (H,H) (H,T) (T,H) and (T,T) as probable outcomes. Would you say that the answer is 1/3?

    2. When you are considering (b,g) and (g,b) as different combinations, you are actually bringing in sequence into consideration. In that case, there are four different combinations possible (given that one kid is a girl), (b,g) (g,b) (g1,g) and (g,g1) – g1 being the other girl. We should not take sequence into account.

    [3. It’s a simple case of posterior probability. Apply the formula. The result is 1/2]

    • Hey Raja,

      I think I see where you’re coming from on this. Let me try to answer your coin-toss question:

      The ambiguity arises from confusing a conditional probability with an absolute probability. If you toss a coin once, you have an equal (1/2) probability of getting ‘H’ and ‘T’. Let’s say you get H from your first coin toss. And now you are going to toss the coin for the second time. What is the probability that you’ll get H (again) this time? That probability is, surely, 1/2. This is an absolute probability.

      However, suppose someone tells you that she tossed a coin twice and got at least one H. What is the probability that she got H on both tosses? This is a conditional probability. The sample space (i.e. all possible results) before she told you anything is {H,H}, {H,T}, {T,H} and {T,T}. When you incorporate the information that at least one toss came up H, the sample space reduces to {H,H}, {H,T}, and {T,H}. You must drop the {T,T} possibility because you know that she didn’t get both T’s. As a result, the probability that she got H on both tosses, GIVEN THAT SHE GOT AT LEAST ONE H, is 1/3.

      I found another explanation (from this link, which you might want to check out):

      Suppose there are 100 fathers in an auditorium, and each is the father of two children. Each father is instructed to tell you (truthfully) if at least one of his children is a boy. This will apply to about 75 of the fathers. Now, of those 75 Dads, 2/3 (i.e., 50) have a daughter, and 1/3 (i.e., 25) have two sons. Thus, if you want to guess the gender of their “other” child, the chances are 2/3 that it is a girl. (Of course, for the remaining 25 fathers – those who did not report at least one son – you know immediately they have two daughters.)

      Does that help?

  5. My bad. I feel like a drunken monkey 😦

  6. Dr. Tom Beatty,DD,BFHM

    I would like your comment on this “fantastic” political analyst that many are raving about.
    Lately there has been a flurry of news (local, national, even international) concerning a political commentator, Nate Silver, who predicted accurately the outcome of all 50 states in the recent presidential election.
    To me this looks like the classical “hot hand” syndrome, i.e. if a poker player or roulette bettor won 10 hands in a row, one would say she/he has a “hot hand.”
    I think it should be no surprise at all that someone guessed correctly the outcome in all 50 states. In fact it should be surprising that only one forecaster correctly guessed all 50 states.
    If, for example, there were 2 or 3 thousand predictors and they all failed to correctly predict 50 states, it would be unusual. And I don’t doubt that the number of forecasters far exceeds 3 thousand. Also note that the problem is not really 50 states, since possibly 30-40 were a given certainty for one of the candidates.
    Now if Mr. Silver did this every 4 years, that might be impressive. What do you think of his “achievement?”
    Tom Beatty, UNCA Asheville, NC
    tkbkpop@yahoo.com

    • I think the Nate Silver phenomenon is surely a good thing for popularizing the fields of statistics/data analysis/data mining, and mathematics in general. We have to give it to him for making his approach and rationale more accessible to the layperson. But I do feel that his success is a tad over-rated (and sometimes politicized to demonize the losing party).

      I can’t remember where I read this, but someone put it very nicely (paraphrasing): Silver’s success doesn’t prove that he is a genius any more than his failure would have proved that he was a fraud. I hope that, while going gaga over Silver’s accuracy, people recognize that this actually is a victory of the power of empirical evidence and rigorous statistical analysis.

      Totally agree with you on your ‘repeatability’ comment. The predictive accuracy of a model can’t be determined by just a single validation (especially when the outcomes are binary). If Silver is still doing forecasts in 2016 then I would surely bet on Silver, but I wouldn’t put blind faith in him, and I would be wary of betting a large amount.

  7. Charlie

    I came across this problem in ‘The Drunkard’s Walk’ too and I spent a bit of time thinking about this because the answer seems so counter-intuitive, and though I’m not a maths person (I’m a lawyer, which is tantamount to innumeracy among most of my colleagues), I think I have a reason to disagree with the notion that “specifying additional (and seemingly irrelevant) detail” changes the probability.

    In my view, what the additional information does is to prompt (but not in any way enable) us to pose a different question from the one we initially ask and to thereby make a different estimation of probabilities. The information itself is irrelevant and this question can in fact be posed without it.

    I would be very interested to hear yours, and other people’s, opinion on this.

    I find it much easier to understand the problem by looking at it in terms of frequencies. In the problem as Leonard Mlodinow poses it in his book, the additional information is that one of the girls is named Florida (which one girl in a million in the US is, apparently – p.113 of the book).

    Now, suppose there are 20 million couples and each of them has two children. Assuming that there is a 50/50 chance of having a girl (not quite accurate, but however):

    With the first child we have:
    10 million couples with a girl (group 1)
    – 10 of these are couples who will call their child Florida
    10 million couples with a boy (group 2)
    – None of these are couples who will call their child Florida

    Each couple then has a second child, and we now have:
    Of the 10 million couples in group 1
      5 million will have a girl (group 1a)
        – 5 of these will be couples who already have a child called Florida (their older child)
        – Another 5 of these will be couples who call their younger child Florida (i.e. .0001% of the 4,999,995 couples who don’t already have a child called Florida – rounding)
      5 million will have a boy (group 1b)
        – 5 of these will be couples who already have a child called Florida (their older child)
    Of the 10 million couples in group 2
      5 million will have a girl (group 2a)
        – 5 of these will be couples who will call their younger child Florida
      5 million will have a boy (group 2b)
        – None of these will be couples with a child called Florida

    Now, we can see that our initial question (if a couple has two children what is the probability that they are both girls?) is 25%, as only one of the four groups (1a, 1b, 2a, 2b) has two girls. Like Raja, I was confused by why we set our initial groups as {Girl, Girl}, {Girl, Boy}, {Boy, Girl} and {Boy, Boy}, but this makes it clear I think.

    Next, we are told that one of the children is a girl. We eliminate group 2b, and the probability changes to 33%. Again this seems clear.

    When we are told that one of the children is named Florida, we can then see that there are 10 girls named Florida in group 1a, and 10 girls named Florida in groups 1b and 2a, so the probability is now 50%.

    Now, my argument is that what has happened is not that a piece of irrelevant information has changed the probabilities (or allowed us to assess them more accurately) but it has caused us to ask a different question from the one we originally posed.

    What we have really done is moved from asking the question:
    – How likely is it that a couple with two children has two daughters? (comparing couples), to
    – How likely is it that a child of a two-child family has a sibling called Florida? (comparing children)

    Since there are twice as many children who might be called Florida (i.e. girls) in group 1a, comparing groups of children changes the probabilities (or allows us to assess them differently).

    But the information that prompts us to compare children instead of couples is irrelevant. If we didn’t learn that one of the children was called Florida (or born on a Friday or whatever), as soon as we heard that one of the children was a girl we could ask the simple question:
    – what is the probability that that girl has a sister?

    Since there are twice as many girls in group 1a as in 1b and 2a, this would allow us to assess the probability as 50% without needing any further information.

    I think this accords better with one’s intuition that a child being called Florida (or born on a Friday) provides no information useful to determining the probability of its parents having two daughters, but I would be very interested in hearing other people’s views on this. As I said, I’m not a maths guy …

    Regarding the version of the problem here, a child being born on a Friday, I got a different probability when I worked this out using natural frequencies (the only way I can understand probability, to be honest).

    If we start with 196 couples, we would have:

    With the first child:
    98 couples with a girl (group 1)
    – 14 of these are couples whose daughter was born on a Friday
    98 couples with a boy (group 2)
    – None of these are couples with a daughter born on a Friday

    With the second child we have:
    Of the 98 couples in group 1
      49 will have a girl (group 1a)
        – 7 of these will be couples who already have a daughter born on a Friday (their older daughter)
        – Of the remaining 42 couples, another 6 of these will have a daughter born on a Friday (their younger daughter)
      49 will have a boy (group 1b)
        – 7 of these will be couples who already have a daughter born on a Friday (their older child)
    Of the 98 couples in group 2
      49 will have a girl (group 2a)
        – 7 of these will be couples who will have a daughter born on a Friday (their younger child)
      49 will have a boy (group 2b)
        – None of these will be couples with a daughter born on a Friday

    So now we have:
    49 couples with two girls (group 1a)
    – 13 of these are couples with a daughter born on a Friday
    98 couples with a girl and a boy (group 1b, where the girl is eldest, and group 2a, where the boy is eldest)
    – 14 of these are couples with a daughter born on a Friday
    49 couples with two boys (group 2b)
    – None of these are couples with a daughter born on a Friday

    So, I make the probability 13 out of 27 (or 14 out of 28, if either daughter might have been born on a Friday).

    I don’t know how to reconcile that with the estimation above based on pairs:
    * {Girl, Girl} = {Mon, Fri}, {Tue, Fri}, {Wed, Fri}, {Thu, Fri}, {Sat, Fri}, {Sun, Fri}, {Fri, Mon}, {Fri, Tue}, {Fri, Wed}, {Fri, Thu}, {Fri, Sat} and {Fri, Sun} = Total 12 possibilities
    * {Boy, Girl} = {Mon, Fri}, {Tue, Fri}, {Wed, Fri}, {Thu, Fri}, {Fri, Fri}, {Sat, Fri}, {Sun, Fri} = Total 7 possibilities
    * And similarly, we have 7 more possibilities for {Girl, Boy} = {Fri, Mon}, {Fri, Tue} and so on…

    If someone could explain the reason for the difference (no doubt my mistake in reasoning somewhere) I’d appreciate it.

    Thanks and apologies for the very lengthy post!

    • Charlie,

      Thanks for your query and explanations – which helped me learn something new today! 🙂

      First, let me answer your second question. Your approach yielded a probability of 13/27 because you did not exclude the case where *both* of the girls were born on Fridays. In your example, there is only one such couple. If you remove that case (i.e., remove 1 from the numerator as well as denominator), you’ll also get 12/26 = 6/13. Please let me know if this is unclear, and I will try to explain further. [Please note that, as I mentioned earlier in my reply to Raja, you don’t have to necessarily make that assumption, so your answer is equally valid.]

      Regarding your first conjecture – that the additional information diverts us towards a different question than the one that was originally asked – I think you are really on to something here! The approach that you laid out is very similar to how a Bayesian statistician will tackle this problem. Here’s a white-paper (PDF link) that should clarify this further. It appears that a Bayesian (as opposed to the sample-space) approach is a better way to solve this type of question/paradox.

  8. Charlie

    Hi Vishal,

    Thanks so much for your reply. I will definitely have a look at that paper and see if I can follow the maths in it. I think I get your point about excluding the case of both children being born on a Friday, but I’ll have a look at it and make sure I’ve got it. Thanks again. Charlie

  9. JeffJo

    Charlie:

    First off, your second analysis did exclude the case where both girls were born on Friday, because you first removed the 7 families whose first girl was born on a Friday. That’s why you wound up with 13 two-girl families, which include 14 girls born on a Friday.

    The reason you seem to get different answers is that you “rounded” the source of the similarity out. (Both are wrong, btw; and the first is wrong for more than one reason. I’ll get to that in a moment.) If the extra fact that you know about a girl in the family has a probability P of occurring, then in N families:

    Group 1a: N/4 families have two girls.
      For P*N/4, the fact applies to the first.
      For P*N/4, the fact applies to the second.
      But for (P^2)*N/4, the fact applies to both.
      So (2P-P^2)*N/4 families have a girl the fact applies to.
    Group 1b: N/4 have an older girl and a younger boy.
      For P*N/4, the fact applies to the girl.
    Group 2a: N/4 have an older boy and a younger girl.
      For P*N/4, the fact applies to the girl.
    Group 2b: N/4 have two boys.

    In all, (4P-P^2)*N/4 families include a girl that the fact applies to. So the proportion of these families with two girls is (2P-P^2)/(4P-P^2)=(2-P)/(4-P). If you put P=1/7 in, this becomes 13/27 as you showed. If you put P=1/10^6 in, it is 0.499999875, which rounds to 1/2.
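    The closed form above is easy to evaluate exactly. Here is a minimal Python sketch using exact fractions; the function name `two_girl_share` is illustrative, and P is assumed to apply independently to each girl, as in the derivation.

```python
from fractions import Fraction

def two_girl_share(p):
    """(2 - P) / (4 - P): the share of two-girl families among families
    that include a girl the extra fact applies to."""
    p = Fraction(p)
    return (2 - p) / (4 - p)

friday = two_girl_share(Fraction(1, 7))        # born on a Friday
florida = two_girl_share(Fraction(1, 10**6))   # named Florida
plain = two_girl_share(1)                      # the fact is just "is a girl"

print(friday)          # 13/27
print(float(florida))  # roughly 0.499999875
print(plain)           # 1/3
```

    Setting P=1 recovers the plain 1/3 answer, and the result climbs towards 1/2 as the fact becomes rarer.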

    Mlodinow essentially allowed a family to include two girls named Florida; he argued, much as you did when you rounded, that the P^2 term was too small to care about, so he left it out. That makes the answer exactly 2/4=1/2.

    But not only is it wrong to do that, you have to disallow any name from being duplicated. As a result, the younger girl in group 1a becomes much more likely to be named Florida than the older one, because most of the 4,999,995 older sisters will have common names. I won’t go into it, but it turns out that if Q is the probability of a typical name, the correct proportion is (2+Q-P)/(4+Q-P), which is greater than 1/2!

    But I said both answers are wrong. They ignore the fact that you know about the girl born on a Friday because somebody told you, and that person wasn’t constrained to tell you that particular fact. This is what that PDF Vishal told you about says. But your method works, too, by considering 392 couples instead of 196:

    Group 1a: 98 families have two girls.
      2 have two girls who were born on a Friday.
        2 tell you about a girl born on a Friday.
      24 have exactly one girl who was born on a Friday.
        12 tell you about the girl born on a Friday.
        12 tell you about the girl born on a different day.
      72 do not have a girl who was born on a Friday.
        72 tell you about a girl born on a different day.
    Group 1b: 98 have an older girl and a younger boy.
      14 have a girl who was born on a Friday.
        7 tell you about the girl born on a Friday.
        7 tell you about the boy.
      84 do not have a girl who was born on a Friday.
        42 tell you about the girl.
        42 tell you about the boy.
    Group 2a: 98 have an older boy and a younger girl.
      14 have a girl who was born on a Friday.
        7 tell you about the girl born on a Friday.
        7 tell you about the boy.
      84 do not have a girl who was born on a Friday.
        42 tell you about the girl.
        42 tell you about the boy.
    Group 2b: 98 have two boys.
      98 tell you about a boy.

    So, of the 2+12+7+7=28 families that tell you about a girl born on a Friday, 2+12=14 of them have two girls. The answer is 1/2.

    Also note that of the 2+12+12+72+7+42+7+42=196 families that tell you about a girl, 2+12+12+72=98 of them have two girls. The answer is 1/2, not 1/3. It only changes to 1/3 when they are required to tell you “One is a girl.” And if the requirement changes to “One is a girl born on a Friday,” the answer changes to 13/27 because a two-girl family is more likely to meet the requirement.
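    The table can be reproduced with exact arithmetic. The sketch below is a minimal Python version that assumes, as the table does, that each parent describes one of their two children chosen at random with probability 1/2; the weekday index `FRIDAY` and the function name `posterior` are illustrative choices.

```python
from fractions import Fraction
from itertools import product

DAYS = range(7)
FRIDAY = 4  # arbitrary index standing in for Friday
children = list(product("GB", DAYS))  # 14 equally likely (sex, weekday) kinds

def posterior(matches):
    """P(two girls | the parent described a child satisfying `matches`)."""
    told = Fraction(0)
    told_and_two_girls = Fraction(0)
    p_family = Fraction(1, len(children) ** 2)
    for family in product(children, repeat=2):
        for described in family:  # each child is described with probability 1/2
            if matches(described):
                weight = p_family * Fraction(1, 2)
                told += weight
                if all(sex == "G" for sex, _ in family):
                    told_and_two_girls += weight
    return told_and_two_girls / told

p_friday_girl = posterior(lambda child: child == ("G", FRIDAY))
p_any_girl = posterior(lambda child: child[0] == "G")
print(p_friday_girl, p_any_girl)  # 1/2 1/2
```

    Under this assumption about how the fact reaches you, the weekday drops out entirely; the 1/3 and 13/27 answers arise only when the statement is a requirement imposed on the family rather than a randomly chosen description.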

  10. Pingback: Aunt Pythia’s advice | mathbabe

  11. James

    The problem has a 100% probability of being nonsense. Why? Simple: Without further information (not given by the inept writer), there is no significance between BG or GB. I.e., that is only one choice.

    E.g. Think of the Monty Python skit: You can have ham and spam or spam and ham. Is this two choices? No!

  12. JeffJo

    I buy a lottery ticket. There are now two “choices” : I win, or I lose. So the chances are 50-50 that I’ll win; at least, according to James’s logic. And if there are 5 prizes, it seems like I should have a 5/6 chance of winning something.

    Counting “choices,” or distinct possibilities, is not how probability works. Because as James points out, different ways of looking at them can make them distinct, or indistinct. You need to group them into possibilities that are equally likely to emerge. A first child is equally likely to be a boy, or a girl; that is, BX and GX are equally likely, at 50% each. But the X is also equally likely to be a boy or a girl, making BB, BG, GB, and GG all have a 25% probability.
