# Colourblindness and probability

A female acquaintance of mine was recently surprised to find that both of her sons were colourblind, despite neither parent being colourblind. A natural question to ask is ‘What are the odds?’ This question turns out to be open to interpretation, depending on what we mean by probability and odds.

(Note: In this post, the terms ‘female’ and ‘male’ shall be shorthands for ‘people with XX chromosomes’ and ‘people with XY chromosomes’ respectively, since colourblindness is usually a chromosomal condition. I shall use the pronouns ‘she’ and ‘he’ as corresponding shorthands.

Chromosomal sex is distinct from other notions of sex or gender. Unfortunately, this usage may be misleading for transsexual and intersex people, as well as people with some chromosomal abnormalities. This problem is further discussed by Randall Munroe, author of the xkcd comics, here. As he says, ‘The role of gender in society is the most complicated thing I’ve ever spent a lot of time learning about, and I’ve spent a lot of time learning about quantum mechanics.’)

Colourblindness is usually hereditary, caused by defects in the X chromosome that inhibit the proper formation of colour sensors in the eye. (There are also other rarer genetic defects which can cause colourblindness, as well as environmental factors, but we shan’t discuss them here.) Males are more likely than females to be colourblind, as they have only one copy of the X chromosome: a male will be colourblind if this copy is defective (we say that he has genotype xY, where the lower-case x denotes a defective allele, which is recessive). In contrast, a female has two copies of the X chromosome and will be colourblind only if both are defective (genotype xx). Epidemiologists have shown that, across the population, $p = 0.08$ (8%) of males, and $p^2 = 0.0064$ (0.64%) of females, are colourblind. This means that if a male is sampled uniformly at random from the male population, then the probability that he is colourblind is 8%.

The above is not the same as saying ‘The probability that a particular male is colourblind is 8%.’ Suppose Thomas is a male whom we know to be colourblind: then the probability that Thomas is colourblind is 100%! This demonstrates the need to distinguish between prior probability and posterior probability. Suppose Joanna is a female, sampled uniformly at random from the female population. Without any further information about Joanna, we would say that the probability that Joanna is colourblind is 0.64%. This is the prior probability.

Suppose, however, we know that Joanna’s mother is colourblind, while her father is not. What is the probability that Joanna is colourblind? Joanna’s mother has genotype xx, which means that she must pass on a defective allele. However, Joanna’s father is not colourblind, so we know that he must have genotype XY. We know that Joanna is female, so her father must have passed on his X chromosome. Joanna’s genotype must therefore be Xx, and she has a 0% probability of being colourblind! This is the posterior probability, which takes into account the information that we are given about Joanna’s parents.

The same is true of any female child of these parents. Similarly, if they have a male child, then he must have genotype xY, and has a 100% probability of being colourblind! If they have a child of unspecified sex, then the probability that that child will be colourblind is 50%. This 50% is the probability posterior to our being told about the genotypes of the parents, but prior to our being told the sex of the child.
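The reasoning above can be checked by listing every combination of chromosomes that the parents might pass on. The following is only a sketch: the function name `child_colourblind_probs` and the two-character genotype strings are my own conventions for illustration, and it assumes that each parent passes on either chromosome with equal chance and that sons and daughters are equally likely.

```python
from fractions import Fraction
from itertools import product


def child_colourblind_probs(mother, father):
    """Return P(colourblind) for a daughter, a son, and a child of unspecified sex.

    Genotypes are two-character strings: 'X' is a working allele,
    'x' is a defective allele, and 'Y' is the Y chromosome.
    Assumes the father's genotype contains exactly one 'Y'.
    """
    counts = {"daughter": [0, 0], "son": [0, 0]}  # [colourblind, total]
    # Each parent passes on one of their two chromosomes with equal chance.
    for from_mother, from_father in product(mother, father):
        sex = "son" if from_father == "Y" else "daughter"
        x_alleles = [c for c in (from_mother, from_father) if c != "Y"]
        # The defective allele is recessive: every X copy must be defective.
        colourblind = all(a == "x" for a in x_alleles)
        counts[sex][0] += int(colourblind)
        counts[sex][1] += 1
    daughter = Fraction(*counts["daughter"])
    son = Fraction(*counts["son"])
    # A child of unspecified sex is a son or a daughter with probability 1/2 each.
    return daughter, son, (daughter + son) / 2


# Joanna's parents: colourblind mother (xx), non-colourblind father (XY).
print(child_colourblind_probs("xx", "XY"))
```

For Joanna’s parents this gives 0 for a daughter, 1 for a son, and 1/2 for a child of unspecified sex, matching the figures in the text.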

## The original problem

Now let’s return to the question at the top of this post. Two parents, Alice and Bob, neither of whom is colourblind, have two sons, both of whom are colourblind. ‘What are the odds?’ It depends on when you ask this question.

Before they have any children, all the information we have about Alice and Bob is their phenotypes, that is, the fact that they are not colourblind. This means that Bob’s genotype must be XY (if it were xY, then he would be colourblind), while Alice’s genotype may be XX, Xx or xX (it cannot be xx). What are the probabilities of each of these possibilities? Across the female population, XX has frequency $(1-p)^2$, Xx and xX together have frequency $2p(1-p)$, and xx has frequency $p^2$. Since Alice’s genotype cannot be xx, the probability of XX is $\frac{(1-p)^2}{(1-p)^2 + 2p(1-p)} = \frac{(1-p)^2}{1-p^2} \approx 0.852$, while the probability of Xx or xX, i.e. that she carries a defective allele, is $\frac{2p(1-p)}{1-p^2} \approx 0.148$.
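As a quick numerical check of these two figures, here is a small sketch. The dictionary names `prior` and `alice` are just labels chosen for illustration, and the calculation assumes that a female’s two X chromosomes are defective independently of each other, each with probability $p$.

```python
p = 0.08  # frequency of the defective allele, taken from the figure for males

# Genotype frequencies among all females, assuming the two alleles are independent.
# "carrier" lumps together Xx and xX.
prior = {"XX": (1 - p) ** 2, "carrier": 2 * p * (1 - p), "xx": p ** 2}

# Alice is not colourblind, so rule out xx and renormalise what remains.
evidence = prior["XX"] + prior["carrier"]  # equals 1 - p**2
alice = {g: prior[g] / evidence for g in ("XX", "carrier")}

print(round(alice["XX"], 3), round(alice["carrier"], 3))
```

Ruling out an impossible case and renormalising is the simplest kind of update from a prior to a posterior probability.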

Suppose they have their first child. If the child is female, then Bob must have passed on his X chromosome; the daughter must have a non-defective X chromosome, and has a probability 0 of being colourblind. If the child is male, then Bob must have passed on his Y chromosome, while Alice may have passed on either of her X chromosomes. The child will be colourblind if and only if Alice passes on a defective X chromosome. This happens with probability 0 if Alice’s genotype is XX, but with probability 0.5 if Alice carries a defective allele. The probability that a son will be colourblind is therefore $0.148\times0.5\approx0.074$. If the sex of the child is not known, the probability that the child is colourblind is $0 \times 0.5 + 0.074 \times 0.5 \approx 0.037$. This is the prior probability that the child will be colourblind.
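The averaging over Alice’s possible genotypes, and then over the sex of the child, can be written out with exact fractions. This is a sketch under the same assumptions as the text (independent alleles, equal chance of a son or a daughter); the variable names are mine.

```python
from fractions import Fraction

half = Fraction(1, 2)
p = Fraction(8, 100)  # frequency of the defective allele

# Probability that Alice is a carrier (Xx or xX), given that she is not colourblind.
p_carrier = 2 * p * (1 - p) / (1 - p ** 2)

# A son is colourblind only if Alice is a carrier AND passes on the defective copy.
p_son_colourblind = p_carrier * half

# A daughter always receives Bob's working X chromosome.
p_daughter_colourblind = Fraction(0)

# Child of unknown sex: average over son and daughter.
p_child_colourblind = half * p_son_colourblind + half * p_daughter_colourblind

print(p_carrier, p_son_colourblind, p_child_colourblind)
print(float(p_son_colourblind), float(p_child_colourblind))
```

With $p = 0.08$ the carrier probability simplifies to exactly 4/27, so the rounded values 0.074 and 0.037 in the text are 2/27 and 1/27.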

Alice and Bob have their first child, Charlie, a son who is tested and found to be colourblind. They then have a second son, David. What is the probability that David is colourblind? After Charlie’s diagnosis, we now know that Alice must carry a defective allele: her genotype must be either Xx or xX. She has a 50% chance of passing on this defective allele to David. Given that his older brother is colourblind, the probability that David is colourblind is as high as 50%! But if the parents have both sons before testing them at the same time, then the probability that Charlie and David are both diagnosed as colourblind is $0.148 \times 0.5 \times 0.5 \approx 0.037$, or about 1 in 27, which is much lower! (We cannot simply multiply the two sons’ individual probabilities together, since the two diagnoses are not independent: both depend on Alice’s genotype.)
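The update after Charlie’s diagnosis can be sketched as a small Bayesian calculation over Alice’s two possible genotype classes. The names `prior`, `likelihood` and `posterior` are illustrative, and the prior of 4/27 (about 0.148) is the carrier probability worked out earlier for $p = 0.08$.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Prior belief about Alice before any children are tested.
p_carrier = Fraction(4, 27)  # about 0.148
prior = {"carrier": p_carrier, "XX": 1 - p_carrier}

# Probability that any one son is colourblind, given Alice's genotype.
likelihood = {"carrier": half, "XX": Fraction(0)}

# Bayes' theorem: the posterior is proportional to prior times likelihood
# of the observation "Charlie is colourblind".
evidence = sum(prior[g] * likelihood[g] for g in prior)
posterior = {g: prior[g] * likelihood[g] / evidence for g in prior}

# Probability that David is colourblind, using the updated belief about Alice.
p_david = sum(posterior[g] * likelihood[g] for g in prior)

print(posterior["carrier"], p_david)
```

After the update, all of the probability sits on Alice being a carrier, which is why David’s probability jumps to 1/2.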

The above demonstrates Bayesian probability, an interpretation of the concept of probability. The above probabilities are ‘degrees of belief’, or ‘how likely we think something is, based on the information that we have’. We start with prior probabilities based on already established data (such as the findings of the epidemiologists); when more information is given about a situation, we use this information to update our beliefs, obtaining posterior probabilities (which often use the phrase ‘given that’). The mathematical statement of how to relate prior and posterior probabilities is Bayes’ theorem.
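For reference, Bayes’ theorem can be stated as follows, where $H$ is a hypothesis and $E$ is the evidence we are given. The symbols $H$ and $E$ are my own labels; in the example above, $H$ is ‘Alice carries a defective allele’ and $E$ is ‘Charlie, a son, is colourblind’.

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

With the numbers from this post, $P(H) \approx 0.148$, $P(E \mid H) = 0.5$ and $P(E) \approx 0.074$, so $P(H \mid E) \approx \frac{0.5 \times 0.148}{0.074} = 1$: once Charlie is diagnosed, we are certain that Alice is a carrier.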

Although the above calculations are all done mathematically, the probabilities can nonetheless be interpreted as being subjective: they reflect how certain we are of something. Indeed, they might represent ‘What payout odds would I accept if I’m making a bet on this?’.

(Addendum: I’d forgotten that I wrote another piece on probability and eyes!)