Probability is a cornerstone of artificial intelligence. It expresses the degree of uncertainty of an event and is widely used in Bayesian networks, computer vision, machine learning, natural language processing, and more.
This article explains three types of probability: single event, joint event, and conditional probability. It closes with an introduction to Bayes' theorem, which is derived from the joint and conditional probabilities.
1. Single Event Probability
If we know that the probability of an outcome X is p, we can reason that the probability of the opposite outcome ~X is 1 - p:
P(X) = p
P(~X) = 1- p
For example, suppose a coin is loaded so that the probability of heads coming up is 0.75. Applying the rule above, the chance of getting tails with this loaded coin is 1 - 0.75 = 0.25.
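As a quick check, here is a minimal Python sketch (the variable names are just illustrative) that applies the complement rule to the loaded coin:

# Complement rule: P(~X) = 1 - P(X), applied to the loaded coin.
p_head = 0.75          # P(X): probability that the loaded coin shows heads
p_tail = 1 - p_head    # P(~X): probability of the opposite outcome, tails
print(p_head, p_tail)  # 0.75 0.25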
2. Joint Event Probability
Joint probability is the likelihood of events X and Y occurring together, i.e. the probability that event Y occurs at the same time as event X. It is also called the intersection of two or more events.
When X and Y are independent:
P(X, Y) = P(X) * P(Y) = P(Y) * P(X)
Let's do an example. We have a deck of 52 cards with 13 hearts, 13 diamonds, 13 clubs and 13 spades. What is the probability of drawing a card that is a 6 and a heart?
The question asks for the chance of two events happening at the same time: the card is a 6 and the card is a heart, i.e. we draw the 6 of hearts.
P(6, heart) = P(card is 6) * P(card is heart) = 4/52 * 13/52 = 1/52
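We can verify this by brute-force counting over a full deck. The sketch below (the deck representation is my own) counts the cards directly and also checks that the product rule gives the same answer, which works here because the rank and the suit of a randomly drawn card are independent:

from fractions import Fraction
from itertools import product

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'diamonds', 'clubs', 'spades']
deck = list(product(ranks, suits))          # 52 (rank, suit) pairs

# Joint probability by counting: cards that are both a 6 and a heart.
p_joint = Fraction(sum(1 for r, s in deck if r == '6' and s == 'hearts'), len(deck))

# Product of the single-event probabilities.
p_six   = Fraction(sum(1 for r, s in deck if r == '6'), len(deck))       # 4/52
p_heart = Fraction(sum(1 for r, s in deck if s == 'hearts'), len(deck))  # 13/52

print(p_joint)          # 1/52
print(p_six * p_heart)  # 1/52 -- matches because rank and suit are independent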
3. Conditional Probability
Conditional probability is defined as the likelihood of an event or outcome occurring, given that another event or outcome has already occurred. It is a combination of the joint-event and single-event probabilities, and the formulas follow directly from them:
P(X|Y) = P(X, Y) / P(Y)
P(Y|X) = P(X, Y) / P(X)
Back to the card example:
P(card = 6 | card = heart)
= P(card = 6, card = heart) / P(card = heart)
= (1/52) / (13/52) = 1/13
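The same counting idea confirms this: among the 13 hearts, exactly one card is a 6. A small sketch, reusing the hypothetical deck from the previous example:

from fractions import Fraction
from itertools import product

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'diamonds', 'clubs', 'spades']
deck = list(product(ranks, suits))

# P(card = 6 | card = heart) = P(card = 6, card = heart) / P(card = heart)
p_joint = Fraction(sum(1 for r, s in deck if r == '6' and s == 'hearts'), len(deck))
p_heart = Fraction(sum(1 for r, s in deck if s == 'hearts'), len(deck))
print(p_joint / p_heart)  # 1/13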
4. Total Probability
In real life, events often depend on other events. To calculate the probability of an outcome, we have to combine it with all the possibilities that could have happened before it. This is known as total probability.
Assume that some days are sunny and some days are rainy, that the probability of day 1 being sunny is 90%, that a sunny day is followed by another sunny day with probability 80%, and that a rainy day is followed by a sunny day with probability 20%. What is the probability that the next day is sunny?
We are asked to calculate the probability that the next day is sunny, regardless of whether the day before was sunny or rainy. We have to take care of both situations, which is why total probability is helpful here.
- We have:
P(D1=sunny) = 0.9
P(D2=sunny | D1=sunny) = 0.8
P(D2=sunny | D1=rainy) = 0.2
- Probabilistic Inference:
P(D1=rainy) = 1 - P(D1=sunny) = 1 - 0.9 = 0.1
- Finally:
P(D2=sunny) = P(D2=sunny|D1=sunny)*P(D1=sunny)
+ P(D2=sunny|D1=rainy)*P(D1=rainy)
= 0.8*0.9 + 0.2*0.1 = 0.72 + 0.02 = 0.74
As we can see, there is a 74% chance that the next day is sunny under the given conditions.
Total probability is also a part of Bayes Rule, which we explore in the next part.
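The same computation in Python, with the numbers taken from the problem statement above (variable names are my own):

# Total probability: P(D2=sunny) = P(D2=sunny|D1=sunny)*P(D1=sunny)
#                                + P(D2=sunny|D1=rainy)*P(D1=rainy)
p_d1_sunny = 0.9
p_d1_rainy = 1 - p_d1_sunny      # 0.1
p_d2_sunny_given_sunny = 0.8
p_d2_sunny_given_rainy = 0.2

p_d2_sunny = (p_d2_sunny_given_sunny * p_d1_sunny
              + p_d2_sunny_given_rainy * p_d1_rainy)
print(p_d2_sunny)  # 0.74 (up to floating-point rounding)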
5. Bayes Rule
Bayes Rule was formulated by Reverend Thomas Bayes, a British mathematician. It allows us to take a prior probability into account when calculating the posterior probability of a random variable.
If we know the probability of a prior event before running a test, and the test then gives us some evidence, together they lead us to what we are after: the posterior probability.
Bayes Rule states that P(X|Y) = P(Y|X) * P(X) / P(Y), where:
P(X|Y), the posterior: probability of event X happening given event Y
P(Y|X), the likelihood: probability of event Y happening given event X
P(X), the prior: probability of the prior event X
P(Y), the evidence: total probability of Y, whether the prior event is X or ~X, i.e. P(Y) = P(Y|X)*P(X) + P(Y|~X)*P(~X)
Let's dig into an example:
Parkinson's disease affects 0.3% of the general population. Imagine that we had a testing system for early detection. The system tests positive for 99% of people who have the disease, and it also tests positive for 0.1% of people who do not have the disease. If a person tests positive, what is the chance that it is a false positive, i.e. what proportion of the people who test positive do not actually have the disease?
Applying Bayes Rule, we break the problem into the prior probability and the test evidence, and then compute the posterior.
- Prior event:
P(D) = 0.3% = 0.003. It plays the role of P(X) in the formula above.
- The test evidence from the prior event:
P(+ | D) = 99% = 0.99
P(+ | ~D) = 0.1% = 0.001
- Our reasoning:
P(~D) = 1 - P(D) = 1 - 0.003 = 0.997
- Posterior, which is what we are looking for:
P(~D | +) = ?
P(+) is the total probability of a positive test, whether the person tested has Parkinson's disease or not.
P(+) = P(+ | D)*P(D) + P(+ | ~D)*P(~D) = 0.99*0.003 + 0.001*0.997 = 0.003967
P(~D | +) = P(+ | ~D)*P(~D) / P(+) = 0.001 * 0.997 / 0.003967 ≈ 0.25 = 25%
Conclusion: even with a positive Parkinson's test result, there is about a 25% chance that the person does not actually have the disease, i.e. that the positive result is a false alarm.
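To make the calculation easy to reproduce, here is a short Python sketch of the same Bayes Rule computation, using the numbers from the problem above:

# Bayes Rule for the Parkinson's test example.
p_d = 0.003                   # P(D): prior, 0.3% of the population
p_not_d = 1 - p_d             # P(~D) = 0.997
p_pos_given_d = 0.99          # P(+ | D): true positive rate
p_pos_given_not_d = 0.001     # P(+ | ~D): false positive rate

# Evidence via total probability: P(+) = P(+|D)*P(D) + P(+|~D)*P(~D)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * p_not_d

# Posterior: P(~D | +) = P(+|~D) * P(~D) / P(+)
p_not_d_given_pos = p_pos_given_not_d * p_not_d / p_pos
print(p_pos)              # ~0.003967
print(p_not_d_given_pos)  # ~0.251, i.e. about 25%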