Saturday, December 28, 2019

Probability and Bayes Theorem


Probability is a cornerstone of artificial intelligence. It expresses the degree of uncertainty of an event. Probability is widely used in Bayesian networks, computer vision, machine learning, natural language processing, and more.
This article explains 3 types of probability: single event, joint event and conditional event. Finally, it introduces Bayes theorem, which is derived from joint and conditional probabilities.

1. Independent / Single Event Probability




If we know the probability of an outcome X is p, we can reason that the probability of the opposite outcome ~X is 1 - p:

P(X) = p
P(~X) = 1 - p

If we flip a fair coin, the chance of getting heads or tails is 0.5 (50%).

In case the coin is loaded and the probability of heads coming out is 0.75, applying what we know, the chance of getting tails with the loaded coin is 1 - 0.75 = 0.25.
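As a quick sanity check, the complement rule can be simulated in a few lines of Python (an illustrative sketch added here, not part of the original article):

```python
import random

random.seed(0)
p_head = 0.75  # loaded coin: probability of heads

# Flip the loaded coin many times and measure how often tails comes up
flips = [random.random() < p_head for _ in range(100_000)]
tail_freq = 1 - sum(flips) / len(flips)

print(round(tail_freq, 3))  # close to 1 - 0.75 = 0.25
```

The simulated frequency converges to the complement 1 - P(head) as the number of flips grows.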

2. Joint Event Probability





It is the likelihood of events X and Y occurring together: the probability of event Y occurring at the same time as event X. It is also called the intersection of 2 or more events. When X and Y are independent, the joint probability is simply the product of the individual probabilities:

P(X, Y) = P(X) * P(Y) = P(Y) * P(X)



Let's do an example. We have 52 cards: 13 hearts, 13 diamonds, 13 clubs and 13 spades. What is the probability of picking up a card that is a 6 and a heart?
The question is about calculating the chance of 2 independent events happening at the same time: the card's rank is 6 and its suit is hearts.

P(6, heart) = P(card is 6) * P(card is heart) = 4/52 * 13/52 = 1/52
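We can verify this by enumerating a full deck in Python (a counting sketch added for illustration):

```python
from itertools import product

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'diamonds', 'clubs', 'spades']
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

# Joint probability by direct counting
p_joint = sum(1 for r, s in deck if r == '6' and s == 'hearts') / len(deck)

# Product of the single-event probabilities (rank and suit are independent)
p_six = sum(1 for r, s in deck if r == '6') / len(deck)         # 4/52
p_heart = sum(1 for r, s in deck if s == 'hearts') / len(deck)  # 13/52

print(p_joint)           # 1/52, by counting
print(p_six * p_heart)   # the same value, by the product rule
```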


3. Conditional Probability


Conditional probability is defined as the likelihood of an event or outcome occurring, given the occurrence of a previous event or outcome. Actually, it is a combination of a joint event and a single event, and the formula follows directly:

P(X|Y) = P(X, Y) / P(Y)
P(Y|X) = P(X, Y) / P(X)

We revisit the earlier example. We want to know the probability of drawing a 6 given that a heart has already been drawn. Thus event X is that we draw a 6, and event Y is that we draw a heart.

P(card = 6 | card = heart)
= P(card = 6, card = heart) / P(card = heart)
= (1/52) / (13/52) = 1/13
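Counting directly over the deck gives the same answer (again an illustrative sketch, not from the original article):

```python
from itertools import product

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'diamonds', 'clubs', 'spades']
deck = list(product(ranks, suits))

# P(6 | heart) = P(6, heart) / P(heart)
p_joint = sum(1 for r, s in deck if r == '6' and s == 'hearts') / len(deck)  # 1/52
p_heart = sum(1 for r, s in deck if s == 'hearts') / len(deck)               # 13/52

p_six_given_heart = p_joint / p_heart
print(p_six_given_heart)  # 1/13: exactly one of the 13 hearts is a 6
```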


4. Total Probability

In real life, events depend on one another. To calculate the possible outcome of an event, we have to combine it with all the possibilities that could have happened prior to it. This is called total probability.

Assume that some days are sunny and some days are rainy, the probability that day 1 is sunny is 90%, the chance that a sunny day is followed by another sunny day is 80%, and the chance that a rainy day is followed by a sunny day is 20%. What is the probability that the next day is sunny?

We are asked to calculate the probability that the next day is sunny whether the day before was sunny or rainy. We have to take care of both situations. That is why total probability is helpful here.

- We have:
P(D1=sunny) = 0.9
P(D2=sunny | D1=sunny) = 0.8
P(D2=sunny | D1=rainy) = 0.2

- Probabilistic Inference:
P(D1=rainy) = 1 - P(D1=sunny) = 1 - 0.9 = 0.1

- Finally:

P(D2=sunny) = P(D2=sunny|D1=sunny)*P(D1=sunny)
                       + P(D2=sunny|D1=rainy)*P(D1=rainy)
= 0.8*0.9 + 0.2*0.1 = 0.72 + 0.02 =  0.74

As we can see, there is a 74% chance that the next day is sunny under the given conditions.
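The arithmetic above is easy to reproduce in Python (a small sketch for illustration):

```python
# Given probabilities
p_d1_sunny = 0.9           # P(D1 = sunny)
p_sunny_after_sunny = 0.8  # P(D2 = sunny | D1 = sunny)
p_sunny_after_rainy = 0.2  # P(D2 = sunny | D1 = rainy), given by the problem

p_d1_rainy = 1 - p_d1_sunny

# Law of total probability
p_d2_sunny = (p_sunny_after_sunny * p_d1_sunny
              + p_sunny_after_rainy * p_d1_rainy)

print(round(p_d2_sunny, 2))  # 0.74
```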

Total probability is a part of Bayes Rule, which we are going to explore in the next part.

5. Bayes Rule

Bayes Rule is named after Reverend Thomas Bayes, a British mathematician. It allows us to take the prior probability into account to calculate the posterior probability of a random variable.

If we know the probability of some prior event before running a test, and the test itself gives us some evidence, together they lead us to the finding called the posterior probability. Bayes Rule reads P(X|Y) = P(Y|X) * P(X) / P(Y), where:
P(X|Y), the posterior: probability of event X happening given that evidence Y was observed
P(Y|X), the likelihood: probability of observing evidence Y given event X
P(X), the prior: probability of the prior event X
P(Y), the evidence: total probability of observing Y, whether the underlying event is X or ~X

Lets dig deep into an example:

Parkinson's disease affects 0.3% of the general population. Imagine that we had a testing system for early detection. If a person has the disease, the system will test positive 99% of the time. The system will also test positive for 0.1% of people who do not have the disease. What is the percentage of false positives? I.e., what proportion of the people who tested positive do not actually have the disease?

Applying Bayes Rule, we analyse the problem to find the prior probability, gather the testing evidence, and compute the posterior.

- Prior event:
P(D) = 0.3% = 0.003. It is the prior P(X) in our formula.

- The testing evidence:
P(+ | D) = 99% = 0.99
P(+ | ~D) = 0.1% = 0.001

Our reasoning:
P(~D) = 1 - P(D) = 1 - 0.003 = 0.997 = 99.7%

Posterior: is what we are looking for:
P( ~D | +) = ?


P(+) is the total probability that the test comes out positive, whether the tested sample comes from the population with Parkinson's disease or without it.

P(+) = P(+ | D)*P(D) + P(+ | ~D)*P(~D) = 0.99*0.003 + 0.001*0.997 = 0.003967

P(~D | +) = P(+ | ~D)*P(~D) / P(+) = 0.001 * 0.997 / 0.003967 ≈ 0.25 = 25%

Conclusion: even with a positive Parkinson's test, there is a 25% chance that the patient does not actually have the disease.
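The whole calculation fits in a few lines of Python (a sketch of the numbers above):

```python
# Prior
p_d = 0.003                # 0.3% of the population has the disease
p_not_d = 1 - p_d          # 0.997

# Test evidence
p_pos_given_d = 0.99       # true-positive rate
p_pos_given_not_d = 0.001  # false-positive rate

# Total probability of a positive test
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * p_not_d

# Posterior: probability of no disease given a positive test
p_not_d_given_pos = p_pos_given_not_d * p_not_d / p_pos

print(round(p_pos, 6))              # 0.003967
print(round(p_not_d_given_pos, 2))  # 0.25
```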


Thursday, December 19, 2019

Hypothesis Testing - ANOVA


1. What is ANOVA?

ANOVA stands for Analysis of Variance, in which we compare variances in order to accept or reject the null hypothesis.

2. A Case Study

Your company wants to deploy a new promotion plan that changes the price of some products. They want to test whether the new plan (Plan B) works better than the current one (Plan A). They randomly collect the average revenue over 15 months for both plans.

Dataset for this case is available on Github

3. Data Analysis

- Show statistical data description



- Make descriptive plots of the data to analyse the variability with a scatter plot and a box plot

Scatter Plot
Box Plot

Observation: Plan A and Plan B are significantly different.

- Calculate one-way ANOVA
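With scipy, the one-way ANOVA is a single call. The revenue figures below are made-up stand-ins (the article's real dataset is on Github), so the exact statistics will differ from those reported here:

```python
from scipy import stats

# Illustrative stand-in revenues for the two plans (15 months each)
plan_a = [100, 102, 98, 97, 103, 101, 99, 104, 96, 100, 98, 102, 97, 101, 99]
plan_b = [110, 112, 108, 113, 109, 111, 107, 114, 110, 112, 108, 111, 109, 113, 110]

f_stat, p_value = stats.f_oneway(plan_a, plan_b)
print(f"F = {f_stat:.1f}, p-value = {p_value:.2e}")
```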


Observation: p_value = 0.2%, so the confidence is greater than 95% and we can reject the null hypothesis that group A and group B have the same population mean.

Why is there an F statistic in the output of ANOVA? And are we sure about the p-value we just got?

The one-way ANOVA result rests on Fisher's assumptions. To be sure about the p-value we got, we have to test the normality of the samples. ANOVA's assumptions are simple to state, but validating them can be tricky.


4. Assumptions

- The sample should be random and independent.
- Each treatment should be normally distributed.
- The treatments should be homoscedastic. In other words, the population standard deviations of the groups are all equal.

4.1 Normality Test
There is a theorem saying that normality at every level is equivalent to normality of the residuals. Normality can be tested with the Shapiro-Wilk test and a Q-Q plot.

Shapiro-Wilk



Observation:
  • The p_value of group A equals 45%, well above the 5% threshold. We cannot reject the null hypothesis that the data in group A is normally distributed.
  • In contrast, in group B, p_value = 5% -> confidence = 95%, so we reject the null hypothesis that the data in group B is normally distributed.
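In code, the Shapiro-Wilk test is stats.shapiro. Since the article's figures are not reproduced here, the snippet below uses synthetic samples: one normal, one deliberately skewed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=100, scale=10, size=200)  # drawn from a normal distribution
group_b = rng.exponential(scale=10, size=200)      # drawn from a skewed distribution

stat_a, p_a = stats.shapiro(group_a)
stat_b, p_b = stats.shapiro(group_b)

print(f"group A: p = {p_a:.3f}")  # typically large: cannot reject normality
print(f"group B: p = {p_b:.2e}")  # tiny: reject normality
```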

Q-Q Plot



Observation: the graphical plots show us that the data of group A is normally distributed while the data of group B is not.
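A Q-Q plot can also be summarized numerically: scipy's probplot fits the quantile-quantile line and returns its correlation coefficient r, which sits near 1 when the sample is normal. A sketch with a synthetic normal sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=100, scale=10, size=200)

# probplot returns the ordered sample vs. theoretical quantiles plus a line fit;
# r measures how tightly the points hug the straight line of the Q-Q plot
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")
print(round(r, 3))  # near 1 for normally distributed data
```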

The results from the Shapiro-Wilk test and the Q-Q plot complement each other and help us be sure about the normality of the given data.


4.2. Homoscedasticity

Homoscedasticity means the equality of variances within a population. Like normality, we can check it with a graphical method and the Levene Test.

Graphical Method: Residual Plot


Levene-Test



Conclusion:
- The p_value from the Levene Test tells us that the variances of the two groups are significantly different.
- The residual plot shows that the data in Group A and the data in Group B are not correlated.
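In code, the Levene test is stats.levene; a sketch with synthetic groups whose spreads clearly differ:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group_a = rng.normal(loc=100, scale=5, size=200)   # small spread
group_b = rng.normal(loc=100, scale=25, size=200)  # much larger spread

stat, p = stats.levene(group_a, group_b)
print(f"W = {stat:.1f}, p-value = {p:.2e}")  # tiny p: variances are not equal
```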

5. Key Takeaways

- With ANOVA, the assumptions are normality and homoscedasticity.
- For normality, we use the Shapiro-Wilk test and a Q-Q plot of the residuals.
- For homoscedasticity, we plot the residuals and use the Levene Test.


The test results and charts are generated in a Python notebook.


Sunday, December 15, 2019

Hypothesis Testing - T-Test


1. What is T-Test

The T-Test, also called Student's T-Test, is a statistical technique that compares the means of two samples in order to tell us whether they come from the same population or not.

T - Student Distribution
The outcome of a T-Test gives us a p-value, the significance of the difference and a confidence interval. In other words, it lets us know whether those differences could have happened by chance. If the p-value is less than 5%, i.e. the confidence is more than 95%, the null hypothesis is rejected.

2. Case Study

You work for an investment bank and you are deciding how to invest your client's money. You're thinking about investing in Telecom or Hospitality, but you're not so sure which one you will go with. So you're gonna make a test.

In order to do that, we randomly grab the last-quarter revenue of 20 companies from each industry. In this case, we don't know the variance of the collected samples. We will compute and compare the means of both samples to see whether they come from the same population.

The correct way to compare statistics is to define a hypothesis. The Null Hypothesis (NH) is that the means of Telecom and Hospitality are equal, and the Alternative Hypothesis (AH) is that their means are different.

NH: Mean (Telecom) = Mean (Hospitality)
AH: Mean (Telecom) ≠ Mean (Hospitality)


Read data file and show statistical description of each sample:

Data Description of Telecom and Hospitality


Have a look how data is distributed in histogram and box plot:


Hospitality Revenue Distribution
Telecom Revenue Distribution

Calculate the p-value from two independent samples with scipy in Python:
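A sketch of that call with made-up revenue figures (the article's real data file is on Github, so the exact statistics will differ):

```python
from scipy import stats

# Illustrative last-quarter revenues for 20 companies per industry
telecom = [52, 48, 55, 60, 47, 53, 58, 49, 51, 56,
           54, 50, 57, 46, 59, 52, 48, 55, 53, 50]
hospitality = [35, 40, 32, 38, 36, 41, 33, 37, 39, 34,
               36, 42, 31, 38, 35, 40, 33, 37, 36, 39]

t_stat, p_value = stats.ttest_ind(telecom, hospitality)
print(f"t = {t_stat:.1f}, p-value = {p_value:.2e}")
```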




The p_value from the t-test is < 0.5%, so we can reject the NH. We can say that the two data samples come from different populations.

The data file and Notebook are available on Github.