Make the data speak itself: Hypothesis Testing

1. What is a T-Test?

The t-test, also called Student's t-test, is a statistical technique that compares the means of two samples in order to tell us whether they plausibly come from the same population.

T - Student Distribution

The outcome of a t-test gives us a test statistic, a p-value and a confidence interval for the difference between the means. In other words, it lets us know whether the observed difference could have happened by chance. If the p-value is less than 5% (equivalently, if the 95% confidence interval for the difference in means does not contain zero), the null hypothesis is rejected.
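To make the link between the p-value and the confidence interval concrete, here is a minimal sketch that computes both by hand for two independent samples. It assumes the unequal-variance (Welch) form of the test; the function name `welch_summary` and the sample numbers are made up for illustration and are not from the original notebook.

```python
import numpy as np
from scipy import stats


def welch_summary(a, b, alpha=0.05):
    """Return (t statistic, two-sided p-value, confidence interval) for mean(a) - mean(b).

    Hypothetical helper: uses Welch's t-test, which does not assume equal variances.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Squared standard error contributed by each sample
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    se = np.sqrt(va + vb)

    diff = a.mean() - b.mean()
    t_stat = diff / se

    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))

    # Two-sided p-value from the Student t distribution
    p_value = 2 * stats.t.sf(abs(t_stat), dof)

    # (1 - alpha) confidence interval for the difference in means
    margin = stats.t.ppf(1 - alpha / 2, dof) * se
    return t_stat, p_value, (diff - margin, diff + margin)


# Made-up numbers, only to exercise the function
sample_a = [12.1, 9.8, 11.4, 10.9, 13.0, 10.2]
sample_b = [8.7, 9.1, 7.9, 10.0, 8.3, 9.5]
t_stat, p_value, ci = welch_summary(sample_a, sample_b)
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

With this construction, the p-value falls below `alpha` exactly when the confidence interval excludes zero, so the two rejection criteria always agree.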

2. Case Study

You work for an investment bank and you are deciding where to invest your client's money. You're thinking about investing in Telecom or Hospitality but you're not sure which one to go with, so you're going to run a test.

In order to do that, we randomly sample the last-quarter revenue of 20 companies from both industries. In this case we don't know the population variances, which is why we use a t-test and estimate the variances from the samples. We will compute and compare the means of both samples; the test assumes that each sample is approximately normally distributed.

The correct way to compare statistics is to define a hypothesis. The Null Hypothesis (NH) is that the mean revenues of Telecom and Hospitality are equal, and the Alternative Hypothesis (AH) is that they are different.

NH: Mean (Telecom) = Mean (Hospitality)

AH: Mean (Telecom) ≠ Mean (Hospitality)

Read the data file and show a statistical description of each sample:

Data Description of Telecom and Hospitality
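A description like this could be produced with pandas roughly as follows. The original data file is not reproduced here, so this sketch generates synthetic revenues instead; the column names `Telecom` and `Hospitality`, the sample size per column and all the numbers are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data file (values are invented)
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "Telecom": rng.normal(loc=50.0, scale=10.0, size=20),
    "Hospitality": rng.normal(loc=40.0, scale=10.0, size=20),
})

# count, mean, std, min, quartiles and max for each column
summary = df.describe()
print(summary)
```

With the real file, the `DataFrame` would come from a reader such as `pd.read_csv` rather than from the random generator.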

Have a look at how the data is distributed, using a histogram and a box plot:

Hospitality Revenue Distribution

Telecom Revenue Distribution
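Plots of this kind can be drawn with matplotlib. The sketch below is one possible layout, not necessarily the one used for the figures above; the helper name `plot_distribution`, the bin count and the synthetic values are all assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_distribution(values, title):
    """Draw a histogram and a box plot of one sample side by side (hypothetical helper)."""
    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

    # Histogram: how many companies fall into each revenue range
    ax_hist.hist(values, bins=10, edgecolor="black")
    ax_hist.set_xlabel("Revenue")
    ax_hist.set_ylabel("Number of companies")

    # Box plot: median, quartiles and possible outliers
    ax_box.boxplot(values)
    ax_box.set_ylabel("Revenue")

    fig.suptitle(title)
    return fig


# Synthetic values standing in for the real revenue data
rng = np.random.default_rng(seed=0)
telecom = rng.normal(loc=50.0, scale=10.0, size=20)
fig = plot_distribution(telecom, "Telecom Revenue Distribution")
```

Call `plt.show()` to display the figure interactively, or `fig.savefig(...)` to write it to a file.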

Calculate p-value from two independent samples with scipy in Python:

The p-value from the t-test is < 0.5%, so we can reject NH. We can say that the two data samples come from different populations.

The data file and notebook are available on GitHub
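For reference, the SciPy call for two independent samples is `scipy.stats.ttest_ind`. The sketch below runs it on synthetic data, so its p-value will not match the result quoted above. The post does not say whether the equal-variance or the Welch variant was used; `equal_var=False` here is an assumption, as are the variable names and the 5% threshold.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the two revenue samples (values are invented)
rng = np.random.default_rng(seed=0)
telecom = rng.normal(loc=50.0, scale=10.0, size=20)
hospitality = rng.normal(loc=40.0, scale=10.0, size=20)

alpha = 0.05  # significance level

# Two-sided t-test for independent samples; equal_var=False selects Welch's test
result = stats.ttest_ind(telecom, hospitality, equal_var=False)

reject_nh = result.pvalue < alpha
print(f"t = {result.statistic:.3f}, p-value = {result.pvalue:.4f}")
print("Reject NH" if reject_nh else "Fail to reject NH")
```

Setting `equal_var=True` (the SciPy default) gives the classic Student's t-test, which additionally assumes both populations have the same variance.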


Sunday, December 15, 2019

Hypothesis Testing - T-Test
