A Non-Statistician’s Personal Guide to Hypothesis Testing

Lauren Phipps
Oct 13, 2020

I never took a stats class in college. Sure, I learned how to calculate the basics: mean, median, standard deviation, and variance. I learned which numbers to plug into which variables and heard people mention t-tests and p-values, but I never learned the big picture behind all these topics. So, when we got to the stats module in the bootcamp, my brain started to hurt. T-tests, p-values, z-tests, critical values, alphas, z-scores, nulls, Type 1 and 2 errors. Coupling all the new vocabulary with the additional new methods in Python (which I'm still very much a beginner in), I got lost. I found myself defaulting to trying to memorize processes and functions, so I needed to take a step back and make sure I knew what I was doing first. This is my exercise in taking a step back.

Central Limit Theorem:

These concepts are built on the Central Limit Theorem, which states that if you take repeated samples from a population, the means of those samples will follow an approximately normal distribution (as long as the samples are reasonably large), no matter what the population's own distribution looks like. This normal distribution can help us estimate the likelihood of scenarios based on where values fall in the distribution.
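To see this in action, here's a quick simulation (a minimal sketch using numpy; the skewed exponential population is just an arbitrary choice for illustration):

import numpy as np

rng = np.random.default_rng(42)

# A deliberately non-normal, skewed population
population = rng.exponential(scale=2.0, size=100_000)

# Take many samples and record each sample's mean
sample_means = [rng.choice(population, size=50).mean() for _ in range(1_000)]

# The sample means cluster around the population mean in a roughly
# normal, bell-shaped distribution, even though the population is skewed
print(f"Population mean: {population.mean():.3f}")
print(f"Mean of sample means: {np.mean(sample_means):.3f}")
print(f"Std of sample means: {np.std(sample_means):.3f}")  # roughly pop std / sqrt(50)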

Z-Test:

The z-test is used to compare sample data to population data to determine if there is a statistically significant difference, which can provide information about whether a sample came from the population. It is used when you know the standard deviation of the population and the sample size is greater than 30. This test is less common in data science because the population parameters (standard deviation and mean) are rarely known; usually you're only able to work with samples of data. However, when this information is known, you can use the population's standard deviation to build a distribution of the expected mean values.

Your goal is to find the z-score of the data, which represents how many standard deviations the sample's mean is from the population's mean. Using the 68–95–99.7 rule (which uses the area under the curve to give the probability that a value falls within a given range), you can determine the chances that the sample came from the population and is statistically similar. For example, 68% of values fall within one standard deviation (on each side) of the mean. However, the goal is to determine whether or not you can reject the null hypothesis: the hypothesis that the sample's mean is equal to the population's. To do that, you're looking at the very ends of the distribution, the spots where it's very unlikely you'd get that sample mean. It's possible, but not very likely. The amount of risk you're willing to take on of being wrong is your alpha. An alpha of 0.05 represents a 5% chance that you'll say the means are different when really they're the same: there's still a 5% chance that a sample from that population would have a mean that extreme. Again, it's possible, but it's more likely that they aren't equal, so that's the conclusion you draw. The less risk you're willing to take of being wrong, the lower your alpha.
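You can check the 68–95–99.7 rule yourself with scipy's normal distribution by taking the area under the curve between each pair of bounds (just a sanity check, not part of the test itself):

from scipy import stats

# Area under the standard normal curve within 1, 2, and 3
# standard deviations of the mean
for k in (1, 2, 3):
    area = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"Within {k} standard deviation(s): {area:.4f}")

# Prints roughly 0.6827, 0.9545, and 0.9973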

Steps:

  1. Write out your null and alternative hypotheses. Select your alpha.
  2. Find your z-score, or standard statistic. This is done with the formula:
    z = (x̄ - μ) / (σ / √n), where x̄ is the sample mean, μ is the population mean, σ is the population standard deviation, and n is the sample size. You're finding the difference between the means relative to the standard error (the population's standard deviation divided by the square root of the sample size). Again, the z-score is the number of standard errors your sample's mean is away from the population's mean.
  3. Find your p-value. In Python, use 1 - stats.norm.cdf(z). The second part (stats.norm.cdf(z)) gives the area under the curve up to that point, as a fraction of the whole curve. It is subtracted from 1 to give the area remaining in the upper tail, which is the probability of seeing a sample mean at least that extreme (for a one-tailed test).
  4. If the calculated p-value is less than the alpha you chose at the beginning (usually 0.05), you can reject the null hypothesis: it's unlikely you'd see a sample mean that extreme if the sample and population means were actually equal.
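Putting the four steps together, here's a minimal sketch of a one-sample z-test (the population parameters and sample are made up for illustration):

import numpy as np
from scipy import stats

# Step 1: null hypothesis is that the sample's mean equals the
# population's; alpha = 0.05
alpha = 0.05

# Hypothetical population parameters (known, which is what justifies a z-test)
pop_mean, pop_std = 100, 15

# Hypothetical sample with n > 30
rng = np.random.default_rng(1)
sample = rng.normal(105, 15, size=40)
n = len(sample)

# Step 2: z-score, the number of standard errors between the means
z = (sample.mean() - pop_mean) / (pop_std / np.sqrt(n))

# Step 3: p-value, the area remaining in the upper tail (one-tailed)
p_value = 1 - stats.norm.cdf(z)

# Step 4: compare to alpha
print(f"z = {z:.3f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")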

T-Test:

The t-test is used in situations where the population's standard deviation is unknown or the sample size is less than 30. The goal is to determine the t-value of the set, which is the ratio of the difference between the means of the sets to the variability within the sets. It helps determine whether a difference in means reflects an actual difference or just high variance in the data.

These t-tests are Student's t-tests, which assume the samples have equal variances (and, in the simplest form, equal sample sizes); Welch's t-test relaxes these assumptions.

One sample:
A one-sample t-test compares a sample's mean against a population mean. Again, you select an alpha to set the level of "risk" you're willing to take on. As before, there's a trade-off. A low alpha means you're less likely to be able to reject your null hypothesis; however, there's less of a chance of erroneously rejecting the null when you shouldn't (a Type 1 error). A high alpha means you're more likely to be able to reject the null, but you potentially introduce more error. 0.05 is commonly used because it's a sweet spot between the two scenarios.

Steps:

  1. Write out your null and alternative hypotheses. Select your alpha.
  2. Find the critical values of the t-distribution. This is similar to finding a z-score cutoff. If you are trying to detect any difference (greater or less than), use a two-tailed test: you'll determine the upper and lower bounds of your rejection region, splitting alpha between the two tails (alpha/2 in each). If your calculated t-statistic is below the lower bound or above the upper bound, you will be able to reject the null. If you are only looking at one side (above or below), you will only have one boundary for your rejection region, placed at alpha (or 1 - alpha).

critical_value = stats.t.ppf(alpha / 2, df=n - 1)  # lower bound; use 1 - alpha / 2 for the upper

3. Find the test statistic and p-value using stats.ttest_1samp(data, popmean), where popmean is the comparison value. If the test statistic falls outside the critical values from above and the p-value is less than the alpha, you will reject your null hypothesis.
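Here's what the whole one-sample procedure looks like end to end (the sample values and comparison mean are invented for illustration):

import numpy as np
from scipy import stats

# Step 1: null hypothesis is that the sample's mean equals 98.6;
# alpha = 0.05, two-tailed
alpha = 0.05
pop_mean = 98.6  # hypothetical comparison value

# Hypothetical small sample (n < 30, population std unknown)
sample = np.array([98.1, 98.4, 99.0, 97.9, 98.7, 98.2, 98.5, 98.0])
df = len(sample) - 1

# Step 2: critical values bounding the two-tailed rejection region
lower = stats.t.ppf(alpha / 2, df)
upper = stats.t.ppf(1 - alpha / 2, df)

# Step 3: test statistic and two-tailed p-value
t_stat, p_value = stats.ttest_1samp(sample, pop_mean)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print(f"Rejection region: t < {lower:.3f} or t > {upper:.3f}")
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")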

Two sample:
A two-sample t-test is similar, but instead of comparing one sample to the population mean, you are comparing two samples to determine if there's a significant difference between their means. This is commonly used when comparing the effectiveness of interventions. Are the differences between the samples large enough to warrant further study? The process for these calculations is similar.

Steps:

  1. Write out your null and alternative hypotheses. Select your alpha.
  2. Find the critical values of the distribution for comparison. The degrees of freedom (df) is the sum of the sample sizes minus 2 (n1 + n2 - 2), rather than n - 1 as above. For a two-tailed test, alpha/2 gives you the lower bound, while 1 - alpha/2 gives the upper bound.

critical_value = stats.t.ppf(alpha / 2, df=n1 + n2 - 2)  # lower bound; use 1 - alpha / 2 for the upper

3. Find the test statistic and p-value for the data. Again, if the test statistic falls in the rejection region and the p-value is less than your selected alpha, you can reject the null.

t_stat, p_value = stats.ttest_ind(sample_1, sample_2, equal_var=True)  # equal_var=False runs Welch's t-test
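And a minimal end-to-end sketch with two made-up samples (think control group vs. intervention group):

import numpy as np
from scipy import stats

# Step 1: null hypothesis is that the two groups have equal means;
# alpha = 0.05, two-tailed
alpha = 0.05

# Hypothetical control and intervention samples
sample_1 = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 11.9])
sample_2 = np.array([12.9, 13.1, 12.6, 13.4, 12.8, 13.0, 12.7])
df = len(sample_1) + len(sample_2) - 2  # degrees of freedom for Student's t-test

# Step 2: critical values bounding the two-tailed rejection region
lower = stats.t.ppf(alpha / 2, df)
upper = stats.t.ppf(1 - alpha / 2, df)

# Step 3: test statistic and p-value (equal_var=False would run
# Welch's t-test, which drops the equal-variance assumption)
t_stat, p_value = stats.ttest_ind(sample_1, sample_2, equal_var=True)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print(f"Rejection region: t < {lower:.3f} or t > {upper:.3f}")
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")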

Of course, all you're doing with this is rejecting the null hypothesis, which is to say concluding there's not no difference in the data. No further conclusions can be drawn immediately; however, slowly but surely you can start to learn more about your data.
