A two-sample t-test (also known as an independent t-test or Student’s t-test) is most often used to compare the means of two samples. For example, you might check whether black cats have, on average, the same body weight as white cats. In this blog post I explain how to do a t-test in R. I also cover its requirements and constraints.
Just like a simple linear regression, a t-test is one of those procedures I run into on a weekly basis. I often use it during exploratory analysis, when building features for a model, and even when a colleague simply asks “what the difference is between two things.”
The data I use here is taken from a real-world problem I encountered recently. It’s a set of two samples. As you can see, they are of unequal sample size: one contains 10 values, the other contains 19 values. Clearly, the two samples are also unpaired.
a <- c(97, 124, 124, 154, 136, 134, 139, 123, 147, 112)
b <- c(100, 117, 93, 94, 148, 75, 106, 131, 139, 104, 99, 85, 105, 80, 84, 74, 83, 99, 61)
The goal here is to check whether the means of the two samples differ significantly. For this I will use a t-test. However, to compare the means of two samples, some assumptions have to be met:
- The observations in both groups should be distributed normally.
- Both samples should have the same variance.
- The two samples should be drawn independently from their respective populations (which I can confirm).
Checking the normality assumption can be done in two ways. The first way is visual: in a Q-Q plot we compare the quantiles of the real-world data to the quantiles of a normal distribution. The following two lines of code will produce a Q-Q plot for sample a. I also draw a line where all the dots should be if the data is perfectly normally distributed. The same can be done for sample b.
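A minimal sketch of such a plot, assuming base R's qqnorm() and qqline() are used (the variable name qq is only there for illustration):

```r
a <- c(97, 124, 124, 154, 136, 134, 139, 123, 147, 112)

# Plot the sample quantiles of a against theoretical normal quantiles
qq <- qqnorm(a)
# Add a reference line through the first and third quartiles
qqline(a)
```

Swapping a for b gives the plot for the second sample.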
As you can see the tails seem somewhat off and extreme, especially on the left side. If you go back to the data (supra), that’s where the low value of 97 kicks in.
We can also use a statistical test: the Shapiro-Wilk test. This is a very common test to check for normality. Its null hypothesis is that the data is normally distributed. The closer the W-value (the test statistic) is to 1, the better the data agrees with a normal distribution and the less likely it is that the null hypothesis gets rejected.
Remember to use it for small samples only, because in large samples even small deviations from normality can drastically affect the result of the test. As demonstrated in this blog post by Emil Kirkegaard, in very large samples even W-values only slightly below 0.99 tend to lead to a rejection of the null hypothesis.
In the following line of code, I run a Shapiro-Wilk test on sample b. Once again, this is completely analogous for sample a.
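A minimal sketch of that call, assuming base R's shapiro.test() (the name sw_b is just illustrative):

```r
b <- c(100, 117, 93, 94, 148, 75, 106, 131, 139, 104, 99, 85, 105, 80, 84, 74, 83, 99, 61)

# Shapiro-Wilk test; the null hypothesis is that b is normally distributed
sw_b <- shapiro.test(b)
print(sw_b)

# The W statistic and p-value can also be read directly from the result
sw_b$statistic
sw_b$p.value
```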
Let’s say we set the alpha level for rejecting the null hypothesis of normality at 0.05. In this case, we cannot reject the hypothesis that sample b is normally distributed. By the way, the same goes for sample a.
Next, we can check if both variances are equal. For this, we can use an F-test of equality of variances. A required assumption for this test is that both samples are normally distributed. As we know from the previous tests, we could not reject normality.
Any F-test compares variances, but this one in particular checks the ratio of two sample variances. The following line of code does the test:
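A minimal sketch of the F-test, assuming base R's var.test() (the name ft is just illustrative):

```r
a <- c(97, 124, 124, 154, 136, 134, 139, 123, 147, 112)
b <- c(100, 117, 93, 94, 148, 75, 106, 131, 139, 104, 99, 85, 105, 80, 84, 74, 83, 99, 61)

# F-test of equality of variances; the null hypothesis is var(a) / var(b) = 1
ft <- var.test(a, b)
print(ft)

# Estimated ratio of variances and its 95% confidence interval
ft$estimate
ft$conf.int
```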
The following result is produced. The estimated ratio of the two variances is 0.548, with a 95% confidence interval of 0.187 to 2.029. As you can see, this interval contains 1 (the ratio we would expect for equal variances), so using this test, we cannot reject the null hypothesis that the variances of both samples are equal.
This paves the road to the (traditional) Student’s t-test, which can be done as follows:
t.test(a, b, var.equal = TRUE)
The t-value of the difference of the means is 3.7, which gives a p-value < 0.05. At the 5% significance level we can clearly reject the null hypothesis that the means of both samples are the same. The 95% confidence interval for the true difference between the population means runs from 13.51 to 46.92.
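To see where these numbers come from, here is a sketch that recomputes the Student's t-statistic by hand from the pooled variance. The variable names (pooled_var, se_diff, t_manual and so on) are my own and only serve the illustration:

```r
a <- c(97, 124, 124, 154, 136, 134, 139, 123, 147, 112)
b <- c(100, 117, 93, 94, 148, 75, 106, 131, 139, 104, 99, 85, 105, 80, 84, 74, 83, 99, 61)

n_a <- length(a)
n_b <- length(b)
df <- n_a + n_b - 2

# Pooled variance: both sample variances weighted by their degrees of freedom
pooled_var <- ((n_a - 1) * var(a) + (n_b - 1) * var(b)) / df

# Standard error of the difference in means
se_diff <- sqrt(pooled_var * (1 / n_a + 1 / n_b))

# t-statistic, two-sided p-value and 95% confidence interval
mean_diff <- mean(a) - mean(b)
t_manual <- mean_diff / se_diff
p_manual <- 2 * pt(-abs(t_manual), df)
ci_manual <- mean_diff + c(-1, 1) * qt(0.975, df) * se_diff
```

The values in t_manual and ci_manual should agree with what t.test() reports when var.equal is set to TRUE.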
Finally, you can also do a t-test that does not require equal variances. This is called Welch’s t-test, or simply the unequal variances t-test. In R, you can easily run it by dropping the var.equal argument:
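A minimal sketch of that call (t.test() defaults to var.equal = FALSE, so leaving the argument out gives Welch's test; the name welch is just illustrative):

```r
a <- c(97, 124, 124, 154, 136, 134, 139, 123, 147, 112)
b <- c(100, 117, 93, 94, 148, 75, 106, 131, 139, 104, 99, 85, 105, 80, 84, 74, 83, 99, 61)

# Welch's t-test: no equal-variance assumption, degrees of freedom are adjusted
welch <- t.test(a, b)
print(welch)
```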
The t-value is even larger (and the p-value smaller) than in the Student’s t-test. However, the confidence interval is very similar.
This raises the question: what test should you run? Should you first check if variances are equal and do a Student’s t-test? Or should you always run a Welch’s t-test? Multiple papers assess this question and more or less come to the same conclusion:
“If you want to compare the central tendency of 2 populations based on samples of unrelated data, then the unequal variance t-test should always be used in preference to the Student’s t-test or Mann–Whitney U test. To use this test, first examine the distributions of the 2 samples graphically. If there is evidence of nonnormality in either or both distributions, then rank the data. Take the ranked or unranked data and perform an unequal variance t-test. Draw your conclusions on the basis of this test.”
“The unequal variance t-test is an underused alternative to Student’s t-test and the Mann–Whitney U test” (2006), Graeme D. Ruxton
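One possible reading of the "rank the data" step, sketched in R: rank all observations of both samples together, split the ranks back into the two groups, and run the unequal variance t-test on the ranks. This is my interpretation of the quote, not code taken from the paper, and the names pooled_ranks, rank_a and rank_b are hypothetical:

```r
a <- c(97, 124, 124, 154, 136, 134, 139, 123, 147, 112)
b <- c(100, 117, 93, 94, 148, 75, 106, 131, 139, 104, 99, 85, 105, 80, 84, 74, 83, 99, 61)

# Rank the pooled data; tied values get the average of their ranks
pooled_ranks <- rank(c(a, b))

# Split the ranks back into the two original groups
rank_a <- pooled_ranks[seq_along(a)]
rank_b <- pooled_ranks[length(a) + seq_along(b)]

# Unequal variance (Welch) t-test on the ranks
ranked_welch <- t.test(rank_a, rank_b)
print(ranked_welch)
```

For the data in this post, normality was not rejected, so the unranked Welch test would be the one to report.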
Conclusion: Welch’s t-test appears to be superior to the Student’s t-test. But remember to check for normality.