Paired Two-sample T-test (Dependent T-test)


What is a Paired 2-sample T-test?

Let’s analyze this definition from scratch.

A T-test is a statistical test whose test statistic follows a T-distribution under the null hypothesis.
Two-sample means we have 2 sets of samples, and our goal is to verify whether the means of the 2 distributions that generated these sample sets are equal.
Paired means the 2 sample sets are not independent of each other: each observation in one set corresponds to one and only one observation in the other set.

Thus, in summary, a Paired 2-sample T-test takes as input 2 sample sets whose observations are linked to each other on a 1-to-1 basis, and its test statistic follows a T-distribution under the null hypothesis. It is also called the Paired T-test or the Dependent T-test.

In contrast to the Paired 2-sample T-test, we also have the Unpaired 2-sample T-test.

	Paired 2-sample T-test	Unpaired 2-sample T-test
When to use	Each observation in one sample set is semantically related to exactly one observation in the other set.	The 1-to-1 correspondence required by the Paired 2-sample T-test does not hold.
Example 1	We offer a soft-skill course to our company’s employees and measure each employee’s performance before and after the course to see whether it improves productivity.	We have 2 sample sets: the weights of 30 men and the weights of 30 women. We want to test whether weight differs significantly between the 2 genders.
Example 2	A/B testing of our ads: on each of our webpages, blue-themed ads are shown to some visitors and yellow-themed ads to the others. After a day, we have 2 sample sets: one holds each webpage’s average click-through rate for blue ads, the other holds the same measure for yellow ads on the same webpages. We compare them to see whether one color is significantly more attractive than the other.	A/B testing of our ads: some randomly chosen webpages show blue-themed ads, while the remaining webpages show yellow-themed ads. After a day, we have 2 sample sets: the average click-through rate of each webpage with blue ads, and the same measure for each webpage with yellow ads. We compare them to see whether one color is significantly more attractive than the other.
Sample sizes	The 2 sample sets must have the same size.	The 2 sample sets may have different sizes.

When the paired observations are positively correlated (as with before/after measurements on the same people), the Paired Test is usually more powerful than the Unpaired Test on the same data, because the 1-to-1 pairing removes the variability between subjects. Hence, it is recommended to use the Paired Test whenever its conditions hold.
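To illustrate the difference in power, here is a minimal sketch that computes both statistics on the diet data from the example later in this post. Assumptions: the unpaired statistic is Student’s pooled-variance version (an illustrative choice; Welch’s version is another option), and it is applied to paired data purely to show the contrast, not as a valid analysis. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

# Diet data from the example in this post: one pair (before, after) per person.
before = [50, 74, 65, 80, 66, 58, 49, 54, 71, 55]
after = [52, 70, 58, 79, 66, 53, 47, 55, 60, 52]

def paired_t(x1, x2):
    """Paired T-statistic: a One-sample T-test on the differences, DF = n - 1."""
    diffs = [a - b for a, b in zip(x1, x2)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

def unpaired_t(x1, x2):
    """Student's unpaired T-statistic with pooled variance, DF = n1 + n2 - 2."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * stdev(x1) ** 2 + (n2 - 1) * stdev(x2) ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x1) - mean(x2)) / se, n1 + n2 - 2

t_p, df_p = paired_t(before, after)
t_u, df_u = unpaired_t(before, after)
print(f"paired:   T = {t_p:.3f}, DF = {df_p}")
print(f"unpaired: T = {t_u:.3f}, DF = {df_u}")
```

Both statistics share the same numerator (a mean difference of 3), but on this data the paired statistic is about 2.405 while the unpaired one is only about 0.656: the large person-to-person spread in weight inflates the unpaired standard error, whereas the differences within each pair vary much less.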

Assumptions

The Paired 2-sample T-test is a parametric test, thus it requires some assumptions to be true (or at least approximately true):

The observations must be numerical, measured on an interval or ratio scale (i.e. continuous data). For categorical variables, we should use another test instead, for example a Chi-squared-based test such as McNemar’s test for paired binary data.
The differences between paired values must come from a Normal distribution. This normality can be roughly checked by examining the differences with, for instance, a QQ-plot, the Shapiro-Wilk test or the Anderson-Darling test. For non-normal data, we can either try to transform it toward normality or use a non-parametric test for paired data, e.g. the Wilcoxon Signed-Rank Test (the Mann-Whitney Test is its counterpart for unpaired samples). Note that we do NOT need each sample set to follow a Normal distribution; only the differences must be (approximately) normally distributed.
The pairs must be independent of each other: apart from the pairing, no observation should depend on another observation in the same sample set. Such dependencies undermine the objectivity of the test and make its result unreliable. This assumption is shared by almost all statistical tests, parametric and non-parametric alike.
There must be no significant outliers (big influencers) among the differences. Outliers can bias the test, especially when the sample size is small, as is often the case in medical studies. If our data contain such influential points, we may remove them (with care) or switch to a more robust test like the Wilcoxon Signed-Rank Test, which works on ranks and is therefore less affected by extreme values.

Note that although the test’s formal goal is to check whether the means of the 2 distributions are equal, it is often used to judge whether the 2 distributions are the same. That reading relies on an extra assumption: if we assume the 2 distributions differ at most in their means (e.g. they have equal variances), then equal means indicate that the 2 distributions are the same, while different means imply that they are not.

Conduct the Test

The Paired 2-sample T-test is just a One-sample T-test in disguise. Put another way, we can transform the Paired T-test into a One-sample T-test.

This transformation follows from restating the problem: we want to test whether the 2 sample sets are generated by the same distribution, which, under the assumptions above, amounts to testing whether the differences between paired observations are generated by a distribution with mean 0.

Let:

n be the number of pairs (i.e. the size of each sample set).
$X_1$ and $X_2$ be the 2 sample sets, where the i-th observation of $X_1$ ( $X_1^{(i)}$ ) is paired with the i-th observation of $X_2$ ( $X_2^{(i)}$ ), $1 \leq i \leq n$ .
$X_D$ be the set of differences between each observation in $X_1$ and its counterpart in $X_2$ ( $X_D^{(i)} = X_1^{(i)} - X_2^{(i)}$ ).
$\overline{X_D}$ be the mean of $X_D$ .
$S_D$ be the sample standard deviation of $X_D$ . Since this is a sample standard deviation, we apply Bessel’s correction (i.e. divide by n - 1 instead of n).

The T-statistic is then computed as:

$\begin{aligned}T = \frac{\overline{X_D}}{S_D / \sqrt{n}}\end{aligned}$
with DF = n – 1 degrees of freedom.
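The formula maps directly onto code. The sketch below (plain Python; `one_sample_t` and `paired_t` are hypothetical names) makes the "One-sample T-test in disguise" idea explicit: the paired statistic is simply the one-sample statistic of $X_D$ tested against a mean of 0.

```python
import math
from statistics import mean, stdev

def one_sample_t(data, mu0=0.0):
    """One-sample T-statistic for H0: mean == mu0, with DF = n - 1."""
    n = len(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    return (mean(data) - mu0) / (s / math.sqrt(n)), n - 1

def paired_t(x1, x2):
    """Paired T-test as a One-sample T-test on X_D = X_1 - X_2 against mu0 = 0."""
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same size")
    x_d = [a - b for a, b in zip(x1, x2)]
    return one_sample_t(x_d, mu0=0.0)

# Hypothetical toy data: X_D = [2, 3, 4], so mean = 3, S_D = 1, T = 3 * sqrt(3).
t, df = paired_t([3, 5, 7], [1, 2, 3])
print(f"T = {t:.3f}, DF = {df}")  # T = 5.196, DF = 2
```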

After obtaining the T-statistic and the degrees of freedom, we can test our hypothesis using a T-table (or with the help of Python or other tools), as previously described here.
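If no T-table is at hand, the two-tailed p-value and the critical value can be computed numerically. The sketch below uses only the standard library; it assumes that integrating the Student t density with Simpson’s rule is accurate enough for this purpose (a sketch, not a production routine), and all function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] plus the symmetry of T around 0."""
    b = abs(x)
    h = b / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def two_tailed_p_value(t_stat, df):
    """P(|T| >= |t_stat|) under the null hypothesis."""
    return 2 * (1 - t_cdf(abs(t_stat), df))

def critical_value(alpha, df, lo=0.0, hi=50.0):
    """Two-tailed critical value t* with P(|T| >= t*) = alpha, found by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p_value(mid, df) > alpha:
            lo = mid  # p-value still too large: t* lies further out
        else:
            hi = mid
    return (lo + hi) / 2

p = two_tailed_p_value(2.405, 9)
cv = critical_value(0.05, 9)
print(f"p-value = {p:.4f}, critical value = {cv:.3f}")
```

With DF = 9 and $\alpha = 0.05$, the computed critical value should come out close to the 2.262 found in a T-table. If SciPy is installed, `2 * scipy.stats.t.sf(abs(T), df)` and `scipy.stats.t.ppf(1 - alpha / 2, df)` give the same quantities, and `scipy.stats.ttest_rel(x1, x2)` runs the whole paired test.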

Example

Suppose we want to measure the effectiveness of a diet. We measure the weights of 10 people before and after following the diet to verify whether there is any statistically significant difference. The weights are shown in the table below, where each row represents 1 person.

Weight before diet ( $X_1$ )	Weight after diet ( $X_2$ )
50	52
74	70
65	58
80	79
66	66
58	53
49	47
54	55
71	60
55	52

Let’s suppose our significance level ( $\alpha$ ) is 0.05, and we run a two-tailed test of the null hypothesis that the mean difference is 0.

To solve this problem, we first compute the set of differences, $X_D$ .

$\begin{aligned}X_D &= \text{\{50 - 52, 74 - 70, 65 - 58, ..., 55 - 52\}} \\ &= \{-2, 4, 7, 1, 0, 5, 2, -1, 11, 3\}\end{aligned}$

Secondly, we calculate the mean and standard deviation of this set:

$\overline{X_D} = 3$

$S_D = 3.944$
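For readers who want to check the arithmetic, the two numbers above work out as follows, where the deviations $X_D^{(i)} - \overline{X_D}$ are $-5, 1, 4, -2, -3, 2, -1, -4, 8, 0$ :

$\begin{aligned}\overline{X_D} &= \frac{-2 + 4 + 7 + 1 + 0 + 5 + 2 - 1 + 11 + 3}{10} = \frac{30}{10} = 3 \\ S_D &= \sqrt{\frac{\sum_{i=1}^{n}\left(X_D^{(i)} - \overline{X_D}\right)^2}{n - 1}} = \sqrt{\frac{25 + 1 + 16 + 4 + 9 + 4 + 1 + 16 + 64 + 0}{9}} = \sqrt{\frac{140}{9}} \approx 3.944\end{aligned}$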

The T-statistic is then:

$T = \frac{3}{3.944 / \sqrt{10}} = 2.405$

with DF = n – 1 = 9 degrees of freedom.

Looking up the T-table, the critical value for $\alpha = 0.05$ in a two-tailed test with DF = 9 is 2.262, which is smaller than our T-statistic of 2.405. Hence, we reject the null hypothesis and conclude that the effect of the diet on weight is statistically significant at the 0.05 level.
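The whole example can be reproduced in a few lines of Python. This sketch hard-codes the critical value 2.262 from the T-table, as in the text, rather than computing it.

```python
import math
from statistics import mean, stdev

# Weights from the example table: one entry per person.
x1 = [50, 74, 65, 80, 66, 58, 49, 54, 71, 55]  # before the diet (X_1)
x2 = [52, 70, 58, 79, 66, 53, 47, 55, 60, 52]  # after the diet (X_2)
alpha = 0.05
critical_value = 2.262  # two-tailed, DF = 9, taken from the T-table as in the text

x_d = [a - b for a, b in zip(x1, x2)]             # X_D = X_1 - X_2
n = len(x_d)
t_stat = mean(x_d) / (stdev(x_d) / math.sqrt(n))  # T = mean / (S_D / sqrt(n))
df = n - 1

print("X_D =", x_d)
print(f"mean = {mean(x_d)}, S_D = {stdev(x_d):.3f}, T = {t_stat:.3f}, DF = {df}")
if abs(t_stat) > critical_value:
    print(f"|T| > {critical_value}: reject H0 at alpha = {alpha}")
else:
    print(f"|T| <= {critical_value}: fail to reject H0 at alpha = {alpha}")
```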


Tung M Phung's Blog
