T-Test Calculator
How the T-Test Works
The Student's t-test is a statistical hypothesis test used to determine whether there is a significant difference between means. Developed by William Sealy Gosset in 1908 while working at the Guinness Brewery (published under the pseudonym "Student"), the t-test has become one of the most widely used statistical procedures in science. According to a 2019 analysis in PLOS ONE, the t-test appears in over 60% of published biomedical research papers. A one-sample t-test compares a sample mean to a hypothesized population mean, while a two-sample t-test compares the means of two independent groups. This calculator uses Welch's t-test for two-sample comparisons, which does not assume equal variances and is widely recommended as a sensible default.
The t-statistic measures how many standard errors the sample mean is from the hypothesized or comparison mean. Larger absolute t-values indicate stronger evidence against the null hypothesis. The p-value represents the probability of observing results as extreme as yours if the null hypothesis were true. A p-value below 0.05 is traditionally considered statistically significant, but the American Statistical Association (ASA) issued a 2016 statement cautioning against rigid p-value thresholds and emphasizing that statistical significance alone does not imply practical importance. For related statistical tools, see our confidence interval calculator and sample size calculator.
The T-Test Formulas
One-sample t-test: t = (sample mean - hypothesized mean) / (standard deviation / sqrt(n))
Two-sample Welch's t-test: t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)
Worked example: A one-sample test with sample mean = 105, hypothesized mean = 100, standard deviation = 15, and n = 30: t = (105 - 100) / (15 / sqrt(30)) = 5 / 2.739 = 1.826. With df = 29, this yields a two-tailed p-value of approximately 0.078, which is not statistically significant at the 0.05 level.
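The formula and worked example above can be reproduced in a few lines of Python. This is a minimal sketch: it approximates the two-tailed p-value by numerically integrating the density of the t-distribution, where a production analysis would instead call a statistics library (e.g. SciPy's `scipy.stats.t.sf`).

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, upper=100.0, steps=20000):
    """Approximate P(|T| >= |t|) by trapezoidal integration of the upper tail."""
    a = abs(t)
    h = (upper - a) / steps
    area = 0.5 * (t_pdf(a, df) + t_pdf(upper, df))
    for i in range(1, steps):
        area += t_pdf(a + i * h, df)
    return 2 * area * h  # double the tail for a two-sided test

# Worked example: sample mean 105 vs hypothesized mean 100, SD 15, n = 30
n, sample_mean, mu0, sd = 30, 105.0, 100.0, 15.0
t = (sample_mean - mu0) / (sd / math.sqrt(n))
df = n - 1
print(f"t = {t:.3f}, df = {df}, p = {two_tailed_p(t, df):.3f}")
```

Running this reproduces t ≈ 1.826 with df = 29 and a two-tailed p-value of about 0.078, matching the hand calculation above.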
Key Terms
- T-Statistic: The ratio of the difference between means to the standard error. Larger absolute values indicate greater evidence against the null hypothesis.
- P-Value: The probability of observing results at least as extreme as those obtained, assuming the null hypothesis is true.
- Degrees of Freedom (df): The number of independent values free to vary. For one-sample: df = n-1. For Welch's: calculated via the Welch-Satterthwaite equation.
- Null Hypothesis (H0): The default assumption that there is no significant difference between means.
- Standard Error: The standard deviation of the sampling distribution of the mean, equal to s/sqrt(n).
- Effect Size (Cohen's d): The standardized difference between means: d = (mean1 - mean2) / pooled SD. Values of 0.2, 0.5, and 0.8 are considered small, medium, and large effects.
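To make the Cohen's d definition above concrete, here is a small Python sketch using hypothetical numbers (two groups of 30, means 105 and 100, both with SD 15):

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

d = cohens_d(105, 15, 30, 100, 15, 30)
print(f"d = {d:.2f}")  # ~0.33: a small-to-medium effect by Cohen's conventions
```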
Common Significance Levels
| P-Value Range | Significance | Interpretation | Common Use |
|---|---|---|---|
| p < 0.001 | Highly significant | Very strong evidence against H0 | Most scientific fields |
| p < 0.01 | Very significant | Strong evidence against H0 | Medical, social sciences |
| p < 0.05 | Significant | Sufficient evidence to reject H0 | Standard threshold |
| p < 0.10 | Marginally significant | Suggestive but inconclusive | Exploratory research |
| p ≥ 0.10 | Not significant | Insufficient evidence to reject H0 | Fail to reject null |
Practical Examples
Example 1 -- Drug efficacy: A pharmaceutical trial compares blood pressure reduction between drug (mean = -12.5 mmHg, SD = 8, n = 50) and placebo (mean = -3.2 mmHg, SD = 7, n = 50). Welch's t = (-12.5 - (-3.2)) / sqrt(64/50 + 49/50) = -9.3 / 1.503 = -6.19. With approximately 96 df, p < 0.001. The drug shows a highly significant effect.
Example 2 -- Quality control: A factory claims bolts have mean diameter 10.0 mm. A sample of 25 bolts has mean 10.15 mm and SD 0.3 mm. One-sample t = (10.15 - 10.0) / (0.3/sqrt(25)) = 0.15/0.06 = 2.5. With df = 24, p ≈ 0.020. The sample provides significant evidence that the true mean differs from 10.0 mm. Use our standard deviation calculator to compute SD from raw data.
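Both examples can be checked with a short Python sketch of Welch's t-statistic, with the degrees of freedom coming from the Welch-Satterthwaite equation mentioned under Key Terms:

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2  # variance of each sample mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Example 1: drug (mean -12.5, SD 8, n 50) vs placebo (mean -3.2, SD 7, n 50)
t, df = welch_t(-12.5, 8, 50, -3.2, 7, 50)
print(f"Welch t = {t:.2f}, df = {df:.1f}")

# Example 2: the one-sample case reduces to the simple formula
t2 = (10.15 - 10.0) / (0.3 / math.sqrt(25))
print(f"one-sample t = {t2:.2f}, df = {25 - 1}")
```

Example 1 yields t ≈ -6.19 and Example 2 yields t = 2.5, matching the hand calculations.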
Tips and Strategies
- Always check assumptions: Plot your data to verify approximate normality before running a t-test. For non-normal data with small samples, consider the Mann-Whitney U test as a non-parametric alternative.
- Use Welch's t-test by default: It performs well regardless of whether variances are equal, with negligible loss of power when they are equal.
- Report effect sizes: A statistically significant p-value does not tell you how large the effect is. Always calculate and report Cohen's d alongside your t-test results.
- Plan sample size in advance: Use a sample size calculator to determine the minimum n needed to detect a meaningful effect before collecting data.
- Beware of multiple comparisons: Running many t-tests inflates the overall false-positive rate. For comparing 3+ groups, use ANOVA instead.
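The last tip can be made concrete: if each test has a 5% false-positive rate and the tests are independent, the chance of at least one false positive grows quickly with the number of tests. A quick Python check:

```python
# Familywise false-positive rate for m independent tests at alpha = 0.05
alpha = 0.05
for m in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** m
    print(f"{m:2d} tests -> P(at least one false positive) = {fwer:.3f}")
```

With 10 tests the familywise rate already exceeds 40%, which is why corrections such as Bonferroni, or an omnibus test like ANOVA, are needed when comparing several groups.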
Frequently Asked Questions
What is the difference between one-sample and two-sample t-tests?
A one-sample t-test compares a sample mean to a hypothesized population value to determine if there is a statistically significant difference. A two-sample t-test compares the means of two independent groups. For example, a one-sample test might check if the average height of students at a school differs from the national average, while a two-sample test compares average test scores between two different teaching methods. The choice depends on your research question and experimental design.
What does the p-value mean in a t-test?
The p-value is the probability of observing results as extreme as yours if the null hypothesis were true. A p-value below 0.05 means that, if the null hypothesis were true, results this extreme would occur less than 5% of the time; 0.05 is the conventional threshold for statistical significance. However, statistical significance does not imply practical significance -- a very large sample can make trivially small differences statistically significant. Always report effect sizes alongside p-values for a complete picture.
When should I use a t-test instead of a z-test?
Use a t-test when the population standard deviation is unknown and must be estimated from the sample, which describes most real-world research situations. A z-test requires the true population standard deviation to be known, which is rare in practice; for large samples (typically n > 30) the two tests give nearly identical results. The t-distribution has heavier tails than the normal distribution, accounting for the additional uncertainty from estimating the standard deviation. As sample size increases, the t-distribution approaches the normal distribution.
What are the assumptions of a t-test?
The t-test assumes that data are approximately normally distributed (less critical for large samples above 30 due to the Central Limit Theorem), observations are independent of each other, and data are measured on an interval or ratio scale. For two-sample tests, the standard version assumes equal variances between groups, while Welch's t-test (used by this calculator) relaxes that assumption. Violations of normality matter most for small samples; with n > 30, the t-test is robust to moderate non-normality.
What is Welch's t-test and when should I use it?
Welch's t-test is a modification of the two-sample t-test that does not assume equal variances between the two groups. It adjusts the degrees of freedom using the Welch-Satterthwaite equation, which accounts for differences in sample sizes and variances. According to research published in the British Journal of Mathematical and Statistical Psychology, Welch's test should be the default choice for two-sample comparisons because it performs well regardless of whether variances are equal or unequal. This calculator uses Welch's method for all two-sample tests.
How do I interpret degrees of freedom in a t-test?
Degrees of freedom (df) represent the number of independent values that can vary in the calculation. For a one-sample t-test, df = n - 1, where n is the sample size. For Welch's two-sample t-test, df is calculated from both sample sizes and variances using the Welch-Satterthwaite formula, often resulting in a non-integer value. Higher degrees of freedom produce a t-distribution closer to the normal distribution, making it easier to achieve statistical significance. A study with 10 df needs a larger t-statistic to reach significance than one with 100 df.
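As a sketch with hypothetical numbers (group 1: SD 8, n = 20; group 2: SD 3, n = 40), the Welch-Satterthwaite formula produces a non-integer df that always falls between min(n1, n2) - 1 and n1 + n2 - 2:

```python
s1, n1 = 8.0, 20  # hypothetical group 1: SD and sample size
s2, n2 = 3.0, 40  # hypothetical group 2
v1, v2 = s1**2 / n1, s2**2 / n2  # variance of each sample mean
df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
print(f"df = {df:.2f}")  # non-integer, between 19 and 58
```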