T-Test Calculator

How the T-Test Works

The Student's t-test is a statistical hypothesis test used to determine whether there is a significant difference between means. Developed by William Sealy Gosset in 1908 while working at the Guinness Brewery (published under the pseudonym "Student"), the t-test has become one of the most widely used statistical procedures in science. According to a 2019 analysis in PLOS ONE, the t-test appears in over 60% of published biomedical research papers. A one-sample t-test compares a sample mean to a hypothesized population mean, while a two-sample t-test compares the means of two independent groups. This calculator uses Welch's t-test for two-sample comparisons, which does not assume equal variances and is recommended as the default by statisticians.

The t-statistic measures how many standard errors the sample mean is from the hypothesized or comparison mean. Larger absolute t-values indicate stronger evidence against the null hypothesis. The p-value represents the probability of observing results as extreme as yours if the null hypothesis were true. A p-value below 0.05 is traditionally considered statistically significant, but the American Statistical Association (ASA) issued a 2016 statement cautioning against rigid p-value thresholds and emphasizing that statistical significance alone does not imply practical importance. For related statistical tools, see our confidence interval calculator and sample size calculator.

The T-Test Formulas

One-sample t-test: t = (sample mean - hypothesized mean) / (standard deviation / sqrt(n))

Two-sample Welch's t-test: t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)

Worked example: A one-sample test with sample mean = 105, hypothesized mean = 100, standard deviation = 15, and n = 30: t = (105 - 100) / (15 / sqrt(30)) = 5 / 2.739 = 1.826. With df = 29, this yields a two-tailed p-value of approximately 0.078, which is not statistically significant at the 0.05 level.
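The worked example above can be reproduced in a short Python sketch. This uses only the standard library: the two-tailed p-value is obtained by numerically integrating the t-distribution's tail with a simple trapezoid rule, which is an illustrative approximation, not a production routine (a statistics library such as scipy would normally handle this).

```python
import math

def t_pdf(x, df):
    # Student's t probability density function
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=100_000, upper=60.0):
    # Integrate the tail beyond |t| with the trapezoid rule,
    # then double it for a two-tailed p-value.
    a = abs(t)
    h = (upper - a) / steps
    area = 0.5 * (t_pdf(a, df) + t_pdf(upper, df))
    for i in range(1, steps):
        area += t_pdf(a + i * h, df)
    return 2 * area * h

def one_sample_t(sample_mean, hyp_mean, sd, n):
    # t = (sample mean - hypothesized mean) / (SD / sqrt(n)), df = n - 1
    t = (sample_mean - hyp_mean) / (sd / math.sqrt(n))
    return t, two_tailed_p(t, n - 1)

t, p = one_sample_t(105, 100, 15, 30)
print(f"t = {t:.3f}, df = 29, p = {p:.3f}")  # t = 1.826, p = 0.078
```

This matches the hand calculation: t ≈ 1.826 with 29 df gives p ≈ 0.078, not significant at the 0.05 level.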

Common Significance Levels

| P-Value Range | Significance | Interpretation | Common Use |
| --- | --- | --- | --- |
| p < 0.001 | Highly significant | Very strong evidence against H0 | Most scientific fields |
| p < 0.01 | Very significant | Strong evidence against H0 | Medical, social sciences |
| p < 0.05 | Significant | Sufficient evidence to reject H0 | Standard threshold |
| p < 0.10 | Marginally significant | Suggestive but inconclusive | Exploratory research |
| p ≥ 0.10 | Not significant | Insufficient evidence to reject H0 | Fail to reject null |

Practical Examples

Example 1 -- Drug efficacy: A pharmaceutical trial compares blood pressure reduction between drug (mean = -12.5 mmHg, SD = 8, n = 50) and placebo (mean = -3.2 mmHg, SD = 7, n = 50). Welch's t = (-12.5 - (-3.2)) / sqrt(64/50 + 49/50) = -9.3 / 1.503 = -6.19. The Welch-Satterthwaite approximation gives roughly 96 df, and p < 0.001. The drug shows a highly significant effect.
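Example 1 can be checked from the summary statistics alone. The sketch below (plain Python; `welch_t` is an illustrative helper, not a library function) computes both the Welch t-statistic and the Welch-Satterthwaite degrees of freedom:

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    # Welch's t-statistic and Welch-Satterthwaite degrees of freedom
    # from summary statistics (means, SDs, sample sizes).
    v1, v2 = s1**2 / n1, s2**2 / n2          # per-group variance of the mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

t, df = welch_t(-12.5, 8, 50, -3.2, 7, 50)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = -6.19, df = 96.3
```

Note that the degrees of freedom come out non-integer (about 96.3), which is typical of the Welch-Satterthwaite approximation.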

Example 2 -- Quality control: A factory claims bolts have mean diameter 10.0 mm. A sample of 25 bolts has mean 10.15 mm and SD 0.3 mm. One-sample t = (10.15 - 10.0) / (0.3/sqrt(25)) = 0.15/0.06 = 2.5. With df = 24, p ≈ 0.020. The sample provides significant evidence that the true mean differs from 10.0 mm. Use our standard deviation calculator to compute SD from raw data.

Frequently Asked Questions

What is the difference between one-sample and two-sample t-tests?

A one-sample t-test compares a sample mean to a hypothesized population value to determine if there is a statistically significant difference. A two-sample t-test compares the means of two independent groups. For example, a one-sample test might check if the average height of students at a school differs from the national average, while a two-sample test compares average test scores between two different teaching methods. The choice depends on your research question and experimental design.

What does the p-value mean in a t-test?

The p-value is the probability of observing results at least as extreme as yours if the null hypothesis were true. A p-value below 0.05 means that, were the null hypothesis true, results this extreme would occur less than 5% of the time; by convention this is the threshold for statistical significance. However, statistical significance does not imply practical significance -- a very large sample can make trivially small differences statistically significant. Always report effect sizes alongside p-values for a complete picture.

When should I use a t-test instead of a z-test?

Use a t-test when the population standard deviation is unknown and must be estimated from the sample, which describes most real-world research situations. A z-test assumes the population standard deviation is known exactly; in practice it serves mainly as a large-sample approximation (typically n > 30), where the estimated standard deviation is reliable enough. The t-distribution has heavier tails than the normal distribution, accounting for the additional uncertainty from estimating the standard deviation. As sample size increases, the t-distribution approaches the normal distribution.

What are the assumptions of a t-test?

The t-test assumes that data are approximately normally distributed (less critical for large samples above 30 due to the Central Limit Theorem), observations are independent of each other, and data are measured on an interval or ratio scale. For two-sample tests, the standard version assumes equal variances between groups, while Welch's t-test (used by this calculator) relaxes that assumption. Violations of normality matter most for small samples; with n > 30, the t-test is robust to moderate non-normality.

What is Welch's t-test and when should I use it?

Welch's t-test is a modification of the two-sample t-test that does not assume equal variances between the two groups. It adjusts the degrees of freedom using the Welch-Satterthwaite equation, which accounts for differences in sample sizes and variances. According to research published in the British Journal of Mathematical and Statistical Psychology, Welch's test should be the default choice for two-sample comparisons because it performs well regardless of whether variances are equal or unequal. This calculator uses Welch's method for all two-sample tests.
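In the same plain notation as the formulas section above, the Welch-Satterthwaite degrees of freedom are:

df = (s1^2/n1 + s2^2/n2)^2 / [ (s1^2/n1)^2 / (n1 - 1) + (s2^2/n2)^2 / (n2 - 1) ]

where s1, s2 are the sample standard deviations and n1, n2 the sample sizes. The result is generally not a whole number.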

How do I interpret degrees of freedom in a t-test?

Degrees of freedom (df) represent the number of independent values that can vary in the calculation. For a one-sample t-test, df = n - 1, where n is the sample size. For Welch's two-sample t-test, df is calculated from both sample sizes and variances using the Welch-Satterthwaite formula, often resulting in a non-integer value. Higher degrees of freedom produce a t-distribution closer to the normal distribution, so a smaller t-statistic suffices for significance: the two-tailed critical value at the 0.05 level is about 2.23 with 10 df but only about 1.98 with 100 df.
