P-Value Calculator
How P-Values Work
A p-value is the probability of observing a test statistic as extreme as (or more extreme than) the one calculated from your sample data, assuming the null hypothesis is true. It is the cornerstone of null hypothesis significance testing (NHST), the most widely used framework for statistical inference in science, medicine, social science, and business analytics.
The concept was formalized by statistician Ronald Fisher in the 1920s. According to the American Statistical Association (ASA), a p-value does not measure the probability that the studied hypothesis is true, nor does it measure the probability that the data were produced by random chance alone. Rather, it quantifies the compatibility of the observed data with a specified statistical model (the null hypothesis).
P-values are used across virtually every quantitative discipline. Medical researchers use them to evaluate drug efficacy in clinical trials. Social scientists use them to assess the significance of survey results. Business analysts use them in A/B testing to determine whether website changes affect conversion rates. Our Standard Deviation Calculator and Chi-Square Calculator are companion tools for the underlying statistical computations.
How P-Values Are Calculated
P-values are calculated from the cumulative distribution function (CDF) of the relevant probability distribution, depending on the test being used:
- From a z-score: Uses the standard normal distribution (mean=0, SD=1). P = 1 − Φ(z) for right-tailed; P = Φ(z) for left-tailed; P = 2 × (1 − Φ(|z|)) for two-tailed, where Φ is the CDF.
- From a t-statistic: Uses the Student's t-distribution with specified degrees of freedom (df). Similar to the z-score but accounts for small sample sizes. As df increases, the t-distribution approaches the normal distribution.
- From chi-square: Uses the chi-square distribution with specified df. Always right-tailed. P = 1 − CDF(χ², df).
Worked example (z-score): You test whether a sample mean differs from the population mean and get z = 2.33. Two-tailed p-value = 2 × P(Z > 2.33) = 2 × 0.0099 = 0.0198. Since 0.0198 < 0.05, the result is statistically significant at the 5% level.
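The three tail rules above can be sketched in plain Python; `math.erf` gives the standard normal CDF without any third-party library (function names here are illustrative, not from any particular package):

```python
import math

def normal_cdf(z):
    # Phi(z) for the standard normal distribution, via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p_from_z(z, tail="two"):
    # tail: "right", "left", or "two" (mirrors the formulas above)
    if tail == "right":
        return 1 - normal_cdf(z)
    if tail == "left":
        return normal_cdf(z)
    return 2 * (1 - normal_cdf(abs(z)))

print(round(p_from_z(2.33), 4))  # → 0.0198, matching the worked example
```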
Key Terms You Should Know
Null hypothesis (H0): The default assumption that there is no effect or no difference. The p-value measures evidence against this hypothesis.
Alternative hypothesis (H1 or Ha): The hypothesis that there is a real effect or difference. It is what you hope to support with evidence.
Significance level (α): The predetermined threshold for rejecting the null hypothesis. Commonly 0.05 (5%), 0.01 (1%), or 0.001 (0.1%). If the p-value is less than α, the result is declared statistically significant.
Type I error (false positive): Rejecting the null hypothesis when it is actually true. When H0 is true, the probability of this error equals α, so at α = 0.05 you accept a 5% false-positive risk.
Type II error (false negative): Failing to reject the null hypothesis when it is actually false. The probability is denoted β. Statistical power = 1 − β.
Effect size: A measure of the magnitude of the difference or relationship, independent of sample size. Common measures include Cohen's d, Pearson's r, and odds ratios. The ASA recommends always reporting effect sizes alongside p-values.
Common Significance Thresholds by Field
Different scientific disciplines use different significance thresholds. A 2019 Nature comment signed by more than 800 scientists argued for retiring the dichotomous "statistically significant" label, on the grounds that the arbitrary p < 0.05 cutoff produces both false confidence in barely-significant results and unwarranted dismissal of non-significant ones. The table below shows conventions across fields, based on published standards from each discipline.
| Field | Typical α | P-Value Threshold | Notes |
|---|---|---|---|
| Social Sciences | 0.05 | p < 0.05 | Standard since Fisher; replication crisis debate ongoing |
| Medical Research | 0.05 or 0.01 | p < 0.05 | FDA typically requires p < 0.05 in Phase III trials |
| Particle Physics | 0.0000003 | 5σ (p < 3 × 10⁻⁷) | Used for discovery claims (e.g., Higgs boson) |
| Genomics (GWAS) | 5 × 10⁻⁸ | p < 5 × 10⁻⁸ | Bonferroni correction for ~1 million SNP tests |
| A/B Testing (Tech) | 0.05–0.10 | p < 0.05 or 0.10 | Often combined with minimum detectable effect (MDE) |
| Proposed New Standard | 0.005 | p < 0.005 | 2017 Nature Human Behaviour proposal by 72 researchers |
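As a sanity check on the particle-physics row, a sigma level can be converted to a one-tailed p-value with the complementary error function; a quick standard-library sketch:

```python
import math

def sigma_to_p(sigma):
    # one-tailed tail probability beyond `sigma` standard deviations
    return 0.5 * math.erfc(sigma / math.sqrt(2))

print(sigma_to_p(5))  # ≈ 2.87e-07, just below the 3 × 10⁻⁷ threshold in the table
```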
Practical Examples
Example 1 — Medical trial: A clinical trial tests a new drug against a placebo. The t-test produces t = 2.89 with 48 degrees of freedom. Two-tailed p-value = 0.0058. Since p < 0.05, the drug shows a statistically significant effect. The researchers also report Cohen's d = 0.42 (a small-to-medium effect by Cohen's conventions), giving readers the context needed to judge whether the effect is clinically meaningful.
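The t-distribution tail has no elementary closed form, but the two-tailed p-value can be approximated by numerically integrating the t density. This is an illustrative standard-library sketch (Simpson's rule over the right tail), not the calculator's internal method:

```python
import math

def t_pdf(x, df):
    # density of Student's t with df degrees of freedom
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    return c / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_tailed_p(t, df, upper=100.0, steps=20000):
    # integrate the right tail |t|..upper with Simpson's rule, then double it;
    # the density beyond `upper` is negligible for moderate df
    a = abs(t)
    h = (upper - a) / steps
    s = t_pdf(a, df) + t_pdf(upper, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 2 * (h / 3) * s

print(round(t_two_tailed_p(2.89, 48), 4))
```

For large df the result converges to the normal-based p-value, consistent with the note above that the t-distribution approaches the normal as df grows.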
Example 2 — A/B test: A website runs an experiment comparing two homepage designs. After 10,000 visitors per group, the conversion rate is 3.2% (control) vs 3.8% (treatment). The pooled two-proportion z-test produces z = 2.31, giving a two-tailed p = 0.021. The result is statistically significant, and the 0.6 percentage point lift translates to meaningful revenue impact. Use our Percentage Increase Calculator to quantify the improvement.
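The A/B-test arithmetic can be reproduced with a pooled two-proportion z-test; this is a standard-library sketch with illustrative names, not the calculator's own code:

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    # pooled two-proportion z-test: x conversions out of n visitors per group
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p, since erfc(z/√2) = 2·(1 − Φ(z))
    return z, p

z, p = two_proportion_z(320, 10000, 380, 10000)  # 3.2% vs 3.8% of 10,000 each
print(round(z, 2), round(p, 3))  # → 2.31 0.021
```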
Example 3 — Chi-square test: A survey asks 200 people to pick their preferred social media platform (four options) across 4 age groups. The resulting 4 × 4 contingency table gives χ² = 15.4 with (4 − 1)(4 − 1) = 9 degrees of freedom, for a p-value of 0.080. Since p > 0.05, we fail to reject the null hypothesis; the data do not provide sufficient evidence that platform preference differs significantly by age group.
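The chi-square tail probability comes from the regularized lower incomplete gamma function. Below is a series-expansion sketch in standard-library Python (the classic "gser" recipe); it is an illustration under those assumptions, not the calculator's code:

```python
import math

def chi2_sf(x, df):
    # survival function P(X > x) for the chi-square distribution,
    # via the series for the regularized lower incomplete gamma P(df/2, x/2)
    a, x2 = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > abs(total) * 1e-12:
        n += 1
        term *= x2 / (a + n)
        total += term
    lower = total * math.exp(-x2 + a * math.log(x2) - math.lgamma(a))
    return 1.0 - lower

print(round(chi2_sf(15.4, 9), 3))  # the example's right-tailed p-value
```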
Tips for Interpreting P-Values Correctly
- Report exact p-values, not just "significant" or "not significant": Writing p = 0.032 is more informative than p < 0.05. The ASA explicitly recommends this practice.
- Always consider effect size: A p-value of 0.001 with a tiny effect size may be less meaningful than p = 0.04 with a large effect size. Statistical significance does not equal practical significance.
- Beware of multiple comparisons: Testing 20 hypotheses at α = 0.05 means you expect 1 false positive by chance. Use corrections like Bonferroni (α/n) or Benjamini-Hochberg (FDR) when performing multiple tests.
- A non-significant result is not proof of no effect: Failing to reject H0 does not mean H0 is true. The study may simply lack sufficient statistical power (sample size too small).
- Consider the study design: P-values from well-designed randomized controlled trials carry more weight than those from observational studies, regardless of how small the p-value is.
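The two multiple-comparison corrections mentioned above can be sketched briefly. The Benjamini-Hochberg function below is a minimal illustration of the step-up rule, not a drop-in replacement for a statistics library:

```python
def bonferroni_alpha(alpha, m):
    # Bonferroni: test each of m hypotheses at the stricter level alpha / m
    return alpha / m

def benjamini_hochberg(pvals, q=0.05):
    # BH step-up: reject the k smallest p-values, where k is the largest
    # rank with p_(k) <= (k / m) * q; controls the false discovery rate
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k = rank
    rejected = set(order[:k])
    return [i in rejected for i in range(m)]

pvals = [0.01, 0.04, 0.03, 0.002, 0.2]
print(bonferroni_alpha(0.05, len(pvals)))  # per-test alpha of 0.05 / 5
print(benjamini_hochberg(pvals))           # → [True, True, True, True, False]
```

Note the difference: Bonferroni controls the chance of any false positive and would reject only the p-values at or below 0.01 here, while BH tolerates a controlled fraction of false discoveries and rejects four of the five.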
The Replication Crisis and P-Values
The "replication crisis" in science has brought p-values under intense scrutiny. A landmark 2015 study by the Open Science Collaboration attempted to replicate 100 published psychology experiments and found that only 36% produced statistically significant results on replication, compared to 97% in the original publications. This has led to calls for pre-registration of studies, transparent reporting, and greater emphasis on effect sizes and confidence intervals alongside p-values.
In 2016, the American Statistical Association issued its first formal statement on p-values, emphasizing six principles including that p-values do not measure the importance of a result and that scientific conclusions should not be based solely on whether a p-value passes a specific threshold. This statement has been cited over 7,000 times and has influenced reporting standards across multiple journals.