P-Value Calculator
How P-Values Work
A p-value is the probability of observing a test statistic as extreme as (or more extreme than) the one calculated from your sample data, assuming the null hypothesis is true. It is the cornerstone of null hypothesis significance testing (NHST), the most widely used framework for statistical inference in science, medicine, social science, and business analytics.
The concept was formalized by statistician Ronald Fisher in the 1920s. According to the American Statistical Association (ASA), a p-value does not measure the probability that the studied hypothesis is true, nor does it measure the probability that the data were produced by random chance alone. Rather, it quantifies the compatibility of the observed data with a specified statistical model (the null hypothesis).
P-values are used across virtually every quantitative discipline. Medical researchers use them to evaluate drug efficacy in clinical trials. Social scientists use them to assess the significance of survey results. Business analysts use them in A/B testing to determine whether website changes affect conversion rates. Our Standard Deviation Calculator and Chi-Square Calculator are companion tools for the underlying statistical computations.
How P-Values Are Calculated
P-values are calculated from the cumulative distribution function (CDF) of the relevant probability distribution, depending on the test being used:
- From a z-score: Uses the standard normal distribution (mean=0, SD=1). P = 1 − Φ(z) for right-tailed; P = Φ(z) for left-tailed; P = 2 × (1 − Φ(|z|)) for two-tailed, where Φ is the CDF.
- From a t-statistic: Uses the Student's t-distribution with specified degrees of freedom (df). Similar to the z-score but accounts for small sample sizes. As df increases, the t-distribution approaches the normal distribution.
- From chi-square: Uses the chi-square distribution with specified df. Always right-tailed. P = 1 − CDF(χ², df).
Worked example (z-score): You test whether a sample mean differs from the population mean and get z = 2.33. Two-tailed p-value = 2 × P(Z > 2.33) = 2 × 0.0099 = 0.0198. Since 0.0198 < 0.05, the result is statistically significant at the 5% level.
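The three tail rules above can be sketched in plain Python; `math.erf` gives the standard normal CDF without any third-party library (function names here are illustrative, not from any particular package):

```python
import math

def normal_cdf(z):
    # Phi(z) for the standard normal distribution, via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p_from_z(z, tail="two"):
    # tail: "right", "left", or "two" (mirrors the formulas above)
    if tail == "right":
        return 1 - normal_cdf(z)
    if tail == "left":
        return normal_cdf(z)
    return 2 * (1 - normal_cdf(abs(z)))

print(round(p_from_z(2.33), 4))  # → 0.0198, matching the worked example
```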
Key Terms You Should Know
Null hypothesis (H0): The default assumption that there is no effect or no difference. The p-value measures evidence against this hypothesis.
Alternative hypothesis (H1 or Ha): The hypothesis that there is a real effect or difference. It is what you hope to support with evidence.
Significance level (α): The predetermined threshold for rejecting the null hypothesis. Commonly 0.05 (5%), 0.01 (1%), or 0.001 (0.1%). If the p-value is less than α, the result is declared statistically significant.
Type I error (false positive): Rejecting the null hypothesis when it is actually true. When H0 is true, the probability of this error equals α, so at α = 0.05 you accept a 5% false-positive risk.
Type II error (false negative): Failing to reject the null hypothesis when it is actually false. The probability is denoted β. Statistical power = 1 − β.
Effect size: A measure of the magnitude of the difference or relationship, independent of sample size. Common measures include Cohen's d, Pearson's r, and odds ratios. The ASA recommends always reporting effect sizes alongside p-values.
Common Significance Thresholds by Field
Different scientific disciplines use different significance thresholds. A 2019 Nature comment signed by more than 800 scientists argued for retiring the dichotomous "statistically significant" label, on the grounds that the arbitrary p < 0.05 cutoff produces both false confidence in barely-significant results and unwarranted dismissal of non-significant ones. The table below shows conventions across fields, based on published standards from each discipline.
| Field | Typical α | P-Value Threshold | Notes |
|---|---|---|---|
| Social Sciences | 0.05 | p < 0.05 | Standard since Fisher; replication crisis debate ongoing |
| Medical Research | 0.05 or 0.01 | p < 0.05 | FDA typically requires p < 0.05 in Phase III trials |
| Particle Physics | 0.0000003 | 5σ (p < 3 × 10⁻⁷) | Used for discovery claims (e.g., Higgs boson) |
| Genomics (GWAS) | 5 × 10⁻⁸ | p < 5 × 10⁻⁸ | Bonferroni correction for ~1 million SNP tests |
| A/B Testing (Tech) | 0.05–0.10 | p < 0.05 or 0.10 | Often combined with minimum detectable effect (MDE) |
| Proposed New Standard | 0.005 | p < 0.005 | 2017 Nature Human Behaviour proposal by 72 researchers |
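As a sanity check on the particle-physics row, a sigma level can be converted to a one-tailed p-value with the complementary error function; a quick standard-library sketch:

```python
import math

def sigma_to_p(sigma):
    # one-tailed tail probability beyond `sigma` standard deviations
    return 0.5 * math.erfc(sigma / math.sqrt(2))

print(sigma_to_p(5))  # ≈ 2.87e-07, just below the 3 × 10⁻⁷ threshold in the table
```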
Practical Examples
Example 1 — Medical trial: A clinical trial tests a new drug against a placebo. The t-test produces t = 2.89 with 48 degrees of freedom. Two-tailed p-value = 0.0058. Since p < 0.05, the drug shows a statistically significant effect. The researchers also report Cohen's d = 0.42 (a small-to-medium effect by Cohen's conventions), giving readers the context needed to judge whether the effect is clinically meaningful.
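The t-distribution tail has no elementary closed form, but the two-tailed p-value can be approximated by numerically integrating the t density. This is an illustrative standard-library sketch (Simpson's rule over the right tail), not the calculator's internal method:

```python
import math

def t_pdf(x, df):
    # density of Student's t with df degrees of freedom
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    return c / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_tailed_p(t, df, upper=100.0, steps=20000):
    # integrate the right tail |t|..upper with Simpson's rule, then double it;
    # the density beyond `upper` is negligible for moderate df
    a = abs(t)
    h = (upper - a) / steps
    s = t_pdf(a, df) + t_pdf(upper, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 2 * (h / 3) * s

print(round(t_two_tailed_p(2.89, 48), 4))
```

For large df the result converges to the normal-based p-value, consistent with the note above that the t-distribution approaches the normal as df grows.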
Example 2 — A/B test: A website runs an experiment comparing two homepage designs. After 10,000 visitors per group, the conversion rate is 3.2% (control) vs 3.8% (treatment). The pooled two-proportion z-test produces z = 2.31, giving a two-tailed p = 0.021. The result is statistically significant, and the 0.6 percentage point lift translates to meaningful revenue impact. Use our Percentage Increase Calculator to quantify the improvement.
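The A/B-test arithmetic can be reproduced with a pooled two-proportion z-test; this is a standard-library sketch with illustrative names, not the calculator's own code:

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    # pooled two-proportion z-test: x conversions out of n visitors per group
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p, since erfc(z/√2) = 2·(1 − Φ(z))
    return z, p

z, p = two_proportion_z(320, 10000, 380, 10000)  # 3.2% vs 3.8% of 10,000 each
print(round(z, 2), round(p, 3))  # → 2.31 0.021
```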
Example 3 — Chi-square test: A survey asks 200 people to pick their preferred social media platform (four options) across 4 age groups. The resulting 4 × 4 contingency table gives χ² = 15.4 with (4 − 1)(4 − 1) = 9 degrees of freedom, for a p-value of 0.080. Since p > 0.05, we fail to reject the null hypothesis; the data do not provide sufficient evidence that platform preference differs significantly by age group.
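The chi-square tail probability comes from the regularized lower incomplete gamma function. Below is a series-expansion sketch in standard-library Python (the classic "gser" recipe); it is an illustration under those assumptions, not the calculator's code:

```python
import math

def chi2_sf(x, df):
    # survival function P(X > x) for the chi-square distribution,
    # via the series for the regularized lower incomplete gamma P(df/2, x/2)
    a, x2 = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > abs(total) * 1e-12:
        n += 1
        term *= x2 / (a + n)
        total += term
    lower = total * math.exp(-x2 + a * math.log(x2) - math.lgamma(a))
    return 1.0 - lower

print(round(chi2_sf(15.4, 9), 3))  # the example's right-tailed p-value
```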
Tips for Interpreting P-Values Correctly
- Report exact p-values, not just "significant" or "not significant": Writing p = 0.032 is more informative than p < 0.05. The ASA explicitly recommends this practice.
- Always consider effect size: A p-value of 0.001 with a tiny effect size may be less meaningful than p = 0.04 with a large effect size. Statistical significance does not equal practical significance.
- Beware of multiple comparisons: Testing 20 hypotheses at α = 0.05 means you expect 1 false positive by chance. Use corrections like Bonferroni (α/n) or Benjamini-Hochberg (FDR) when performing multiple tests.
- A non-significant result is not proof of no effect: Failing to reject H0 does not mean H0 is true. The study may simply lack sufficient statistical power (sample size too small).
- Consider the study design: P-values from well-designed randomized controlled trials carry more weight than those from observational studies, regardless of how small the p-value is.
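The two multiple-comparison corrections mentioned above can be sketched briefly. The Benjamini-Hochberg function below is a minimal illustration of the step-up rule, not a drop-in replacement for a statistics library:

```python
def bonferroni_alpha(alpha, m):
    # Bonferroni: test each of m hypotheses at the stricter level alpha / m
    return alpha / m

def benjamini_hochberg(pvals, q=0.05):
    # BH step-up: reject the k smallest p-values, where k is the largest
    # rank with p_(k) <= (k / m) * q; controls the false discovery rate
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k = rank
    rejected = set(order[:k])
    return [i in rejected for i in range(m)]

pvals = [0.01, 0.04, 0.03, 0.002, 0.2]
print(bonferroni_alpha(0.05, len(pvals)))  # per-test alpha of 0.05 / 5
print(benjamini_hochberg(pvals))           # → [True, True, True, True, False]
```

Note the difference: Bonferroni controls the chance of any false positive and would reject only the p-values at or below 0.01 here, while BH tolerates a controlled fraction of false discoveries and rejects four of the five.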
The Replication Crisis and P-Values
The "replication crisis" in science has brought p-values under intense scrutiny. A landmark 2015 study by the Open Science Collaboration attempted to replicate 100 published psychology experiments and found that only 36% produced statistically significant results on replication, compared to 97% in the original publications. This has led to calls for pre-registration of studies, transparent reporting, and greater emphasis on effect sizes and confidence intervals alongside p-values.
In 2016, the American Statistical Association issued its first formal statement on p-values, emphasizing six principles including that p-values do not measure the importance of a result and that scientific conclusions should not be based solely on whether a p-value passes a specific threshold. This statement has been cited over 7,000 times and has influenced reporting standards across multiple journals.