*Geek Box: P-values vs. Confidence Intervals

Most of us are not statisticians, and yet statistics are at the core of science. This means that misconceptions about statistics often frustrate statisticians, when those misconceptions are actually coming from researchers themselves! For those of you who regularly read scientific literature, the p-value, as a test of significance, and the confidence interval, as a measure of the precision of an estimated mean, are the statistics you’ll come across most often.

Yet both come with their own misconceptions, and it is important to understand that while p-values and confidence intervals are related, they provide different information. A p-value is the result of a hypothesis test. In this way, it relates directly to the null hypothesis, e.g., “there is no difference between Diet A and Diet B”. The threshold of p=0.05 for significance is arbitrary, and more an industry standard than any hard and fast rule.

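When the hypothesis is tested, if the p-value is <0.05, then the result is deemed to be statistically significant and the null hypothesis is rejected, i.e., there is a difference between Diet A and Diet B. Strictly, the p-value is the probability of seeing a difference at least as large as the one observed if the null hypothesis were true. In practice, therefore, a p-value is only saying whether a difference is statistically significant or not statistically significant; it says nothing about the size of that difference.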
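To make the idea of testing against the null hypothesis more concrete, here is a minimal sketch of a permutation test in Python. The weight-loss numbers for Diet A and Diet B are made up purely for illustration, and `permutation_p_value` is a hypothetical helper, not a method taken from any particular study. The idea: if there really is no difference between the diets, the group labels are interchangeable, so we shuffle them many times and count how often a difference at least as large as the observed one turns up by chance.

```python
import random
from statistics import fmean

# Hypothetical weight loss in kg; made-up numbers for illustration only.
diet_a = [2.1, 3.4, 1.9, 2.8, 3.0, 2.5, 3.6, 2.2]
diet_b = [4.0, 4.8, 3.9, 5.1, 4.4, 3.7, 4.9, 4.2]


def permutation_p_value(group_a, group_b, shuffles=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(shuffles):
        # Under the null hypothesis the labels carry no information,
        # so any split of the pooled data is as plausible as the real one.
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (shuffles + 1)


p_value = permutation_p_value(diet_a, diet_b)
print(f"Estimated p-value: {p_value:.4f}")
print("Statistically significant at 0.05" if p_value < 0.05 else "Not statistically significant at 0.05")
```

In this made-up data every value in Diet B is larger than every value in Diet A, so almost no shuffle reproduces a gap that big and the p-value comes out very small. Note that the output still only answers "significant or not"; it does not tell you how large or how precisely estimated the difference is.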

A confidence interval indicates the precision of the estimate, and gives a lower and upper limit for the mean value [or whatever the measure is]. Generally, the narrower the interval, the more precise the estimate, and the wider the interval, the less precise the estimate. The 95% simply means that if the same study were repeated many times in the same population, 95% of the confidence intervals generated from those studies would contain the true population mean.
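The "repeated many times" interpretation can be checked with a small simulation. The sketch below assumes a hypothetical population with a known true mean of 100 and standard deviation of 15, repeats the same "study" of 100 people thousands of times, and counts how many of the resulting 95% intervals contain the true mean. It uses the normal critical value (about 1.96) rather than a t-distribution, which is a reasonable approximation at this sample size.

```python
import random
import statistics
from statistics import NormalDist


def coverage_of_95_ci(true_mean=100.0, true_sd=15.0, n=100, repeats=5000, seed=1):
    """Share of simulated 95% confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    hits = 0
    for _ in range(repeats):
        # One simulated "study": n people drawn from the same population.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        standard_error = statistics.stdev(sample) / n ** 0.5
        lower = mean - z * standard_error
        upper = mean + z * standard_error
        if lower <= true_mean <= upper:
            hits += 1
    return hits / repeats


coverage = coverage_of_95_ci()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The share should land close to 0.95, which also means that roughly 1 in 20 of these intervals misses the true mean entirely.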

Confidence intervals can relate to the p-value. For example, when intervals are calculated for risk ratios [e.g., relative risk, hazard ratio, etc.], if the 95% confidence interval contains 1, the finding will have a p-value of >0.05 and is not statistically significant; conversely, if the interval does not contain 1, then the finding will have a p-value of <0.05 and be statistically significant. It is commonly misinterpreted that if the confidence intervals of two means overlap, there is no statistically significant difference between them; this is not necessarily the case. It is also a common misinterpretation that if the mean of one group is outside the interval of the other, there is a statistically significant difference. It’s important to note that just because a result is not statistically significant, this does not mean there is no difference or no effect. It’s also helpful to know that a confidence interval may not necessarily contain the true population mean [it’s 95% coverage, not 100%!].
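Both misinterpretations about comparing two means can be illustrated with a few lines of arithmetic. The sketch below uses made-up group means and standard errors, and a simple normal-approximation (z) test for the difference between two independent means; `ci95` and `two_sided_p` are hypothetical helper names chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def ci95(mean, se):
    """95% confidence interval for a mean, given its standard error."""
    return (mean - Z95 * se, mean + Z95 * se)


def two_sided_p(mean_a, se_a, mean_b, se_b):
    """Two-sided p-value for the difference between two independent means."""
    z = (mean_b - mean_a) / sqrt(se_a ** 2 + se_b ** 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Case 1 (made-up numbers): the two intervals overlap,
# yet the difference is statistically significant.
ci_a = ci95(10.0, 1.0)
ci_b = ci95(13.0, 1.0)
intervals_overlap = ci_a[1] > ci_b[0]
p_case_1 = two_sided_p(10.0, 1.0, 13.0, 1.0)
print(f"Case 1: overlap={intervals_overlap}, p={p_case_1:.3f}")

# Case 2 (made-up numbers): the mean of group B lies outside group A's
# interval, yet the difference is not statistically significant.
ci_a2 = ci95(10.0, 1.0)
mean_b2 = 12.2
mean_outside = not (ci_a2[0] <= mean_b2 <= ci_a2[1])
p_case_2 = two_sided_p(10.0, 1.0, mean_b2, 1.0)
print(f"Case 2: mean outside other interval={mean_outside}, p={p_case_2:.3f}")
```

The reason is that the test is based on the standard error of the difference, which combines the uncertainty of both groups, rather than on each group's interval taken separately. The reliable check is whether the confidence interval for the difference itself contains 0 [or 1 for a ratio].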

Both are still valuable. Confidence intervals in particular, beyond the statistical significance in the hypothesis test, contain key information to help in the interpretation of the data.