The Conundrum of P-Values: Statistical Significance is Unavoidable but Need Medical Significance Too
Background: Small P-values have conventionally been taken as evidence for rejecting a null hypothesis in empirical studies. However, P-values are now widely criticized, and the threshold used for statistical significance is being questioned.
Methods: This communication takes a contrarian view and explains why the P-value and its threshold are still useful for ruling out sampling fluctuation as a source of the findings.
Results: The problem lies not with P-values themselves but with their misuse, abuse, and overuse, including the dominant role they have assumed in empirical results. False results may arise mostly from errors in design, invalid data, inadequate analysis, inappropriate interpretation, accumulation of Type-I error, and selective reporting, and not from P-values per se.
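As an illustration of the accumulation of Type-I error mentioned above, the following minimal sketch computes the chance of at least one false positive when several tests are each run at the same significance level. It uses the standard relation 1 - (1 - alpha)^k and assumes the k tests are independent. The function name and the values of k are illustrative assumptions, not taken from the article.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one false positive among k independent tests,
    each carried out at significance level alpha (hypothetical helper)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    # Probability that none of the k tests rejects a true null is (1 - alpha)^k.
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    # Compare the conventional threshold with a lower one (illustrative values).
    for k in (1, 5, 10, 20):
        at_005 = familywise_error_rate(0.05, k)
        at_0005 = familywise_error_rate(0.005, k)
        print(f"k={k:2d}  alpha=0.05: {at_005:.3f}  alpha=0.005: {at_0005:.3f}")
```

Under the independence assumption, ten tests at the 0.05 level carry roughly a 40% chance of at least one false positive, which is one reason unadjusted multiple testing can produce false results even when each individual P-value is computed correctly.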
Conclusion: A P-value threshold such as 0.05 for statistical significance is helpful in making a binary inference for practical application of a result. However, a lower threshold can be suggested to reduce the chance of false results. Also, the emphasis should be on detecting a medically significant effect, not merely a nonzero effect.
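The distinction between a nonzero effect and a medically significant effect can be sketched with a one-sided test whose null hypothesis is "effect at most a chosen threshold" rather than "effect equal to zero". The example below is a minimal sketch under stated assumptions: the estimate is treated as approximately normal with a known standard error, and the function name, the numbers, and the threshold of 2 units are hypothetical and not taken from the article.

```python
from statistics import NormalDist


def one_sided_p_value(estimate: float, std_error: float,
                      threshold: float = 0.0) -> float:
    """P-value for H0: effect <= threshold against H1: effect > threshold,
    using a normal approximation (hypothetical helper).

    threshold = 0 gives the conventional test against zero effect;
    a positive threshold tests against a minimum medically important effect.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z = (estimate - threshold) / std_error
    # Upper-tail probability of the standard normal distribution.
    return 1.0 - NormalDist().cdf(z)


if __name__ == "__main__":
    estimate, std_error = 3.0, 1.0      # hypothetical observed effect and SE
    medically_important = 2.0           # hypothetical minimum important effect

    p_zero = one_sided_p_value(estimate, std_error, 0.0)
    p_medical = one_sided_p_value(estimate, std_error, medically_important)
    print(f"P against zero effect:                 {p_zero:.4f}")
    print(f"P against medically important effect:  {p_medical:.4f}")
```

With these hypothetical numbers the effect is clearly distinguishable from zero at the 0.05 level, yet the data do not rule out sampling fluctuation as an explanation for exceeding the medically important threshold. Statistical significance against zero therefore does not by itself establish medical significance.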
Issue: Vol 5 No 4 (2019)
Section: Methodology
Keywords: Empirical studies; P-values; Sampling fluctuation; Type-I error
Rights and permissions: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.