Methodology

The Conundrum of P-Values: Statistical Significance is Unavoidable but Need Medical Significance Too

Abstract

Background: Small P-values have conventionally been regarded as evidence for rejecting the null hypothesis in empirical studies. However, P-values now face widespread criticism, and the threshold used for statistical significance is being questioned.
Methods: This communication presents a contrarian view and explains why P-values and their threshold are still useful for ruling out sampling fluctuation as the source of the findings.
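The following is a minimal sketch of what "ruling out sampling fluctuation" means in practice. It uses a permutation test, which asks how often randomly relabelling the observations would produce a difference in means at least as large as the one observed. The permutation test, the function name `permutation_p_value`, and the data are illustrative assumptions and do not represent the author's method.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=10_000, seed=1):
    """Two-sided permutation P-value for a difference in means.

    Shuffles the group labels many times and counts how often the shuffled
    difference is at least as large as the observed one, i.e. how often
    sampling fluctuation alone could produce the observed result.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction: the observed labelling counts as one permutation,
    # so the estimated P-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical measurements for two groups of eight subjects each.
treated = [12.1, 13.4, 11.8, 14.2, 13.9, 12.7, 14.8, 13.1]
control = [11.2, 10.9, 12.0, 11.5, 10.4, 11.8, 12.3, 10.7]
p = permutation_p_value(treated, control)
print(f"Observed difference: {mean(treated) - mean(control):.2f}, P = {p:.4f}")
```

For these made-up numbers, the estimated P-value falls well below 0.01, which means chance relabelling rarely reproduces a difference of this size. A small P-value indicates only that sampling fluctuation is an unlikely explanation. It says nothing about whether a difference of about 1.9 units is medically important.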
Results: The problem lies not with P-values themselves but with their misuse, abuse, and overuse, including the dominant role they have assumed in reporting empirical results. False results may arise mostly from errors in design, invalid data, inadequate analysis, inappropriate interpretation, accumulation of Type-I error, and selective reporting, rather than from P-values per se.
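The accumulation of Type-I error can be quantified. Suppose there are k independent tests, each run at level alpha, and every null hypothesis is true. The chance of at least one false-positive result is then 1 - (1 - alpha)^k. The sketch below computes this quantity alongside the Bonferroni per-test threshold alpha/k, which is one common and conservative correction. The function names are hypothetical.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one Type-I error among k tests at level alpha.

    Assumes the k tests are independent and every null hypothesis is true.
    """
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test threshold that keeps the familywise error rate at most alpha.

    Valid whether or not the tests are independent (it follows from Boole's
    inequality), but conservative when k is large.
    """
    return alpha / k


for k in (1, 5, 10, 20):
    print(
        f"{k:>2} tests: P(at least one false positive) = "
        f"{familywise_error_rate(0.05, k):.3f}; "
        f"Bonferroni per-test alpha = {bonferroni_alpha(0.05, k):.4f}"
    )
```

With 20 independent tests at 0.05, the chance of at least one spurious "significant" result is about 0.64. This is why unplanned multiple testing, combined with selective reporting of the significant results, produces false findings even when each individual P-value is computed correctly.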
Conclusion: A P-value threshold for statistical significance, such as 0.05, is helpful in making the binary inference needed for practical application of a result. However, a lower threshold may be adopted to reduce the chance of false-positive results. The emphasis should also be on detecting a medically significant effect rather than merely a nonzero effect.
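Two recommendations in the conclusion can be combined in a single test: a threshold-based binary decision (possibly at a lower threshold) and a focus on medically significant rather than zero effects. The idea is a one-sided test whose null hypothesis is that the true effect does not exceed a minimum medically important difference (MCID). The sketch below applies a normal (z) approximation to an effect estimate and its standard error. The margin, the example numbers, and the lower threshold of 0.005 are illustrative assumptions, and the source does not prescribe this particular test.

```python
from statistics import NormalDist


def one_sided_p_value(estimate: float, std_error: float, margin: float = 0.0) -> float:
    """P-value for H0: true effect <= margin versus H1: true effect > margin.

    margin = 0 gives the usual test against zero effect; a positive margin
    asks whether the effect exceeds a medically important size.
    Uses a normal approximation to the sampling distribution of the estimate.
    """
    z = (estimate - margin) / std_error
    return 1 - NormalDist().cdf(z)


def is_significant(p: float, threshold: float = 0.05) -> bool:
    """Binary inference: reject H0 when p is below the chosen threshold."""
    return p < threshold


# Hypothetical trial: estimated benefit 3.5 units, standard error 1.5 units,
# and an assumed minimum medically important difference of 2.0 units.
estimate, se, mcid = 3.5, 1.5, 2.0

p_zero = one_sided_p_value(estimate, se)        # tests against zero effect
p_mcid = one_sided_p_value(estimate, se, mcid)  # tests against the MCID

for label, p in (("vs zero effect", p_zero), ("vs medically important effect", p_mcid)):
    print(
        f"{label}: P = {p:.4f}, "
        f"significant at 0.05: {is_significant(p, 0.05)}, "
        f"at 0.005: {is_significant(p, 0.005)}"
    )
```

In this made-up example, the benefit is statistically significant against zero at 0.05 (P about 0.01) but not at the stricter 0.005. It is also not significant against the assumed medically important difference (P about 0.16). The data therefore rule out sampling fluctuation as the explanation for a nonzero effect, but not for an effect of medically relevant size.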

 

Issue: Vol 5 No 4 (2019)
Section: Methodology
DOI: https://doi.org/10.18502/jbe.v5i4.3862
Keywords: Empirical studies; P-values; Sampling fluctuation; Type-I error

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
How to Cite
Indrayan A. The Conundrum of P-Values: Statistical Significance is Unavoidable but Need Medical Significance Too. JBE. 2020;5(4):259-267.