# Wrong sample size bias

Wrong sample size bias occurs when a study uses an inappropriate sample size: small samples often lead to chance findings, while very large samples often yield results that are statistically significant but not clinically relevant.

Studies of human health use samples to obtain information on the whole relevant population and to represent the population of interest accurately. When a small sample is used, there is a high risk that observations are due to chance, a risk that larger samples reduce. However, while larger studies can detect very small associations, these may not be important or relevant to improving human health.
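
A minimal Python sketch of why small samples are prone to chance findings: it repeatedly draws samples from a hypothetical population (mean 0, standard deviation 1) and measures how much the sample mean varies from one sample to the next. The spread shrinks roughly as σ/√n, so a single small sample can land far from the true value by chance alone. The population parameters, seed and number of repeats are illustrative assumptions.

```python
import math
import random
import statistics

def spread_of_sample_means(n, repeats=1000, seed=1):
    """Estimate the standard deviation of the sample mean for samples of size n
    drawn from a hypothetical population with mean 0 and SD 1."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

for n in (10, 100, 1000):
    simulated = spread_of_sample_means(n)
    theoretical = 1.0 / math.sqrt(n)  # standard error of the mean, sigma / sqrt(n)
    print(f"n={n:>4}: simulated SD of mean = {simulated:.3f}, sigma/sqrt(n) = {theoretical:.3f}")
```

With n = 10 the sample mean typically wanders about 0.3 SD from the truth; with n = 1000 it wanders about 0.03 SD.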

Statistical indicators, such as p-values and confidence intervals, help determine how confident we can be that the observed results did not arise by chance. Because the precision of an estimate increases with sample size, very large studies can show a large number of statistically significant results even for tiny effects; these results may not be important if the effect size is small or if the relationship is not clinically relevant to health.
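
The following sketch shows how a fixed, tiny effect becomes "statistically significant" as the sample grows. It assumes a two-arm comparison with a standardized mean difference d (outcome SD of 1 in each arm, equal group sizes) tested with a two-sided z-test; the effect size of 0.02 is hypothetical and chosen to be clinically negligible.

```python
import math

def two_sided_p(d, n_per_arm):
    """Two-sided p-value of a z-test for a standardized mean difference d,
    assuming equal arms of size n_per_arm and an outcome SD of 1 in each arm."""
    se = math.sqrt(2.0 / n_per_arm)  # standard error of the difference in means
    z = d / se
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|)), accurate in the tails

TINY_EFFECT = 0.02  # hypothetical, clinically negligible effect size

for n in (100, 10_000, 100_000, 1_000_000):
    print(f"n per arm = {n:>9,}: p = {two_sided_p(TINY_EFFECT, n):.2g}")
```

The effect never changes between rows; only the p-value does, because the standard error shrinks as 1/√n.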

Large studies are much better than small ones, but appropriate caution is needed when interpreting the results of both: a large sample size can magnify any bias that is present, because it produces a precise estimate that may still be systematically wrong.
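
One way to see how a large sample can magnify bias: in this sketch every measurement carries the same hypothetical systematic error (bias = 0.1) on top of random noise, while the true value is 0. As n grows, the 95% confidence interval narrows around the biased value, so the study becomes more confident in the wrong answer rather than correcting it. The bias size, noise level and seed are illustrative assumptions.

```python
import random
import statistics
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # critical value for a 95% interval, about 1.96
TRUE_MEAN = 0.0

def biased_study(n, bias=0.1, sd=1.0, seed=7):
    """Return the sample mean and its 95% CI for n measurements that share
    the same systematic error `bias` (all values hypothetical)."""
    rng = random.Random(seed)
    xs = [TRUE_MEAN + bias + rng.gauss(0.0, sd) for _ in range(n)]
    mean = statistics.fmean(xs)
    half_width = Z95 * statistics.stdev(xs) / n ** 0.5
    return mean, (mean - half_width, mean + half_width)

for n in (100, 10_000, 100_000):
    mean, (lo, hi) = biased_study(n)
    covers = lo <= TRUE_MEAN <= hi
    print(f"n={n:>7,}: mean={mean:.3f}, 95% CI ({lo:.3f}, {hi:.3f}), contains true value: {covers}")
```

Adding participants only shrinks the random error; the systematic error of 0.1 is untouched, so beyond a certain size the interval excludes the true value.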

In his paper ‘Why Most Published Research Findings Are False,’ Ioannidis states that ‘the smaller the studies conducted in a scientific field, the less likely the research findings are to be true.’ For example, a meta-analysis of placebo-controlled trials of second-line antirheumatic drugs found evidence of sample size bias: the estimated effect decreased as sample size increased.
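
One mechanism often proposed for effects that shrink as trials grow is selective reporting: if only trials with a statistically significant benefit are published, small trials reach significance only when chance inflates their estimate. The sketch below simulates this under loudly hypothetical assumptions (a true standardized effect of 0.2, normally distributed trial estimates, and a filter that keeps only significant positive results). It illustrates the pattern; it is not a reconstruction of the antirheumatic drug meta-analysis.

```python
import random
import statistics
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% threshold, about 1.96

def mean_reported_effect(true_d, n_per_arm, trials=20_000, seed=3):
    """Average estimated effect among simulated trials that reach significance.

    Each trial's estimate is drawn as Normal(true_d, se) with se = sqrt(2 / n_per_arm).
    Only trials with a significant benefit (z > Z_CRIT) are 'reported' (hypothetical filter).
    Returns (mean reported estimate, fraction of trials reported)."""
    rng = random.Random(seed)
    se = (2.0 / n_per_arm) ** 0.5
    reported = []
    for _ in range(trials):
        estimate = rng.gauss(true_d, se)
        if estimate / se > Z_CRIT:
            reported.append(estimate)
    return statistics.fmean(reported), len(reported) / trials

TRUE_EFFECT = 0.2  # hypothetical true standardized effect

for n in (20, 100, 1000):
    mean_est, frac = mean_reported_effect(TRUE_EFFECT, n)
    print(f"n per arm = {n:>4}: mean reported effect = {mean_est:.2f} (true {TRUE_EFFECT}), reported {frac:.0%}")
```

Under these assumptions the reported effect in small trials is several times the true effect, and it converges towards the truth as trials grow.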

Large samples can also produce the wrong answer. In the 1936 US presidential election, the largest public opinion poll in US history, with 2.4 million respondents, got it completely wrong. The poll predicted that Landon would win by a landslide; instead, Roosevelt won 46 of the 48 states. The survey was conducted by the Literary Digest, which polled its own readers, who disproportionately supported Landon. This was a result of ascertainment bias, but the point is: don’t be fooled by the size of the sample.

A comparison of the estimated benefits of treatment between large trials (at least 100 patients per arm) and small trials found that, on average, treatment effects were greater in small trials than in large ones. In neuroscience, studies tend to be small, giving them low power to detect associations; as a result, the probability that a statistically significant finding reflects a true effect is low.
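
The link between low power and unreliable findings can be made concrete with the positive predictive value (PPV) of a significant result, in the simple form used by Ioannidis (2005) and Button et al. (2013): PPV = (1 − β)R / ((1 − β)R + α), where 1 − β is power, α the significance level and R the pre-study odds that a tested association is real. The sketch approximates power for a two-arm z-test of a standardized effect; the effect size of 0.3, the pre-study odds of 0.25 and the sample sizes are hypothetical, and the approximation ignores the negligible chance of significance in the wrong direction.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def approx_power(d, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for a standardized
    effect d with equal arms (outcome SD 1); ignores the opposite tail."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se = math.sqrt(2.0 / n_per_arm)
    return STD_NORMAL.cdf(d / se - z_crit)

def ppv(power, pre_study_odds, alpha=0.05):
    """Probability that a statistically significant finding is true (no bias term)."""
    true_positives = power * pre_study_odds
    return true_positives / (true_positives + alpha)

EFFECT = 0.3  # hypothetical standardized effect
ODDS = 0.25   # hypothetical pre-study odds: 1 true association per 4 null ones

for n in (20, 50, 200):
    pw = approx_power(EFFECT, n)
    print(f"n per arm = {n:>3}: power = {pw:.2f}, PPV = {ppv(pw, ODDS):.2f}")
```

With these inputs, a significant result from the smallest design is more likely false than true, while the largest design pushes PPV above 0.8.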

To prevent wrong sample size bias, seek statistical advice when designing a study. When interpreting results, do not be persuaded purely by large numbers and narrow confidence intervals: consider rational explanations for the observed findings and the relevance of the observed effect size. Be cautious about using p-values to support or refute hypotheses, especially when a large number of statistical tests have been performed.
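
Statistical advice at the design stage usually includes a sample size calculation. As a minimal sketch, assuming a simple two-arm comparison of means with equal allocation, the normal-approximation formula is n per arm = 2(z₁₋α/₂ + z₁₋β)²σ²/δ², where δ is the smallest difference worth detecting and σ the outcome standard deviation. The function name and example values are illustrative; real designs also account for dropout, clustering, unequal allocation and the choice of test.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sided comparison of two means
    (normal approximation, equal allocation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)  # round up to whole participants

print(n_per_arm(delta=0.5, sd=1.0))  # 63 per arm for a moderate standardized difference
print(n_per_arm(delta=0.2, sd=1.0))  # 393 per arm for a small standardized difference
```

Because the required n grows with 1/δ², detecting an effect 2.5 times smaller needs roughly 6.25 times as many participants.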

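Very large observational studies, such as the Million Women Study, sometimes use 99% confidence intervals instead of 95% confidence intervals to express uncertainty in their estimates. A 99% interval is wider, so a result must lie further from the null value before it is considered statistically significant.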
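
To see what switching from 95% to 99% intervals does, the sketch below computes both intervals for a hypothetical relative risk of 1.10 with a standard error of 0.04 on the log scale (both numbers invented for illustration). The 99% interval uses a larger critical value (about 2.58 instead of 1.96), so it is wider: here the 95% interval excludes 1 while the 99% interval does not.

```python
import math
from statistics import NormalDist

def ratio_ci(ratio, se_log, level):
    """Confidence interval for a ratio measure (e.g. a relative risk),
    computed on the log scale and transformed back."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_ratio = math.log(ratio)
    return math.exp(log_ratio - z * se_log), math.exp(log_ratio + z * se_log)

RR, SE_LOG = 1.10, 0.04  # hypothetical estimate and standard error

for level in (0.95, 0.99):
    lo, hi = ratio_ci(RR, SE_LOG, level)
    print(f"{level:.0%} CI: {lo:.3f} to {hi:.3f}")
```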

Another area in which more stringent levels of statistical significance are set is genome-wide association studies, which deal with vast amounts of data and examine large numbers of associations. For the gene discovery work in these studies, it is more critical to avoid false positives than to miss some true associations; therefore, it has been the convention for these studies to use a much more stringent statistical significance level.
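
The reason for a stricter threshold is the number of tests. If m independent tests of true null hypotheses are each run at level α, the chance of at least one false positive is 1 − (1 − α)^m, and a Bonferroni correction tests each one at α/m instead. The widely used genome-wide significance threshold of 5 × 10⁻⁸ corresponds to α = 0.05 spread over roughly one million independent tests. The sketch assumes independent tests, which real genetic variants are not, so treat the numbers as an approximation.

```python
import math

def familywise_error(alpha, m):
    """Probability of at least one false positive among m independent tests of
    true null hypotheses, each at level alpha: 1 - (1 - alpha)**m, computed stably."""
    return -math.expm1(m * math.log1p(-alpha))

def bonferroni_threshold(alpha, m):
    """Per-test significance level that keeps the family-wise error at or below alpha."""
    return alpha / m

for m in (1, 20, 1_000_000):
    print(f"{m:>9,} tests at p < 0.05: P(at least one false positive) = {familywise_error(0.05, m):.3f}")

threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"Bonferroni threshold for 1,000,000 tests: {threshold:.0e}")
print(f"Family-wise error at that threshold: {familywise_error(threshold, 1_000_000):.3f}")
```

At p < 0.05, twenty independent tests already give about a 64% chance of at least one spurious "finding"; a million tests make one essentially certain.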

Armstrong ME et al. Million Women Study Collaborators. Relationship of Height to Site-Specific Fracture Risk in Postmenopausal Women. J Bone Miner Res. 2016 Apr;31(4):725-31.

Barsh GS et al. Guidelines for genome-wide association studies. PLoS Genet. 2012 Jul;8(7):e1002812.

Bartolucci AA et al. Meta-analysis of multiple primary prevention trials of cardiovascular events using aspirin. Am J Cardiol. 2011;107(12):1796-801.

Button KS et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci. 2013 May;14(5):365-76.

Gøtzsche PC et al. Meta-analysis of second-line antirheumatic drugs: sample size bias and uncertain benefit. J Clin Epidemiol. 1992 Jun;45(6):587-94.

Ioannidis JP. Why most published research findings are false. PLoS Med. 2005 Aug;2(8):e124.

Nüesch E et al. Small study effects in meta-analyses of osteoarthritis trials: meta-epidemiological study. BMJ. 2010;341:c3515.

Porta M et al. A dictionary of epidemiology. 6th edition. New York: Oxford University Press; 2014.

Sackett DL. Bias in analytic research. J Chronic Dis. 1979;32:51-63.

Wellcome Trust Case Control Consortium; Burton PR, Clayton DG et al. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447(7145):661-78.
