Background
Studies of human health use samples to draw conclusions about the whole relevant population, so the sample must represent the population of interest accurately. When the sample size is small, there is a high risk that the observed results are due to chance, a risk that studies with larger sample sizes reduce. However, while larger studies can detect very small associations, those associations may not be important or relevant to improving human health.
Statistical indicators, including p-values and confidence intervals, help determine how certain we can be that the observed results did not arise by chance. Huge sample sizes can therefore yield a large number of statistically significant results, but these may not be important if the effect size is small or if the relationship is not clinically relevant to health.
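The point can be seen in a quick simulation. The sketch below, using only the Python standard library, compares two groups whose means differ by a clinically trivial 2% of one standard deviation; the z-test helper and the chosen effect size are illustrative assumptions, not taken from any particular study. With 100 participants per arm the difference is invisible, while with hundreds of thousands per arm the p-value becomes vanishingly small even though the effect itself has not changed.

```python
import math
import random

def two_sample_z_pvalue(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(1)
effect = 0.02  # a clinically trivial difference: 2% of one standard deviation

for n in (100, 500_000):
    control = [random.gauss(0, 1) for _ in range(n)]
    treated = [random.gauss(effect, 1) for _ in range(n)]
    p = two_sample_z_pvalue(treated, control)
    print(f"n per arm = {n:>7}: p = {p:.2g}")
```

Statistical significance here is purely a function of sample size; whether a 0.02 standard-deviation difference matters to patients is a separate, clinical question.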
Large studies are generally much better than small ones, but appropriate caution is needed in interpreting the results of both: a large sample size can magnify any bias that is present.
Example
In his paper ‘Why Most Published Research Findings Are False,’ Ioannidis states that ‘the smaller the studies conducted in a scientific field, the less likely the research findings are to be true.’ As an example, in placebo-controlled trials of second-line antirheumatic drugs, the estimated treatment effect decreased as sample size increased.
Large sample sizes can also prove to be wrong. In the 1936 US election, the largest public opinion poll in US history, with 2.4 million respondents, got it completely wrong: the poll said Landon would win by a landslide, yet Roosevelt won 46 of the 48 states. The survey was conducted by the Literary Digest, which polled its own readers, who largely supported Landon. This was a result of ascertainment bias, but the broader point stands: don’t be fooled by the size of the sample.
Impact
A comparison of the estimated benefits of treatment between large trials (at least 100 patients per arm) and small trials found that, on average, treatment effects were greater in small than in large trials. In neuroscience, study sizes are typically small, giving low power to detect associations, and therefore the likelihood that a finding is true is small.
Preventive steps
To prevent wrong sample size bias, it is essential to seek statistical advice when designing studies. When interpreting results, do not be persuaded purely by large numbers and small confidence intervals: consider rational explanations for the observed findings and the relevance of the effect size observed. Be cautious in using p-values to support or disprove hypotheses, especially when a large number of statistical tests have been performed.
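Why multiple testing matters can be sketched with a small simulation, again using only the Python standard library. The scenario and the Bonferroni-style correction below are illustrative assumptions, not drawn from the text: 200 outcomes are tested where no true group difference exists, yet at the conventional p < 0.05 threshold roughly 5% of them come out ‘significant’ by chance alone.

```python
import math
import random

def z_test_p(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(42)
n_tests, n_per_group = 200, 100

# 200 outcomes, none of which truly differs between the two groups.
pvalues = [
    z_test_p([random.gauss(0, 1) for _ in range(n_per_group)],
             [random.gauss(0, 1) for _ in range(n_per_group)])
    for _ in range(n_tests)
]

naive = sum(p < 0.05 for p in pvalues)             # false positives, uncorrected
corrected = sum(p < 0.05 / n_tests for p in pvalues)  # Bonferroni-style threshold
print(f"'significant' at p < 0.05:    {naive}")
print(f"significant after correction: {corrected}")
```

Dividing the significance threshold by the number of tests (a Bonferroni-style correction) is one simple way to keep the overall false-positive rate under control.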
Very large observational studies, such as the Million Women Study, sometimes use 99% rather than 95% confidence intervals to indicate confidence in the estimate.
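A more stringent interval is simply a wider one. The sketch below computes normal-approximation 95% and 99% confidence intervals for a sample mean on simulated data; the helper function and data are illustrative assumptions, not taken from any study cited here.

```python
import math
import random

# Standard-normal multipliers for two-sided confidence intervals.
Z_95, Z_99 = 1.960, 2.576

def mean_ci(xs, z):
    """Normal-approximation confidence interval for the sample mean."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    half_width = z * sd / math.sqrt(n)
    return m - half_width, m + half_width

random.seed(0)
data = [random.gauss(10, 2) for _ in range(10_000)]

lo95, hi95 = mean_ci(data, Z_95)
lo99, hi99 = mean_ci(data, Z_99)
print(f"95% CI: ({lo95:.3f}, {hi95:.3f})")
print(f"99% CI: ({lo99:.3f}, {hi99:.3f})")  # wider: a more demanding level of confidence
```

The 99% interval is always wider than the 95% interval for the same data (by the ratio of the multipliers, about 1.31), so demanding it narrows the set of estimates that pass as precise.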
Another area in which more stringent levels for statistical significance are set is genome-wide association studies, which deal with vast amounts of data and examine large numbers of associations. For the gene discovery work in these studies, it is more critical to avoid false positives than to miss some true associations; therefore, it has been the convention for these studies to use a much more stringent statistical significance level.