Background
Stopping rules are a common feature of clinical trials, implemented primarily for ethical and safety reasons. They ensure that a study does not continue when an intervention is clearly better (stopping for benefit), clearly worse (stopping for futility), or poses unacceptable risk (stopping for safety).
Although stopping rules are an ethical imperative, they can affect the interpretation of treatment effects. Trials terminated early (known as truncated trials) because of an apparent beneficial treatment effect (stopped for benefit) are particularly susceptible to early stopping bias. Such trials risk exaggerating the treatment effect, especially when the total number of events is small.
The consequences of early stopping for benefit extend beyond statistical distortion. Compelling early findings can trigger a cascade of effects, including media attention, publication in high-impact journals, and rapid translation into clinical practice and guidelines. While additional evidence may weaken or even contradict these early findings, there may be reluctance to reverse recommendations in the face of new data, and the gathering of such data in further trials may be considered unethical.
Early stopping for futility, in contrast, aims to prevent participants from receiving an ineffective intervention, while conserving time and resources. However, stopping for this reason may underestimate the treatment effect, particularly when interim results are imprecise or not representative of future data, and may lead to a study being stopped inappropriately when the intervention has a modest but real treatment effect.
Early stopping for safety is often considered the most ethically straightforward reason to terminate a trial, as patient safety and protection must supersede statistical considerations. Nevertheless, early stopping for participant safety may still introduce bias and overestimate the risk of harm or adverse events when event counts are low.
Finally, some trials stop early for reasons unrelated to the study’s own findings, such as loss of funding, slow recruitment, difficulties accessing the intervention, or the emergence of external evidence that makes the research question less relevant. However, when a trial terminates for operational or feasibility reasons, the stopping itself is unlikely to introduce bias, so such trials are not considered further in this article.
Examples
Stopping for benefit
β-blocker bisoprolol
Poldermans et al. (1999) is an example of a trial stopped early for benefit. The trial evaluated the β-blocker bisoprolol for patients with vascular disease undergoing non-cardiac surgery. After early trial termination, guideline recommendations supported the use of bisoprolol in the perioperative setting. However, later studies and pooled analyses revealed that perioperative bisoprolol in non-cardiac surgery was associated with an increased risk of disabling strokes, casting doubt on the reliability of the early-stopped trial. Despite this contradictory evidence, perioperative β-blockers continued to be used for a time, possibly reflecting a reluctance to overturn established practice and commercial influences.
rhAPC
In 2001, a trial of rhAPC (recombinant human activated protein C) in severe sepsis was stopped early after interim analyses suggested reduced mortality. A subsequent trial raised concerns about increased bleeding risk and failed to confirm the mortality benefit. Despite this, the drug continued to be recommended for several years before being withdrawn from the market in 2010–11, illustrating the long-term impact of early-stopped evidence on practice.
Stopping for safety/harm
The Women’s Health Initiative (WHI) trial evaluated the long-term effects of combined estrogen and progestin therapy in postmenopausal women (known as hormone replacement therapy, HRT). An interim safety review found that women in the treatment group had a higher risk of breast cancer and no cardiovascular benefit. The Data and Safety Monitoring Board recommended stopping the trial early in 2002 to protect participants, leading to a rapid global decline in HRT use. Later reviews showed that the WHI findings were influenced by factors such as participants beginning hormone therapy more than 10 years after menopause, and by the specific HRT formulation used. This research has demonstrated that benefits and risks vary by timing of therapy initiation and by type of HRT, with some regimens showing more favourable safety profiles. On the balance of evidence, HRT can benefit specific patient groups. This illustrates how early stopping for harm, though ethically necessary in trial settings, can contribute to uncertainty or oversimplified interpretations of treatment effects.
Stopping for futility
The GUIDE-IT trial tested whether using a blood marker (NT-proBNP) to guide treatment for people with heart failure could improve outcomes. The study was stopped early because an interim analysis suggested the approach was unlikely to show benefit, as both groups had similar rates of hospitalisation or cardiovascular death. Although stopping was reasonable, ending the trial before it reached its planned size risked underestimating the intervention’s true effectiveness. A later meta-analysis, which included the GUIDE-IT trial, did find evidence of benefit, although this was not seen in sensitivity analyses restricted to low-risk-of-bias studies.
Impact
It is important to be aware of and account for early stopping bias when investigating the effect of interventions. Early stopping for benefit is likely to overestimate the treatment effect; stopping for futility or patient safety may underestimate it. Stopping for reasons unrelated to the study’s own findings is unlikely to introduce bias.
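The mechanism behind the overestimation can be illustrated with a small simulation. The sketch below is hypothetical: the true effect size, the number of looks, the patients added per look, and the naive fixed threshold are all assumed for illustration. Trials that cross a conventional significance threshold at an interim look are stopped, and their effect estimates are compared with those of trials that run to completion.

```python
import numpy as np

rng = np.random.default_rng(42)

TRUE_EFFECT = 0.2   # true standardised mean difference (assumed for illustration)
N_PER_LOOK = 50     # patients per arm added between analyses
LOOKS = 4           # three interim looks plus the final analysis
Z_NAIVE = 1.96      # naive fixed-sample threshold applied at every look

def run_trial():
    """Simulate one trial with interim looks; stop early if z exceeds Z_NAIVE."""
    treat, ctrl = np.array([]), np.array([])
    for look in range(1, LOOKS + 1):
        treat = np.append(treat, rng.normal(TRUE_EFFECT, 1.0, N_PER_LOOK))
        ctrl = np.append(ctrl, rng.normal(0.0, 1.0, N_PER_LOOK))
        n = look * N_PER_LOOK
        diff = treat.mean() - ctrl.mean()
        z = diff / np.sqrt(2.0 / n)       # unit variance assumed in both arms
        if z > Z_NAIVE:
            return diff, look < LOOKS     # estimated effect, stopped-early flag
    return diff, False

results = [run_trial() for _ in range(5000)]
early = [d for d, stopped in results if stopped]
completed = [d for d, stopped in results if not stopped]

print(f"true effect: {TRUE_EFFECT}")
print(f"mean estimate, trials stopped early for benefit: {np.mean(early):.3f}")
print(f"mean estimate, trials running to completion:     {np.mean(completed):.3f}")
```

Because a trial can only stop early when its interim estimate happens to be large, the trials stopped for benefit systematically overestimate the true effect, while those that complete slightly underestimate it, exactly the pattern described above.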
Bassler et al.’s systematic review and meta-analysis of trials stopped early for benefit concluded that RCTs stopped early for benefit overestimate treatment effects. This overestimation was most pronounced when only a small number of outcome events had occurred, especially fewer than 200. Stopping rules for benefit should therefore demand a high threshold of evidence, in both the magnitude and the plausibility of the effect, before large numbers of events (e.g. 500) have accumulated. Another review found that trials stopped for benefit had a median of 66 accrued events prior to stopping, with smaller event numbers yielding the largest treatment effects.
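One common way of making early benefit boundaries strict is an O’Brien–Fleming-type error-spending function, which allocates almost none of the overall type I error to early looks. A minimal sketch, using the Lan–DeMets O’Brien–Fleming-like spending function (the information fractions shown are assumed for illustration; deriving the actual boundary value at each look additionally requires group-sequential software):

```python
from scipy.stats import norm

ALPHA = 0.05                        # overall two-sided type I error
z_alpha = norm.ppf(1 - ALPHA / 2)   # fixed-sample critical value, ~1.96

def obf_spent(t):
    """Cumulative type I error spent by information fraction t (0 < t <= 1),
    Lan-DeMets O'Brien-Fleming-like spending function."""
    return 2 * (1 - norm.cdf(z_alpha / t**0.5))

for t in (0.25, 0.50, 0.75, 1.00):
    print(f"information fraction {t:.2f}: cumulative alpha spent = {obf_spent(t):.5f}")
```

At a quarter of the planned information, less than 0.01% of the alpha has been spent, so an interim result must be extreme before it can justify stopping; the full 5% is available only at the final analysis.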
Stopping for futility may underestimate treatment effects. A review by Walter concluded that stopping early is probably reasonable when interim results are likely to reflect future trends. Statistical measures such as conditional power should be calculated, and assumptions about future data should be specified.
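Conditional power is the probability of a statistically significant final result given the data observed so far and an assumption about the trend in future data. A simplified sketch using one common Brownian-motion formulation is shown below; the interim z-value and information fraction are assumed for illustration.

```python
from scipy.stats import norm

def conditional_power(z_interim, t, alpha=0.05, drift=None):
    """Probability of a significant final result (two-sided alpha), given the
    interim z-statistic z_interim at information fraction t (0 < t < 1).
    drift is the assumed standardised effect per unit information for the
    remaining data; if None, the current trend (z_interim / sqrt(t)) is used."""
    z_crit = norm.ppf(1 - alpha / 2)
    theta = z_interim / t**0.5 if drift is None else drift
    b_t = z_interim * t**0.5              # Brownian-motion value B(t) at time t
    mean_final = b_t + theta * (1 - t)    # expected B(1) given B(t) and drift
    return 1 - norm.cdf((z_crit - mean_final) / (1 - t)**0.5)

# halfway through the trial, a weak interim signal of z = 0.5
cp = conditional_power(z_interim=0.5, t=0.5)
print(f"conditional power under the current trend: {cp:.3f}")
```

Here the conditional power under the current trend is only a few per cent, the kind of result that might support a pre-specified futility stop; passing an optimistic `drift` instead shows how sensitive the decision is to assumptions about future data.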
Early stopping for safety may lead to underestimation of treatment benefit or overestimation of risk. As with all early stopping, the methods and rationale for stopping should be clearly reported. Trials stopped early for harms need to be interpreted cautiously, and related evidence should be taken into account when evaluating their results.
Preventive steps
To prevent or limit the impact of early stopping bias, whilst upholding the ethical and safety reasons to end a study early, the following steps can be considered:
Pre-specify stopping boundaries, particularly when stopping for benefit, requiring a large number of accumulated events before considering early stopping.
When deciding whether to stop for futility, use conditional power analyses to estimate the probability of obtaining a statistically significant result if the trial continues to completion, noting that these analyses should be based on realistic assumptions about future data.
Report full details of interim analyses, stopping boundaries, and decision-making processes, including whether an independent data committee was involved in the decision and whether interim analyses were planned or ad hoc, to allow critical appraisal of the trial. See Item 23b of the 2025 CONSORT reporting guidelines.
Promote caution in guideline development when evidence is based on truncated trials, and consider the need to confirm results in subsequent trials or pooled reviews.