Data & methods

This site summarizes stopped clinical trials and the stop reasons recorded for them in the registry.

Data sources

The primary source is ClinicalTrials.gov registry metadata, as recorded by sponsors and investigators.

Likely scientific failure

A trial is flagged when the stated stop reason suggests the intervention did not work as intended (e.g., lack of efficacy or futility). This is inferred from registry text and may be incomplete. Verify using primary sources.

Reason buckets

Stop reasons are grouped into high-level buckets (e.g., efficacy/futility, safety, operational, enrollment, funding, regulatory, other/unknown) using rule-based parsing of the recorded reason text and structured fields where available.
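
As a rough illustration of what rule-based bucketing can look like, here is a minimal sketch. The keyword patterns, their order, and the function name bucket_for are hypothetical; the site's actual rules are not listed on this page and may also use structured fields.

```python
import re
from typing import Optional

# Hypothetical keyword rules, checked in order (first match wins).
# These are illustrative only and are not the site's actual rules.
BUCKET_RULES = [
    ("efficacy/futility", r"futil|lack of efficacy|ineffective"),
    ("safety", r"safety|adverse|toxicit"),
    ("enrollment", r"enrol|accrual|recruit"),
    ("funding", r"fund|financ|budget"),
    ("regulatory", r"regulat"),
    ("operational", r"business|logistic|staff"),
]

def bucket_for(reason: Optional[str]) -> str:
    """Return the first bucket whose pattern matches the stop-reason text."""
    text = (reason or "").lower()
    for bucket, pattern in BUCKET_RULES:
        if re.search(pattern, text):
            return bucket
    # Empty, missing, or unmatched text falls through to the catch-all bucket.
    return "other/unknown"
```

Because the first matching rule wins, a reason that mentions several causes is assigned to only one bucket, which is one way such labels can be incomplete.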

Outliers calculations

The Outliers page highlights sponsors or disease areas that appear unusually often in a particular stop-reason bucket (for example: Safety in Phase II). All metrics are computed within a chosen cohort (scope × phase × bucket) and then compared to that cohort’s baseline rate.

Cohorts and counts

For a selected cohort, each group (sponsor or disease area) has:

  • n: total stopped trials in the cohort for that group
  • k: trials in the selected bucket (hits) for that group
  • Raw rate: k/n
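
The counts above could be tallied as in the following sketch. It assumes the trial records have already been filtered to the selected scope and phase; the record keys "group" and "bucket" and the function name group_counts are illustrative, not the site's actual schema.

```python
from collections import defaultdict

def group_counts(trials, bucket):
    """Return {group: (n, k, raw_rate)} for one cohort's trial records."""
    n = defaultdict(int)  # stopped trials per group
    k = defaultdict(int)  # trials in the selected bucket per group
    for t in trials:
        n[t["group"]] += 1
        if t["bucket"] == bucket:
            k[t["group"]] += 1
    # Every group in n has at least one trial, so k/n is always defined.
    return {g: (n[g], k[g], k[g] / n[g]) for g in n}

# Made-up records for illustration only.
trials = [
    {"group": "Sponsor A", "bucket": "safety"},
    {"group": "Sponsor A", "bucket": "enrollment"},
    {"group": "Sponsor A", "bucket": "safety"},
    {"group": "Sponsor B", "bucket": "funding"},
]
counts = group_counts(trials, "safety")
```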

Baseline

The baseline rate is computed over the same cohort across all groups:

p0 = K/N, where N is the cohort total trials and K is the cohort total bucket hits.
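
A minimal sketch of the baseline calculation, assuming per-group (n, k) pairs are already available. The function name baseline_rate and the example numbers are made up for illustration.

```python
def baseline_rate(group_nk):
    """Compute p0 = K/N from per-group (n, k) pairs; None if the cohort is empty."""
    pairs = list(group_nk)          # materialize so we can iterate twice
    N = sum(n for n, _ in pairs)    # cohort total trials
    K = sum(k for _, k in pairs)    # cohort total bucket hits
    return K / N if N else None

# Three hypothetical groups: N = 100, K = 15.
p0 = baseline_rate([(40, 6), (10, 4), (50, 5)])
```

Note that p0 pools all trials, so large groups influence the baseline more than small ones.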

Shrunk rate and 90% CI

To avoid over-emphasizing small-sample groups, we use a simple Beta–Binomial shrinkage model. Each group’s bucket rate is treated as a probability p with a Beta prior Beta(a, b) (read from specialness_index.json; defaults to a=b=1). After observing k hits out of n trials:

Posterior: p | data ~ Beta(a + k, b + (n - k))

“Shrunk rate” shown in the table is the posterior mean:

posterior_mean = (a + k) / (a + b + n)
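
The posterior mean formula can be checked with a short sketch. The function name shrunk_rate is illustrative, and the defaults mirror the stated a=b=1.

```python
def shrunk_rate(k, n, a=1.0, b=1.0):
    """Posterior mean of Beta(a + k, b + n - k)."""
    return (a + k) / (a + b + n)

small = shrunk_rate(2, 3)      # raw rate 2/3, shrunk to 3/5
large = shrunk_rate(200, 300)  # raw rate 2/3, shrunk to 201/302
```

Both groups have the same raw rate, but the small group moves much further. With a=b=1 the pull is toward the prior mean a/(a+b) = 0.5, not toward the cohort baseline.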

The displayed 90% CI is a normal approximation built from the posterior mean and posterior standard deviation:

sd = sqrt( (αβ) / ((α+β)^2 (α+β+1)) ) with α=a+k, β=b+(n-k), and a two-sided 90% z-value z≈1.645. Then:

CI90 ≈ [mean - z·sd, mean + z·sd] clipped to [0, 1].
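
The interval calculation could be sketched as follows. The function name ci90 is illustrative; the formula follows the definitions above.

```python
import math

def ci90(k, n, a=1.0, b=1.0, z=1.645):
    """Normal-approximation 90% interval for the Beta(a + k, b + n - k) posterior."""
    alpha = a + k
    beta = b + (n - k)
    total = alpha + beta
    mean = alpha / total
    # Standard deviation of a Beta(alpha, beta) distribution.
    sd = math.sqrt(alpha * beta / (total ** 2 * (total + 1)))
    # Clip to [0, 1] because the normal approximation can overshoot.
    return max(0.0, mean - z * sd), min(1.0, mean + z * sd)

lo, hi = ci90(4, 10)
```

For very small n or rates near 0 or 1, the clipped normal interval can differ noticeably from exact Beta quantiles, so it is best read as a rough guide.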

P(>baseline)

We report an approximate probability that a group’s true rate exceeds the cohort baseline. Using the same normal approximation:

z = (posterior_mean - p0) / sd and P(>baseline) ≈ Φ(z), where Φ is the standard normal CDF. Here z is the group’s z-score, not the fixed 1.645 used for the CI.

Interpretation: values near 50% indicate “not distinguishable from baseline”; values near 100% indicate the group is very likely above the cohort baseline after shrinkage; values near 0% indicate it is very likely below.
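
A minimal sketch of this probability, using math.erf for the standard normal CDF. The function name prob_above_baseline and the example inputs are illustrative.

```python
import math

def prob_above_baseline(k, n, p0, a=1.0, b=1.0):
    """Approximate P(p > p0) under a normal approximation to the Beta posterior."""
    alpha = a + k
    beta = b + (n - k)
    total = alpha + beta
    mean = alpha / total
    sd = math.sqrt(alpha * beta / (total ** 2 * (total + 1)))  # > 0 when a, b > 0
    z = (mean - p0) / sd
    # Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

high = prob_above_baseline(9, 10, p0=0.15)  # far above baseline
low = prob_above_baseline(0, 50, p0=0.15)   # far below baseline
```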

Lift

Lift is the ratio of the shrunk rate to the baseline:

lift = posterior_mean / p0 (shown as “×”). If p0 is zero (rare), lift is omitted.
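
As a sketch, lift and its zero-baseline case might be handled like this. The function name lift and the use of None for an omitted value are assumptions.

```python
def lift(k, n, p0, a=1.0, b=1.0):
    """Ratio of the shrunk rate to the baseline; None when the baseline is zero."""
    if not p0:
        return None  # lift is undefined for p0 == 0, so it is omitted
    return ((a + k) / (a + b + n)) / p0

value = lift(4, 10, p0=0.15)
label = f"{value:.2f}×"  # formatted the way the table shows it
```

Because the numerator is the shrunk rate rather than k/n, a small group with a high raw rate gets a more modest lift than its raw rate would suggest.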

Filters

“Min trials” and “Min bucket hits” suppress noisy rows by requiring n ≥ minTrials and k ≥ minHits before a group is eligible for ranking.
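
The eligibility check amounts to a simple filter, sketched below. The row keys "n" and "k" and the function name eligible are illustrative.

```python
def eligible(rows, min_trials, min_hits):
    """Keep only groups with n >= min_trials and k >= min_hits."""
    return [r for r in rows if r["n"] >= min_trials and r["k"] >= min_hits]

# Made-up rows for illustration only.
rows = [
    {"group": "Sponsor A", "n": 30, "k": 6},
    {"group": "Sponsor B", "n": 2, "k": 2},   # too few trials
    {"group": "Sponsor C", "n": 40, "k": 1},  # too few bucket hits
]
ranked = eligible(rows, min_trials=10, min_hits=3)
```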

Limitations

  • Registry stop reasons can be incomplete or inconsistently reported.
  • Some trials stop for non-scientific reasons (enrollment, funding, strategic decisions).
  • Labels are inferred from registry text and are not confirmed findings; verify them against primary sources.