Predicting Rising Risk in Crohn’s Disease Before the Spending Hits

Large language models from OpenAI and Anthropic are accelerating how teams summarize literature, draft protocols, and brainstorm product ideas. They are truly helpful for many aspects of healthcare work. However, they are not, on their own, a complete solution for developing real-world clinical decision support.

The reason is simple: a chat model can produce plausible guidance, but it does not automatically become a validated, time-aware forecasting system that can process longitudinal claims, prevent data leakage, benchmark against existing methods, and provide calibrated risk signals month after month with auditability. Converting real-world data into actionable lead time requires disciplined data engineering, careful label design, prospective validation, and workflow integration. That is the gap this study aims to address.

Our new paper in BMJ Health & Care Informatics describes an explainable, claims-only machine learning framework that predicts which Crohn’s disease (CD) patients are likely to enter a near-term high-spending phase. This is Decode Health’s second peer-reviewed publication validating the platform, following our work in Communications Medicine last year. It applies the same core platform philosophy to a new clinical domain.

Why This Matters Now

Crohn’s disease is a highly variable condition. Patients can move quickly from stability to acute episodes, and the resulting costs are often driven by hospital stays and other unplanned care.

In our analysis of a ~267,000-member commercial population, only 994 members (about 0.4%) had Crohn’s disease, yet they accounted for roughly $108 million (about 2.7%) of the total $4 billion in expenses. On a per-person basis, average spending was approximately $3,127 per member per month for Crohn’s members, compared to about $665 for non-Crohn’s members, with notably higher month-to-month variability within the Crohn’s cohort.

That variability matters because many risk programs still depend on trailing spend rules. These methods effectively identify people who have spent a lot but are less successful at spotting those whose risk is rising.

From a life sciences perspective, this also matters because the economic burden and clinical decline are increasingly linked to adoption, reimbursement, and long-term commercial success. Biomarkers can explain biology and variability. But when a payer or health system asks “who should we focus on next month,” “who is likely to deteriorate,” or “where is the preventable utilization,” biology alone does not answer the operational question.

That’s where complementary predictive analytics can help translate biology into an effective deployment strategy.

What We Studied

Using de-identified commercial medical and pharmacy claims (2016–2018), we created a member-month dataset that included spending and utilization summaries, demographics, binary flags for diagnosis, procedure, and drug codes, and timing features such as days since the last inpatient, outpatient, or pharmacy encounter.
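To make the member-month idea concrete, here is a minimal sketch of how one row might be assembled. It is not the paper's pipeline: the function names, the row layout, and the placeholder codes DX_EXAMPLE and HCPCS_EXAMPLE are all hypothetical, and only two of the timing features are shown.

```python
from datetime import date
from typing import Dict, Iterable, List, Optional, Set


def days_since_last(encounter_dates: Iterable[date], as_of: date) -> Optional[int]:
    """Days from the most recent encounter on or before `as_of`.

    Encounters after `as_of` are ignored so the feature never looks ahead.
    Returns None when the member has no prior encounter of this type.
    """
    prior = [d for d in encounter_dates if d <= as_of]
    if not prior:
        return None
    return (as_of - max(prior)).days


def code_flags(codes_seen: Set[str], vocabulary: List[str]) -> Dict[str, int]:
    """One binary flag per code in a fixed vocabulary (1 = code observed)."""
    return {f"has_{code}": int(code in codes_seen) for code in vocabulary}


def member_month_row(
    as_of: date,
    paid_this_month: float,
    inpatient_dates: Iterable[date],
    pharmacy_dates: Iterable[date],
    codes_seen: Set[str],
    vocabulary: List[str],
) -> Dict[str, object]:
    """Assemble one hypothetical member-month feature row."""
    row: Dict[str, object] = {
        "paid_this_month": paid_this_month,
        "days_since_inpatient": days_since_last(inpatient_dates, as_of),
        "days_since_pharmacy": days_since_last(pharmacy_dates, as_of),
    }
    row.update(code_flags(codes_seen, vocabulary))
    return row


# Illustrative values only; the codes are placeholders, not real claim codes.
example_row = member_month_row(
    as_of=date(2017, 3, 31),
    paid_this_month=1250.0,
    inpatient_dates=[date(2017, 1, 10)],
    pharmacy_dates=[date(2017, 3, 5), date(2017, 4, 2)],
    codes_seen={"DX_EXAMPLE"},
    vocabulary=["DX_EXAMPLE", "HCPCS_EXAMPLE"],
)
```

The key design choice in this sketch is that every feature is computed only from events on or before the month-end date, which is what keeps the row usable for forecasting.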

For each month, the prediction target was the member’s total paid amount over the next four months. We trained and tuned models across seven algorithm families and evaluated them prospectively using a time-ordered split with a buffer period to minimize leakage.
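The label and split described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the four-month horizon comes from the text, but the buffer length (set equal to the horizon here), the choice of month 23 as the last training month, and the function names are all assumptions.

```python
from typing import List, Optional, Tuple

HORIZON = 4  # months of future spend summed into the label (from the text)


def forward_spend(monthly_paid: List[float], horizon: int = HORIZON) -> List[Optional[float]]:
    """For each month t, sum paid amounts over months t+1 .. t+horizon.

    Months without a full future window get None so they are never
    used as labeled rows.
    """
    labels: List[Optional[float]] = []
    for t in range(len(monthly_paid)):
        window = monthly_paid[t + 1 : t + 1 + horizon]
        labels.append(sum(window) if len(window) == horizon else None)
    return labels


def time_ordered_split(
    n_months: int,
    train_end: int,
    buffer: int = HORIZON,  # assumption: buffer equals the label horizon
    horizon: int = HORIZON,
) -> Tuple[List[int], List[int]]:
    """Split month indices 0..n_months-1 into train and test by time.

    Training uses months 0..train_end. The next `buffer` months are
    skipped so that training label windows end before the first test
    month. Testing stops at the last month with a full label window.
    """
    last_labeled = n_months - horizon - 1
    if train_end > last_labeled:
        raise ValueError("train_end leaves training months without full labels")
    train = list(range(0, train_end + 1))
    test = list(range(train_end + buffer + 1, last_labeled + 1))
    return train, test


# 2016-2018 gives 36 monthly indices; training through month 23 is an assumption.
train_months, test_months = time_ordered_split(n_months=36, train_end=23)
```

With these settings the last training label window ends at month 27 and the first test month is 28, so no training label overlaps the test period.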

To keep the results grounded in real-world operations, we compared machine learning to two common benchmarks: rolling four-month historical spend and prior-month spend. Each method highlighted the top 25% of members predicted to have the highest future spending.
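The two benchmarks and the top-quartile rule are simple enough to sketch directly. The function names and the tie-breaking rule (ties broken by member ID) are assumptions made for this illustration; the paper may handle ties and cohort sizes differently.

```python
import math
from typing import Dict, List, Set


def prior_month_score(paid_history: List[float]) -> float:
    """Benchmark 1: the most recent month's paid amount."""
    return paid_history[-1] if paid_history else 0.0


def rolling_spend_score(paid_history: List[float], window: int = 4) -> float:
    """Benchmark 2: total paid over the trailing `window` months."""
    return sum(paid_history[-window:])


def flag_top_fraction(scores: Dict[str, float], fraction: float = 0.25) -> Set[str]:
    """Flag the highest-scoring `fraction` of members (at least one).

    Ties are broken by member ID so the result is deterministic.
    """
    k = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=lambda member: (-scores[member], member))
    return set(ranked[:k])


# Toy spend histories (oldest month first); values are illustrative only.
histories = {
    "m1": [0.0, 0.0, 0.0, 900.0],       # sudden recent spike
    "m2": [500.0, 500.0, 500.0, 100.0], # high trailing spend, now falling
    "m3": [10.0, 10.0, 10.0, 10.0],
    "m4": [0.0, 0.0, 0.0, 0.0],
}
rolling_flags = flag_top_fraction({m: rolling_spend_score(h) for m, h in histories.items()})
prior_flags = flag_top_fraction({m: prior_month_score(h) for m, h in histories.items()})
```

In this toy cohort the two rules disagree: the rolling rule flags m2, whose spend is already falling, while the prior-month rule flags m1. A model's predicted future spend can be passed to the same flag_top_fraction function, which keeps the comparison like for like.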

Headline Results in Crohn’s Disease

Across the prospective evaluation period, the machine learning approach:

Captured about 80% of the dollars spent by the true top-quartile Crohn’s spenders over the next four months, compared to about 67% for the rolling four-month spend baseline and around 62% for the prior-month baseline.
Identified more “rising-risk” entrants: an average of 51 new high-cost entrants each month, nearly double the yield of the rolling four-month historical method.
Better anticipated inpatient-driven cost spikes, aligning more closely with the utilization patterns of the true rising-risk group. This is important because it suggests earlier detection of impending acute episodes, rather than simply relabeling people on expensive therapies.

In plain terms: the model didn’t just re-identify yesterday’s high-cost members. It also surfaced members whose risk was changing.
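For readers who want to see how metrics like these could be computed, here is one possible reading, sketched under assumptions. Dollar capture is taken to be the share of the true top quartile's future spend that belongs to members the method flagged, and a "new entrant" is taken to be a flagged member who lands in the true top quartile without already being in the current high-cost group. The paper's exact definitions may differ, and all names below are hypothetical.

```python
import math
from typing import Dict, Set


def top_fraction(values: Dict[str, float], fraction: float = 0.25) -> Set[str]:
    """Members in the top `fraction` by value (ties broken by member ID)."""
    k = max(1, math.ceil(len(values) * fraction))
    ranked = sorted(values, key=lambda member: (-values[member], member))
    return set(ranked[:k])


def dollar_capture(flagged: Set[str], future_spend: Dict[str, float], fraction: float = 0.25) -> float:
    """Share of true top-fraction future dollars attributable to flagged members."""
    true_top = top_fraction(future_spend, fraction)
    total = sum(future_spend[m] for m in true_top)
    captured = sum(future_spend[m] for m in true_top if m in flagged)
    return captured / total if total else 0.0


def new_entrants(
    flagged: Set[str],
    future_spend: Dict[str, float],
    currently_high: Set[str],
    fraction: float = 0.25,
) -> Set[str]:
    """Flagged members who become true high spenders but are not high-cost today."""
    true_top = top_fraction(future_spend, fraction)
    return (flagged & true_top) - currently_high


# Toy future spend over the next four months; values are illustrative only.
future = {"a": 1000.0, "b": 600.0, "c": 50.0, "d": 40.0,
          "e": 30.0, "f": 20.0, "g": 10.0, "h": 0.0}
capture = dollar_capture(flagged={"a", "c"}, future_spend=future)
entrants = new_entrants(flagged={"a", "b"}, future_spend=future, currently_high={"a"})
```

In the toy example the true top quartile is members a and b with 1,600 in future spend. Flagging a and c captures 1,000 of that, and member b counts as a new entrant because b was not in the current high-cost group.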

Why Claims-Only Signals Are Still Powerful

Claims data are often underestimated because they are not “deep clinical.” However, for operational forecasting, claims offer three key advantages: longitudinal continuity across care sites, standardized coding (diagnoses, procedures, pharmacy), and direct linkage to utilization and cost, which often influences whether an intervention is adopted at scale.

In this study, permutation-based feature impact revealed that biologic therapy signals were important, including HCPCS codes for monoclonal antibodies. However, medication alone did not predict who was likely to become high-cost. High-impact features also included infusion-related procedures, ileostomy status, high-complexity evaluation and management codes, and comorbidity indicators. The model’s strength came from combining therapy patterns with the broader clinical context.

This pattern can be conceptually described by a general-purpose language model, but it cannot be reliably generated without underlying longitudinal data, a time-aware evaluation design, and a performance benchmark against existing rules.

There is also the issue of explainability. When a model flags a patient as high risk, clinical and operational teams need to understand why. Permutation-based feature impact and code-level drivers provide reviewers with a transparent basis for action. That kind of auditability is not a feature of large language models, and in healthcare decision-making, it is essential.
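The general idea behind permutation-based feature impact can be shown in a few lines. This is a generic, hand-rolled sketch and not the study's implementation: the error metric (mean absolute error), the number of repeats, and the toy model are assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def mean_abs_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Row], float],
    X: List[Row],
    y: List[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean increase in error when each feature column is shuffled.

    Shuffling a column breaks its link to the outcome while leaving the
    other columns untouched. A large increase in error means the model
    relied on that feature; an increase near zero means it did not.
    """
    rng = random.Random(seed)
    baseline = mean_abs_error(y, [predict(row) for row in X])
    importances: List[float] = []
    for j in range(len(X[0])):
        increases: List[float] = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            error = mean_abs_error(y, [predict(row) for row in X_perm])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy model that uses only feature 0; feature 1 is ignored by construction.
def toy_predict(row: Row) -> float:
    return 1000.0 * row[0]


X_toy = [[float(i), float(i % 2)] for i in range(8)]
y_toy = [1000.0 * i for i in range(8)]
impact = permutation_importance(toy_predict, X_toy, y_toy)
```

Here the ignored feature gets an impact of exactly zero, while the feature the toy model depends on gets a positive impact. Applied to claims features, the same ranking is what lets a reviewer see which codes drove a given model's predictions.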

Two Complementary Programs at Decode Health

Decode Health operates in two linked domains.

Biomarker and molecular signature discovery. We use multi-omics and other data types to identify signatures that explain disease biology, response, and patient heterogeneity.

Clinical decision support and predictive risk modeling. We use real-world data, including claims and EHRs, to forecast risk, utilization, and disease progression, enabling teams to act earlier.

These are designed to reinforce each other.

Biomarkers can identify “who is biologically distinct” and “who is likely to respond.” Claims-based forecasting can determine “who is approaching a high-cost, high-acuity phase,” “when it will likely happen,” and “what utilization pattern is driving it.” Together, they create a more complete translation layer from discovery to real-world action.

That complement is crucial for medical device and diagnostics teams because success is increasingly defined not just by analytical performance, but by measurable downstream impact. A diagnostic may stratify risk or reveal subtypes, but adoption depends on its use in the right patient at the right time and on demonstrating that it changes the trajectory of care. A device may reduce exacerbations or enable monitoring, but value-based stakeholders want evidence of avoided utilization, not just a plausible mechanism.

Forecasting imminent rising risk offers a practical way to guide deployment, generate real-world evidence, and link clinical benefits to outcomes that health systems and payers monitor daily.

What This Platform Enables in the Real World

Longitudinal rising-risk stratification. We convert raw claims into a member-month view of risk and identify members likely to enter the highest-spending segment in the near term.

Explainable drivers at both the code and encounter levels. Instead of a black-box score, the system can reveal the patterns that contribute to risk, aiding clinical review and workflow alignment.

A deployment-ready monthly workflow. A practical implementation model looks like this: monthly claims-based risk scoring flags members at imminent risk; clinical and care teams review the drivers behind each prediction; and teams choose targeted actions such as proactive outreach, telemonitoring, therapy optimization, or scheduling elective care earlier to reduce emergency admissions.

This is also where complementary partnerships can become concrete. If a diagnostic or device is designed to prevent escalation, you need to identify where escalation is likely to happen and which members should be prioritized first. A rising-risk signal provides that operational starting point.

Why This Is Designed to Generalize

Because the framework relies exclusively on standard claim fields, it is designed to adapt quickly to other episodic, high-variance conditions in which sudden deterioration drives utilization.

The paper also emphasizes practical deployment factors that are crucial for an approach to succeed in operational settings: claims processing delays, refresh schedules, workflow integration, clinician acceptance, alert fatigue, and ongoing monitoring for drift and bias.

Closing Thought

The healthcare AI conversation is expanding fast. Generative AI will continue to be important for communication and productivity. But when the goal is early warning, risk stratification, and measurable impact, the work still depends on time-sensitive predictive models linked to real-world data and real operational choices.

This study is one example of what that looks like in practice.