Transforming evidence generation: How predictive AI can optimize clinical development

By Jackie Vanderpuye-Orgle, Ph.D., Vice President, and Global Head of Advanced Analytics, RWE & HEOR

Mike D’Ambrosio, Senior Vice President and Global Head, Real World Research

Throughout the product development and post-approval lifecycle, manufacturers are inundated with vast amounts of real-world data (RWD), with a promise of improved timelines and more effective ways to address the needs of regulatory and commercial stakeholders. Indeed, RWD can be leveraged to accelerate clinical development and support a wide variety of commercial objectives — but only if researchers have meaningful ways to harness the potential power of massive and often unstructured datasets.

In an increasingly complex and competitive environment, artificial intelligence (AI) solutions can streamline and enhance the use of RWD to generate real-world evidence (RWE), helping to fill evidence gaps and answer research questions faster, which ultimately provides patients access to much-needed therapies.

Defining AI

AI is defined as the simulation of human intelligence processes by computer systems that are grounded in available data. While generative AI (a form of AI that can create original text, images, and other content) has gained prominence in the past couple of years, AI encompasses several areas, with some overlap among subsets.

Through machine learning (ML), computers learn from data and continually improve their performance as they identify patterns and make predictions. In deep learning, a subset of machine learning, artificial neural networks solve complex problems and discover intricate patterns in large datasets. Computer vision, which enhances a machine’s ability to interpret and understand images or videos, can tap into machine learning or deep learning. The same is true of natural language processing (NLP), which allows computers to interpret text and perform speech recognition, sentiment analysis, and text summarization. Large language models (LLMs) are trained on massive datasets of text and code to learn patterns in human language and make predictions.

Using AI to streamline RWE generation

RWD are collected along the patient journey, from a variety of sources other than traditional clinical trials. Examples include electronic health records (EHRs), health insurance claims, and safety databases. These rich data sources can be leveraged to obtain patient insights to support product development and market access. Specifically, RWE derived from the analysis of RWD can be deployed across the product lifecycle to articulate the differentiated value of therapeutic assets and improve decision-making. Among other use cases, sponsors and their partners can deploy RWE to:

Map disease landscapes and identify unmet needs
Inform biomarker identification
Model patient characteristics and current treatments
Assess site feasibility and streamline study recruitment
Build external control arms
Enhance safety evaluations
Demonstrate product value

AI can be applied to evidence generation and analysis throughout the product lifecycle

The volume and nature of RWD, as well as the complexity of its analysis, make RWE generation a fruitful area for AI solutions. AI broadens the types of research questions that can be answered with RWD robustly and with improved precision. For example, AI can help researchers:

Optimize clinical study design
Conduct evidence-based site identification
Identify meaningful patient subgroups
Derive patient phenotyping algorithms
Predict optimal treatments and clinical outcomes
Generate synthetic data for external control arms

Addressing regulatory concerns

Given the rapidly advancing AI landscape, there are new and growing opportunities for collaboration among AI experts, health care professionals, clinical researchers, and other industry stakeholders — cooperative efforts that can lead to improved outcomes for patients. But the emerging nature of AI means that regulatory responses are also evolving.

In considering AI, regulators from both the EMA and FDA have expressed concerns about opacity in some algorithms, the potential for bias and error, and risks to patients. If sponsors plan to use AI for any part of development subject to regulatory review, we recommend seeking regulatory advice early in the design process. Early engagement will likely make the subsequent AI journey smoother for all stakeholders.

Regulators are also signaling that AI-system developers will need to provide step-by-step documentation of how such systems are built, deployed, and monitored while in use. We recommend that sponsors and their software partners design AI systems with regulatory compliance in mind. But we also recognize that regulatory frameworks are relatively immature and that the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) has yet to issue formal guidance on the use of AI. So, while we urge sponsors to be mindful of regulatory signals, we also believe that limited regulatory guidance should not become a rate-limiting step in the use of AI technologies, given the opportunity these tools present to better capture, analyze, and understand the dynamic nature of real-world health care data.

Increasing transparency

Modern ML and LLM systems are trained on massive amounts of data from public and proprietary sources. Applying algorithms with large numbers of trainable parameters involves model architectures that are not always transparent.

While explainability is difficult to fully establish, developers are finding ways to address the AI black box. For instance, Parexel produced promising results using machine-learned Bayesian networks, a form of predictive modeling that uses networks of data to reduce uncertainty and increase predictive power. The output of the Bayesian AI model (which is a directed acyclic graph) visually illustrates the interdependencies between the variables of interest. This allows stakeholders to identify which variables are prognostic or informative for the efficacy and safety outcomes in the clinical study. And this is only one approach in a suite of methods to make AI more transparent.
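
To make the idea of reading a directed acyclic graph concrete, here is a minimal sketch in Python. It is not the Bayesian network model described above: the edges and variable names (such as `biomarker_level` and `treatment_response`) are invented purely for illustration, and a real network would be learned from study data. Under those assumptions, the sketch shows how direct parents and upstream ancestors of an outcome can be listed once a graph is available.

```python
from collections import deque

# Hypothetical edges (parent -> child) standing in for a learned network.
# All variable names are invented for illustration only.
EDGES = [
    ("age", "baseline_severity"),
    ("age", "comorbidity_index"),
    ("comorbidity_index", "adverse_event"),
    ("baseline_severity", "treatment_response"),
    ("biomarker_level", "treatment_response"),
    ("treatment_arm", "treatment_response"),
    ("treatment_arm", "adverse_event"),
    ("site_region", "enrollment_speed"),
]


def build_parent_map(edges):
    """Map every node to the set of its direct parents."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(parent, set())
        parents.setdefault(child, set()).add(parent)
    return parents


def ancestors(parents, node):
    """Return all nodes with a directed path into `node`."""
    seen = set()
    queue = deque(parents.get(node, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parents.get(current, ()))
    return seen


def is_acyclic(parents):
    """A graph is acyclic when no node is its own ancestor."""
    return all(node not in ancestors(parents, node) for node in parents)


parent_map = build_parent_map(EDGES)
print("acyclic:", is_acyclic(parent_map))
print("direct parents of treatment_response:",
      sorted(parent_map["treatment_response"]))
print("all upstream variables:",
      sorted(ancestors(parent_map, "treatment_response")))
```

In this toy graph, variables with no path into an outcome (here `site_region` for `treatment_response`) drop out of the upstream set, which is the kind of structural reading a graph-based model makes possible.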

Controlling for confounding and reducing bias

One critique of RWE is the risk of confounding or bias due to unobserved patient data. Various statistical approaches have been developed to help address this. However, these may be challenging to implement, especially in complex scenarios with dynamic objectives, such as biases introduced over time. Recent advancements in methods have focused on addressing underlying causal inference assumptions. For instance, Targeted Maximum Likelihood Estimation (TMLE) allows researchers to use an ensemble machine learning model with minimal assumptions about the data distribution. It applies a targeting step that lets users optimize the bias-variance tradeoff for the target of interest, meaning that they can minimize the risk of error even as model parameters and sample sets increase. TMLE allows researchers to focus on estimating the improvement in outcomes in the treated population, which is particularly important in high-dimensional settings where traditional methods struggle with big data.

For example, without properly addressing complex confounding, vaccine studies using RWD often yield highly biased (and even paradoxical) results. In a recent study of pneumococcal vaccine effectiveness using de-identified EHRs from approximately 300,000 patients, we demonstrated the effectiveness of the causal roadmap with TMLE. The key challenge was accounting for the context of care and the data generation process that influences the “types of patients likely to receive” a pneumonia vaccine. Additionally, it was necessary to address the dynamic context and “intercurrent events.” The results successfully recovered the true protective effect, aligning with estimates from randomized controlled trials, the gold standard for causal attribution. Moreover, these methods extend beyond simple point effect estimation, allowing for the identification of optimal strategies and subgroup analyses.
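
The targeting step mentioned above can be illustrated with a toy sketch in Python. This is not the analysis from the vaccine study, and it makes several simplifying assumptions that are ours, not the authors': the data are simulated with a single binary confounder `w`, the ensemble machine learning models are replaced by simple group means, and the initial outcome model is deliberately misspecified (it ignores `w`) so that the correction from the targeting step is visible. All function names are hypothetical.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def simulate(n, seed=0):
    """Simulated rows (w, a, y): confounder, treatment, binary outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < (0.8 if w else 0.2))  # w drives treatment
        y = int(rng.random() < expit(-1.0 + 1.0 * a + 1.5 * w))
        rows.append((w, a, y))
    return rows


def true_ate():
    """Average treatment effect implied by the simulation design."""
    return sum(0.5 * (expit(1.5 * w) - expit(-1.0 + 1.5 * w)) for w in (0, 1))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_ate(rows):
    """Unadjusted difference in outcome rates (confounded by w)."""
    treated = mean(y for _, a, y in rows if a == 1)
    control = mean(y for _, a, y in rows if a == 0)
    return treated - control


def tmle_ate(rows):
    # Step 1: initial outcome model Q0(a, w); misspecified on purpose (no w).
    q0 = {a: mean(y for _, a_i, y in rows if a_i == a) for a in (0, 1)}
    # Step 2: propensity score g(w) = P(A = 1 | W = w) from group proportions.
    g = {w: mean(a for w_i, a, _ in rows if w_i == w) for w in (0, 1)}

    def clever(a, w):
        """Clever covariate H(a, w) used to fluctuate the initial model."""
        return a / g[w] - (1 - a) / (1.0 - g[w])

    # Step 3: targeting. Fit eps in logit Q*(a, w) = logit Q0(a) + eps * H
    # by Newton's method on the logistic score equation.
    eps = 0.0
    for _ in range(50):
        score, info = 0.0, 0.0
        for w, a, y in rows:
            h = clever(a, w)
            p = expit(logit(q0[a]) + eps * h)
            score += h * (y - p)
            info += h * h * p * (1.0 - p)
        step = score / info
        eps += step
        if abs(step) < 1e-10:
            break

    def q_star(a, w):
        return expit(logit(q0[a]) + eps * clever(a, w))

    # Step 4: plug-in estimate averaged over the observed confounder values.
    return mean(q_star(1, w) - q_star(0, w) for w, _, _ in rows)


rows = simulate(20000, seed=1)
print("true effect      :", round(true_ate(), 3))
print("naive difference :", round(naive_ate(rows), 3))
print("TMLE estimate    :", round(tmle_ate(rows), 3))
```

In this simulation the naive comparison overstates the effect because treated patients differ systematically on `w`, while the targeted estimate lands close to the value built into the simulation. In practice, the group means in steps 1 and 2 would be replaced by flexible learners, and dedicated statistical packages would handle inference.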

Optimizing clinical development for the good of all stakeholders

With the potential to transform evidence generation, AI can support optimized study design, execution, and data analysis across the clinical development and market access spectrum, enabling real-time scenario testing and nimble decision making to arrive at patient insights faster.

When applied to RWD, AI allows researchers to more confidently identify causal relationships. It enables us to do more with the data by mining a richer combination of data from different sources to capture high numbers of interacting variables, something that is not always possible with traditional statistical modeling.

In short, RWE helps researchers better understand patient experiences and outcomes — and AI can help us generate that evidence faster. When used as part of a well-considered study design, AI empowers us to make the most of RWD. That benefits all stakeholders, including sponsors working to streamline clinical development and the patients waiting for critical in-development treatments.

Please get in touch. We are always available for a conversation to discuss applications of AI to enhance your RWE generation.
