Human-in-the-loop AI: a proven approach to streamlining PV case processing and review

With artificial intelligence (AI) reaching the mainstream, there is a great appetite for automating and transforming existing processes using these new technologies. But such transformations face headwinds as organizations discover that it is not always possible to leap straight to full automation.

By taking a human-in-the-loop approach to many of our AI solutions, we can be more aggressive in our deployment of AI while maintaining safety. Seamlessly incorporating both artificially intelligent systems and deep human expertise into a symbiotic platform allows us to leverage AI even where full automation is not yet feasible or sensitivity demands human accountability. It gives us a path from legacy approaches toward progressively greater automation without compromising safety or quality, and in many cases it improves those critical attributes of our offering. Here at Parexel, we have been using the human-in-the-loop paradigm to apply AI to pharmacovigilance for several years.

Defining human-in-the-loop AI 

Depending on whom you ask, human-in-the-loop AI can refer to any context in the training, development, or use of AI in which humans and AI systems or models interact. Within clinical development, we focus on how humans work collaboratively with AI in production systems. The alternative to this approach is total automation: an absence of human oversight and accountability that is often unacceptable and ineffective, particularly in a space carrying as much ethical responsibility and risk as clinical research.

Every machine learning (ML) model or artificially intelligent system that attempts to solve a problem of any complexity will produce uncertainty. Among the most sophisticated and rapidly growing approaches, most deep learning solutions are also somewhat opaque, making behavioral guarantees difficult. Consequently, to help AI systems succeed, we contextualize them in user-centered workflows that are augmented, but not wholly controlled, by AI. Whenever we deploy AI, the decision loop includes a human user who reviews AI output, treating it as a suggestion rather than fact. The ideal system empowers humans and AI to operate in a complementary way, augmenting each other’s strengths (a minimal sketch of such a review loop follows the list below). This can be accomplished through:

  • Model and workflow design. Choosing what to model is often more important than exactly how models are created. Developers should make modeling choices based on appropriate user goals, creating a downstream workflow that allows humans to best complete their task in an AI-guided way.
  • Building ideal models. Because certain types of models are better suited to certain tasks, it’s important to understand the end system and user goals for which the model will be used. Developers can achieve this by balancing model characteristics, understanding and controlling weaknesses and strengths, and combining different techniques.
  • AI-first application design. Building AI into the application from the outset, this approach guides the creation of workflows that will help users leverage AI wisely, such as by linking predictions to evidence.
  • User training. As with any tool, user skill is a major determinant of outcomes. Even when solutions and models are designed to be as effective as possible, ideal outcomes also require domain users to collaborate well with AI in the context of their specific workflows.
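To make this concrete, here is a minimal sketch of the review step at the heart of a human-in-the-loop workflow. All names are hypothetical, not Parexel’s actual system: the model’s output is stored as a suggestion with its confidence and supporting evidence, and the record the system keeps is the human reviewer’s decision, which may override the model.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    """An AI output treated as a suggestion, never as a final decision."""
    label: str         # e.g., "safety-relevant"
    confidence: float  # model score in [0, 1]
    evidence: str      # the text span the prediction is linked to

def review(suggestion: Suggestion, reviewer_decision: str) -> dict:
    """Record the human decision alongside the AI suggestion.

    The human reviewer, not the model, is accountable for the outcome;
    the suggestion and its evidence are kept for auditability.
    """
    return {
        "ai_label": suggestion.label,
        "ai_confidence": suggestion.confidence,
        "evidence": suggestion.evidence,
        "final_label": reviewer_decision,  # the human has the last word
        "overridden": reviewer_decision != suggestion.label,
    }

# Example: the model flags a citation; the reviewer confirms it.
s = Suggestion(label="safety-relevant", confidence=0.87,
               evidence="...patient developed rash after dose increase...")
record = review(s, reviewer_decision="safety-relevant")
```

Keeping the suggestion, its evidence, and the final human decision together in one record preserves the accountability and auditability that make this paradigm acceptable in a regulated domain.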

AI models for PV activities

AI is at its best when users access it through systems designed with AI in mind. It is usually unproductive to embed AI in existing traditional systems without rethinking the end-to-end user experience. In creating PV software, Parexel’s developers designed the interactive elements of the literature review platform so that the AI experience felt intuitive and easy to use. We also decided which types of information should be shown to users to best support decision making, which models should be built, and how those models should be presented so that their output is as understandable as possible.

Building

To support processing and review, we built a variety of models relevant to PV workflows and deployed them in a fit-for-purpose workflow application. To prioritize the most critical cases, our AI models assess the likelihood of a safety event in any citation, enabling the automatic ordering of full-text articles and initial AI translation of the most relevant ones. Because natural language processing (NLP) can identify which parts of an article are pertinent to a specific adverse event (AE) or product, the AI model can point human reviewers directly to the applicable information, dramatically reducing the labor required for these time-intensive reviews.
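As an illustration only (the scoring function and threshold below are hypothetical stand-ins, not Parexel’s actual pipeline), a triage step of this kind can be sketched as ranking citations by the model’s safety-event likelihood and ordering full text for those above a negotiated cutoff:

```python
def triage(citations, score_fn, order_threshold=0.5):
    """Rank citations by estimated safety-event likelihood.

    score_fn is a placeholder for the likelihood model: it maps a
    citation to a score in [0, 1]. Citations scoring at or above
    order_threshold are queued for full-text ordering first.
    """
    scored = [(score_fn(c), c) for c in citations]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most likely first
    to_order_full_text = [c for score, c in scored if score >= order_threshold]
    return scored, to_order_full_text
```

The human reviewers then work the ranked queue from the top, so attention lands on the most likely safety events first rather than on citations in arbitrary order.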

Evaluating

AI developers need reliable ways to evaluate their models. We use a combination of historical examples, qualitative experience, and standard quantitative machine learning metrics such as precision, recall, and F1 score to help ensure our AI systems are both accurate and comprehensive.
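For reference, these standard metrics can be computed directly from a gold test set. The sketch below assumes simple boolean labels (True = positive case):

```python
def precision_recall_f1(gold, predicted):
    """Standard ML metrics over parallel lists of boolean labels."""
    tp = sum(g and p for g, p in zip(gold, predicted))          # true positives
    fp = sum((not g) and p for g, p in zip(gold, predicted))    # false positives
    fn = sum(g and (not p) for g, p in zip(gold, predicted))    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0  # of flagged items, how many were real
    recall = tp / (tp + fn) if tp + fn else 0.0     # of real items, how many were flagged
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of the two
    return precision, recall, f1
```

Precision tracks how often the model is right when it flags something; recall tracks how much it misses. The two pull in opposite directions, which is why, as described below, acceptable trade-offs differ by model.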

Every model is evaluated according to its nature and context. Models that must guarantee behavior under specific circumstances are tested against those situations, and those guarantees are negotiated with the solution’s stakeholders. For example, a model used to capture precisely every mention of a brand or generic product name across multiple languages can be built using robust lexical approaches rather than machine learning, then tested rigorously before each release.
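A minimal sketch of such a lexical approach follows; the one-entry lexicon is illustrative only, standing in for a curated, per-project, multilingual product dictionary:

```python
import re

# Illustrative lexicon; a production list would be curated per project
# and per language.
PRODUCT_LEXICON = {
    "acetaminophen": ["acetaminophen", "paracetamol", "Tylenol"],
}

# Compile one whole-word, case-insensitive pattern per canonical product.
PATTERNS = {
    canonical: re.compile(
        r"\b(" + "|".join(map(re.escape, variants)) + r")\b",
        re.IGNORECASE,
    )
    for canonical, variants in PRODUCT_LEXICON.items()
}

def find_product_mentions(text):
    """Deterministic lexical matching: behavior can be guaranteed and
    regression-tested before each release, unlike a statistical model."""
    return [
        (canonical, match.group(0), match.start())
        for canonical, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]
```

Because the matcher is deterministic, its behavior under any agreed test case can be verified exactly, which is what makes the negotiated guarantees testable.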

Models with more uncertain behavior are evaluated against gold test sets to ensure they meet defined performance thresholds. Acceptability of a model’s performance is highly contextual. For example, a model for detecting adverse drug reactions may require a higher recall threshold to ensure that all potential reactions are identified, while a model for predicting MedDRA codes may prioritize precision to avoid false positives, meaning that the workflow for this model must include human intervention to address any missed codes.
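One way to operationalize these negotiated thresholds is a release gate that checks each model’s gold-set metrics against per-model floors. The model names and values below are illustrative, not Parexel’s actual criteria:

```python
# Hypothetical acceptance floors, negotiated with stakeholders per model;
# they encode whether recall or precision matters more for that workflow.
ACCEPTANCE = {
    "adr_detection": {"recall": 0.95, "precision": 0.70},  # must not miss reactions
    "meddra_coding": {"recall": 0.60, "precision": 0.90},  # must avoid false codes
}

def meets_release_criteria(model_name, metrics):
    """metrics: dict of metric name -> value measured on the gold test set."""
    required = ACCEPTANCE[model_name]
    return all(metrics[name] >= floor for name, floor in required.items())

# Example: this candidate clears the ADR-detection gate.
assert meets_release_criteria("adr_detection",
                              {"recall": 0.96, "precision": 0.75})
```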

Ultimately, the acceptability of a model’s performance is determined by its ability to meet the specific needs and requirements of stakeholders. Because there is no universal standard for accuracy, the choice of metrics and thresholds of acceptability are driven by consultation between AI experts and the subject matter experts (SME) with deep understanding of the project goals and downstream workflows.

Deploying and training

To deploy our solutions, Parexel develops situation-optimized tools for model serving and knowledge management. Model rollout includes configuration, in which we customize the tool to meet the specific needs of the project and users, as well as user acceptance testing and user training. Because some users may be encountering AI systems for the first time, training includes establishing clearly defined workflows and guidance on how to interpret and interact with the different kinds of model suggestions throughout the product experience.
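Such per-project configuration might look like the following sketch; every field name is hypothetical, intended only to show how project scope, triage behavior, and model versions can be pinned down at rollout:

```python
# Illustrative rollout configuration for one project; not Parexel's
# actual schema.
project_config = {
    "products": ["acetaminophen"],     # scope of monitored products
    "languages": ["en", "de", "ja"],   # sources to screen and translate
    "triage_threshold": 0.5,           # score above which full text is ordered
    "models": {                        # pinned model versions for this release
        "adr_detection": "v3.2",
        "meddra_coding": "v1.8",
    },
}
```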

The impact of AI-first solutions

Through the combination of an AI-first product experience and user training on new workflows, we operationalized an AI-powered PV literature case processing experience that supports dozens of customers. The solution improves substantially on both traditional methods and more modern systems by embedding AI throughout the workflow: scoring raw inbound documents for the likelihood of a safety issue, priority-ranking AEs for review, and highlighting and extracting PV-relevant words and phrases. Using these AI-based literature solutions, we reduced median time to completion by more than 50 percent and more than doubled throughput across nearly 400,000 annual cases, with results continuing to improve year over year. This translates into tens of thousands of person-hours saved each year compared with legacy PV processing methods.

While pharmacovigilance was one of the first domains in which we deployed AI solutions, Parexel is applying the paradigm of custom human-in-the-loop AI experiences across many other verticals. With this strategy, we can deploy artificial intelligence in a transformative way even where full automation is infeasible, using technology to amplify the exceptional human expertise the industry trusts Parexel to provide. We also give ourselves room to adopt increasingly sophisticated solutions over time without disrupting the reliability of our services.
