IVDR for AI Diagnostics: Evidence, Performance, and Risk
Artificial intelligence systems designed for diagnostic purposes represent a paradigm shift in medical technology, moving from static tools to dynamic, learning algorithms. For manufacturers and deployers operating within the European Union, the regulatory framework governing these systems is the In Vitro Diagnostic Medical Device Regulation (IVDR) (EU) 2017/746. Unlike its predecessor, the IVD Directive, the IVDR introduces a risk-based classification system, stringent requirements for clinical evidence, and specific obligations for software, including Artificial Intelligence (AI) and Machine Learning (ML). This article provides a detailed analysis of how IVDR applies to AI diagnostic systems, focusing on the practical application of performance evaluation, evidence expectations, and the nuances of risk classification.
The Regulatory Landscape for AI as an In Vitro Diagnostic
The transition from the In Vitro Diagnostic Medical Device Directive (IVDD) to the IVDR fundamentally alters the regulatory pathway for AI diagnostics. Under the IVDD, most software-based diagnostics fell into the "general IVD" category and could be self-certified by the manufacturer, with Notified Body involvement reserved for the Annex II List A and List B devices and for devices intended for self-testing. The IVDR, applicable since May 26, 2022, mandates that the vast majority of in vitro diagnostic devices, including AI/ML software, undergo a conformity assessment by a Notified Body.
Under the IVDR, software that is independent of any other device is classified in its own right according to its intended purpose (implementing rule 1.4 of Annex VIII). Software intended to provide information derived from the in vitro examination of specimens for diagnostic or monitoring purposes is itself an IVD, and this captures the core functionality of most AI diagnostic tools. Consequently, these systems are subject to the general safety and performance requirements outlined in Annex I, specifically the requirements regarding software development, verification, and validation.
Scope and Applicability
It is crucial to determine whether an AI system qualifies as an In Vitro Diagnostic Medical Device (IVD). The system must be intended by the manufacturer to provide information derived from the examination of specimens derived from the human body. This includes software that processes data generated by IVDs (e.g., interpreting blood test results) and software that acts as a standalone IVD (e.g., analyzing digitised histopathology or cytology images to detect pathology). If the AI provides information on a physiological or pathological process by examining such specimens, it falls under the IVDR; if it analyses signals or images taken directly from the body, it is instead a medical device under the MDR. Software intended for general wellness purposes may fall outside the scope of both regulations, but the boundary is defined strictly by the manufacturer's intended purpose.
Risk-Based Classification in Practice
The IVDR replaces the previous list-based system with a risk-based rule set (Classes A to D) found in Annex VIII. For AI diagnostics, Rule 3 is the primary determinant for most high-stakes applications, but it must be read in conjunction with Rule 4 if the device is intended for self-testing or near-patient testing, with Rules 1 and 2 for transfusion and transmissible-agent applications, and with Rule 6, under which devices not covered by any other rule default to Class B.
The Annex VIII Rules and Software
Because standalone software is classified in its own right, the Annex VIII rules are applied to the information the software provides and the decisions that information enables:
- Class C: Under Rule 3, software that provides information used for decisions on the diagnosis, staging, or management of serious disease, where an erroneous result could lead to death or a long-term detrimental health outcome, falls into Class C. This is the most common classification for sophisticated AI diagnostics (e.g., AI analyzing digitised histopathology slides for cancer diagnosis or staging).
- Class D: Under Rules 1 and 2, software falls into Class D when it is used to detect the presence of, or exposure to, a transmissible agent in blood, blood components, cells, tissues, or organs intended for transfusion or transplantation, to detect a transmissible agent that causes a life-threatening disease with a high risk of propagation, or to determine the high-risk blood-group markers listed in Rule 2. This is a high bar, typically reserved for AI used in blood screening, blood grouping, or high-consequence infectious disease testing.
The “Self-Testing” Nuance
If an AI diagnostic is intended for self-testing (layperson use), Rule 4 applies and the classification escalates: self-testing devices are Class C unless they fall within a short list of Class B exceptions such as pregnancy or glucose testing. For example, an AI app that interprets a smartphone photo of a home-use lateral-flow test and reports an infection result to a layperson is a Class C self-testing device. If the same algorithm were used by laboratory professionals, the class would follow the underlying intended purpose, but the regulatory scrutiny of instructions for use, usability, and result communication is significantly higher for self-testing. A simplified classification sketch follows.
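To make the decision logic concrete, the following is a minimal, non-authoritative sketch of the triage described above. The attribute names and simplifications are illustrative assumptions, not regulatory terminology; real classification always requires working through the full Annex VIII rule text and the relevant guidance.

```python
from dataclasses import dataclass

# Illustrative only: a coarse triage of the Annex VIII logic discussed above.
# Attribute names are hypothetical shorthand, not terms from the regulation.

@dataclass
class IntendedPurpose:
    transfusion_or_transplant_screening: bool = False   # Rule 1 / Rule 2 high-risk markers
    high_propagation_transmissible_agent: bool = False   # Rule 1
    cancer_or_life_threatening_management: bool = False  # Rule 3 examples
    self_testing: bool = False                            # Rule 4
    rule4_class_b_exception: bool = False                 # e.g. pregnancy or glucose self-tests

def triage_class(p: IntendedPurpose) -> str:
    """Return a candidate class; where several rules apply, the highest class prevails."""
    if p.transfusion_or_transplant_screening or p.high_propagation_transmissible_agent:
        return "D"
    if p.cancer_or_life_threatening_management:
        return "C"
    if p.self_testing:
        return "B" if p.rule4_class_b_exception else "C"
    return "B"  # Rule 6 default when no other rule applies

# Example: AI interpreting a home-use infection self-test -> Class C
print(triage_class(IntendedPurpose(self_testing=True)))
```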
Practical Classification Challenges
Manufacturers often struggle with the intended purpose. A common pitfall is describing an AI as a "clinical decision support system" (CDSS) in the hope of avoiding classification as an IVD. However, if the CDSS provides a specific diagnostic output derived from specimen data (e.g., "Probability of malignancy: 95%"), it is an IVD. If it merely aggregates and displays data for the clinician to interpret (e.g., "Patient age: 45, Marker X: High"), it may be general software, but the line is thin. The IVDR requires the intended purpose to be specified in the technical documentation, the label, and the instructions for use; ambiguity here invites regulatory scrutiny.
Performance Evaluation and Clinical Evidence
The core of IVDR compliance for AI lies in the demonstration of performance. The regulation mandates a performance evaluation plan, a performance evaluation report, and ongoing updates to both. This is analogous to the Clinical Evaluation Report (CER) required for medical devices under the MDR, but tailored to diagnostic performance.
The Triad of Performance Evaluation
Performance evaluation under IVDR rests on three pillars:
- Analytical Performance: The ability of the device to correctly detect or measure a specific analyte. For AI, this means the algorithm’s accuracy in identifying patterns in data.
- Clinical Performance: The ability of the device to yield results that are correlated with the clinical condition in the intended target population and in the hands of the intended user, typically expressed through diagnostic sensitivity, specificity, and predictive values. Whether using the AI actually improves patient management is assessed separately as part of the benefit-risk determination.
- Scientific Validity: The association between the analyte (or, for AI, the measured biomarker pattern) and the clinical condition or physiological state. This underlying biological or clinical rationale must hold independently of the algorithm's predictive performance.
Expectations for AI Evidence
Regulators and Notified Bodies expect a rigorous approach to evidence generation for AI, given the “black box” nature and potential for algorithmic drift.
Retrospective vs. Prospective Studies
While retrospective data analysis is often used for initial validation, the IVDR emphasizes that evidence should be generated in a manner representative of the intended use. For AI diagnostics, this increasingly means prospective clinical performance studies. Retrospective data is useful for training and internal validation, but Notified Bodies increasingly expect prospective data, or at least an independently collected external validation set, to demonstrate real-world performance and safety.
Handling Bias and Data Diversity
AI systems are susceptible to bias introduced by their training data. The IVDR requires the clinical evidence to demonstrate the device's performance across the intended target population. Manufacturers must provide detailed information on the training and validation datasets (age, sex, ethnicity, comorbidities, and specimen types), and the validation data must be independent of the data used for training. If the AI is intended for a diverse European population, the data must reflect that diversity; a lack of diversity, or undocumented subgroup performance, is a significant gap in technical documentation.
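As an illustration of how subgroup performance might be documented, the following sketch computes sensitivity and specificity per demographic stratum on a hypothetical validation export. The column names and data are invented; a real analysis would use the manufacturer's own validation records and predefined strata.

```python
import pandas as pd

# Hypothetical validation-set export: one row per case with ground truth,
# model prediction, and the demographic attributes discussed above.
df = pd.DataFrame({
    "sex":        ["F", "F", "M", "M", "F", "M", "F", "M"],
    "age_band":   ["<50", ">=50", "<50", ">=50", ">=50", "<50", "<50", ">=50"],
    "truth":      [1, 0, 1, 1, 0, 0, 1, 0],
    "prediction": [1, 0, 0, 1, 0, 1, 1, 0],
})

def subgroup_performance(data: pd.DataFrame, by: str) -> pd.DataFrame:
    """Sensitivity/specificity per subgroup, with case counts to expose sparse strata."""
    rows = []
    for value, g in data.groupby(by):
        tp = ((g.truth == 1) & (g.prediction == 1)).sum()
        fn = ((g.truth == 1) & (g.prediction == 0)).sum()
        tn = ((g.truth == 0) & (g.prediction == 0)).sum()
        fp = ((g.truth == 0) & (g.prediction == 1)).sum()
        rows.append({
            by: value,
            "n": len(g),
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        })
    return pd.DataFrame(rows)

print(subgroup_performance(df, "sex"))
print(subgroup_performance(df, "age_band"))
```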
Algorithm Change Protocols (ACP)
AI systems often learn and adapt. The IVDR requires a post-market surveillance (PMS) plan and post-market performance follow-up (PMPF). For AI, manufacturers increasingly formalise this through an algorithm change protocol (ACP), a concept popularised by FDA thinking on adaptive machine learning, which defines in advance when a change to the algorithm constitutes a substantial modification requiring a new conformity assessment. If the AI is designed to learn from new data continuously, the manufacturer must either deploy a locked algorithm version or operate a validated, pre-specified retraining protocol (which is currently very difficult to get approved).
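A minimal sketch of how such a predefined change envelope could be encoded and checked before releasing a retrained, locked version is shown below. The thresholds and metric names are hypothetical assumptions, not values prescribed by the IVDR or any guidance.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChangeProtocol:
    """Hypothetical acceptance bounds fixed in advance in the change protocol."""
    min_sensitivity: float = 0.93
    min_specificity: float = 0.90
    max_sensitivity_drop: float = 0.01   # vs. the currently certified version
    max_specificity_drop: float = 0.01

def within_protocol(protocol: ChangeProtocol,
                    certified: dict, candidate: dict) -> bool:
    """True if a retrained candidate stays inside the pre-approved envelope.

    Anything outside the envelope is treated as a substantial modification
    and routed back to the Notified Body before release.
    """
    return (
        candidate["sensitivity"] >= protocol.min_sensitivity
        and candidate["specificity"] >= protocol.min_specificity
        and certified["sensitivity"] - candidate["sensitivity"] <= protocol.max_sensitivity_drop
        and certified["specificity"] - candidate["specificity"] <= protocol.max_specificity_drop
    )

certified = {"sensitivity": 0.95, "specificity": 0.92}
candidate = {"sensitivity": 0.96, "specificity": 0.91}
print(within_protocol(ChangeProtocol(), certified, candidate))  # True -> release as a new locked version
```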
Technical Documentation and the “State of the Art”
The technical documentation is the evidence locker. For AI, it must contain specific details regarding the software lifecycle and risk management.
Software Lifecycle and Verification
Manufacturers must comply with the general safety and performance requirements of Annex I; Chapter II, Section 16 requires software to be developed and manufactured in accordance with the state of the art, taking into account the principles of the development life cycle and risk management, including information security, verification, and validation. While the IVDR does not mandate specific standards, EN ISO 13485:2016 (quality management systems) and EN ISO 14971:2019 (risk management) are essential, and EN 62304:2006/A1:2015 (IEC 62304, medical device software life cycle processes) is the de facto benchmark for software development.
For AI, this means documenting:
- The architecture of the neural network.
- The training methodology (e.g., supervised vs. unsupervised learning).
- The optimization and hyperparameter tuning processes.
- The validation metrics used (sensitivity, specificity, ROC curves, F1 scores); a minimal computation sketch follows this list.
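A minimal sketch of how these headline metrics might be computed on a held-out validation set is shown below, using scikit-learn; the data and the 0.5 decision threshold are invented for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

# Hypothetical held-out validation set: ground truth, predicted scores, binary calls.
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_score = np.array([0.91, 0.12, 0.78, 0.66, 0.35, 0.08, 0.44, 0.42, 0.87, 0.55])
y_pred  = (y_score >= 0.5).astype(int)  # decision threshold documented in the technical file

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

print(f"Sensitivity: {sensitivity:.3f}")
print(f"Specificity: {specificity:.3f}")
print(f"F1 score:    {f1_score(y_true, y_pred):.3f}")
print(f"ROC AUC:     {roc_auc_score(y_true, y_score):.3f}")
```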
Risk Management (ISO 14971)
Risk management for AI diagnostics differs from that for hardware: the risks are informational rather than physical. The primary harms are false negatives (a missed or delayed diagnosis) and false positives (unnecessary anxiety, investigation, and treatment).
The Risk Management File must analyze:
- Estimation of Probability: How likely is the AI to produce a harmful error? This is derived from validation performance metrics, together with the expected prevalence and testing volume (see the worked example after this list).
- Severity of Harm: Defined by the clinical context (e.g., missing a sepsis diagnosis is severe; missing a mild allergy is less severe).
- Production and Post-Production Controls: How will the manufacturer monitor the AI once it is deployed? This includes monitoring for “concept drift” where the real-world data diverges from the training data, degrading performance.
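As a worked illustration of the probability term, the sketch below translates a validation sensitivity figure into an expected rate of missed diagnoses, assuming a hypothetical prevalence and annual test volume; all numbers are invented.

```python
# Hypothetical worked example: translating validation metrics into the expected
# rate of the hazardous situation "missed diagnosis" for the risk management file.
sensitivity = 0.95           # from the clinical performance study
prevalence = 0.02            # assumed disease prevalence in the intended population
tests_per_year = 100_000     # anticipated annual test volume

p_false_negative = prevalence * (1.0 - sensitivity)       # per test performed
expected_missed_cases = p_false_negative * tests_per_year  # per year, before mitigations

print(f"P(false negative per test): {p_false_negative:.4%}")
print(f"Expected missed cases/year: {expected_missed_cases:.0f}")
```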
Post-Market Surveillance (PMS) and Vigilance
Under IVDR, PMS is not a passive activity; it is a continuous data collection process. For AI, the PMS plan is critical because AI performance can change over time.
Periodic Safety Update Reports (PSUR)
Manufacturers of Class C and D devices must prepare a PSUR and update it at least annually. For Class D devices the PSUR is submitted to the Notified Body (via EUDAMED once the relevant module is fully functional), while for Class C devices it must be made available to the Notified Body and, on request, to competent authorities. For AI diagnostics, the PSUR must include data on:
- Real-world performance metrics (e.g., sensitivity and specificity observed in the field; an aggregation sketch follows this list).
- User feedback regarding diagnostic accuracy.
- Any updates or retraining of the algorithm.
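One way to present real-world performance in the PSUR is to aggregate confirmed field outcomes into sensitivity and specificity with confidence intervals, as in the sketch below; the counts are hypothetical and the Wilson score interval is one reasonable choice among several.

```python
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a proportion (e.g., field sensitivity)."""
    if n == 0:
        return (float("nan"), float("nan"))
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (centre - half, centre + half)

# Hypothetical confirmed field outcomes collected over the PSUR period
tp, fn = 188, 12    # confirmed positives: detected vs. missed
tn, fp = 960, 40    # confirmed negatives: correctly ruled out vs. false alarms

sens, sens_ci = tp / (tp + fn), wilson_ci(tp, tp + fn)
spec, spec_ci = tn / (tn + fp), wilson_ci(tn, tn + fp)
print(f"Field sensitivity: {sens:.3f} (95% CI {sens_ci[0]:.3f}-{sens_ci[1]:.3f})")
print(f"Field specificity: {spec:.3f} (95% CI {spec_ci[0]:.3f}-{spec_ci[1]:.3f})")
```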
Vigilance and Field Safety Corrective Actions
If an AI system consistently produces erroneous results due to a software bug or data drift, the resulting serious incidents must be reported and will typically trigger a Field Safety Corrective Action (FSCA), with notification of the competent authorities and of users. The challenge for AI is identifying these failures early: manufacturers are expected to implement automated monitoring that flags anomalies in the AI's output distribution.
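One plausible implementation of such monitoring is to compare the distribution of recent field output scores against the reference distribution recorded at validation, for example with a two-sample Kolmogorov-Smirnov test as sketched below; the alert threshold and the simulated data are assumptions, not prescribed values.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Reference: output scores recorded on the validation set at release.
reference_scores = rng.beta(2, 5, size=5_000)

# Recent field scores: simulated here with a shifted distribution to mimic drift.
field_scores = rng.beta(3, 4, size=2_000)

stat, p_value = ks_2samp(reference_scores, field_scores)
ALERT_P = 0.01  # hypothetical alert threshold defined in the PMS plan

if p_value < ALERT_P:
    print(f"Drift alert: KS statistic {stat:.3f}, p={p_value:.1e} -> trigger vigilance review")
else:
    print("Output distribution consistent with reference")
```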
Transitional Provisions and National Implementation
The implementation of the IVDR has been staggered. Although the regulation has applied since May 26, 2022, the amended transitional provisions allow legacy devices already CE-marked under the IVDD to remain on the market until the end of 2027, 2028, or 2029, depending on their class, provided they undergo no significant change in design or intended purpose, something that retraining an algorithm may well constitute. For new AI software there is no transition; it must comply with the IVDR from the outset.
National Differences and EUDAMED
While the IVDR is a Regulation (directly applicable in all Member States), national implementation varies, particularly regarding the designation of Notified Bodies and the interpretation of “state of the art” for AI.
For example, the German competent authority (BfArM) and the French ANSM have been particularly active in issuing guidance on software as a medical device. Manufacturers may find that Notified Bodies in different countries have slightly different focuses during audits. Some may focus heavily on the mathematical validation of the AI, while others may focus on the clinical utility and the quality management system.
The EUDAMED database is central to the new framework: manufacturers must register themselves and their devices. For software, the UDI (Unique Device Identification) rules require a new UDI-DI whenever a modification changes the original performance, the safety or intended purpose, or the interpretation of data, while minor revisions require only a new UDI-PI. If a specific version of an AI algorithm is found to be faulty, this versioned identification allows precise tracking of the affected sites and users.
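The toy registry below illustrates the idea of version-level traceability; every identifier and site name is invented, and a real implementation would draw on the manufacturer's own UDI assignments and deployment records.

```python
# Toy illustration of version-to-UDI traceability; all identifiers are invented placeholders.
releases = {
    "1.2.0": {"udi_di": "0412345678901A", "udi_pi": "1.2.0"},
    "1.2.1": {"udi_di": "0412345678901A", "udi_pi": "1.2.1"},  # minor revision: new UDI-PI only
    "2.0.0": {"udi_di": "0412345678902B", "udi_pi": "2.0.0"},  # performance-changing update: new UDI-DI
}

deployments = [
    {"site": "Lab Alpha",    "version": "1.2.1"},
    {"site": "Lab Beta",     "version": "2.0.0"},
    {"site": "Clinic Gamma", "version": "1.2.1"},
]

def affected_sites(faulty_version: str) -> list[str]:
    """Locate every deployment running the recalled software version."""
    return [d["site"] for d in deployments if d["version"] == faulty_version]

print(releases["1.2.1"]["udi_di"], affected_sites("1.2.1"))
```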
Practical Steps for Manufacturers
To successfully navigate the IVDR for AI diagnostics, a multidisciplinary approach is required. The regulatory and legal teams must define the intended purpose precisely to determine the correct classification. The engineering team must adhere to EN 62304 (IEC 62304) for software development. The clinical team must design performance studies that satisfy the requirements for clinical evidence.
Specifically, manufacturers should:
- Conduct a Gap Analysis: Compare current documentation against IVDR requirements, focusing specifically on classification under Annex VIII and the resulting need for Notified Body involvement.
- Update the QMS: Ensure the Quality Management System covers software lifecycle, risk management, and post-market surveillance specific to AI.
- Prepare Technical Documentation: Compile detailed documentation on the algorithm, training data, validation results, and intended clinical use.
- Engage a Notified Body Early: Due to capacity constraints, engaging a Notified Body designated for IVDR is critical for planning timelines.
The IVDR represents a significant hurdle for AI diagnostics, but it also provides a clear framework for demonstrating the reliability and safety of these transformative technologies. By adhering to the principles of rigorous performance evaluation and continuous monitoring, manufacturers can ensure their AI systems contribute effectively to European healthcare.
