Learning from Medical and Biotech AI Failures

The European Union has positioned itself as a global standard-setter in the governance of artificial intelligence through the adoption of the AI Act (Regulation (EU) 2024/1689). While the regulation establishes a horizontal framework applicable across sectors, its most stringent requirements converge in high-risk domains such as healthcare, biotechnology, and medical devices. The practical implementation of this framework is not an abstract exercise; it is being shaped by the lived experience of real-world AI failures. From algorithmic bias in clinical decision support tools to the collapse of digital health platforms and the misuse of biometric data in diagnostic systems, the regulatory contours of the AI Act are being drawn against a backdrop of technical shortcomings, governance gaps, and, in some cases, legal breaches. For professionals working in AI, robotics, biotech, and public institutions, understanding these failures is not merely a matter of technical curiosity; it is essential for operationalizing compliance, anticipating enforcement priorities, and building resilient systems that can withstand both technical and regulatory scrutiny.

Unlike sector-specific regimes such as the Medical Device Regulation (MDR) or the In Vitro Diagnostic Regulation (IVDR), which focus on safety and performance in a clinical context, the AI Act introduces a layered set of obligations centered on data governance, transparency, human oversight, and robustness. However, these two regimes are deeply intertwined. When an AI system is embedded in a medical device, it must satisfy the essential requirements of the MDR/IVDR while also meeting the conformity assessment procedures and risk management obligations of the AI Act. The failures we observe in the market often occur at the intersection of these regimes, where gaps in data quality, inadequate validation, or opaque design choices undermine both clinical efficacy and regulatory compliance. By examining these cases, we can derive actionable lessons for the design and deployment of high-risk AI systems in Europe.

The Anatomy of Failure: Bias, Drift, and Overreliance

Real-world failures in medical and biotech AI rarely stem from a single catastrophic error. More often, they emerge from a combination of technical limitations, flawed assumptions, and inadequate governance structures. One of the most pervasive issues is algorithmic bias, where the performance of an AI system varies systematically across different patient populations due to unrepresentative training data. This is not a hypothetical risk; it has been documented in systems used for skin cancer detection, where models trained predominantly on lighter skin tones performed poorly on darker skin, and in predictive models for healthcare resource allocation that inadvertently discriminated against certain demographic groups. Under the AI Act, such systems would be classified as high-risk, and their providers would be required to implement data governance measures that ensure training, validation, and testing datasets are relevant, representative, and free from biases that could lead to discriminatory outcomes. The regulation explicitly requires that datasets be, to the best extent possible, "free of errors and complete" in view of the intended purpose, a standard that is difficult to meet without rigorous pre-deployment auditing.
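
To make this concrete, the minimal sketch below shows one way a provider might audit performance across patient subgroups before deployment; the column names, the grouping variable, and the 5-point tolerance are illustrative assumptions, not values prescribed by the AI Act.

```python
# Minimal sketch of a pre-deployment subgroup audit: compare sensitivity and
# specificity across patient groups to surface systematic performance gaps.
# Field names ("fitzpatrick_group", "label", "prediction") are illustrative.
from collections import defaultdict

def subgroup_metrics(records, group_key="fitzpatrick_group"):
    """Return per-group sensitivity and specificity from labelled predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for r in records:
        c = counts[r[group_key]]
        if r["label"] == 1 and r["prediction"] == 1:
            c["tp"] += 1
        elif r["label"] == 0 and r["prediction"] == 1:
            c["fp"] += 1
        elif r["label"] == 0 and r["prediction"] == 0:
            c["tn"] += 1
        else:
            c["fn"] += 1
    report = {}
    for group, c in counts.items():
        sensitivity = c["tp"] / (c["tp"] + c["fn"]) if (c["tp"] + c["fn"]) else None
        specificity = c["tn"] / (c["tn"] + c["fp"]) if (c["tn"] + c["fp"]) else None
        report[group] = {"sensitivity": sensitivity, "specificity": specificity,
                         "n": sum(c.values())}
    return report

def flag_gaps(report, tolerance=0.05):
    """Flag groups whose sensitivity falls more than `tolerance` below the best group."""
    sens = {g: m["sensitivity"] for g, m in report.items() if m["sensitivity"] is not None}
    if not sens:
        return []
    best = max(sens.values())
    return [g for g, s in sens.items() if best - s > tolerance]
```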

Another critical failure mode is model drift, where the performance of an AI system degrades over time as the real-world data distribution diverges from the training data. In a clinical setting, this can occur due to seasonal variations in disease prevalence, changes in diagnostic protocols, or the introduction of new medical equipment. A model that is not continuously monitored may continue to operate while providing increasingly unreliable outputs. The AI Act addresses this by mandating that high-risk systems be subject to a continuous risk management system and to post-market monitoring designed to proactively identify emerging risks. For providers, this means implementing robust monitoring pipelines that track model performance, detect drift, and trigger retraining or decommissioning. The regulatory expectation is not merely a static certification at the point of sale but a lifecycle approach to risk management.
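
One way to operationalize such a monitoring pipeline is a periodic distribution check such as the population stability index (PSI) on model inputs or output scores. The sketch below is a minimal illustration; the 0.2 alert threshold is a common rule of thumb, not a regulatory figure, and a real deployment would track many features and trigger a documented review rather than an automatic action.

```python
import numpy as np

def population_stability_index(reference, current, bins=10):
    """Compare the distribution of a model input or score between a reference
    window (e.g. validation data) and recent production data."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Floor the proportions to avoid division by zero and log of zero.
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

def drift_alert(reference_scores, production_scores, threshold=0.2):
    """Illustrative alert: PSI above ~0.2 is often treated as a signal to
    investigate, retrain, or suspend the model (deployment-specific choice)."""
    return population_stability_index(reference_scores, production_scores) > threshold
```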

A third category of failure involves human factors and the erosion of human oversight. In several documented cases, clinicians became over-reliant on AI-generated recommendations, accepting outputs without critical evaluation. This phenomenon, often described as “automation bias,” can lead to a failure to detect errors and a degradation of professional skills. The AI Act seeks to mitigate this by requiring that high-risk systems be designed to enable human oversight, which includes the ability for a human to interpret the system’s outputs and to override or ignore them. However, the effectiveness of this oversight depends on the clarity of the interface, the explainability of the model, and the training provided to the user. A system that presents a probabilistic score without context or confidence intervals may technically satisfy the requirement for human oversight, but in practice, it may encourage blind trust.

Case Study: The Failure of a Diagnostic AI in a European Hospital

Consider the case of a diagnostic AI tool deployed in a European hospital to detect early signs of pneumonia from chest X-rays. The system was developed by a startup and integrated into the hospital’s radiology workflow. Initially, the tool showed high accuracy in internal tests. However, after deployment, it began generating a high rate of false positives, leading to unnecessary follow-up tests and patient anxiety. An investigation revealed that the training data had been sourced primarily from a single university hospital where the patient demographics and imaging equipment were different from those in the deploying hospital. The model had not been validated on external data, and the hospital lacked the technical capacity to conduct its own validation.

In this scenario, the provider failed to meet the AI Act’s requirements for data governance and robustness. The training data was not representative of the target population, and the lack of external validation meant that the model’s generalizability was unknown. Under the AI Act, the provider would be required to conduct a conformity assessment prior to placing the system on the market, which includes evaluating the data quality and the model’s performance across different subgroups. The hospital, as a deployer, would also have obligations, including ensuring that the system is used in accordance with its intended purpose and that staff are adequately trained. The failure here was systemic: a lack of shared responsibility between the provider and the deployer, and a misunderstanding of the regulatory obligations that apply at each stage of the lifecycle.

Case Study: The Collapse of a Digital Health Platform

Another instructive example is the failure of a digital health platform that offered AI-driven mental health support. The platform, which was used by several public health institutions across different Member States, relied on a chatbot to provide cognitive behavioral therapy (CBT) techniques. After a series of incidents where the chatbot gave inappropriate or harmful advice, the platform was suspended. Investigations found that the underlying natural language processing model had been trained on a narrow dataset that did not adequately capture the nuances of mental health language, particularly in different European languages and cultural contexts. The system also lacked proper safeguards to escalate to human therapists in crisis situations.

This case highlights the importance of linguistic and cultural adaptation in AI systems deployed across the EU. The AI Act’s requirement for training data to be “relevant, representative, free of errors and complete” must be interpreted in a multilingual and multicultural context. A model that works well in English may fail in German or Polish if not properly adapted. Furthermore, the platform’s failure to implement a robust escalation mechanism violated the requirement for human oversight. The provider had focused on technical performance metrics (e.g., response time, engagement) while neglecting the ethical and safety dimensions that are central to the AI Act’s risk-based approach. Public institutions that procured this platform also faced scrutiny for failing to conduct adequate due diligence, a reminder that deployers have active compliance responsibilities.

Regulatory Responses: From Enforcement to Guidance

The European regulatory response to these failures is evolving from a reactive to a proactive stance. While the AI Act is the primary horizontal instrument, other regulations and guidance documents play a crucial role in shaping the ecosystem. The Medical Device Regulation (MDR) and the In Vitro Diagnostic Regulation (IVDR) are particularly relevant, as they already impose strict requirements on the safety and performance of medical AI. The AI Act complements these by adding layers of transparency, data governance, and fundamental rights protection. For example, an AI system that is also a medical device will need to undergo conformity assessment under both regimes; in practice, this can be carried out by a Notified Body designated under the MDR/IVDR whose designation also covers the AI Act's requirements, depending on the device's risk classification.

At the national level, regulatory sandboxes are emerging as a key tool for testing AI systems in a controlled environment. Countries like Spain, France, and Germany have established sandboxes that allow developers to experiment with AI applications under the supervision of competent authorities. These sandboxes are particularly valuable for medical and biotech AI, where the stakes are high and the regulatory path can be complex. They provide a space to test data governance models, validate algorithms against real-world data, and engage with regulators early in the development process. The AI Act encourages Member States to establish such sandboxes and provides a framework for their operation, including liability and data protection considerations.

Another important regulatory response is the development of harmonized standards. European standardization organizations (ESOs) are currently working on standards that will provide a presumption of conformity with the AI Act’s requirements. For example, standards are being developed for risk management systems, data quality metrics, and transparency disclosures. Compliance with these standards is voluntary, but it offers a clear path to meeting the regulation’s essential requirements. For professionals in the field, staying informed about the development of these standards is critical, as they will shape the technical specifications of AI systems in the coming years.

The Role of the European Data Protection Board (EDPB)

Data protection is a cross-cutting concern in medical and biotech AI, and the European Data Protection Board (EDPB) has been active in providing guidance on how GDPR interacts with the AI Act. In particular, the EDPB has emphasized the importance of data minimization and purpose limitation in the context of AI training. This creates a tension: AI systems often require large, diverse datasets to perform well, but GDPR restricts the use of personal data to the specific purpose for which it was collected. The EDPB has suggested that anonymization or pseudonymization may be necessary, but it has also warned that these techniques are not foolproof and must be evaluated in context.

For healthcare providers, this means that using patient data to train AI models requires careful legal and technical planning. Consent may be one legal basis, but it must be specific and informed. The AI Act’s data governance requirements align with this, as they mandate that data be collected and processed in a way that respects privacy. In practice, this is leading to increased interest in privacy-preserving techniques such as federated learning, where models are trained across multiple institutions without centralizing the raw data. While these techniques are promising, they are not a silver bullet; they still require robust governance to ensure that the resulting models are fair and unbiased.
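
To make the federated learning idea concrete, the sketch below shows one aggregation round of federated averaging (FedAvg), under the assumption that each participating hospital trains locally and shares only model parameters; the site sizes and toy parameter vectors are invented for illustration, and a real deployment would add secure aggregation and governance around who may join a round.

```python
import numpy as np

def federated_average(local_weights, local_sizes):
    """One round of federated averaging: the coordinator combines locally
    trained model weights, weighted by each site's dataset size. Raw patient
    records never leave the participating institution."""
    total = sum(local_sizes)
    return sum(w * (n / total) for w, n in zip(local_weights, local_sizes))

# Illustrative round with three hospitals and a toy two-parameter model.
site_weights = [np.array([0.80, -1.20]), np.array([0.75, -1.10]), np.array([0.90, -1.30])]
site_sizes = [1200, 450, 3100]
global_weights = federated_average(site_weights, site_sizes)
```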

Comparative Approaches Across Member States

While the AI Act is a Regulation (meaning it is directly applicable in all Member States), its implementation will vary. National competent authorities will be responsible for enforcement, and their interpretations may differ. For example, in Germany, the Federal Institute for Drugs and Medical Devices (BfArM) has significant experience in regulating digital health applications (DiGA), having established a fast-track pathway for reimbursement. This experience positions Germany well to integrate the AI Act’s requirements into its existing framework. In France, the National Agency for the Safety of Medicines and Health Products (ANSM) has focused on the clinical validation of AI systems, emphasizing the need for real-world evidence. In Spain, the Spanish Agency of Medicines and Medical Devices (AEMPS) has been involved in regulatory sandboxes and has emphasized the importance of interoperability and data sharing in the public health system.

These differences matter for developers and deployers. A company seeking to market an AI-based diagnostic tool across Europe must navigate not only the common requirements of the AI Act but also the specific expectations of national authorities. This includes understanding the local data protection authority’s stance on the use of health data, the national health technology assessment (HTA) body’s criteria for reimbursement, and the clinical evidence standards required by the competent authority. The AI Act introduces the concept of a “single point of contact” in each Member State to streamline this process, but the practical coordination between different authorities remains a challenge.

Operationalizing Compliance: Lessons from the Field

Translating regulatory requirements into operational practice is the core challenge for organizations working with medical and biotech AI. The AI Act is not a checklist; it is a framework for risk management and governance. To operationalize compliance, organizations need to build structures and processes that embed regulatory thinking into the AI lifecycle. This starts with documentation. The AI Act requires extensive technical documentation, including descriptions of the system’s capabilities, its intended purpose, the data used for training and testing, the risk management system, and the measures taken to ensure transparency and human oversight. This documentation is not just for regulators; it is a critical tool for internal governance, enabling teams to track decisions, justify design choices, and demonstrate compliance.
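
One pragmatic way to keep this documentation close to the engineering work is to maintain it as structured, versioned records alongside the code. The sketch below shows one possible internal schema; the field names and example values are our own illustration of the kinds of information required, not the regulation's wording or an official template.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TechnicalDocumentationRecord:
    """Illustrative internal record for technical documentation; every field
    name here is an assumption, chosen to mirror the categories discussed above."""
    system_name: str
    intended_purpose: str
    provider: str
    training_data_description: str
    validation_strategy: str
    known_limitations: List[str] = field(default_factory=list)
    human_oversight_measures: List[str] = field(default_factory=list)
    risk_controls: List[str] = field(default_factory=list)
    version: str = "0.1.0"

# Hypothetical example for a chest X-ray triage tool.
doc = TechnicalDocumentationRecord(
    system_name="ChestXR-Pneumonia-Assist",
    intended_purpose="Triage support for suspected pneumonia on adult chest X-rays",
    provider="Example Medical AI GmbH",
    training_data_description="120k studies from 4 sites, 2019-2023; demographics audited",
    validation_strategy="External validation on 2 held-out hospitals; subgroup analysis",
    known_limitations=["Not validated on paediatric patients"],
    human_oversight_measures=["Radiologist confirms every positive finding"],
    risk_controls=["Monthly drift report reviewed by clinical safety officer"],
)
```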

Another key operational requirement is conformity assessment. High-risk AI systems are subject to a conformity assessment procedure before they can be placed on the market. For medical AI, this may involve a Notified Body under the MDR/IVDR, and for certain other high-risk AI systems (e.g., those used for biometric identification), it may involve a Notified Body designated under the AI Act. The process involves a review of the technical documentation, an audit of the risk management system, and, in some cases, testing of the system. Organizations should prepare for this by conducting internal audits and gap analyses well in advance.

For deployers (e.g., hospitals, research institutions), the obligations are different but equally important. Deployers must ensure that the AI system is used in accordance with its intended purpose, that human operators are trained, and that any incidents or malfunctions are reported to the provider and, in some cases, to national authorities. Deployers also have a role in ensuring data quality, particularly when the system is used in a new context or with new data. This requires establishing internal governance structures, such as AI ethics committees or data stewardship boards, to oversee the use of AI systems.

Building a Robust Data Governance Framework

Data governance is the foundation of compliant AI. Under the AI Act, providers must ensure that training, validation, and testing data are relevant, representative, and free from errors. This requires a systematic approach to data collection, labeling, and curation. Organizations should implement data provenance tracking to document the origin and processing history of datasets. This is particularly important in biotech, where data may come from multiple sources, including clinical trials, electronic health records, and genomic databases. Each source may have different legal and ethical constraints, and the data must be harmonized to avoid introducing bias.
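
A lightweight way to implement provenance tracking is to attach an immutable provenance record to every dataset as it enters the pipeline. The schema and example values below are illustrative assumptions, not a prescribed format; the point is that origin, legal basis, and processing history travel with the data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

@dataclass(frozen=True)
class DatasetProvenance:
    """Illustrative provenance entry for a training, validation, or test dataset."""
    source: str                        # e.g. "EHR export, Hospital A"
    legal_basis: str                   # e.g. "explicit consent", "research exemption"
    collection_period: str
    date_received: date
    preprocessing_steps: Tuple[str, ...]  # ordered, append-only history
    known_gaps: Optional[str] = None

entry = DatasetProvenance(
    source="Clinical trial imaging archive (sponsor-provided)",
    legal_basis="explicit consent for secondary research use",
    collection_period="2020-2022",
    date_received=date(2023, 3, 14),
    preprocessing_steps=("de-identification", "DICOM harmonisation", "label review"),
    known_gaps="Under-representation of patients over 80",
)
```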

Labeling is another critical area. In medical AI, labels are often derived from clinical annotations, which can be subjective or inconsistent. The AI Act requires that data be “free of errors,” but in practice, this means having clear guidelines for labeling, training for annotators, and mechanisms for resolving disagreements. For example, in a pathology AI system, the labels for cancerous cells may vary between pathologists. A robust data governance framework would include inter-annotator agreement metrics and a process for adjudicating difficult cases. This is not just a technical best practice; it is a regulatory requirement to ensure the reliability of the system.
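
As a minimal example of an inter-annotator agreement metric, the sketch below computes Cohen's kappa for two annotators labelling the same cases; the pathologist labels are invented for illustration, and a real pipeline would typically track agreement per label class and over time.

```python
def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance agreement.
    Values near 1 indicate strong agreement; low values suggest the labelling
    guidelines need revision or the cases need adjudication."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected) if expected != 1 else 1.0

# Two pathologists labelling the same 8 slides (1 = malignant, 0 = benign).
pathologist_1 = [1, 0, 1, 1, 0, 0, 1, 0]
pathologist_2 = [1, 0, 1, 0, 0, 0, 1, 1]
kappa = cohens_kappa(pathologist_1, pathologist_2)  # 0.5 in this toy example
```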

Finally, organizations must address data minimization and privacy by design. The AI Act does not override GDPR, and the two regulations must be read together. This means that organizations should collect only the data that is strictly necessary for the AI system’s function and should implement technical measures to protect privacy, such as encryption, access controls, and anonymization. In practice, this may involve using synthetic data for initial model development or employing differential privacy techniques to add noise to datasets. These approaches can help balance the need for high-quality training data with the imperative to protect individual privacy.
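
As a small illustration of one differential privacy building block, the sketch below applies the Laplace mechanism to a counting query, such as releasing an aggregate statistic about a training cohort. The epsilon value is an arbitrary example; in practice the privacy budget is a documented governance decision, not a purely technical one.

```python
import numpy as np

def laplace_count(true_count, epsilon, rng=None):
    """Release a count with Laplace noise calibrated to sensitivity 1, the
    standard mechanism for epsilon-differential privacy on counting queries."""
    if rng is None:
        rng = np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Smaller epsilon means stronger privacy but noisier statistics.
noisy_count = laplace_count(true_count=412, epsilon=0.5)
```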

Transparency and Explainability in Practice

Transparency is a core principle of the AI Act, but it is often misunderstood. Transparency does not mean revealing proprietary algorithms or training data; it means providing clear, meaningful information to users and regulators about how the system works, its limitations, and its intended use. For medical AI, this includes disclosing the system’s performance metrics (e.g., sensitivity, specificity), the clinical populations on which it has been validated, and any known limitations or failure modes. This information should be included in the technical documentation and made available to users through the system’s interface or user manual.

Explainability is closely related but distinct. It refers to the ability to understand and interpret the system’s outputs. For many AI systems, especially deep learning models, full explainability is technically challenging. However, the AI Act does not require perfect explainability; it requires that the system be designed and developed in a way that enables human oversight. In practice, this may involve providing feature importance scores, highlighting regions of interest in an image, or presenting confidence intervals alongside predictions. The goal is to give the human operator enough context to make an informed decision about whether to trust the AI’s output.
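
As one model-agnostic way to produce the kind of region highlighting described above, the sketch below computes a simple occlusion-sensitivity map: it masks one image patch at a time and records how much the predicted probability drops. Here `predict_fn`, the patch size, and the baseline value are assumptions about the deployment, not part of any mandated method.

```python
import numpy as np

def occlusion_sensitivity(predict_fn, image, patch=16, baseline=0.0):
    """Mask each patch of a 2-D (HxW) image and measure the drop in the
    predicted probability. `predict_fn` is any callable mapping an image
    to a scalar probability; larger heatmap values mark regions the
    prediction depends on most."""
    h, w = image.shape
    heatmap = np.zeros((h // patch, w // patch))
    base_score = predict_fn(image)
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            heatmap[i // patch, j // patch] = base_score - predict_fn(occluded)
    return heatmap
```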

Organizations should be cautious about “explainability washing,” where superficial explanations are provided without real insight. A meaningful explanation should be actionable and relevant to the user’s task. For a radiologist, an explanation that highlights the specific regions of an X-ray that led to a diagnosis is more useful than a generic statement about model confidence. Developing these explanations requires collaboration between technical teams and domain experts, and they should be tested with real users to ensure they support, rather than hinder, decision-making.

Looking Ahead: The Path to Effective Implementation

The AI Act is a landmark regulation, but it is only the beginning of a long process of implementation, interpretation, and refinement. The European Commission will issue guidelines on the application of the regulation, and the AI Office (a new body within the European Commission) will coordinate enforcement and support the development of standards. National authorities will build their capacity and develop their own guidance. The courts will be called upon to interpret key concepts, such as what constitutes “high-risk” in a given context or how to balance transparency with intellectual property rights.

For professionals in the medical and biotech sectors, the path forward involves active engagement with this evolving landscape. This means participating in industry forums, contributing to standardization efforts, and collaborating with regulators through sandboxes and consultations. It also means investing in internal capacity: training staff on the requirements of the AI Act, establishing cross-functional governance teams, and building the technical infrastructure for robust data management and model monitoring. The failures of the past provide a roadmap for what can go wrong; the regulatory framework provides a blueprint for what must go right. The challenge now is to translate that blueprint into systems that are safe, effective, and worthy of public trust.
