
From Failure to Fix: How Regulators Expect You to Respond

When a high-risk AI system, a complex medical device, or a critical data processing operation fails, the immediate organisational instinct is often to contain the damage, fix the technical glitch, and restore normal operations as quickly as possible. While operational continuity is essential, this instinct can be dangerously incomplete from a regulatory perspective. European regulators, guided by frameworks such as the AI Act, the GDPR, and the Medical Device Regulation (MDR), view failure not merely as a technical anomaly but as a signal of potential systemic weakness in governance, risk management, and compliance culture. The regulatory expectation is a structured, transparent, and evidence-based journey from the detection of a failure to its resolution and prevention. This journey is not linear; it is a cycle of investigation, reporting, corrective action, communication, and fortification of preventive controls. Understanding this cycle is critical for any professional deploying technology within the European Union, as the difference between a manageable incident and a regulatory sanction often lies in the quality of the response.

The Regulatory Definition of “Failure”

Before dissecting the response, one must first understand what constitutes a “failure” in the eyes of European regulators. It is a broad term that extends far beyond a system crash or a data breach. It encompasses any situation where a system’s operation deviates from its intended purpose or regulatory obligations, resulting in or having the potential to result in harm.

Under the AI Act (Regulation (EU) 2024/1689), a “serious incident” is an incident or malfunctioning of an AI system that directly or indirectly leads to the death of a person or serious harm to a person’s health, a serious and irreversible disruption of the management or operation of critical infrastructure, an infringement of obligations under Union law intended to protect fundamental rights, or serious harm to property or the environment. For providers of high-risk AI systems, the discovery of such an incident triggers a cascade of legal duties. A “failure” can also be a non-technical one, such as a failure in the human oversight mechanisms required by Article 14 of the AI Act, where an operator is unable to understand or override the system’s logic.

Similarly, the General Data Protection Regulation (GDPR) refers to a “personal data breach,” defined as a breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorised disclosure of, or access to, personal data transmitted, stored, or otherwise processed. This is a specific type of failure, but the principles of response are analogous. The Medical Device Regulation (MDR) and the Product Liability Directive (PLD) also have their own specific triggers, but the underlying philosophy is consistent: the entity that places the system or product on the market is best placed to understand its risks and is therefore responsible for monitoring and responding to its failures.

From a practitioner’s standpoint, this means that a failure is not just a ticket for the IT department. It is a compliance event that must be escalated immediately to the legal, compliance, and executive levels. The threshold for reporting is not “certainty of harm” but “reasonable suspicion of a significant risk.”

The Immediate Triage: Investigation and Reporting Timelines

The first 24 to 72 hours after detecting a failure are critical. Regulators expect a pre-defined process to be activated instantly. This is not the time for ad-hoc meetings or debating who is responsible. It is the time for structured triage.

Internal Investigation: Preserving Evidence and Establishing Facts

The initial goal is to establish the “what, where, and when” without apportioning blame. Regulators expect organisations to conduct a root cause analysis (RCA). This is not a superficial “5 Whys” exercise but a deep, forensic investigation. For an AI system, this involves:

  • Log Analysis: Reviewing system logs, user interactions, and model inference records to trace the anomaly.
  • Data Provenance: Checking whether the failure originated from a change in input data, data drift, or a corrupted training dataset (a short drift-check sketch follows this list).
  • Model Behaviour: For black-box models, using explainability tools (XAI) to understand why a specific decision was made that constituted the failure.
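
To make the data provenance check concrete, the following is a minimal sketch, assuming numeric input features logged at inference time and a retained sample of the training data; the two-sample Kolmogorov–Smirnov test is only one of several reasonable drift checks, and all names here are illustrative.

```python
import numpy as np
from scipy.stats import ks_2samp

def drift_suspects(training: np.ndarray,
                   production: np.ndarray,
                   feature_names: list[str],
                   alpha: float = 0.01) -> list[str]:
    """Return features whose production distribution differs significantly
    from the training distribution (per-feature two-sample KS test).
    A flagged feature is a lead for the root cause analysis, not proof
    of causation on its own."""
    suspects = []
    for i, name in enumerate(feature_names):
        _, p_value = ks_2samp(training[:, i], production[:, i])
        if p_value < alpha:
            suspects.append(name)
    return suspects

# Example: compare inputs seen during the incident window against a
# reference sample drawn from the training set (hypothetical feature names).
# flagged = drift_suspects(train_sample, incident_inputs, ["age", "income", "tenure"])
```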

Crucially, the investigation must be documented in real-time. Regulators will ask for this documentation later. A lack of clear, contemporaneous records is often interpreted as a sign of poor governance. The investigation should be led by a cross-functional team, including technical experts, legal counsel, and a representative from the compliance or risk management department.

External Reporting: The Clock is Ticking

Once a failure meets the regulatory threshold, the clock starts ticking on reporting obligations. These deadlines are strict and non-negotiable.

Reporting under the AI Act

For providers of high-risk AI systems, Article 73 of the AI Act mandates the reporting of “serious incidents.” The timeline is tight:

Providers of high-risk AI systems shall report any serious incident to the market surveillance authorities of the Member States in which the incident occurred, without undue delay and in any event not later than 15 calendar days after the date on which they become aware of the serious incident.

This 15-day clock starts when the provider “becomes aware,” not when the investigation is complete. Shorter deadlines apply to the most severe cases: no later than ten days in the event of a person’s death, and no later than two days for a widespread infringement or a serious incident involving a serious and irreversible disruption of critical infrastructure. The initial report may be preliminary, but it must contain sufficient information for the authority to assess the risk. A follow-up report is expected once more details are known. Failure to meet these deadlines can result in significant fines.

Reporting under the GDPR

The GDPR is even faster. Article 33 requires notification of a personal data breach to the supervisory authority “without undue delay and, where feasible, not later than 72 hours” after becoming aware of it. This is a well-known but frequently missed deadline. The notification must describe the nature of the breach, the categories and approximate number of data subjects and records concerned, and the likely consequences. It must also detail the measures taken or proposed to address the breach.

Reporting under the MDR

For medical devices, the timelines depend on severity. Under Article 87 of the MDR, a “serious incident” must be reported via the EUDAMED database no later than 15 days after the manufacturer becomes aware of it, shortened to 10 days in the event of death or an unanticipated serious deterioration in health, and to 2 days where the incident indicates a serious public health threat. The definition of “serious” here is specific to the device’s performance and patient safety.

The key takeaway for any organisation is that internal escalation procedures must be calibrated to these external deadlines. If your internal process takes 5 days to decide whether to report, you have already missed the GDPR’s 72-hour window and consumed a third of your AI Act timeline. Regulators expect a “precautionary principle” approach: report first, and refine the details later.
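
To make the calibration point concrete, here is a minimal sketch, in Python, of how an incident-response workflow might derive external reporting deadlines from the moment of awareness; the windows below reflect the frameworks discussed above but are hard-coded placeholders, and the applicable deadline should always be confirmed against the regulation itself.

```python
from datetime import datetime, timedelta

# Illustrative reporting windows, counted from the moment of awareness.
REPORTING_WINDOWS = {
    "GDPR Art. 33 (notify supervisory authority)": timedelta(hours=72),
    "AI Act Art. 73 (serious incident)": timedelta(days=15),
    "MDR Art. 87 (death / serious deterioration)": timedelta(days=10),
    "MDR Art. 87 (serious public health threat)": timedelta(days=2),
}

def reporting_deadlines(awareness: datetime) -> dict[str, datetime]:
    """Return the latest permissible reporting time per framework."""
    return {name: awareness + window for name, window in REPORTING_WINDOWS.items()}

if __name__ == "__main__":
    became_aware = datetime(2025, 3, 3, 9, 30)
    for name, deadline in reporting_deadlines(became_aware).items():
        print(f"{name}: report no later than {deadline:%Y-%m-%d %H:%M}")
```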

Corrective and Preventive Actions (CAPA): The Core of the Fix

Reporting is only the beginning of the remedy. Regulators are ultimately interested in what you do to fix the problem and ensure it does not happen again. This is the domain of Corrective and Preventive Actions (CAPA), a concept borrowed from quality management systems like ISO 9001 and deeply embedded in sector-specific regulations like the MDR.

Corrective Actions: Addressing the Symptom and the Cause

Corrective actions are reactive. They address the specific failure that has already occurred. In the context of an AI system, this could involve:

  • Retraining the model: If the failure was caused by data drift or bias, the model must be retrained on a clean, representative dataset.
  • Patching the software: Fixing a bug in the codebase or a vulnerability in the infrastructure.
  • Updating documentation: If the failure revealed that the instructions for use were unclear or misleading, they must be revised.
  • Implementing new controls: Adding a new layer of human oversight or a rule-based check before the AI’s output is used.

Regulators will scrutinise the effectiveness of these actions. They will ask: “How do you know your fix worked?” This requires a validation process. For an AI model, this means testing its performance against a hold-out dataset and monitoring it closely in a staging environment before redeployment. Simply patching the code is not enough; you must prove the risk has been mitigated.
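
As an illustration of such a validation gate, here is a minimal sketch, assuming a scikit-learn-style classifier and a hold-out set that was not used for retraining; the metrics and thresholds are hypothetical and would in practice come from the system’s risk management file.

```python
from sklearn.metrics import accuracy_score, recall_score

def validate_fix(model, X_holdout, y_holdout,
                 min_accuracy: float = 0.95, min_recall: float = 0.90) -> bool:
    """Evaluate the retrained model on a hold-out set and decide whether
    it may proceed to the staging environment. Thresholds are illustrative."""
    predictions = model.predict(X_holdout)
    accuracy = accuracy_score(y_holdout, predictions)
    recall = recall_score(y_holdout, predictions, average="macro")

    passed = accuracy >= min_accuracy and recall >= min_recall
    # Log the evaluation so the CAPA file contains contemporaneous evidence.
    print(f"hold-out accuracy={accuracy:.3f}, macro recall={recall:.3f}, "
          f"redeployment gate {'PASSED' if passed else 'FAILED'}")
    return passed
```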

Preventive Actions: Strengthening the System

Preventive actions are proactive. They aim to stop similar failures from occurring in other systems or in the future. This is where an organisation demonstrates its maturity. A regulator seeing a CAPA report that only addresses the specific incident is likely to ask: “Have you reviewed your other AI systems for similar vulnerabilities?”

Preventive actions might include:

  • Enhancing the risk management file: The AI Act requires a risk management system for the entire lifecycle. A failure should trigger a review of this file to identify and mitigate new or previously unforeseen risks.
  • Improving data governance: Implementing stricter data validation pipelines or more frequent checks for data drift (a short validation sketch follows this list).
  • Revising training programs: Updating the training for human operators to ensure they are better equipped to spot anomalies.
  • Conducting a wider audit: Performing a conformity assessment review of similar products in the company’s portfolio.
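
As a sketch of what stricter data validation pipelines can look like in practice (the schema, fields, and bounds below are hypothetical; real rules belong in the system’s data governance documentation), incoming records can be rejected or quarantined before they ever reach the model.

```python
from dataclasses import dataclass

@dataclass
class FieldRule:
    required: bool
    min_value: float | None = None
    max_value: float | None = None

# Hypothetical schema for one input record.
SCHEMA = {
    "age": FieldRule(required=True, min_value=0, max_value=120),
    "credit_utilisation": FieldRule(required=True, min_value=0.0, max_value=1.0),
    "postcode": FieldRule(required=False),
}

def validation_errors(record: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record
    may be passed to the model, anything else is quarantined for review."""
    errors = []
    for field, rule in SCHEMA.items():
        value = record.get(field)
        if value is None:
            if rule.required:
                errors.append(f"missing required field '{field}'")
            continue
        if rule.min_value is not None and value < rule.min_value:
            errors.append(f"'{field}' below minimum ({value} < {rule.min_value})")
        if rule.max_value is not None and value > rule.max_value:
            errors.append(f"'{field}' above maximum ({value} > {rule.max_value})")
    return errors
```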

The distinction between corrective and preventive is vital. A corrective action fixes the broken machine. A preventive action fixes the maintenance schedule so the machine doesn’t break again. Regulators expect both.

Communication and Transparency: Managing Stakeholders

Communication during a failure is a delicate balancing act. It must be transparent and timely without causing unnecessary panic or violating confidentiality obligations. Regulators have clear expectations for different audiences.

Communication with Authorities

As discussed, reporting to authorities is mandatory. The communication must be professional, factual, and cooperative. An adversarial or evasive stance is a major red flag. Authorities expect to see:

  • A clear timeline of events.
  • An honest assessment of what is known and what is still under investigation.
  • A commitment to corrective action.

In many cases, authorities will engage in a dialogue. They may issue formal information requests or even impose conditions on the continued use of the system. Cooperation is not just good practice; it can be a mitigating factor when sanctions are considered.

Communication with Affected Individuals

When a failure directly impacts individuals, communication becomes a legal duty. Under GDPR, if a personal data breach is likely to result in a high risk to the rights and freedoms of individuals, the controller must communicate the breach to the affected data subjects “without undue delay” (Article 34).

This communication must be clear and in plain language. It should describe the nature of the breach, the likely consequences, and the measures taken to mitigate the harm. It should also provide contact details for further information. Hiding behind legal jargon or downplaying the severity is ill-advised and can lead to further sanctions and reputational damage.

Communication with the Market and Public

For providers of high-risk AI systems or medical devices, public communication may be necessary, especially if the failure could affect other users. This is often coordinated with the market surveillance authority. The goal is to inform other users of the risk and the availability of a fix or mitigation strategy. This communication must be carefully worded to avoid causing market panic while fulfilling the duty of care. It should be factual, avoid speculation, and provide clear instructions for users.

The Role of Post-Market Surveillance and Monitoring

The regulatory expectation does not end with the fix. The AI Act, MDR, and GDPR all embed the concept of continuous lifecycle oversight. A failure is often a symptom that the post-market surveillance (PMS) system was not robust enough.

Post-Market Surveillance under the AI Act

Article 72 of the AI Act requires providers to establish and document a post-market monitoring system, proportionate to the nature of the AI technologies and the risks of the high-risk AI system. This system must actively and systematically collect, document, and analyse data over the entire lifecycle of the high-risk AI system. A failure should trigger a review of the PMS plan. Was the monitoring frequency too low? Were the wrong metrics being tracked? Did the system lack the capability to detect the drift that led to the failure? Regulators will expect the PMS system to be updated based on the lessons learned from the incident.
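
To illustrate what “the right metrics at the right frequency” might look like when encoded in a monitoring plan, here is a minimal sketch; the metric names, thresholds, and cadences are placeholders, and in practice they would be derived from the risk management file rather than hard-coded.

```python
from dataclasses import dataclass

@dataclass
class MonitoredMetric:
    name: str
    threshold: float
    higher_is_worse: bool
    check_every_hours: int

# Illustrative post-market monitoring plan (values are placeholders).
PMS_PLAN = [
    MonitoredMetric("input_drift_score",      threshold=0.20, higher_is_worse=True,  check_every_hours=24),
    MonitoredMetric("operator_override_rate", threshold=0.10, higher_is_worse=True,  check_every_hours=24),
    MonitoredMetric("macro_recall",           threshold=0.90, higher_is_worse=False, check_every_hours=168),
]

def breaches(latest_values: dict[str, float]) -> list[str]:
    """Compare the latest measurements against the plan and return the
    findings that should trigger an incident review."""
    findings = []
    for metric in PMS_PLAN:
        value = latest_values.get(metric.name)
        if value is None:
            findings.append(f"{metric.name}: no measurement available")
        elif metric.higher_is_worse and value > metric.threshold:
            findings.append(f"{metric.name}: {value} exceeds threshold {metric.threshold}")
        elif not metric.higher_is_worse and value < metric.threshold:
            findings.append(f"{metric.name}: {value} below threshold {metric.threshold}")
    return findings
```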

Post-Market Clinical Follow-up (PMCF) under the MDR

For medical devices, the PMCF is a continuous process to confirm the safety, performance, and benefit-risk ratio of the device. A serious incident is a key input into the PMCF plan. It may trigger new clinical investigations or a re-evaluation of the clinical evidence.

Continuous Compliance under GDPR

The GDPR’s principle of “accountability” requires organisations to continuously demonstrate compliance. A data breach is a clear indication that a control has failed. The response must include a review of all related controls. For example, if a breach occurred due to an employee error, the response should include a review of staff training, access controls, and perhaps the implementation of technical measures like pseudonymisation or encryption to reduce the impact of any future breach.
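
As one example of such a technical measure, here is a minimal pseudonymisation sketch using a keyed hash (HMAC-SHA256); the field and key names are hypothetical, and the measure is only meaningful if the key is stored separately from the pseudonymised data under strict access control.

```python
import hmac
import hashlib

def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash: records remain
    linkable for analysis, but re-identification requires the key."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Example usage (the key should come from a secrets manager, not source code):
# token = pseudonymise("patient-12345", secret_key)
```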

National Nuances and Cross-Border Coordination

While the regulations are set at the EU level, their enforcement and the practicalities of response can vary across Member States. This is particularly relevant for organisations operating in multiple jurisdictions.

Under the AI Act, serious incidents are reported to the market surveillance authority of the Member State in which the incident occurred. If the AI system is deployed across several countries, you may need to notify multiple authorities. However, the AI Act includes provisions for cooperation between authorities. A lead market surveillance authority may be designated for a coordinated investigation, but this does not absolve the provider of notifying all relevant national authorities.

The GDPR also requires cross-border cooperation. For cross-border processing, the “lead supervisory authority” is the one for the controller’s main establishment. However, supervisory authorities in other Member States where data subjects are substantially affected remain “concerned authorities” and can, in certain local cases, handle the matter themselves. This can lead to complex, multi-jurisdictional investigations.

Practically, this means that an organisation’s response plan must be geographically aware. A single point of contact within the company should be able to coordinate with legal representatives in each relevant Member State to ensure that reports are filed correctly and in the local language if required. The tone and expectations of the French CNIL may differ subtly from those of the Irish DPC or the German BfDI, even though they all enforce the same GDPR.

Conclusion: From Reactive Compliance to Proactive Governance

The regulatory framework in Europe is designed to make the cost of a poor response higher than the cost of getting the response right. Fines, product recalls, market bans, and reputational damage are the consequences of treating failures as mere technical glitches. The expectation is a mature, structured, and transparent process that treats a failure as an opportunity to learn and strengthen the system. For professionals in AI, robotics, and data systems, building this capability is not a matter of avoiding punishment; it is a prerequisite for sustainable innovation and earning the trust of users and regulators alike.
