
When AI Becomes a System: Components, Data, and Decision Chains

When we discuss artificial intelligence in a regulatory context, we are rarely talking about a single algorithm or a discrete mathematical model. In practice, AI is a composite system: a network of components, data sources, processing pipelines, and interfaces that together produce outputs intended to inform or automate decisions. The European Union’s approach to AI regulation reflects this reality. The AI Act (Regulation (EU) 2024/1689) establishes obligations that attach to the provider of an AI system, and both terms are defined with an eye to the whole, functioning arrangement rather than a narrow piece of software. For professionals building, procuring, or deploying AI in robotics, biotech, public administration, or financial services, understanding how components and data flows assemble into a regulated system is essential to managing risk, assigning responsibility, and meeting documentation duties.

This article examines the architecture of AI systems from a regulatory perspective. It explains how components and data pipelines create responsibility chains, how risk is determined and managed across the stack, and what documentation is required to demonstrate compliance. It distinguishes EU-level frameworks from national implementations and highlights practical differences across European jurisdictions. The aim is to provide a working model for translating engineering reality into regulatory compliance.

Defining the AI System: From Component to Regulated Whole

The AI Act defines an AI system in Article 3(1) as a machine-based system designed to operate with varying levels of autonomy, which may exhibit adaptiveness after deployment and which, for explicit or implicit objectives, infers from its inputs how to generate outputs (such as predictions, content, recommendations, or decisions) that can influence physical or virtual environments. The definition emphasizes inference and adaptiveness rather than simple automation. A spreadsheet macro is not an AI system; a statistical model that learns from data and adapts its outputs to optimize a goal likely is.

Crucially, the definition is functional. Whether a system qualifies as AI under the Act depends on what it does and how it operates, not solely on the techniques used. This means that a system combining deterministic logic with a learned model can be an AI system if the learned component materially drives the outputs. It also means that components that are not themselves AI (for example, a data ingestion service) can be part of an AI system when integrated into a pipeline that performs inference.

“AI system” means a machine-based system that is designed to operate with varying levels of autonomy and that may exhibit adaptiveness after deployment, and that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments.

In practice, a regulated AI system often comprises:

  • Models and algorithms: The learned functions that produce predictions, classifications, or generations. These may be provided by third parties or developed in-house.
  • Data pipelines: Ingestion, validation, transformation, and feature engineering steps that prepare inputs for models.
  • Software infrastructure: APIs, orchestration layers, and deployment environments that manage lifecycle, scaling, and access control.
  • Interfaces and integrations: User interfaces, dashboards, and system-to-system connectors that expose outputs to decision-makers or automated actuators.
  • Monitoring and feedback loops: Logging, drift detection, and post-market surveillance mechanisms that observe performance and risk over time.

Each of these can be sourced from different vendors or teams. Under the AI Act, the provider is the entity that places the system on the market or puts it into service under its own name or trademark. This may be the organization that assembled the components into a coherent product, even if they did not build every component themselves. The provider bears the primary obligations for conformity assessment, documentation, and ongoing monitoring. This is why component integration is not merely an engineering decision—it is a regulatory design choice.
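
One way to keep this regulatory design choice visible in engineering practice is to maintain an inventory of the system's components and their suppliers. The sketch below is illustrative only; the component names, categories, and version numbers are assumptions rather than terminology from the Act. It simply shows how an assembled system can be described so that the provider role stays clear even when many parts are third-party.

from dataclasses import dataclass

@dataclass
class Component:
    """One entry in an illustrative AI system 'bill of materials'."""
    name: str
    kind: str          # model, data_pipeline, infrastructure, interface, monitoring
    supplier: str      # in-house team or third-party vendor
    version: str

SYSTEM_BOM = [
    Component("triage-classifier", "model", "vendor-a", "2.3.1"),
    Component("image-ingestion", "data_pipeline", "in-house", "1.8.0"),
    Component("inference-api", "infrastructure", "in-house", "4.0.2"),
    Component("clinician-dashboard", "interface", "vendor-b", "0.9.5"),
    Component("drift-monitor", "monitoring", "in-house", "1.1.0"),
]

# Whoever places the assembled system on the market under their own name is the
# provider for AI Act purposes, regardless of which rows are third-party.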

Distinction from other software and automated decisions

Not all software that uses data is an AI system. Systems that merely execute rules defined by humans, without any inference, fall outside the definition (Recital 12), and Article 6(3) further carves out systems that only perform narrow procedural or purely preparatory tasks from the high-risk category. For example, a data quality script that cleans and formats records without inferring outputs that influence environments is not in scope. However, once a system infers outcomes that guide decisions, such as triaging medical images, scoring credit risk, or routing autonomous robots, it likely crosses the threshold.

Importantly, the AI Act sits alongside the revised Product Liability Directive (Directive (EU) 2024/2853) and the General Data Protection Regulation (GDPR). The product liability framework addresses compensation for harm caused by defective products, including software and AI. The GDPR governs personal data processing. An AI system may trigger obligations under all three regimes, and a failure in data governance can create liability under both GDPR and product liability law.

Components and Pipelines: How Responsibility Flows

Responsibility in AI systems flows through the data and decision pipelines. When a model produces an output, it does so based on data that has been collected, transformed, and selected through a chain of operations. Each link in that chain can introduce bias, error, or security vulnerabilities. Regulators expect providers to understand and control these flows.

Data sources and provenance

Data provenance is foundational. The AI Act requires that the data used to train, validate, and test high-risk AI systems be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete (Article 10). In practice, this means documenting at least the following; a minimal machine-readable sketch follows the list:

  • Where data comes from (internal systems, third-party providers, public datasets).
  • How it was collected and consented to (if personal data).
  • What transformations were applied (cleaning, normalization, feature engineering).
  • How the dataset reflects the operational environment (coverage, edge cases, known gaps).
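
The sketch below captures these provenance fields in a simple machine-readable record. The record structure and field names are illustrative assumptions, not a format prescribed by the AI Act or GDPR.

from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class DatasetProvenance:
    """Illustrative provenance record for one training dataset."""
    name: str
    source: str                      # internal system, vendor, or public dataset
    collected_from: date
    collected_to: date
    lawful_basis: Optional[str]      # GDPR Art. 6 basis if personal data, else None
    transformations: List[str] = field(default_factory=list)  # cleaning, normalization, features
    coverage_notes: str = ""         # operational environment, edge cases, known gaps

record = DatasetProvenance(
    name="credit_performance_2018_2023",
    source="internal loan servicing system",
    collected_from=date(2018, 1, 1),
    collected_to=date(2023, 12, 31),
    lawful_basis="Art. 6(1)(b) - contract",
    transformations=["deduplication", "currency normalization", "income bucketing"],
    coverage_notes="Under-represents applicants under 25; documented as a known gap.",
)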

For medical AI, for example, data provenance includes the clinical sites, patient populations, and imaging equipment used to generate training data. In financial services, it includes the representativeness of historical credit performance across demographics and economic cycles. In robotics, it includes sensor calibration and environmental conditions during data capture.

Where data is sourced from third parties, the provider should obtain contractual assurances of quality and compliance. Under GDPR, if personal data is processed, there must be a lawful basis and data subject rights must be respected. Data protection impact assessments (DPIAs) may be required. The AI Act’s documentation duties complement these requirements by focusing on the suitability of data for the AI’s intended purpose and robustness against misuse.

Preprocessing, feature engineering, and model selection

Preprocessing decisions—such as how missing values are handled, which features are engineered, and how categorical variables are encoded—can materially affect model behavior. The AI Act expects providers to document these choices and evaluate their impact on bias and performance. For example, using a feature that proxies for a protected characteristic can lead to discriminatory outcomes even if the model itself does not explicitly use that characteristic.
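
A simple screen for proxy features can support this documentation. The sketch below assumes a pandas DataFrame in which the protected characteristic is numerically encoded and uses an illustrative correlation threshold; it is a crude first check, not a substitute for a proper bias evaluation.

import pandas as pd

def proxy_screen(df: pd.DataFrame, protected: str, threshold: float = 0.4) -> pd.Series:
    """Flag numeric features whose absolute correlation with a (numerically
    encoded) protected attribute exceeds the threshold."""
    numeric = df.select_dtypes("number").drop(columns=[protected], errors="ignore")
    corr = numeric.corrwith(df[protected]).abs().sort_values(ascending=False)
    return corr[corr > threshold]

# Hypothetical usage:
# flagged = proxy_screen(applications_df, protected="age")
# A feature such as 'postcode_income_rank' appearing here may act as a proxy
# and should be examined and documented before it is used.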

Model selection is also part of the regulatory picture. The Act does not mandate specific techniques, but it does require that high-risk systems be robust against manipulation and error. If a provider chooses a complex model that is difficult to interpret, they must implement additional measures to ensure explainability or provide meaningful information to deployers. In some jurisdictions, such as Germany, regulators may expect stronger interpretability for high-stakes public sector uses.

Deployment and integration

Once trained, models are deployed behind APIs, embedded in edge devices, or integrated into enterprise workflows. The AI Act treats the deployed system as the regulated product. This means that the provider must ensure that the system behaves consistently across environments, that updates do not introduce new risks without review, and that access controls prevent misuse.

Integration choices also affect liability. If a provider supplies a model API and another party embeds it into a high-risk application without proper safeguards, the provider may argue they are not the provider of the final AI system. However, if the provider markets the API for high-risk uses and supplies configuration tools that make it easy to deploy in those contexts, regulators may consider them a provider. Contractual clarity is essential, but it does not override functional reality.

Monitoring, feedback, and updates

AI systems are not static. Data drift, concept drift, and changes in user behavior can degrade performance. The AI Act requires post-market surveillance systems to monitor experience and collect performance data. Providers must have processes to identify, triage, and remediate issues. Updates that change the intended purpose or performance characteristics may require a new conformity assessment.

For high-risk systems, serious incidents must be reported to national authorities within prescribed timelines. This creates a direct feedback loop between operational monitoring and regulatory compliance. In practice, providers should implement:

  • Event logging for inputs, outputs, and errors.
  • Drift detection and threshold-based alerts.
  • Incident classification and escalation procedures.
  • Version control for models and data pipelines.

These measures are not merely technical best practices; they are regulatory obligations that demonstrate ongoing conformity.
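
As one concrete example of the drift-detection item above, the following sketch computes a population stability index (PSI) between the training-time score distribution and live scores and compares it against an alert threshold. The binning and the commonly cited 0.25 threshold are illustrative conventions, not values set by the AI Act.

import numpy as np

def population_stability_index(expected: np.ndarray, observed: np.ndarray, bins: int = 10) -> float:
    """PSI between a reference (training-time) score distribution and the
    live score distribution observed in production."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    exp_counts, _ = np.histogram(expected, bins=edges)
    obs_counts, _ = np.histogram(np.clip(observed, edges[0], edges[-1]), bins=edges)
    exp_pct = np.clip(exp_counts / len(expected), 1e-6, None)   # avoid log(0)
    obs_pct = np.clip(obs_counts / len(observed), 1e-6, None)
    return float(np.sum((obs_pct - exp_pct) * np.log(obs_pct / exp_pct)))

# Illustrative convention: treat PSI above 0.25 as significant drift, raise an
# alert, and record the event in the post-market surveillance log.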

Risk Classification: How Components Influence Risk Levels

The AI Act’s obligations are triggered by risk classification. Most obligations apply to high-risk AI systems: those used as safety components of products covered by the EU product legislation listed in Annex I, and those used in the areas listed in Annex III, which include critical infrastructure, education, employment, essential services, law enforcement, migration, and administration of justice. The classification is use-case driven: a model is high-risk because of the context in which it is used, not solely because of its technical complexity.

Components can elevate or mitigate risk. A data pipeline that ingests unverified third-party data increases the risk of non-representativeness. A model that is opaque and not accompanied by explanations increases the risk of misuse. An interface that does not clearly present uncertainty can lead to over-reliance. Conversely, robust validation, human-in-the-loop controls, and calibrated uncertainty estimates can reduce risk.

When a system is not high-risk: obligations and expectations

Systems that do not fall within a high-risk category are not subject to the high-risk regime, and software that does not meet the definition of an AI system falls outside the Act altogether. In-scope systems must still comply with general obligations, such as transparency when interacting with humans (for example, chatbots or deepfakes). Even where obligations are lighter, product liability and GDPR still apply. In practice, many organizations adopt high-risk disciplines, such as documentation and testing, across their AI portfolio to ensure consistency and readiness.

Provider versus deployer responsibilities

The provider is responsible for design, conformity assessment, technical documentation, and post-market surveillance. The deployer (user) is responsible for using the system in accordance with its intended purpose, ensuring human oversight, and monitoring for issues in operation. In many public sector deployments, the deployer is a public body that must also conduct a fundamental rights impact assessment where Article 27 of the AI Act requires one. National implementations may add specific duties for public bodies, such as consultation with supervisory authorities or additional transparency measures.

For example, in the Netherlands, public sector AI use is guided by national policy frameworks that emphasize transparency and algorithmic accountability. In France, the CNIL provides guidance on data protection and AI, with expectations around explainability and data minimization. In Germany, the states (Länder) have their own supervisory structures for public sector algorithms, and some require registration or audits for certain uses. These national variations do not replace the AI Act but layer on additional expectations for deployers in specific contexts.

Documentation Duties Across the Lifecycle

Documentation is the primary means by which providers demonstrate compliance. The AI Act requires technical documentation, conformity assessment documentation, instructions for use, and post-market surveillance records. For high-risk systems, these documents must be kept for ten years after the system is placed on the market or put into service and made available to authorities upon request.

Technical documentation

Technical documentation should cover the system’s design, development, and testing. It typically includes:

  • Description of the system’s intended purpose and the context of use.
  • Details of the components, including models, data sources, and preprocessing steps.
  • Training, validation, and test methodologies, including metrics and performance results.
  • Risk assessment and mitigation measures, including cybersecurity and robustness.
  • Information for deployers on oversight, input requirements, and interpretation of outputs.
  • Logging and monitoring capabilities.

For high-risk systems, the documentation must demonstrate compliance with the essential requirements. This includes evidence that the system is robust against reasonably foreseeable misuse, that data is representative, and that human oversight measures are effective.
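
One practical pattern, offered here as an assumption about team workflow rather than a format the Act prescribes, is to keep a machine-readable index of the documentation items so that completeness can be checked automatically before each release. The item names loosely mirror the list above rather than the Act's Annex IV wording, and the file paths are placeholders.

from pathlib import Path

REQUIRED_DOC_ITEMS = {
    "intended_purpose": "docs/intended_purpose.md",
    "system_components": "docs/architecture.md",
    "data_sources_and_preprocessing": "docs/data_governance.md",
    "training_validation_test_results": "docs/evaluation_report.md",
    "risk_assessment_and_mitigations": "docs/risk_register.md",
    "deployer_instructions": "docs/instructions_for_use.md",
    "logging_and_monitoring": "docs/monitoring_plan.md",
}

def missing_documentation(root: Path) -> list[str]:
    """Return the documentation items whose files are absent or empty."""
    return [
        item
        for item, rel_path in REQUIRED_DOC_ITEMS.items()
        if not (root / rel_path).is_file() or (root / rel_path).stat().st_size == 0
    ]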

Instructions for use and user information

Deployers need clear instructions to operate the system safely. Instructions should explain:

  • What the system can and cannot do.
  • What inputs are required and what quality they must meet.
  • How to interpret outputs, including confidence scores or uncertainty indicators.
  • When and how to override or not use the system.
  • How to report issues.
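
These points about interpreting outputs and knowing when not to rely on the system can be reinforced in the interface itself. The sketch below attaches a calibrated confidence value and an explicit human-review flag to each output; the threshold and wording are illustrative assumptions, not requirements drawn from the Act.

from dataclasses import dataclass

@dataclass
class DecisionSupportOutput:
    score: float                 # raw model output, e.g. an estimated probability
    confidence: float            # calibrated confidence in the score, 0..1
    guidance: str                # text shown to the deployer
    requires_human_review: bool  # explicit escalation flag

def present_output(score: float, confidence: float,
                   review_threshold: float = 0.7) -> DecisionSupportOutput:
    """Wrap a raw score so the deployer sees uncertainty and an explicit
    escalation flag instead of a bare number. The threshold is illustrative."""
    needs_review = confidence < review_threshold
    guidance = (
        "Low confidence: do not rely on this output; route to human review."
        if needs_review
        else "Output may inform the decision; responsibility stays with the reviewer."
    )
    return DecisionSupportOutput(score, confidence, guidance, needs_review)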

Transparency obligations also apply when the system interacts with individuals. For example, if a chatbot provides information to consumers, users should be informed they are interacting with an automated system. In some contexts, such as emotion recognition or biometric categorization, the Act itself requires that the people exposed to the system be informed, and national implementation and sector rules may add further labeling requirements.

Post-market surveillance and incident reporting

Providers must establish a post-market surveillance system proportionate to the risk level of the AI system. For high-risk systems, this includes:

  • Continuous collection of performance and incident data.
  • Analysis of root causes and corrective actions.
  • Reporting of serious incidents to national authorities immediately once a causal link is established and in any event within 15 days of becoming aware of them, with shorter deadlines for the death of a person (ten days) and for widespread infringements or serious disruption of critical infrastructure (two days).

Timelines matter. Authorities expect providers to have triage procedures that distinguish minor issues from reportable serious incidents. Documentation should show how decisions are made and what evidence supports them.
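
A triage procedure can encode these deadlines directly, so that every classified incident carries a latest reporting date. The sketch below reflects the timelines summarised above; the classification labels are assumptions, and legal review of each incident remains necessary.

from datetime import date, timedelta

# Deadlines in days, following the summary above; confirm against Article 73
# of the AI Act for each actual case.
REPORTING_DEADLINES = {
    "serious_incident": 15,
    "death_of_a_person": 10,
    "widespread_or_critical_infrastructure": 2,
}

def reporting_due_date(aware_on: date, classification: str) -> date:
    """Latest reporting date, counted from the day the provider became aware."""
    return aware_on + timedelta(days=REPORTING_DEADLINES[classification])

# Example: reporting_due_date(date(2025, 3, 3), "death_of_a_person") -> 2025-03-13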

Data Governance Under GDPR and the AI Act

Data governance is where GDPR and the AI Act intersect most directly. The AI Act requires that data used to train, validate, and test high-risk systems be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete. GDPR requires lawfulness, fairness, transparency, purpose limitation, data minimization, accuracy, storage limitation, integrity, and confidentiality.

For personal data, providers must identify a lawful basis under Article 6 of GDPR. If special category data is used, an Article 9 exemption must apply. Even when using non-personal data, the principles of representativeness and quality align with GDPR’s accuracy principle. A model trained on biased personal data may produce discriminatory outputs, triggering both GDPR and AI Act concerns.

Practical steps include:

  • Conducting a DPIA where processing is likely to result in high risk to individuals.
  • Applying data minimization and pseudonymization where feasible.
  • Documenting data lineage and retention policies.
  • Establishing processes to correct inaccurate personal data and propagate corrections to models.
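
As one concrete illustration of the pseudonymization step above, the following sketch replaces a direct identifier with a keyed hash so records can still be linked across the pipeline. Whether a keyed hash is adequate is a design choice to validate with the data protection officer for the specific use case; pseudonymized data remains personal data under GDPR.

import hmac
import hashlib

def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash so records can still be
    linked across pipeline stages without exposing the raw value. The key must
    be stored separately under access control."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Example (key shown inline only for illustration; use a key management service):
# pseudonymize("patient-12345", secret_key=b"replace-with-managed-key")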

Where data is sourced from multiple jurisdictions, cross-border transfer rules under GDPR must be respected. This can be complex in federated learning settings or when using cloud services hosted outside the EU. The AI Act does not relax GDPR requirements; it reinforces the need for rigorous data governance.

Conformity Assessment and CE Marking

High-risk AI systems must undergo a conformity assessment before being placed on the market. Depending on the risk class and sector, this may be:

  • Internal control: The provider carries out the assessment themselves, documenting compliance.
  • Third-party assessment: A notified body evaluates the system, particularly where another regulation (such as medical device regulations) requires it.

Once conformity is declared, the provider affixes the CE marking and issues an EU declaration of conformity. The system can then be placed on the market. If the AI system is part of a product already covered by other EU legislation (for example, medical devices or machinery), the conformity assessment may be integrated. Providers should coordinate with their legal and regulatory teams to ensure the correct route.

It is important to note that the AI Act’s conformity assessment concerns the AI system as a product. It does not assess the broader societal impact of the use case. That said, deployers—especially public bodies—may need to conduct additional assessments, such as fundamental rights impact assessments, under national law.

National Implementation and Enforcement: Practical Differences

While the AI Act is a regulation directly applicable across the EU, its enforcement and some aspects of implementation depend on national authorities. Each Member State must designate a market surveillance authority for AI. In many countries, this will be the same body that oversees product safety or data protection, but some will create new agencies.

There are practical differences to watch:

  • Public sector oversight: Some countries (e.g., the Netherlands, Germany) have established registers or oversight bodies for public sector algorithms. Deployers in these jurisdictions should expect additional scrutiny and reporting.
  • Data protection authorities: DPAs such as France’s CNIL and Germany’s various Landesdatenschutzbeauftragte are active in AI guidance. They may expect detailed DPIAs and explanations for automated decisions under GDPR Article 22.
  • Sector regulators: Financial supervisors (e.g., BaFin, ACPR) and healthcare regulators may issue AI-specific guidance that aligns with the AI Act but adds sectoral expectations.
  • Liability regimes: The revised Product Liability Directive harmonizes civil liability for defective products, including software. National courts will interpret it in light of AI-specific risks. Expect early cases focusing on data quality, model updates, and adequacy of instructions.

Organizations operating across multiple jurisdictions should harmonize their compliance programs while remaining alert to local expectations. A consistent documentation set, risk classification methodology, and incident management process can satisfy both EU and national requirements.

Practical Compliance: Building Regulated AI Systems

To translate regulatory expectations into engineering practice, providers should adopt a lifecycle approach that integrates compliance from the outset.

1. Intake and classification
