
Cross-Border Data Transfers for AI: What Usually Breaks

Cross-border data transfers for artificial intelligence systems are a foundational operational reality for most organisations deploying advanced analytics, machine learning, or generative AI across the European Union. The legal framework governing these transfers is mature but intricate, and most compliance programmes falter at the intersection of technical architecture, contractual design, and governance discipline. In practice, failures rarely stem from a single catastrophic decision; they accumulate through small, unmanaged gaps in documentation, subprocessor oversight, and the misalignment of technical safeguards with legal expectations. This article examines the common failure modes in cross-border data transfers for AI, focusing on the practical mechanics of contracts, safeguards, subprocessors, logging, and governance, while distinguishing between EU-level obligations and national implementations.

Understanding what “fails” requires a clear view of what the law demands. The General Data Protection Regulation (GDPR) does not prohibit international data transfers; it conditions them on specific legal mechanisms designed to ensure an essentially equivalent level of protection. The European Commission’s July 2023 adequacy decision for the EU-U.S. Data Privacy Framework (DPF) added a new adequacy path for transfers to certified U.S. organisations, while the Standard Contractual Clauses (SCCs) adopted in 2021 and the updated Recommendations on supplementary measures by the European Data Protection Board (EDPB) remain central for most non-adequate destinations. For groups of companies, Binding Corporate Rules (BCRs) continue to be a viable but demanding option. The Schrems II judgment of the Court of Justice of the European Union (CJEU) remains the interpretive lens through which all safeguards are evaluated, emphasising that contractual and technical measures must be effective in practice, not merely on paper.

Legal Foundations and the Practical Meaning of “Adequacy”

At the EU level, adequacy decisions recognise that a third country ensures a level of protection essentially equivalent to that guaranteed by the GDPR. The Commission has adopted adequacy decisions for countries such as the United Kingdom, Japan, South Korea, Canada (commercial organisations), and, more recently, the United States under the DPF for participating organisations. It is crucial to note that adequacy is not universal: it applies only to specific recipients and, in the case of the U.S., only to organisations certified under the DPF and subject to the oversight of the U.S. Federal Trade Commission or the Department of Transportation. Transfers to non-certified U.S. entities or to entities in countries without an adequacy decision require SCCs, BCRs, or another approved safeguard.

National implementations can matter. Some EU Member States have introduced additional notification or authorisation requirements for transfers to non-adequate countries, particularly for public bodies or sensitive sectors. For example, Germany’s Federal Data Protection Act (BDSG) includes specific rules on transfers to third countries and can impose stricter conditions in certain contexts. France’s CNIL has issued guidance on the use of cloud services and the conditions under which data may be stored outside France or the EU, including sector-specific expectations. While the GDPR harmonises the core principles, practitioners should expect local supervisory authorities to scrutinise the practical effectiveness of safeguards, especially where large-scale AI training or high-risk processing is involved.

Failure Mode 1: Contracts That Do Not Reflect Technical Reality

Contracts are the first line of defence, but they are also the most common source of failure. Many organisations adopt the SCCs as boilerplate and treat signature as the completion of compliance. In practice, the SCCs require a transfer impact assessment (TIA) that maps the data flows, identifies the risks arising from the laws and practices of the destination country, and documents the supplementary measures deployed to mitigate those risks. The TIA is not a one-off exercise; it must be revisited when the AI system’s data processing changes, when new subprocessors are onboarded, or when the legal environment in the third country evolves.
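One way to keep the TIA tethered to the system rather than to a filing cabinet is to hold each assessment as structured data and flag it for review whenever a trigger event occurs. The following is a minimal Python sketch under assumed field names and trigger events; it illustrates the discipline, not a prescribed schema.

```python
from dataclasses import dataclass, field
from datetime import date

# Events that should force a TIA review (illustrative, not exhaustive).
REVIEW_TRIGGERS = {
    "new_subprocessor",
    "new_data_category",
    "destination_law_change",
    "pipeline_architecture_change",
}

@dataclass
class TransferImpactAssessment:
    """Hypothetical record of a TIA covering one AI data flow."""
    transfer_id: str
    destination_country: str
    legal_mechanism: str                 # e.g. "SCCs 2021 Module 2" or "DPF"
    data_categories: list[str]
    supplementary_measures: list[str]
    last_reviewed: date
    open_triggers: set[str] = field(default_factory=set)

    def record_event(self, event: str) -> None:
        """Flag the assessment for review if the event is a known trigger."""
        if event in REVIEW_TRIGGERS:
            self.open_triggers.add(event)

    def needs_review(self, max_age_days: int = 365) -> bool:
        """A TIA needs review if a trigger is open or the record is simply stale."""
        stale = (date.today() - self.last_reviewed).days > max_age_days
        return bool(self.open_triggers) or stale

tia = TransferImpactAssessment(
    transfer_id="TIA-2024-014",
    destination_country="US",
    legal_mechanism="SCCs 2021 Module 2",
    data_categories=["customer support transcripts"],
    supplementary_measures=["exporter-held encryption keys", "pseudonymisation"],
    last_reviewed=date(2024, 3, 1),
)
tia.record_event("new_subprocessor")
assert tia.needs_review()
```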

Common contractual failure points include:

  • Incomplete annexes: The SCCs require precise descriptions of the transfer’s purpose, duration, types of data, and categories of data subjects. For AI, this is often underspecified. “Training and improvement of AI models” is not sufficiently precise if the training data includes special category data or if the model will be used in contexts beyond the original purpose.
  • Unclear liability chains: When multiple entities are involved (data exporter, processor, subprocessor, and onward recipients), the SCCs must reflect the actual allocation of responsibilities. If a cloud provider’s standard terms conflict with the SCCs, the conflict must be resolved explicitly.
  • Missing or weak technical schedules: The SCCs’ technical schedule (Annex II) should describe encryption, key management, access controls, and logging in operational terms; a structured sketch of what that can look like follows this list. Vague statements like “industry-standard encryption” do not meet the standard set by the EDPB when the destination country’s laws enable access to data in a manner that undermines the safeguard.
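To illustrate what “operational terms” can mean for the technical schedule, the annex text can be mirrored as structured data that is versioned alongside the architecture and diffed when it changes. The snippet below is a hypothetical Python illustration; the field names are assumptions, and nothing here is a format required by the SCCs.

```python
# Hypothetical machine-readable mirror of an SCC technical schedule (Annex II).
technical_schedule = {
    "annex_version": "2024-06-v3",
    "encryption_in_transit": {"protocol": "TLS 1.3"},
    "encryption_at_rest": {
        "algorithm": "AES-256-GCM",
        "key_location": "EU",                 # where the keys live
        "key_custodian": "data exporter",     # who can actually use them
    },
    "access_controls": {
        "authentication": "SSO with MFA",
        "authorisation_model": "role-based, least privilege",
        "importer_access_to_plaintext": False,
    },
    "logging": {
        "access_logs_retained_days": 730,
        "transfer_logs_retained_days": 730,
    },
}

def schedule_is_specific(schedule: dict) -> bool:
    """Coarse check that the schedule names a concrete algorithm and key
    custodian rather than 'industry-standard' boilerplate."""
    at_rest = schedule.get("encryption_at_rest", {})
    return bool(at_rest.get("algorithm")) and bool(at_rest.get("key_custodian"))

assert schedule_is_specific(technical_schedule)
```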

From a practitioner’s perspective, the contract must be a live document tied to the system’s architecture. If the AI pipeline involves training in one jurisdiction and inference in another, the SCCs must address each stage and the associated risks. If the model is fine-tuned on data transferred back to the EU, the contract should specify the conditions under which that occurs and how the model’s outputs are governed.

Failure Mode 2: Misunderstanding the Role of the DPF and Its Limitations

The EU-U.S. Data Privacy Framework has simplified transfers to certified U.S. organisations, but it is not a panacea. The DPF’s effectiveness depends on the recipient’s certification status and the scope of the processing covered by that certification. A common failure is assuming that a vendor’s DPF certification covers all products and services. In many cases, only specific offerings are certified, or the certification applies only to certain data categories.

Additionally, the DPF does not eliminate the need for diligence. The U.S. surveillance legal landscape, including Section 702 of FISA and Executive Order 12333, remains relevant. The DPF provides redress mechanisms and requires participating organisations to maintain compliance, but it does not immunise them from lawful government access. If the AI system processes sensitive data or involves large-scale monitoring, organisations must still assess whether the DPF’s protections are sufficient in light of the EDPB’s Recommendations. In practice, many organisations fail to document why they consider the DPF adequate for a particular processing activity, relying instead on the existence of the adequacy decision without contextual analysis.

Failure Mode 3: Supplementary Measures That Are Not Actually Supplementary

The EDPB’s Recommendations describe both technical and organisational supplementary measures that may be necessary to ensure essential equivalence when using SCCs or BCRs. The failure here is often conceptual: organisations implement measures that look good on paper but do not neutralise the specific risks identified in the TIA.

Technical measures include:

  • Strong encryption in transit and at rest, with robust key management. The critical point is who holds the keys. If the cloud provider or a third party in the destination country can be compelled to provide access to keys or to decrypt data, encryption may not be an effective supplementary measure.
  • Pseudonymisation that reduces identifiability to the point that re-identification is not reasonably likely, given the state of the art and available resources. For AI training, pseudonymisation must be applied in a way that persists across the pipeline and is not trivially reversible; a minimal sketch follows this list.
  • Data minimisation at the model level, such as training on aggregated or synthetic datasets where feasible, or using privacy-preserving techniques like differential privacy or federated learning. These are not silver bullets; they must be implemented with measurable parameters and validated performance.
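To make the pseudonymisation point concrete, a keyed, deterministic mapping replaces the same identifier with the same token across the whole pipeline while the secret key stays with the exporter. The sketch below uses only the Python standard library and is a minimal illustration under that assumption; it is not a complete pseudonymisation scheme, and tokens over low-entropy identifiers remain guessable if the key ever leaks.

```python
import hashlib
import hmac

# Generated and held by the data exporter in the EU; never shipped with the
# pseudonymised data (illustrative assumption).
PSEUDONYMISATION_KEY = b"replace-with-a-randomly-generated-secret"

def pseudonymise(identifier: str) -> str:
    """Replace an identifier with a keyed HMAC-SHA256 token.

    Determinism keeps joins and deduplication working across the training
    pipeline; without the key, reversing the mapping requires guessing the
    original identifier.
    """
    digest = hmac.new(PSEUDONYMISATION_KEY, identifier.encode("utf-8"),
                      hashlib.sha256)
    return digest.hexdigest()

record = {"user_email": "alice@example.com", "ticket_text": "..."}
safe_record = {
    "user_token": pseudonymise(record["user_email"]),  # stable pseudonym
    "ticket_text": record["ticket_text"],              # free text still needs review
}
```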

Organisational measures include:

  • Transparency obligations towards data subjects and supervisory authorities regarding the transfer and the safeguards applied.
  • Government access request policies that require the data importer to challenge unlawful requests and to notify the data exporter promptly.

A frequent failure is the use of “in-transit only” encryption without addressing the risk of compelled access to data at rest or to decryption keys. Another is relying on contractual confidentiality obligations in the destination country without assessing whether those obligations would prevail over national security laws.

Failure Mode 4: Subprocessor Governance Gaps

AI supply chains are deep. A data exporter may contract with a platform provider that relies on a cloud infrastructure provider, which in turn uses a managed database service, which may rely on a third-party analytics tool. Each layer is a potential subprocessor. The GDPR requires that processors engage subprocessors only with the controller’s prior specific or general written authorisation, and that the same data protection obligations flow down through the chain. In practice, subprocessor governance often breaks down in three ways:

  1. Incomplete inventories: Organisations maintain a list of approved subprocessors but fail to capture “shadow subprocessors” introduced by the primary processor, such as monitoring agents, logging services, or model evaluation partners.
  2. Static contracts: The data processor’s obligations to its subprocessors may be documented once and never revisited, even when the subprocessor changes its own safeguards or is acquired by a company in a different jurisdiction.
  3. Weak audit rights: The SCCs require the processor to provide the exporter with evidence of compliance, but many organisations accept attestations without testing them. For AI, this is particularly risky because subprocessors may process training data in ways that are not visible in standard logs.

From an AI systems perspective, subprocessor risk is not limited to data storage. It includes:

  • Model training partners who may use the data for their own purposes unless explicitly prohibited.
  • Data labelling services that may be located in non-adequate countries and handle raw or pseudonymised data.
  • Observability and monitoring tools that capture prompts, outputs, or user interactions, potentially creating new transfer scenarios.

Effective governance requires dynamic subprocessor management: pre-approval lists, change notification obligations, and the right to object within defined timeframes. It also requires that the data exporter be able to verify, at least on a risk-based basis, that subprocessors implement appropriate technical and organisational measures.
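A lightweight way to operationalise pre-approval, change notification, and the right to object is to keep the subprocessor inventory as data that deployment tooling can query before an integration goes live. The Python sketch below is illustrative; the fields, the 30-day objection window, and the approval rule are assumptions, not terms from any particular contract.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Subprocessor:
    name: str
    service: str                     # e.g. "data labelling", "observability"
    country: str
    transfer_mechanism: str          # e.g. "SCCs 2021 Module 3", "DPF", "EU only"
    approved: bool
    notified_on: date | None = None  # when the exporter was notified of a change

OBJECTION_WINDOW = timedelta(days=30)   # assumed contractual right-to-object period

registry = [
    Subprocessor("LabelCo", "data labelling", "IN", "SCCs 2021 Module 3", True),
    Subprocessor("TraceWorks", "observability", "US", "DPF", False,
                 notified_on=date(2024, 9, 1)),
]

def may_use(sub: Subprocessor, today: date) -> bool:
    """A subprocessor may be used only if approved, or if it was notified and the
    objection window has lapsed without objection (simplified rule)."""
    if sub.approved:
        return True
    if sub.notified_on is not None:
        return today - sub.notified_on > OBJECTION_WINDOW
    return False

for sub in registry:
    print(sub.name, "usable:", may_use(sub, date(2024, 9, 15)))
```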

Failure Mode 5: Logging and Audit Trails That Do Not Prove Compliance

Logging is often treated as an operational concern rather than a compliance asset. This is a mistake. In the event of an investigation or a complaint, the ability to demonstrate what data was transferred, when, to whom, and under what safeguards is essential. For AI systems, the logging requirements are more nuanced than for traditional databases.

Common logging failure modes include:

  • Lack of transfer metadata: Logs may show that data moved between systems but not the legal basis (e.g., SCCs vs. DPF), the specific annex version used, or the TIA reference.
  • Insufficient retention: Logs are kept for short periods, making it impossible to reconstruct historical transfers for long-running training projects.
  • No linkage to model versions: AI training often involves iterative experiments. If logs do not tie data batches to model versions, it is impossible to respond to data subject requests for erasure or to demonstrate that a particular model was not trained on unlawful transfers.
  • Weak access logging: If data is accessed by personnel in a third country, logs must capture who accessed it, from where, and for what purpose. Without this, you cannot demonstrate control.

From a regulatory perspective, the absence of robust logs undermines the accountability principle. The EDPB Recommendations emphasise the need for technical measures that are verifiable. Logging is part of that verifiability. In practice, auditors and supervisory authorities will ask for evidence that transfers occurred under the correct legal instrument and that supplementary measures were in place at the time of transfer.
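As an illustration of what transfer metadata can look like, each movement of data out of the EU can be logged with the legal instrument, the annex version, the TIA reference, and the model version that consumed the batch. The sketch below is a hypothetical Python example; the field names and values are assumptions rather than a standard schema.

```python
import json
from datetime import datetime, timezone

def log_transfer(*, dataset_id: str, destination: str, importer: str,
                 legal_basis: str, annex_version: str, tia_ref: str,
                 model_version: str | None = None) -> str:
    """Emit one structured transfer log entry as a JSON line (illustrative)."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "dataset_id": dataset_id,
        "destination": destination,
        "importer": importer,
        "legal_basis": legal_basis,       # e.g. "SCCs 2021 Module 2" or "DPF"
        "annex_version": annex_version,   # which contractual annex applied
        "tia_ref": tia_ref,               # which TIA covered this transfer
        "model_version": model_version,   # training run that consumed the batch
    }
    line = json.dumps(entry)
    # In a real system this would go to append-only, access-controlled storage
    # with a retention period long enough to cover the training project.
    print(line)
    return line

log_transfer(dataset_id="support-tickets-2024-09", destination="US",
             importer="ExampleAI Inc.", legal_basis="SCCs 2021 Module 2",
             annex_version="2024-06-v3", tia_ref="TIA-2024-014",
             model_version="support-clf-v12")
```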

Failure Mode 6: Confusing Data Controller and Processor Roles in AI Pipelines

AI deployments often blur the lines between controller and processor. A company may use a third-party AI platform to build models on its own data, but the platform may also use customer data to improve its general models. The GDPR’s definitions are precise: the controller determines the purposes and means of processing; the processor processes on behalf of the controller. In many AI arrangements, the platform provider acts as a controller for the data used to improve its own models and as a processor for the customer’s specific use case. This dual role must be clearly delineated in the contract, and the transfer mechanisms must reflect the role for each processing activity.

Failure to separate these roles leads to:

  • Incorrect legal basis: Using SCCs intended for processor-to-processor transfers when the provider is a controller for its own purposes.
  • Unclear data subject rights: When a data subject requests erasure, it may be unclear whether the request applies to the customer’s model or the provider’s general model improvements.
  • Confusion over subprocessors: Subprocessors engaged for general model improvement may not be covered by the customer’s authorisation.

Practically, organisations should map each AI processing activity, identify the parties’ roles, and ensure that the transfer mechanism (SCCs, DPF, or BCRs) matches the role and the data flow.
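One way to make that mapping auditable is to record each processing activity with the provider’s role and the mechanism relied on, and to flag combinations that do not fit, such as a controller-side activity covered only by a processor-facing SCC module. The sketch below is a simplified Python illustration; the activity names and the matching rule are assumptions.

```python
# 2021 SCC modules for reference: 1 = C2C, 2 = C2P, 3 = P2P, 4 = P2C.
activities = [
    {"activity": "fine-tuning on customer data",
     "provider_role": "processor", "mechanism": "SCCs Module 2"},
    {"activity": "improvement of the provider's general models",
     "provider_role": "controller", "mechanism": "SCCs Module 2"},  # mismatch
]

def mechanism_matches_role(item: dict) -> bool:
    """Coarse consistency check between the provider's role and the mechanism."""
    role, mechanism = item["provider_role"], item["mechanism"]
    if role == "controller":
        return mechanism in {"SCCs Module 1", "SCCs Module 4", "DPF", "BCRs"}
    if role == "processor":
        return mechanism in {"SCCs Module 2", "SCCs Module 3", "DPF", "BCRs"}
    return False

for item in activities:
    if not mechanism_matches_role(item):
        print("Review needed:", item["activity"])
```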

Failure Mode 7: Overreliance on Anonymisation or Pseudonymisation

Anonymised data is not personal data and therefore falls outside the GDPR’s scope. However, the bar for anonymisation is high. The EDPB has made clear that anonymisation must be irreversible given the means reasonably likely to be used. In AI, this is difficult to achieve because models can memorise training data and enable re-identification. Pseudonymisation, while valuable, does not remove the data from the GDPR’s scope. A common failure is to treat pseudonymised training data as “safe” for transfer without assessing whether the pseudonymisation can be reversed or circumvented, especially when combined with other datasets.

In practice, organisations should:

  • Document the pseudonymisation technique and its resilience to re-identification attacks.
  • Assess whether the pseudonymised data can be re-identified by the importer or by third parties, including government entities with legal powers.
  • Consider differential privacy or synthetic data generation, with measurable privacy budgets and validation; a minimal sketch of a tracked privacy budget follows this list.
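As an example of a measurable privacy budget, the classic Laplace mechanism adds noise calibrated to a query’s sensitivity and an explicit epsilon, and the epsilon spent across releases can be tracked and capped. The Python sketch below is a textbook illustration, not a production differential privacy implementation; the sensitivity and budget values are assumptions.

```python
import random

class PrivacyBudget:
    """Tracks the cumulative epsilon spent across releases (illustrative)."""
    def __init__(self, total_epsilon: float):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        if self.spent + epsilon > self.total_epsilon:
            raise RuntimeError("Privacy budget exhausted; refuse the release.")
        self.spent += epsilon

def noisy_count(true_count: int, sensitivity: float, epsilon: float,
                budget: PrivacyBudget) -> float:
    """Release a count via the Laplace mechanism (noise scale = sensitivity / epsilon)."""
    budget.charge(epsilon)
    scale = sensitivity / epsilon
    # A Laplace(0, scale) sample is the difference of two exponentials with mean scale.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_count + noise

budget = PrivacyBudget(total_epsilon=1.0)
print(noisy_count(true_count=1234, sensitivity=1.0, epsilon=0.1, budget=budget))
```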

Failure Mode 8: Ignoring Government Access Risks in the Destination Country

The CJEU in Schrems II made clear that laws in the destination country that enable indiscriminate or disproportionate access to data can undermine the effectiveness of SCCs. The TIA must assess the specific legal framework of the destination country, including surveillance laws, and the practical likelihood of access. Many TIAs are generic, stating that “the destination country has laws that may allow access” without analysing the scope, safeguards, and oversight.

For AI systems, the risk is not theoretical. Training datasets may be large and contain sensitive information. Inference logs may reveal user queries that are personal or sensitive. If the destination country’s laws permit access to such data without independent judicial oversight and without necessity and proportionality, the SCCs alone may be insufficient. Supplementary measures may include:

  • Storing data in the EU and providing remote access only under strict controls.
  • Using encryption with keys held exclusively in the EU and not accessible to the importer; a minimal sketch follows this list.
  • Architecting the system to minimise the transfer of raw data, using on-premise or EU-based processing where feasible.
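To illustrate the exporter-held-keys measure above: data can be encrypted in the EU before it leaves, with only ciphertext shipped and the key never shared with the importer. The sketch below uses the third-party cryptography package (Fernet, an authenticated AES-based scheme) and is a simplified illustration that omits real key-management infrastructure.

```python
# pip install cryptography
from cryptography.fernet import Fernet

# Generated and stored in an exporter-controlled, EU-based key management
# system; never transmitted to the importer (illustrative assumption).
eu_held_key = Fernet.generate_key()
cipher = Fernet(eu_held_key)

training_batch = b'{"ticket_text": "...", "label": "refund"}'

# Only ciphertext crosses the border; the importer stores blobs it cannot read.
ciphertext = cipher.encrypt(training_batch)

# Decryption happens only on exporter-controlled systems in the EU.
assert cipher.decrypt(ciphertext) == training_batch
```

Note that this pattern protects data held by the importer but is incompatible with the importer processing plaintext, for example to run training on it; where plaintext processing abroad is unavoidable, encryption alone will not carry the weight of a supplementary measure.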

In some cases, the risk may be unmitigable, and the transfer should not proceed. This is a difficult conclusion but one that the regulatory framework requires.

Failure Mode 9: Misunderstanding the Role of AI Model Weights and Outputs

Transfers of AI model weights or parameters are often treated as non-personal data. This is not always correct. If the model has been trained on personal data and the weights encode patterns that could be used to infer personal information, the model may be personal data. Similarly, outputs from an AI system may contain personal data. The transfer of the model or its outputs to a third country may therefore be a regulated transfer. Organisations frequently fail to assess these scenarios.

Practical steps include:

  • Assess whether the model or its outputs contain or could reveal personal data; a coarse, illustrative output check follows this list.
  • Document the legal basis for transferring the model or outputs.
  • Apply technical measures, such as differential privacy or model compression, that reduce the risk of re-identification.
  • Ensure that contracts cover the transfer of models and outputs explicitly.
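As a coarse first pass at the first step above, generated outputs can be scanned for obvious personal-data patterns before they are logged or shipped outside the EU. The regexes below are deliberately simplistic and illustrative; pattern matching is a signal for triage, not a substitute for assessing what the model can actually reveal.

```python
import re

# Deliberately simple patterns; a real assessment needs far more than regexes.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "iban":  re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def flag_personal_data(output_text: str) -> list[str]:
    """Return the names of patterns found in a model output (coarse signal only)."""
    return [name for name, pattern in PATTERNS.items()
            if pattern.search(output_text)]

print(flag_personal_data("Contact me at alice@example.com or +44 20 7946 0958"))
# -> ['email', 'phone']
```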

Failure Mode 10: Governance and Accountability Gaps

Even where contracts and technical measures are sound, governance gaps can cause compliance to fail. The GDPR’s accountability principle requires organisations to demonstrate compliance. For cross-border AI transfers, this means:

  • Clear ownership: A designated person or team responsible for transfer compliance, with authority to block transfers that do not meet requirements.
  • Documented policies: Written policies for subprocessor approval, TIA updates, supplementary measures, and incident response.
  • Training and awareness: Engineers and product teams must understand the compliance constraints and be able to design systems accordingly.
  • Regular review: Periodic audits of transfers, subprocessors, and logs, with remediation plans for gaps.

From an AI systems practitioner’s perspective, governance must be embedded in the development lifecycle. Data protection by design and by default should be applied to the AI pipeline, including transfer controls. For example, a model training job should check whether the data it uses is approved for transfer and whether the destination meets the required safeguards.
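The training-job check described above can be a literal pre-flight gate in the pipeline: before a job reads a dataset, it verifies that the dataset is approved for the training region and that a current mechanism and TIA are on file. The Python sketch below is hypothetical; the catalogue, its fields, and the policy rule are assumptions rather than a standard API.

```python
# Hypothetical data catalogue: dataset id -> transfer approval metadata.
CATALOGUE = {
    "support-tickets-2024-09": {
        "approved_destinations": {"EU", "US"},
        "legal_mechanism": "SCCs 2021 Module 2",
        "tia_ref": "TIA-2024-014",
        "tia_current": True,
    },
    "clinical-notes-2023": {
        "approved_destinations": {"EU"},
        "legal_mechanism": None,
        "tia_ref": None,
        "tia_current": False,
    },
}

class TransferNotApproved(Exception):
    pass

def preflight_check(dataset_id: str, training_region: str) -> None:
    """Block the training job unless the dataset may lawfully reach the region."""
    meta = CATALOGUE.get(dataset_id)
    if meta is None:
        raise TransferNotApproved(f"{dataset_id}: not in the data catalogue")
    if training_region not in meta["approved_destinations"]:
        raise TransferNotApproved(f"{dataset_id}: not approved for {training_region}")
    if training_region != "EU" and not (meta["legal_mechanism"] and meta["tia_current"]):
        raise TransferNotApproved(f"{dataset_id}: no current mechanism/TIA on file")

preflight_check("support-tickets-2024-09", "US")       # passes silently
try:
    preflight_check("clinical-notes-2023", "US")        # blocked
except TransferNotApproved as exc:
    print("Blocked:", exc)
```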

Comparing Approaches Across European Countries

While the GDPR sets a common baseline, national implementations and supervisory authority practices vary. In Germany, supervisory competence is shared between the federal authority and the authorities of the Länder, and the BDSG can attach stricter conditions to certain transfers, so expectations can differ between states and between sectors.
