
Biotech Data Rules Worldwide: Genomics, Privacy, Cross-Border Transfers, Secondary Use

The governance of genomic and health data represents one of the most complex intersections of fundamental rights, scientific progress, and international commerce. For entities operating within the biotechnology and AI sectors, the legal landscape is not a monolith but a fragmented mosaic of regional directives, national statutes, and conflicting interpretations of what constitutes personal data versus anonymized information. Understanding these distinctions is not merely a compliance exercise; it is a prerequisite for viable cross-border research and the development of robust, generalizable AI models. This analysis navigates the regulatory frameworks of the European Union, the United States, the United Kingdom, China, Japan, Australia, and Canada, focusing on the practical implications for data processing, consent, and international transfers.

The European Union: A Rights-Based Ecosystem

The European Union approaches health data through a dual lens of fundamental rights and economic utility. The General Data Protection Regulation (GDPR) establishes the baseline, but it is increasingly supplemented by sector-specific legislation designed to unlock the potential of health data while maintaining strict control.

The GDPR Foundation and Special Categories

At its core, the GDPR classifies data concerning health, including genetic data, as a special category of personal data (Article 9). Processing such data is prohibited by default unless a specific condition is met. For biotech research, the most relevant conditions are explicit consent, vital interests (rarely applicable to research), public interest in the area of public health, or scientific research purposes subject to suitable safeguards.

A critical distinction in the EU is the concept of pseudonymization. Under GDPR, pseudonymized data—where identifiers are replaced by a token—remains personal data if the token can be reversed using additional information held separately. This means that even if a genomic dataset is stripped of names, it remains subject to GDPR if the researcher holds the key to re-identification. True anonymization, where the data is no longer identifiable by any means reasonably likely to be used, is the only escape route from GDPR. However, the bar for anonymization is exceptionally high. Regulators and courts generally accept that genomic data, being inherently unique and immutable, is practically impossible to anonymize effectively because it constitutes a permanent identifier.
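To make the distinction concrete, below is a minimal Python sketch of tokenization with a separately held re-identification key; the record fields and identifiers are invented for illustration. Because the key exists somewhere, the tokenized output is pseudonymized, and therefore still personal data under the GDPR, rather than anonymized.

```python
import secrets

def pseudonymize(records):
    """Replace direct identifiers with random tokens, returning the
    tokenized records plus the re-identification key (token -> identifier).
    Because that key exists and is merely held separately, the output
    remains personal data under the GDPR."""
    key_store = {}   # held separately, e.g. by the data controller only
    tokenized = []
    for record in records:
        token = secrets.token_hex(16)
        key_store[token] = record["patient_id"]
        tokenized.append({"token": token, "variants": record["variants"]})
    return tokenized, key_store

cohort = [
    {"patient_id": "NHS-4821-77", "variants": ["BRCA1 c.68_69delAG"]},
    {"patient_id": "NHS-9034-12", "variants": ["TP53 p.R175H"]},
]

shareable, reidentification_key = pseudonymize(cohort)
# Deleting every copy of `reidentification_key` is necessary (but, for
# genomic data, rarely sufficient) to move towards anonymization.
```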

Consent Models and the Research Exemption

The tension between individual autonomy and the collective good of research is palpable in the EU’s consent requirements. GDPR Article 9(2)(a) requires explicit consent for processing special categories. However, Recital 33 acknowledges that research purposes often cannot be fully specified at the time of data collection. Consequently, the GDPR permits a form of broad consent to certain areas of scientific research, subject to recognised ethical standards and any stricter national law.

Member States have implemented this flexibility differently. For instance, Germany (through the Federal Data Protection Act) and France (through its Data Protection Act, as enforced by the CNIL) have historically required specific consent for each research project, though recent updates are aligning more closely with the GDPR’s allowance for broad consent. Conversely, Estonia and Finland operate presumed-consent or opt-out models for certain biobank and registry research, under which citizens’ data can be used unless they explicitly object. This national divergence creates friction for pan-European research consortia, requiring a patchwork of legal bases depending on where the data originates.

The European Health Data Space (EHDS)

To harmonize this fragmented landscape, the EU is finalizing the European Health Data Space (EHDS) regulation. This is a game-changer for biotech. The EHDS creates a framework for the secondary use of electronic health data (EHD) for research, innovation, and policy-making. It establishes a “Health Data Access Body” (HDAB) in each Member State, which will act as a trusted intermediary. Researchers or AI developers will apply to the HDAB for access to anonymized or pseudonymized data. The EHDS aims to standardize the conditions for access across the EU, effectively creating a single market for health data. For AI practitioners, this means that eventually, accessing training data from multiple EU countries may be streamlined through a single procedural interface, though the technical standards for data formats remain a work in progress.

AI Act and High-Risk Systems

While the GDPR regulates the data, the Artificial Intelligence Act (AI Act) regulates the system processing that data. AI systems used in biotech for selecting patients in clinical trials, or for profiling based on genomic data, are often classified as “High-Risk.” This triggers obligations regarding data governance, data quality, and the mitigation of bias. The AI Act requires that training, validation, and testing data sets be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the intended purpose. In the context of genomic data, this raises the bar: an AI model trained on a dataset that lacks diversity (e.g., predominantly European ancestry) may fail the “representativeness” requirement, potentially barring it from the EU market.
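As a rough illustration of what a data governance check for representativeness might look like in practice, the sketch below tallies ancestry labels in a training cohort and flags groups below a minimum share. The threshold, group labels, and counts are illustrative assumptions; the AI Act itself prescribes no numeric cut-off.

```python
from collections import Counter

def representativeness_report(ancestry_labels, min_share=0.05):
    """Flag ancestry groups whose share of the training set falls below
    min_share. The 5% threshold is purely illustrative; the AI Act asks
    for data 'sufficiently representative' of the intended purpose but
    sets no numeric bar."""
    counts = Counter(ancestry_labels)
    total = len(ancestry_labels)
    return {
        group: {"share": round(n / total, 3), "underrepresented": n / total < min_share}
        for group, n in counts.items()
    }

# Invented cohort: heavily skewed towards European ancestry.
labels = ["EUR"] * 9200 + ["AFR"] * 350 + ["EAS"] * 300 + ["SAS"] * 150
print(representativeness_report(labels))
```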

The United States: A Sectoral and State-Based Patchwork

The United States lacks a comprehensive federal privacy law equivalent to the GDPR. Instead, it relies on a sectoral approach, resulting in a complex environment for biotech.

HIPAA and the “De-identification” Loophole

The Health Insurance Portability and Accountability Act (HIPAA) is the primary federal regulation. It protects “Protected Health Information” (PHI). However, HIPAA applies mainly to “Covered Entities” (healthcare providers, insurers) and their “Business Associates.” Biotech startups, direct-to-consumer genetic testing companies, and academic researchers often fall outside these definitions unless they are working directly with a healthcare provider.

Crucially, HIPAA allows for the “Safe Harbor” method of de-identification. If 18 specific categories of identifiers (including names, most elements of dates, and all geographic subdivisions smaller than a state) are removed, and the entity has no actual knowledge that the remaining information could identify the individual, the data is no longer considered PHI. This is a stark contrast to the EU’s strict stance on the identifiability of genomic data. A US-based AI company can often acquire genomic datasets that have undergone HIPAA Safe Harbor de-identification and use them for training without triggering HIPAA. However, this does not exempt it from other laws.
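The sketch below conveys the flavor of Safe Harbor de-identification: identifier fields are stripped while year-level dates and state-level geography are retained. The field names cover only a subset of the 18 categories and are hypothetical; note that the genotype itself survives the process, which is exactly the gap the EU approach refuses to accept.

```python
# Illustrative subset of the 18 Safe Harbor identifier categories; a real
# pipeline must cover all 18, plus free-text fields, and the entity must
# have no actual knowledge that the remainder is identifiable.
SAFE_HARBOR_FIELDS = {
    "name", "street_address", "city", "zip_code", "phone", "email",
    "ssn", "medical_record_number", "health_plan_id", "device_serial",
    "date_of_birth", "admission_date",
}

def strip_safe_harbor_identifiers(record):
    """Drop identifier fields; keep year-only dates and state-level
    geography, which Safe Harbor permits."""
    cleaned = {k: v for k, v in record.items() if k not in SAFE_HARBOR_FIELDS}
    if "date_of_birth" in record:
        cleaned["birth_year"] = record["date_of_birth"][:4]
    return cleaned

record = {
    "name": "J. Doe", "date_of_birth": "1984-03-07", "zip_code": "98107",
    "state": "WA", "genotype": {"rs429358": "CT", "rs7412": "CC"},
}
# The genotype survives untouched -- de-identified for HIPAA purposes,
# yet arguably identifiable under the EU/Canadian standard.
print(strip_safe_harbor_identifiers(record))
```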

The Rise of State Laws and Genetic Privacy

The vacuum left by federal inaction is being filled by aggressive state legislation. California treats genetic data as “sensitive personal information” under the CCPA/CPRA and, through its own Genetic Information Privacy Act, requires opt-in consent for direct-to-consumer genetic data collection and use. Washington and Virginia have passed specific genetic privacy acts that prohibit the collection, use, or sharing of genetic data without explicit consent. These laws often apply to any entity collecting data from residents of that state, regardless of where the company is headquartered. For AI developers, this means that a dataset compliant with HIPAA might still violate Washington State law if used without specific consent.

FTC Enforcement and “De-identified” Data

The Federal Trade Commission (FTC) has recently signaled that it will treat claims that data is “de-identified” as deceptive if the data can in fact be re-identified. This brings the US approach closer to the EU reality, even if the statutes differ. If an AI company buys genomic data marketed as de-identified, but the vendor retains a re-identification key, the FTC may intervene. This creates a due diligence burden for AI practitioners: verifying the technical methods of anonymization used by data vendors.

The United Kingdom: Post-Brexit Divergence

Following Brexit, the UK retained the GDPR in domestic law as “UK GDPR,” largely maintaining continuity. However, the direction of travel suggests a divergence aimed at fostering innovation.

The Common Law Duty of Confidentiality

Under the common law duty of confidentiality, patient data is confidential. Consent is the primary legal basis for processing, but it is not the only one. The UK has a robust framework of “section 251” support (under the NHS Act 2006), allowing the use of confidential patient data without consent for medical research if approved by the Confidentiality Advisory Group (CAG). This practical mechanism, alongside broad-consent resources such as UK Biobank, facilitates large-scale genomic and health-record studies without requiring individual opt-in for every specific use.

The Data Protection and Digital Information (DPDI) Bill

The UK is currently reforming its data protection laws through the DPDI Bill. A key proposal is the introduction of “recognised legitimate interests” for research. This would codify and potentially expand the ability to use data for research without relying on complex balancing tests. Furthermore, the UK is exploring “Digital Verification Services” and a “National Data Library” to facilitate secure data sharing for research. For AI developers, the UK aims to be a “pro-innovation” jurisdiction, potentially allowing broader secondary use of data than the EU, provided the research is for a “reasonable purpose.”

China: State Sovereignty and Security

China’s approach is defined by national security and data sovereignty. The regulatory framework is dense and rapidly evolving.

PIPL and the “Human Genetic Resources” Regime

The Personal Information Protection Law (PIPL) is China’s GDPR equivalent. It requires separate consent for processing sensitive personal information (which includes biometric and genetic data). However, PIPL operates alongside the Regulation on Human Genetic Resources (HGR). This specific regime governs the collection, preservation, and provision of human genetic resources (HGR) to foreign entities.

Any collaboration between a Chinese entity and a foreign entity (including EU or US subsidiaries) regarding HGR triggers strict reporting obligations to the Ministry of Science and Technology (MOST). Failure to comply can result in severe penalties, including the voiding of contracts. This effectively creates a “data airlock” for international AI research involving Chinese genomic data.

Cross-Border Transfer Mechanisms

Transferring personal data out of China requires one of three mechanisms: a security assessment by the Cyberspace Administration of China (CAC), a filing based on the CAC “Standard Contract,” or a personal information protection certification. For genomic data, which is deemed “sensitive,” the threshold triggering the stricter assessment is low. This makes it extremely difficult to pool Chinese genomic data into global AI training sets. Consequently, many global biotech firms are building isolated data silos for China, training models locally that cannot be exported.

Japan: APPI and the My Number System

Japan’s Act on the Protection of Personal Information (APPI) is generally considered business-friendly but rigorous in its enforcement.

De-identification Standards

Japan defines “pseudonymously processed information” (analogous to pseudonymized data) and “anonymously processed information.” The APPI allows anonymously processed information to be used and shared without consent, though some obligations remain, including a prohibition on attempting re-identification. The standards for anonymization are defined by government ordinances and are technically strict. Unlike the US HIPAA Safe Harbor, Japan requires a risk assessment to ensure that re-identification is not reasonably possible.

My Number and Data Linkage

Japan is aggressively promoting the use of its “My Number” system (a unique social security ID) to link health insurance data with other social data for research purposes. The government has established “Special Zones for Regulatory Reform” to allow for easier data sharing in specific regions (like Fukuoka) to accelerate AI development in healthcare. This top-down approach contrasts with the EU’s bottom-up, rights-focused model.

Australia and Canada: The “Consent” vs. “Opt-Out” Divide

Australia: The Privacy Act Review

Australia is currently reforming its Privacy Act 1988. A major point of contention is the definition of “personal information.” Currently, the Office of the Australian Information Commissioner (OAIC) takes a strict view: if there is a “real possibility” of identification, the data is personal. This aligns with the EU.

However, Australia has a unique mechanism: the opt-out My Health Record system, underpinned by Individual Healthcare Identifiers (IHIs). While not a biobank per se, the legislative framework allows the use of de-identified data for public health research without explicit consent, provided it is approved by a Human Research Ethics Committee (HREC). The proposed reforms aim to clarify the conditions for “secondary use” of data, moving towards a model similar to the EU’s EHDS but with distinct Australian characteristics.

Canada: PIPEDA and the “Real Risk” Standard

Canada’s Personal Information Protection and Electronic Documents Act (PIPEDA) governs the private sector. Like the EU, Canada treats genetic information as sensitive. However, the interpretation of “de-identification” is currently under review by the Office of the Privacy Commissioner (OPC).

The OPC has argued that “de-identified” data should still be considered personal information if there is a “serious possibility” of re-identification. This creates a high standard similar to the EU. Canada is also developing a Consumer Privacy Protection Act (CPPA) (Bill C-27), which includes the Artificial Intelligence and Data Act (AIDA). AIDA proposes obligations for high-impact AI systems, potentially requiring risk mitigation for AI trained on Canadian health data. Canada also has specific Tri-Council Policy Statement (TCPS2) guidelines for research ethics, which govern consent for secondary use of data in research, often allowing for broad consent if the original consent covered “future research.”

Comparative Analysis: Implications for AI-Driven Biotech

For AI practitioners, the divergence in these frameworks creates specific operational challenges.

The Anonymization Fallacy

The most significant risk is relying on the US definition of “de-identified” data for global AI development. An AI model trained on HIPAA-compliant de-identified data may be legally sound in the US but illegal to process in the EU or Canada if the data can be reasonably linked back to an individual. Genomic data is the ultimate identifier. Even if names are removed, the DNA sequence itself is unique. Therefore, for any AI system intended for the global market, the strictest standard (EU/Canada) should be the baseline for data governance.
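A toy illustration of why: even in a table stripped of every conventional identifier, a handful of genotype calls known from elsewhere (a consumer test, a relative’s data) can often isolate a single record. The SNPs, genotypes, and rows below are invented.

```python
def matching_records(dataset, probe):
    """Return records whose genotypes are consistent with the probe SNPs."""
    return [
        r for r in dataset
        if all(r["genotype"].get(snp) == call for snp, call in probe.items())
    ]

deidentified = [
    {"row": 1, "genotype": {"rs429358": "CT", "rs7412": "CC", "rs1800562": "GG"}},
    {"row": 2, "genotype": {"rs429358": "TT", "rs7412": "CC", "rs1800562": "GA"}},
    {"row": 3, "genotype": {"rs429358": "CT", "rs7412": "CT", "rs1800562": "GG"}},
]

# Knowing the target's genotype at just a few SNPs is often enough to
# isolate a single row -- and real genomes offer millions of such markers.
probe = {"rs429358": "CT", "rs7412": "CC", "rs1800562": "GG"}
print(matching_records(deidentified, probe))   # -> only row 1
```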

Cross-Border Transfer Mechanisms

Transferring data out of the EU requires an “adequacy decision” (like the EU-US Data Privacy Framework) or Standard Contractual Clauses (SCCs). China requires its own filing or security assessment. The US imposes no general federal restriction on outbound transfers, but state laws (like California’s) effectively restrict transfers that violate consumer rights.

Practical Strategy:

Biotech firms are increasingly adopting a “Data Embassy” model. Instead of attempting to harmonize data flows across borders, they establish data processing environments within each jurisdiction (e.g., a server cluster in Germany for EU data, a cluster in Shanghai for Chinese data). AI models are trained locally, and only the model parameters (not the raw data) are shared globally. This is often the most viable path to compliance under the strictest interpretations of PIPL and GDPR.
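A minimal sketch of the pattern, with invented region names, paths, and class names: each jurisdiction runs training against data that never leaves its local environment, and only a parameters artifact is exported.

```python
import json
from dataclasses import dataclass

@dataclass
class JurisdictionEnclave:
    """One 'data embassy': raw data lives only under data_path inside the
    region, and only trained parameters are written to an export file."""
    region: str
    data_path: str   # local-only storage, never replicated abroad

    def train_and_export(self, train_fn, export_path):
        # Training runs entirely inside the regional environment.
        params = train_fn(self.data_path)
        # Only the parameter artifact crosses the border.
        with open(export_path, "w") as f:
            json.dump({"region": self.region, "weights": params}, f)

def toy_train(data_path):
    # Stand-in for a real training job over the local genomic data.
    return [0.12, -0.4, 0.9]

for enclave in [JurisdictionEnclave("eu-de", "/secure/eu/genomics"),
                JurisdictionEnclave("cn-sh", "/secure/cn/genomics")]:
    enclave.train_and_export(toy_train, f"weights_{enclave.region}.json")
```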

Secondary Use and AI Training

The “Secondary Use” of data—using data collected for one purpose (e.g., clinical diagnosis) for another (e.g., AI training)—is the legal minefield.

  • EU: Secondary use is permitted under “scientific research” provisions, but the definition of research is broad. The EHDS will eventually streamline this, but currently, it requires a legal basis per dataset.
  • UK: Moving towards “recognised legitimate interests,” which may allow commercial AI training without explicit consent, provided it does not override individual rights.
  • China: Secondary use of HGR by foreign entities is strictly prohibited without separate approval.
  • US: If the data is de-identified under HIPAA, secondary use is generally free of restriction, unless a specific state law (like the Illinois Genetic Information Privacy Act) applies.

Consent Fatigue vs. Dynamic Consent

Traditional “broad consent” is becoming legally fragile. Regulators are increasingly demanding that individuals have the ability to withdraw consent or receive updates on how their data is used. This has led to the development of “Dynamic Consent” technologies, often using blockchain or secure APIs, allowing participants to toggle permissions for specific research projects. In the EU and UK, this aligns with the principle of autonomy. In the US, it is a best practice to mitigate FTC scrutiny regarding deceptive data practices.
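At its core, a dynamic consent layer is a per-participant permission record checked at the moment data is accessed. The sketch below, with invented purposes and participant IDs, shows the idea: a participant can toggle individual research purposes, and only records whose current permissions allow a given purpose flow into a training run.

```python
from dataclasses import dataclass, field

@dataclass
class ConsentRecord:
    participant_id: str
    permissions: dict = field(default_factory=dict)   # purpose -> bool
    withdrawn: bool = False

    def allows(self, purpose):
        return not self.withdrawn and self.permissions.get(purpose, False)

    def set(self, purpose, allowed):
        # Called by a participant-facing portal or API when the
        # participant toggles a specific research purpose.
        self.permissions[purpose] = allowed

consents = {
    "P-001": ConsentRecord("P-001", {"cancer_research": True, "commercial_ai_training": False}),
    "P-002": ConsentRecord("P-002", {"cancer_research": True, "commercial_ai_training": True}),
}

def eligible_participants(purpose):
    """Filter the cohort to participants whose current consent covers the purpose."""
    return [pid for pid, c in consents.items() if c.allows(purpose)]

print(eligible_participants("commercial_ai_training"))   # -> ['P-002']
```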

Technical Compliance: The Role of Privacy-Enhancing Technologies (PETs)

Given the legal complexity, the industry is turning to technical solutions to bridge the gap between data utility and privacy compliance.

Federated Learning

Federated Learning allows an AI model to be trained across multiple decentralized edge devices or servers holding local data samples, without exchanging the data itself. This is particularly effective in the EU and China. A pharmaceutical company can train a model on genomic data held in a Chinese hospital and data held in a German biobank without ever moving the raw data across borders. This satisfies the “data localization” requirements of China and the “transfer restrictions” of the EU.
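A self-contained NumPy sketch of the idea follows, using a simple federated-averaging loop over a toy logistic regression rather than any particular FL framework; the data, model, and hyperparameters are illustrative. Each site computes a local update on data it never shares, and only parameter vectors are aggregated.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=20):
    """Gradient steps for a logistic regression on one site's local data.
    The raw matrix X (e.g., genotype features) never leaves the site."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (preds - y) / len(y)
    return w

def federated_average(site_weights, site_sizes):
    """Size-weighted average of per-site parameters -- the only shared artifact."""
    total = sum(site_sizes)
    return sum(w * (n / total) for w, n in zip(site_weights, site_sizes))

rng = np.random.default_rng(0)
sites = [(rng.standard_normal((200, 5)), rng.integers(0, 2, 200).astype(float))
         for _ in range(2)]   # e.g., a German biobank and a Chinese hospital

global_w = np.zeros(5)
for _ in range(10):                      # communication rounds
    updates = [local_update(global_w, X, y) for X, y in sites]
    global_w = federated_average(updates, [len(y) for _, y in sites])
print(global_w)
```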

Differential Privacy

Differential privacy adds statistical “noise” to query results or to the training process, ensuring that the inclusion or exclusion of a single individual’s data has only a bounded, quantifiable effect on the output. This is increasingly viewed by regulators (including the CNIL in France) as a valid method for achieving anonymization. For AI training, differential privacy limits how much the model can memorize any specific training example, mitigating the risk of “membership inference attacks,” where an attacker tries to determine whether a specific person’s DNA was used to train the model.
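The classic building block is the Laplace mechanism, sketched below for a simple cohort count; the epsilon values and the count are illustrative. Lower epsilon means more noise and stronger privacy. (Training-time guarantees typically rely on variants such as DP-SGD rather than noisy counts.)

```python
import numpy as np

def dp_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Laplace mechanism: noise scaled to sensitivity/epsilon bounds how much
    one participant's presence or absence can shift the released value."""
    rng = rng or np.random.default_rng()
    return true_count + rng.laplace(0.0, sensitivity / epsilon)

# How many cohort participants carry a given variant? (Invented count.)
true_count = 137
for eps in (0.1, 1.0, 10.0):   # smaller epsilon -> more noise, stronger privacy
    print(eps, round(dp_count(true_count, eps), 1))
```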

Homomorphic Encryption

Homomorphic encryption allows computations to be performed on encrypted data without decrypting it first. While computationally expensive, it is the gold standard for cross-border collaboration where data must remain encrypted at all times. This is particularly relevant for collaborations involving US and EU entities, where the US CLOUD Act (which allows US law enforcement to access data stored by US companies abroad) creates legal uncertainty for EU data stored in US clouds. Homomorphic encryption ensures that even if the data is accessed, it remains unintelligible.
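As a concrete taste of computing on ciphertexts, the sketch below uses the open-source python-paillier library (phe), which is additively homomorphic: an aggregator can sum encrypted per-site counts without ever decrypting them. Fully homomorphic schemes generalize this to arbitrary computation at far greater cost. The allele-count scenario is an invented example.

```python
from phe import paillier   # python-paillier: additively homomorphic

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each site encrypts its local count of risk-allele carriers before sharing.
site_counts = [42, 57, 13]
encrypted = [public_key.encrypt(c) for c in site_counts]

# The aggregator sums ciphertexts without ever seeing the plaintext counts.
encrypted_total = sum(encrypted[1:], encrypted[0])

# Only the key holder (e.g., the originating consortium) can decrypt the result.
print(private_key.decrypt(encrypted_total))   # -> 112
```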

Summary of Regulatory Postures

To navigate this landscape, professionals must categorize jurisdictions based on their posture toward data mobility and AI innovation. Drawing the preceding sections together:

  • EU: rights-based and strict on identifiability; research provisions exist, and the EHDS is set to streamline secondary use through Health Data Access Bodies.
  • US: sectoral and state-driven; HIPAA Safe Harbor de-identification permits broad secondary use, but state genetic privacy laws and FTC enforcement are narrowing the gap.
  • UK: a UK GDPR baseline with a deliberate pro-innovation drift towards recognised legitimate interests for research.
  • China: sovereignty first; the HGR regime and CAC transfer mechanisms effectively localize genomic data.
  • Japan: business-friendly in principle but technically strict on anonymization, with government-driven data linkage through My Number.
  • Australia: an EU-like identifiability standard under active reform, with HREC-approved pathways for secondary use.
  • Canada: a strict “serious possibility” re-identification standard, with AIDA poised to add AI-specific obligations.
