Genetic Data in Europe: Governance Beyond ‘Sensitive Data’

Genetic data occupies a singular and challenging position within the European data protection landscape. While Article 9 of the GDPR explicitly classifies it as a special category of personal data, triggering heightened protection requirements, the practical governance of this data type extends far beyond the simple application of a prohibition on processing. The unique characteristics of genomic information (its inherent link to identity, its familial nature, and its near-perpetual utility for identification and profiling) create compliance complexities that strain standard data protection models. For professionals in biotechnology, healthcare, research, and technology development, understanding these nuances is not merely a matter of legal diligence; it is fundamental to designing viable and ethical systems. This analysis moves beyond the textbook definition of sensitive data to explore the operational realities of managing genetic data under the General Data Protection Regulation (GDPR) and its interplay with national implementations, the AI Act, and sector-specific legislation.

The Nature of Genetic Data: More Than Just Information

At its core, the challenge stems from the fundamental nature of genetic data. Unlike a name or an address, which can be changed, or even a fingerprint, which can in some cases be obscured, genetic data is a unique and unchangeable identifier that is partly shared between biological relatives. This creates a dual-identity problem: an individual’s genetic data is simultaneously their own personal information and a reflection of information about their parents, siblings, children, and wider kinship group. This inherent “familial implication” means that an individual’s decision to share or process their genetic data has direct and unavoidable consequences for the privacy of others who have not consented and who may not even be aware of the data processing.

Furthermore, the identifiability of genetic data is not a static state. A sequence of nucleotides, even when nominally anonymized or aggregated, carries the potential for re-identification. As computational power grows and genomic databases expand, the risk of linking pseudonymized genetic data back to a specific individual increases. This reality challenges the traditional GDPR concepts of anonymization and pseudonymization. Data considered sufficiently anonymized for one purpose today could be re-identified tomorrow with a different dataset or a more powerful algorithm. This perpetual risk of re-identification means that the “residual risk” associated with genetic data processing is exceptionally high and requires a dynamic, forward-looking approach to risk assessment and mitigation.

Defining the Scope: From DNA Sequence to Health Inference

Regulatory definitions are critical for determining the applicable legal basis. Article 4(13) of the GDPR defines “genetic data” as “personal data relating to the inherited or acquired genetic characteristics of a natural person which give unique information about the physiology or the health of that natural person and which result, in particular, from an analysis of a biological sample from the natural person in question.” Recital 34 adds that such analysis includes, in particular, chromosomal, DNA or RNA analysis, or the analysis of another element enabling equivalent information to be obtained. This definition is intentionally broad. It encompasses not only the raw DNA sequence from a blood or saliva sample but also the results of genotyping, the analysis of gene expression, and arguably even inferred data such as polygenic risk scores that predict predisposition to certain diseases.

In practice, this means that a wide array of data points generated in research, clinical diagnostics, and even direct-to-consumer wellness or ancestry services fall under the special category. The key determinant is whether the data can be linked to an individual and whether it reveals information about their physiological or health characteristics. This is where the line can become blurred. For instance, data about a specific gene variant associated with a higher risk of breast cancer is clearly genetic data. However, a statistical correlation derived from a large genomic dataset that is then applied to an individual might also be considered to be derived from genetic data, and thus subject to the same stringent rules.

Identifiability and the Pseudonymization Paradox

A common strategy in genomic research is to pseudonymize data by removing direct identifiers like name and address and replacing them with a unique code. While pseudonymization is a key data protection technique encouraged by the GDPR (Recitals 28 and 29; Article 32), it does not remove genetic data from the scope of the regulation: Recital 26 makes clear that pseudonymized data remains personal data. Because a DNA sequence is, in itself, a unique identifier, it can often be used to re-identify an individual by cross-referencing with public or private databases (e.g., genealogical databases). This creates a paradox: the data is processed in a “pseudonymized” state for research efficiency, yet it remains personal data under the GDPR because of the re-identification risk.

Consequently, controllers processing pseudonymized genetic data must still comply with nearly all GDPR obligations, including providing a legal basis under Article 9 (such as explicit consent), ensuring data subject rights, and implementing robust security measures. The European Data Protection Board (EDPB) has consistently taken the view that the “single identifier” that is the genome itself means that true anonymization is exceptionally difficult to achieve in a way that is irreversible. This has profound practical implications, as it means that even long-term archival of genetic data for research purposes cannot simply be “anonymized” and placed outside the scope of GDPR.
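
To make the pseudonymization step concrete, the following is a minimal sketch of the “replace direct identifiers with a unique code” approach described above, using a keyed hash. The field names, the marker IDs, and the idea of a key held by a separate custodian are illustrative assumptions, not requirements drawn from the GDPR or from any particular research platform.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret, assumed to be held separately from the research
# dataset (for example by a trusted custodian). Anyone holding it can
# re-link codes to participants, so the output is pseudonymized, not anonymous.
PSEUDONYM_KEY = secrets.token_bytes(32)

# Assumed names of the direct-identifier fields in an incoming record.
DIRECT_IDENTIFIERS = {"name", "address", "national_id"}


def pseudonymize(record: dict, key: bytes = PSEUDONYM_KEY) -> dict:
    """Replace direct identifiers with a keyed code and keep the genetic payload."""
    digest = hmac.new(key, record["national_id"].encode("utf-8"), hashlib.sha256)
    code = digest.hexdigest()[:16]
    kept = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    return {"participant_code": code, **kept}


record = {
    "name": "Jane Example",
    "address": "1 Sample Street",
    "national_id": "X-0001",
    "genotype": {"rs0000001": "AG", "rs0000002": "CC"},  # made-up marker IDs
}
pseudonymized = pseudonymize(record)
```

Note that the `genotype` field passes through unchanged. That is the paradox in miniature: the name is gone, but the part of the record that is itself identifying remains, which is why the output is still personal data.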

The Legal Basis Labyrinth: Processing Special Category Data

Processing genetic data is, in principle, prohibited by Article 9(1) of the GDPR unless a specific condition in Article 9(2) is met. This is a significant departure from the general rule for personal data, which can be processed on the basis of a legitimate interest. For genetic data, the controller must identify an explicit exception under Article 9(2) in addition to a lawful basis under Article 6. The most common of these exceptions are explicit consent (Art. 9(2)(a)), vital interests (Art. 9(2)(c)), reasons of substantial public interest (Art. 9(2)(g)), and scientific research (Art. 9(2)(j)). Each of these grounds comes with its own set of stringent requirements and practical challenges.
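
One way to operationalize this “prohibited unless an exception applies” logic is a gate that refuses to register a processing activity until an Article 9(2) condition has been documented. The sketch below is illustrative only: the class and field names are hypothetical, it covers just the four conditions named above, and passing the check records that a condition was documented, not that it is legally valid.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Art9Condition(Enum):
    EXPLICIT_CONSENT = "9(2)(a)"
    VITAL_INTERESTS = "9(2)(c)"
    SUBSTANTIAL_PUBLIC_INTEREST = "9(2)(g)"
    SCIENTIFIC_RESEARCH = "9(2)(j)"


# Conditions that rest on Union or Member State law, so the register
# should also name the law relied upon.
NEEDS_LAW = {
    Art9Condition.SUBSTANTIAL_PUBLIC_INTEREST,
    Art9Condition.SCIENTIFIC_RESEARCH,
}


@dataclass(frozen=True)
class ProcessingActivity:
    name: str
    involves_genetic_data: bool
    art9_condition: Optional[Art9Condition] = None
    justification: str = ""
    legal_reference: str = ""  # e.g. the national statute relied upon


def may_proceed(activity: ProcessingActivity) -> bool:
    """Default-deny: genetic data needs a documented Article 9(2) condition."""
    if not activity.involves_genetic_data:
        return True  # the Article 9 gate does not apply; other GDPR rules still do
    if activity.art9_condition is None or not activity.justification.strip():
        return False
    if activity.art9_condition in NEEDS_LAW and not activity.legal_reference.strip():
        return False
    return True
```

The default-deny shape mirrors the structure of Article 9 itself: the absence of a recorded condition blocks the activity rather than letting it through.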

Explicit Consent: The Gold Standard with Caveats

For many commercial and research applications, explicit consent is the go-to legal basis. However, the bar for explicit consent under Article 9 is higher than for ordinary consent under Article 6. Like all GDPR consent, it must be a “freely given, specific, informed and unambiguous indication” of the data subject’s wishes; in addition, it must be confirmed by an express statement, so consent inferred from conduct alone is not enough. Pre-ticked boxes and implied consent are invalid. The information provided to the individual must be exceptionally clear, explaining the nature of the data, the purposes of processing, the risks involved, and the fact that they can withdraw consent at any time.

A significant practical challenge is the concept of “broad consent” for scientific research. The GDPR allows for consent to be given for specific, explicitly stated purposes, but research often involves open-ended exploration. Recital 33 acknowledges this, stating that consent may be given for “specific research purposes” and that it should be possible to give consent to certain areas of scientific research. However, this has been interpreted strictly by many national Data Protection Authorities (DPAs). In Germany, for example, the Federal Commissioner for Data Protection and Freedom of Information (BfDI) has emphasized that consent for genetic research must be as specific as possible, and broad, all-encompassing consent for future, undefined research is generally not considered valid. This creates a tension between the need for scientific flexibility and the principle of purpose limitation.
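
The requirements discussed in this section (an express statement, specific purposes, and withdrawal at any time) can be reflected in how consent is recorded. The following is a minimal sketch under assumed names; the purpose labels and the record layout are hypothetical and would need to match the wording actually presented to participants.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Optional


@dataclass
class ConsentRecord:
    participant_code: str
    purposes: FrozenSet[str]  # specific purposes, not "any future research"
    statement: str            # the express statement the participant affirmed
    given_at: datetime
    withdrawn_at: Optional[datetime] = None

    def withdraw(self, when: Optional[datetime] = None) -> None:
        """Record a withdrawal; processing based on this consent must then stop."""
        self.withdrawn_at = when or datetime.now(timezone.utc)

    def permits(self, purpose: str) -> bool:
        """True only for a purpose the participant agreed to and has not withdrawn."""
        return self.withdrawn_at is None and purpose in self.purposes


consent = ConsentRecord(
    participant_code="a1b2c3",
    purposes=frozenset({"cardiomyopathy_study"}),  # hypothetical purpose label
    statement="I agree to the use of my genetic data for the cardiomyopathy study.",
    given_at=datetime(2024, 5, 1, tzinfo=timezone.utc),
)
```

Because `permits` checks an explicit purpose set, a new research question fails the check until fresh consent is recorded, which is the stricter reading of purpose limitation described above.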

Vital Interests and Public Interest: A High Bar

Processing genetic data without consent on the basis of “vital interests” (Article 9(2)(c)) is a narrow exception. It applies only where the data subject is physically or legally incapable of giving consent and the processing is necessary to protect the vital interests of the data subject or of another natural person. This might be relevant in an emergency medical situation where a patient is unconscious and their genetic data is needed to inform treatment. It would not, however, apply to most research or non-urgent clinical scenarios.

The “substantial public interest” ground (Article 9(2)(g)) is another important exception, particularly in the healthcare and public health sectors. This requires a basis in Union or Member State law which must be “proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and the interests of the data subject.” This is a legal basis often used for national biobanks and large-scale population genomic initiatives like the 1+ Million Genomes initiative. The national laws that establish these biobanks are critical. For instance, France’s *Loi de bioéthique* provides a framework for the reuse of biological samples and data for research, often based on a form of non-opposition from the patient, which is distinct from explicit consent and is grounded in the public interest. UK Biobank, by contrast, operates under a governance framework with research ethics approval and relies on broad consent from participants for a wide range of research, with data protection compliance falling within the remit of the UK Information Commissioner’s Office (ICO).

The Research Exemption: A Balancing Act

Article 9(2)(j) provides an exemption for processing for scientific research purposes, based on Union or Member State law and in accordance with Article 89(1). This is a crucial provision for the biomedical research community. However, it is not a blanket exemption. The processing must be proportionate to the research aim and subject to appropriate safeguards, including data minimization and, where the research purpose allows, pseudonymization. The “right to be forgotten” (Article 17) is limited in this context: under Article 17(3)(d), erasure can be refused where it is likely to render the research impossible or seriously impair it. Data subjects nevertheless retain a right to object to research processing under Article 21(6), unless the processing is necessary for a task carried out in the public interest, and the right to object to direct marketing remains absolute.

Crucially, the research exemption does not absolve the controller from ensuring the security of the data or from conducting a Data Protection Impact Assessment (DPIA). In fact, the processing of genetic data for research is almost always considered “high-risk,” triggering the mandatory requirement for a DPIA under Article 35. This assessment must detail the risks to the rights and freedoms of individuals and the measures taken to mitigate them. National interpretations vary. The Spanish Data Protection Agency (AEPD), for example, provides detailed guidance on the processing of health and genetic data for research, emphasizing the need for robust governance and ethics committee oversight as part of the “suitable safeguards” required by the GDPR.

National Implementations and Cross-Border Divergence

While the GDPR provides a harmonized framework, Member States have significant discretion in several areas, particularly concerning the processing of special category data. This has led to a patchwork of national laws that can complicate cross-border research and international business operations. Professionals must be aware of these differences when operating in multiple European jurisdictions.

Germany: The Strict Approach to Consent

Germany is known for its particularly stringent interpretation of data protection law, rooted in its strong constitutional right to informational self-determination. The German Federal Data Protection Act (*Bundesdatenschutzgesetz*, BDSG) supplements the GDPR, and its Section 22 sets out conditions and safeguards for processing special categories of personal data in general. Genetic examinations themselves are regulated by the separate and highly prescriptive Genetic Diagnostics Act (*Gendiagnostikgesetz*, GenDG), which sets a very high bar for consent. Consent must be express and documented, in writing or, where permitted, electronically, and the individual must receive specific information about the risks and consequences of the analysis. The German approach often requires a clear, affirmative, and well-documented act of consent for each specific processing activity, making “broad consent” for future research projects particularly difficult to justify.

France: The Role of the CNIL and Bioethics Laws

In France, the *Commission Nationale de l’Informatique et des Libertés* (CNIL) is the primary DPA. The processing of genetic data is heavily influenced by the French *Loi de bioéthique* (Bioethics Law), which is periodically revised. This law governs the collection and use of human biological materials and related data. A key concept in the French system is the distinction between consent for sample collection and the possibility of re-contacting the individual for future research. The law often allows for the reuse of samples and data for research purposes provided the individual has not explicitly opposed it (*opposition*). This is a different model from the explicit consent required by the GDPR for special category data, and its validity under the GDPR is a subject of ongoing discussion. The CNIL provides guidance on how to reconcile these national provisions with the GDPR’s strict requirements, often emphasizing transparency and the provision of clear, accessible information to individuals.

United Kingdom: Post-Brexit Clarity and Research Focus

Following Brexit, the UK has retained the GDPR in domestic law as “UK GDPR,” supplemented by the Data Protection Act 2018. The principles are largely identical, but the UK has more flexibility to diverge over time. The UK’s approach has historically been supportive of research, with the ICO providing guidance that acknowledges the practicalities of research while upholding data protection principles. The UK’s National Health Service (NHS) has extensive legal frameworks governing the use of patient data for research and planning, including the *Health and Social Care Act 2012*. A key feature is the concept of “patient confidentiality” and the “duty of care,” which are woven into the legal justifications for processing health and genetic data. The common law duty of confidentiality operates alongside the UK GDPR as a separate requirement: satisfying a UK GDPR lawful basis does not by itself make a disclosure of confidential patient information lawful. For professionals, this means that in the UK, the ethical and confidentiality-based arguments for processing genetic data are often as important as the explicit GDPR consent provisions.

Governance in Practice: From DPIAs to Privacy by Design

Compliance with the GDPR for genetic data is not a checklist exercise; it requires a holistic governance framework embedded within the organization’s operations. This involves moving from a reactive compliance model to a proactive, risk-based approach that integrates data protection from the very beginning of any project or system design.

The Indispensable Data Protection Impact Assessment (DPIA)

As mentioned, a DPIA is mandatory for any processing of genetic data that is “likely to result in a high risk to the rights and freedoms of natural persons.” A DPIA for a genetic database, for instance, must go beyond a simple description of the data. It must systematically assess:

  • Necessity and Proportionality: Why is genetic data necessary for this purpose? Could a less intrusive type of data suffice?
  • Risks to Individuals: What are the specific risks? These include discrimination (by insurers or employers), psychological distress, familial conflict, and re-identification. The assessment must consider both the likelihood and severity of these risks.
  • Mitigation Measures: What technical and organizational measures are in place to address these risks? This includes pseudonymization, encryption, access controls, data retention policies, and staff training. It also includes procedural measures like ethics committee review and clear data sharing agreements.

The DPIA is a living document. It must be reviewed and updated regularly, especially when new risks emerge (e.g., new re-identification techniques) or when the purpose of processing changes.
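
The likelihood-and-severity assessment and the periodic review can be supported by a simple risk register. The sketch below is an illustration, not a prescribed DPIA methodology: the four-point scales, the threshold of 8, the one-year review interval, and the individual scores are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (remote) to 4 (very likely)
    severity: int    # assumed scale: 1 (minor) to 4 (severe)
    mitigations: Tuple[str, ...] = ()

    @property
    def score(self) -> int:
        return self.likelihood * self.severity


def needs_attention(risks: List[Risk], threshold: int = 8) -> List[Risk]:
    """Return risks at or above the threshold, highest score first."""
    flagged = [r for r in risks if r.score >= threshold]
    return sorted(flagged, key=lambda r: r.score, reverse=True)


def review_due(last_review: date, today: date, interval_days: int = 365) -> bool:
    """True when the assumed review interval has elapsed since the last review."""
    return today - last_review >= timedelta(days=interval_days)


# Illustrative scores only; a real DPIA would justify each value.
register = [
    Risk("re-identification", 3, 4, ("pseudonymization", "access controls")),
    Risk("discrimination by insurers or employers", 2, 4, ("data sharing agreements",)),
    Risk("psychological distress", 2, 2),
    Risk("familial conflict", 2, 3),
]
flagged = needs_attention(register)
```

A time-based check such as `review_due` is only a backstop. As the text notes, a new re-identification technique or a change of purpose should trigger a review regardless of the calendar.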

Privacy by Design and by Default

The principle of “data protection by design and by default” (Article 25 GDPR), often referred to as privacy by design, is particularly relevant for genetic data. It requires that data protection considerations be integrated into the design of any system, service, or product from the outset. For a developer building a platform for genomic analysis, this means:

  • Architecting the system to use the highest possible level of pseudonymization by default.
  • Implementing strict access controls so that researchers only see the data they absolutely need for their specific task (data minimization).
  • Designing user interfaces that provide clear, understandable information to individuals and facilitate the exercise of their rights (e.g., a simple mechanism to withdraw consent).
  • Ensuring that data is not stored for longer than necessary, with automated deletion processes in place.

For biotech companies, building privacy into the product is not just a compliance measure; it can be a competitive differentiator, building trust with participants, customers, and regulators.
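
Two of the design points listed above, task-scoped access and automated deletion, can be sketched in a few lines. The task names, field names, and retention period below are hypothetical; an actual retention period would follow from the documented purpose and any applicable national law.

```python
from datetime import date, timedelta
from typing import Dict, List, Set

# Hypothetical mapping from research task to the fields that task needs.
TASK_FIELDS: Dict[str, Set[str]] = {
    "variant_frequency": {"participant_code", "genotype"},
    "cohort_statistics": {"participant_code", "birth_year"},
}


def minimized_view(record: dict, task: str) -> dict:
    """Return only the fields allowed for the task; unknown tasks see nothing."""
    allowed = TASK_FIELDS.get(task, set())
    return {k: v for k, v in record.items() if k in allowed}


def purge_expired(records: List[dict], today: date, retention_days: int) -> List[dict]:
    """Keep only records collected within the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return [r for r in records if r["collected_on"] >= cutoff]


records = [
    {"participant_code": "a1", "genotype": {"rs0000001": "AG"},
     "birth_year": 1980, "postcode": "0000", "collected_on": date(2010, 6, 1)},
    {"participant_code": "b2", "genotype": {"rs0000001": "GG"},
     "birth_year": 1975, "postcode": "1111", "collected_on": date(2020, 6, 1)},
]
kept = purge_expired(records, today=date(2025, 1, 1), retention_days=3650)
view = minimized_view(records[1], "cohort_statistics")
```

Both functions default to the more protective outcome: a task that has not been registered receives an empty view, and a record is dropped as soon as it falls outside the retention window.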

Managing Family Implications and Third-Party Rights

The familial nature of genetic data presents one of the most difficult governance challenges. How can a company or research institution respect the rights of relatives who are not data subjects but whose privacy is impacted by the processing of an individual’s data? There is no single, easy answer under the GDPR, but best practices are emerging:

  1. Proactive Transparency: The individual providing the sample should be explicitly informed during the consent process about the potential for their data to reveal information about relatives and the limited ability to protect their relatives’ privacy.
  2. Guidance on Sharing: Provide clear guidance to individuals about how they should handle their own results and the potential consequences of sharing them with family members.
  3. Policy on Incidental Findings: Have a clear, pre-defined policy on how to handle “incidental findings” – genetic information discovered during research that may have health implications for the individual or their family. This policy should be developed with input from ethicists and clinicians.
  4. Limit Data Sharing: When sharing data with third parties, use strong contractual agreements (Data Processing Agreements with processors, data sharing agreements with other controllers) that explicitly restrict the use of the data and prohibit any attempts to re-identify individuals or contact relatives.

While the GDPR does not grant rights directly to non-data subjects, the controller has an ethical and, arguably, a legal responsibility to mitigate the foreseeable impact of its processing on the privacy of others. This is an area where the “spirit” of the law and ethical best practice must guide interpretation.

The Converging Landscape: AI, Health Data, and Future Regulations

The governance of genetic data is not static. It is being actively shaped by new technologies and new legislation, most notably the EU AI Act and the European Health Data Space (EHDS) regulation. These frameworks will interact with the GDPR to create a more complex, but also potentially more coherent, regulatory environment.

The AI Act and High-Risk AI Systems

The EU AI Act classifies as “high-risk” certain AI systems used for biometric identification and categorization, as well as AI systems that are safety components of, or are themselves, products covered by EU harmonisation legislation such as the medical device regulations. Many AI systems that process genetic data will fall into this category. For example, an AI tool that analyzes genomic data to predict disease risk or recommend personalized treatments would likely be considered a high-risk AI system, particularly where it qualifies as a medical device. This triggers a separate set of obligations under the AI Act, including:

  • Risk management systems.
  • High-quality data governance practices (to avoid bias).
  • Technical documentation and record-keeping.
  • Transparency and provision of information to users.