
What to Automate vs What to Keep Human in Institutional Work

Institutional work across Europe, spanning public administration, finance, healthcare, and critical infrastructure, is undergoing a profound transformation driven by automation and artificial intelligence. The promise of efficiency, scalability, and data-driven precision is compelling, yet the path to implementation is fraught with regulatory complexity and operational risk. The core challenge is not merely technical feasibility but a strategic and legally compliant allocation of tasks between automated systems and human professionals. A misstep in this allocation can lead to regulatory breaches under the GDPR, the AI Act, or sector-specific directives, and can erode public trust and institutional integrity. This analysis provides a decision framework for navigating this complex terrain, focusing on the practical interplay of technology, law, and human judgment.

The Regulatory Landscape: A Foundation for Decision-Making

Any framework for automation in a European context must be built upon a solid understanding of the prevailing regulatory architecture. This is not a monolithic structure but a layered system of principles, specific prohibitions, and risk-based obligations. The General Data Protection Regulation (GDPR) established foundational principles for data processing, many of which are directly applicable to automated decision-making. The Artificial Intelligence Act (AI Act) then provides a comprehensive, risk-based framework specifically for AI systems. Understanding how these two regimes interact is the first step in determining what can be automated.

The Principle of Human Oversight

Both the GDPR and the AI Act place significant emphasis on the role of human judgment. This is not a mere suggestion; it is a core legal requirement for many high-stakes applications. Under GDPR Article 22, individuals have a right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning them or similarly significantly affects them. This immediately establishes a baseline: any automated process that directly impacts an individual’s rights, obligations, or access to services must include meaningful human intervention. The human reviewer must have the authority and competence to investigate the system’s output, and they must be able to reverse the decision.

The AI Act codifies and expands this concept. For high-risk AI systems (e.g., those used in hiring, credit scoring, or critical infrastructure), the Act requires that systems be designed so they can be effectively overseen by natural persons, with oversight measures identified by the provider and either built into the system or implemented by the deployer. The goal is to prevent or minimise risks to health, safety, or fundamental rights. The human in the loop is not simply a rubber stamp; they must be in a position to understand the capacities and limitations of the AI system, monitor its operation, and intervene, including by overriding, disregarding, or reversing its output. When considering automation, the first question must therefore be: does the task involve significant risk to fundamental rights? If so, human oversight is not an option but a legal prerequisite.

Distinction Between EU-Level Regulations and National Implementations

While the AI Act and GDPR are directly applicable EU Regulations (meaning they apply uniformly across member states), their implementation and interpretation are influenced by national laws and regulatory bodies. For instance, the GDPR allows member states to specify certain rules, such as the age at which a child can consent to information society services (anywhere between 13 and 16). Similarly, the AI Act is enforced through national market surveillance authorities. A financial institution supervised by Germany’s BaFin may find the “human oversight” requirements for a credit-scoring AI interpreted differently than a comparable institution supervised by France’s ACPR, even though both are subject to the same EU regulation. Therefore, a robust automation strategy must be adaptable to national supervisory practices and guidance. It is not enough to comply with the letter of EU law; one must also be prepared for the specific enforcement culture of the relevant national authority.

A Decision Framework for Automation

To move from abstract principles to practical implementation, institutions need a structured framework. This framework should be applied during the design phase of any project involving automation, before significant investment is made. It is a multi-stage process that assesses the task, the data, the system, and the context.

Stage 1: Task Decomposition and Criticality Analysis

The first step is to break down a business process into its constituent tasks. Not all tasks within a single workflow are equal. For each task, ask the following questions:

  • Is the task deterministic or probabilistic? Deterministic tasks, such as data entry validation against a known schema or routing a document based on keywords, are strong candidates for automation. Probabilistic tasks, such as assessing the credibility of a claim or interpreting the nuance of a legal text, require more caution.
  • What is the impact of an error? A minor error (e.g., a slight delay in an internal report) may be acceptable. A critical error (e.g., incorrectly denying a social benefit or misdiagnosing a medical image) is not. The potential for harm must be quantified.
  • Does the task involve judgment, discretion, or empathy? Tasks that require understanding context, applying ethical considerations, or communicating with emotional intelligence are fundamentally human domains. For example, a chatbot can answer FAQs, but it cannot replace a social worker assessing a complex family situation.

Key Distinction: Automation is best suited for tasks that are high-volume, repetitive, and rule-based, where the criteria for success are clearly defined and measurable. It struggles with ambiguity, context-dependence, and tasks where the “right” answer is subjective or requires ethical deliberation.
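
To make this triage repeatable across projects, the decomposition can be captured in a small structure that records, for each task, whether it is rule-based, the severity of an error, and whether it requires judgment or empathy. The Python sketch below is illustrative only: the task names, severity scale, and decision rule are assumptions, not prescriptions, and any real assessment feeds into the legal analysis described above rather than replacing it.

    from dataclasses import dataclass
    from enum import Enum

    class ErrorImpact(Enum):
        MINOR = 1        # e.g. a delayed internal report
        SIGNIFICANT = 2
        CRITICAL = 3     # e.g. wrongly denying a social benefit

    @dataclass
    class Task:
        name: str
        rule_based: bool            # deterministic, criteria fully specified
        error_impact: ErrorImpact
        needs_judgment: bool        # discretion, empathy, or ethical weighing

    def triage(task: Task) -> str:
        """Return a provisional recommendation for one decomposed task."""
        if task.needs_judgment or task.error_impact is ErrorImpact.CRITICAL:
            return "keep human (automation may assist, never decide)"
        if task.rule_based and task.error_impact is ErrorImpact.MINOR:
            return "candidate for automation with human-in-command governance"
        return "partial automation with human review of outputs"

    # Hypothetical decomposition of a benefits-processing workflow.
    workflow = [
        Task("validate applicant identity against registry", True, ErrorImpact.SIGNIFICANT, False),
        Task("calculate standard benefit amount", True, ErrorImpact.SIGNIFICANT, False),
        Task("assess exceptional hardship claim", False, ErrorImpact.CRITICAL, True),
    ]
    for t in workflow:
        print(f"{t.name}: {triage(t)}")

The value of writing the triage down in this form is that the governance body described later can review and version it, rather than relying on implicit judgments scattered across project teams.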

Stage 2: Data and Algorithmic Risk Assessment

The quality and nature of the data are paramount. The AI Act explicitly requires that the data sets used to train high-risk AI systems be “relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose”. This is not just a technical best practice; it is a legal obligation. Before automating a task, you must assess:

  • Data Bias: Does the historical data reflect societal biases? If a hiring algorithm is trained on past hiring decisions that favoured a particular demographic, it will perpetuate and even amplify that bias, in conflict with EU non-discrimination law and the AI Act’s data governance and accuracy requirements. A minimal disparity check of the kind sketched after this list can surface such patterns before any model is built.
  • Data Provenance and Quality: Where does the data come from? Is it legally obtained? Is it accurate and up-to-date? Automating a process with poor-quality data will simply produce poor-quality results at scale.
  • Explainability: For many automated decisions, especially those with legal or similarly significant effects, you must be able to explain the logic behind the outcome. Highly complex “black box” models may be technically superior for some tasks but legally unusable for others if their reasoning cannot be made transparent to a human reviewer or to a data subject exercising their right to meaningful information about the logic involved under the GDPR.
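
As flagged in the Data Bias bullet above, historical bias can often be surfaced with a very simple descriptive check before any model is trained. The sketch below computes selection rates per group and the ratio between the lowest and highest rate (sometimes called a disparate impact ratio); the column names, groups, and the 0.8 warning threshold are illustrative assumptions, and a real assessment would involve legal and statistical expertise rather than a single ratio.

    from collections import defaultdict

    def selection_rates(records, group_key, outcome_key):
        """Selection rate per group from historical decision records."""
        totals, selected = defaultdict(int), defaultdict(int)
        for r in records:
            g = r[group_key]
            totals[g] += 1
            selected[g] += 1 if r[outcome_key] else 0
        return {g: selected[g] / totals[g] for g in totals}

    def disparity_ratio(rates):
        """Ratio of lowest to highest selection rate; below 0.8 is a common warning sign."""
        return min(rates.values()) / max(rates.values())

    # Hypothetical historical hiring records.
    history = [
        {"group": "A", "hired": True}, {"group": "A", "hired": True},
        {"group": "A", "hired": False}, {"group": "B", "hired": True},
        {"group": "B", "hired": False}, {"group": "B", "hired": False},
    ]
    rates = selection_rates(history, "group", "hired")
    print(rates, "disparity ratio:", round(disparity_ratio(rates), 2))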

Stage 3: The “Meaningful Human Intervention” Test

For any task that is not fully automated, you must define the nature of human involvement. The AI Act and GDPR require “meaningful” intervention. This means the human role must be substantive. Consider these levels of human involvement:

  1. Human-in-the-Loop (HITL): The system cannot make a final decision without human approval. The human reviews and validates every output. This is appropriate for the highest-risk decisions, such as approving a loan, making a medical diagnosis, or issuing a fine.
  2. Human-on-the-Loop (HOTL): The system operates autonomously in real-time, but a human actively monitors its operation and can intervene at any time. This is common in industrial settings or for real-time fraud detection systems. The human is a supervisor, not a direct approver of each action.
  3. Human-in-Command (HIC): The human designs the system, sets the parameters, and is responsible for its overall governance and deployment, but does not oversee every individual decision. This is appropriate for lower-risk automation where the system’s parameters are well-understood and its potential for error is low.

Your framework must specify which level of human involvement is required for each automated task and ensure that the system is designed to support that level of interaction effectively. A human reviewer who is simply presented with a final score and a “recommendation”, without context or supporting evidence, is not providing meaningful oversight.
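
One way to make the required level of involvement explicit, and auditable, is to encode it in configuration so that no task can silently drift from human-in-the-loop towards full autonomy. The sketch below is a minimal illustration under assumed task names and outputs; it shows the shape of such a gate, not a complete workflow engine.

    from enum import Enum

    class Oversight(Enum):
        HITL = "human-in-the-loop"    # every output needs explicit approval
        HOTL = "human-on-the-loop"    # human monitors and may intervene
        HIC = "human-in-command"      # human governs design and deployment

    # Hypothetical mapping decided during the design-phase risk assessment.
    OVERSIGHT_POLICY = {
        "loan_final_approval": Oversight.HITL,
        "realtime_fraud_screening": Oversight.HOTL,
        "document_routing": Oversight.HIC,
    }

    def finalise(task: str, system_output: dict, human_approval: bool | None) -> dict:
        """Refuse to finalise a HITL decision without a recorded human approval."""
        level = OVERSIGHT_POLICY[task]
        if level is Oversight.HITL and human_approval is not True:
            return {"status": "pending_human_review", "output": system_output}
        return {"status": "finalised", "output": system_output, "oversight": level.value}

    # The output passed to the reviewer should carry supporting evidence, not just a score.
    print(finalise("loan_final_approval", {"score": 612, "recommendation": "decline"}, None))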

Practical Examples and Risk Flags

Applying this framework to real-world scenarios clarifies its utility and highlights common pitfalls.

Example 1: Human Resources (Recruitment and Performance Management)

What to Automate: The initial screening of CVs for non-negotiable, objective criteria (e.g., “must hold a specific professional certification,” “must have a minimum of 5 years of experience in a defined role”). Scheduling interviews. Collating feedback from standardised interview scorecards. These tasks are repetitive, rule-based, and low-risk if the criteria are non-discriminatory.
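
A minimal sketch of this kind of objective, rule-based screening, assuming hypothetical field names and criteria: it checks only verifiable facts, keeps the reasons for GDPR explanations, and defers everything else, including confirmation of any rejection, to a human reviewer.

    def screen_cv(candidate: dict) -> dict:
        """Check only non-negotiable, objective criteria; never score 'fit' or 'potential'."""
        reasons = []
        if "certified_accountant" not in candidate.get("certifications", []):
            reasons.append("missing required professional certification")
        if candidate.get("years_experience", 0) < 5:
            reasons.append("fewer than 5 years of relevant experience")
        return {
            "candidate_id": candidate["id"],
            "meets_objective_criteria": not reasons,
            "reasons": reasons,  # retained so a rejection can be explained on request
            "next_step": "human review" if not reasons else "human confirmation of rejection",
        }

    print(screen_cv({"id": "c-042", "certifications": ["certified_accountant"], "years_experience": 7}))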

What to Keep Human: The entire assessment of “soft skills,” cultural fit, and potential. The evaluation of a candidate’s portfolio or work samples that require subjective judgment. The final hiring decision. Any process that analyses video interviews with emotion recognition software is, at best, a high-risk activity under the AI Act; inferring emotions in the employment context falls among the Act’s prohibited practices, and the technique remains fraught with ethical and legal peril given the lack of scientific consensus and its high potential for bias.

Risk Flags for Over-Automation:

  • Using an algorithm to rank candidates based on a “suitability score” derived from historical hires, leading to a homogenisation of the workforce and potential discrimination against non-traditional candidates.
  • Automating performance management by relying solely on quantitative metrics (e.g., number of calls handled, lines of code written) without human context for team dynamics, project complexity, or personal circumstances.
  • Failure to provide a human-mediated explanation to a rejected candidate who requests it under GDPR.

Example 2: Public Administration (Social Benefit Allocation)

What to Automate: Verification of applicant data against official databases (e.g., tax records, residency status). Calculation of standard benefit amounts based on clearly defined rules and declared income. Flagging applications with missing or inconsistent information for human review. These tasks are administrative and data-intensive, making them suitable for automation.
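
A minimal sketch, under assumed benefit rules, amounts, and field names, of how a rule-based calculation can be paired with explicit flagging so that anything not clean routes to a caseworker instead of being decided automatically.

    BASE_MONTHLY_BENEFIT = 400.0   # hypothetical figures for illustration only
    CHILD_SUPPLEMENT = 120.0
    INCOME_DISREGARD = 200.0

    def process_application(app: dict) -> dict:
        """Calculate a standard benefit or flag the case for human review."""
        flags = []
        declared, registry = app.get("declared_income"), app.get("registry_income")
        if declared is None:
            flags.append("income not declared")
        elif registry is not None and abs(registry - declared) > 50:
            flags.append("declared income inconsistent with registry data")
        if app.get("exceptional_circumstances"):
            flags.append("exceptional circumstances claimed")

        if flags:
            return {"decision": "refer to caseworker", "flags": flags}

        reduction = max(0.0, declared - INCOME_DISREGARD)
        amount = BASE_MONTHLY_BENEFIT + CHILD_SUPPLEMENT * app.get("children", 0) - reduction
        return {"decision": "standard award", "monthly_amount": max(0.0, round(amount, 2))}

    print(process_application({"declared_income": 350.0, "registry_income": 360.0, "children": 2}))

Note that the flagged cases are referred, not refused: the automated step narrows the queue for caseworkers but never produces an adverse decision on its own.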

What to Keep Human: The assessment of complex or exceptional cases where rules do not apply cleanly (e.g., a sudden change in life circumstances, caring for a disabled relative not formally documented). The decision to suspend or revoke benefits, especially in cases of suspected fraud. Any communication regarding the outcome of an application that requires empathy or nuanced explanation. The investigation of flagged inconsistencies.

Risk Flags for Over-Automation:

  • Implementing a “black box” fraud detection system that automatically suspends benefits without a clear, investigable reason, leading to wrongful deprivation of essential resources. This is a clear violation of the principle that automated decisions with legal effects must be explainable and contestable.
  • Using predictive models to assess the “risk” of an applicant committing fraud based on demographic or geographic data, which risks amounting to prohibited social scoring under the AI Act and infringing fundamental rights.
  • Reducing human caseworker capacity to a point where meaningful review of flagged cases is impossible, effectively making the “human review” a rubber-stamp exercise.

Example 3: Finance (Credit Scoring and Loan Approval)

What to Automate: The aggregation of data from various sources (credit bureaus, bank statements). The calculation of a base score based on established financial metrics (e.g., debt-to-income ratio, payment history). The pre-qualification of applicants for standardised loan products.
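
A minimal sketch of a transparent, additive scorecard over the kinds of established metrics mentioned above; the weights, thresholds, and “grey area” band are illustrative assumptions. The point of the structure is that every contribution to the score is inspectable, and borderline results are routed to a human underwriter rather than auto-decided.

    def base_score(applicant: dict) -> dict:
        """Additive scorecard: each factor's contribution is recorded for explainability."""
        contributions = {
            "debt_to_income": 40 if applicant["debt_to_income"] < 0.35 else 10,
            "payment_history": 40 if applicant["missed_payments_24m"] == 0 else 5,
            "employment_tenure": 20 if applicant["employment_years"] >= 2 else 10,
        }
        return {"score": sum(contributions.values()), "contributions": contributions}

    def route(score: int) -> str:
        if score >= 85:
            return "pre-qualified for standard product (human-in-command)"
        if score >= 60:
            return "grey area: refer to human underwriter"
        return "recommend decline: human review and applicant appeal path required"

    result = base_score({"debt_to_income": 0.42, "missed_payments_24m": 0, "employment_years": 5})
    print(result, "->", route(result["score"]))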

What to Keep Human: The final approval decision for any non-standard or high-value loan. The assessment of applications that fall into a “grey area” where the automated score is borderline. The handling of appeals from rejected applicants. The design and regular auditing of the scoring model itself to ensure it does not perpetuate bias.

Risk Flags for Over-Automation:

  • Using alternative data sources (e.g., social media activity, online browsing history) in an automated scoring model without clear transparency to the applicant and a robust assessment of fairness and relevance. The use of such data is under intense scrutiny by regulators.
  • Allowing an automated system to make the final decision on a loan application without a clear and easily accessible path for the applicant to have the decision reviewed by a human. This is a direct conflict with GDPR Article 22.
  • Failing to conduct regular, independent audits of the algorithm to check for discriminatory outcomes against protected groups (e.g., based on gender, ethnicity, or origin), checks that the AI Act’s accuracy, risk management, and post-market monitoring obligations for high-risk systems effectively demand.

Implementing the Framework: Governance and Continuous Monitoring

A decision to automate is not a one-time event. It requires a robust governance structure and a commitment to continuous monitoring and adaptation. The AI Act formalises this through requirements for a risk management system, data governance, and post-market monitoring for high-risk AI systems.

Establishing an AI Governance Body

Organisations should establish a cross-functional team or committee responsible for overseeing the automation lifecycle. This body should include representatives from legal, compliance, ethics, IT, and the relevant business units. Their responsibilities should include:

  • Reviewing and approving proposals for new automation projects based on the decision framework.
  • Setting standards for data quality and algorithmic transparency.
  • Overseeing the training of staff who will be working with or overseeing automated systems.
  • Managing the incident response process for when an automated system fails or produces an unexpected outcome.

The Importance of Documentation

Under the AI Act, providers of high-risk AI systems must maintain extensive technical documentation. Even for lower-risk systems, good practice dictates that you must be able to demonstrate compliance. This documentation should include:

  • The results of the initial risk assessment and the decision on which tasks to automate.
  • The characteristics of the datasets used for training, validation, and testing.
  • The logic and algorithms used, in a way that is sufficiently clear to enable oversight.
  • The measures taken for human oversight and the specific instructions for the human operators.
  • A record of all decisions made by the system that have legal or significant effects, including the input data and the human reviewer’s actions.

This documentation is not just for regulators. It is a vital tool for internal auditing, for troubleshooting, and for defending the institution’s decisions if they are challenged.
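
The last documentation item, the per-decision record, is straightforward to implement yet frequently missing. Below is a minimal sketch of such a record using assumed field names; in practice it would be written to an append-only store and retained for whatever period the applicable sectoral rules require.

    import json
    from datetime import datetime, timezone

    def decision_record(system_id, input_data, system_output, reviewer, reviewer_action, rationale):
        """Append-only record of one automated decision and the human action taken on it."""
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "system_id": system_id,              # ties the record to the technical documentation
            "input_data": input_data,
            "system_output": system_output,
            "human_reviewer": reviewer,
            "reviewer_action": reviewer_action,  # e.g. "approved", "overridden", "escalated"
            "reviewer_rationale": rationale,
        }
        with open("decision_log.jsonl", "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
        return record

    decision_record(
        "credit-scoring-v3", {"application_id": "a-881"}, {"score": 58, "recommendation": "decline"},
        "j.doe", "overridden", "Recent income change not reflected in bureau data.",
    )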

Post-Deployment Monitoring and Feedback Loops

An AI system’s performance can degrade over time as the real-world data it encounters diverges from its training data (a phenomenon known as “model drift”). Furthermore, its impact on individuals and society must be continuously assessed. A robust automation strategy includes:

  • Performance Monitoring: Tracking key metrics for accuracy, error rates, and operational performance.
  • Fairness Monitoring: Regularly analysing outcomes across different demographic groups to detect the emergence of new biases.
  • Human Feedback Mechanisms: Creating clear channels for human operators to report issues, question system outputs, and suggest improvements. The insights from the humans on the loop are invaluable for identifying systemic problems.
  • Regular Re-assessment: Periodically re-evaluating the initial decision to automate. Has the task changed? Have the risks evolved? Is the system still fit for purpose?
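
Two of these checks, outcome-rate drift and group-level fairness, can be run routinely on a recent window of production decisions. The sketch below is illustrative: the thresholds, field names, and tolerance are assumptions that would in practice be set by the governance body and reviewed periodically.

    from collections import defaultdict

    def approval_rate(decisions):
        return sum(d["approved"] for d in decisions) / len(decisions)

    def drift_alert(baseline_rate, recent, tolerance=0.10):
        """Flag when the recent approval rate drifts beyond the agreed tolerance."""
        return abs(approval_rate(recent) - baseline_rate) > tolerance

    def fairness_alert(recent, group_key="group", floor=0.8):
        """Flag when any group's approval rate falls below 80% of the highest group's."""
        totals, approved = defaultdict(int), defaultdict(int)
        for d in recent:
            totals[d[group_key]] += 1
            approved[d[group_key]] += d["approved"]
        rates = {g: approved[g] / totals[g] for g in totals}
        return min(rates.values()) / max(rates.values()) < floor, rates

    # Hypothetical recent production decisions.
    recent = [
        {"approved": True, "group": "A"}, {"approved": True, "group": "A"},
        {"approved": False, "group": "B"}, {"approved": True, "group": "B"},
    ]
    print("drift alert:", drift_alert(0.70, recent))
    print("fairness alert:", fairness_alert(recent))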

Conclusion: The Symbiotic Relationship

The question is not whether to automate, but how to automate responsibly. The European regulatory framework does not prohibit automation; it channels it towards applications that are safe, ethical, and respectful of fundamental rights. The decision framework outlined here—based on task decomposition, risk assessment, and the definition of meaningful human oversight—provides a practical path forward. It moves the conversation away from a simple binary of human versus machine and towards a model of symbiotic collaboration. The most effective and compliant institutional systems will be those where automation handles the scale and speed of data processing, while human professionals provide the judgment, context, and ethical stewardship that machines cannot. This approach not only mitigates legal and operational risk but also builds the public trust that is essential for the long-term adoption of transformative technologies.
