Evidence in Liability Disputes: Logs, Versions, and Records
When an automated system causes harm, the immediate question is often deceptively simple: who is responsible? Yet, beneath this question lies a complex evidentiary landscape where digital traces, version histories, and procedural records become the central objects of dispute. In the context of European liability law, the ability to reconstruct the operational state of a system at a specific moment in time is not merely a technical exercise; it is a legal necessity. The parties involved—be they claimants seeking redress, manufacturers defending their products, or deployers managing operational risk—must navigate a framework where the absence of evidence can be as determinative as its presence. This article examines the types of evidence that are critical in liability disputes involving automated systems, focusing on the practical interplay between technical documentation, data records, and the evolving regulatory requirements under the EU’s AI Act and the Product Liability Directive.
The evidentiary challenge is amplified by the nature of modern AI systems. Unlike deterministic software, where a given input reliably produces a known output, machine learning models are characterized by probabilistic outputs and emergent behaviours shaped by vast datasets and complex architectures. This inherent opacity, often referred to as the “black box” problem, does not remove the need to establish a causal link between the system’s operation and the resulting damage. Instead, it shifts the burden towards meticulously documented development and operational processes. European legal doctrine, particularly in the realm of product liability and tort law, is increasingly reliant on the concept of the “state of the art.” This principle, embedded in the Product Liability Directive as the so-called development risk defence, provides that a producer is not liable if the state of scientific and technical knowledge at the time the product was placed on the market was not such as to enable the existence of the defect to be discovered. Proving what the state of the art was, and demonstrating that a developer adhered to it, is an evidence-intensive endeavour.
The Primacy of Operational Logs and System Records
In the immediate aftermath of an incident involving an automated system, the first and most crucial source of evidence is the system’s own record of its operations. Logs are the digital equivalent of a flight recorder, capturing the sequence of events, inputs received, decisions made, and actions taken. For a liability analysis, these records are indispensable for establishing a timeline and verifying the system’s behaviour. A dispute may hinge on whether a robotic arm in a manufacturing plant received a correct sensor reading or if an autonomous vehicle’s perception module failed to classify an object. Without granular, time-stamped logs, such determinations devolve into speculative arguments.
Technical Composition and Integrity of Logs
From a technical and legal standpoint, not all logs are created equal. A robust logging framework must capture multiple layers of information: system-level logs covering hardware status and resource usage, application-level logs tracing the logic flow and key decision points, and, for AI systems, model-specific logs recording inference parameters, confidence scores, and the specific input data used for a given decision. Critically, the integrity of these logs is paramount. In a liability dispute, the opposing party will inevitably scrutinize the logs for signs of tampering, deletion, or selective recording. This is where established techniques for data integrity, such as cryptographic hashing and write-once-read-many (WORM) storage, become relevant. A court or regulator will need to be convinced that the logs presented are an authentic and complete record of the system’s operation.
A log that has been altered, even with good intentions, loses its evidentiary value and can lead to a presumption of fault against the party responsible for its custody.
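By way of illustration, the sketch below shows how hash chaining can make a log tamper-evident: each record carries the hash of its predecessor, so an edited or deleted entry breaks the chain. This is a minimal, in-memory example assuming a simple JSON record format; production systems would typically combine such chaining with WORM storage, trusted timestamping, or a dedicated audit-log service.

```python
import hashlib
import json
import time


def append_entry(log: list, event: dict) -> dict:
    """Append an event to the log, chaining each record to the hash of its
    predecessor so later alteration or deletion is detectable."""
    previous_hash = log[-1]["entry_hash"] if log else "0" * 64
    record = {
        "timestamp": time.time(),        # when the event was recorded
        "event": event,                  # inputs, outputs, decisions, etc.
        "previous_hash": previous_hash,  # link to the prior entry
    }
    serialized = json.dumps(record, sort_keys=True).encode("utf-8")
    record["entry_hash"] = hashlib.sha256(serialized).hexdigest()
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Recompute every hash; any modified or missing entry breaks verification."""
    previous_hash = "0" * 64
    for record in log:
        if record["previous_hash"] != previous_hash:
            return False
        body = {k: v for k, v in record.items() if k != "entry_hash"}
        serialized = json.dumps(body, sort_keys=True).encode("utf-8")
        if hashlib.sha256(serialized).hexdigest() != record["entry_hash"]:
            return False
        previous_hash = record["entry_hash"]
    return True


# Example: record two inference events, then confirm the chain is intact.
log = []
append_entry(log, {"input_id": "frame-4812", "decision": "brake", "confidence": 0.91})
append_entry(log, {"input_id": "frame-4813", "decision": "continue", "confidence": 0.97})
assert verify_chain(log)
```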
Beyond integrity, the retention period for these logs is a critical compliance point. The AI Act mandates that high-risk AI systems be designed to enable the automatic recording of events (logs), and it places an obligation on deployers to keep the logs under their control for a period appropriate to the system’s intended purpose and, in any event, for at least six months unless other Union or national law requires a longer period. In practice, retention policies must also be aligned with the system’s risk profile and the applicable limitation periods for liability claims. In sectors like healthcare or finance, where the impact of an error can be long-lasting, retention may need to extend for many years. National laws on product liability may also influence these periods, as the time limit for bringing a claim can vary.
Interpreting Logs in the Context of AI Behaviour
A significant challenge arises when interpreting logs from non-deterministic systems. A log entry stating that an AI model classified an image as a “pedestrian” with a 60% confidence score presents an immediate analytical problem. Was the system operating within acceptable parameters? Does a 60% confidence score in a safety-critical application represent a design defect or a foreseeable operational limitation? The answer does not lie within the log itself but in the contextual evidence surrounding it. This includes the system’s design specifications, its intended purpose, and the performance metrics established during its testing and validation phase. The log is the “what,” but the “why” must be constructed from a broader evidentiary base, including test reports and risk assessments.
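The following sketch illustrates the point: a logged confidence score only acquires meaning once it is read against the thresholds and operating conditions documented during validation. The record format, field names, threshold, and conditions are hypothetical, used purely to show how log data and design documentation must be combined.

```python
from dataclasses import dataclass


@dataclass
class DesignSpec:
    """Acceptance criteria drawn from the system's validation documentation.
    All fields and values here are illustrative assumptions."""
    safety_critical: bool
    minimum_confidence: float   # threshold agreed during validation
    intended_conditions: set    # conditions the system was validated for


def assess_log_entry(entry: dict, spec: DesignSpec) -> str:
    """Relate a single logged inference to the documented design envelope."""
    if entry["condition"] not in spec.intended_conditions:
        return "outside intended operating conditions - check instructions for use"
    if entry["confidence"] < spec.minimum_confidence:
        return "below validated confidence threshold - possible defect or known limitation"
    return "within documented operating envelope"


spec = DesignSpec(
    safety_critical=True,
    minimum_confidence=0.80,
    intended_conditions={"daylight", "dry_road"},
)
entry = {"label": "pedestrian", "confidence": 0.60, "condition": "daylight"}
print(assess_log_entry(entry, spec))  # below validated confidence threshold ...
```

The point of the sketch is evidentiary rather than technical: the same 60% score supports very different legal conclusions depending on what the validation records say the system was supposed to achieve.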
Version Control: Reconstructing the Digital Provenance
Liability disputes often extend beyond the moment of the incident to the entire lifecycle of the system. A key question is whether a defect was introduced during development, deployment, or a subsequent update. This is where version control systems, typically used for software and model development, become a form of legal record. Versioning provides a traceable, auditable history of changes, allowing an analyst to reconstruct the exact state of the code, algorithms, and datasets that constituted the system at any given point in time.
The Role of Git and Model Registries
Standard software development practices, such as using Git for code versioning, are now essential for legal defensibility. Each commit, with its associated message and author, creates a breadcrumb trail of development decisions. For AI systems, this practice must be extended to the models themselves. Model registries serve this purpose, tracking different versions of a trained model, the data it was trained on, and the hyperparameters used. In a dispute, this allows for precise questions to be answered: Was the version of the model deployed at the time of the incident the one that had been approved for production? Did a developer push an untested, experimental model into a live environment? The absence of a rigorous versioning system creates a significant evidentiary vacuum, which a court may interpret unfavourably for the system’s developer or deployer.
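A minimal sketch of what a registry record might capture, and of the dispute-relevant question it answers, is shown below. The record structure and field names are illustrative assumptions rather than the schema of any particular registry product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRecord:
    """One entry in a model registry: which artefact, built from which
    code and data, and whether it was cleared for production."""
    model_version: str        # registry identifier or semantic version
    git_commit: str           # commit of the training/inference code
    dataset_hash: str         # digest of the training dataset snapshot
    approved_for_production: bool


def deployment_is_traceable(deployed_version: str, registry: list[ModelRecord]) -> bool:
    """Was the deployed version a registered, approved build, or an
    untracked artefact pushed into a live environment?"""
    return any(
        record.model_version == deployed_version and record.approved_for_production
        for record in registry
    )


registry = [
    ModelRecord("1.4.2", "a3f9c1d", "sha256:datasetdigest1", approved_for_production=True),
    ModelRecord("1.5.0-rc1", "e81b2aa", "sha256:datasetdigest2", approved_for_production=False),
]
print(deployment_is_traceable("1.5.0-rc1", registry))  # False: experimental build in production
```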
Connecting Code, Data, and Configuration
The evidentiary value of versioning is only realized when it is possible to draw a clear line from a specific version of a model back to the exact dataset it was trained on and the configuration under which it was run. This “digital provenance” is a core requirement for reproducibility and auditability. For example, if a high-risk AI system used for credit scoring is accused of producing discriminatory outcomes, an audit would require linking the specific model version in use to the training data and feature engineering steps behind it. If the developer cannot provide this linkage, it becomes difficult to defend against claims of bias. The AI Act reinforces this by mandating that high-risk systems be designed so that automatically generated logs are of sufficient quality to facilitate the traceability of the system’s outputs. This implicitly requires a coherent versioning and logging strategy that spans the entire AI stack.
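One simple way to establish such a linkage, sketched below under the assumption that datasets and configuration files are stored as artefacts on disk, is a provenance manifest that binds a model version to cryptographic digests of the exact data and configuration it used. The manifest format shown is illustrative, not a standardised schema.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 digest of a file, read in chunks so large datasets fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(model_version: str, git_commit: str,
                   dataset_path: Path, config_path: Path) -> dict:
    """Bind a model version to the exact dataset and configuration it used.
    Stored alongside the model artefact, the manifest lets an auditor later
    confirm that the materials produced in a dispute match what was deployed."""
    return {
        "model_version": model_version,
        "git_commit": git_commit,
        "dataset_sha256": file_digest(dataset_path),
        "config_sha256": file_digest(config_path),
    }


# Illustrative usage (paths are hypothetical):
# manifest = build_manifest("1.4.2", "a3f9c1d",
#                           Path("data/training_snapshot.parquet"),
#                           Path("config/inference.yaml"))
# The manifest would then be written next to the model artefact and itself
# hashed or signed, so the linkage cannot be silently rewritten later.
```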
Dataset Records and the Provenance of Training Data
For many AI systems, the root cause of a failure or biased outcome lies not in the code but in the data used to train it. Consequently, records pertaining to the datasets used for training, validation, and testing are of central importance in liability disputes. These records must provide a clear picture of the data’s origin, composition, and processing history. The legal concept of “data quality” is becoming a cornerstone of both liability and regulatory compliance.
Documenting Data Lineage and Curation
Provenance records should detail where the data came from, under what legal basis it was collected or licensed, and what transformations were applied to it. This includes documenting the removal of personally identifiable information, the handling of missing values, and the methods used for labelling. In a dispute, a claimant may argue that a system’s failure was a direct result of a non-representative or biased dataset. The developer’s ability to demonstrate due diligence in data sourcing and curation is a primary line of defence. This involves maintaining records of data collection agreements, data sheets (such as “Datasheets for Datasets”), and documentation of any data augmentation or synthetic data generation techniques used. The AI Act’s data governance requirements are a direct reflection of this evidentiary need, obligating providers to ensure that training, validation, and testing datasets are relevant, sufficiently representative and, to the best extent possible, free of errors and complete in view of the system’s intended purpose.
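As an illustration, a compact, datasheet-style record for a training dataset might capture the elements described above. The structure and every value below are hypothetical; the “Datasheets for Datasets” framework proposes a considerably fuller set of questions.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Datasheet-style provenance record for a training dataset.
    Field names and example values are illustrative assumptions."""
    name: str
    source: str                 # where the data came from
    legal_basis: str            # collection or licensing basis
    collection_period: str
    pii_handling: str           # how personal data was removed or masked
    labelling_method: str       # manual, crowd-sourced, automated, etc.
    transformations: list = field(default_factory=list)
    known_gaps: list = field(default_factory=list)


record = DatasetRecord(
    name="loan_applications_v3",
    source="internal CRM export plus licensed bureau data",
    legal_basis="contract performance; data licence agreement (hypothetical)",
    collection_period="2019-2023",
    pii_handling="direct identifiers removed; dates generalised to month",
    labelling_method="historical repayment outcomes",
    transformations=["missing income imputed with median", "currency normalised to EUR"],
    known_gaps=["under-representation of applicants under 25"],
)
```

Kept under version control alongside the dataset snapshot itself, such a record is precisely the kind of contemporaneous evidence of due diligence that courts and regulators will ask for after the fact.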
Comparing National Approaches to Data in Liability
While the AI Act provides a harmonized framework at the EU level, national tort law still plays a significant role in how evidence is weighed. In some jurisdictions, the burden of proof for a defect may be partially reversed if the claimant can demonstrate a high degree of probability that the harm stemmed from a lack of safety. In such cases, the defendant’s inability to provide clear records of their data governance practices can be fatal to their case. For instance, German courts, applying principles from the Product Liability Act (ProdHaftG), may be particularly rigorous in demanding that a producer demonstrate they have taken all necessary measures to prevent a defect. This includes proving that the data used was fit for purpose. In contrast, other legal systems may place a heavier initial burden on the claimant to directly link the harm to a specific defect in the data or the model. Understanding these nuances is crucial for any entity operating across multiple European markets.
Test Reports, Validation Records, and Conformity Assessments
Before a high-risk AI system is placed on the market or put into service, it must undergo a series of evaluations to verify its safety and compliance. These pre-deployment records are a vital form of ex ante evidence, demonstrating that the provider fulfilled their duty of care. They serve as a benchmark against which the system’s operational performance can be judged.
Beyond Performance Metrics: The Content of a Test Report
A comprehensive test report for an AI system goes far beyond simple accuracy scores. It should detail the testing methodology, the environments in which the tests were conducted (e.g., simulation, sandboxed real-world trials), the specific scenarios covered, and the results of stress testing and adversarial robustness checks. Crucially, it must also document the system’s limitations and failure modes. A test report that transparently states, “System X fails to operate correctly in conditions of heavy rain,” provides a powerful defence for a producer if an incident occurs in such conditions. Conversely, a test report that is silent on known limitations, or one that overstates the system’s capabilities, can be used as evidence of a design defect. The AI Act mandates that high-risk AI systems be subjected to rigorous testing to ensure they are robust against reasonably foreseeable errors and adversarial attacks.
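To make the evidentiary role of documented limitations concrete, the sketch below models a test report that records known failure modes and checks whether an incident occurred under a condition that was already disclosed. The report fields and the “heavy rain” example are illustrative assumptions echoing the scenario above, not a prescribed reporting format.

```python
from dataclasses import dataclass


@dataclass
class TestReport:
    """Skeleton of a pre-deployment test report; fields are illustrative."""
    system: str
    methodology: str                      # e.g. simulation, sandboxed field trial
    scenarios_covered: list
    metrics: dict                         # headline performance figures
    known_limitations: list               # documented failure modes
    adversarial_robustness_checked: bool


def incident_within_documented_limitations(report: TestReport, incident_condition: str) -> bool:
    """Was the condition under which the incident occurred already documented
    as a limitation? If so, the dispute shifts towards the instructions for
    use and deployer conduct rather than an undisclosed design defect."""
    return any(incident_condition in limitation for limitation in report.known_limitations)


report = TestReport(
    system="System X perception module",
    methodology="simulation plus closed-track trials",
    scenarios_covered=["urban daylight", "night with street lighting"],
    metrics={"detection_recall": 0.97},
    known_limitations=["degraded detection in heavy rain"],
    adversarial_robustness_checked=True,
)
print(incident_within_documented_limitations(report, "heavy rain"))  # True
```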
The Significance of EU-Type Examination and CE Marking
For certain high-risk AI systems, conformity assessment by a third-party notified body is required. This process results in an EU-type examination certificate and the right to affix the CE mark. This is a powerful piece of evidence. It signifies that an independent expert has reviewed the technical documentation and found the system to be in compliance with the relevant EU legislation. In a liability dispute, a CE certificate creates a strong, though not irrebuttable, presumption of conformity. It shifts the evidentiary burden significantly. A claimant seeking to establish liability would typically need to prove that the system did not actually conform to the specifications reviewed by the notified body, or that it was used in a way that deviated from its intended purpose. The distinction between EU-level regulation and national implementation is key here: while the CE mark is recognized across the Union, the specific liability consequences of non-conformity can be interpreted through national law.
Approvals, Authorizations, and Operational Mandates
Finally, in certain highly regulated sectors, an AI system’s operation is contingent upon specific approvals or authorizations from public authorities. These documents are not merely administrative hurdles; they are formal determinations of fitness-for-purpose and form a critical part of the evidentiary record. Examples include the approval of an AI-based diagnostic tool by a national medical device authority or the authorization of an autonomous vehicle for road testing by a transport ministry.
The existence of such an approval can be a decisive factor in a liability case. It can serve as evidence that the system was state of the art at the time of its authorization and that the risks were deemed acceptable by the competent authority. This can engage legal principles such as the “state of the art” defence or the “development risk” defence. However, it is crucial to understand that an administrative approval does not grant absolute immunity. If a system is operated outside the scope of its authorization, or if it is later discovered that the approval was based on flawed information provided by the applicant, the protective value of that approval is severely diminished. The evidentiary record must therefore show not only that approvals were sought and obtained, but that the system was operated in strict accordance with the conditions attached to them. This requires meticulous record-keeping of operational parameters, maintenance schedules, and any incidents that occur during operation.
