The Information Commissioner’s Office (ICO) recently closed its third and final call for evidence on data protection issues relating to generative artificial intelligence (AI) models. This time, the focus was on how the accuracy principle of the UK General Data Protection Regulation (UK GDPR) applies to the output of generative AI models, and how the accuracy of the training data influences this.
This follows two previous calls for evidence: one exploring the lawful basis for training AI models using data scraped from the web, and another examining the application of the purpose limitation principle at various stages of the AI lifecycle. A draft chapter was released alongside the call for evidence. It outlines the obligation under the UK GDPR for personal data to be accurate and provides guidance on how developers and users of generative AI models should comply with the accuracy principle.
Differing concepts of accuracy
The draft chapter differentiates between accuracy as a principle of data protection law and statistical accuracy. Under Article 5 of the UK GDPR, organisations are required to ensure that personal data is “accurate and, where necessary, kept up to date”. Organisations are also required to take “every reasonable step … to ensure that personal data that are inaccurate, having regard to the purposes for which they are processed, are erased or rectified without delay”. This differs from the concept of statistical accuracy, which AI engineers typically use to describe how closely the output of an AI system matches correctly labelled test data.
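To make the distinction concrete, the sketch below shows one common way of computing statistical accuracy against labelled test data. It is a minimal illustration only: the labels, predictions and function name are hypothetical, and evaluating real generative AI models is considerably more involved.

```python
# Minimal illustration of statistical accuracy: the proportion of model
# outputs that match correctly labelled test data. All names and data
# here are hypothetical.

def statistical_accuracy(predictions: list[str], labels: list[str]) -> float:
    """Return the fraction of predictions that match the correct labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must be the same length")
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

labels = ["complaint", "query", "complaint", "feedback"]
predictions = ["complaint", "query", "feedback", "feedback"]
print(f"Statistical accuracy: {statistical_accuracy(predictions, labels):.0%}")  # 75%
```

Note that a model can score well on this kind of test and still breach the data protection accuracy principle, if its remaining errors produce inaccurate personal data about individuals.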
Breaching the accuracy principle
The data protection accuracy principle helps prevent misinformation and incorrect decision-making about individuals. However, the ICO points out that personal data doesn’t always need to be perfectly up to date; whether it does depends on the purpose of the processing. Similarly, data doesn’t have to be 100% statistically accurate.
In the draft chapter, the ICO emphasises the need for developers and deployers of AI models to consider how accuracy in both training data and outputs can affect individuals. Inaccuracies can result in harmful consequences. Therefore, if inaccurate training data leads to inaccurate results that cause harm, it’s likely that both the developer and the deployer of the AI model are failing to comply with the accuracy principle.
Purpose and accuracy
The draft chapter suggests that the importance of accuracy in the design and testing of an AI model is largely dependent on the purpose for which it will be used. Accuracy is more important when the model is used for decision-making or as a source of information.
For example, an AI model designed to generate creative content, like inventing storylines involving real people, doesn’t necessarily need to be accurate. On the other hand, a model used to summarise customer complaints needs to be highly accurate to function properly. This underlines the importance of clear communication between developers, deployers, and end-users of models. Ensuring that everyone understands the intended use of the model and the level of accuracy required is essential for the model to serve its intended purpose effectively.
ICO’s expectations for controls and communication
Developers
The ICO stresses that developers can improve compliance with their data protection accuracy responsibilities by curating training data to ensure it is sufficiently accurate for the purpose of the model. Developers are expected to understand the accuracy of their AI training data, including its composition and its impact on model outputs. For any model that doesn’t meet the statistical accuracy required for its purpose, developers should put in place technical and organisational controls. This could mean placing usage restrictions in customer contracts or keeping tabs on how customers are using the model.
Developers also need to watch out for, and communicate, the risk of incorrect or unexpected outputs, known as “hallucinations”. Without these controls, there’s a risk that users might rely too heavily on the AI tool for information it can’t accurately provide. The ICO suggests this would be particularly important for consumer-facing tools.
Communication is key here. Developers need to be clear about what users can expect in terms of output accuracy, and to monitor whether users’ interactions with the model are in line with these expectations. This reassures both the ICO and users that the model is being used appropriately for its level of accuracy.
Deployers
For those deploying AI tools, the ICO advises careful consideration of potential risks associated with inaccurate training data and outputs. They should address these risks proactively, for example by putting restrictions on user queries or applying output filters. The ICO also highlights the importance of clear communication about the application’s statistical accuracy and its intended use. Regular monitoring of the tool’s use is encouraged to improve the information shared with the public and to revise any usage restrictions if necessary.
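By way of illustration only, the deployment-side controls the ICO describes might be implemented along the following lines. The query patterns, filter logic and accuracy notice are hypothetical assumptions for the sketch, not measures prescribed by the ICO.

```python
import re

# Hypothetical sketch of deployer-side controls: restricting user queries
# to the tool's intended purpose and filtering outputs before they reach
# the end user. Patterns and wording are illustrative assumptions.

BLOCKED_QUERY_PATTERNS = [
    r"\bcredit score\b",       # decisions about individuals the tool isn't built for
    r"\bmedical diagnosis\b",  # uses requiring accuracy the tool can't guarantee
]

def is_query_allowed(query: str) -> bool:
    """Reject queries that fall outside the tool's communicated purpose."""
    return not any(re.search(p, query, re.IGNORECASE) for p in BLOCKED_QUERY_PATTERNS)

def filter_output(output: str) -> str:
    """Append a notice so users don't over-rely on the output's accuracy."""
    return output + "\n\n[AI-generated content: may contain inaccuracies.]"

query = "Summarise this customer complaint"
if is_query_allowed(query):
    print(filter_output("The customer reports a delayed delivery."))
else:
    print("This query is outside the tool's intended use.")
```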
The consultation
The ICO was seeking feedback from organisations on the analysis presented in the draft chapter, aiming to understand how organisations deploying generative AI models can test the statistical accuracy of their models and how they communicate this to their stakeholders. The ICO also wanted to hear about any procedures that AI developers could use to label outputs as AI-generated.
In addition, input was sought on technical and organisational measures that organisations could use to improve the statistical accuracy of generative AI models, and on whether the proposed regulatory approach would be positive or negative for organisations.
The call for evidence ran from 23 April until 10 May 2024. It was open to developers, users, legal advisors and consultants as well as public bodies and civil rights groups. Revised chapters incorporating the findings from all three consultations are expected to be published in the coming months.