Following completion of its five-part consultation series on data protection and generative AI in 2024, the Information Commissioner's Office has published its Outcomes Report to evaluate the numerous responses received and to summarise the ICO's current position. In doing so, the ICO has confirmed its previously stated position on most matters whilst updating its position on (i) the use of the legitimate interests lawful basis for web scraping to train generative AI models and (ii) engineering individual rights into generative AI models.
In this e-update we outline the key takeaways from the ICO's report for UK businesses that are developing or otherwise using generative AI models.
Lawfulness of using web scraped data to train generative AI models
The ICO's view is that legitimate interests is the only available lawful basis for training generative AI models using personal data collected through web scraping. However, in order to rely on legitimate interests, the AI developer/user must be able to pass a three-part test:
- Purpose test: are they pursuing a legitimate interest?
- Necessity test: is this processing necessary for that purpose?
- Balancing test: do the individual’s interests override the legitimate interest?
The ICO consultation has shown that there are alternative data collection methods available to developers other than web scraping. Therefore, the ICO considers that if AI developers/controllers wish to use web scraping to train generative AI then they have to be able to show why these other methods of data collection are not suitable.
The ICO noted that web scraping is likely to fail the balancing test in most instances, as the lack of transparency means that data subjects will not be aware that their data is being used and therefore cannot exercise their rights, e.g. by objecting to the use of their data. With that in mind, the ICO expects generative AI developers to take material steps to improve their approach to transparency.
If businesses wish to continue using web scraping to train generative AI models, they must consider whether web scraping is strictly necessary. If other methods are available (e.g. collecting personal data directly from individuals and licensing that data in a transparent manner), those methods should be used instead; otherwise web scraping will likely fail the necessity and balancing limbs of the legitimate interests test.
Purpose limitation in the generative AI lifecycle
The chosen purpose for training a generative AI model must be "explicit" and "specific". This is to allow data subjects to understand why and how their personal data is being processed. The ICO expects developers reusing personal data to train generative AI to ensure that this new purpose is compatible with the original purpose for which the data was collected. Using personal data to develop a generative AI model will likely not fall within the reasonable expectations of individuals. This important clarification from the ICO comes as a result of many responses to the consultation proceeding on the basis that reusing data for training generative AI was related or ancillary to the original purpose for which the data had been collected.
Accuracy of training data and model outputs
The ICO makes it clear that all data collectors (including developers of generative AI models) must ensure the personal data they are using is accurate and up to date. The ICO also noted that the appropriate level of accuracy for that personal data will be determined by the specific purpose for which the developer uses the generative AI model. For example, generative AI models used to create non-factual outputs as a source of inspiration will have different accuracy requirements from models used as a source of factual information. Developers must assess the risk of incorrect and unexpected outputs from their generative AI models and take action to minimise this inaccuracy, e.g. by providing clear information about the model's statistical accuracy or by labelling outputs as AI-generated or as potentially inaccurate.
Engineering individual rights into generative AI models
The ICO expects developers to design generative AI models that implement data protection principles effectively and to put in place the safeguards necessary to ensure the proper handling and transfer of personal data. Developers must publish easily accessible information on how they handle people's personal data. The ICO has observed a serious lack of transparency in the generative AI sector and is alarmed by how many developers are using personal data without data subjects being aware of it. Developers must clearly justify any exemptions they seek to rely on; for example, the ICO does not consider it acceptable for an AI developer to argue that the data it is using is not personal data simply because it cannot identify who the data in the model relates to. Instead, before relying on such an exemption, the developer must make reasonable efforts to identify the individuals concerned and offer them easy ways to obtain further information about the data held about them. The ICO felt that many respondents were attempting to rely on exemptions too broadly, in a way that would undermine people's data rights.
Allocating controllership across the generative AI supply chain
The ICO is clear that a contract does not determine whether an organisation is a controller, joint controller or processor. Instead, this status is determined by how the organisation deals with the processing in practice. Additionally, the ICO noted that the relationship between developers and third-party deployers in the "closed-source" generative AI field often involves shared objectives and influence over the processing from both parties. These relationships are therefore more likely to amount to joint controllership than to a processor-controller relationship. The ICO also noted that in "closed-access" generative AI models the developer is unlikely to be a processor at the deployment stage, since the developer retains large-scale influence over data processing decisions and is more likely to be a controller or joint controller.
Overall, the consultation has shown that the ICO is scrutinising generative AI models far more closely from a data protection perspective. Generative AI models (particularly those trained using web scraping) process and use vast amounts of data, usually without the consent or knowledge of the data subjects. As the generative AI sector continues to evolve rapidly, it is clear that the ICO is attempting to lay down the ground rules for the compliant processing of personal data. UK businesses that are using or developing generative AI models should follow the ICO's guidance closely.
Should you require advice on the development or use of generative AI models, please contact David Gourlay or another member of our Data Protection and Cyber Security team.
This article was co-authored by Calum Chrystal, a trainee solicitor in our Commercial team.