Purpose limitation and generative AI: UK Information Commissioner’s Office launches call for evidence

March 8, 2024

The Information Commissioner’s Office (“ICO”) has launched a second public call for evidence on generative AI. The first call (previously reported by Wiggin), which closed on 1 March 2024, focused on how developers may establish a lawful basis for processing personal data as part of AI development, as required under UK GDPR. The second call explores the requirement under UK GDPR that personal data must be collected for “specified, explicit and legitimate purposes” (Art 5(1)(b)).

The ICO defines generative AI as AI models that can create new content (e.g. text, computer code, audio, music, images and videos). Typically, they are trained on extensive datasets, which allow them to exhibit a broad range of general-purpose capabilities. Developing generative AI involves collecting and pre-processing data, which is then used to train the base model. The base model is then fine-tuned for deployment in a specific context and further improved post-deployment. Most developers rely on publicly accessible information for the training data, usually obtained through web scraping.

In this new call for evidence, the ICO explains that each stage of development may involve processing different types of data for different purposes and by different entities. For example, a developer may collect training data and train a generative AI model on that data, and then, after training, decide to develop an application with which to deploy the model to serve some business objective. The ICO states that the organisation(s) doing the model development and deployment must understand and document those two purposes separately.

The ICO considers that collating repositories of web-scraped data, developing an AI model, and developing an application based on such a model each constitute different purposes under data protection law. Further, developers who want to re-use training data need to consider whether such re-use is compatible with the original purpose for collecting the data. Models may also give rise to many different applications (e.g. a large language model may be used to answer customer emails or draft contracts).

The requirement for purposes to be defined specifically and clearly enables the developer to assess how it can meet the other data protection principles, such as data protection by design and default, data minimisation, lawfulness, transparency and fairness, and to allocate controller and processor responsibility for the different stages of the AI lifecycle. The ICO acknowledges that purposes in the earlier stages of the generative AI lifecycle, such as the initial data collection, may be harder to define precisely than those closer to deployment. Nevertheless, defining a purpose at the initial stages of the generative AI lifecycle involves considering what types of deployments the model could result in, and what functionality the model will have.

The consultation closes on 12 April 2024. For more information and to respond, see the ICO’s call for evidence.