Generative AI: UK Information Commissioner’s Office launches Call for Evidence

The UK Information Commissioner’s Office (“ICO”) has launched the first in a series of public consultations on generative AI. The ICO defines generative AI as AI models that can create new content (e.g. text, computer code, audio, music, images and videos) and that are typically trained on extensive datasets, which allows them to exhibit a broad range of general-purpose capabilities.

The ICO begins by explaining that developing generative AI involves collecting and pre-processing data, which is then used to train the base model. The base model is fine-tuned for deployment in a specific context and, finally, improved post-deployment. Most developers rely on publicly accessible information for the training data, usually obtained through web scraping, which involves the use of automated software to “crawl” webpages and to gather, copy, extract and store information (text, images, videos etc.) from those pages.

This first consultation focuses on how developers may establish a lawful basis for generative AI development, as required under UK GDPR. The ICO states that five of the six lawful bases under UK GDPR are unlikely to be available for training AI on web-scraped data. The consultation therefore focuses on the legitimate interests basis, for which the controller must satisfy a three-part test:

  • The purpose of the processing is legitimate. The purpose must be specific. The developer’s business interest could be a legitimate purpose, but if the developer does not know how its AI will be used downstream, this may be a hard condition to satisfy.
  • The processing is necessary for that purpose. The ICO states that, according to its understanding, most generative AI training is only possible using data obtained via large-scale scraping.
  • The individual’s interests do not override the interest being pursued. Invisible processing such as web scraping can result in data subjects losing control of their data, preventing them from exercising their rights. AI models can also potentially be used to generate inaccurate information causing distress (e.g. deepfake images ending up on a porn site) or be used by hackers. The more control the controller has over deployment of the AI, such as by using it exclusively on its own platform or giving access to third parties via APIs, the more likely it will be able to satisfy this condition. The less able the developer is to restrict or monitor how the AI is used downstream, the harder it will be to satisfy this condition.

Further consultations will be published in the coming months on purpose limitation, accuracy and the rights of data subjects.

For more information and to respond to the Call, which closes on 1 March 2024, click here.