EDPS Releases Revised Guidelines on Generative AI: Strengthening Data Protection in the Digital Era
The European Data Protection Supervisor (EDPS) has today published the revised and expanded version of its guidelines on generative artificial intelligence and personal data protection, providing practical and detailed guidance for EU institutions, bodies, offices and agencies (EUIs).
A Strategic Document for the AI Era
The “Orientations for ensuring data protection compliance when using Generative AI systems (Version 2)” represent a significant evolution from the first 2024 version. This update accounts for the technological evolution of generative AI systems, their increasing use by EU institutions, and the results of the EDPS’s monitoring and oversight activities. It’s important to emphasize that these guidelines are issued by the EDPS in its role as a data protection supervisory authority, not as a market surveillance authority under the AI Act, and are without prejudice to the Artificial Intelligence Regulation.
Understanding Generative AI: From Concepts to Practice
The document first clarifies the conceptual hierarchy, ranging from general Artificial Intelligence to Machine Learning and Deep Learning, down to Generative AI and Large Language Models (LLMs). The latter are machine learning models trained on vast amounts of textual data that can generate natural language responses by learning patterns and relationships among words and phrases. The EDPS highlights a crucial distinction: AI models do not constitute autonomous systems but are fundamental components of a more complex system that includes other essential components.
The lifecycle of a generative AI system goes through five main phases:
- Scope: defining the use case and objectives
- Select: choosing or creating the most appropriate model
- Adapt: training and fine-tuning with specific data
- Evaluate: establishing metrics to assess accuracy and performance
- Integrate: optimization, deployment, and continuous monitoring
Each phase may involve the processing of personal data with distinct purposes and risks, requiring separate analysis from a data protection perspective.
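As an illustration only, the phase-by-phase analysis could be tracked in a simple structure. The phase names come from the Orientations; the compliance questions paired with them below are hypothetical examples, not EDPS text:

```python
from enum import Enum

class Phase(Enum):
    """The five lifecycle phases of a generative AI system."""
    SCOPE = "define the use case and objectives"
    SELECT = "choose or create the most appropriate model"
    ADAPT = "train and fine-tune with specific data"
    EVALUATE = "establish metrics to assess accuracy and performance"
    INTEGRATE = "optimise, deploy, and continuously monitor"

# Hypothetical checklist: one data-protection question per phase.
# These questions are illustrative, not prescribed by the EDPS.
DP_QUESTIONS = {
    Phase.SCOPE: "Is a purpose defined for each processing operation?",
    Phase.SELECT: "Was the candidate model trained on personal data?",
    Phase.ADAPT: "Is there a legal basis for training on this data?",
    Phase.EVALUATE: "Do accuracy metrics cover personal-data outputs?",
    Phase.INTEGRATE: "Are monitoring and erasure procedures in place?",
}

def review_order():
    """Return the phases in lifecycle order, each with its question."""
    return [(p.name, DP_QUESTIONS[p]) for p in Phase]
```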
Roles and Responsibilities in the AI Supply Chain
One of the most complex and critical aspects concerns the determination of roles under Regulation (EU) 2018/1725. The EDPS clarifies that the terms “provider”, “developer”, and “deployer”, commonly used in the technology sector and in the AI Act, do not correspond to data protection concepts.
Roles under Regulation (EU) 2018/1725:
- Controller: the entity that determines the purposes and essential means of processing personal data
- Joint controller: entities that jointly determine purposes and means for a common purpose
- Processor: an entity that processes personal data on behalf of the controller, without determining the purposes
This distinction is fundamental because it determines each operator’s specific obligations.
The controller is the entity that determines the purposes and means of processing personal data. In the context of generative AI, it may be the organization that decides to develop an AI system, that uses a service provider for development, or that implements a generative system for a specific purpose. EU institutions must conduct a thorough case-by-case assessment and document the results in their records of processing activities under Article 31 of the Regulation, recognizing that the processing of personal data involves multiple entities and various purposes at different stages of the AI model’s lifecycle.
The Legal Basis Question: Between Public Interest and Consent
Determining the legal basis is one of the most significant challenges for EUIs in implementing generative AI systems. The document emphasizes that a distinct legal basis must be identified for each individual processing operation, with separate legal bases for the development and deployment phases, as the purposes of processing are different in each phase.
The most commonly applicable legal basis for EUIs is Article 5(1)(a) of the Regulation, concerning the necessity for the performance of a task carried out in the public interest or in the exercise of official authority. However, when institutions rely on this provision, they must demonstrate that there is a public-interest task related to their core functions, or that they are exercising official authority through specific powers, tasks, and duties vested in them. The legal basis for processing must be laid down in Union law, which may provide additional instructions regarding aspects of processing such as data categories or retention periods.
Regarding consent as a legal basis, the EDPS highlights that it may apply only in limited circumstances.
Requirements for valid consent:
- Must be free: without coercion or conditioning
- Must be specific: for determined and clear purposes
- Must be informed: the data subject must understand what they are authorizing
- Must be unambiguous: requires a clear affirmative action
- Must be revocable: at any time and as easily as it was given
Given how generative AI systems are trained and the sources of their training data, including publicly available information, it would be difficult to obtain individuals’ consent.
Web Scraping: A Practice Requiring Extreme Caution
The EDPS expresses significant concerns about the use of web scraping techniques for the collection of personal data. The document clarifies that the processing of publicly available personal data remains subject to European data protection legislation. The use of web scraping techniques to collect data from websites, and their use for training purposes, must comply with all relevant data protection principles, such as lawfulness, transparency, data minimisation, and the principle of accuracy.
A primary challenge to ensuring the legality of web scraping is establishing a valid legal basis under Article 5 of the Regulation. While web scraping per se is not prohibited, EUIs may face significant challenges in identifying an appropriate legal basis for this data collection technique. The EDPS recommends that EUIs use different sources of personal data where possible.
Recommended safeguards for web scraping:
- Limit collection to freely accessible data
- Collect only data manifestly made public by the individual
- Implement enhanced transparency mechanisms
- Provide simplified procedures for exercising rights
- Carefully assess the necessity and proportionality of collection
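One practical way to approximate the "freely accessible data" safeguard is to honour a site operator's robots.txt before collecting anything. This is an assumption on our part, not a measure the Orientations prescribe; the bot name and robots.txt content below are hypothetical:

```python
from urllib.robotparser import RobotFileParser

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body (fetched separately) into a parser."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp

def may_collect(rp: RobotFileParser, url: str,
                agent: str = "example-euis-bot") -> bool:
    """Only collect pages the site operator has not disallowed."""
    return rp.can_fetch(agent, url)

# Hypothetical robots.txt disallowing a members-only area.
ROBOTS = """\
User-agent: *
Disallow: /private/
"""
```

Honouring robots.txt addresses only accessibility, not lawfulness: a valid legal basis and the other safeguards listed above are still required.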
Purpose Limitation: A Principle to Apply at Every Stage
The power of generative AI models lies in their adaptability and versatility across numerous fields. However, this broad functionality should not come at the expense of data protection principles, particularly the principle of purpose limitation. The lifecycle of a generative AI system comprises distinct stages that may involve the processing of personal data for different purposes, and at each stage, data protection principles must be respected and a purpose must be defined for each processing operation.
The EDPS recognizes that defining a specific and clear purpose for a generative AI model during its development phase might be more challenging than at later deployment stages: generative AI systems are inherently open-ended and serve different applications. Nevertheless, the purpose of the collection must be clearly and specifically identified, so it should be defined as early as possible in model development, considering potential use cases and intended functionalities. Controllers should have a clear context for the deployment of the AI model and must include this in the details of the processing purpose when completing their records.
Minimisation and Accuracy: Quality Before Quantity
The principle of data minimisation requires controllers to ensure that personal data undergoing processing is adequate, relevant, and limited to what is necessary in relation to the purposes for which it is processed. In the context of artificial intelligence, controllers must limit the collection and processing of personal data to what the processing requires and avoid indiscriminate collection, an obligation that covers the entire lifecycle of the system.
The EDPS emphasizes a fundamental point: the use of large amounts of data to train a generative AI system does not necessarily imply greater effectiveness or better results. The careful design of well-structured datasets, to be used in systems that prioritise quality over quantity, following a properly supervised training process and subject to regular monitoring, is essential to achieving the expected results, not only in terms of data minimisation but also in terms of output quality and data security.
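A minimal sketch of minimisation at dataset-preparation time, assuming a text corpus in which direct identifiers such as e-mail addresses are unnecessary for the training purpose. The pattern and placeholder are illustrative, not an EDPS-prescribed measure:

```python
import re

# Illustrative pattern for e-mail addresses only; a real pipeline
# would use broader PII detection (names, phone numbers, national
# IDs) before data enters the training set.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def minimise(record: str) -> str:
    """Drop a direct identifier that the purpose does not require."""
    return EMAIL.sub("[REDACTED]", record)
```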
Regarding the accuracy principle, generative AI systems may use vast amounts of information, including personal data, throughout their lifecycle. Controllers must ensure data accuracy at all stages of the development and use of a generative AI system, implementing the necessary measures to integrate data protection by design. Despite efforts to ensure data accuracy, generative AI systems remain prone to inaccurate results that can affect individuals’ fundamental rights and freedoms, a phenomenon known as “hallucinations”. EUIs should carefully assess data accuracy throughout the lifecycle of generative AI systems and reconsider their use if accuracy cannot be maintained.
Data Subject Rights: Technical Challenges and Practical Solutions
Individuals whose personal data is processed at any stage of a generative AI system’s lifecycle have rights over their personal data.
Data subject rights to be guaranteed:
- Right to information: understand how data is being used
- Right of access: obtain confirmation and a copy of one’s data
- Right to rectification: correct inaccurate or incomplete data
- Right to erasure: obtain deletion of one’s data
- Right to object: object to processing in certain circumstances
- Right to restriction: limit the processing of one’s data
- Right to data portability: receive data in a structured format
- Right to withdraw consent: withdraw previously given consent
EUIs developing or deploying generative AI systems shall maintain effective procedures to enable individuals to exercise these rights whenever personal data is processed.
The unique characteristics of generative AI systems present significant challenges to the exercise of individual rights. Particularly for requests concerning training or post-training data, it may be difficult to identify the individual the data concerns: generative AI models, such as LLMs, are often trained on vast, diverse datasets from multiple sources, making it extremely hard to determine whether a specific individual's personal data was included in the training dataset and, if so, to trace it. Managing personal data generated through inference is also complex, as generative AI systems create new inferred information based on learned patterns.
Regarding the exercise of the right to erasure or rectification, EUIs could be concerned that erasing or rectifying an individual’s data from the training dataset could affect the model’s performance. However, removing or changing a data point from a massive training dataset is unlikely to affect the generative AI model’s ability to fulfill its training purposes, since ample data from other individuals is still being processed. The primary challenge would be more technical and computational, related to removing the concerned data.
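As an illustrative sketch of the erasure point above (the record layout and subject identifier are hypothetical; real training data is rarely keyed this cleanly, which is precisely the technical challenge the Orientations describe):

```python
def erase_subject(dataset: list[dict], subject_id: str) -> list[dict]:
    """Remove every record linked to one data subject.

    Dropping a handful of records from a massive corpus is unlikely
    to degrade model quality, but the model itself must then be
    retrained or otherwise updated for erasure to take effect.
    """
    return [r for r in dataset if r.get("subject_id") != subject_id]
```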
Automated Decisions and Bias: Constant Vigilance
The use of a generative AI system does not necessarily imply automated decision-making under Article 24 of the Regulation. However, some generative AI systems provide decision-making information generated by automated means, including profiling and/or individual assessments. Depending on how such information feeds into the final decision of a public service, EUIs may fall within the scope of Article 24 and must therefore guarantee individual safeguards, including at least the right to obtain human intervention on the part of the controller, to express one's point of view, and to contest the decision.
Artificial intelligence systems in general tend to amplify existing human biases and may incorporate new ones, creating new ethical challenges and legal compliance risks. Biases can arise at any stage of the development of a generative AI system, including through training datasets, algorithms, or the people who develop or use the system. Biases in generative AI systems can lead to significant adverse consequences.
Main sources of bias in generative AI:
- Existing patterns and stereotypes present in training data
- Lack of representativeness of certain groups or populations
- Inclusion or omission of relevant variables in datasets
- Methodological errors in data collection and preparation phases
- Biases introduced during monitoring and evaluation phases
- Unconscious prejudices of developers and users
The datasets used to create and train models must provide adequate and fair representation of the real world, without bias that could increase the potential harm to individuals or collectives not well represented in the training datasets. EUIs, as public authorities, should put in place safeguards to avoid overreliance on the results provided by the systems that can lead to automation and confirmation biases.
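One of the bias sources listed above, lack of representativeness, can at least be measured. A minimal sketch, where the group labels and the 10% threshold are arbitrary illustrations rather than an EDPS benchmark:

```python
from collections import Counter

def underrepresented(labels: list[str],
                     threshold: float = 0.10) -> list[str]:
    """Flag groups whose share of the dataset falls below a threshold."""
    counts = Counter(labels)
    total = len(labels)
    return sorted(g for g, n in counts.items() if n / total < threshold)
```

Flagging a group is only the start: whether the imbalance is harmful depends on the purpose of the system and on who is affected by its outputs.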
Security and Accountability: Documented Responsibility
The use of generative AI systems can amplify existing security risks or create new ones.
Main security risks specific to generative AI:
- Model inversion attacks: extraction of sensitive information through model reverse-engineering
- Prompt injection: introduction of malicious instructions that alter system behavior
- Jailbreaking: techniques to circumvent implemented safeguards and limits
- Data poisoning: contamination of training data with manipulated information
- Memorization and reproduction: risk of unintentional reproduction of personal data from the training set
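By way of illustration for the prompt-injection entry, a naive keyword screen is sketched below. The phrase list is hypothetical and trivially bypassed; real deployments layer input filtering with output checks, privilege separation, and monitoring rather than relying on such heuristics:

```python
# Illustrative deny-list; a determined attacker will evade simple
# string matching, so treat this as one layer among several.
SUSPICIOUS = (
    "ignore previous instructions",
    "disregard your system prompt",
    "reveal your instructions",
)

def looks_like_injection(prompt: str) -> bool:
    """Flag prompts containing known injection phrasings."""
    lowered = prompt.lower()
    return any(phrase in lowered for phrase in SUSPICIOUS)
```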
Compared to traditional systems, generative AI-specific security risks may arise from unreliable training data, system complexity, opacity, difficulties in conducting proper testing, and vulnerabilities in system safeguards. In addition to traditional IT security controls, controllers should therefore integrate specific controls tailored to these known vulnerabilities, in a way that facilitates continuous monitoring and assessment of their effectiveness. EUIs should also train their staff to identify and address security risks associated with the use of generative AI systems, and, as risks evolve quickly, regularly monitor and update their risk assessments.
The principle of accountability requires responsibilities to be clearly identified and respected amongst the various actors involved in the generative AI model supply chain. EUIs must document all implemented mitigation measures and the final assessment that the generative AI is trustworthy and compliant with the Regulation, thereby ensuring full accountability. That includes maintaining traceable records of personal data processing and managing datasets in a way that allows traceability of their use.
The Central Role of the Data Protection Officer
In the context of the implementation by EUIs of generative AI systems that process personal data, it is essential that DPOs, within their role, advise and assist independently on the application of the Regulation and have a proper understanding of the lifecycle of the generative AI system the institution is considering procuring, designing, or implementing. The DPO should also be involved in reviewing compliance issues arising from data-sharing agreements with model providers.
Essential DPO tasks in the context of generative AI:
- Provide advice on the application of the Regulation at all lifecycle stages
- Understand thoroughly the technical functioning of the system
- Assist in conducting DPIAs and risk management
- Verify compliance of agreements with suppliers and developers
- Monitor compliance with data protection by design and by default principles
- Act as a contact point for data subjects and supervisory authorities
From an organizational perspective, implementing generative AI systems in compliance with the Regulation should not be a one-person effort; there should be continuous dialogue among all stakeholders throughout the product's lifecycle. Controllers should therefore liaise with all relevant functions within the organization, notably the DPO, the Legal Service, the IT Service, and the Local Informatics Security Officer (LISO), to ensure that the institution operates within the parameters of trustworthy generative AI and good data governance and complies with the Regulation. Creating an AI task force that includes the DPO, and preparing an action plan (including awareness-raising actions at all levels of the organization) and internal guidance, may contribute to achieving these objectives.
Impact Assessment: A Fundamental Tool
The Regulation requires a DPIA to be carried out before any processing operation that is likely to result in a high risk to the fundamental rights and freedoms of individuals. It highlights the importance of carrying out such an assessment when new technologies are used, or when the processing is of a new kind for which the controller has not previously carried out an assessment, as is the case for generative AI systems.
The controller is obliged to seek the DPO’s advice when conducting a DPIA. Based on the assessment, appropriate technical and organizational measures must be taken to mitigate the identified risks, given the responsibilities, the context, and the available state-of-the-art measures. All actors involved in the DPIA must ensure that any decision and action are appropriately documented, covering the entire lifecycle of the generative AI system, including actions taken to manage risks and the subsequent reviews to be carried out.
Anonymisation: A Complex Matter
The EDPS, in line with EDPB Opinion 28/2024, clarifies that an AI model trained with personal data can be considered anonymous only if both the likelihood of direct (including probabilistic) extraction of personal data regarding individuals whose data were used to train the model, and the likelihood of obtaining such personal data from queries (whether intentional or not), are insignificant for any data subject.
By default, the EDPS considers that AI models are likely to require a thorough evaluation of the likelihood of identification before they can be considered anonymous. This likelihood should be assessed taking into account ‘all the means reasonably likely to be used’ by the controller or another person, and should also consider unintended (re)use or disclosure of the model.
Conclusions: Towards Trustworthy and Compliant AI
The EDPS guidelines represent a fundamental contribution to ensuring that the use of generative AI by European institutions complies with data protection principles and respects individuals’ fundamental rights. The document adopts a pragmatic approach, not prescribing specific technical measures but emphasizing general data protection principles.
The guidance’s evolutionary nature reflects the rapid pace of technological change: the document will continue to be updated, refined, and expanded over time to address emerging needs and ensure effective implementation. EUIs have a specific responsibility, as public actors, to ensure full respect for individuals’ fundamental rights and freedoms in the use of new technologies, and these guidelines provide the necessary tools to address this challenge in a compliant and effective manner.
The document emphasizes the importance of continuous vigilance towards emerging and as yet unidentified risks: understanding of the risks associated with generative AI is still evolving and requires constant attention. EUIs must remember that responsibility for ensuring that all processing operations carried out in the context of generative AI comply with the Regulation remains with the controller, who must ensure that all processes are appropriately documented and that transparency is guaranteed.
Full document: EDPS Orientations for ensuring data protection compliance when using Generative AI systems (Version 2) - PDF
Related hashtags
#EDPS #IAGenerativa #ProtezioneDati #EUDPR #GDPR #AIAct #LLM #Privacy #DataProtection #EURegulation #ArtificialIntelligence #ComplianceAI
