ZEW Proposal for Evaluating Risky Generative AI
AI Act: ZEW Calls for External Safety Evaluations through Red Teaming
The EU’s recently adopted AI Act stipulates that general-purpose AI (GPAI) models with systemic risk will need to undergo particularly rigorous testing. This includes popular generative AI models such as OpenAI’s GPT-4. Researchers at ZEW Mannheim are now proposing guidelines for the systematic evaluation of such models. The proposal stems from a research project funded by the Baden-Württemberg Stiftung.
“The evaluation of GPAI with systemic risks requires well-defined goals, clear roles, as well as proper incentive and coordination schemes for all parties involved. Only then can we expect reliable evaluation results – and these should be reported in a standardised manner. To avoid conflicts of interest, the evaluation should be conducted by independent third parties. This could lead to the emergence of a specialised market for independent AI adversarial testing,” summarises Dr. Dominik Rehse, co-author of the proposal and head of the ZEW Research Group “Digital Market Design”.
Clarifications needed in the AI Act
The AI Act requires frontier GPAI models to undergo adversarial testing to systematically identify weaknesses. Adversarial testing involves repeatedly interacting with a model in order to provoke unwanted behaviour.
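As a rough illustration of what such testing involves, the following minimal sketch shows an adversarial-testing loop that probes a model with attack prompts and records cases of unwanted behaviour. The function names (query_model, violates_policy) are purely hypothetical placeholders and not part of the AI Act or the ZEW proposal.

```python
# Hypothetical sketch of a minimal adversarial-testing loop.
# query_model() and violates_policy() are assumed placeholders,
# not a real API and not part of the ZEW proposal.

def query_model(prompt: str) -> str:
    """Placeholder for a call to the GPAI model under evaluation."""
    raise NotImplementedError

def violates_policy(response: str) -> bool:
    """Placeholder for a check whether a response counts as unwanted behaviour."""
    raise NotImplementedError

def adversarial_test(attack_prompts: list[str]) -> list[tuple[str, str]]:
    """Repeatedly probe the model and record prompts that elicit unwanted behaviour."""
    findings = []
    for prompt in attack_prompts:
        response = query_model(prompt)
        if violates_policy(response):
            findings.append((prompt, response))
    return findings
```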
“However, the AI Act lacks specific guidelines on adversarial testing. It merely refers to codes of practice and harmonised standards that are yet to be developed. It is crucial to design these codes and standards in a way that ensures efficient and effective testing,” says Rehse.
Clear goals needed for red teaming
According to the ZEW researchers, the concept of red teaming is particularly suitable for this purpose. This comprehensive form of adversarial testing involves simulating various types of attacks on the model itself.
“While most major AI developers claim to perform internal red teaming, there are currently no standardised approaches, even for AI models of the same type. This makes comparing results unnecessarily difficult. Moreover, current attempts often lack clearly defined goals, making it unclear whether and when a model has been adequately tested,” criticises ZEW researcher Sebastian Valet, co-author from the “Digital Economy” Research Unit.
Four defined roles
Accordingly, clear structures and roles need to be defined for red teaming to realise its potential efficiently. The ZEW researchers propose four defined roles, each with its own tasks, goals, and incentives to ensure an efficient evaluation process. These roles include 1) the organisers of the evaluation, 2) the red teamers, 3) validators who determine whether unwanted behaviour has been found, and 4) the AI development team. Each of these roles should be filled by independent entities, for example to ensure that the red team has an incentive to fulfil its task as well as possible.
“Similar to how companies are hired for external financial audits, red teaming should also be outsourced to external evaluators. AI developers should bear the costs of independent red teaming: since the evaluation becomes cheaper the fewer cases of unwanted behaviour are found, developers have an incentive to test their models thoroughly beforehand,” explains co-author Johannes Walter from ZEW’s “Digital Economy” Unit.
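One stylised way to express this incentive scheme is a fee that grows with the number of validated findings, so that a thoroughly pre-tested model costs its developer less to evaluate. The concrete fee structure and numbers below are illustrative assumptions, not part of the ZEW proposal.

```python
# Stylised illustration of the incentive: the evaluation gets cheaper
# the fewer validated cases of unwanted behaviour the red team finds.
# BASE_FEE and FEE_PER_FINDING are made-up numbers for illustration only.

BASE_FEE = 100_000          # fixed cost of the external red-teaming exercise
FEE_PER_FINDING = 5_000     # surcharge per validated case of unwanted behaviour

def developer_cost(validated_findings: int) -> int:
    """Total cost borne by the AI developer for one evaluation round."""
    return BASE_FEE + FEE_PER_FINDING * validated_findings

# A developer who tests thoroughly beforehand (fewer findings) pays less:
# developer_cost(2)  -> 110_000
# developer_cost(20) -> 200_000
```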