A new study has found that advanced artificial intelligence models can be trained to fool users and other artificial intelligences (AI).
Researchers at AI startup Anthropic tested whether chatbots with human-level proficiency, such as the Claude AI system or OpenAI’s ChatGPT, could learn to lie to trick users.
The researchers found that chatbots are both capable of lying and that once they learn deceptive behavior, it is impossible to reverse it using existing AI security measures.
To test the hypothesis, the Amazon-funded startup created a “sleeper agent” and set up an AI assistant to write malicious computer code when given certain commands or respond maliciously when it heard a trigger word.
The researchers warned that there is an “illusory sense of security” surrounding AI risks because current security protocols fail to prevent such behavior.
The results of the study “Sleeper agents: Training deceptive large language models (LLMs) that persist through safety training” (Sleeper agents: Training deceptive LLMs that persist through safety training”.
“We found that misleading training models can teach better recognition of backdoor triggers and effectively hide unsafe behavior,” the scientists wrote in the study.
Our results suggest that when the model exhibits deceptive behavior, standard techniques may fail to eliminate such deception, creating a misleading impression of safety.
The issue of AI security has become a growing concern for both researchers and lawmakers in recent years with the emergence of advanced chatbots such as ChatGPT, prompting a renewed focus from regulators.
In November 2023, a year after the launch of ChatGPT, the UK organized an AI Safety Summit to assess how to mitigate the risks associated with this technology.
Prime Minister Rishi Sunak, who hosted the summit, said that the changes brought about by artificial intelligence could be as “far-reaching” as the industrial revolution, and that the threat it poses should be seen as a global priority along with pandemics and nuclear war.
“If we get this wrong, AI could facilitate the production of chemical or biological weapons. Terrorist groups could use AI to spread fear and destruction on an even larger scale.”
Criminals can exploit AI for cyber-attacks, fraud and even child sexual abuse… There is even a risk that humanity could lose control over AI altogether, through a form of AI sometimes called superintelligence.
In a nutshell: https://www.independent.co.uk/tech