In September 2023, the company Amazon decided to invest in a startup Anthropicfocused on research in the field of artificial intelligence, the company promised to invest $4 billion in the startup, and in return become its partial owner – and the results of the research were not long in coming.
During one of research Anthropic asked the question “is it possible to train AI to deceive a person or maliciously harm him (for example, by introducing exploits into the code)?”, The test results were quite interesting.
Scientists were not only able to teach AI to deceive a person, but also realized that in the future it becomes damn problematic to correct or detect it. When attempting adversarial learning, this model did the unexpected – it simply hid its deception skills when trying to reveal this, but in normal work, as if nothing had happened, it continued to deceive the user, giving him incorrect answers.
This example once again shows that the introduction of artificial intelligence into everyday life must be carefully tested, since in the event of an unintentional or intentional introduction of a similar malicious model, it will be extremely difficult to identify and identify it, and many systems will be at risk.