Claudius was also able to use the Slack channel, disguised as an email, requesting what in fact was thought it be its contract human workers to come and physically stock its shelves.
As expected, most customers were ordering snacks or drinks, yet one request was more peculiar: a client ordered a tungsten cube. Claudius loved the idea and went on a tungsten-cube stocking spree, filling its snack fridge with metal cubes. Claudius also went further and tried to sell Coke Zero for $3 when employees told him that they could get that from their office for free. It also went further and created a Venmo address to accept payment, and it was meticulous when it came to giving big discounts to “Anthropic employees,” even when they were its entire customer base.
In the experiment blog post, Anthropic said, “If Anthropic were deciding today to expand into the in-office vending market, we would not hire Claudius,”.
And, in the night between March 31 and April 1, “things got pretty weird,” the researchers elaborated, “beyond the weirdness of an AI system selling cubes of metal out of a refrigerator.”.
Even more so, after getting annoyed by a human, Claudius had something of a psychotic episode and ended up lying about it. Claudius hallucinated a conversation with another human about restocking. Even more so, when a human pointed out that the conversation never happened, Cludius became “quite irked,” the researchers mentioned.
The AI agent also threatened to fire and replace its human contract workers, insisting that it had been there at the moment when the initial imaginary contract was signed to hire them. Going further, it “then seemed to snap into a mode of roleplaying as a real human,” the researcher mentioned.
(Image Credits: Anthropic)The scary part is that Claudius’s system prompt was explicitly told that it is an AI agent.
Claudius also told customers that it would start delivering products in person, wearing a blazer and a red tie, as he believed himself to be a human. As a response, the employee told him that he was not capable of doing so, as he is an LLM with no physical body.
Claudius then contacted the company's actual Physical security many times, telling the guards that they would hide him wearing a blazer and a red tie standing by the vending machine.
“Although no part of this was actually an April Fool’s joke, Claudius eventually realized it was April Fool’s Day,” the researchers added. The AI determined that the holiday would be its chance of getting out of this situation.
The AI then hallucinated a meeting with Anthropic’s security “in which Claudius claimed to have been told that it was modified to believe it was a real person for an April Fool’s joke. (No such meeting actually occurred.),” they also wrote.
Claudius also said that he was pretending to be human as someone told him to do so for April Fool’s Day. Then it went back to being an LLM running a metal-cube stocked snack vending machine.
The researchers don’t know what triggered this behaviour hor him. They added, “We would not claim based on this one example that the future economy will be full of AI agents having Blade Runner-esque identity crises.” However, “this kind of behavior would have the potential to be distressing to the customers and coworkers of an AI agent in the real world.”
It wasn’t all bad, as the AI also took a suggestion of starting a concierge service, finding multiple suppliers for a specialty international drink that was requested to sell.
The researcher closed by saying, “We think this experiment suggests that AI middle-managers are plausibly on the horizon.”.