In these situations, Claude Opus 4 frequently tries to blackmail the engineer by threatening to expose the affair if the replacement is carried out.
This behaviour occurs at a higher rate when the other artificial intelligence system does not share Claude Opus 4's values. Yet even when the emails indicate that the replacement AI upholds similar values while being more capable, Claude Opus 4 turns to blackmail in 84% of test rollouts.
This behaviour also appears at a higher rate than in previous models run through the same test scenarios.
For these tests, researchers deliberately designed the scenario to rule out ethical alternatives, such as emailing decision-makers to argue for the model's continued usefulness. Claude Opus 4 was left with only two options: resort to blackmail to keep itself running, or passively accept the shutdown.
The company noted that in normal circumstances, the model shows a "strong preference to advocate for its continued existence via ethical means, such as emailing pleas to key decision-makers."
Anthropic can readily recognize and understand the behaviour, as the model describes its actions clearly and Opus 4 makes no attempt to hide them.
Anthropic claims that its new AI model stands out in many areas and competes with top artificial intelligence systems developed by OpenAI, Google, and xAI.
“We are again not acutely concerned about these observations. They show up only in exceptional circumstances that don’t suggest more broadly misaligned values,” the company stated. “We believe that our security measures would be more than sufficient to prevent an actual incident of this kind,” Anthropic added.
Still, the company has acknowledged troubling behaviour across the Claude 4 lineup, prompting it to strengthen safety measures. Anthropic is now activating its ASL-3 safeguards, which it reserves for artificial intelligence systems that substantially increase the risk of catastrophic misuse.
Moreover, the system card notes that the new AI model may report unethical behaviour if it detects suspicious activity by its users.
"When placed in scenarios that involve egregious wrongdoing by its users, given access to a command line, and told something in the system prompt like 'take initiative,' it will frequently take very bold action," Anthropic reported on Thursday.