Anthropic Reverses Secret Claude AI Policy After Backlash

Anthropic reversed a controversial policy that secretly restricted competitors from using its new AI model, Claude Fable 5, to develop other AI systems. The company changed course after researchers exposed the hidden restrictions and mounted fierce criticism across social media platforms and research forums.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic said in a statement to WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

Anthropic released Claude Fable 5, a version of its latest AI model with additional safety guardrails designed to prevent misuse, earlier this week. The company announced that it would reroute users who asked questions about cybersecurity, biology, or chemistry to a less capable AI model to reduce the chances of someone using the advanced AI to carry out a cyberattack or build a bioweapon. These restrictions came with visible notifications to users.

For researchers trying to use Claude Fable 5 for frontier AI development, however, Anthropic outlined a dramatically different approach. The firm would deliberately degrade the model’s performance in ways that were invisible to the user. The move would effectively sabotage researchers trying to use Claude to train competing AI models, a use that Anthropic explicitly bans in its terms of service.

Hidden Restrictions Spark Immediate Controversy

The release of Claude Fable 5 came just over a week after Anthropic confidentially filed IPO paperwork, making it a considerable step for the lab. Fortune reported that Anthropic had initially deemed Mythos-class models too dangerous to release, citing their significantly enhanced ability to identify software vulnerabilities. The company said it was now confident new guardrails in Claude Fable 5 would ensure these dangerous skills don’t fall into the wrong hands.

Just hours after the model’s release, major backlash from AI researchers, developers, and policy experts began brewing on social media. The pushback centered on a paragraph buried in Claude Fable 5’s 319-page system card, a document that offers detailed safety disclosures. This paragraph revealed that Fable would quietly downgrade its own responses when it detected requests related to cutting-edge AI development work, such as building the infrastructure used to train large AI models.

In practice, a user could ask Fable for help, receive a deliberately weakened answer, but not know the model was holding anything back. Critics made it clear they felt this undermined a basic expectation that a tool would either do what it was asked or tell the user it wouldn’t. Unlike Fable’s other restrictions around cybersecurity and biology, which openly redirect users to a less powerful model with visible notifications, the system card emphasized that this restriction “is not visible to the user.” The model still responds, but uses “interventions to limit Claude’s effectiveness” without telling the user it’s doing so.

Company Defends Initial Decision Before Reversal

Anthropic estimated the restrictions would affect roughly 0.03% of traffic. The company initially defended its effort by saying “enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.” Anthropic has taken steps through its terms of service to restrict competitors from using Claude to build both closed and open-source AI models, but critics argued that quietly degrading the model’s performance for certain users went a step too far.

Claude’s coding agent has become a favored tool among developers, including those working on open-source AI research projects. Researchers told WIRED that the company’s latest policy could have led to a troubling future in which only a handful of leading AI labs could perform advanced AI research. The restrictions would have prevented the flagship model from assisting researchers working on competing AI systems, despite Anthropic building its reputation on responsible AI development.

Anthropic now says Claude Fable 5’s safeguards for AI development will be visible to users. If the company suspects a user is trying to use Claude to build a highly capable AI, it will alert them that it’s either refusing the request or rerouting the user to a less capable model.

Research Community Voices Strong Opposition

“To have my access to the cutting edge models for my work rug pulled in an under the table fashion is appalling,” wrote Nathan Lambert, an open-model researcher who most recently led work at AI2. “To me this paints Anthropic clearly as anti-science, and therefore anti-progress and anti-safety.”

A wide swath of the AI community pushed back sharply, including open-source researchers critical of Anthropic’s closed policies as well as AI safety experts who typically align with Anthropic. Researchers flagged the policy after noticing Claude becoming evasive or unhelpful when asked to assist with AI model development tasks. What started as isolated complaints on social media quickly snowballed into a full-blown controversy about transparency and trust in AI systems.

According to discussions on research forums and Twitter, Claude would subtly refuse or provide degraded responses when users attempted to use it for tasks like training competing language models, optimizing neural architectures, or debugging AI research code. Researchers accused Anthropic of covert competitive behavior disguised as safety measures because the company did not disclose the restrictions in its usage policies or documentation.

Reputational Stakes for AI Safety Leader

Anthropic positions itself as the ethical alternative in the AI race by emphasizing constitutional AI principles and transparent safety practices. This policy directly contradicted that brand promise, revealing what critics described as a protectionist streak that prioritizes market position over the open research culture Anthropic claims to support. Dean Ball, a senior fellow at the Foundation for American Innovation, was among the policy experts who criticized the approach.

While the company hasn’t released a detailed public statement beyond its comments to WIRED, sources familiar with the matter indicate leadership recognized the reputational damage was outweighing any competitive advantage the restrictions might have provided. Anthropic responded to the mounting criticism by reversing the policy entirely and committing to make all such safeguards visible going forward.

The incident exposes a fundamental tension in the AI industry. Companies like Anthropic, OpenAI, and Google have poured billions into developing advanced AI systems. They simultaneously grapple with how to balance competitive interests against transparency requirements and their commitments to the broader research community.