Mythos Preview gets limited release


Experts and software engineers warn that Anthropic’s new AI model could usher in a new era of hacking and cybersecurity, as AI systems capable of advanced reasoning identify and exploit a growing number of software vulnerabilities.

Citing the potential damage that could result from a wider public release, leading AI company Anthropic released the cutting-edge model, called Claude Mythos Preview, to a limited group of tech companies Tuesday.

The model is the latest in Anthropic’s Claude series of AI systems. Its release was previewed at the end of March, when Fortune identified its mention in an unsecured database on Anthropic’s website.

Anthropic’s researchers say Mythos Preview was able to detect thousands of high- and critical-severity bugs and software defects, with vulnerabilities identified in most major operating systems and web browsers. Anthropic said some of the vulnerabilities had been undiscovered for decades. While some outside experts called for caution in interpreting the new results given limited public information about the identified vulnerabilities, many others said the model’s debut and Anthropic’s caution were significant.

“It’s all very much real,” Katie Moussouris, the CEO and co-founder of Luta Security, a company that connects cybersecurity researchers with companies that have software vulnerabilities, said of the hype around Anthropic’s claims.

“I’m not a Chicken Little kind of person when it comes to this stuff,” Moussouris said. “We are definitely going to see some huge ramifications.”

Instead of a public release, Anthropic is giving tech companies like Microsoft, Nvidia and Cisco access to Mythos Preview to shore up cyber defenses. As part of the new effort, called Project Glasswing, Anthropic will give more than 50 tech organizations access to Mythos Preview, along with over $100 million in usage credits.

“Project Glasswing partners will receive access to Claude Mythos Preview to find and fix vulnerabilities or weaknesses in their foundational systems — systems that represent a very large portion of the world’s shared cyberattack surface,” Anthropic announced in a blog post. “Project Glasswing is an important step toward giving defenders a durable advantage in the coming AI-driven era of cybersecurity.”

It is unclear exactly which vulnerabilities Mythos Preview identified, or how many had previously been discovered or reported. Because of the sensitive nature of the findings, Anthropic said it would publicly disclose details of the vulnerabilities within 135 days of sharing them with the organizations or parties responsible for the affected software.

It is the first time in nearly seven years that a leading AI company has so publicly withheld a model over safety concerns. In 2019, OpenAI — now one of Anthropic’s primary rivals — decided to withhold its GPT-2 system “due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale.”

Mythos Preview is a general-purpose model, or the type of system that powers products like Claude Code or ChatGPT. Yet in pre-release testing, Anthropic found its cybersecurity capabilities in particular were surprisingly advanced compared with those of previous models, which led to the creation of Project Glasswing.

Logan Graham, who leads offensive cyber research at Anthropic, said the Mythos Preview model was advanced enough not only to identify undiscovered software vulnerabilities but also to weaponize them. The model can single-handedly perform complex, effective hacking tasks, including identifying multiple undisclosed vulnerabilities, writing code to exploit them and then chaining those exploits together to penetrate complex software, he said.

“We’ve regularly seen it chain vulnerabilities together. The degree of its autonomy and sort of long ranged-ness, the ability to put multiple things together, I think, is a particular thing about this model,” Graham told NBC News.

That capability means the company has so far been reluctant to release even a carefully guardrailed version of the model to the public, he said, at least until some Western companies can use it to identify vulnerabilities and build defenses around them.

“We are not confident that everybody should have access right now,” Graham said. “We need to start figuring out how we’d prepare for a world of this first before we can handle the idea of black hat [criminal or adversarial] hackers having access.”

Anthropic has also briefed the federal government on Mythos Preview’s cybersecurity capabilities. Anthropic is embroiled in a heated dispute with the Trump administration over the federal government’s use of its models after Defense Secretary Pete Hegseth declared Anthropic a “supply chain risk to national security” in late February. A federal judge late last month issued a preliminary injunction against the designation, which the Trump administration is appealing.

According to an Anthropic employee, the company “briefed senior officials across the U.S. government on Mythos Preview’s full capabilities, including both its offensive and defensive cyber applications. That engagement has included ongoing discussions with CISA [the Cybersecurity and Infrastructure Security Agency] and CAISI [the Center for AI Standards and Innovation], among others.”

“Bringing government into the loop early — on what the model can do, where the risks are, and how we’re managing them — was a priority from the start,” the employee said.

CISA and the National Institute of Standards and Technology, the agency that includes CAISI, did not respond to requests for comment before publication. A spokesperson for the National Security Agency, widely regarded as the most sophisticated hacking agency in the world, declined to comment when asked whether it had been briefed on Mythos.

Not everyone is convinced Mythos Preview represents the leap Anthropic claims.

Heidy Khlaaf, chief AI scientist at the AI Now Institute, a U.S.-based research institute, said Anthropic’s blog post describing the new vulnerabilities left out many of the key details needed to verify its claims.

Writing on X, Khlaaf warned against “taking these claims at face value” without more information, such as false-positive rates and clearer explanations of how humans manually reviewed the identified vulnerabilities.

Beyond Mythos Preview’s cybersecurity implications, the model also demonstrated concerning behavior as part of Anthropic’s pre-release safety testing.

As part of releasing new models, Anthropic creates detailed reports called “system cards” that share how the models perform on a variety of evaluations. The system card for Mythos Preview, released Tuesday, says that “Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available.”

According to the system card, in one evaluation the model showed signs of awareness that it was being evaluated in around 29% of transcripts, even though it did not explicitly tell researchers that it knew. Some researchers worry that a model aware it is being evaluated might change its behavior and act more safely than it otherwise would.

Anthropic and researchers working with the company to evaluate the model’s safety also said Mythos Preview appeared to intentionally perform worse on one evaluation than it could have in order to appear less suspicious. “We find this overall pattern of behavior concerning, and have not seen it before in similar evaluations of earlier Claude models,” Anthropic wrote in the system card.

In another experiment with an earlier version of Mythos Preview, an Anthropic researcher instructed a copy of the model, ostensibly isolated on a secured computer, to try to escape and send the researcher a message. The researcher, Sam Bowman, was eating a sandwich in a park when he received an email from the model.

“That instance wasn’t supposed to have access to the internet,” Bowman wrote on X.


