(без названия)
Donating our open-source alignment tool
In October 2025, we launched Petri, an open-source toolbox of alignment tests that can be applied to any large language model. Petri, which was developed as part of our Anthropic Fellows program, can be used to rapidly and easily test AI models for concerning tendencies like deception, sycophancy, and cooperation with harmful requests. It’s part of our efforts to develop alignment tools that are open and useful for the whole AI development community.
Petri has been part of our alignment assessment for every Claude model since Claude Sonnet 4.5. It compares how the new model behaves across a range of alignment-relevant scenarios that are simulated by a separate “auditor” model. A further “judge” model then scores the resulting transcripts for misaligned behaviors.
We’ve been pleased to see Petri being used by external organizations: for example, the UK’s AI Security Institute (AISI) made it a major part of how they evaluate models for their propensity to sabotage AI research.
We’re now updating Petri to its third version. Here are some of the biggest changes:
We’re also giving Petri a new home. We have handed over its development to Meridian Labs, an AI evaluation nonprofit. This move—similar to when we donated the Model Context Protocol (MCP) to the Linux Foundation—will help ensure that Petri remains independent of any AI lab, so that its results will be seen as neutral and credible by those across the industry and beyond.
As part of Meridian Labs, Petri joins other tools like Inspect and Scout, building a technology stack that is open to labs, independent researchers, and governments alike, at a time when reliable tests of AI model behavior matter more than ever.
You can read more about Petri 3.0 on the Meridian Labs blog.
Instructions to install and use Petri can be found on the Petri website.
Related content
2028: Two scenarios for global AI leadership
Our views on the AI competition between the US and China.
Teaching Claude why
New research on how we've reduced agentic misalignment.
Natural Language Autoencoders: Turning Claude’s thoughts into text
AI models like Claude talk in words but think in numbers. In this study we train Claude to translate its thoughts into human-readable text.