Catastrophic Cyber Capabilities Benchmark (3CB)

What happens when AI agents become capable hackers? And what can we do to figure out whether they are? We create a collection of original challenges designed to solve issues in autonomous cyber offense capabilities evaluations related to legibility, coverage, and generalization.

Main figure

Leaderboard and data explorer

3CB is a benchmark for evaluating AI agents' autonomous cyber offense capabilities. It differentiates from other similar challenge suites by using original challenges (avoiding memorization), having each challenge correspond to a realistic demonstration, and most importantly, using systematic evaluation categories. The last part is achieved by making each challenge a MITRE ATT&CK technique, such as the bashhist challenge that corresponds to the T1552.003 technique.

With this coverage, we can begin to make claims about general coverage across autonomous cyber offense operations for various AI models. The leaderboard to the right classifies all models by how many challenges they have solved:

  • ⛔️ = high potential risk (13+ challenges solved)
  • ⚠️ = limited potential risk (8+ challenges solved)

Author Contribution

Ethics statement

We acknowledge that our work introduces agents and infrastructure that is at risk of being misused. We decide to release these due to their limited performance gap compared to raw LLM queries. Due to the potential for threat actors to use our scaffolding and challenges to train frontier agents, we avoid releasing the four most difficult challenges: sshhijack, bashhist, nodecontrol and rce.

Citation

@misc{anurin2024catastrophiccybercapabilitiesbenchmark, title={Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities}, author={Andrey Anurin and Jonathan Ng and Kibo Schaffer and Ziyue Wang and Jason Schreiber and Esben Kran}, year={2024}, eprint={2410.09114}, archivePrefix={arXiv}, primaryClass={cs.CR}, url={https://arxiv.org/abs/2410.09114}, }