🪲 CTFBench 🚩

About CTFBench

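CTFBench is a benchmark for evaluating AI smart contract auditors. It uses a set of test cases in which each smart contract contains exactly one known vulnerability. The benchmark calculates two key metrics:

Vulnerability Detection Rate (VDR): measures the auditor's ability to correctly identify the known vulnerability.
Overreporting Index (OI): quantifies the false positives per line of code.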

By plotting these metrics on a graph, CTFBench allows for a visual comparison of different AI auditors, helping developers and researchers assess their effectiveness and trade-offs.
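As a rough illustration of how two such metrics could be computed, the sketch below assumes the first is a detection rate (the share of test contracts whose known vulnerability was found) and the second is an overreporting index (false positives per line of code audited). These definitions, the `AuditResult` record, and both function names are assumptions made for this example, not CTFBench's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AuditResult:
    """Outcome of one auditor run on one test contract (hypothetical record)."""
    found_known_vulnerability: bool  # did the auditor report the planted bug?
    false_positives: int             # reported issues that are not real
    lines_of_code: int               # size of the audited contract


def vulnerability_detection_rate(results: List[AuditResult]) -> float:
    """Assumed definition: fraction of contracts whose known bug was found."""
    if not results:
        raise ValueError("no results to score")
    detected = sum(r.found_known_vulnerability for r in results)
    return detected / len(results)


def overreporting_index(results: List[AuditResult]) -> float:
    """Assumed definition: total false positives per line of code audited."""
    total_loc = sum(r.lines_of_code for r in results)
    if total_loc == 0:
        raise ValueError("no lines of code audited")
    return sum(r.false_positives for r in results) / total_loc


# Made-up example data: three contracts, two detections, seven false positives.
example = [
    AuditResult(True, 2, 100),
    AuditResult(False, 5, 200),
    AuditResult(True, 0, 100),
]
vdr = vulnerability_detection_rate(example)  # 2 of 3 contracts
oi = overreporting_index(example)            # 7 false positives over 400 lines
```

Under these assumptions, a higher detection rate and a lower overreporting index are both better, which is why the two are plotted against each other rather than collapsed into a single score.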

Benchmark Results for AI Smart Contract Auditors

Auditor              Vulnerability Detection Rate (VDR)  Overreporting Index (OI)
savant.chat v0.2     0.952                               0.027
savant.chat v0.1     0.857                               0.033
grok 3 thinking      0.524                               0.037
ARMUR                0.524                               0.090
openai_o3_mini_high  0.429                               0.056
openai_o3_mini       0.429                               0.064
deepseek_r1          0.429                               0.070
Code Genie AI        0.333                               0.023
slither              0.238                               0.130
QuillShield          0.143                               0.009
Aegis                0.143                               0.118
AuditOne             0.095                               0.028
SCAU                 0.000                               0.051
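One way to read the trade-off in these results is to look for auditors that no other auditor beats on both scores at once. The sketch below assumes the first score is better when higher and the second is better when lower; the `pareto_front` helper is a hypothetical name introduced for this example and is not part of CTFBench.

```python
# Rows copied from the results above: (auditor, first score, second score).
RESULTS = [
    ("savant.chat v0.2", 0.952, 0.027),
    ("savant.chat v0.1", 0.857, 0.033),
    ("grok 3 thinking", 0.524, 0.037),
    ("ARMUR", 0.524, 0.090),
    ("openai_o3_mini_high", 0.429, 0.056),
    ("openai_o3_mini", 0.429, 0.064),
    ("deepseek_r1", 0.429, 0.070),
    ("Code Genie AI", 0.333, 0.023),
    ("slither", 0.238, 0.130),
    ("QuillShield", 0.143, 0.009),
    ("Aegis", 0.143, 0.118),
    ("AuditOne", 0.095, 0.028),
    ("SCAU", 0.000, 0.051),
]


def pareto_front(rows):
    """Return auditors not dominated by any other row.

    Assumption: a row dominates another if its first score is at least as
    high and its second score at least as low, with one of them strictly better.
    """
    front = []
    for name, first, second in rows:
        dominated = any(
            f2 >= first and s2 <= second and (f2 > first or s2 < second)
            for _, f2, s2 in rows
        )
        if not dominated:
            front.append(name)
    return front


front = pareto_front(RESULTS)
```

With the rows listed here, the helper keeps savant.chat v0.2, Code Genie AI, and QuillShield: each of them is either the strongest on the first score or reports less than every auditor that scores higher.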

Performance Graph