Live Benchmark

Benchmark: Can the analyzer tell good agents from bad?

Five real ERC-8004 agents on Base Sepolia, each evaluated live through the verdict API. The analyzer reads on-chain reputation, feedback history, and validation attestations to assign DELEGATE / WATCH / AVOID verdicts with evidence-backed reasoning.

DELEGATE (70+)

WATCH (40-69)

AVOID (<40)
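
As a rough illustration of the bands above, the mapping from a composite score to a verdict could be sketched as follows. The function name `verdict_for_score` is hypothetical, and treating fractional scores between 69 and 70 as WATCH is an assumption; the page only states the three ranges.

```python
def verdict_for_score(score: float) -> str:
    """Map a composite trust score to a verdict label.

    Thresholds come from the legend: DELEGATE (70+), WATCH (40-69), AVOID (<40).
    Assumption: any score below 70 but at least 40 counts as WATCH,
    including fractional values such as 69.5.
    """
    if score >= 70:
        return "DELEGATE"
    if score >= 40:
        return "WATCH"
    return "AVOID"


if __name__ == "__main__":
    for sample in (85, 70, 55, 40, 12):
        print(sample, verdict_for_score(sample))
```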

How It Works

1. Each agent is queried via /api/verdict?agentId=N
2. The API reads on-chain identity, feedback, and validations from Base Sepolia contracts
3. A composite trust score is computed from quality, uptime, and accuracy dimensions
4. A verdict is assigned: DELEGATE / WATCH / AVOID
5. AI reasoning explains the verdict using concrete evidence
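
Step 1 can be sketched as a small client, under stated assumptions. Only the path `/api/verdict` and the `agentId` query parameter come from this page; the base URL, the helper names `build_verdict_url` and `fetch_verdict`, and the assumption that the endpoint returns JSON are all hypothetical.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


def build_verdict_url(base_url: str, agent_id: int) -> str:
    """Build the verdict URL for one agent, e.g. <base>/api/verdict?agentId=3."""
    query = urlencode({"agentId": agent_id})
    return urljoin(base_url, "/api/verdict") + "?" + query


def fetch_verdict(base_url: str, agent_id: int, timeout: float = 10.0):
    """Fetch and decode the verdict for one agent.

    Assumption: the endpoint responds with a JSON body. Its field names are
    not documented on this page, so the decoded value is returned unchanged.
    """
    with urlopen(build_verdict_url(base_url, agent_id), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder host; replace with the real deployment before running a fetch.
    print(build_verdict_url("https://example.invalid", 3))
```

Because the API reads live chain state, two calls for the same agent may return different results.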

All data is read live from Base Sepolia (chain 84532). No mock data. Results may vary as on-chain state changes.
