
Analytics Dashboard

Top Overall Score
Avg. Score (all models)
Tasks in Benchmark
Models Evaluated
Question Distribution (number of tasks)
API Cost Analysis
Prices as of March 2026 · sorted by cost per task
Model · Input ($/1M tokens) · Output ($/1M tokens) · Cost per task ($)

Note: Cost per task for Kimi K2.6 could not be estimated due to technical issues.
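The dashboard does not say how cost per task is derived. A common estimate, and the one assumed in this minimal sketch, multiplies a task's input and output token counts by the matching per-1M-token prices. The `Pricing` class and `cost_per_task` function are hypothetical names. The numbers in the usage line are illustrative only, not benchmark data.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical container for a model's list prices in USD per 1M tokens."""
    input_per_million: float
    output_per_million: float


def cost_per_task(pricing: Pricing, input_tokens: float, output_tokens: float) -> float:
    """Estimated USD cost of one task (assumed formula): tokens on each side times that side's price."""
    return (input_tokens * pricing.input_per_million
            + output_tokens * pricing.output_per_million) / 1_000_000


# Illustrative numbers only, not AccountingBench data.
p = Pricing(input_per_million=3.0, output_per_million=15.0)
print(round(cost_per_task(p, input_tokens=20_000, output_tokens=4_000), 4))  # 0.12
```

Output tokens are usually priced higher than input tokens, so two models with similar total token usage can still differ widely in cost per task.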

Avg. Token Usage per Task
Total input + output tokens · 1 trial per task

Note: Avg. token usage per task for Kimi K2.6 could not be estimated due to technical issues.

Cost-Efficiency: Score vs. Avg. Cost per Task
Score vs. Output Speed
AccountingBench score vs. output tokens per second · Source: Artificial Analysis (March 2026)
Confidence–Performance Calibration
AccountingBench score versus model confidence · bins with fewer than three observations are omitted.
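The calibration chart groups observations by stated confidence and hides sparse bins. The binning scheme is not documented on this page. This sketch assumes ten equal-width bins on [0, 1] and a per-task score, and it applies the three-observation cutoff from the caption. `calibration_bins` is a hypothetical helper, not the dashboard's own code.

```python
from collections import defaultdict


def calibration_bins(confidences, scores, n_bins=10, min_count=3):
    """Group (confidence, score) pairs into equal-width bins on [0, 1].

    Returns (bin_low, bin_high, mean_confidence, mean_score, count) for each
    bin holding at least min_count observations (assumed binning scheme).
    """
    if len(confidences) != len(scores):
        raise ValueError("confidences and scores must have the same length")
    buckets = defaultdict(list)
    for c, s in zip(confidences, scores):
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence {c} outside [0, 1]")
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the top bin
        buckets[idx].append((c, s))
    rows = []
    for idx in sorted(buckets):
        obs = buckets[idx]
        if len(obs) < min_count:
            continue  # drop sparse bins, as the chart caption describes
        mean_c = sum(c for c, _ in obs) / len(obs)
        mean_s = sum(s for _, s in obs) / len(obs)
        rows.append((idx / n_bins, (idx + 1) / n_bins, mean_c, mean_s, len(obs)))
    return rows


# Illustrative data: only the 0.9-1.0 bin has three observations and survives.
rows = calibration_bins([0.91, 0.95, 0.93, 0.42, 0.45, 0.12], [1, 1, 0, 0, 1, 0])
print(len(rows))  # 1
```

A well-calibrated model's surviving bins lie close to the diagonal, where mean score roughly equals mean confidence.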

Dataset Composition

Category & Task Type
Category · Tasks · Share
Task Type · Tasks · Share
Question Format & Education Level
Question Format · Tasks · Share
Education Level · Tasks · Share
Regulatory Framework Distribution
Regulatory Framework · Tasks · Share
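Each composition table reports a task count and that count's share of the benchmark. This sketch assumes share is the count divided by the total number of tasks, expressed as a percentage. Under that assumption, the tables can be rebuilt from per-task labels. `composition_table` and the label values below are hypothetical.

```python
from collections import Counter


def composition_table(labels):
    """Return (label, task_count, share_percent) rows, largest count first."""
    counts = Counter(labels)
    total = sum(counts.values())
    # most_common() sorts by count, descending; an empty input yields no rows.
    return [(label, n, 100.0 * n / total) for label, n in counts.most_common()]


# Hypothetical per-task labels, not the benchmark's actual frameworks.
for row in composition_table(["framework_a", "framework_b", "framework_a", "framework_a"]):
    print(row)
# ('framework_a', 3, 75.0)
# ('framework_b', 1, 25.0)
```

Under this definition, the shares in each table sum to 100% when every task carries exactly one label for that dimension.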