Terminal-Bench
Public benchmark / task-submission framework published by Mercor (terminal-bench-3 on GitHub) for evaluating AI agents on terminal-based engineering tasks.