Terminal-Bench

Terminal-Bench is a public benchmark and task-submission framework published by Mercor (terminal-bench-3 on GitHub) for evaluating AI agents on terminal-based engineering tasks.

Terminal-Bench is one of 7 APIs that Mercor publishes on the APIs.io network.

The entry is tagged Benchmarks, Agents, and Open Source, and its published artifact set on APIs.io includes a GitHub repository.

API entry from apis.yml

aid: mercor:terminal-bench
name: Terminal-Bench
description: Public benchmark / task-submission framework published by Mercor (terminal-bench-3 on GitHub)
  for evaluating AI agents on terminal-based engineering tasks.
humanURL: https://github.com/Mercor-io/terminal-bench-3
tags:
- Benchmarks
- Agents
- Open Source
properties:
- type: GitHubRepository
  url: https://github.com/Mercor-io/terminal-bench-3
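
To show how the entry above might be consumed, here is a minimal Python sketch. It assumes the entry has already been loaded into a dictionary (in practice a YAML parser would read apis.yml); the helper names `property_urls` and `split_aid` are hypothetical, and treating `aid` as a `provider:api` pair is an assumption based on this one entry, not something the listing states.

```python
# Hypothetical in-memory copy of the apis.yml entry shown above.
# In practice it would be loaded from the file with a YAML parser.
entry = {
    "aid": "mercor:terminal-bench",
    "name": "Terminal-Bench",
    "humanURL": "https://github.com/Mercor-io/terminal-bench-3",
    "tags": ["Benchmarks", "Agents", "Open Source"],
    "properties": [
        {
            "type": "GitHubRepository",
            "url": "https://github.com/Mercor-io/terminal-bench-3",
        },
    ],
}


def property_urls(entry: dict, prop_type: str) -> list[str]:
    """Return the URLs of all properties whose type equals prop_type."""
    return [
        prop["url"]
        for prop in entry.get("properties", [])
        if prop.get("type") == prop_type and "url" in prop
    ]


def split_aid(aid: str) -> tuple[str, str]:
    """Split an aid at the first colon (assumed 'provider:api' layout)."""
    provider, _, api = aid.partition(":")
    return provider, api


if __name__ == "__main__":
    print(property_urls(entry, "GitHubRepository"))
    print(split_aid(entry["aid"]))
```

Filtering `properties` by `type` rather than indexing the first item keeps the lookup valid if further artifact types are added to the entry later.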