Runloop Benchmark API
Define, configure, and run benchmarks against your agents. Runloop ships SWE-Bench Verified and SWE-smith out of the box; the Benchmark API also supports custom benchmarks built from your own scenarios and scorers. Resources include benchmarks, benchmark runs (with a start, cancel, and complete lifecycle), benchmark jobs, scenario runs, and downloadable run logs.
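The run lifecycle mentioned above (start, cancel, complete) can be pictured as a small state machine. The following is only a sketch for illustration: the state names, the `RunState` enum, and the `apply_action` helper are hypothetical and are not taken from the Runloop OpenAPI specification, which may model runs differently.

```python
from enum import Enum


class RunState(Enum):
    # State names are illustrative assumptions, not taken from the Runloop spec.
    PENDING = "pending"
    RUNNING = "running"
    CANCELED = "canceled"
    COMPLETED = "completed"


# Allowed transitions, keyed by (current state, lifecycle action).
# The three actions mirror the lifecycle named in the listing.
TRANSITIONS = {
    (RunState.PENDING, "start"): RunState.RUNNING,
    (RunState.RUNNING, "cancel"): RunState.CANCELED,
    (RunState.RUNNING, "complete"): RunState.COMPLETED,
}


def apply_action(state: RunState, action: str) -> RunState:
    """Return the next state, or raise ValueError if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"cannot {action!r} a run in state {state.value!r}"
        ) from None
```

In this sketch a canceled or completed run is terminal, so any further action raises `ValueError`. Whether the real API permits other transitions is something to confirm against the published specification.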
Runloop Benchmark API is one of 13 APIs that Runloop publishes on the APIs.io network, described by a machine-readable OpenAPI specification.
This API exposes 3 machine-runnable capabilities and 2 JSON Schema definitions. The capabilities can be deployed as REST, MCP, or Agent Skill surfaces via Naftiko.
Tagged areas include AI, AI Agents, Benchmarks, Evaluation, and SWE-Bench. The published artifact set on APIs.io includes API documentation, an OpenAPI specification, sample payloads, 3 Naftiko capability specs, and 2 JSON Schemas.