Runloop Benchmark API

Define, configure, and run Benchmarks against your agents. Runloop ships SWE-Bench Verified and SWE-smith out of the box; the Benchmark API also supports custom benchmarks built from your own scenarios and scorers. Resources include benchmarks, benchmark runs (with a start, cancel, and complete lifecycle), benchmark jobs (beta), scenario runs, and downloadable run logs.
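The run lifecycle maps onto a handful of endpoints in the specification below. The following Python sketch only *builds* the requests and does not send them, so the mapping is easy to inspect. The base URL and paths come from the spec. The Bearer-token `Authorization` header is an assumption, because the security scheme is not part of this excerpt. The helper names (`build_request`, `start_run`, `cancel_run`, and so on) are hypothetical. The start-run body is passed through as a plain dict because the `StartBenchmarkRunParameters` fields also fall outside this excerpt.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

# Base URL taken from the spec's `servers` entry.
BASE_URL = "https://api.runloop.ai"


def build_request(
    method: str,
    path: str,
    api_key: str,
    body: Optional[dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a Benchmark API request.

    ASSUMPTION: Bearer-token auth; the security scheme is not in this excerpt.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def run_path(run_id: str, action: str = "") -> str:
    """Path of a BenchmarkRun, optionally followed by a lifecycle action."""
    path = "/v1/benchmark_runs/" + urllib.parse.quote(run_id, safe="")
    return f"{path}/{action}" if action else path


def start_run(api_key: str, params: dict[str, Any]) -> urllib.request.Request:
    # POST /v1/benchmarks/start_run; `params` is a StartBenchmarkRunParameters body.
    return build_request("POST", "/v1/benchmarks/start_run", api_key, params)


def get_run(api_key: str, run_id: str) -> urllib.request.Request:
    # GET /v1/benchmark_runs/{id}: poll the run and read its state.
    return build_request("GET", run_path(run_id), api_key)


def complete_run(api_key: str, run_id: str) -> urllib.request.Request:
    # POST /v1/benchmark_runs/{id}/complete: finish a running BenchmarkRun.
    return build_request("POST", run_path(run_id, "complete"), api_key)


def cancel_run(api_key: str, run_id: str) -> urllib.request.Request:
    # POST /v1/benchmark_runs/{id}/cancel: stop running scenarios, mark the run
    # CANCELED, and compute the final score from completed scenarios.
    return build_request("POST", run_path(run_id, "cancel"), api_key)


def download_logs(api_key: str, run_id: str) -> urllib.request.Request:
    # POST /v1/benchmark_runs/{id}/download_logs: the response body is a zip archive.
    return build_request("POST", run_path(run_id, "download_logs"), api_key)
```

To actually send a request, pass it to `urllib.request.urlopen`. The `download_logs` response is `application/zip`, so write its bytes to a file rather than decoding them as JSON.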
The Runloop Benchmark API is one of 13 APIs that Runloop publishes on the APIs.io network, and it is described by a machine-readable OpenAPI specification.
This API exposes 3 machine-runnable capabilities, which can be deployed as REST, MCP, or Agent Skill surfaces via Naftiko, plus 2 JSON Schema definitions.
Tagged areas include AI, AI Agents, Benchmarks, Evaluation, and SWE-Bench. The published artifact set on APIs.io includes API documentation, an OpenAPI specification, sample payloads, 3 Naftiko capability specs, and 2 JSON Schemas.
OpenAPI Specification

openapi: 3.0.3
info:
  title: Runloop Benchmark API
  version: '0.1'
  description: "Run and manage Benchmarks and Benchmark Runs \u2014 the evaluation framework for AI coding agents. Supports\
    \ SWE-Bench, SWE-smith, and custom benchmark definitions, scenario aggregation, run lifecycle (start/cancel/complete),\
    \ scoring, and log retrieval."
  contact:
    name: Runloop AI Support
    url: https://runloop.ai
servers:
- url: https://api.runloop.ai
  description: Runloop API
  variables: {}
tags:
- name: Benchmark
paths:
  /v1/benchmark_jobs:
    post:
      tags:
      - Benchmark
      summary: '[Beta] Create a BenchmarkJob.'
      description: '[Beta] Create a BenchmarkJob that runs a set of scenarios entirely on Runloop.'
      operationId: createBenchmarkJob
      parameters: []
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/BenchmarkJobCreateParameters'
        required: false
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkJobView'
      deprecated: false
    get:
      tags:
      - Benchmark
      summary: '[Beta] List BenchmarkJobs.'
      description: '[Beta] List all BenchmarkJobs matching filter.'
      operationId: listBenchmarkJobs
      parameters:
      - name: name
        in: query
        description: Filter by name
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: string
      - name: limit
        in: query
        description: The limit of items to return. Default is 20. Max is 5000.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: integer
          format: int32
      - name: starting_after
        in: query
        description: Load the next page of data starting after the item with the given ID.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: string
      - name: include_total_count
        in: query
        description: If true (default), includes total_count in the response. Set to false to skip the count query for better
          performance on large datasets.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: boolean
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkJobListView'
      deprecated: false
  /v1/benchmark_jobs/{id}:
    get:
      tags:
      - Benchmark
      summary: '[Beta] Get a previously created BenchmarkJob.'
      description: '[Beta] Get a BenchmarkJob by its ID.'
      operationId: getBenchmarkJob
      parameters:
      - name: id
        in: path
        description: The BenchmarkJob ID.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkJobView'
      deprecated: false
  /v1/benchmark_runs:
    get:
      tags:
      - Benchmark
      summary: List BenchmarkRuns.
      description: List all BenchmarkRuns matching filter.
      operationId: listBenchmarkRuns
      parameters:
      - name: name
        in: query
        description: Filter by name
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: string
      - name: benchmark_id
        in: query
        description: The Benchmark ID to filter by.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: string
      - name: state
        in: query
        description: Filter by state
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: string
      - name: limit
        in: query
        description: The limit of items to return. Default is 20. Max is 5000.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: integer
          format: int32
      - name: starting_after
        in: query
        description: Load the next page of data starting after the item with the given ID.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: string
      - name: include_total_count
        in: query
        description: If true (default), includes total_count in the response. Set to false to skip the count query for better
          performance on large datasets.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: boolean
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkRunListView'
      deprecated: false
  /v1/benchmark_runs/{id}:
    get:
      tags:
      - Benchmark
      summary: Get a previously created BenchmarkRun.
      description: Get a BenchmarkRun by its ID.
      operationId: getBenchmarkRun
      parameters:
      - name: id
        in: path
        description: The BenchmarkRun ID.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkRunView'
      deprecated: false
  /v1/benchmark_runs/{id}/cancel:
    post:
      tags:
      - Benchmark
      summary: Cancel a currently running Benchmark run.
      description: 'Cancel a Benchmark run. This does the following: (1) cancels all running scenarios and shuts down the
        underlying Devbox resources; (2) updates the benchmark run state to CANCELED; (3) calculates the final score from
        completed scenarios.'
      operationId: cancelBenchmarkRun
      parameters:
      - name: id
        in: path
        description: The BenchmarkRun ID.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkRunView'
      deprecated: false
  /v1/benchmark_runs/{id}/complete:
    post:
      tags:
      - Benchmark
      summary: Complete a BenchmarkRun.
      description: Complete a currently running BenchmarkRun.
      operationId: completeBenchmarkRun
      parameters:
      - name: id
        in: path
        description: The BenchmarkRun ID.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkRunView'
      deprecated: false
  /v1/benchmark_runs/{id}/download_logs:
    post:
      tags:
      - Benchmark
      summary: Download logs for a Benchmark run.
      description: Download a zip file containing all logs for a Benchmark run.
      operationId: downloadBenchmarkRunLogs
      parameters:
      - name: id
        in: path
        description: The BenchmarkRun ID.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      responses:
        '200':
          description: OK
          content:
            application/zip:
              schema:
                type: string
                format: binary
          headers:
            Content-Type:
              description: application/zip
              required: true
              schema:
                type: string
            Content-Disposition:
              description: attachment; filename="benchmark_run_logs.zip"
              required: true
              schema:
                type: string
      deprecated: false
  /v1/benchmark_runs/{id}/scenario_runs:
    get:
      tags:
      - Benchmark
      summary: List started scenario runs for a benchmark run.
      description: List started scenario runs for a benchmark run.
      operationId: listBenchmarkRunScenarioRuns
      parameters:
      - name: id
        in: path
        description: The BenchmarkRun ID.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      - name: state
        in: query
        description: Filter by Scenario Run state
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          $ref: '#/components/schemas/ScenarioRunState'
      - name: limit
        in: query
        description: The limit of items to return. Default is 20. Max is 5000.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: integer
          format: int32
      - name: starting_after
        in: query
        description: Load the next page of data starting after the item with the given ID.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: string
      - name: include_total_count
        in: query
        description: If true (default), includes total_count in the response. Set to false to skip the count query for better
          performance on large datasets.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: boolean
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ScenarioRunListView'
      deprecated: false
  /v1/benchmarks:
    post:
      tags:
      - Benchmark
      summary: Create a Benchmark.
      description: Create a Benchmark with a set of Scenarios.
      operationId: createBenchmark
      parameters: []
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/BenchmarkCreateParameters'
        required: false
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkDefinitionView'
      deprecated: false
    get:
      tags:
      - Benchmark
      summary: List Benchmarks.
      description: List all Benchmarks matching filter.
      operationId: listBenchmarks
      parameters:
      - name: name
        in: query
        description: Filter by name
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: string
      - name: limit
        in: query
        description: The limit of items to return. Default is 20. Max is 5000.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: integer
          format: int32
      - name: starting_after
        in: query
        description: Load the next page of data starting after the item with the given ID.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: string
      - name: include_total_count
        in: query
        description: If true (default), includes total_count in the response. Set to false to skip the count query for better
          performance on large datasets.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: boolean
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkDefinitionListView'
      deprecated: false
  /v1/benchmarks/list_public:
    get:
      tags:
      - Benchmark
      summary: List Public Benchmarks.
      description: List all public benchmarks matching filter.
      operationId: listPublicBenchmarks
      parameters:
      - name: limit
        in: query
        description: The limit of items to return. Default is 20. Max is 5000.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: integer
          format: int32
      - name: starting_after
        in: query
        description: Load the next page of data starting after the item with the given ID.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: string
      - name: include_total_count
        in: query
        description: If true (default), includes total_count in the response. Set to false to skip the count query for better
          performance on large datasets.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: boolean
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkDefinitionListView'
      deprecated: false
  /v1/benchmarks/metadata/keys:
    get:
      tags:
      - Benchmark
      summary: List available benchmark metadata keys.
      description: Returns a list of all available metadata keys that can be used for filtering benchmarks.
      operationId: getBenchmarkMetadataKeys
      parameters: []
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/MetadataKeysView'
      deprecated: false
  /v1/benchmarks/metadata/keys/{key}/values:
    get:
      tags:
      - Benchmark
      summary: List values for a specific benchmark metadata key.
      description: Returns a list of all available values for the given metadata key that can be used for filtering benchmarks.
      operationId: getBenchmarkMetadataValues
      parameters:
      - name: key
        in: path
        description: The metadata key to get values for.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/MetadataValuesView'
        '400':
          description: Invalid metadata key provided.
      deprecated: false
  /v1/benchmarks/runs:
    get:
      tags:
      - Benchmark
      summary: List BenchmarkRuns.
      description: List all BenchmarkRuns matching filter.
      operationId: listBenchmarkRunsDeprecated
      parameters:
      - name: name
        in: query
        description: Filter by name
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: string
      - name: benchmark_id
        in: query
        description: The Benchmark ID to filter by.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: string
      - name: state
        in: query
        description: Filter by state
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: string
      - name: limit
        in: query
        description: The limit of items to return. Default is 20. Max is 5000.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: integer
          format: int32
      - name: starting_after
        in: query
        description: Load the next page of data starting after the item with the given ID.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: string
      - name: include_total_count
        in: query
        description: If true (default), includes total_count in the response. Set to false to skip the count query for better
          performance on large datasets.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: boolean
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkRunListView'
      deprecated: true
  /v1/benchmarks/runs/{id}:
    get:
      tags:
      - Benchmark
      summary: Get a previously created BenchmarkRun.
      description: Get a BenchmarkRun by its ID.
      operationId: getBenchmarkRunDeprecated
      parameters:
      - name: id
        in: path
        description: The BenchmarkRun ID.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkRunView'
      deprecated: true
  /v1/benchmarks/runs/{id}/cancel:
    post:
      tags:
      - Benchmark
      summary: Cancel a currently running Benchmark run.
      description: 'Cancel a Benchmark run. This does the following: (1) cancels all running scenarios and shuts down the
        underlying Devbox resources; (2) updates the benchmark run state to CANCELED; (3) calculates the final score from
        completed scenarios.'
      operationId: cancelBenchmarkRunDeprecated
      parameters:
      - name: id
        in: path
        description: The BenchmarkRun ID.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkRunView'
      deprecated: true
  /v1/benchmarks/runs/{id}/complete:
    post:
      tags:
      - Benchmark
      summary: Complete a BenchmarkRun.
      description: Complete a currently running BenchmarkRun.
      operationId: completeBenchmarkRunDeprecated
      parameters:
      - name: id
        in: path
        description: The BenchmarkRun ID.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkRunView'
      deprecated: true
  /v1/benchmarks/runs/{id}/download_logs:
    post:
      tags:
      - Benchmark
      summary: Download logs for a Benchmark run.
      description: Download a zip file containing all logs for a Benchmark run.
      operationId: downloadBenchmarkRunLogsDeprecated
      parameters:
      - name: id
        in: path
        description: The BenchmarkRun ID.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      responses:
        '200':
          description: OK
          content:
            application/zip:
              schema:
                type: string
                format: binary
          headers:
            Content-Type:
              description: application/zip
              required: true
              schema:
                type: string
            Content-Disposition:
              description: attachment; filename="benchmark_run_logs.zip"
              required: true
              schema:
                type: string
      deprecated: true
  /v1/benchmarks/runs/{id}/scenario_runs:
    get:
      tags:
      - Benchmark
      summary: List started scenario runs for a benchmark run.
      description: List started scenario runs for a benchmark run.
      operationId: listBenchmarkRunScenarioRunsDeprecated
      parameters:
      - name: id
        in: path
        description: The BenchmarkRun ID.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      - name: state
        in: query
        description: Filter by Scenario Run state
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          $ref: '#/components/schemas/ScenarioRunState'
      - name: limit
        in: query
        description: The limit of items to return. Default is 20. Max is 5000.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: integer
          format: int32
      - name: starting_after
        in: query
        description: Load the next page of data starting after the item with the given ID.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: string
      - name: include_total_count
        in: query
        description: If true (default), includes total_count in the response. Set to false to skip the count query for better
          performance on large datasets.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: boolean
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ScenarioRunListView'
      deprecated: true
  /v1/benchmarks/start_run:
    post:
      tags:
      - Benchmark
      summary: Start a new BenchmarkRun.
      description: Start a new BenchmarkRun based on the provided Benchmark.
      operationId: startBenchmarkRun
      parameters: []
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/StartBenchmarkRunParameters'
        required: false
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkRunView'
      deprecated: false
  /v1/benchmarks/{id}:
    post:
      tags:
      - Benchmark
      summary: Update a Benchmark.
      description: Update a Benchmark. Fields that are null will preserve the existing value. Fields that are provided (including
        empty values) will replace the existing value entirely.
      operationId: updateBenchmark
      parameters:
      - name: id
        in: path
        description: The Benchmark ID.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/BenchmarkUpdateParameters'
        required: false
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkDefinitionView'
      deprecated: false
    get:
      tags:
      - Benchmark
      summary: Get a Benchmark.
      description: Get a previously created Benchmark.
      operationId: getBenchmark
      parameters:
      - name: id
        in: path
        description: The Benchmark ID.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkDefinitionView'
      deprecated: false
  /v1/benchmarks/{id}/archive:
    post:
      tags:
      - Benchmark
      summary: Archive a Benchmark.
      description: Archive a previously created Benchmark. The benchmark will no longer appear in list endpoints but can still
        be retrieved by ID.
      operationId: archiveBenchmark
      parameters:
      - name: id
        in: path
        description: The ID of the Benchmark to archive.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkDefinitionView'
        '403':
          description: Cannot archive public benchmarks.
        '404':
          description: Benchmark not found.
      deprecated: false
  /v1/benchmarks/{id}/definitions:
    get:
      tags:
      - Benchmark
      summary: Get scenario definitions for a Benchmark.
      description: Get scenario definitions for a previously created Benchmark.
      operationId: getBenchmarkScenarioDefinitions
      parameters:
      - name: id
        in: path
        description: The Benchmark ID.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      - name: limit
        in: query
        description: The limit of items to return. Default is 20. Max is 5000.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: integer
          format: int32
      - name: starting_after
        in: query
        description: Load the next page of data starting after the item with the given ID.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: string
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ScenarioDefinitionListView'
      deprecated: false
  /v1/benchmarks/{id}/runs:
    get:
      tags:
      - Benchmark
      summary: Get runs for a provided Benchmark.
      description: Get runs for a previously created Benchmark.
      operationId: getBenchmarkRuns
      parameters:
      - name: id
        in: path
        description: The Benchmark ID.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      - name: limit
        in: query
        description: The limit of items to return. Default is 20. Max is 5000.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: integer
          format: int32
      - name: starting_after
        in: query
        description: Load the next page of data starting after the item with the given ID.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: string
      - name: include_total_count
        in: query
        description: If true (default), includes total_count in the response. Set to false to skip the count query for better
          performance on large datasets.
        required: false
        deprecated: false
        allowEmptyValue: true
        schema:
          type: boolean
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkRunListView'
      deprecated: false
  /v1/benchmarks/{id}/scenarios:
    post:
      tags:
      - Benchmark
      summary: Modify scenarios for a Benchmark.
      description: Add and/or remove Scenario IDs from an existing Benchmark.
      operationId: updateBenchmarkScenarios
      parameters:
      - name: id
        in: path
        description: The Benchmark ID.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/BenchmarkScenarioUpdateParameters'
        required: false
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkDefinitionView'
      deprecated: false
  /v1/benchmarks/{id}/unarchive:
    post:
      tags:
      - Benchmark
      summary: Unarchive a Benchmark.
      description: Unarchive a previously archived Benchmark. The benchmark will appear in list endpoints again.
      operationId: unarchiveBenchmark
      parameters:
      - name: id
        in: path
        description: The ID of the Benchmark to unarchive.
        required: true
        deprecated: false
        allowEmptyValue: false
        schema:
          type: string
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BenchmarkDefinitionView'
        '403':
          description: Cannot unarchive public benchmarks.
        '404':
          description: Benchmark not found.
      deprecated: false
components:
  schemas:
    AgentMount:
      type: object
      additionalProperties: false
      properties:
        agent_id:
          type: string
          nullable: true
          description: The ID of the agent to mount. Either agent_id or agent_name must be set.
        agent_name:
          type: string
          nullable: true
          description: The name of the agent to mount. If no agent_id is provided, the most recent agent with a matching name
            is used. Either agent_id or agent_name must be set.
        agent_path:
          type: string
          nullable: true
          description: Path to mount the agent on the Devbox. Required for git and object agents. Use an absolute path (e.g.,
            /home/user/agent).
        auth_token:
          type: string
          nullable: true
          description: Optional auth token for private repositories. Only used for git agents.
        type:
          type: string
          enum:
          - agent_mount
          default: agent_mount
      required:
      - agent_id
      - agent_name
      - type
    Architecture:
      type: string
      enum:
      - x86_64
      - arm64
    AstGrepScoringFunction:
      type: object
      additionalProperties: false
      description: AstGrepScoringFunction uses structural code search (ast-grep) for scoring.
      properties:
        lang:
          type: string
          description: The language of the pattern.
        search_directory:
          type: string
          description: The path to search.
        pattern:
          type: string
          description: AST pattern to match. The pattern is passed to ast-grep on the command line surrounded by double quotes
            ("), so make sure to use proper escaping (for example, \$\$\$).
        type:
          type: string
          enum:
          - ast_grep_scorer
          default: ast_grep_scorer
      required:
      - search_directory
      - pattern
      - type
    BashScriptScoringFunction:
      type: object
      additionalProperties: false
      description: BashScriptScoringFunction is a scoring function specified by a bash script that will be run in the context
        of your environment.
      properties:
        bash_script:
          type: string
          description: A single bash script that sets up the environment, scores, and prints the final score to standard out.
            The score should be a float between 0.0 and 1.0, printed as "score=[0.0..1.0]".
        type:
          type: string
          enum:
          - bash_script_scorer
          default: bash_script_scorer
      required:
      - type
    BenchmarkCreateParameters:
      type: object
      additionalProperties: false
      description: BenchmarkCreateParameters contain the set of parameters to create a Benchmark.
      properties:
        name:
          type: string
          description: The unique name of the Benchmark.
        scenario_ids:
          type: array
          items:
            type: string
          nullable: true
          description: The Scenario IDs that make up the Benchmark.
        metadata:
          type: object
          additionalProperties:
            type: string
          nullable: true
          description: User defined metadata to attach to the benchmark.
        required_environment_variables:
          type: array
          items:
            type: string
          nullable: true
          description: Environment variables required to run the benchmark. If any required variables are not supplied, the
            benchmark will fai

# --- truncated at 32 KB (85 KB total) ---
# Full source: https://raw.githubusercontent.com/api-evangelist/runloop-ai/refs/heads/main/openapi/runloop-benchmark-api-openapi.yml
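Most list endpoints in the specification share the same cursor pagination: `limit` (default 20, maximum 5000), `starting_after` (the ID of the last item already seen), and `include_total_count`. The Python sketch below walks every page through a caller-supplied `fetch_page` function, so it can be exercised without network access. The response shape is an assumption, because the list-view schemas (`BenchmarkRunListView` and the others) fall past the truncation point: items are assumed to sit under a caller-named key, to carry an `id`, and to come with a `has_more` flag. The `paginate` helper itself is hypothetical.

```python
from typing import Any, Callable, Iterator

# Documented bounds for `limit`: default 20, maximum 5000.
DEFAULT_LIMIT = 20
MAX_LIMIT = 5000

Page = dict[str, Any]


def paginate(
    fetch_page: Callable[[dict[str, str]], Page],
    items_key: str,
    limit: int = DEFAULT_LIMIT,
    include_total_count: bool = False,
) -> Iterator[dict[str, Any]]:
    """Yield every item from a cursor-paginated list endpoint.

    `fetch_page` receives the query parameters for one request and returns the
    decoded JSON page. ASSUMPTIONS (list-view schemas are outside this excerpt):
    items live under `items_key`, each item has an "id", and each page carries
    a boolean "has_more".
    """
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    # Query-string values, ready for urllib.parse.urlencode.
    params = {
        "limit": str(limit),
        # The spec suggests skipping the count query on large datasets.
        "include_total_count": "true" if include_total_count else "false",
    }
    while True:
        page = fetch_page(dict(params))  # pass a copy so each call is independent
        items = page.get(items_key, [])
        yield from items
        if not items or not page.get("has_more", False):
            return
        # Cursor: the next page starts after the last item returned.
        params["starting_after"] = items[-1]["id"]
```

For `GET /v1/benchmark_runs`, a `fetch_page` could append `urllib.parse.urlencode(params)` to the URL and decode the JSON body; passing `"runs"` as `items_key` there would itself be an assumption about `BenchmarkRunListView`. Defaulting `include_total_count` to false follows the spec's own performance note, since a full walk never needs the total.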