Evals

BIG-Bench

The Beyond the Imitation Game Benchmark (BIG-Bench) is "a collaborative benchmark intended to probe large language models and extrapolate their future capabilities." It contains more than 200 tasks, each written either as a simplified JSON-based task or as a programmatic task; a curated subset of 24 tasks, BIG-Bench Lite, is provided as the canonical headline measurement. The benchmark is maintained by Google on GitHub and is built from open community task submissions.
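
A JSON-based task is essentially metadata plus a list of input/target examples. The following is a minimal sketch under stated assumptions: the task `toy_arithmetic` and the helper `toy_multiple_choice_accuracy` are hypothetical, the field names (`name`, `description`, `keywords`, `metrics`, `examples`, `input`, `target_scores`) mirror the repository's JSON task format as we understand it and should be checked against its documentation, and the scoring is a simplified accuracy, not BIG-Bench's own metric implementation.

```python
import json

# Hypothetical toy task in the style of a BIG-Bench JSON task.
# Field names are assumed to follow the repository's JSON format.
TASK_JSON = """
{
  "name": "toy_arithmetic",
  "description": "Answer simple addition questions.",
  "keywords": ["arithmetic"],
  "metrics": ["multiple_choice_grade"],
  "examples": [
    {"input": "What is 2 + 2?", "target_scores": {"4": 1, "5": 0}},
    {"input": "What is 3 + 5?", "target_scores": {"7": 0, "8": 1}}
  ]
}
"""


def toy_multiple_choice_accuracy(task, choose):
    """Fraction of examples where the chosen option has the highest target score.

    `choose` is a hypothetical stand-in for a model: it receives the example
    input and the list of answer options, and returns one option.
    This is a simplified illustration, not BIG-Bench's metric code.
    """
    examples = task["examples"]
    correct = 0
    for example in examples:
        scores = example["target_scores"]
        pick = choose(example["input"], list(scores))
        # Count the example as correct if the pick carries the top score.
        if scores[pick] == max(scores.values()):
            correct += 1
    return correct / len(examples)


if __name__ == "__main__":
    task = json.loads(TASK_JSON)
    # A trivial "model" that always answers with the first listed option.
    always_first = lambda prompt, options: options[0]
    print(toy_multiple_choice_accuracy(task, always_first))  # prints 0.5
```

The always-first baseline gets the first example right and the second wrong, which shows why fixed-position guessing is a useful sanity floor for multiple-choice tasks.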


API entry from apis.yml

name: BIG-Bench
description: The Beyond the Imitation Game Benchmark (BIG-Bench) is "a collaborative benchmark intended
  to probe large language models and extrapolate their future capabilities." It contains more than 200
  tasks across JSON-based simplified tasks and programmatic tasks; a curated subset (BIG-Bench Lite) of
  24 tasks is provided as the canonical headline measurement. Maintained on GitHub by Google with open
  community task submissions.
humanURL: https://github.com/google/BIG-bench
baseURL: https://github.com/google/BIG-bench
tags:
- Benchmark
- Collaborative
- Multitask
- Google
- BIG-Bench Lite
properties:
- type: GitHubRepository
  url: https://github.com/google/BIG-bench
- type: Paper
  url: https://arxiv.org/abs/2206.04615
- type: Documentation
  url: https://github.com/google/BIG-bench/blob/main/README.md
