Evals

DeepEval (Confident AI)

DeepEval is an open-source LLM evaluation package, paired with Confident AI as the hosted observability/evals/monitoring tier. The docs call DeepEval "an open-source LLM eval package" and Confident AI "an AI quality platform with observability, evals, and monitoring." Metrics include GEval (research-backed custom metric), AnswerRelevancyMetric, TaskCompletionMetric, and ConversationalGEval. Test cases use LLMTestCase and ConversationalTestCase shapes; datasets organize Golden test cases for sync or async runs.
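As a rough illustration of the shapes named above, the sketch below builds one LLMTestCase and scores it with a GEval metric and an AnswerRelevancyMetric. It assumes the `deepeval` package is installed and that a judge model is configured (DeepEval's LLM-based metrics typically need a model API key such as `OPENAI_API_KEY`). The function name `run_sample_eval`, the sample question, the criteria string, and the thresholds are all made up for illustration, and exact signatures may differ between DeepEval versions.

```python
def run_sample_eval():
    """Score one hypothetical test case.

    Assumes deepeval is installed and a judge-model API key is configured.
    """
    # Imported lazily so this module still loads where deepeval is absent.
    from deepeval import evaluate
    from deepeval.metrics import AnswerRelevancyMetric, GEval
    from deepeval.test_case import LLMTestCase, LLMTestCaseParams

    # A single-turn test case: the prompt, the model's answer, and a reference.
    test_case = LLMTestCase(
        input="What is the capital of France?",
        actual_output="Paris is the capital of France.",
        expected_output="Paris",
    )

    # Custom LLM-judged metric; the criteria text here is illustrative only.
    correctness = GEval(
        name="Correctness",
        criteria=(
            "Determine whether the actual output is factually "
            "consistent with the expected output."
        ),
        evaluation_params=[
            LLMTestCaseParams.ACTUAL_OUTPUT,
            LLMTestCaseParams.EXPECTED_OUTPUT,
        ],
        threshold=0.5,
    )

    # Built-in metric that judges whether the answer addresses the input.
    relevancy = AnswerRelevancyMetric(threshold=0.7)

    return evaluate(test_cases=[test_case], metrics=[correctness, relevancy])
```

Calling `run_sample_eval()` sends the test case to the judge model, so it needs network access and credentials. Multi-turn conversations would use ConversationalTestCase with ConversationalGEval instead.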


API entry from apis.yml

name: DeepEval (Confident AI)
description: DeepEval is an open-source LLM evaluation package, paired with Confident AI as the hosted
  observability/evals/monitoring tier. The docs call DeepEval "an open-source LLM eval package" and Confident
  AI "an AI quality platform with observability, evals, and monitoring." Metrics include GEval (research-backed
  custom metric), AnswerRelevancyMetric, TaskCompletionMetric, and ConversationalGEval. Test cases use
  LLMTestCase and ConversationalTestCase shapes; datasets organize Golden test cases for sync or async
  runs.
humanURL: https://www.deepeval.com/
baseURL: https://api.confident-ai.com
tags:
- Open Source
- GEval
- RAG
- Conversational
- Python
properties:
- type: Documentation
  url: https://www.deepeval.com/docs/getting-started
- type: GitHubRepository
  url: https://github.com/confident-ai/deepeval
- type: Portal
  url: https://app.confident-ai.com
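
To show how the `properties` list in this entry might be consumed, here is a minimal sketch that looks up a link by its `type`. The dict is a hand-copied subset of the entry above, so no YAML parser is needed, and the helper name `property_url` is hypothetical rather than part of any apis.yml tooling.

```python
from typing import Optional

# Hand-copied subset of the apis.yml entry above, expressed as a plain dict.
ENTRY = {
    "name": "DeepEval (Confident AI)",
    "humanURL": "https://www.deepeval.com/",
    "baseURL": "https://api.confident-ai.com",
    "tags": ["Open Source", "GEval", "RAG", "Conversational", "Python"],
    "properties": [
        {"type": "Documentation", "url": "https://www.deepeval.com/docs/getting-started"},
        {"type": "GitHubRepository", "url": "https://github.com/confident-ai/deepeval"},
        {"type": "Portal", "url": "https://app.confident-ai.com"},
    ],
}


def property_url(entry: dict, prop_type: str) -> Optional[str]:
    """Return the URL of the first property with the given type, or None."""
    for prop in entry.get("properties", []):
        if prop.get("type") == prop_type:
            return prop.get("url")
    return None


if __name__ == "__main__":
    print(property_url(ENTRY, "GitHubRepository"))
```

Returning None for an unknown type keeps the lookup safe for entries that omit a given link.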
