DeepEval (Confident AI)
DeepEval is an open-source LLM evaluation package, and Confident AI is the hosted tier that accompanies it. The docs describe DeepEval as "an open-source LLM eval package" and Confident AI as "an AI quality platform with observability, evals, and monitoring." Built-in metrics include GEval (a research-backed metric scored against custom criteria), AnswerRelevancyMetric, TaskCompletionMetric, and ConversationalGEval. Single-turn test cases use the LLMTestCase shape and multi-turn ones use ConversationalTestCase; datasets organize Goldens, from which test cases are built for sync or async runs.
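The flow described above (goldens in a dataset, test cases built from them, a thresholded metric, sync or async runs) can be sketched with the standard library alone. This is a minimal illustration and not DeepEval's API: SketchGolden, SketchTestCase, KeywordOverlapMetric, run_sync, run_async, and toy_app are all hypothetical stand-ins, and the word-overlap score is a toy substitute for an LLM-judged metric such as GEval.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SketchGolden:
    """Hypothetical stand-in for a golden: an input plus an optional reference answer."""
    input: str
    expected_output: Optional[str] = None


@dataclass
class SketchTestCase:
    """Hypothetical stand-in for a single-turn test case: a golden plus the app's output."""
    input: str
    actual_output: str
    expected_output: Optional[str] = None


class KeywordOverlapMetric:
    """Toy metric (an assumption, not a DeepEval metric): the fraction of
    expected words that also appear in the actual output."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold

    def measure(self, case: SketchTestCase) -> float:
        expected = set((case.expected_output or "").lower().split())
        actual = set(case.actual_output.lower().split())
        return len(expected & actual) / len(expected) if expected else 0.0

    def is_successful(self, case: SketchTestCase) -> bool:
        return self.measure(case) >= self.threshold


def to_case(golden: SketchGolden, actual_output: str) -> SketchTestCase:
    # A golden becomes a test case once the application has produced an output.
    return SketchTestCase(golden.input, actual_output, golden.expected_output)


def run_sync(goldens: List[SketchGolden], app: Callable[[str], str],
             metric: KeywordOverlapMetric) -> List[bool]:
    # Evaluate the goldens one after another.
    return [metric.is_successful(to_case(g, app(g.input))) for g in goldens]


async def run_async(goldens: List[SketchGolden], app: Callable[[str], str],
                    metric: KeywordOverlapMetric) -> List[bool]:
    # Evaluate the goldens concurrently; gather preserves input order.
    async def one(g: SketchGolden) -> bool:
        actual = await asyncio.to_thread(app, g.input)
        return metric.is_successful(to_case(g, actual))

    return list(await asyncio.gather(*(one(g) for g in goldens)))


def toy_app(prompt: str) -> str:
    # Placeholder for the LLM application under test.
    return "Paris" if "France" in prompt else "5"


GOLDENS = [
    SketchGolden("What is the capital of France?", "paris"),
    SketchGolden("What is 2 + 2?", "4"),
]

if __name__ == "__main__":
    metric = KeywordOverlapMetric(threshold=0.5)
    print(run_sync(GOLDENS, toy_app, metric))                # prints [True, False]
    print(asyncio.run(run_async(GOLDENS, toy_app, metric)))  # prints [True, False]
```

Keeping goldens separate from test cases means the same dataset can be replayed against a new version of the application: only the actual output changes between runs, while inputs and reference answers stay fixed.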