DeepEval
DeepEval is an open-source Python framework for evaluating LLM applications, with evaluations written as unit tests. It ships with research-backed metrics, including GEval, AnswerRelevancyMetric, FaithfulnessMetric, TaskCompletionMetric, and ConversationalGEval, and it supports end-to-end and component-level testing, multi-turn conversations, and LLM tracing for agents.
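The unit-test framing can be sketched without the library. A test case bundles the input, the model's output, and any retrieved context. A metric scores that case against a threshold, and an assertion fails the test when the score falls short. In DeepEval itself these roles are filled by `LLMTestCase`, metric classes such as `FaithfulnessMetric` (which accept a `threshold`), and `assert_test`, with tests run through `deepeval test run`. The minimal sketch below is not DeepEval's API. `ToyTestCase`, `ToyGroundednessMetric`, and `assert_case` are hypothetical names, and token overlap is a crude offline stand-in for the LLM-judged scoring that DeepEval's metrics use.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToyTestCase:
    """Hypothetical stand-in for an LLM test case (not DeepEval's LLMTestCase)."""
    input: str
    actual_output: str
    retrieval_context: list[str] = field(default_factory=list)


def _tokens(text: str) -> set[str]:
    # Lowercased alphanumeric word tokens; a crude proxy for the claims an LLM judge would extract.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class ToyGroundednessMetric:
    """Hypothetical metric: fraction of output tokens that also appear in the context.

    Assumption: token overlap replaces the LLM judge a real faithfulness metric
    would use, purely so this sketch runs offline.
    """
    threshold: float = 0.5
    score: float = 0.0

    def measure(self, case: ToyTestCase) -> float:
        output_tokens = _tokens(case.actual_output)
        context_tokens = set().union(*(_tokens(c) for c in case.retrieval_context))
        # An empty output scores 0.0 rather than dividing by zero.
        self.score = (
            len(output_tokens & context_tokens) / len(output_tokens) if output_tokens else 0.0
        )
        return self.score

    def is_successful(self) -> bool:
        return self.score >= self.threshold


def assert_case(case: ToyTestCase, metrics: list) -> None:
    """Hypothetical helper: fail like a unit test if any metric scores below its threshold."""
    for metric in metrics:
        metric.measure(case)
        assert metric.is_successful(), (
            f"{type(metric).__name__} scored {metric.score:.2f} < {metric.threshold}"
        )


# A pytest-style test function using the toy harness.
def test_refund_answer_is_grounded():
    case = ToyTestCase(
        input="What if these shoes don't fit?",
        actual_output="We offer a 30 day full refund at no extra cost.",
        retrieval_context=["All customers are eligible for a 30 day full refund at no extra cost."],
    )
    assert_case(case, [ToyGroundednessMetric(threshold=0.7)])
```

Under pytest, `test_refund_answer_is_grounded` passes: 9 of the 11 output tokens appear in the context, a score of about 0.82 against a threshold of 0.7. An answer that drifted away from the retrieved context would score lower and fail the test, which is the behavior the unit-test framing is meant to catch.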
DeepEval is one of three APIs that Confident AI publishes on the APIs.io network.
Its tags are Open Source, LLM Evaluation, Python, and Testing Framework. Its APIs.io listing includes a getting-started guide, API documentation, and SDKs.