Mastra Evals
Built-in evaluation library with model-graded (LLM-as-judge), rule-based, and statistical metrics for measuring agent output quality, including hallucination, faithfulness, relevance, bias, toxicity, and answer correctness. Evals run locally or against Mastra Cloud and emit traces alongside production runs.
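
To illustrate the difference between a rule-based and a statistical metric, here is a minimal standalone sketch. It does not use Mastra's API: the `MetricResult` shape and the `keywordCoverage` and `tokenOverlap` functions are hypothetical names chosen for this example, and the 0-to-1 score range is an assumption.

```typescript
// Hypothetical result shape for illustration only; not Mastra's actual types.
interface MetricResult {
  score: number; // assumed to be normalized to the range [0, 1]
  info: Record<string, unknown>;
}

// Rule-based metric: fraction of required keywords found in the output
// (case-insensitive substring match).
function keywordCoverage(output: string, keywords: string[]): MetricResult {
  if (keywords.length === 0) {
    // Nothing is required, so the output trivially satisfies the rule.
    return { score: 1, info: { matched: [], missing: [] } };
  }
  const text = output.toLowerCase();
  const matched = keywords.filter((k) => text.includes(k.toLowerCase()));
  const missing = keywords.filter((k) => !text.includes(k.toLowerCase()));
  return { score: matched.length / keywords.length, info: { matched, missing } };
}

// Statistical metric: Jaccard overlap between the token sets of the
// output and a reference answer.
function tokenOverlap(output: string, reference: string): MetricResult {
  const tokenize = (s: string): Set<string> =>
    new Set(s.toLowerCase().split(/\W+/).filter((t) => t.length > 0));
  const a = tokenize(output);
  const b = tokenize(reference);
  if (a.size === 0 && b.size === 0) {
    // Two empty texts are treated as identical.
    return { score: 1, info: { shared: 0, total: 0 } };
  }
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared++;
  });
  const total = a.size + b.size - shared; // size of the union
  return { score: shared / total, info: { shared, total } };
}
```

For example, `keywordCoverage("Paris is the capital of France", ["paris", "france", "berlin"])` scores 2/3 because one keyword is missing, and `tokenOverlap("the cat sat", "the cat ran")` scores 0.5 because two of the four distinct tokens are shared. A model-graded metric would replace this deterministic logic with a call to a judge model, which is omitted here because it depends on the provider in use.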