MLflow LLM Evaluate

MLflow LLM evaluate extends MLflow's experiment tracking with mlflow.evaluate() support for LLM tasks. The API runs reference-based and reference-free metrics (toxicity, perplexity, BLEU, ROUGE, exact match, custom LLM judges) over a logged model or a function and persists the results in MLflow's experiment store alongside traditional ML metrics. It is part of the broader MLflow open-source project.
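Reference-based metrics such as exact match and ROUGE compare each model output with a ground-truth answer, score every row, and then aggregate those scores across the dataset. The standard-library sketch below illustrates that idea for exact match and a ROUGE-1-style unigram F1. It is not MLflow's implementation: MLflow provides its own built-ins (for example `mlflow.metrics.exact_match()` and `mlflow.metrics.rouge1()`). The normalization, the tokenization, the helper names (`exact_match`, `rouge1_f1`, `evaluate_rows`) and the `/mean` aggregate naming are all hypothetical choices made for illustration.

```python
import re
from collections import Counter
from statistics import mean


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed, simple normalization)."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, target: str) -> float:
    """Return 1.0 if the normalized prediction equals the normalized target, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(target) else 0.0


def rouge1_f1(prediction: str, target: str) -> float:
    """Unigram-overlap F1, the idea behind ROUGE-1 (no stemming or stopword removal)."""
    pred_tokens = re.findall(r"\w+", prediction.lower())
    ref_tokens = re.findall(r"\w+", target.lower())
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Count shared unigrams, clipping each token by its count in the reference.
    ref_counts = Counter(ref_tokens)
    overlap = sum(min(n, ref_counts[tok]) for tok, n in Counter(pred_tokens).items())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate_rows(predictions, targets):
    """Score every row, then aggregate each metric with its mean (hypothetical helper)."""
    per_row = {
        "exact_match": [exact_match(p, t) for p, t in zip(predictions, targets)],
        "rouge1_f1": [rouge1_f1(p, t) for p, t in zip(predictions, targets)],
    }
    aggregates = {f"{name}/mean": mean(scores) for name, scores in per_row.items()}
    return per_row, aggregates


# Usage: the first row matches exactly; the second only partially overlaps its reference.
predictions = ["MLflow is an open-source platform", "Paris"]
targets = ["MLflow is an open-source platform", "The capital is Paris"]
per_row, aggregates = evaluate_rows(predictions, targets)
print({name: round(value, 3) for name, value in aggregates.items()})
# prints {'exact_match/mean': 0.5, 'rouge1_f1/mean': 0.7}
```

Reference-free metrics such as toxicity or perplexity need a scoring model instead of a ground-truth column, so they are not sketched here.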

API entry from apis.yml

name: MLflow LLM Evaluate
description: MLflow LLM evaluate extends MLflow's experiment tracking with mlflow.evaluate() support for
  LLM tasks. The API runs reference-based and reference-free metrics (toxicity, perplexity, BLEU, ROUGE,
  exact match, custom LLM judges) over a logged model or a function and persists results into MLflow's
  experiment store alongside traditional ML metrics. It is part of the broader MLflow open-source project.
humanURL: https://mlflow.org/
baseURL: https://mlflow.org
tags:
- Open Source
- MLflow
- Experiment Tracking
- LLM Judges
- Apache
properties:
- type: Documentation
  url: https://mlflow.org/docs/latest/llms/llm-evaluate/index.html
- type: GitHubRepository
  url: https://github.com/mlflow/mlflow