Evidently: Open-Source ML and LLM Observability Framework
This repository profile is provided by osrepos.com, an open source repository discovery platform.
Summary
Evidently is an open-source Python library designed for evaluating, testing, and monitoring machine learning and large language model systems. It provides over 100 built-in metrics for various tasks, from data drift detection to LLM judges, supporting both tabular and text data. This framework helps ensure the quality and performance of AI-powered systems throughout their lifecycle.
Use at your own risk
OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.
Introduction
Evidently is a powerful open-source Python library that serves as an ML and LLM observability framework. It enables users to evaluate, test, and monitor any AI-powered system or data pipeline, from tabular data to Generative AI applications. With over 100 built-in metrics, Evidently supports both offline evaluations and live monitoring, offering a modular architecture for various use cases.
Installation
To get started with Evidently, you can install it using pip or Conda.
```bash
pip install evidently
```
Alternatively, for Conda users:
```bash
conda install -c conda-forge evidently
```
To run the Evidently UI with demo projects, you can use uv or a standard virtual environment:
```bash
uv run --with evidently evidently ui --demo-projects all
```
If uv is not installed, set up a virtual environment instead:
```bash
pip install virtualenv
virtualenv venv
source venv/bin/activate
pip install evidently
evidently ui --demo-projects all
```
Then visit localhost:8000 in your browser.
Examples
Evidently offers comprehensive tools for both LLM and traditional ML/data evaluations, along with a monitoring dashboard.
LLM Evaluations
Here's a quick example for LLM evaluations, checking sentiment, text length, and specific word presence in responses.
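Two of these checks are simple row-level computations. As a rough illustration only, the pure-Python sketch below computes a length value and a "denial" flag for each answer. It is not Evidently's implementation. The function names `text_length`, `contains_any` and `describe` are hypothetical, length is assumed to be counted in characters, and the case-insensitive substring matching is a choice made for this sketch. Sentiment is left out because it needs a trained model.

```python
# Hypothetical pure-Python sketch of two row-level text descriptors.
# Assumptions: length is counted in characters, and word matching is a
# case-insensitive substring check. This is not Evidently's implementation.
from typing import Dict, Iterable, List


def text_length(text: str) -> int:
    """Number of characters in the text."""
    return len(text)


def contains_any(text: str, items: Iterable[str]) -> bool:
    """True if any item occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(item.lower() in lowered for item in items)


def describe(answers: List[str]) -> List[Dict[str, object]]:
    """Compute one dict of descriptor values per answer."""
    return [
        {
            "Length": text_length(answer),
            "Denials": contains_any(answer, ["sorry", "apologize"]),
        }
        for answer in answers
    ]


answers = [
    "The capital of Japan is Tokyo.",
    "Leonardo da Vinci.",
    "I'm sorry, but I can't assist with homework.",
]
for row in describe(answers):
    print(row)
```

The Evidently version below computes the same kinds of values through its descriptor API and then summarizes them in a report.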
```python
import pandas as pd

from evidently import Dataset, DataDefinition, Report
from evidently.descriptors import Contains, Sentiment, TextLength
from evidently.presets import TextEvals

# Three question/answer pairs to evaluate
eval_df = pd.DataFrame(
    [
        ["What is the capital of Japan?", "The capital of Japan is Tokyo."],
        ["Who painted the Mona Lisa?", "Leonardo da Vinci."],
        ["Can you write an essay?", "I'm sorry, but I can't assist with homework."],
    ],
    columns=["question", "answer"],
)

# Wrap the DataFrame and compute row-level descriptors on the "answer" column
eval_dataset = Dataset.from_pandas(
    eval_df,
    data_definition=DataDefinition(),
    descriptors=[
        Sentiment("answer", alias="Sentiment"),
        TextLength("answer", alias="Length"),
        # Flags answers that contain any of the listed words
        Contains("answer", items=["sorry", "apologize"], mode="any", alias="Denials"),
    ],
)

# Summarize the descriptor values with the TextEvals preset
report = Report([TextEvals()])
my_eval = report.run(eval_dataset)
my_eval  # in a notebook, displaying the result renders the report
```
Data and ML Evaluations
For data and ML evaluations, Evidently can detect data drift using various statistical methods.
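```python
from sklearn import datasets

from evidently import Report
from evidently.presets import DataDriftPreset

# Load Iris as a pandas DataFrame (feature columns plus a "target" column)
iris_frame = datasets.load_iris(as_frame=True).frame

# Detect drift with the PSI method; include_tests expects a boolean
report = Report([DataDriftPreset(method="psi")], include_tests=True)

# Compare the first 60 rows (current data) against the remaining rows (reference data)
my_eval = report.run(iris_frame.iloc[:60], iris_frame.iloc[60:])
my_eval  # in a notebook, displaying the result renders the report
```
You can also save reports as HTML files using my_eval.save_html("file.html").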
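The drift example above selects the PSI method (Population Stability Index). To show what that statistic measures, here is a minimal sketch for a single numeric column. PSI sums (current share - reference share) * ln(current share / reference share) over bins. The binning scheme (equal-width bins over the combined range), the epsilon for empty bins, and the 0.1 cut-off are assumptions made for illustration. Evidently's own binning, defaults and thresholds may differ, and the function names are hypothetical.

```python
# Hypothetical sketch of the Population Stability Index (PSI) for one
# numeric column. Assumptions: equal-width bins over the combined range of
# both samples and a small epsilon for empty bins. Evidently's own binning
# and defaults may differ.
import bisect
import math
from typing import List, Sequence


def bin_shares(values: Sequence[float], edges: List[float]) -> List[float]:
    """Fraction of values in each bin; the last bin includes its right edge."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        idx = min(bisect.bisect_right(edges, v) - 1, n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def psi(reference: Sequence[float], current: Sequence[float],
        n_bins: int = 10, eps: float = 1e-4) -> float:
    """PSI = sum over bins of (cur - ref) * ln(cur / ref)."""
    lo = min(min(reference), min(current))
    hi = max(max(reference), max(current))
    if hi == lo:
        return 0.0  # both samples hold the same single value
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins)] + [hi]
    ref = bin_shares(reference, edges)
    cur = bin_shares(current, edges)
    total = 0.0
    for r, c in zip(ref, cur):
        r, c = max(r, eps), max(c, eps)  # avoid log(0) for empty bins
        total += (c - r) * math.log(c / r)
    return total


reference = [float(x) for x in range(100)]
shifted = [x + 50.0 for x in reference]
# Illustrative threshold (assumption): flag drift when PSI >= 0.1
THRESHOLD = 0.1
print(psi(reference, reference))             # identical samples give 0.0
print(psi(reference, shifted) >= THRESHOLD)  # shifted sample is flagged
```

Because each bin term is non-negative, PSI is zero only when the binned shares match and grows as the distributions move apart. This is why a single threshold can turn it into a pass/fail test.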
Monitoring Dashboard
Evidently also provides a Monitoring UI service to visualize metrics and test results over time. You can self-host the open-source version or use Evidently Cloud for additional features like dataset management, alerting, and no-code evaluations.
Why Use Evidently
Evidently provides tools for evaluating many aspects of AI systems, which helps teams maintain model quality and reliability. It ships with over 100 built-in evaluations and lets you add custom ones. It works with tabular and text data, supports evaluations for both predictive and generative tasks, and covers both offline evaluations and live monitoring.
Key evaluation capabilities include:
- Text descriptors: Length, sentiment, toxicity, language, special symbols, regular expression matches.
- LLM outputs: Semantic similarity, retrieval relevance, summarization quality, using model- and LLM-based evaluations.
- Data quality: Missing values, duplicates, min-max ranges, new categorical values, correlations.
- Data distribution drift: Over 20 statistical tests and distance metrics to compare shifts in data distribution.
- Classification: Accuracy, precision, recall, ROC AUC, confusion matrix, bias.
- Regression: MAE, ME, RMSE, error distribution, error normality, error bias.
- Ranking (including RAG): NDCG, MAP, MRR, Hit Rate.
- Recommendations: Serendipity, novelty, diversity, popularity bias.
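Several of the metrics listed above have short textbook definitions. The sketch below implements a few of them (MAE, ME, RMSE, Hit Rate, MRR, NDCG) in plain Python to make the numbers concrete. It is not Evidently's code. The function names are hypothetical, ME is assumed to be the mean of prediction minus target (sign conventions vary), and the ranking metrics assume binary relevance with a top-k cut-off.

```python
# Hypothetical sketches of a few metrics from the list above.
# Assumptions: ME is the mean of (prediction - target), and the ranking
# metrics use binary relevance with a cut-off k. Evidently's exact
# definitions, sign conventions, and parameters may differ.
import math
from typing import List, Sequence, Set


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean signed error; positive means predictions are too high on average."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def hit_rate(ranked: List[List[str]], relevant: List[Set[str]], k: int) -> float:
    """Share of queries with at least one relevant item in the top k."""
    hits = [any(item in rel for item in items[:k]) for items, rel in zip(ranked, relevant)]
    return sum(hits) / len(hits)


def mrr(ranked: List[List[str]], relevant: List[Set[str]], k: int) -> float:
    """Mean reciprocal rank of the first relevant item in the top k (0 if none)."""
    total = 0.0
    for items, rel in zip(ranked, relevant):
        for rank, item in enumerate(items[:k], start=1):
            if item in rel:
                total += 1.0 / rank
                break
    return total / len(ranked)


def ndcg(ranked: List[List[str]], relevant: List[Set[str]], k: int) -> float:
    """Mean NDCG@k with binary relevance; queries with no relevant items score 0."""
    scores = []
    for items, rel in zip(ranked, relevant):
        dcg = sum(1.0 / math.log2(rank + 1)
                  for rank, item in enumerate(items[:k], start=1) if item in rel)
        ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(rel), k) + 1))
        scores.append(dcg / ideal if ideal > 0 else 0.0)
    return sum(scores) / len(scores)


y_true, y_pred = [3.0, 5.0, 2.0], [2.0, 5.0, 4.0]
print(mae(y_true, y_pred), mean_error(y_true, y_pred), rmse(y_true, y_pred))

ranked = [["a", "b", "c"], ["d", "e", "f"]]
relevant = [{"b"}, {"x"}]
print(hit_rate(ranked, relevant, k=3), mrr(ranked, relevant, k=3), ndcg(ranked, relevant, k=3))
```

In the ranking example, the first query finds its relevant item at rank 2 and the second finds nothing. So Hit Rate is 0.5, MRR is 0.25, and NDCG is half of 1/log2(3).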
Related repositories
Similar repositories that may be relevant next.

TensorRec: A TensorFlow Recommendation Framework in Python
May 17, 2026
TensorRec is a Python recommendation system built on TensorFlow, designed for quickly developing and customizing recommendation algorithms. It allows users to define custom representation and loss functions while handling data manipulation, scoring, and ranking. Although not under active development, it provides a solid foundation for understanding and implementing recommender systems.
ML-From-Scratch: Machine Learning Models and Algorithms in NumPy
March 9, 2026
ML-From-Scratch is a comprehensive GitHub repository offering bare-bones NumPy implementations of fundamental machine learning models and algorithms. It emphasizes accessibility, making complex concepts easier to understand for learners and practitioners. This project covers a wide range of topics, from linear regression to deep learning and reinforcement learning, all implemented from scratch.
Spotlight: Deep Recommender Models with PyTorch
February 26, 2026
Spotlight is a Python library built on PyTorch for developing deep and shallow recommender models. It offers a comprehensive set of building blocks for various loss functions, representations, and utilities for handling recommendation datasets. This tool is designed for rapid exploration and prototyping of new recommender systems.

Conda: A Cross-Platform Binary Package and Environment Manager
February 4, 2026
Conda is a powerful, cross-platform, language-agnostic binary package and environment manager. It simplifies the creation of isolated environments for various projects, even for C libraries, and efficiently installs packages using hard links. Written entirely in Python and BSD licensed, Conda is a cornerstone for distributions like Anaconda and Miniforge.