Evidently: Open-Source ML and LLM Observability Framework

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary

Evidently is an open-source Python library designed for evaluating, testing, and monitoring machine learning and large language model systems. It provides over 100 built-in metrics for various tasks, from data drift detection to LLM judges, supporting both tabular and text data. This framework helps ensure the quality and performance of AI-powered systems throughout their lifecycle.

Repository Information

Analyzed by OSRepos on June 30, 2026

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

Evidently is an open-source Python library that serves as an ML and LLM observability framework. It lets users evaluate, test, and monitor AI-powered systems and data pipelines, from tabular data to generative AI applications. With over 100 built-in metrics, it supports both offline evaluations and live monitoring through a modular architecture.

Installation

To get started with Evidently, you can install it using pip or Conda.

pip install evidently

Alternatively, for Conda users:

conda install -c conda-forge evidently

To run the Evidently UI with demo projects, you can use uv or a standard virtual environment:

uv run --with evidently evidently ui --demo-projects all

If uv is not installed, set up a virtual environment:

pip install virtualenv
virtualenv venv
source venv/bin/activate
pip install evidently
evidently ui --demo-projects all

Then, visit localhost:8000 in your browser.
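Before launching the UI, it can help to confirm that the package is visible in the active environment. The sketch below uses only the Python standard library; the helper name installed_version is hypothetical and is not part of Evidently.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("evidently")
if version is None:
    print("evidently is not installed in this environment")
else:
    print(f"evidently {version} is installed")
```

If this reports that the package is missing after an install, the usual cause is that pip and the Python interpreter belong to different environments.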

Examples

Evidently offers comprehensive tools for both LLM and traditional ML/data evaluations, along with a monitoring dashboard.

LLM Evaluations

Here's a quick example for LLM evaluations, checking sentiment, text length, and specific word presence in responses.

import pandas as pd
from evidently import Report
from evidently import Dataset, DataDefinition
from evidently.descriptors import Sentiment, TextLength, Contains
from evidently.presets import TextEvals

eval_df = pd.DataFrame([
    ["What is the capital of Japan?", "The capital of Japan is Tokyo."],
    ["Who painted the Mona Lisa?", "Leonardo da Vinci."],
    ["Can you write an essay?", "I'm sorry, but I can't assist with homework."]],
                       columns=["question", "answer"])

# Wrap the DataFrame in an Evidently Dataset and attach row-level descriptors
eval_dataset = Dataset.from_pandas(
    pd.DataFrame(eval_df),
    data_definition=DataDefinition(),
    descriptors=[
        Sentiment("answer", alias="Sentiment"),
        TextLength("answer", alias="Length"),
        Contains("answer", items=["sorry", "apologize"], mode="any", alias="Denials"),
    ],
)

report = Report([
    TextEvals()
])

my_eval = report.run(eval_dataset)
my_eval
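To make the descriptor idea concrete, here is a minimal pure-Python sketch of what a length check and an "any of these words" check compute for each row. It illustrates the concept only and is not Evidently's implementation; the function names text_length and contains, and the case_sensitive parameter, are assumptions made for this example. Sentiment is left out because it requires a trained model.

```python
from typing import Iterable


def text_length(text: str) -> int:
    """Number of characters in the text."""
    return len(text)


def contains(text: str, items: Iterable[str], mode: str = "any",
             case_sensitive: bool = True) -> bool:
    """Check whether the text contains any (or all) of the given substrings."""
    candidates = list(items)
    if not case_sensitive:
        text = text.lower()
        candidates = [item.lower() for item in candidates]
    hits = [item in text for item in candidates]
    if mode == "any":
        return any(hits)
    if mode == "all":
        return all(hits)
    raise ValueError(f"unsupported mode: {mode!r}")


answers = [
    "The capital of Japan is Tokyo.",
    "Leonardo da Vinci.",
    "I'm sorry, but I can't assist with homework.",
]

# One row of descriptor values per answer, mirroring the "Length" and "Denials" aliases
for answer in answers:
    row = {
        "Length": text_length(answer),
        "Denials": contains(answer, ["sorry", "apologize"], mode="any"),
    }
    print(row)
```

Only the third answer is flagged as a denial, because it is the only one that contains "sorry".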

Data and ML Evaluations

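For data and ML evaluations, Evidently can detect data drift using various statistical methods. The example below compares two slices of the Iris dataset using the Population Stability Index (method="psi").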

import pandas as pd
from sklearn import datasets

from evidently import Report
from evidently.presets import DataDriftPreset

iris_data = datasets.load_iris(as_frame=True)
iris_frame = iris_data.frame

# include_tests=True enables the preset's built-in tests alongside the drift metrics
report = Report(
    [DataDriftPreset(method="psi")],
    include_tests=True,
)
my_eval = report.run(iris_frame.iloc[:60], iris_frame.iloc[60:])
my_eval

You can also save reports as HTML files using my_eval.save_html("file.html").
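The method="psi" option in the drift example refers to the Population Stability Index. As a rough illustration of the statistic itself, the sketch below bins a reference sample, measures the share of each sample per bin, and sums (current - reference) * ln(current / reference). The binning scheme, the eps floor for empty bins, and the function name psi are assumptions for this example; Evidently's own implementation may differ in all of these.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-4) -> float:
    """Population Stability Index between two numeric samples."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if it is constant
    width = (hi - lo) / bins or 1.0

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            # Clamp values outside the reference range into the edge bins
            index = min(max(index, 0), bins - 1)
            counts[index] += 1
        # Floor each share at eps so the logarithm is always defined
        return [max(count / len(values), eps) for count in counts]

    ref_shares = shares(reference)
    cur_shares = shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


reference = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in reference]

print("same sample:   ", psi(reference, reference))
print("shifted sample:", psi(reference, shifted))
```

Identical samples score zero, and the score grows as the two distributions diverge. A commonly cited rule of thumb treats values above roughly 0.1 to 0.25 as a meaningful shift, but the threshold a given tool applies should be checked in its documentation.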

Monitoring Dashboard

Evidently also provides a Monitoring UI service to visualize metrics and test results over time. You can self-host the open-source version or use Evidently Cloud for additional features like dataset management, alerting, and no-code evaluations.

Why Use Evidently

Evidently bundles evaluations for many aspects of an AI system in one library, which helps teams track model quality and reliability over time. It ships with over 100 built-in evaluations and lets you add custom ones. It works with tabular and text data, covers predictive and generative tasks, and supports both offline evaluations and live monitoring.

Key evaluation capabilities include:

  • Text descriptors: Length, sentiment, toxicity, language, special symbols, regular expression matches.
  • LLM outputs: Semantic similarity, retrieval relevance, summarization quality, using model- and LLM-based evaluations.
  • Data quality: Missing values, duplicates, min-max ranges, new categorical values, correlations.
  • Data distribution drift: Over 20 statistical tests and distance metrics to compare shifts in data distribution.
  • Classification: Accuracy, precision, recall, ROC AUC, confusion matrix, bias.
  • Regression: MAE, ME, RMSE, error distribution, error normality, error bias.
  • Ranking (including RAG): NDCG, MAP, MRR, Hit Rate.
  • Recommendations: Serendipity, novelty, diversity, popularity bias.
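For readers less familiar with the metric names above, here is a minimal pure-Python sketch of a few of them using their standard textbook definitions: MAE, ME, and RMSE for regression, and MRR, Hit Rate, and NDCG for ranking. The function names are hypothetical, the mean error uses the "prediction minus target" sign convention as an assumption, and none of this is Evidently's code.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean signed error (prediction minus target), which exposes systematic bias."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def reciprocal_rank(relevance: Sequence[float]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevance, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def mrr(queries: Sequence[Sequence[float]]) -> float:
    """Mean reciprocal rank across queries."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


def hit_at_k(relevance: Sequence[float], k: int) -> bool:
    """True if at least one relevant item appears in the top k."""
    return any(rel > 0 for rel in relevance[:k])


def ndcg_at_k(relevance: Sequence[float], k: int) -> float:
    """Discounted cumulative gain at k, normalized by the ideal ordering."""
    def dcg(rels: Sequence[float]) -> float:
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal > 0 else 0.0


# Regression: targets vs. predictions
y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
print("MAE:", mae(y_true, y_pred))
print("ME:", mean_error(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))

# Ranking: each list holds relevance labels in the order the system returned items
ranked = [[0, 0, 1], [1, 0, 0]]
print("MRR:", mrr(ranked))
print("Hit@2:", [hit_at_k(q, 2) for q in ranked])
print("NDCG@3:", [ndcg_at_k(q, 3) for q in ranked])
```

MAE and ME are worth reading together: a large MAE with an ME near zero points to errors that cancel out, while an ME far from zero points to consistent over- or under-prediction.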
