OSRepos

Repository History

Explore all analyzed open source repositories

Topic: AI Evaluation
judges: A Python Library for LLM-as-a-Judge Evaluators

The `judges` library from Databricks provides a concise way to use and create LLM-as-a-Judge evaluators. It ships a curated set of research-backed, pre-built judges for common use cases and supports both off-the-shelf usage and custom judge creation, helping developers evaluate the output quality of their Large Language Models.
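The general pattern behind such evaluators can be sketched in a few lines of plain Python. The names below (`make_judge`, `Judgment`) are illustrative only, not the `judges` library's API, and the model call is stubbed so the example runs offline:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Judgment:
    score: bool       # PASS / FAIL verdict
    reasoning: str    # the judge's one-line justification

def make_judge(rubric: str, model: Callable[[str], str]) -> Callable[[str, str], Judgment]:
    """Build a judge that grades an (input, output) pair against a rubric.

    `model` is any callable taking a prompt and returning the raw LLM
    reply, which is expected to start with PASS or FAIL.
    """
    def judge(task: str, output: str) -> Judgment:
        prompt = (
            f"Rubric: {rubric}\n"
            f"Task: {task}\n"
            f"Candidate answer: {output}\n"
            "Reply with PASS or FAIL, then a one-line justification."
        )
        reply = model(prompt)
        verdict, _, reasoning = reply.partition("\n")
        return Judgment(score=verdict.strip().upper().startswith("PASS"),
                        reasoning=reasoning.strip())
    return judge

# Stubbed model so the sketch runs without an API key.
def fake_model(prompt: str) -> str:
    return "PASS\nThe answer matches the rubric."

correctness = make_judge("Answer must be factually correct.", fake_model)
result = correctness("What is 2+2?", "4")
```

In practice the stub would be replaced by a real chat-completion call; keeping the model a plain callable also makes judges easy to unit-test.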

Apr 14, 2026
Judgy: Correcting LLM Judge Bias for Reliable AI Model Evaluation

Judgy is a Python package designed to make LLM-as-a-judge evaluations more reliable. It estimates a system's true success rate by correcting for the judge's bias and produces confidence intervals via bootstrapping, yielding more accurate and trustworthy assessments of AI model performance.
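The correction described above can be sketched from first principles (a generic illustration of the technique, not Judgy's actual API): a small human-labeled calibration set yields the judge's true- and false-positive rates, the observed pass rate `obs = theta*TPR + (1-theta)*FPR` is inverted to recover the true success rate `theta`, and bootstrapping over the judge's verdicts gives a confidence interval.

```python
import random

def corrected_rate(obs_rate: float, tpr: float, fpr: float) -> float:
    """Invert obs = theta*TPR + (1 - theta)*FPR for theta, clipped to [0, 1]."""
    theta = (obs_rate - fpr) / (tpr - fpr)
    return min(max(theta, 0.0), 1.0)

def estimate_success(judge_unlabeled, human_labels, judge_labeled,
                     n_boot=2000, seed=0):
    """Point estimate of the true success rate plus a 95% bootstrap CI.

    judge_unlabeled: judge verdicts (0/1) on the unlabeled evaluation set.
    human_labels / judge_labeled: paired human and judge verdicts (0/1)
    on a calibration set, used to measure the judge's TPR and FPR.
    """
    rng = random.Random(seed)
    positives = [j for h, j in zip(human_labels, judge_labeled) if h]
    negatives = [j for h, j in zip(human_labels, judge_labeled) if not h]
    tpr = sum(positives) / len(positives)
    fpr = sum(negatives) / len(negatives)
    point = corrected_rate(sum(judge_unlabeled) / len(judge_unlabeled), tpr, fpr)
    boots = []
    for _ in range(n_boot):
        resample = [rng.choice(judge_unlabeled) for _ in judge_unlabeled]
        boots.append(corrected_rate(sum(resample) / len(resample), tpr, fpr))
    boots.sort()
    return point, (boots[int(0.025 * n_boot)], boots[int(0.975 * n_boot)])

# Synthetic calibration data: this judge has TPR 0.9 and FPR 0.2.
human = [1] * 10 + [0] * 10
judged = [1] * 9 + [0] + [1] * 2 + [0] * 8
# The judge passes 60/100 unlabeled outputs; the corrected estimate is
# (0.6 - 0.2) / (0.9 - 0.2), i.e. about 0.571 rather than the naive 0.6.
point, (lo, hi) = estimate_success([1] * 60 + [0] * 40, human, judged)
```

Note that the naive pass rate (0.6) overstates the true success rate here because the judge's false positives inflate it more than its false negatives deflate it.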

Dec 7, 2025
OSRepos — Analysis and discovery of open source repositories. Find interesting projects and follow their updates.

© 2025 OSRepos. Built with Nuxt 3 and lots of ❤️