Repository History

1 repository tagged with Dataset Quality

Topic: Dataset Quality

LLMSanitize: An Open-Source Library for Contamination Detection in NLP and LLM Datasets

LLMSanitize is an open-source Python library for detecting contamination in NLP datasets and large language models (LLMs). It provides a range of detection methods, from string matching to model likelihood and embedding similarity, to help check data integrity. It is aimed at researchers and developers who work with LLMs and need their models and evaluations to stay reliable.

Analyzed Feb 9, 2026
