LLMSanitize is an open-source Python library for detecting contamination in NLP datasets and Large Language Models (LLMs), that is, cases where evaluation data overlaps with data a model was trained on. It provides a range of detection methods, from string matching to model likelihood and embedding similarity. Researchers and developers working with LLMs can use it to check that their evaluation results are not inflated by leaked benchmark data.
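To make the string-matching end of that range concrete, here is a minimal sketch of an n-gram overlap check. It is not LLMSanitize's API: the function names `ngrams` and `overlap_ratio`, the whitespace tokenization, and the default `n = 3` are all assumptions chosen for illustration.

```python
from typing import List, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase, whitespace-tokenized n-grams in text."""
    tokens = text.lower().split()
    # range() is empty when the text has fewer than n tokens
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(eval_sample: str, train_corpus: List[str], n: int = 3) -> float:
    """Fraction of the sample's n-grams that also occur in the training corpus.

    Hypothetical helper for illustration only; a value near 1.0 suggests the
    evaluation sample may appear (almost) verbatim in the training data.
    """
    sample_grams = ngrams(eval_sample, n)
    if not sample_grams:
        return 0.0
    corpus_grams: Set[Tuple[str, ...]] = set()
    for doc in train_corpus:
        corpus_grams |= ngrams(doc, n)
    return len(sample_grams & corpus_grams) / len(sample_grams)


if __name__ == "__main__":
    train = ["the quick brown fox jumps over the lazy dog"]
    print(overlap_ratio("quick brown fox jumps high", train))
```

A check like this only catches near-verbatim overlap. Paraphrased or translated leaks would need the likelihood-based or embedding-based methods mentioned above.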