# RAGChecker: A Fine-grained Framework for Diagnosing RAG Systems

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Source: osrepos.com
Repository profile: https://osrepos.com/repo/amazon-science-ragchecker

RAGChecker is an automatic evaluation framework developed by Amazon Science to assess and diagnose Retrieval-Augmented Generation (RAG) systems. It offers a suite of metrics and tools for in-depth analysis of RAG performance, so developers and researchers can evaluate and improve their RAG systems.

GitHub: https://github.com/amazon-science/RAGChecker

## Topics

- Python
- RAG
- LLM
- Evaluation
- AI
- NLP
- Machine Learning

## Repository Information

Last analyzed by OSRepos: Sat Jul 04 2026 16:17:36 GMT+0100 (Western European Summer Time)

## Safety Notice

OSRepos shares public repositories for knowledge and discovery only. Review source code, dependencies, licenses, and security implications before running or installing anything.

## Introduction
RAGChecker is an automatic evaluation framework from Amazon Science for assessing and diagnosing Retrieval-Augmented Generation (RAG) systems. Its metrics cover the pipeline as a whole and the retrieval and generation components separately, which helps locate where a problem originates. Evaluation is fine-grained: it relies on claim-level entailment operations, which yields insights for targeted improvements.

## Installation
Install RAGChecker with pip, then download the spaCy model it needs:

```bash
pip install ragchecker
python -m spacy download en_core_web_sm
```


## Examples
RAGChecker provides both a command-line interface (CLI) and a Python API for evaluating your RAG systems.

### CLI Example
First, prepare your data as JSON in the layout shown below. The ground truth answer (`gt_answer`) is the only annotation required for each query:


```json
{
  "results": [
    {
      "query_id": "<query id>",
      "query": "<input query>",
      "gt_answer": "<ground truth answer>",
      "response": "<response generated by the RAG generator>",
      "retrieved_context": [
        {
          "doc_id": "<doc id>",
          "text": "<content of the chunk>"
        }
      ]
    }
  ]
}
```
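
The input layout above can be assembled programmatically. The following is a minimal sketch, not part of RAGChecker itself: `build_checking_inputs` and `REQUIRED_KEYS` are hypothetical names, and treating every field as mandatory is an assumption made here for strictness rather than a documented rule of the library.

```python
import json

# Fields that appear in the example layout. Treating all of them as
# mandatory is an assumption of this sketch, not a documented RAGChecker rule.
REQUIRED_KEYS = {"query_id", "query", "gt_answer", "response", "retrieved_context"}


def build_checking_inputs(records):
    """Wrap per-query records in the {"results": [...]} layout."""
    results = []
    for record in records:
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(
                f"record {record.get('query_id')!r} is missing {sorted(missing)}"
            )
        for chunk in record["retrieved_context"]:
            # Each retrieved chunk carries an id and its text.
            if not {"doc_id", "text"} <= chunk.keys():
                raise ValueError("each retrieved chunk needs 'doc_id' and 'text'")
        results.append(record)
    return {"results": results}


# Hypothetical sample record, for illustration only.
sample = [
    {
        "query_id": "0",
        "query": "What is the capital of France?",
        "gt_answer": "Paris is the capital of France.",
        "response": "The capital of France is Paris.",
        "retrieved_context": [
            {"doc_id": "doc-1", "text": "Paris is the capital and largest city of France."}
        ],
    }
]

payload = build_checking_inputs(sample)
checking_inputs_json = json.dumps(payload, indent=2)
print(checking_inputs_json)
```

Writing `checking_inputs_json` to a file gives an input that can be passed to the CLI through `--input_path`.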


Then, run the checking pipeline using the `ragchecker-cli` command, specifying your input and output paths, and the models for the extractor and checker:

```bash
ragchecker-cli \
    --input_path=examples/checking_inputs.json \
    --output_path=examples/checking_outputs.json \
    --extractor_name=bedrock/meta.llama3-1-70b-instruct-v1:0 \
    --checker_name=bedrock/meta.llama3-1-70b-instruct-v1:0 \
    --batch_size_extractor=64 \
    --batch_size_checker=64 \
    --metrics all_metrics
```


The output reports overall, retriever, and generator metrics:


```json
{
  "overall_metrics": {
    "precision": 73.3,
    "recall": 62.5,
    "f1": 67.3
  },
  "retriever_metrics": {
    "claim_recall": 61.4,
    "context_precision": 87.5
  },
  "generator_metrics": {
    "context_utilization": 87.5,
    "noise_sensitivity_in_relevant": 22.5,
    "noise_sensitivity_in_irrelevant": 0.0,
    "hallucination": 4.2,
    "self_knowledge": 25.0,
    "faithfulness": 70.8
  }
}
```
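
Because the metrics are grouped by component, it can help to flatten them into rows for logging or for comparing runs. The sketch below uses only the standard library and embeds the example metrics shown above as a string; `flatten_metrics` is a hypothetical helper, and where these metrics sit inside the actual output file is not assumed here.

```python
import json

# The example metrics from above, embedded so the sketch is self-contained.
metrics_json = """
{
  "overall_metrics": {"precision": 73.3, "recall": 62.5, "f1": 67.3},
  "retriever_metrics": {"claim_recall": 61.4, "context_precision": 87.5},
  "generator_metrics": {
    "context_utilization": 87.5,
    "noise_sensitivity_in_relevant": 22.5,
    "noise_sensitivity_in_irrelevant": 0.0,
    "hallucination": 4.2,
    "self_knowledge": 25.0,
    "faithfulness": 70.8
  }
}
"""


def flatten_metrics(metrics):
    """Turn {group: {name: score}} into a list of (group, name, score) rows."""
    rows = []
    for group, values in metrics.items():
        for name, score in values.items():
            rows.append((group, name, score))
    return rows


rows = flatten_metrics(json.loads(metrics_json))
for group, name, score in rows:
    # Left-align the labels and right-align the score with one decimal place.
    print(f"{group:<20} {name:<34} {score:>6.1f}")
```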


### Python API Example
You can also integrate RAGChecker directly into your Python code:

```python
from ragchecker import RAGResults, RAGChecker
from ragchecker.metrics import all_metrics

# initialize ragresults from json/dict
with open("examples/checking_inputs.json") as fp:
    rag_results = RAGResults.from_json(fp.read())

# set-up the evaluator
evaluator = RAGChecker(
    extractor_name="bedrock/meta.llama3-1-70b-instruct-v1:0",
    checker_name="bedrock/meta.llama3-1-70b-instruct-v1:0",
    batch_size_extractor=32,
    batch_size_checker=32
)

# evaluate results with selected metrics or certain groups, e.g., retriever_metrics, generator_metrics, all_metrics
evaluator.evaluate(rag_results, all_metrics)
print(rag_results)
```


## Why Use RAGChecker
RAGChecker helps developers and researchers evaluate, diagnose, and improve their RAG systems. Its key benefits include:
*   **Holistic Evaluation**: `Overall Metrics` assess the RAG pipeline end to end.
*   **Diagnostic Metrics**: `Diagnostic Retriever Metrics` and `Diagnostic Generator Metrics` analyze each component separately, pointing to targeted improvements.
*   **Fine-grained Evaluation**: `claim-level entailment` operations evaluate responses claim by claim.
*   **Benchmark Dataset**: A RAG benchmark dataset is included for testing.
*   **Meta-Evaluation**: A human-annotated preference dataset is used to check how well RAGChecker's results correlate with human judgments.
*   **LlamaIndex Integration**: RAGChecker integrates with LlamaIndex, so it can evaluate RAG applications built with LlamaIndex.
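
The claim-level evaluation mentioned in the list above can be illustrated with a toy calculation. This is a sketch under stated assumptions, not RAGChecker's implementation: the entailment verdicts are hardcoded (in the real framework an LLM-based extractor and checker produce them), the label strings are illustrative, and the definitions used (precision over response claims, recall over ground-truth claims) are an assumption about how the overall metrics are formed.

```python
def fraction_entailed(labels):
    """Percentage of claims whose verdict is 'entailment'."""
    if not labels:
        return 0.0
    entailed = sum(label == "entailment" for label in labels)
    return 100.0 * entailed / len(labels)


# Hypothetical verdicts: each response claim checked against the ground truth answer.
response_claims_vs_gt = ["entailment", "entailment", "neutral", "entailment"]

# Hypothetical verdicts: each ground-truth claim checked against the response.
gt_claims_vs_response = ["entailment", "contradiction", "entailment"]

# Assumed definitions: precision looks at response claims, recall at ground-truth claims.
precision = fraction_entailed(response_claims_vs_gt)
recall = fraction_entailed(gt_claims_vs_response)

# Harmonic mean of the two, guarded against division by zero.
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f"precision={precision:.1f} recall={recall:.1f} f1={f1:.1f}")
```

Working at the level of individual claims is what makes the diagnosis fine-grained: a single unsupported claim lowers the score without discarding the rest of an otherwise correct response.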

## Links
*   **GitHub Repository**: [https://github.com/amazon-science/RAGChecker](https://github.com/amazon-science/RAGChecker)
*   **RAGChecker Paper (arXiv)**: [https://arxiv.org/pdf/2408.08067](https://arxiv.org/pdf/2408.08067)
*   **Tutorial (English)**: [https://github.com/amazon-science/RAGChecker/blob/main/tutorial/ragchecker_tutorial_en.md](https://github.com/amazon-science/RAGChecker/blob/main/tutorial/ragchecker_tutorial_en.md)
*   **LlamaIndex Integration Documentation**: [https://docs.llamaindex.ai/en/latest/examples/evaluation/RAGChecker/](https://docs.llamaindex.ai/en/latest/examples/evaluation/RAGChecker/)