{"name":"RAGChecker: A Fine-grained Framework for Diagnosing RAG Systems","description":"RAGChecker is an advanced automatic evaluation framework developed by Amazon Science, specifically designed to assess and diagnose Retrieval-Augmented Generation (RAG) systems. It offers a comprehensive suite of metrics and tools for in-depth analysis of RAG performance. This framework empowers developers and researchers to thoroughly evaluate and enhance their RAG systems with precision.","github":"https://github.com/amazon-science/RAGChecker","url":"https://osrepos.com/repo/amazon-science-ragchecker","source":"osrepos.com","sourceDescription":"This repository profile is provided by osrepos.com, an open source repository discovery platform.","repositoryProfile":"https://osrepos.com/repo/amazon-science-ragchecker","generatedFor":"open source discovery and AI-assisted research","markdown":"https://osrepos.com/repo/amazon-science-ragchecker.md","json":"https://osrepos.com/repo/amazon-science-ragchecker.json","topics":["Python","RAG","LLM","Evaluation","AI","NLP","Machine Learning"],"keywords":["Python","RAG","LLM","Evaluation","AI","NLP","Machine Learning"],"stars":null,"summary":"RAGChecker is an advanced automatic evaluation framework developed by Amazon Science, specifically designed to assess and diagnose Retrieval-Augmented Generation (RAG) systems. It offers a comprehensive suite of metrics and tools for in-depth analysis of RAG performance. This framework empowers developers and researchers to thoroughly evaluate and enhance their RAG systems with precision.","content":"## Introduction\nRAGChecker is an advanced automatic evaluation framework developed by Amazon Science, specifically designed to assess and diagnose Retrieval-Augmented Generation (RAG) systems. It provides a comprehensive suite of metrics and tools for in-depth analysis of RAG performance, helping to identify and address issues within both the retrieval and generation components. 
This framework utilizes claim-level entailment operations for fine-grained evaluation, offering valuable insights for targeted improvements.\n\n## Installation\nTo get started with RAGChecker, you can install it via pip and download the necessary spaCy model:\n\n```bash\npip install ragchecker\npython -m spacy download en_core_web_sm\n```\n\n## Examples\nRAGChecker supports both command-line interface (CLI) and Python API for evaluating your RAG systems.\n\n### CLI Example\nFirst, prepare your data in a JSON format similar to the example below, where `gt_answer` is the only required annotation for each query:\n\n```json\n{\n  \"results\": [\n    {\n      \"query_id\": \"<query id>\",\n      \"query\": \"<input query>\",\n      \"gt_answer\": \"<ground truth answer>\",\n      \"response\": \"<response generated by the RAG generator>\",\n      \"retrieved_context\": [\n        {\n          \"doc_id\": \"<doc id>\",\n          \"text\": \"<content of the chunk>\"\n        }\n      ]\n    }\n  ]\n}\n```\n\nThen, run the checking pipeline using the `ragchecker-cli` command, specifying your input and output paths, and the models for the extractor and checker:\n\n```bash\nragchecker-cli \\\n    --input_path=examples/checking_inputs.json \\\n    --output_path=examples/checking_outputs.json \\\n    --extractor_name=bedrock/meta.llama3-1-70b-instruct-v1:0 \\\n    --checker_name=bedrock/meta.llama3-1-70b-instruct-v1:0 \\\n    --batch_size_extractor=64 \\\n    --batch_size_checker=64 \\\n    --metrics all_metrics\n```\n\nThe output will provide detailed metrics:\n\n```json\n{\n  \"overall_metrics\": {\n    \"precision\": 73.3,\n    \"recall\": 62.5,\n    \"f1\": 67.3\n  },\n  \"retriever_metrics\": {\n    \"claim_recall\": 61.4,\n    \"context_precision\": 87.5\n  },\n  \"generator_metrics\": {\n    \"context_utilization\": 87.5,\n    \"noise_sensitivity_in_relevant\": 22.5,\n    \"noise_sensitivity_in_irrelevant\": 0.0,\n    \"hallucination\": 4.2,\n    \"self_knowledge\": 25.0,\n    \"faithfulness\": 
70.8\n  }\n}\n```\n\n### Python API Example\nYou can also integrate RAGChecker directly into your Python code:\n\n```python\nfrom ragchecker import RAGResults, RAGChecker\nfrom ragchecker.metrics import all_metrics\n\n# initialize ragresults from json/dict\nwith open(\"examples/checking_inputs.json\") as fp:\n    rag_results = RAGResults.from_json(fp.read())\n\n# set-up the evaluator\nevaluator = RAGChecker(\n    extractor_name=\"bedrock/meta.llama3-1-70b-instruct-v1:0\",\n    checker_name=\"bedrock/meta.llama3-1-70b-instruct-v1:0\",\n    batch_size_extractor=32,\n    batch_size_checker=32\n)\n\n# evaluate results with selected metrics or certain groups, e.g., retriever_metrics, generator_metrics, all_metrics\nevaluator.evaluate(rag_results, all_metrics)\nprint(rag_results)\n```\n\n## Why Use RAGChecker\nRAGChecker empowers developers and researchers to thoroughly evaluate, diagnose, and enhance their RAG systems with precision and depth. Its key benefits include:\n*   **Holistic Evaluation**: Offers `Overall Metrics` for a comprehensive assessment of the entire RAG pipeline.\n*   **Diagnostic Metrics**: Provides `Diagnostic Retriever Metrics` and `Diagnostic Generator Metrics` to analyze specific components, offering valuable insights for targeted improvements.\n*   **Fine-grained Evaluation**: Utilizes `claim-level entailment` operations for highly detailed evaluation.\n*   **Benchmark Dataset**: Includes a comprehensive RAG benchmark dataset for robust testing.\n*   **Meta-Evaluation**: Features a human-annotated preference dataset to correlate RAGChecker's results with human judgments.\n*   **LlamaIndex Integration**: Seamlessly integrates with LlamaIndex, making it a powerful evaluation tool for RAG applications built with LlamaIndex.\n\n## Links\n*   **GitHub Repository**: [https://github.com/amazon-science/RAGChecker](https://github.com/amazon-science/RAGChecker){:target=\"_blank\"}\n*   **RAGChecker Paper (arXiv)**: 
[https://arxiv.org/pdf/2408.08067](https://arxiv.org/pdf/2408.08067){:target=\"_blank\"}\n*   **Tutorial (English)**: [https://github.com/amazon-science/RAGChecker/blob/main/tutorial/ragchecker_tutorial_en.md](https://github.com/amazon-science/RAGChecker/blob/main/tutorial/ragchecker_tutorial_en.md){:target=\"_blank\"}\n*   **LlamaIndex Integration Documentation**: [https://docs.llamaindex.ai/en/latest/examples/evaluation/RAGChecker/](https://docs.llamaindex.ai/en/latest/examples/evaluation/RAGChecker/){:target=\"_blank\"}","metrics":{"detailViews":2,"githubClicks":1},"dates":{"published":null,"modified":"2026-07-04T15:17:36.000Z"}}