Repository History

24 repositories tagged with NLP

Topic: NLP

RL4LMs: A Modular RL Library for Fine-tuning Language Models

RL4LMs is a powerful and modular reinforcement learning library designed to fine-tune language models to human preferences. It offers easily customizable building blocks for training, including on-policy algorithms, reward functions, and metrics. Thoroughly tested and benchmarked, RL4LMs supports a wide range of NLP tasks and models.

Analyzed Jul 6, 2026

View Details

torchtune: PyTorch Native Library for LLM Post-Training and Experimentation

torchtune is a PyTorch native library designed for authoring, post-training, and experimenting with Large Language Models (LLMs). It offers hackable training recipes, simple PyTorch implementations of popular LLMs, and best-in-class memory efficiency. Please note: torchtune is no longer actively maintained as of 2025.

Analyzed Jul 5, 2026

View Details

RAGChecker: A Fine-grained Framework for Diagnosing RAG Systems

RAGChecker is an advanced automatic evaluation framework developed by Amazon Science, specifically designed to assess and diagnose Retrieval-Augmented Generation (RAG) systems. It offers a comprehensive suite of metrics and tools for in-depth analysis of RAG performance. This framework empowers developers and researchers to thoroughly evaluate and enhance their RAG systems with precision.

Analyzed Jul 4, 2026

View Details

rerankers: Unified API for Reranking and Cross-Encoder Models

rerankers is a lightweight, low-dependency Python library that provides a unified API for various reranking and cross-encoder models. It simplifies the integration of different reranking approaches into retrieval architectures, offering a consistent interface for diverse models like cross-encoders, RankGPT, T5, and API-based rerankers. This library aims to make reranking more accessible and easier to implement for developers.

Analyzed Jul 4, 2026

View Details

LightLLM: A Lightweight and High-Speed LLM Inference and Serving Framework

LightLLM is a Python-based framework designed for efficient Large Language Model (LLM) inference and serving. It stands out for its lightweight architecture, impressive scalability, and high-speed performance, making it an excellent choice for deploying LLMs. The framework integrates and builds upon the strengths of various leading open-source implementations to deliver optimized results.

Analyzed Jul 4, 2026

View Details

DataDreamer: Streamlining Synthetic Data Generation and LLM Workflows

DataDreamer is an open-source Python library designed for efficient prompting, synthetic data generation, and model training workflows. It simplifies the process of creating complex LLM workflows, generating high-quality synthetic datasets, and aligning or fine-tuning models. Built to be simple, efficient, and research-grade, DataDreamer empowers users to build reproducible and shareable AI solutions.

Analyzed Jul 3, 2026

View Details

EasyInstruct: An Easy-to-Use Instruction Processing Framework for LLMs

EasyInstruct is an open-source Python framework designed to simplify instruction processing for Large Language Models (LLMs). Accepted at ACL 2024, it offers modularized components for instruction generation, selection, and prompting, supporting various LLMs like GPT-4 and LLaMA. This framework is ideal for researchers and developers working on LLM-based experiments and applications.

Analyzed Jul 2, 2026

View Details

AuditNLG: Auditing Generative AI for Trustworthiness

AuditNLG is an open-source library from Salesforce designed to enhance the trustworthiness of generative AI language models. It provides state-of-the-art techniques to detect and improve factualness, safety, and constraint adherence in AI-generated text. This library simplifies the process of auditing AI outputs, offering explanations and alternative suggestions for problematic content.

Analyzed Jun 25, 2026

View Details

spacy-llm: Integrating LLMs into Structured NLP Pipelines with spaCy

spacy-llm seamlessly integrates Large Language Models (LLMs) into spaCy, offering a modular system for rapid prototyping and transforming unstructured LLM responses into robust outputs for various NLP tasks. It supports a wide range of LLMs, including OpenAI, Cohere, Anthropic, and open-source models, enabling users to combine the power of LLMs with spaCy's production-ready capabilities. This package allows for quick experimentation and the creation of efficient, reliable, and controlled NLP systems.

Analyzed Jun 24, 2026

View Details

Qwen3: Alibaba Cloud's Advanced Large Language Model Series

Qwen3 is a powerful series of large language models developed by the Qwen team at Alibaba Cloud. It offers advanced capabilities in reasoning, multilingual support, and long-context understanding, available in various sizes and modes for diverse applications. This repository provides comprehensive resources for running, deploying, and building with Qwen3 models.

Analyzed May 10, 2026

View Details

Trafilatura: Advanced Web Scraping and Text Extraction in Python

Trafilatura is a robust Python package and command-line tool designed for gathering text and metadata from the web. It simplifies web crawling, scraping, and content extraction, transforming raw HTML into structured data. Widely adopted by major companies and institutions, it offers high efficiency and accuracy for various text processing needs.

Analyzed May 1, 2026

View Details

judges: A Python Library for LLM-as-a-Judge Evaluators

The `judges` library from Databricks provides a concise and powerful way to use and create LLM-as-a-Judge evaluators. It offers a curated set of pre-built judges for various use cases, backed by research, and supports both off-the-shelf usage and custom judge creation. This tool helps developers effectively evaluate the performance and quality of their Large Language Models.

Analyzed Apr 14, 2026

View Details

Previous Page 1 Next