Repository History

Explore all analyzed open-source repositories

Topic: data-processing
Argo Workflows: A Cloud-Native Workflow Engine for Kubernetes

Argo Workflows is an open-source, container-native workflow engine for orchestrating parallel jobs on Kubernetes. Users define multi-step workflows in which each step runs as a container, and model the dependencies between steps as a directed acyclic graph (DAG). This CNCF-graduated project is well suited to machine learning pipelines, data processing, and CI/CD.

Feb 12, 2026
DataTrove: Streamlining Large-Scale Data Processing for LLMs

DataTrove is a Python library for processing, filtering, and deduplicating text data at very large scale. It offers a collection of customizable, platform-agnostic pipeline blocks, which makes it well suited to preparing training data for large language models. With support for various execution environments, DataTrove replaces ad hoc scripting with efficient, reproducible data workflows.

Jan 27, 2026
GraphRAG: A Modular Graph-Based RAG System for LLM Discovery

GraphRAG, developed by Microsoft, is a modular graph-based Retrieval-Augmented Generation (RAG) system. It uses Large Language Models (LLMs) to extract meaningful, structured data from unstructured text. By storing that data in knowledge graph memory structures, it enhances an LLM's ability to reason about private and narrative data.

Dec 17, 2025
Cerberus: Lightweight and Extensible Data Validation for Python

Cerberus is a lightweight, extensible data validation library for Python that provides type checking and other base functionality out of the box. It is designed for easy customization and integration, and supports custom validation rules. With no external dependencies, Cerberus offers a simple way to validate data structures.

Dec 13, 2025