
4 repositories tagged with data-processing

Topic: data-processing
Argo Workflows: A Cloud-Native Workflow Engine for Kubernetes

Argo Workflows is an open-source, container-native workflow engine for orchestrating parallel jobs on Kubernetes. Users define multi-step workflows in which each step runs as a container, and model dependencies between steps as a directed acyclic graph (DAG). A CNCF graduated project, it is well suited to machine learning pipelines, data processing, and CI/CD.

Analyzed Feb 12, 2026
DataTrove: Streamlining Large-Scale Data Processing for LLMs

DataTrove is a Python library for processing, filtering, and deduplicating text data at very large scale. It provides a collection of customizable, platform-agnostic pipeline blocks, which makes it well suited to preparing training data for large language models. With support for various execution environments, DataTrove replaces ad hoc scripting with efficient, reproducible data workflows.

Analyzed Jan 27, 2026
GraphRAG: A Modular Graph-Based RAG System for LLM Discovery

GraphRAG, developed by Microsoft, is a modular graph-based Retrieval-Augmented Generation (RAG) system. It uses Large Language Models (LLMs) to extract meaningful, structured data from unstructured text, and it improves an LLM's ability to reason about private and narrative data by using knowledge graph memory structures.

Analyzed Dec 17, 2025
Cerberus: Lightweight and Extensible Data Validation for Python

Cerberus is a lightweight and extensible data validation library for Python. It provides type checking and other base functionality out of the box, and it is designed to be easy to customize and integrate, including support for custom validation rules. With no external dependencies, Cerberus offers a simple way to validate data structures.

Analyzed Dec 13, 2025
