DeepFabric: High-Quality Synthetic Data for Agentic AI Systems

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary

DeepFabric is an open-source Python library for generating synthetic training data for language models and for agent evaluations. It creates domain-specific datasets that teach models to think, plan, and act, including calling tools correctly and producing output that conforms to a schema. The pipeline also covers training and evaluation, so a model can be built and assessed within one workflow.

Repository Information

Analyzed by OSRepos on July 2, 2026

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

DeepFabric is a powerful open-source Python library that streamlines the process of generating synthetic training data for language models and evaluating agentic systems. It provides a complete pipeline to create high-quality, domain-specific datasets, train models, and rigorously assess their performance, particularly in tool-calling scenarios. By focusing on realistic reasoning traces and tool-calling patterns, DeepFabric helps develop models that can think, plan, and act effectively.

Installation

Getting started with DeepFabric is straightforward. You can install it using pip:

pip install deepfabric

Examples

DeepFabric can be used via its CLI, as a library, or with YAML configurations. Here's a quick example using the CLI to generate a dataset:

export OPENAI_API_KEY="your-api-key"

deepfabric generate \
  --topic-prompt "Python programming fundamentals" \
  --generation-system-prompt "You are a Python expert" \
  --mode graph \
  --depth 3 \
  --degree 3 \
  --num-samples 9 \
  --batch-size 3 \
  --provider openai \
  --model gpt-4o \
  --output-save-as dataset.jsonl

This command generates a topic graph with 27 unique nodes, then generates 27 training samples, one per node, and saves them to dataset.jsonl, giving 100% topic coverage.
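
When the same run needs to be scripted, the flags from the command above can be assembled programmatically. The sketch below is illustrative only: the helper `build_generate_command` and the `settings` dict are hypothetical names, and it uses only the flags shown above. The two arithmetic checks reflect one possible reading of the stated counts (a degree of 3 over a depth of 3 gives 27 topic paths, and 9 times a batch size of 3 gives 27 samples). The page does not spell out how the flags combine, so treat both as assumptions.

```python
import shlex


def build_generate_command(settings):
    """Turn a settings dict into an argv list for `deepfabric generate`.

    Hypothetical helper: it only maps each key to a `--key value` pair.
    """
    argv = ["deepfabric", "generate"]
    for flag, value in settings.items():
        argv += [f"--{flag}", str(value)]
    return argv


# Flag names and values copied from the CLI example above.
settings = {
    "topic-prompt": "Python programming fundamentals",
    "generation-system-prompt": "You are a Python expert",
    "mode": "graph",
    "depth": 3,
    "degree": 3,
    "num-samples": 9,
    "batch-size": 3,
    "provider": "openai",
    "model": "gpt-4o",
    "output-save-as": "dataset.jsonl",
}

command = build_generate_command(settings)
print(shlex.join(command))  # shell-quoted form, safe to paste into a terminal

# Assumed reading of the counts quoted in the text (not confirmed by the source).
topic_paths = settings["degree"] ** settings["depth"]       # 3 ** 3
samples = settings["num-samples"] * settings["batch-size"]  # 9 * 3
print(topic_paths, samples)
```

Passing `command` to `subprocess.run(command, check=True)` would execute it, provided the `deepfabric` CLI is installed and `OPENAI_API_KEY` is set in the environment.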

For evaluation, after training your model, you can use DeepFabric's built-in evaluator:

from deepfabric.evaluation import Evaluator, EvaluatorConfig, InferenceConfig
from datasets import load_dataset

# Load your evaluation dataset
dataset = load_dataset("your-username/your-dataset", split="test")

config = EvaluatorConfig(
    inference_config=InferenceConfig(
        model_path="./output/checkpoint-final",  # Local path or HF Hub ID
        backend="transformers",
    ),
)

evaluator = Evaluator(config)
results = evaluator.evaluate(dataset=dataset)

print(f"Overall Score: {results.metrics.overall_score:.2%}")

Why Use It

DeepFabric uses topic graph generation algorithms to produce synthetic data that is highly diverse yet stays anchored to the target domain. This is intended to help avoid model overfitting, which it presents as a common problem with other tools. A key differentiator is real tool execution through the Spin Framework: agents run tools inside isolated WebAssembly sandboxes, so the training data records decisions based on actual observations rather than simulated outputs. The evaluation metrics include tool selection accuracy, parameter accuracy, and execution success rate, which together show how well a model chooses tools, fills in their arguments, and completes calls.
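
To make the three evaluation metrics concrete, here is a minimal sketch of how such scores could be computed from paired expected and predicted tool calls. It is not DeepFabric's implementation: the `ToolCall` record, the `score_tool_calls` function, the example tool names, and the exact-match definition of each metric are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """Hypothetical record of one tool call made (or expected) in a sample."""
    name: str
    arguments: dict = field(default_factory=dict)
    succeeded: bool = True  # whether executing the call finished without error


def score_tool_calls(expected, predicted):
    """Return three rates over position-aligned expected/predicted calls."""
    if not expected or len(expected) != len(predicted):
        raise ValueError("expected and predicted must be non-empty and equal length")
    total = len(expected)
    pairs = list(zip(expected, predicted))

    # Tool selection: the predicted tool name matches the expected one.
    tool_hits = sum(e.name == p.name for e, p in pairs)
    # Parameters: counted only when the tool is right and arguments match exactly.
    param_hits = sum(e.name == p.name and e.arguments == p.arguments for e, p in pairs)
    # Execution: the predicted call ran successfully, regardless of correctness.
    exec_hits = sum(p.succeeded for p in predicted)

    return {
        "tool_selection_accuracy": tool_hits / total,
        "parameter_accuracy": param_hits / total,
        "execution_success_rate": exec_hits / total,
    }


expected = [
    ToolCall("get_weather", {"city": "Paris"}),
    ToolCall("search_docs", {"query": "asyncio"}),
    ToolCall("read_file", {"path": "notes.txt"}),
    ToolCall("get_weather", {"city": "Oslo"}),
]
predicted = [
    ToolCall("get_weather", {"city": "Paris"}),              # right tool, right arguments
    ToolCall("search_docs", {"query": "async"}),             # right tool, wrong arguments
    ToolCall("list_files", {"path": "."}, succeeded=False),  # wrong tool, failed call
    ToolCall("get_weather", {"city": "Oslo"}),               # right tool, right arguments
]

scores = score_tool_calls(expected, predicted)
for metric, value in scores.items():
    print(f"{metric}: {value:.2%}")
```

Keeping the three rates separate shows where a model goes wrong: in this toy data it picks the right tool three times out of four but fills in the arguments correctly only half the time.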
