DeepFabric: High-Quality Synthetic Data for Agentic AI Systems
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
DeepFabric is an open-source Python library for generating synthetic training data for language models and agent evaluations. It creates domain-specific datasets that teach models to think, plan, and act effectively, including calling tools correctly and adhering to schema structures. The same pipeline also covers training and evaluation.
Introduction
DeepFabric is an open-source Python library that streamlines generating synthetic training data for language models and evaluating agentic systems. It provides a complete pipeline to create domain-specific datasets, train models, and assess their performance, particularly in tool-calling scenarios. By focusing on realistic reasoning traces and tool-calling patterns, DeepFabric aims to help develop models that can think, plan, and act effectively.
Installation
Getting started with DeepFabric is straightforward. You can install it using pip:
pip install deepfabric
Examples
DeepFabric can be used via its CLI, as a library, or with YAML configurations. Here's a quick example using the CLI to generate a dataset:
export OPENAI_API_KEY="your-api-key"
deepfabric generate \
  --topic-prompt "Python programming fundamentals" \
  --generation-system-prompt "You are a Python expert" \
  --mode graph \
  --depth 3 \
  --degree 3 \
  --num-samples 9 \
  --batch-size 3 \
  --provider openai \
  --model gpt-4o \
  --output-save-as dataset.jsonl
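This command builds a topic graph with 27 unique nodes (consistent with 3^3 = 27 for a depth of 3 and a degree of 3), then generates 27 training samples and saves them to dataset.jsonl. With one sample per node, every topic in the graph is covered, which is what 100% topic coverage means here.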
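The numbers above can be checked with a little arithmetic. The following is a minimal sketch, not DeepFabric code: it assumes the graph behaves like a uniform tree in which every node has `degree` children down to `depth` levels (the real graph mode may add cross-links or deduplicate nodes), and it assumes that `--num-samples 9` counts batches of `--batch-size 3`, which is one reading that yields 27 samples. The function names are hypothetical.

```python
def leaf_topics(depth: int, degree: int) -> int:
    """Leaf count of a uniform tree: each level multiplies the count by `degree`."""
    return degree ** depth


def total_samples(num_batches: int, batch_size: int) -> int:
    """Samples produced if generation runs `num_batches` batches (an assumption)."""
    return num_batches * batch_size


def topic_coverage(samples: int, topics: int) -> float:
    """Fraction of topics that receive a sample, assuming one distinct topic per sample."""
    if topics == 0:
        return 0.0
    return min(samples, topics) / topics


if __name__ == "__main__":
    topics = leaf_topics(depth=3, degree=3)
    samples = total_samples(num_batches=9, batch_size=3)
    print(f"topics={topics} samples={samples} coverage={topic_coverage(samples, topics):.0%}")
```

Under these assumptions, halving the sample count would leave roughly half of the topics without a training example, so the depth, degree, and sample settings are best chosen together.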
For evaluation, after training your model, you can use DeepFabric's built-in evaluator:
from deepfabric.evaluation import Evaluator, EvaluatorConfig, InferenceConfig
from datasets import load_dataset

# Load your evaluation dataset
dataset = load_dataset("your-username/your-dataset", split="test")

config = EvaluatorConfig(
    inference_config=InferenceConfig(
        model_path="./output/checkpoint-final",  # Local path or HF Hub ID
        backend="transformers",
    ),
)

evaluator = Evaluator(config)
results = evaluator.evaluate(dataset=dataset)
print(f"Overall Score: {results.metrics.overall_score:.2%}")
Why Use It
DeepFabric generates synthetic data that is diverse yet stays anchored to the target domain, using its topic graph generation algorithms. This approach is intended to reduce model overfitting, a common issue with other tools. A key differentiator is its support for real tool execution using the Spin Framework, which lets agents interact with isolated WebAssembly sandboxes. The resulting training data reflects decisions based on actual observations rather than simulated outputs. The platform also reports evaluation metrics, including tool selection accuracy, parameter accuracy, and execution success rate, which together give a broad view of model performance.
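To make the three evaluation metrics named above concrete, here is a minimal sketch of how such scores could be computed from a list of expected and predicted tool calls. It is an illustration only and does not reflect DeepFabric's internal definitions: the `ToolCall` and `EvalRecord` types, the function names, and the choice to score parameters only on records where the right tool was selected are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


@dataclass
class EvalRecord:
    expected: ToolCall   # reference call from the dataset
    predicted: ToolCall  # call produced by the model
    executed_ok: bool    # whether running the predicted call succeeded


def tool_selection_accuracy(records: list[EvalRecord]) -> float:
    """Share of records where the model picked the expected tool."""
    if not records:
        return 0.0
    hits = sum(r.predicted.name == r.expected.name for r in records)
    return hits / len(records)


def parameter_accuracy(records: list[EvalRecord]) -> float:
    """Share of correctly selected tools whose arguments match exactly (assumed definition)."""
    selected = [r for r in records if r.predicted.name == r.expected.name]
    if not selected:
        return 0.0
    hits = sum(r.predicted.arguments == r.expected.arguments for r in selected)
    return hits / len(selected)


def execution_success_rate(records: list[EvalRecord]) -> float:
    """Share of predicted calls that executed without error."""
    if not records:
        return 0.0
    return sum(r.executed_ok for r in records) / len(records)


# Hypothetical records, invented purely for illustration.
SAMPLE_RECORDS = [
    EvalRecord(ToolCall("get_weather", {"city": "Paris"}), ToolCall("get_weather", {"city": "Paris"}), True),
    EvalRecord(ToolCall("get_weather", {"city": "Rome"}), ToolCall("get_weather", {"city": "Milan"}), True),
    EvalRecord(ToolCall("search", {"q": "wasm"}), ToolCall("get_weather", {"city": "Oslo"}), False),
    EvalRecord(ToolCall("search", {"q": "spin"}), ToolCall("search", {"q": "spin"}), False),
]

if __name__ == "__main__":
    print(f"tool selection: {tool_selection_accuracy(SAMPLE_RECORDS):.2%}")
    print(f"parameters:     {parameter_accuracy(SAMPLE_RECORDS):.2%}")
    print(f"execution:      {execution_success_rate(SAMPLE_RECORDS):.2%}")
```

Keeping the three scores separate is useful because they fail independently: a model can pick the right tool with the wrong arguments, or produce a well-formed call that still fails when executed.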
Links
- GitHub Repository: https://github.com/nolabs-ai/deepfabric
- Documentation: https://always-further.github.io/deepfabric/
- Discord: https://discord.gg/pPcjYzGvbS
- Issues: https://github.com/always-further/deepfabric/issues