# DeepFabric: High-Quality Synthetic Data for Agentic AI Systems

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Source: osrepos.com
Repository profile: https://osrepos.com/repo/stacklok-promptwright
Generated for open source discovery and AI-assisted research.

DeepFabric is an open-source Python library designed to generate high-quality synthetic training data for language models and agent evaluations. It excels at creating domain-specific datasets that teach models to think, plan, and act effectively, including correct tool usage and adherence to schema structures. This comprehensive pipeline also integrates training and evaluation capabilities, ensuring robust model development.

GitHub: https://github.com/stacklok/promptwright

## Topics

- python
- ai
- machine-learning
- synthetic-data
- agents
- huggingface
- evaluation
- open-source

## Repository Information

Last analyzed by OSRepos: Thu Jul 02 2026 16:04:28 GMT+0100 (Western European Summer Time)

## Safety Notice

OSRepos shares public repositories for knowledge and discovery only. Review source code, dependencies, licenses, and security implications before running or installing anything.

## Content

## Introduction
DeepFabric is an open-source Python library for generating synthetic training data for language models and for evaluating agentic systems. It provides a single pipeline for creating domain-specific datasets, training models, and assessing their performance, particularly in tool-calling scenarios. By focusing on realistic reasoning traces and tool-calling patterns, it aims to produce models that can think, plan, and act effectively.

## Installation
Install DeepFabric with pip:

```bash
pip install deepfabric
```


## Examples
DeepFabric can be used via its CLI, as a library, or with YAML configurations. Here's a quick example using the CLI to generate a dataset:

```bash
export OPENAI_API_KEY="your-api-key"

deepfabric generate \
  --topic-prompt "Python programming fundamentals" \
  --generation-system-prompt "You are a Python expert" \
  --mode graph \
  --depth 3 \
  --degree 3 \
  --num-samples 9 \
  --batch-size 3 \
  --provider openai \
  --model gpt-4o \
  --output-save-as dataset.jsonl
```


This command builds a topic graph with 27 unique nodes, then generates 27 training samples and saves them to `dataset.jsonl`. With one sample per topic, every topic in the graph is covered.
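The two 27s quoted for this command can be sanity-checked with a little arithmetic. The sketch below is not part of DeepFabric. It assumes that the node count refers to the deepest level of a graph in which every node gets `degree` children, and that the sample total is the product of `--num-samples` and `--batch-size`, which is what the figures in the text imply. The function names are hypothetical.

```python
def leaf_topic_count(depth: int, degree: int) -> int:
    """Topics at the deepest level, assuming every node gets `degree` children."""
    if depth < 0 or degree < 1:
        raise ValueError("depth must be >= 0 and degree must be >= 1")
    return degree ** depth


def total_samples(num_samples: int, batch_size: int) -> int:
    """Sample total, assuming it is num_samples * batch_size (implied by the text)."""
    return num_samples * batch_size


def topic_coverage(samples: int, topics: int) -> float:
    """Fraction of topics covered, assuming each sample uses a distinct topic first."""
    return min(samples, topics) / topics


# Values taken from the CLI example: --depth 3 --degree 3 --num-samples 9 --batch-size 3
topics = leaf_topic_count(depth=3, degree=3)
samples = total_samples(num_samples=9, batch_size=3)
coverage = topic_coverage(samples, topics)

print(topics, samples, f"{coverage:.0%}")  # prints: 27 27 100%
```

Under these assumptions, lowering either `--num-samples` or `--batch-size` would leave some topics without a sample, while raising `--depth` or `--degree` would require more samples to keep full coverage.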

For evaluation, after training your model, you can use DeepFabric's built-in evaluator:

```python
from deepfabric.evaluation import Evaluator, EvaluatorConfig, InferenceConfig
from datasets import load_dataset

# Load your evaluation dataset
dataset = load_dataset("your-username/your-dataset", split="test")

config = EvaluatorConfig(
    inference_config=InferenceConfig(
        model_path="./output/checkpoint-final",  # Local path or HF Hub ID
        backend="transformers",
    ),
)

evaluator = Evaluator(config)
results = evaluator.evaluate(dataset=dataset)

print(f"Overall Score: {results.metrics.overall_score:.2%}")
```


## Why Use It
DeepFabric uses topic graph generation to keep synthetic data diverse while anchoring it to the target domain, which helps reduce the risk of a model overfitting to repetitive samples. It also supports real tool execution through the Spin Framework: agents interact with isolated WebAssembly sandboxes, so the training data reflects decisions based on actual observations rather than simulated outputs. For evaluation, it reports tool selection accuracy, parameter accuracy, and execution success rate, which together show where a model's tool calling succeeds or fails.
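To make the three evaluation metrics concrete, here is a minimal sketch of how rates like these could be computed from a list of evaluated tool calls. It is an illustration only and not DeepFabric's implementation: the `ToolCallRecord` structure, the `tool_calling_metrics` function, and the rule that parameters count as correct only when the tool also matches are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCallRecord:
    """One evaluated sample (hypothetical structure, for illustration only)."""
    expected_tool: str
    predicted_tool: str
    expected_params: dict = field(default_factory=dict)
    predicted_params: dict = field(default_factory=dict)
    executed_ok: bool = False


def tool_calling_metrics(records: list) -> dict:
    """Return the three rates as fractions between 0 and 1."""
    if not records:
        raise ValueError("need at least one record")
    n = len(records)
    tool_hits = sum(r.expected_tool == r.predicted_tool for r in records)
    # Assumption: parameters only count as correct when the tool also matches.
    param_hits = sum(
        r.expected_tool == r.predicted_tool
        and r.expected_params == r.predicted_params
        for r in records
    )
    exec_hits = sum(r.executed_ok for r in records)
    return {
        "tool_selection_accuracy": tool_hits / n,
        "parameter_accuracy": param_hits / n,
        "execution_success_rate": exec_hits / n,
    }


# Four made-up records: right tool and params, right tool with wrong params,
# wrong tool with a failed execution, and a parameterless call that succeeds.
records = [
    ToolCallRecord("read_file", "read_file", {"path": "a.txt"}, {"path": "a.txt"}, True),
    ToolCallRecord("read_file", "read_file", {"path": "a.txt"}, {"path": "b.txt"}, True),
    ToolCallRecord("list_dir", "read_file", {}, {"path": "a.txt"}, False),
    ToolCallRecord("list_dir", "list_dir", {}, {}, True),
]
metrics = tool_calling_metrics(records)
print(metrics)
```

Keeping the three rates separate is useful because they fail in different ways: a model can pick the right tool but pass the wrong arguments, or produce a well-formed call that still fails when executed.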

## Links
*   GitHub Repository: [https://github.com/nolabs-ai/deepfabric](https://github.com/nolabs-ai/deepfabric)
*   Documentation: [https://always-further.github.io/deepfabric/](https://always-further.github.io/deepfabric/)
*   Discord: [https://discord.gg/pPcjYzGvbS](https://discord.gg/pPcjYzGvbS)
*   Issues: [https://github.com/always-further/deepfabric/issues](https://github.com/always-further/deepfabric/issues)