handit.ai: Your AI Teammate for Reliable Production AI

Introduction

Modern AI applications are often fragile, prone to issues like hallucinations, broken schemas, PII leaks, and silent failures. Debugging these problems can be a nightmare, especially when they occur in production. handit.ai steps in as your dedicated AI teammate, providing 24/7 monitoring and automated solutions to these challenges.

handit.ai is an open-source platform that automatically detects issues in your AI, generates and tests fixes against real data, and then ships these improvements as pull requests to your GitHub repository. It's built to make AI truly reliable in production, allowing your team to focus on building features rather than firefighting.

Installation

Getting your AI teammate up and running with handit.ai is straightforward and can be done in under 5 minutes.

Quick Start

Start the Setup Process: Navigate to your AI project directory and run:
```
npx @handit/cli setup
```
The CLI will guide you through connecting your handit.ai account, installing the SDK, configuring your API key, connecting evaluation models, and linking your GitHub repository for automated PRs.

Verify Your Setup:
- Check your dashboard at dashboard.handit.ai to see tracing data, quality scores, and agent performance.
- Confirm GitHub integration by checking your repository settings; the handit app should be installed and ready for PRs.

Manual Setup (Advanced)

For custom control, you can manually install the SDK and add monitoring decorators to your agent functions.

Install the SDK:

```
# Python
pip install handit-ai

# JavaScript/TypeScript
npm install @handit.ai/handit-ai
```

Add monitoring to your main agent function:

Python:

```
# Auto-generated by handit-cli setup
from handit_ai import tracing, configure
import os

configure(HANDIT_API_KEY=os.getenv("HANDIT_API_KEY"))

# Tracing added to your main agent function (entry point)
@tracing(agent="customer-service-agent")
async def process_customer_request(user_message: str):
    # Your existing agent logic (unchanged)
    intent = await classify_intent(user_message)
    context = await search_knowledge(intent)
    response = await generate_response(context)
    return response
```

JavaScript:

```
// Auto-generated by handit-cli setup
import { configure, startTracing, endTracing } from '@handit.ai/handit-ai';

configure({
  HANDIT_API_KEY: process.env.HANDIT_API_KEY
});

// Tracing added to your main agent function (entry point)
export const processCustomerRequest = async (userMessage) => {
  startTracing({ agent: "customer-service-agent" });
  try {
    // Your existing agent logic (unchanged)
    const intent = await classifyIntent(userMessage);
    const context = await searchKnowledge(intent);
    const response = await generateResponse(context);
    return response;
  } finally {
    // Always close the trace, even if the agent logic throws
    endTracing();
  }
};
```
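
Both snippets read `HANDIT_API_KEY` from the environment, and in Python `os.getenv` returns `None` when the variable is unset. A minimal sketch of a fail-fast check you could run before configuring the SDK is shown below. The helper name `require_env` is hypothetical and not part of the handit.ai SDK; it uses only the Python standard library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the agent")
    return value


# Hypothetical wiring, assuming the configure() call shown above:
# configure(HANDIT_API_KEY=require_env("HANDIT_API_KEY"))
```

Failing at startup this way surfaces a missing key immediately, rather than leaving it to be discovered later when traces do not appear.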

Examples

handit.ai can power self-improving AI agents across various use cases. One compelling example is the Unstructured to Structured agent.

This example demonstrates an AI agent that automatically converts messy, unstructured documents into clean, structured data and CSV tables. It's ideal for processing invoices, contracts, or medical reports. The key feature is its self-improvement capability, where handit.ai observes every agent interaction, detects failures, and automatically fixes them, making the agent better over time.

Key Features:

- Schema Inference: AI analyzes documents and creates optimal JSON structures.
- Data Extraction: Maps document fields to schema with confidence scoring.
- CSV Generation: Automatically creates organized tables for data visualization.
- Multimodal Support: Handles images, PDFs, and text files.
- Self-improvement: Handit observes interactions and automatically fixes detected failures.
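
To make the extraction and CSV steps concrete, here is a rough sketch of how fields with confidence scores might be mapped onto a schema and written out as CSV. It is not the example agent's actual code: `ExtractedField`, `rows_to_csv`, and the `min_confidence` threshold are assumptions made for illustration, and only the Python standard library is used.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str          # schema column this value maps to
    value: str         # text extracted from the document
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical extractor


def rows_to_csv(
    schema: list[str],
    documents: list[list[ExtractedField]],
    min_confidence: float = 0.5,
) -> str:
    """Write one CSV row per document, leaving low-confidence or unknown fields blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=schema, lineterminator="\n")
    writer.writeheader()
    for fields in documents:
        row = {column: "" for column in schema}
        for item in fields:
            # Keep a value only if it belongs to the schema and is confident enough
            if item.name in row and item.confidence >= min_confidence:
                row[item.name] = item.value
        writer.writerow(row)
    return buffer.getvalue()


invoice = [
    ExtractedField("invoice_id", "INV-1", 0.9),
    ExtractedField("total", "120.50", 0.3),  # below threshold, left blank
]
print(rows_to_csv(["invoice_id", "total"], [invoice]))
```

Leaving a low-confidence cell blank, instead of writing a doubtful value, keeps the table honest and makes the gaps easy to spot during review.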

You can explore the source code for this example and others on the handit-examples GitHub repository.

Why Use handit.ai?

handit.ai automates three parts of AI reliability work: detecting failures in production, generating and testing fixes, and shipping those fixes through GitHub.

Real-Time Failure Detection

handit.ai acts as your 24/7 on-call engineer, monitoring every request and catching failures before they impact customers. It detects:

- Hallucinations and incorrect responses
- Schema breaks and validation errors
- PII leaks and security issues
- Performance degradation and timeouts
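
As an illustration of what two of these checks can look like in their simplest form, the sketch below flags a schema break (invalid JSON or missing keys) and a possible PII leak (an email address in the output). This is not how handit.ai implements detection, which the text above does not describe; `check_response` and its issue labels are hypothetical, and the email pattern is deliberately crude.

```python
import json
import re

# Deliberately simple pattern; real PII detection needs far more than this
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def check_response(raw: str, required_keys: set[str]) -> list[str]:
    """Return a list of issue labels found in a raw JSON response string."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return ["invalid_json"]
    if not isinstance(payload, dict):
        return ["not_an_object"]

    issues = []
    missing = required_keys - set(payload)
    if missing:
        issues.append("missing_keys:" + ",".join(sorted(missing)))
    if EMAIL_PATTERN.search(raw):
        issues.append("possible_email_leak")
    return issues


print(check_response('{"answer": "mail bob@example.com"}', {"answer", "source"}))
```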

Automated Fix Generation

The platform analyzes root causes, generates intelligent fixes, and tests solutions against actual production failure cases. This includes:

- Prompt improvements and optimizations
- Configuration changes and guardrails
- Code fixes for logic errors
- Model parameter adjustments
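
The idea of testing a fix against recorded failure cases can be sketched as a replay harness: run the baseline and the candidate over the same inputs and compare pass rates. The code below is an assumption about the general technique, not handit.ai's evaluation pipeline; `pass_rate`, `is_improvement`, and the case format are hypothetical.

```python
from typing import Callable

# A case pairs an input with a check that decides whether an output is acceptable
Case = tuple[str, Callable[[str], bool]]


def pass_rate(agent: Callable[[str], str], cases: list[Case]) -> float:
    """Return the fraction of cases whose output satisfies its check."""
    if not cases:
        return 0.0
    passed = 0
    for prompt, is_acceptable in cases:
        try:
            if is_acceptable(agent(prompt)):
                passed += 1
        except Exception:
            pass  # an exception while running a case counts as a failure
    return passed / len(cases)


def is_improvement(
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
    cases: list[Case],
) -> bool:
    """True when the candidate passes strictly more recorded cases than the baseline."""
    return pass_rate(candidate, cases) > pass_rate(baseline, cases)
```

In practice the cases would come from real production failures, and the agents would be the current and proposed versions of the same function.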

GitHub-Native Deployment

Once fixes are proven, handit.ai opens pull requests with detailed explanations, performance data, and A/B testing results. You can review and merge, or even configure auto-deployment with guardrails.

Proven Results

Teams like Aspe.ai and XBuild have seen significant improvements:

- Aspe.ai: Achieved +62.3% accuracy improvement and +97.8% success rate within 48 hours.
- XBuild: Saw +34.6% accuracy improvement and +19.1% success rate, eliminating prompt drift with thousands of automatic evaluations.

Broad Language Support

handit.ai supports a wide range of languages and frameworks, including Python, JavaScript, TypeScript, Go, Java, C#, Ruby, PHP, LangChain, LangGraph, LlamaIndex, AutoGen, and CrewAI.