Opik: Open-Source LLM Observability, Evaluation, and Optimization

Summary
Opik is an open-source platform by Comet designed to streamline the lifecycle of LLM applications. It provides comprehensive tools for debugging, evaluating, and monitoring RAG systems and agentic workflows. Developers can leverage its tracing, automated evaluations, and production-ready dashboards to build and optimize generative AI applications.
Introduction
Opik, by Comet, is a robust open-source platform designed to debug, evaluate, and monitor LLM applications, RAG systems, and agentic workflows. It offers comprehensive tracing, automated evaluations, and production-ready dashboards, streamlining the generative AI development lifecycle from prototype to production. With Opik, developers can optimize prompts and agents, ensure full observability of LLM calls, and implement safe, responsible AI practices.
Installation
Getting started with Opik is straightforward, with options for cloud deployment or self-hosting.
Option 1: Comet.com Cloud (Recommended)
The fastest way to get started: create a free account at comet.com and use the managed Opik instance, with no infrastructure to run yourself.
Option 2: Self-Host for Full Control
Deploy Opik in your own environment, choosing between Docker for local setups or Kubernetes for scalability.
Self-Hosting with Docker Compose (Local Development)
For a local Opik instance, use the installation scripts:
On Linux or Mac:
# Clone the Opik repository
git clone https://github.com/comet-ml/opik.git
# Navigate to the repository
cd opik
# Start the Opik platform
./opik.sh
On Windows:
# Clone the Opik repository
git clone https://github.com/comet-ml/opik.git
# Navigate to the repository
cd opik
# Start the Opik platform
powershell -ExecutionPolicy ByPass -c ".\opik.ps1"
For detailed instructions, refer to the Local Deployment Guide.
Self-Hosting with Kubernetes & Helm (Scalable Deployments)
For production or larger-scale self-hosted deployments, Opik can be installed on a Kubernetes cluster using its Helm chart.
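As a sketch, the installation follows the standard Helm workflow. The repository URL, chart name, and namespace below are illustrative assumptions; consult the Kubernetes deployment documentation for the authoritative values:

```shell
# Add the Opik Helm repository (URL is illustrative; verify against the official docs)
helm repo add opik https://comet-ml.github.io/opik/
helm repo update

# Install (or upgrade) Opik into its own namespace
helm upgrade --install opik opik/opik \
  --namespace opik --create-namespace
```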
Examples
Opik offers a comprehensive Python SDK and integrations to facilitate tracing and evaluation.
Python SDK Quick Start
Install the package and configure it:
# install using pip
pip install opik
# or install with uv
uv pip install opik
Configure the SDK:
opik configure
You can also configure programmatically, for example, opik.configure(use_local=True). Refer to the Python SDK documentation for more options.
Logging LLM Traces
The easiest way to log traces is to use one of Opik's many direct integrations, which cover frameworks and providers such as LangChain, LlamaIndex, OpenAI, AutoGen, and many others.
Alternatively, use the opik.track decorator:
import opik

opik.configure(use_local=True)  # run against a local Opik instance

@opik.track
def my_llm_function(user_question: str) -> str:
    # Your LLM call goes here
    return "Hello"
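The direct integrations follow a similar pattern: wrap the provider's client once, and every call through it is logged automatically. A minimal sketch using the OpenAI integration (the model name and prompt are illustrative, and running it requires the openai package plus a configured API key):

```python
from openai import OpenAI
from opik.integrations.openai import track_openai

# Wrap the client once; subsequent calls are traced to Opik
client = track_openai(OpenAI())

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "What is Opik?"}],
)
print(response.choices[0].message.content)
```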
LLM as a Judge Metrics
Opik's Python SDK includes several LLM as a judge metrics to help evaluate your LLM application, such as hallucination detection.
from opik.evaluation.metrics import Hallucination

metric = Hallucination()
score = metric.score(
    input="What is the capital of France?",
    output="Paris",
    context=["France is a country in Europe."],
)
print(score)
Explore more metrics in the metrics documentation.
Why Use Opik?
Opik is an essential tool for any generative AI developer, offering:
- Comprehensive Observability: Deep tracing of LLM calls, conversation logging, and agent activity.
- Advanced Evaluation: Robust prompt evaluation, LLM-as-a-judge metrics, and experiment management.
- Production-Ready: Scalable monitoring dashboards and online evaluation rules to identify production issues, supporting over 40 million traces per day.
- Optimization and Safety: Tools like Opik Agent Optimizer and Opik Guardrails to continuously improve and secure your LLM applications.
- Flexible Integration: Support for a wide range of frameworks and integration with CI/CD pipelines via PyTest.