
Quick Start

Get started with ragpill in just a few minutes!

Basic Usage

1. Prepare Your Test Data

Create a CSV file with your test cases. Here's a simple example:

Question,test_type,expected,tags,check
capital of france?,LLMJudge,true,geography,The answer is Paris
2+2?,LLMJudge,true,math,The answer should be 4

For more details on the CSV structure, see the csv-adapter documentation.
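If you prefer to generate the file programmatically, the same test data can be written with Python's standard csv module. This is only a sketch of producing the expected layout; ragpill just needs the resulting file:

```python
import csv

# The column names match the CSV format described in this guide.
fieldnames = ["Question", "test_type", "expected", "tags", "check"]
rows = [
    {"Question": "capital of france?", "test_type": "LLMJudge",
     "expected": "true", "tags": "geography", "check": "The answer is Paris"},
    {"Question": "2+2?", "test_type": "LLMJudge",
     "expected": "true", "tags": "math", "check": "The answer should be 4"},
]

with open("testset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```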

2. Prepare Environment Variables

We recommend using a .env file, which pydantic-settings detects automatically.

For MLflow, you need environment variables with the following prefix:

EVAL_MLFLOW_

If you are using LLMJudge, you need at least the API key:

RAGPILL_LLMJUDGE_API_KEY=<your-api-key>
RAGPILL_LLMJUDGE_BASE_URL=<optional>
RAGPILL_LLMJUDGE_MODEL_NAME=<optional, defaults to 'gpt-4o'>
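Because only the API key is strictly required, it can help to fail fast before loading the test set. A minimal sketch using the variable names above (`missing_llmjudge_vars` is a hypothetical helper, not part of ragpill):

```python
import os

def missing_llmjudge_vars(env=os.environ) -> list:
    """Return the required LLMJudge variables absent from the given mapping.

    Only the API key is required; base URL and model name are optional.
    """
    required = ["RAGPILL_LLMJUDGE_API_KEY"]
    return [name for name in required if not env.get(name)]

# Example: check a plain dict instead of the real environment.
print(missing_llmjudge_vars({"RAGPILL_LLMJUDGE_API_KEY": "sk-test"}))  # []
```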

3. Load the TestSet

from pathlib import Path
from ragpill.csv.testset import load_testset, default_evaluator_classes

# Define your CSV path
csv_path = Path("testset.csv")

# Create the dataset using default evaluators
dataset = load_testset(
    csv_path=csv_path,
    evaluator_classes=default_evaluator_classes,
)

print(f"✅ Created dataset with {len(dataset.cases)} test cases")

Note: For LLMJudge to work, set RAGPILL_LLMJUDGE_API_KEY; RAGPILL_LLMJUDGE_BASE_URL and RAGPILL_LLMJUDGE_MODEL_NAME are optional.

Note: For MLflow tracking to work, configure the EVAL_MLFLOW_-prefixed environment variables described in step 2.

4. Run Evaluation

# Define your agent or function to test
async def my_agent(question: str) -> str:
    # Your agent logic here
    # For this example, we'll use a simple mock
    return "Paris"

# Run evaluation
from pydantic_evals import eval_

results = await eval_(
    dataset=dataset,
    callable=my_agent,
)

# Print results
print("\n📊 Evaluation Results:")
print(f"Total cases: {len(results.results)}")
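Note that `await` is only valid inside an async function; in a plain script, wrap the calls above in an async main. A sketch of that pattern using only the mock agent (the ragpill-specific calls are omitted here):

```python
import asyncio

# Mock agent standing in for your real one (assumption: any async
# "question -> answer" callable works as the evaluation target).
async def my_agent(question: str) -> str:
    return "Paris"

async def main() -> None:
    # In a real script, the load_testset / eval_ calls from above go here.
    answer = await my_agent("capital of france?")
    print(answer)

asyncio.run(main())
```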

CSV Format Guide

Your CSV file should have these columns:

| Column    | Description                                         | Required |
|-----------|-----------------------------------------------------|----------|
| Question  | The input question/prompt                           | Yes      |
| test_type | Type of evaluator (e.g., LLMJudge)                  | Yes      |
| expected  | Boolean (true/false): should this check pass?       | Yes      |
| tags      | Comma-separated tags                                | No       |
| check     | Evaluation criteria (for LLMJudge: the rubric text) | Yes      |
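Before handing a file to ragpill, you can check it against this schema with the standard library. A sketch (`missing_columns` is a hypothetical helper, not part of ragpill):

```python
import csv
import io

# Per the table above, tags is the only optional column.
REQUIRED_COLUMNS = {"Question", "test_type", "expected", "check"}

def missing_columns(csv_text: str) -> set:
    """Return the required columns absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return REQUIRED_COLUMNS - set(reader.fieldnames or [])

print(missing_columns("Question,test_type,expected,tags,check\n"))  # set()
```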

Multiple Evaluators Per Question

You can add multiple rows with the same question to apply multiple evaluators:

Question,test_type,expected,tags,check
What is the capital of France?,LLMJudge,true,"geography,factual",Should mention Paris
What is the capital of France?,LLMJudge,false,quality,Should NOT mention historical irrelevant details
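Conceptually, rows sharing a question are grouped into one test case with several checks. A sketch of that grouping with the standard library (not ragpill's actual loader):

```python
import csv
import io
from collections import defaultdict

csv_text = """Question,test_type,expected,tags,check
What is the capital of France?,LLMJudge,true,"geography,factual",Should mention Paris
What is the capital of France?,LLMJudge,false,quality,Should NOT mention historical irrelevant details
"""

# Group evaluator rows by question: each question maps to its list of checks.
checks_by_question = defaultdict(list)
for row in csv.DictReader(io.StringIO(csv_text)):
    checks_by_question[row["Question"]].append(
        (row["test_type"], row["expected"] == "true", row["check"])
    )

print(len(checks_by_question["What is the capital of France?"]))  # 2
```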

Next Steps