Quick Start¶
Get started with ragpill in just a few minutes!
Basic Usage¶
1. Prepare Your Test Data¶
Create a CSV file with your test cases. Here's a simple example:

```csv
Question,test_type,expected,tags,check
capital of france?,LLMJudge,true,geography,The answer is Paris
2+2?,LLMJudge,true,math,The answer should be 4
```

For more details on the CSV structure, see csv-adapter.
2. Prepare Environment Variables¶
We recommend a .env file, which is automatically detected by pydantic-settings. Environment variables are also used to configure MLflow tracking (see MLflow Integration below).
If you are using LLMJudge, you need at least the API key:

```
RAGPILL_LLMJUDGE_API_KEY=<your-api-key>
RAGPILL_LLMJUDGE_BASE_URL=<optional>
RAGPILL_LLMJUDGE_MODEL_NAME=<optional, defaults to 'gpt-4o'>
```
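ragpill reads these variables through pydantic-settings, which picks up the .env file automatically. As an illustration of what that loading amounts to, here is a minimal stdlib sketch (the `load_env_file` helper is hypothetical, not part of ragpill, and skips quoting/interpolation):

```python
import os
import tempfile


def load_env_file(path: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env file and export them.

    Minimal sketch: ignores blank lines and comments, no quote handling.
    """
    loaded = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            loaded[key.strip()] = value.strip()
            # Don't clobber variables already set in the real environment
            os.environ.setdefault(key.strip(), value.strip())
    return loaded


# Example: write a throwaway .env file and load it
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write("RAGPILL_LLMJUDGE_API_KEY=sk-test\n# a comment\n")
    env_path = fh.name

env = load_env_file(env_path)
```

In practice you never call anything like this yourself; pydantic-settings does the equivalent work when ragpill starts.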
3. Load the TestSet¶
```python
from pathlib import Path

from ragpill.csv.testset import load_testset, default_evaluator_classes

# Define your CSV path
csv_path = Path("testset.csv")

# Create the dataset using the default evaluators
dataset = load_testset(
    csv_path=csv_path,
    evaluator_classes=default_evaluator_classes,
)

print(f"✅ Created dataset with {len(dataset.cases)} test cases")
```
Note: For LLMJudge to work, set the `RAGPILL_LLMJUDGE_*` environment variables described in step 2.

Note: MLflow tracking requires additional configuration; see the MLflow Integration guide.
4. Run Evaluation¶
```python
from pydantic_evals import eval_


# Define your agent or function to test
async def my_agent(question: str) -> str:
    # Your agent logic here
    # For this example, we'll use a simple mock
    return "Paris"


# Run evaluation
results = await eval_(
    dataset=dataset,
    callable=my_agent,
)

# Print results
print("\n📊 Evaluation Results:")
print(f"Total cases: {len(results.results)}")
```
CSV Format Guide¶
Your CSV file should have these columns:
| Column | Description | Required |
|---|---|---|
| `Question` | The input question/prompt | Yes |
| `test_type` | Type of evaluator (e.g., `LLMJudge`) | Yes |
| `expected` | Boolean (`true`/`false`): should this check pass? | Yes |
| `tags` | Comma-separated tags | No |
| `check` | Evaluation criteria (for `LLMJudge`: the rubric text) | Yes |
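If you build test sets programmatically, the standard library's `csv` module can produce a file matching this schema. A minimal sketch (the rows are the examples from step 1; nothing here is ragpill-specific):

```python
import csv
import io

# Column order must match the schema described above
FIELDS = ["Question", "test_type", "expected", "tags", "check"]

rows = [
    {"Question": "capital of france?", "test_type": "LLMJudge",
     "expected": "true", "tags": "geography", "check": "The answer is Paris"},
    {"Question": "2+2?", "test_type": "LLMJudge",
     "expected": "true", "tags": "math", "check": "The answer should be 4"},
]

# Write to an in-memory buffer; use open("testset.csv", "w", newline="")
# for a real file
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```

`csv.DictWriter` takes care of quoting, so tags containing commas (see the next section) are escaped correctly.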
Multiple Evaluators Per Question¶
You can add multiple rows with the same question to apply multiple evaluators:
```csv
Question,test_type,expected,tags,check
What is the capital of France?,LLMJudge,true,"geography,factual",Should mention Paris
What is the capital of France?,LLMJudge,false,quality,Should NOT mention historical irrelevant details
```
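ragpill's loader handles this grouping internally, but to see how several rows with the same question become several checks, here is a stdlib sketch (the grouping code is purely illustrative):

```python
import csv
import io
from collections import defaultdict

# The two-row example from above, inline for self-containment
csv_text = """Question,test_type,expected,tags,check
What is the capital of France?,LLMJudge,true,"geography,factual",Should mention Paris
What is the capital of France?,LLMJudge,false,quality,Should NOT mention historical irrelevant details
"""

# Map each question to its list of (rubric, should_pass) checks
checks_by_question: dict[str, list[tuple[str, bool]]] = defaultdict(list)
for row in csv.DictReader(io.StringIO(csv_text)):
    checks_by_question[row["Question"]].append(
        (row["check"], row["expected"] == "true")
    )
```

One question, two checks: the first must pass, the second (expected `false`) must fail.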
Next Steps¶
- Learn more about Loading TestSets in detail
- Explore Custom Evaluators
- Set up MLflow Integration for tracking