Base Classes¶
This module contains the base classes for building custom evaluators and test cases.
BaseEvaluator¶
ragpill.base.BaseEvaluator
dataclass
¶
BaseEvaluator(evaluation_name=uuid4(), expected=None, attributes=dict(), tags=set(), is_global=False)
Bases: Evaluator
Base class for all evaluators.
All custom evaluators must inherit from this class and implement:
- `from_csv_line` class method - for CSV integration with `load_testset`
- `run` async method - for the evaluation logic
Attributes:

| Name | Type | Description |
|---|---|---|
| `evaluation_name` | `UUID` | Unique identifier for this evaluator instance |
| `expected` | `bool \| None` | Whether we expect this check to pass. Defaults to `None`, which means the value is inherited from the case's `TestCaseMetadata.expected` at evaluation time. If neither the evaluator nor the case metadata sets it, it defaults to `True`. For non-global evaluators, an explicit evaluator value takes precedence over case metadata; for global evaluators, case metadata takes precedence. |
| `attributes` | `dict` | Dictionary for additional metadata (populated from extra CSV columns) |
| `tags` | `set[str]` | Set of tags for organization and filtering |
| `is_global` | `bool` | Whether this evaluator applies to all test cases |
Note
The `check` parameter is only used in `from_csv_line()` to pass configuration when creating the evaluator; it is not stored as a class attribute.
See Also
ragpill.csv.testset.load_testset:
Create datasets from CSV files
from_csv_line
classmethod
¶
Create an evaluator from a CSV line.
This class method is required for CSV integration with
load_testset.
The signature must be exactly as shown. Subclasses can override this method to
customize how they parse the check parameter or handle additional configuration.
Custom Attributes
Any additional CSV columns beyond the standard ones (`Question`, `test_type`, `expected`, `tags`, `check`) will be passed as `**kwargs` and stored in the evaluator's `attributes` dict. These can be used for metadata tracking, filtering, or custom logic.
If all evaluators for a question share the same attribute value, that attribute becomes part of the Test Case metadata and will be visible in MLflow.
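The promotion rule above can be sketched as a small helper that keeps only attribute values shared by every evaluator for a question. This is a hypothetical illustration, not part of the ragpill API:

```python
def shared_attributes(attribute_dicts: list[dict]) -> dict:
    """Keep only key/value pairs present with the same value in every dict.

    Hypothetical sketch of the documented promotion rule; the actual
    ragpill implementation may differ.
    """
    if not attribute_dicts:
        return {}
    shared = dict(attribute_dicts[0])
    for attrs in attribute_dicts[1:]:
        # Drop keys that are missing or differ in any later dict.
        shared = {k: v for k, v in shared.items() if k in attrs and attrs[k] == v}
    return shared
```

For example, if two evaluators carry `{"priority": "high", "owner": "a"}` and `{"priority": "high", "owner": "b"}`, only `priority` would be promoted to case metadata.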
Parameterization Patterns
There are two ways to parameterize custom evaluators:
1. **Environment variables** (for shared config across all instances): use pydantic-settings `BaseSettings` to load from environment variables. Good for API keys, global thresholds, model names, etc.
2. **JSON in the `check` column** (for per-instance config): parse JSON from the `check` parameter to get per-test configuration. Good for regex patterns, specific values, test-specific thresholds.
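The second pattern can be sketched as a small standalone parser. The `pattern` fallback key is a hypothetical choice for illustration; the actual ragpill parsing code is not shown here:

```python
import json


def parse_check(check: str) -> dict:
    """Parse a CSV ``check`` column value.

    Sketch of the per-instance pattern: a JSON object becomes a dict of
    keyword arguments; anything else falls back to a single (hypothetical)
    ``pattern`` key. Not the actual ragpill implementation.
    """
    try:
        parsed = json.loads(check)
    except json.JSONDecodeError:
        # Plain text: treat the whole value as the pattern.
        return {"pattern": check}
    # Valid JSON that is not an object (e.g. "42") is also treated as plain text.
    return parsed if isinstance(parsed, dict) else {"pattern": check}
```

A JSON object like `{"threshold": 0.8}` would then be forwarded as `**check_params`, while a plain string such as a regex is used as-is.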
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `expected` | `bool` | Whether we expect this check to pass. | *required* |
| `tags` | `set[str]` | Comma-separated tags string from CSV for categorization and filtering. | *required* |
| `check` | `str` | Evaluator-specific configuration data. Can be a JSON string or plain text. JSON is parsed and passed as `**check_params` to the evaluator; for plain text, subclasses should override this method to handle their format. | *required* |
| `**kwargs` | `Any` | Additional attributes from extra CSV columns (e.g., `priority`, `category`). These become part of the evaluator's `attributes` dict. | `{}` |
Returns:

| Type | Description |
|---|---|
| `BaseEvaluator` | Instance of the evaluator class |
Raises:

| Type | Description |
|---|---|
| `NotImplementedError` | If `check` is not valid JSON and the subclass hasn't overridden this method |
Example
For CSV usage examples, see the CSV Adapter Guide and Custom Evaluators Guide.
```python
import json


class MyEvaluator(BaseEvaluator):
    pattern: str

    @classmethod
    def from_csv_line(cls, expected: bool, tags: set[str],
                      check: str, **kwargs):
        # Parse check parameter (JSON or plain text)
        try:
            check_dict = json.loads(check)
        except json.JSONDecodeError:
            check_dict = None
        if isinstance(check_dict, dict):
            pattern = check_dict.get('pattern', check)
        else:
            pattern = check  # Use as-is
        return cls(
            expected=expected,
            tags=tags,
            attributes=kwargs,  # Contains custom CSV columns
            pattern=pattern,
        )
```
Source code in src/ragpill/base.py
run
async
¶
The method that implements the evaluation logic. Override this in subclasses.

Parameters:

| Name | Type | Description |
|---|---|---|
| `ctx` | `EvaluatorContext[Any, Any, EvaluatorMetadata]` | The evaluator context |

Returns:

| Type | Description |
|---|---|
| `EvaluationReason` | The evaluation result with reason |
Source code in src/ragpill/base.py
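As a concrete illustration of the `run` contract, here is a self-contained sketch. `EvaluationReason` below is a stand-in dataclass (not the real ragpill result type), the context is reduced to a plain output string, and `ContainsEvaluator` is a hypothetical check:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class EvaluationReason:
    # Stand-in for the real result type: a boolean verdict plus a reason string.
    value: bool
    reason: str


@dataclass
class ContainsEvaluator:
    """Hypothetical evaluator: passes when ``needle`` appears in the output."""
    needle: str

    async def run(self, output: str) -> EvaluationReason:
        ok = self.needle in output
        status = "found" if ok else "missing"
        return EvaluationReason(value=ok, reason=f"{self.needle!r} {status} in output")


result = asyncio.run(
    ContainsEvaluator(needle="Paris").run("The capital of France is Paris.")
)
```

The key point is that `run` is a coroutine returning a verdict together with a human-readable reason, so failing checks can be diagnosed from the report.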
TestCaseMetadata¶
ragpill.base.TestCaseMetadata
¶
Bases: BaseModel
Metadata attached to a test case. In general: for non-global evaluators, evaluator metadata takes precedence over case metadata; for global evaluators, case metadata takes precedence over evaluator metadata. This allows global evaluators to set default expected values that individual cases can override.
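The precedence rules for `expected` can be sketched as a pure function. This is an assumption-laden illustration of the prose above, not the actual ragpill resolution code:

```python
def resolve_expected(evaluator_expected, case_expected, is_global):
    """Sketch of the documented precedence rules for ``expected``.

    Assumption: mirrors the prose description; the real ragpill logic
    may differ in detail.
    """
    if is_global:
        # Global evaluators: case metadata wins when it is set.
        value = case_expected if case_expected is not None else evaluator_expected
    else:
        # Non-global evaluators: an explicit evaluator value wins.
        value = evaluator_expected if evaluator_expected is not None else case_expected
    # If neither side sets a value, default to True.
    return True if value is None else value
```

For example, a global evaluator defaulting to `expected=False` would still be overridden by a case whose metadata sets `expected=True`.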
EvaluatorMetadata¶
ragpill.base.EvaluatorMetadata
¶
Bases: BaseModel
Metadata for LLM evaluation evaluators.
See Also¶
- Evaluators Module - Pre-built evaluators
- Custom Evaluators Tutorial