
CSV Module

The CSV module provides functionality for loading test sets from CSV files.

load_testset

ragpill.csv.testset.load_testset

load_testset(csv_path, evaluator_classes=default_evaluator_classes, skip_unknown_evaluators=False, question_column='Question', test_type_column='test_type', expected_column='expected', tags_column='tags', check_column='check')

Create a Dataset from a CSV file with evaluator configurations.

Each evaluator class must implement a from_csv_line() class method that accepts:

  • Standard CSV columns: expected, tags, check
  • Additional CSV columns as **kwargs (passed to evaluator.attributes)

CSV Format

The CSV file should contain the following standard columns:

  • Question: The input question/prompt for the test case
  • test_type: Name of the evaluator class (must match key in evaluator_classes dict)
  • expected, tags, check: Standard evaluator parameters

For detailed descriptions of these parameters, see ragpill.base.BaseEvaluator.from_csv_line.

Any additional columns (e.g., priority, category, domain) will be:

  1. Passed to each evaluator's attributes dict via **kwargs in from_csv_line()
  2. If all evaluators for a question have the same value for an attribute, that attribute becomes part of the Test Case metadata and will be visible in MLflow
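The promotion rule in step 2 can be sketched with plain dicts. This is illustrative only (the function name promote_shared_attributes is hypothetical, not part of ragpill's API): an extra column's value becomes case-level metadata only when every evaluator for that question agrees on it.

```python
# Sketch of the attribute-promotion rule: an extra CSV column becomes
# case metadata only when every evaluator for the question has the
# same value for it. (Illustrative only -- not ragpill's actual code.)

def promote_shared_attributes(evaluator_attrs: list[dict]) -> dict:
    """Return the attributes whose value is identical across all evaluators."""
    if not evaluator_attrs:
        return {}
    first = evaluator_attrs[0]
    return {
        key: value
        for key, value in first.items()
        if all(attrs.get(key) == value for attrs in evaluator_attrs[1:])
    }

# Two evaluators for the same question: they agree on priority,
# disagree on category, so only priority is promoted.
attrs = [
    {"priority": "high", "category": "science"},
    {"priority": "high", "category": "validation"},
]
print(promote_shared_attributes(attrs))  # {'priority': 'high'}
```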

Global Evaluators

Rows with empty questions are treated as global evaluators and will be added to ALL test cases:

Question,test_type,expected,tags,check
,LLMJudge,true,global,"response is polite"
What is X?,RegexEvaluator,true,factual,"X.*definition"

The LLMJudge evaluator will be added to all cases, including the "What is X?" case.
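The empty-question convention above can be demonstrated with the standard library's csv module. This is a minimal sketch of the grouping behavior, not ragpill's actual implementation:

```python
import csv
import io

# Sketch: rows whose Question cell is empty are collected as global
# evaluators; all other rows are grouped by question.
# (Illustrative only -- not ragpill's actual code.)
data = """Question,test_type,expected,tags,check
,LLMJudge,true,global,"response is polite"
What is X?,RegexEvaluator,true,factual,"X.*definition"
"""

global_rows: list[dict] = []
question_rows: dict[str, list[dict]] = {}
for row in csv.DictReader(io.StringIO(data)):
    if row["Question"].strip():
        question_rows.setdefault(row["Question"], []).append(row)
    else:
        global_rows.append(row)

print([r["test_type"] for r in global_rows])  # ['LLMJudge']
print(list(question_rows))                    # ['What is X?']
```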

Custom Attributes

You can add custom columns to track metadata:

Question,test_type,expected,tags,check,priority,category
What is X?,LLMJudge,true,factual,"contains the fact that X is ...",high,science
What is Y's email?,RegexEvaluator,true,"auth,contacts","y@example.com",low,validation

These custom attributes (priority, category) are automatically:

  • Available in evaluator.attributes
  • Promoted to Case metadata if all evaluators share the same value
  • Visible in MLflow tracking for analysis and filtering

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| csv_path | str \| Path | Path to the CSV file | required |
| evaluator_classes | dict[str, type[BaseEvaluator]] | Dictionary mapping test_type names to evaluator classes. Extend default_evaluator_classes with custom evaluators: default_evaluator_classes \| {'MyEval': MyEvaluator} | default_evaluator_classes |
| skip_unknown_evaluators | bool | If True, skip rows with unknown evaluator types instead of raising an error | False |
| question_column | str | Name of the column containing questions | 'Question' |
| test_type_column | str | Name of the column containing evaluator class names | 'test_type' |
| expected_column | str | Name of the column for the expected flag | 'expected' |
| tags_column | str | Name of the column for comma-separated tags | 'tags' |
| check_column | str | Name of the column for evaluator-specific check data | 'check' |
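The recommended way to register custom evaluators is the dict-union operator shown for evaluator_classes. A minimal sketch with stand-in classes (the names below are placeholders, not ragpill's real evaluators):

```python
# Extending a default registry with the dict-union operator (PEP 584,
# Python 3.9+). Class names here are stand-ins for illustration.
class LLMJudge: ...
class MyEvaluator: ...

default_evaluator_classes = {"LLMJudge": LLMJudge}

# `|` returns a NEW dict; the shared defaults are left untouched.
evaluator_classes = default_evaluator_classes | {"MyEval": MyEvaluator}

print(sorted(evaluator_classes))          # ['LLMJudge', 'MyEval']
print(sorted(default_evaluator_classes))  # ['LLMJudge']
```

Because `|` does not mutate its left operand, several test sets can safely build their own registries from the same defaults.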

Returns:

| Type | Description |
| --- | --- |
| Dataset[str, str, TestCaseMetadata] | Dataset with Cases grouped by question, each Case having multiple evaluators |

Example
from ragpill.csv.testset import load_testset, default_evaluator_classes

# Extend default evaluators with custom ones
# (CustomEvaluator is your own BaseEvaluator subclass)
dataset = load_testset(
    cssv_path='testset.csv',
    evaluator_classes=default_evaluator_classes | {'CustomEval': CustomEvaluator}
)
See Also

  • ragpill.base.BaseEvaluator.from_csv_line: Detailed descriptions of standard parameters
  • ragpill.csv.testset.default_evaluator_classes: Dict of built-in evaluators

Source code in src/ragpill/csv/testset.py
def load_testset(
    csv_path: str | Path,
    evaluator_classes: dict[str, type[BaseEvaluator]] = default_evaluator_classes,
    skip_unknown_evaluators: bool = False,
    question_column: str = "Question",
    test_type_column: str = "test_type",
    expected_column: str = "expected",
    tags_column: str = "tags",
    check_column: str = "check",
) -> Dataset[str, str, TestCaseMetadata]:
    """Create a Dataset from a CSV file with evaluator configurations.

    Each evaluator class must implement a from_csv_line() class method that accepts:
    - Standard CSV columns: expected, tags, check
    - Additional CSV columns as **kwargs (passed to evaluator.attributes)

    CSV Format:
        The CSV file should contain the following standard columns:

        - **Question**: The input question/prompt for the test case
        - **test_type**: Name of the evaluator class (must match key in evaluator_classes dict)
        - **expected**, **tags**, **check**: Standard evaluator parameters

        For detailed descriptions of these parameters, see
        [`ragpill.base.BaseEvaluator.from_csv_line`][ragpill.base.BaseEvaluator.from_csv_line].

        Any additional columns (e.g., priority, category, domain) will be:
        1. Passed to each evaluator's attributes dict via **kwargs in from_csv_line()
        2. If all evaluators for a question have the same value for an attribute,
           that attribute becomes part of the Test Case metadata and will be visible in MLflow

    Global Evaluators:
        Rows with empty questions are treated as global evaluators and will be added to ALL test cases:

        ```csv
        Question,test_type,expected,tags,check
        ,LLMJudge,true,global,"response is polite"
        What is X?,RegexEvaluator,true,factual,"X.*definition"
        ```

        The LLMJudge evaluator will be added to all cases, including the "What is X?" case.

    Custom Attributes:
        You can add custom columns to track metadata:

        ```csv
        Question,test_type,expected,tags,check,priority,category
        What is X?,LLMJudge,true,factual,"contains the fact that X is ...",high,science
        What is Y's email?,RegexEvaluator,true,"auth,contacts","y@example.com",low,validation
        ```

        These custom attributes (priority, category) are automatically:
        - Available in evaluator.attributes
        - Promoted to Case metadata if all evaluators share the same value
        - Visible in MLflow tracking for analysis and filtering

    Args:
        csv_path: Path to the CSV file
        evaluator_classes: Dictionary mapping test_type names to evaluator classes.
                          Extend default_evaluator_classes with custom evaluators:
                          `default_evaluator_classes | {'MyEval': MyEvaluator}`
        skip_unknown_evaluators: If True, skip rows with unknown evaluator types instead of raising an error
        question_column: Name of the column containing questions (default: 'Question')
        test_type_column: Name of the column containing evaluator class names (default: 'test_type')
        expected_column: Name of the column for expected flag (default: 'expected')
        tags_column: Name of the column for comma-separated tags (default: 'tags')
        check_column: Name of the column for evaluator-specific check data (default: 'check')

    Returns:
        Dataset with Cases grouped by question, each Case having multiple evaluators

    Example:
        ```python
        from ragpill.csv.testset import load_testset, default_evaluator_classes

        # Extend default evaluators with custom ones
        # (CustomEvaluator is your own BaseEvaluator subclass)
        dataset = load_testset(
            csv_path='testset.csv',
            evaluator_classes=default_evaluator_classes | {'CustomEval': CustomEvaluator}
        )
        ```

    See Also:
        [`ragpill.base.BaseEvaluator.from_csv_line`][ragpill.base.BaseEvaluator.from_csv_line]:
            Detailed descriptions of standard parameters
        [`ragpill.csv.testset.default_evaluator_classes`][ragpill.csv.testset.default_evaluator_classes]:
            Dict of built-in evaluators
    """
    # Read CSV
    rows = _read_csv_with_encoding(csv_path)

    # Group by question
    question_to_rows = _group_rows_by_question(rows, question_column)

    # Standard columns
    standard_columns = {question_column, test_type_column, expected_column, tags_column, check_column}

    # Extract global evaluators (rows with empty questions)
    global_evaluators: list[BaseEvaluator] = []
    global_rows = question_to_rows.pop("", None)
    if global_rows:
        for row in global_rows:
            test_type = row[test_type_column]
            evaluator_class = evaluator_classes.get(test_type)

            if evaluator_class is None:
                if skip_unknown_evaluators:
                    continue
                else:
                    raise ValueError(
                        f"Unknown evaluator type in global evaluator: {test_type}. Available types: {list(evaluator_classes.keys())}"
                    )

            evaluator, _, _ = create_evaluator_from_row(
                row, evaluator_class, standard_columns, expected_column, tags_column, check_column
            )
            global_evaluators.append(evaluator)

    # Create cases
    cases: list[Case[str, str, TestCaseMetadata]] = []
    for question, question_rows in question_to_rows.items():
        case = _create_case_from_rows(
            question=question,
            rows=question_rows,
            evaluator_classes=evaluator_classes,
            standard_columns=standard_columns,
            test_type_column=test_type_column,
            expected_column=expected_column,
            tags_column=tags_column,
            check_column=check_column,
            skip_unknown_evaluators=skip_unknown_evaluators,
        )
        if case is not None:
            cases.append(case)

    return Dataset[str, str, TestCaseMetadata](cases=cases, evaluators=global_evaluators)

default_evaluator_classes

ragpill.csv.testset.default_evaluator_classes module-attribute

default_evaluator_classes = {
    'LLMJudge': LLMJudge,
    'WrappedPydanticEvaluator': WrappedPydanticEvaluator,
    'RegexInSourcesEvaluator': RegexInSourcesEvaluator,
    'RegexInDocumentMetadata': RegexInDocumentMetadataEvaluator,
    'LiteralQuoteEvaluator': LiteralQuoteEvaluator,
    'HasQuotesEvaluator': HasQuotesEvaluator,
    'RegexInOutputEvaluator': RegexInOutputEvaluator,
}
