# Adding a New Evaluation Benchmark

## Overview

Evaluations are used to benchmark agent performance on specific tasks.

## Steps

- Create an evaluation module in `src/aigise/evaluations/` (the resulting layout is sketched after this list)
- Implement the evaluation interface
- Add a configuration template
- Add sample data handling
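
For orientation, the pieces created in these steps might end up laid out roughly like this; the `my_benchmark` module name and the config file name and extension are placeholders, not fixed by the project:

```text
src/aigise/evaluations/
├── my_benchmark/
│   └── my_evaluation.py     # implements the evaluation interface
└── configs/
    └── my_benchmark.yaml    # configuration template (hypothetical name)
```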

## Evaluation Structure

```python
# src/aigise/evaluations/my_benchmark/my_evaluation.py
from aigise.evaluations import EvaluationTask


class MyEvaluation:
    async def run_evaluation(self, tasks: list[EvaluationTask]):
        # Evaluation logic: run the agent on each task and collect results.
        pass
```
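
A fuller sketch of what `run_evaluation` might look like is below. This is illustration only: the agent interface (`agent.run`), the `EvaluationTask` attributes (`id`, `prompt`, `expected_output`), and the `EvaluationResult` container are assumptions, not the actual aigise API; adapt them to the real evaluation interface.

```python
# Hypothetical sketch: the agent call and the EvaluationTask attributes
# (id, prompt, expected_output) are assumed for illustration.
from dataclasses import dataclass, field

from aigise.evaluations import EvaluationTask


@dataclass
class EvaluationResult:
    """Per-task outcomes plus aggregate metrics (hypothetical container)."""

    outcomes: list[dict] = field(default_factory=list)
    metrics: dict[str, float] = field(default_factory=dict)


class MyEvaluation:
    def __init__(self, agent):
        # `agent` is assumed to expose an async run(prompt) -> str method.
        self.agent = agent

    async def run_evaluation(self, tasks: list[EvaluationTask]) -> EvaluationResult:
        result = EvaluationResult()
        for task in tasks:
            answer = await self.agent.run(task.prompt)        # assumed agent API
            correct = answer.strip() == task.expected_output  # assumed task field
            result.outcomes.append({"task_id": task.id, "correct": correct})
        if result.outcomes:
            result.metrics["accuracy"] = sum(
                o["correct"] for o in result.outcomes
            ) / len(result.outcomes)
        return result
```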

## Configuration

Create a config template in `src/aigise/evaluations/configs/`:
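
A minimal template might look like the following. The file name, the YAML format, and every field shown are assumptions for illustration; the actual schema depends on how the project's evaluation config loader is defined.

```yaml
# src/aigise/evaluations/configs/my_benchmark.yaml -- hypothetical name and schema
evaluation: my_benchmark
data_path: data/my_benchmark.jsonl   # benchmark data location (placeholder)
agent:
  model: your-model-name             # placeholder
  max_steps: 10
metrics:
  - accuracy
```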

## Data Handling

- Load benchmark data
- Run agents on tasks
- Collect results
- Generate metrics (a driver sketch wiring these steps together follows the list)
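
The driver below is a hedged sketch of how these four steps could fit together, building on the `MyEvaluation` sketch above (including its assumed agent-taking constructor). The JSONL data format, its field names, the `EvaluationTask` constructor arguments, and the `EchoAgent` stub are all assumptions, not the real aigise API.

```python
# Hypothetical end-to-end driver; data format, field names, and the
# EvaluationTask constructor arguments are assumptions, not the real API.
import asyncio
import json
from pathlib import Path

from aigise.evaluations import EvaluationTask
from aigise.evaluations.my_benchmark.my_evaluation import MyEvaluation


class EchoAgent:
    """Trivial stand-in agent, only here to make the sketch self-contained."""

    async def run(self, prompt: str) -> str:
        return ""


def load_benchmark_data(path: Path) -> list[EvaluationTask]:
    """Load benchmark tasks from a JSONL file (assumed format)."""
    tasks = []
    with path.open() as f:
        for line in f:
            record = json.loads(line)
            # Assumes EvaluationTask accepts these keyword arguments.
            tasks.append(
                EvaluationTask(
                    id=record["id"],
                    prompt=record["prompt"],
                    expected_output=record["expected_output"],
                )
            )
    return tasks


async def main() -> None:
    tasks = load_benchmark_data(Path("data/my_benchmark.jsonl"))  # load benchmark data
    evaluation = MyEvaluation(agent=EchoAgent())                  # run agents on tasks
    result = await evaluation.run_evaluation(tasks)               # collect results
    print(result.metrics)                                         # generate metrics


if __name__ == "__main__":
    asyncio.run(main())
```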

## See Also

- Development Guides - Other development guides
- Testing & Debugging - Testing evaluations