End-to-end evaluation tutorial
This tutorial shows how to run an evaluation suite against a small dataset. The workflow has three steps:
- Prepare your dataset: a CSV with the columns `query`, `ground_truth_answer`, and `ground_truth_docs_ids` (see the loading sketch after this list)
- Run the pipeline once per query, storing a trace of each run
- Compute metrics over the stored traces
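A concrete illustration of the first step may help. Below is a minimal sketch of loading and validating the dataset; the file name `eval_dataset.csv` and the use of pandas are illustrative assumptions, not requirements:

```python
import pandas as pd

# Hypothetical file name; any CSV with the three columns above will do
dataset = pd.read_csv("eval_dataset.csv")

# Fail fast if the expected columns are missing
expected = {"query", "ground_truth_answer", "ground_truth_docs_ids"}
missing = expected - set(dataset.columns)
if missing:
    raise ValueError(f"dataset is missing columns: {missing}")
```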
Note: this tutorial is not yet packaged as a ready-to-run notebook. The code snippets below illustrate the key steps that the notebook will make reproducible.
```python
from vero.rag import SimpleRAGPipeline
from vero.trace import TraceDB
from vero.eval import Evaluator

# Traces are written to runs.db so they can be evaluated later
trace_db = TraceDB(db_path="runs.db")
pipeline = SimpleRAGPipeline(retriever="faiss", generator="openai", trace_db=trace_db)

# Run the pipeline on a single query; the trace is stored automatically
run = pipeline.run("Who invented the transistor?")
print("Answer:", run.answer)

# Later, compute metrics for all stored runs
evaluator = Evaluator(trace_db=trace_db)
results = evaluator.evaluate()
print(results)
```
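The snippet above runs a single hard-coded query. To execute the full suite (step 2) and then score it (step 3), you can loop over the dataset loaded earlier. This sketch reuses only the calls shown above; how the ground-truth columns are handed to the `Evaluator` is not covered here, so consult its documentation for that part:

```python
# One traced run per dataset row; each pipeline.run() call stores
# its trace in trace_db, exactly as in the single-query example
for row in dataset.itertuples():
    pipeline.run(row.query)

# Score every stored run. Associating the ground_truth_* columns with
# each run is not shown in this tutorial; see the Evaluator docs.
evaluator = Evaluator(trace_db=trace_db)
results = evaluator.evaluate()
print(results)
```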