Key Evaluation Metrics for RAG Systems

Evaluating a RAG system means scoring both the retrieval and generation stages. You care not only that the final answer is correct, but that it is grounded in the retrieved documents and that those documents are themselves relevant to the query.

Common metrics include:

  • Retrieval precision / recall – precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that are retrieved (a minimal sketch follows this list).
  • Context relevance – whether the retrieved documents actually address the query rather than merely sharing keywords with it.
  • Answer accuracy – factual correctness of the final generated answer, judged against a reference answer or rubric.
  • Citation quality – whether the sources the answer cites actually support the claims attributed to them.
  • Hallucination rate – the proportion of generated content not supported by the retrieved context (also sketched below).
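
To make the retrieval metrics concrete, here is a minimal sketch of precision@k and recall@k. It assumes you have gold relevance judgments as sets of document IDs per query; the function and variable names are illustrative, not from any particular evaluation library:

```python
def precision_recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    """Compute precision@k and recall@k for a single query.

    retrieved: ranked document IDs returned by the retriever.
    relevant:  gold set of document IDs judged relevant for the query.
    """
    top_k = retrieved[:k]
    # Hits are retrieved documents that appear in the gold relevant set.
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative usage with made-up document IDs:
retrieved = ["doc3", "doc7", "doc1", "doc9"]
relevant = {"doc1", "doc3", "doc5"}
p, r = precision_recall_at_k(retrieved, relevant, k=3)
print(f"precision@3={p:.2f} recall@3={r:.2f}")  # precision@3=0.67 recall@3=0.67
```

In practice you would average these values over a query set; per-query scores mainly help you find the queries where retrieval fails.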

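Hallucination rate is harder to automate because "supported by context" is a semantic judgment. A common approximation is to split the answer into sentences and test each against the retrieved context. The sketch below uses naive token overlap as the support test purely for illustration; production systems typically substitute an NLI model or an LLM judge for that check:

```python
def hallucination_rate(answer_sentences: list[str], context: str,
                       overlap_threshold: float = 0.5) -> float:
    """Fraction of answer sentences not supported by the retrieved context.

    Support is approximated here by token overlap, a deliberately crude
    stand-in for an NLI model or LLM-based judge.
    """
    context_tokens = set(context.lower().split())
    unsupported = 0
    for sentence in answer_sentences:
        tokens = set(sentence.lower().split())
        if not tokens:
            continue
        # A sentence counts as supported if enough of its tokens appear
        # in the context; below the threshold, flag it as unsupported.
        overlap = len(tokens & context_tokens) / len(tokens)
        if overlap < overlap_threshold:
            unsupported += 1
    return unsupported / len(answer_sentences) if answer_sentences else 0.0
```

Whatever support test you use, report the threshold alongside the metric, since the measured rate moves with it.
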
For guidance on how these metrics fit into tests and checklists, see the RAG Systems pillar page.