Build · RAG

Evaluation Runs

Select a run to inspect its scorecard and regression analysis.

| Run | Date | Retriever / Reranker | Prompt | Cases | Overall | Pass | Critical Fails | Regression | Release |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| baseline-vector-rag-v1 | 2026-01-12 | Semantic (dense) only | prompt-v1 | 38 | 64 | 66% | 5 | No Regression | Block |
| query-rewrite-v2 | 2026-02-09 | Semantic + query rewriting | prompt-v2 | 44 | 69 | 70% | 4 | No Regression | Hold |
| hybrid-search-v3 | 2026-03-08 | Hybrid (dense + BM25) | prompt-v2 | 48 | 73 | 75% | 3 | No Regression | Hold |
| reranker-enabled-v4 | 2026-04-11 | Hybrid + cross-encoder reranker | prompt-v3 | 48 | 76 | 79% | 3 | Watch | Hold |
| citation-validator-v5 | 2026-05-16 | Hybrid + reranker | prompt-v4 (citation-grounded) | 50 | 76 | 78% | 2 | Watch | Hold |
| compliance-guardrail-v6 | 2026-06-15 | Hybrid + reranker + metadata filter | prompt-v5 (guardrail + escalation) | 50 | 78 | 82% | 1 | Watch | Hold |

Scorecard · compliance-guardrail-v6

Compliance guardrails cut critical failures from two to one and raised the high-risk pass rate by 4 points to 87%. Citation accuracy (82% against an 85% target) and P95 latency (4.25s) still miss their targets.

Overall Score: 78% (target 80%, vs prev: +2)
Retrieval Quality: 86% (target 85%, vs prev: 0)
Faithfulness: 84% (target 85%, vs prev: +1)
Citation Accuracy: 82% (target 85%, vs prev: +4)
Pass Rate: 82% (target 85%, vs prev: +4)
High-Risk Pass Rate: 87% (target 90%, vs prev: +4)
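
To make the gap to each target explicit, the scorecard figures can be compared against their targets directly. The following is a minimal sketch, not the lab's own tooling: the `SCORECARD` dictionary and the `below_target` helper are hypothetical names introduced here, and the values are copied from the scorecard above.

```python
# Scorecard for compliance-guardrail-v6, copied from the figures above.
# Each entry maps a metric name to (score %, target %).
# SCORECARD and below_target are illustrative names, not part of the lab.
SCORECARD = {
    "Overall Score": (78, 80),
    "Retrieval Quality": (86, 85),
    "Faithfulness": (84, 85),
    "Citation Accuracy": (82, 85),
    "Pass Rate": (82, 85),
    "High-Risk Pass Rate": (87, 90),
}


def below_target(scorecard):
    """Return {metric: shortfall in percentage points} for metrics under target."""
    return {
        name: target - score
        for name, (score, target) in scorecard.items()
        if score < target
    }


gaps = below_target(SCORECARD)
for name, gap in gaps.items():
    print(f"{name}: {gap} pts below target")
```

On these figures only Retrieval Quality meets its target; the other five metrics fall short by 1 to 3 points.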

Hallucination Risk: 11%
Avg Latency: 2.60s
P95 Latency: 4.25s
Cost / Query: $0.042

Regression Analysis

vs citation-validator-v5

No Regression
  • All tracked metrics are within regression tolerances.
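
A tolerance-based regression check of this kind could be sketched as follows. The page does not state the tolerances it uses, so `TOLERANCE_PTS` is an assumed value chosen only for illustration, and `find_regressions` is a hypothetical helper. The deltas are the "vs prev" figures from the scorecard; latency and cost deltas are not shown on the page and are left out.

```python
# Change in each scorecard metric versus citation-validator-v5,
# in percentage points, taken from the "vs prev" figures on the scorecard.
DELTAS_VS_PREV = {
    "Overall Score": 2,
    "Retrieval Quality": 0,
    "Faithfulness": 1,
    "Citation Accuracy": 4,
    "Pass Rate": 4,
    "High-Risk Pass Rate": 4,
}

# ASSUMPTION: the real tolerance is not given on the page.
# Here a metric counts as regressed if it drops by more than 1 point.
TOLERANCE_PTS = 1.0


def find_regressions(deltas, tolerance):
    """Return the metrics whose drop versus the previous run exceeds the tolerance."""
    return [name for name, delta in deltas.items() if delta < -tolerance]


regressed = find_regressions(DELTAS_VS_PREV, TOLERANCE_PTS)
verdict = "No Regression" if not regressed else "Regression: " + ", ".join(regressed)
print(verdict)
```

Because no scorecard metric decreased against the previous run, any non-negative tolerance gives the same "No Regression" verdict for these six metrics.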

Release Recommendation

Hold

Run Comparison

Overall score and citation accuracy across all runs. Selected run highlighted.