Failures by Category

Colored by severity. Citation and stale-source errors dominate.

Failure Heatmap

Failure counts by domain and category.

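| Domain | Incorrect citation | Stale source | Unsupported claim | Conflicting documents | Partial context |
|---|---|---|---|---|---|
| Finance | 4 | 3 | 2 | 3 | 2 |
| Compliance | 3 | 4 | 4 | 2 | 1 |
| Security | 2 | 1 | 2 | 1 | 2 |
| Legal | 3 | 1 | 1 | 1 | 1 |
| HR | 1 | 2 | 0 | 0 | 0 |
| Support | 1 | 0 | 0 | 0 | 0 |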

Root Cause Insights

The three highest-volume failure patterns and how to resolve them.

Incorrect citation

Severity: High · 14 failures · 23% of total

Why: Citations map to topically related but non-supporting chunks; no claim-to-evidence overlap check.

Fix: Require token/semantic overlap between claim and cited span before surfacing a citation.
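
The token-overlap half of this fix can be sketched as below. This is a minimal illustration, not the lab's implementation: the stop-word list, the 0.5 threshold, and the function names are all assumptions, and a semantic check (for example embedding similarity) would be needed for paraphrased claims.

```python
import re

# Hypothetical stop-word list and threshold; tune both on labeled examples.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "is", "are",
              "and", "or", "for", "on", "be", "from"}
MIN_OVERLAP = 0.5


def content_tokens(text):
    """Return the set of lowercase word tokens with stop words removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOP_WORDS}


def claim_overlap(claim, cited_span):
    """Fraction of the claim's content tokens that also appear in the span."""
    claim_tokens = content_tokens(claim)
    if not claim_tokens:
        return 0.0
    shared = claim_tokens & content_tokens(cited_span)
    return len(shared) / len(claim_tokens)


def should_surface_citation(claim, cited_span, threshold=MIN_OVERLAP):
    """Surface the citation only if enough of the claim is backed by the span."""
    return claim_overlap(claim, cited_span) >= threshold
```

Exact token matching has no stemming, so "flight" and "flights" count as different tokens; that keeps the sketch dependency-free but makes the threshold more conservative than it looks.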

Stale source

Severity: High · 11 failures · 18% of total

Why: Retired document versions (Travel v2.7, AI Governance v1.3) remain in the index and outrank current versions.

Fix: Add freshness metadata and down-weight or exclude superseded versions at retrieval time.
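
One way to apply freshness metadata at retrieval time is sketched below. The `Chunk` fields, the `superseded_by` marker, and the 0.5 penalty are hypothetical names and values chosen for illustration; the real index schema may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    doc_title: str
    version: str
    score: float                         # retriever relevance, higher is better
    superseded_by: Optional[str] = None  # newer version, if this one is retired


STALE_PENALTY = 0.5  # hypothetical multiplier applied to superseded versions


def rerank_by_freshness(chunks: List[Chunk],
                        exclude_superseded: bool = False) -> List[Chunk]:
    """Down-weight superseded chunks, or drop them when exclude_superseded is set."""
    ranked = []
    for chunk in chunks:
        stale = chunk.superseded_by is not None
        if stale and exclude_superseded:
            continue
        adjusted = chunk.score * STALE_PENALTY if stale else chunk.score
        ranked.append((adjusted, chunk))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked]
```

Down-weighting keeps retired text reachable for historical questions, while `exclude_superseded=True` is the stricter option for policies where an outdated answer is unacceptable.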

Unsupported claim

Severity: Critical · 9 failures · 15% of total

Why: Model compresses multi-condition policies into a single condition, producing confident but unsupported claims.

Fix: Enforce claim-level grounding and refuse partial answers on critical-risk policies.
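
A rough sketch of the refusal rule follows. It assumes an upstream step has already split the draft answer into claims and marked each one as supported or not; the `Claim` type, the `risk` labels, and the refusal text are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Claim:
    text: str
    supported: bool  # outcome of an assumed upstream claim-to-evidence check


REFUSAL = "I can't answer this fully from the retrieved policy text."


def finalize_answer(claims: List[Claim], risk: str) -> str:
    """Return the answer only if grounding rules for the given risk level hold."""
    if not claims:
        return REFUSAL
    if all(claim.supported for claim in claims):
        return " ".join(claim.text for claim in claims)
    if risk == "critical":
        # Critical-risk policies: a partial answer is treated as a wrong answer.
        return REFUSAL
    # Lower-risk policies: keep only the claims that are grounded.
    grounded = [claim.text for claim in claims if claim.supported]
    return " ".join(grounded) if grounded else REFUSAL
```

Treating each condition of a multi-condition policy as its own claim means a dropped condition shows up as a missing or unsupported claim instead of disappearing inside a single summarized sentence.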

Top Failing Documents

Source documents responsible for the most failures.

| Document | Version | Failed Queries | Dominant Failure Mode | Risk | Recommended Fix |
|---|---|---|---|---|---|
| Employee Travel Policy | v2.7 (retired) | 8 | Stale source conflicting with v3.2 | High | Remove retired version from the index or hard-filter by effective date. |
| AI Usage Governance Standard | v1.3 | 7 | Ambiguous multi-condition sections; unsupported claims | Critical | Revise ambiguous sections and add structured condition lists for grounding. |
| Finance Approval Matrix | v1.6 | 6 | Unclear approval thresholds across tiers | Critical | Clarify tier thresholds and chunk the matrix so each tier is independently retrievable. |
| Travel Policy Addendum | v1.4 | 4 | Overlapping reimbursement guidance with main policy | Medium | Merge the addendum into v3.2 or mark precedence explicitly. |
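
The per-tier chunking recommended for the Finance Approval Matrix could look roughly like the sketch below. The row fields (`tier`, `min_amount`, `max_amount`, `approver`) are hypothetical, since the matrix itself is not shown here; the point is only that each chunk repeats the document context so a tier can be retrieved on its own.

```python
def chunk_matrix_by_tier(doc_title, version, rows):
    """Build one self-contained chunk per tier of an approval matrix.

    Each row is a dict with hypothetical keys: tier, min_amount,
    max_amount, approver. Document title and version are repeated in
    every chunk so no chunk depends on a shared table header.
    """
    chunks = []
    for row in rows:
        text = (
            f"{doc_title} {version}. Tier: {row['tier']}. "
            f"Amount range: {row['min_amount']} to {row['max_amount']}. "
            f"Approver: {row['approver']}."
        )
        metadata = {"doc": doc_title, "version": version, "tier": row["tier"]}
        chunks.append({"text": text, "metadata": metadata})
    return chunks
```

Storing the tier in metadata as well as in the text lets the retriever filter by tier directly when the query names one.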