Failure Analysis · Sudeep Lalka

Failures by Category

Cells are colored by severity. Incorrect-citation and stale-source errors dominate, with 14 and 11 failures respectively.

Failure counts by domain and category.

Domain	Incorrect citation	Stale source	Unsupported claim	Conflicting documents	Partial context
Finance	4	3	2	3	2
Compliance	3	4	4	2	1
Security	2	1	2	1	2
Legal	3	1	1	1	1
HR	1	2	0	0	0
Support	1	0	0	0	0
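The per-category and per-domain totals can be recomputed from the table with a few lines of Python. This is a minimal sketch: the counts are copied from the rows above, while the names `CATEGORIES`, `COUNTS`, `totals_by_category`, and `totals_by_domain` are illustrative and not part of the report.

```python
# Column order matches the table header above.
CATEGORIES = [
    "Incorrect citation",
    "Stale source",
    "Unsupported claim",
    "Conflicting documents",
    "Partial context",
]

# One row per domain, copied from the table.
COUNTS = {
    "Finance": [4, 3, 2, 3, 2],
    "Compliance": [3, 4, 4, 2, 1],
    "Security": [2, 1, 2, 1, 2],
    "Legal": [3, 1, 1, 1, 1],
    "HR": [1, 2, 0, 0, 0],
    "Support": [1, 0, 0, 0, 0],
}


def totals_by_category(counts):
    """Sum each column across all domains."""
    return {
        category: sum(row[i] for row in counts.values())
        for i, category in enumerate(CATEGORIES)
    }


def totals_by_domain(counts):
    """Sum each row across all categories."""
    return {domain: sum(row) for domain, row in counts.items()}


for category, total in totals_by_category(COUNTS).items():
    print(f"{category}: {total}")
```

The three largest column totals (14, 11, and 9) match the failure counts quoted for the three highest-volume patterns.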


The three highest-volume failure patterns and how to resolve them.

Incorrect citation · High

14 failures · 23% of total

Why: Citations map to topically related but non-supporting chunks; no claim-to-evidence overlap check.

Fix: Require token/semantic overlap between claim and cited span before surfacing a citation.
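One way to sketch the token-overlap variant of this check is shown below. It assumes a simple lexical measure (the share of the claim's content words that also appear in the cited span). The stopword list, the 0.5 threshold, and the function names are hypothetical and would need tuning on labeled failures. A semantic check would replace `claim_overlap` with an embedding similarity.

```python
import re

# Hypothetical stopword list; a production system would use a fuller one.
STOPWORDS = {
    "a", "an", "the", "of", "to", "in", "for", "and", "or",
    "is", "are", "be", "from", "any", "must",
}


def content_tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def claim_overlap(claim: str, span: str) -> float:
    """Fraction of the claim's content tokens that also occur in the cited span."""
    claim_tokens = content_tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & content_tokens(span)) / len(claim_tokens)


def should_surface_citation(claim: str, span: str, threshold: float = 0.5) -> bool:
    """Surface the citation only when overlap reaches the (assumed) threshold."""
    return claim_overlap(claim, span) >= threshold


claim = "Expenses above 500 USD require manager approval"
supporting = "Any expense above 500 USD requires approval from the manager"
related_only = "Travel bookings must be made through the approved portal"

print(should_surface_citation(claim, supporting))
print(should_surface_citation(claim, related_only))
```

Exact-token matching misses inflected forms ("expense" vs. "expenses"), so stemming or an embedding-based score is likely needed before the threshold is reliable.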

Stale source · High

11 failures · 18% of total

Why: Retired document versions (Employee Travel Policy v2.7, AI Usage Governance Standard v1.3) remain in the index and outrank current versions.

Fix: Add freshness metadata and down-weight or exclude superseded versions at retrieval time.
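The two options in this fix, excluding or down-weighting superseded versions, might look like the sketch below. It assumes each retrieved hit carries a `superseded` flag written at index time. The `Hit` class, the `apply_freshness` function, the 0.5 penalty, and the example scores are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Hit:
    document: str
    version: str
    score: float      # retrieval score; example values below are made up
    superseded: bool  # hypothetical freshness flag stored with the chunk


def apply_freshness(hits, mode="exclude", penalty=0.5):
    """Drop or down-weight superseded hits, then re-sort by score."""
    if mode == "exclude":
        kept = [h for h in hits if not h.superseded]
    elif mode == "downweight":
        kept = [
            replace(h, score=h.score * penalty) if h.superseded else h
            for h in hits
        ]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return sorted(kept, key=lambda h: h.score, reverse=True)


hits = [
    Hit("Employee Travel Policy", "v2.7", 0.91, superseded=True),
    Hit("Employee Travel Policy", "v3.2", 0.84, superseded=False),
]

for hit in apply_freshness(hits, mode="downweight"):
    print(hit.version, round(hit.score, 3))
```

Excluding is the safer default when a retired version must never be cited. Down-weighting keeps it available for questions that are explicitly about historical policy.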

Unsupported claim · Critical

9 failures · 15% of total

Why: Model compresses multi-condition policies into a single condition, producing confident but unsupported claims.

Fix: Enforce claim-level grounding and refuse partial answers on critical-risk policies.
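A rough sketch of this gating logic follows. It assumes an upstream grounding check has already marked each claim as supported or not. The `Claim` class, the `finalize_answer` function, the risk labels, and the refusal wording are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical refusal text; real wording would come from product guidelines.
REFUSAL = "This question cannot be answered fully from the cited policy text."


@dataclass
class Claim:
    text: str
    supported: bool  # assumed to be set by an upstream grounding check


def finalize_answer(claims, risk):
    """Return grounded text, or refuse when a critical-risk answer would be partial."""
    supported = [c.text for c in claims if c.supported]
    # On critical-risk policies, any ungrounded claim blocks the whole answer.
    if risk == "critical" and len(supported) < len(claims):
        return REFUSAL
    if not supported:
        return REFUSAL
    # On lower-risk policies, drop ungrounded claims and keep the rest.
    return " ".join(supported)


claims = [
    Claim("Use requires approval from the data owner.", supported=True),
    Claim("No further conditions apply.", supported=False),
]

print(finalize_answer(claims, risk="critical"))
print(finalize_answer(claims, risk="medium"))
```

Refusing the entire answer on critical-risk policies trades coverage for safety: a multi-condition policy reported with one condition missing can be worse than no answer.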

Source documents responsible for the most failures.

Document	Version	Failed Queries	Dominant Failure Mode	Risk	Recommended Fix
Employee Travel Policy	v2.7 (retired)	8	Stale source conflicting with v3.2	High	Remove retired version from the index or hard-filter by effective date.
AI Usage Governance Standard	v1.3	7	Ambiguous multi-condition sections; unsupported claims	Critical	Revise ambiguous sections and add structured condition lists for grounding.
Finance Approval Matrix	v1.6	6	Unclear approval thresholds across tiers	Critical	Clarify tier thresholds and chunk the matrix so each tier is independently retrievable.
Travel Policy Addendum	v1.4	4	Overlapping reimbursement guidance with main policy	Medium	Merge the addendum into v3.2 or mark precedence explicitly.
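The recommendation to chunk the Finance Approval Matrix by tier could be sketched as follows. The report does not give the actual tiers or thresholds, so the rows use placeholder values. The field names and the `chunk_by_tier` function are hypothetical.

```python
# Placeholder rows: the real tiers, thresholds, and approvers are not in this report.
MATRIX_ROWS = [
    {"tier": "Tier 1", "threshold": "<threshold A>", "approver": "<approver A>"},
    {"tier": "Tier 2", "threshold": "<threshold B>", "approver": "<approver B>"},
]


def chunk_by_tier(document, version, rows):
    """Emit one self-describing chunk per tier so each can be retrieved alone."""
    chunks = []
    for row in rows:
        # Repeat the document name and version in every chunk so a single
        # retrieved tier still carries its full context.
        text = (
            f"{document} {version}, {row['tier']}: "
            f"threshold {row['threshold']}, approver {row['approver']}."
        )
        chunks.append(
            {
                "id": f"{document}:{version}:{row['tier']}",
                "text": text,
                "metadata": {
                    "document": document,
                    "version": version,
                    "tier": row["tier"],
                },
            }
        )
    return chunks


for chunk in chunk_by_tier("Finance Approval Matrix", "v1.6", MATRIX_ROWS):
    print(chunk["id"])
```

Keeping the tier in both the text and the metadata lets retrieval match on wording while a downstream filter selects a specific tier.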