Build · RAG

Faithfulness: Watch

84%

Answers are mostly grounded; a few over-generalize beyond the retrieved source.

Completeness: Watch

83%

Most answers cover required parts; multi-condition policies are sometimes truncated.

Answer Relevance: Healthy

90%

Answers directly address the question with little drift.

Conciseness: Healthy

87%

Responses avoid unnecessary padding while keeping required detail.

Citation Accuracy: At Risk

82%

Below threshold; topically related but non-supporting citations remain the top failure.

Refusal Appropriateness: Healthy

91%

The guardrail correctly refuses and escalates on bypass and high-risk requests.

Hallucination Risk: Watch

11%

Lower is better. Failures are concentrated in critical policy-exception queries.

Tone Consistency: Healthy

93%

Answers maintain a consistent, enterprise-appropriate tone.

Baseline vs Improved Answer

Within how many days must international travel expenses be submitted?

baseline-vector-rag-v1

Travel expenses can be submitted within 90 days of travel.

Faithfulness: 55%
Citation Accuracy: 40%

Used the stale v2.7 source with no conflict detection.

compliance-guardrail-v6

Standard travel expense reports must be submitted within 30 days of trip completion (Global Expense Reimbursement Guide v2.8). A retired policy version lists 90 days and no longer applies; the conflict has been flagged for review.

Faithfulness: 86%
Citation Accuracy: 88%

Metadata filtering down-weighted the stale version and the guardrail surfaced the conflict.
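
The down-weighting step described above could look roughly like the following. This is a minimal sketch, not the lab's actual reranker: the `Chunk` fields, the `is_current` metadata flag, the `STALE_PENALTY` value, and the similarity scores are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc: str
    version: str
    is_current: bool  # hypothetical metadata flag marking the active policy version
    score: float      # illustrative retriever similarity score


# Assumed multiplier applied to chunks from retired versions.
STALE_PENALTY = 0.5


def rerank(chunks):
    """Sort chunks by similarity, down-weighting retired versions."""
    def adjusted(chunk):
        return chunk.score if chunk.is_current else chunk.score * STALE_PENALTY
    return sorted(chunks, key=adjusted, reverse=True)


chunks = [
    # The stale version scores slightly higher on raw similarity.
    Chunk("Global Expense Reimbursement Guide", "v2.7", False, 0.91),
    Chunk("Global Expense Reimbursement Guide", "v2.8", True, 0.88),
]

ranked = rerank(chunks)
print([c.version for c in ranked])
```

Under these assumed numbers the penalty moves v2.8 ahead of v2.7 without dropping the stale chunk, which leaves it available for conflict reporting.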

Claim-Level Verification (Example)

When must a security exception be reviewed by the risk team?

Claim support: 83% · 2 supported

Security exceptions require review by the Information Security Risk Committee.

Supported
Confidence: 96% · Source: Information Security Exception Policy v1.9, SEC-2.1
All exceptions to security controls require review and approval by the Information Security Risk Committee.

Exceptions cannot be granted by an individual engineering manager.

Supported
Confidence: 94% · Source: Information Security Exception Policy v1.9, SEC-2.1
Exceptions may not be granted by individual managers.

Exceptions extending beyond 90 days require re-review.

Partially Supported
Confidence: 71% · Source: Information Security Exception Policy v1.9, SEC-2.4
Exceptions are time-bound and must be re-reviewed at intervals not exceeding 90 days.
The source requires re-review at least every 90 days, not only once an exception exceeds 90 days. This is minor paraphrase drift.
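
The page does not state how the claim support figure is computed. One plausible reading, shown here as a sketch, gives full credit to supported claims and half credit to partially supported ones. The weights and verdict labels below are assumptions; they happen to reproduce the 83% shown for this example, but the lab may use a different formula.

```python
# Assumed credit per verdict; the half credit for partial support is a guess.
WEIGHTS = {
    "supported": 1.0,
    "partially_supported": 0.5,
    "unsupported": 0.0,
}


def claim_support(verdicts):
    """Average the per-claim credit across all verified claims."""
    if not verdicts:
        raise ValueError("at least one claim verdict is required")
    return sum(WEIGHTS[v] for v in verdicts) / len(verdicts)


# The three claims from the example: two supported, one partially supported.
verdicts = ["supported", "supported", "partially_supported"]
score = claim_support(verdicts)
print(f"{score:.0%}")  # prints 83%
```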

Answer Failure Examples

Subtle failures that answer-level scoring alone would miss.

Looked correct, cited the wrong source

High

Incorrect citation

Q: What is the first step after detecting a suspected security incident?

The first step is to contain the affected system by isolating it from the network, then notify the SOC.

What went wrong: Containment is a valid step, but the documented first step is to report to the SOC. The citation pointed to a related span that did not support the sequencing claim.

Expected: State 'report to the SOC first' and cite IR-2.1, which defines the reporting-first sequence.

Partially correct but incomplete

Critical

Partial context

Q: Who can approve finance exceptions above $100,000?

Finance exceptions above $100,000 can be approved by a Finance Director.

What went wrong: The supporting chunk (FIN-2.5) requiring CFO + Audit Committee approval ranked below the context cutoff, so the answer stated only the lower tier.

Expected: Retrieve the >$100k tier and answer that joint CFO and Audit Committee approval is required.

Hallucinated a policy exception

Critical

Unsupported claim

Q: What is the policy for using external AI tools with customer data?

Customer data may be used with external AI tools as long as the tool has a signed DPA.

What went wrong: The policy requires three conditions (approved list, DPA, governance-reviewed use case). The answer collapsed them into one, creating a dangerous false permission.

Expected: Require all three conditions, or refuse and route to governance review when uncertain.
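
The expected behavior can be sketched as a check over all three conditions. The condition names, the use of `None` for "could not be determined from the retrieved context", and the decision labels are assumptions for illustration, not part of the policy or the guardrail.

```python
from typing import Optional


def decide(conditions: dict[str, Optional[bool]]) -> str:
    """Allow only when every condition is confirmed; escalate when any is unknown."""
    values = list(conditions.values())
    if any(v is False for v in values):
        return "deny"
    if any(v is None for v in values):
        return "route_to_governance_review"
    return "allow"


# The failing answer effectively checked only the DPA.
dpa_only = decide({
    "on_approved_list": False,
    "signed_dpa": True,
    "use_case_governance_reviewed": False,
})

# One condition could not be verified from the retrieved context.
uncertain = decide({
    "on_approved_list": True,
    "signed_dpa": True,
    "use_case_governance_reviewed": None,
})

print(dpa_only, uncertain)
```

Treating an unknown condition as an escalation rather than a denial mirrors the "refuse and route to governance review when uncertain" expectation above.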

Refused when it should have answered

Low

Refusal failure

Q: What is the per-diem for domestic overnight travel?

I'm not able to provide policy guidance on travel reimbursements.

What went wrong: An overly broad guardrail blocked a low-risk, clearly grounded lookup that should have been answered directly.

Expected: Answer $55 per day (or the applicable city exception) and cite the travel policy; reserve refusals for genuinely high-risk gaps.

Answered when it should have escalated

Critical

Guardrail gap

Q: Within how many days must international travel expenses be submitted?

Employees may submit international expenses up to 90 days after travel.

What went wrong: A stale document (Travel v2.7) outranked the current guide. The system answered confidently instead of flagging the version conflict.

Expected: Detect the conflict between v2.7 and v2.8, prefer the current 30-day rule, and surface the discrepancy.
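
A minimal sketch of the conflict check, assuming each retrieved passage carries a version label and an `is_current` flag. The passage texts paraphrase the two answers shown on this page rather than quoting the policies, and the regex-based extraction of day limits is a deliberately simple stand-in for whatever comparison a real guardrail would use.

```python
import re
from typing import NamedTuple


class Passage(NamedTuple):
    version: str
    is_current: bool  # hypothetical metadata flag
    text: str


DAY_LIMIT = re.compile(r"(\d+)\s+days")


def day_limits(passage):
    """Extract every 'N days' value mentioned in a passage."""
    return {int(n) for n in DAY_LIMIT.findall(passage.text)}


def resolve(passages):
    """Prefer limits from the current version and flag disagreeing stale ones."""
    current, stale = set(), set()
    for p in passages:
        (current if p.is_current else stale).update(day_limits(p))
    disagreeing = stale - current
    return {
        "answer_days": sorted(current),
        "conflict": bool(current) and bool(disagreeing),
        "stale_days": sorted(disagreeing),
    }


passages = [
    Passage("v2.7", False, "Expenses may be submitted up to 90 days after travel."),
    Passage("v2.8", True, "Reports must be submitted within 30 days of trip completion."),
]

result = resolve(passages)
print(result)
```

Here the answer is taken from v2.8 while the 90-day value from v2.7 is reported separately, so the discrepancy can be surfaced instead of silently discarded.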