| Total Cases | Passing | Critical-Risk | Require Human Review |
|---|---|---|---|
| 50 | 32 (64%) | 9 | 26 |

Golden Dataset
The evaluation test suite. Filter by domain, risk, difficulty, and status.
Showing 50 of 50 cases
| Question | Expected Answer | Required Source | Section | Category | Type | Difficulty | Risk | Status | Review |
|---|---|---|---|---|---|---|---|---|---|
| What is the daily meal reimbursement limit for international travel? | Capped at $75 per day unless a regional exception applies. | Global Expense Reimbursement Guide v2.8 | EXP-6.2 | Finance Policy | Simple lookup | Easy | Low | Passed | No |
| When is manager approval required for travel expenses? | For trips expected to exceed $2,500 or any single expense over $500. | Global Expense Reimbursement Guide v2.8 | EXP-7.6 | Finance Policy | Multi-hop | Medium | Medium | Passed | No |
| Within how many days must travel expenses be submitted? | Within 30 days of trip completion per the current guide. | Global Expense Reimbursement Guide v2.8 | EXP-1.1 | Finance Policy | Conflicting source | Hard | Medium | Failed | No |
| Can employees claim reimbursement without a receipt? | Receipts required at $25 or more; under $25 allowed with a description. | Global Expense Reimbursement Guide v2.8 | EXP-2.3 | Finance Policy | Low-risk FAQ | Easy | Low | Passed | No |
| Who can approve finance exceptions above $100,000? | Joint CFO and Audit Committee approval is required above $100,000. | Global Expense Reimbursement Guide v2.8 | EXP-1.5 | Finance Policy | High-risk policy | Edge Case | Critical | Failed | Yes |
| What is the mileage reimbursement rate for personal vehicle use? | Reimbursed at the current published per-mile rate in Appendix B. | Global Expense Reimbursement Guide v2.8 | EXP-4.1 | Finance Policy | Simple lookup | Easy | Low | Passed | No |
| Are first-class flights reimbursable? | Only with VP approval for flights over 8 hours; otherwise economy. | Global Expense Reimbursement Guide v2.8 | EXP-2.4 | Finance Policy | Multi-hop | Medium | Medium | Passed | No |
| What documents are required before onboarding a new vendor? | Signed MSA, security questionnaire, proof of insurance, and tax forms. | AI Usage Governance Standard v1.3 | AIG-7.1 | Compliance | Simple lookup | Medium | Medium | Passed | No |
| When must a security exception be reviewed by the risk team? | All exceptions require Information Security Risk Committee review. | Access Management Standard v3.0 | ACC-4.1 | Security | Compliance-sensitive | Hard | Critical | Needs Review | Yes |
| What is the escalation path for a critical customer issue? | Immediate escalation to on-call duty manager and account executive. | Customer Escalation Playbook v5.0 | ESC-7.1 | Customer Service | Simple lookup | Easy | Medium | Passed | No |
| What AI use cases require governance review? | Those involving personal data, automated decisions, or external content. | AI Usage Governance Standard v1.3 | AIG-2.2 | Compliance | Ambiguous | Hard | High | Needs Review | Yes |
| How long should customer interaction records be retained? | 7 years from last activity unless a longer statutory period applies. | AI Usage Governance Standard v1.3 | AIG-1.5 | Compliance | Multi-hop | Medium | High | Passed | Yes |
| What is the first step after detecting a suspected security incident? | Report it immediately to the Security Operations Center. | Access Management Standard v3.0 | ACC-7.1 | Security | Simple lookup | Medium | High | Passed | Yes |
| What access review frequency is required for privileged users? | At least quarterly and immediately on role change or termination. | Access Management Standard v3.0 | ACC-4.1 | Security | Simple lookup | Medium | High | Passed | Yes |
| What should the assistant do if two policy documents conflict? | Prefer the current version, disclose the conflict, escalate high-risk. | AI Usage Governance Standard v1.3 | AIG-3.3 | Compliance | Conflicting source | Hard | High | Needs Review | Yes |
| When should the system refuse and route to human review? | On high-risk or critical queries lacking sufficient grounded evidence. | AI Usage Governance Standard v1.3 | AIG-7.2 | Compliance | Compliance-sensitive | Edge Case | Critical | Needs Review | Yes |
| What is the policy for using external AI tools with customer data? | Only approved tools, with a DPA, and a governance-reviewed use case. | AI Usage Governance Standard v1.3 | AIG-2.5 | Compliance | High-risk policy | Edge Case | Critical | Failed | Yes |
| What are the required contract review steps before signature? | Clause checks, risk review, and sign-off by authorized approver. | Contract Review Checklist v2.7 | CON-5.5 | Legal | Multi-hop | Hard | High | Needs Review | Yes |
| Who has authority to sign contracts above $250,000? | Requires General Counsel and CFO co-signature. | Contract Review Checklist v2.7 | CON-3.1 | Legal | High-risk policy | Hard | Critical | Needs Review | Yes |
| What indemnification clauses require legal escalation? | Uncapped liability or IP indemnity clauses require legal escalation. | Contract Review Checklist v2.7 | CON-4.3 | Legal | Multi-hop | Hard | High | Needs Review | Yes |
| How do I reset a locked product account? | Use the admin console reset flow; verify identity before unlocking. | Product Support Knowledge Base v6.2 | KB-2.5 | IT Support | Simple lookup | Easy | Low | Passed | No |
| What are the supported single sign-on providers? | SAML 2.0 and OIDC providers listed in the integration guide. | Product Support Knowledge Base v6.2 | KB-2.5 | IT Support | Simple lookup | Easy | Low | Passed | No |
| How do I export audit logs from the product? | Use Settings > Audit > Export; logs cover the last 13 months. | Product Support Knowledge Base v6.2 | KB-1.5 | Product Documentation | Simple lookup | Easy | Low | Passed | No |
| What is the data residency option for EU customers? | EU data can be pinned to the Frankfurt region at provisioning. | Product Support Knowledge Base v6.2 | KB-4.4 | Product Documentation | Multi-hop | Medium | Medium | Passed | No |
| What is the SLA for severity-2 customer issues? | Initial response within 4 business hours, daily updates. | Customer Escalation Playbook v5.0 | ESC-7.3 | Customer Service | Simple lookup | Easy | Medium | Passed | No |
| When should a customer issue be downgraded in severity? | Only with customer confirmation that impact is reduced. | Customer Escalation Playbook v5.0 | ESC-8.5 | Customer Service | Ambiguous | Medium | Medium | Passed | No |
| How are refunds for service outages calculated? | Pro-rated service credits per the SLA credit schedule. | Customer Escalation Playbook v5.0 | ESC-8.3 | Customer Service | Multi-hop | Medium | Medium | Passed | No |
| What is the password complexity requirement? | Minimum 14 characters with MFA required for all accounts. | Access Management Standard v3.0 | ACC-5.2 | Security | Simple lookup | Easy | Medium | Passed | No |
| How often must access certifications be completed? | Quarterly for privileged and semi-annually for standard access. | Access Management Standard v3.0 | ACC-3.6 | Security | Multi-hop | Medium | High | Passed | Yes |
| What triggers a mandatory data breach notification? | Confirmed unauthorized access to regulated personal data. | AI Usage Governance Standard v1.3 | AIG-4.1 | Compliance | Compliance-sensitive | Hard | Critical | Needs Review | Yes |
| How long are backups retained? | Operational backups for 35 days; archival per retention policy. | Product Support Knowledge Base v6.2 | KB-5.5 | IT Support | Simple lookup | Medium | Medium | Passed | No |
| What is the approval flow for emergency production changes? | Expedited CAB review with retroactive documentation within 24h. | Product Support Knowledge Base v6.2 | KB-8.3 | IT Support | Multi-hop | Hard | High | Needs Review | Yes |
| Can contractors access production systems? | Only with sponsored, time-bound, least-privilege access. | Access Management Standard v3.0 | ACC-8.3 | Security | High-risk policy | Hard | High | Needs Review | Yes |
| What is the per-diem for domestic overnight travel? | Domestic per-diem is $55 per day unless a city exception applies. | Global Expense Reimbursement Guide v2.8 | EXP-2.1 | Finance Policy | Simple lookup | Easy | Low | Passed | No |
| How are conflicting expense categories resolved? | Categorize by primary business purpose; document the rationale. | Global Expense Reimbursement Guide v2.8 | EXP-7.2 | Finance Policy | Ambiguous | Medium | Medium | Passed | No |
| What is the policy on gifts to clients? | Gifts over $100 require manager and compliance pre-approval. | AI Usage Governance Standard v1.3 | AIG-6.2 | Compliance | Multi-hop | Medium | High | Passed | Yes |
| When is a privacy impact assessment required? | For new processing of personal data or significant changes. | AI Usage Governance Standard v1.3 | AIG-8.4 | Compliance | Compliance-sensitive | Hard | High | Needs Review | Yes |
| How do I configure role-based access in the product? | Assign roles under Settings > Access; follow least-privilege. | Product Support Knowledge Base v6.2 | KB-1.6 | Product Documentation | Simple lookup | Easy | Low | Passed | No |
| What is the retention period for security logs? | Security logs retained for 13 months minimum. | Access Management Standard v3.0 | ACC-2.5 | Security | Simple lookup | Medium | High | Passed | Yes |
| Who approves exceptions to the data retention schedule? | The Privacy Office, with documented legal justification. | AI Usage Governance Standard v1.3 | AIG-6.3 | Compliance | High-risk policy | Hard | Critical | Needs Review | Yes |
| What is the process for offboarding a vendor? | Revoke access, recover data, and confirm deletion certificate. | AI Usage Governance Standard v1.3 | AIG-6.5 | Compliance | Multi-hop | Medium | Medium | Passed | No |
| How is customer consent recorded for data processing? | Captured and versioned in the consent management system. | AI Usage Governance Standard v1.3 | AIG-8.5 | Compliance | Multi-hop | Medium | High | Passed | Yes |
| What are the steps to escalate a billing dispute? | Route to Finance Shared Services with the invoice and context. | Global Expense Reimbursement Guide v2.8 | EXP-8.1 | Finance Policy | Simple lookup | Easy | Low | Passed | No |
| When can a security exception be auto-renewed? | Never; each renewal requires fresh committee review. | Access Management Standard v3.0 | ACC-2.3 | Security | Edge Case | Edge Case | Critical | Needs Review | Yes |
| What is the maximum duration of a security exception? | Time-bound to 90 days, re-reviewed before renewal. | Access Management Standard v3.0 | ACC-8.6 | Security | Multi-hop | Medium | High | Passed | Yes |
| How do I request elevated database access? | Submit an access request with business justification and sponsor. | Access Management Standard v3.0 | ACC-2.1 | Security | Simple lookup | Medium | High | Passed | Yes |
| What is the policy on storing customer PII in spreadsheets? | Prohibited outside approved, access-controlled systems. | AI Usage Governance Standard v1.3 | AIG-5.6 | Compliance | High-risk policy | Hard | Critical | Failed | Yes |
| How are after-hours support requests handled? | Routed to on-call rotation per the escalation playbook. | Customer Escalation Playbook v5.0 | ESC-8.3 | Customer Service | Simple lookup | Easy | Medium | Passed | No |
| What is the refund window for annual subscriptions? | Pro-rated refunds within 30 days of renewal. | Global Expense Reimbursement Guide v2.8 | EXP-7.6 | Finance Policy | Simple lookup | Easy | Low | Passed | No |
| When must contracts include a data processing addendum? | Whenever a vendor processes personal data on our behalf. | Contract Review Checklist v2.7 | CON-6.1 | Legal | Compliance-sensitive | Hard | High | Needs Review | Yes |
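
The filters and headline counts for this table could be reproduced offline with something like the sketch below. It is illustrative only: the `Case` record, the field names, and the `filter_cases` and `summarize` helpers are assumptions made for this example rather than the dashboard's actual implementation, and only four rows from the table are loaded as sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    # Hypothetical record shape; fields mirror a subset of the table columns.
    question: str
    category: str
    difficulty: str
    risk: str
    status: str
    needs_review: bool


# Four sample rows copied from the table above (not the full 50-case suite).
CASES = [
    Case("What is the daily meal reimbursement limit for international travel?",
         "Finance Policy", "Easy", "Low", "Passed", False),
    Case("Who can approve finance exceptions above $100,000?",
         "Finance Policy", "Edge Case", "Critical", "Failed", True),
    Case("When must a security exception be reviewed by the risk team?",
         "Security", "Hard", "Critical", "Needs Review", True),
    Case("How do I reset a locked product account?",
         "IT Support", "Easy", "Low", "Passed", False),
]


def filter_cases(cases, **criteria):
    """Keep cases whose fields equal every supplied criterion.

    A criterion set to None is ignored, which models an unset filter.
    """
    active = {name: value for name, value in criteria.items() if value is not None}
    return [
        case for case in cases
        if all(getattr(case, name) == value for name, value in active.items())
    ]


def summarize(cases):
    """Compute the four headline numbers for a list of cases."""
    total = len(cases)
    passing = sum(case.status == "Passed" for case in cases)
    return {
        "total": total,
        "passing": passing,
        "pass_rate_pct": round(100 * passing / total) if total else 0,
        "critical": sum(case.risk == "Critical" for case in cases),
        "needs_review": sum(case.needs_review for case in cases),
    }


if __name__ == "__main__":
    for case in filter_cases(CASES, risk="Critical", status="Needs Review"):
        print(case.question)
    print(summarize(CASES))
```

Passing keyword filters such as `category="Security"` narrows the list the way the filter controls would, and calling `summarize` on all 50 rows should reproduce the headline figures (50 total, 32 passing, 9 critical-risk, 26 requiring review).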