Demo data.
Pipeline stages
Throughput and yield per transform on the latest batch (2,140 files in)
Ingest & decode: 97.5% yield
54 files unreadable (corrupt / wrong encoding) → quarantined
2,140 in → 2,086 out
Profile structure: 100% yield
Schema, types & cardinality inferred for every file
2,086 in → 2,086 out
Clean & normalize: 88.3% yield
Dedup, trim, missing-value & date-format passes
2,086 in → 1,842 out
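The four passes named for this stage could look roughly like the sketch below. It is illustrative only: the row-as-dict shape, the `date` field name, the `DATE_FORMATS` list, and the `MISSING` markers are all assumptions, not details taken from the pipeline itself.

```python
from datetime import datetime

# Hypothetical input formats; a real pipeline would configure its own list.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y")
# Hypothetical markers treated as missing values (compared case-insensitively).
MISSING = {"", "n/a", "na", "null", "none"}

def normalize_date(value):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean_rows(rows, date_field="date"):
    """Trim strings, map missing markers to None, normalize dates, drop exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        out = {}
        for key, value in row.items():
            if isinstance(value, str):
                value = value.strip()
                if value.lower() in MISSING:
                    value = None
            out[key] = value
        if isinstance(out.get(date_field), str):
            out[date_field] = normalize_date(out[date_field])
        # Deduplicate on the cleaned content, so rows that differ only in
        # whitespace or date format collapse into one.
        fingerprint = tuple(sorted(out.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(out)
    return cleaned
```

Deduplicating after trimming and date normalization is a deliberate choice in this sketch: two rows that differ only in padding or date style count as the same record.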
Apply org guidelines: 91.2% yield
Freshness, provenance & metadata rules enforced
1,842 in → 1,680 out
PII scan & redact: 95.1% yield
83 files needed redaction or manual sign-off
1,680 in → 1,597 out
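As a rough illustration of a scan-and-redact pass, the sketch below replaces matches with placeholders and counts hits per category. The two regular expressions and the `redact` helper are hypothetical; real PII detection needs much broader coverage, and the dashboard does not say how its scanner works.

```python
import re

# Illustrative patterns only; these are assumptions, not the pipeline's rules.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text):
    """Replace each match with a [LABEL] placeholder and count hits per label."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, hits = pattern.subn(f"[{label}]", text)
        if hits:
            counts[label] = hits
    return text, counts
```

A file with a non-empty `counts` result could then be routed to redaction or manual sign-off, depending on policy.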
Chunk & embed-prep: 95.2% yield
Semantic split at ~512 tokens; 76 over-length re-split
1,597 in → 1,521 out
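The per-stage yields above are each stage's output count divided by its input count. A minimal sketch that reproduces them from the listed counts follows; the `STAGES` layout and the function names are illustrative, and the end-to-end figure is derived here rather than reported by the dashboard.

```python
# Stage names and file counts copied from the stage list above.
STAGES = [
    ("Ingest & decode", 2140, 2086),
    ("Profile structure", 2086, 2086),
    ("Clean & normalize", 2086, 1842),
    ("Apply org guidelines", 1842, 1680),
    ("PII scan & redact", 1680, 1597),
    ("Chunk & embed-prep", 1597, 1521),
]

def stage_yield(files_in, files_out):
    """Percentage of files that survive one stage, rounded to one decimal."""
    return round(100 * files_out / files_in, 1)

def cumulative_yield(stages):
    """Percentage of the original batch that leaves the final stage."""
    return round(100 * stages[-1][2] / stages[0][1], 1)

for name, files_in, files_out in STAGES:
    dropped = files_in - files_out
    print(f"{name}: {stage_yield(files_in, files_out)}% ({dropped} dropped)")
print(f"End to end: {cumulative_yield(STAGES)}%")
```

Under these counts, 1,521 of the original 2,140 files reach the end of the pipeline, about 71.1% end to end.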
Quality dimensions
Batch score vs target
| Dimension | Batch score | Target |
|---|---|---|
| Completeness | 94% | 90% |
| De-duplication | 97% | 95% |
| PII clearance | 87% | 95% |
| Metadata coverage | 82% | 90% |
| Format consistency | 91% | 90% |
| Avg chunk tokens | 468 | 512 |
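To see which dimensions miss their target, the scores and targets above can be compared directly. The sketch below assumes each percentage target is a minimum. It leaves out average chunk tokens, because the dashboard does not say whether 512 is a floor, a ceiling, or a value to aim for.

```python
# Percentage scores and targets copied from the quality dimensions above.
DIMENSIONS = {
    "Completeness": (94, 90),
    "De-duplication": (97, 95),
    "PII clearance": (87, 95),
    "Metadata coverage": (82, 90),
    "Format consistency": (91, 90),
}

def below_target(dimensions):
    """Return {name: shortfall in percentage points} for scores under target."""
    return {
        name: target - score
        for name, (score, target) in dimensions.items()
        if score < target
    }

for name, gap in below_target(DIMENSIONS).items():
    print(f"{name}: {gap} points below target")
```

With these figures, PII clearance and metadata coverage are the two dimensions under target, each by 8 percentage points.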
Recent files
Per-stage status across the last submissions
| File | Type | Profiled | Cleaned | PII cleared | Chunks | Gate |
|---|---|---|---|---|---|---|
| policies_2024.pdf.txt | TXT | | | | 312 | Approved |
| crm_export_v2.csv | CSV | | | | 1,204 | Rejected |
| vendor_kb.json | JSON | | | | 880 | Approved |
| support_tickets.csv | CSV | | | | 2,460 | Conditional |
| travel_policy_v2.7.txt | TXT | | | | 140 | Hold |
| eng_update_q2.md | MD | | | | 96 | Approved |