Data

Demo data — toggle to recompute from your own activity.

Pipeline stages

Throughput and yield per transform on the latest batch (2,140 files in)

Ingest & decode: 97.5% yield
54 files unreadable (corrupt / wrong encoding) → quarantined
2,140 in → 2,086 out

Profile structure: 100% yield
Schema, types & cardinality inferred for every file
2,086 in → 2,086 out

Clean & normalize: 88.3% yield
Dedup, trim, missing-value & date-format passes
2,086 in → 1,842 out

Apply org guidelines: 91.2% yield
Freshness, provenance & metadata rules enforced
1,842 in → 1,680 out

PII scan & redact: 95.1% yield
83 files needed redaction or manual sign-off
1,680 in → 1,597 out

Chunk & embed-prep: 95.2% yield
Semantic split at ~512 tokens; 76 over-length re-split
1,597 in → 1,521 out
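
Each yield figure above is that stage's output count divided by its input count. As a minimal sketch, assuming the counts shown for this batch, the following Python reproduces the per-stage percentages and adds an end-to-end figure. The names `STAGES`, `stage_yields` and `end_to_end_yield` are illustrative only and are not part of the lab's tooling.

```python
# Stage names and file counts are copied from the batch report above.
# STAGES, stage_yields and end_to_end_yield are hypothetical names chosen
# for this sketch; they do not refer to any real pipeline API.
STAGES = [
    ("Ingest & decode", 2140, 2086),
    ("Profile structure", 2086, 2086),
    ("Clean & normalize", 2086, 1842),
    ("Apply org guidelines", 1842, 1680),
    ("PII scan & redact", 1680, 1597),
    ("Chunk & embed-prep", 1597, 1521),
]


def stage_yields(stages):
    """Return (name, yield in percent rounded to one decimal) per stage."""
    return [(name, round(100 * n_out / n_in, 1)) for name, n_in, n_out in stages]


def end_to_end_yield(stages):
    """Files leaving the last stage as a percentage of files entering the first."""
    return round(100 * stages[-1][2] / stages[0][1], 1)


for name, pct in stage_yields(STAGES):
    print(f"{name}: {pct}% yield")
print(f"End to end: {end_to_end_yield(STAGES)}%")
```

On this batch the end-to-end figure works out to 71.1% (1,521 of 2,140 files), which the per-stage view does not show directly.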

Quality dimensions

Batch score vs target

Completeness: 94% (target 90%)
De-duplication: 97% (target 95%)
PII clearance: 87% (target 95%)
Metadata coverage: 82% (target 90%)
Format consistency: 91% (target 90%)
Avg chunk tokens: 468 (target 512)
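
To see at a glance which dimensions miss their target, the scores can be compared with the targets directly. The sketch below assumes a percentage dimension passes when its score is at or above target; the report does not state that rule, so treat it as an assumption. `DIMENSIONS` and `below_target` are illustrative names.

```python
# Scores and targets (in percent) are copied from the batch report above.
# "Avg chunk tokens" is left out: the report does not say whether 512 is a
# ceiling or a goal, so any pass/fail rule for it would be a guess.
DIMENSIONS = {
    "Completeness": (94, 90),
    "De-duplication": (97, 95),
    "PII clearance": (87, 95),
    "Metadata coverage": (82, 90),
    "Format consistency": (91, 90),
}


def below_target(dimensions):
    """Return {name: shortfall in percentage points} for scores under target.

    Assumes a dimension passes when score >= target (an assumption made
    for this sketch, not a rule stated in the report).
    """
    return {
        name: target - score
        for name, (score, target) in dimensions.items()
        if score < target
    }


for name, gap in below_target(DIMENSIONS).items():
    print(f"{name}: {gap} points below target")
```

Under that assumption, PII clearance and Metadata coverage are the two dimensions below target on this batch, each by 8 points.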

Recent files

Per-stage status across the last submissions

File                      Type   Chunks   Gate
policies_2024.pdf.txt     TXT       312   Approved
crm_export_v2.csv         CSV     1,204   Rejected
vendor_kb.json            JSON      880   Approved
support_tickets.csv       CSV     2,460   Conditional
travel_policy_v2.7.txt    TXT       140   Hold
eng_update_q2.md          MD         96   Approved

[Columns Profiled, Cleaned and PII cleared: per-file status indicators not preserved]