Demo data — toggle to recompute from your own activity.
Files in pipeline
2,140
+18%
Q2 submissions
Volume up as three new source systems onboarded.
Ingestion-ready
71%
+6 pts
target ≥ 80%
Most files pass; PII redaction is the main gap.
Avg prep effort
2.4 hrs
−33%
per file · was 3.6
Automated profiling cut hands-on cleanup time.
Blocked / rejected
214
−4%
10% of intake
Mostly sensitive-data and licensing holds.
Ingestion funnel
Files received this quarter and where they end up
Received: 2,140 · 100.0%
Raw files submitted from all sources
Profiled: 2,086 · 97.5%
Readable & structure detected
Cleaned: 1,842 · 86.1%
Dedup, missing-value & format fixes
Guideline-cleared: 1,597 · 74.6%
PII, provenance & metadata resolved
Approved: 1,521 · 71.1%
Passed the ingestion gate
Handed to RAG: 1,521 · 71.1%
Embedded into the vector database
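Each funnel percentage is that stage's count divided by the 2,140 files received. The following is a minimal Python sketch of that arithmetic, with counts copied from the funnel above; `FUNNEL` and `funnel_report` are illustrative names, not part of the dashboard.

```python
# Stage counts copied from the ingestion funnel.
FUNNEL = [
    ("Received", 2140),
    ("Profiled", 2086),
    ("Cleaned", 1842),
    ("Guideline-cleared", 1597),
    ("Approved", 1521),
    ("Handed to RAG", 1521),
]


def funnel_report(stages):
    """Return (name, count, pct_of_received, dropped_since_previous) per stage."""
    total = stages[0][1]
    rows = []
    previous = total
    for name, count in stages:
        pct = round(100 * count / total, 1)
        rows.append((name, count, pct, previous - count))
        previous = count
    return rows


for name, count, pct, dropped in funnel_report(FUNNEL):
    print(f"{name:<18} {count:>5}  {pct:>5.1f}%  dropped: {dropped}")
```

On these counts the stage-to-stage losses are 54 files at profiling, 244 at cleaning, 245 at guideline clearance, and 76 at approval.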
Where the effort goes
Avg analyst hours per file
2.4
hrs/file
- Profiling: 0.3 hrs · 13%
- Cleaning: 0.8 hrs · 33%
- PII review: 0.6 hrs · 25%
- Guideline sign-off: 0.4 hrs · 17%
- Chunk tuning: 0.3 hrs · 13%
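The shares in this breakdown are each step's hours divided by the 2.4-hour total. The sketch below reproduces them in Python under one assumption: that the dashboard rounds halves up (12.5% shown as 13%), which would also explain why the displayed shares sum to 101%. `HOURS_PER_STEP` and `effort_shares` are illustrative names.

```python
from decimal import Decimal, ROUND_HALF_UP

# Analyst hours per file for each step, copied from the breakdown.
HOURS_PER_STEP = {
    "Profiling": Decimal("0.3"),
    "Cleaning": Decimal("0.8"),
    "PII review": Decimal("0.6"),
    "Guideline sign-off": Decimal("0.4"),
    "Chunk tuning": Decimal("0.3"),
}


def effort_shares(hours):
    """Return (total_hours, {step: whole-percent share}), rounding halves up."""
    total = sum(hours.values())
    shares = {}
    for step, h in hours.items():
        pct = (100 * h / total).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
        shares[step] = int(pct)
    return total, shares


total, shares = effort_shares(HOURS_PER_STEP)
print(f"Total: {total} hrs/file")
for step, pct in shares.items():
    print(f"{step:<20} {pct}%")
```

`Decimal` is used instead of floats so that 0.3 / 2.4 is exactly 12.5% and the rounding rule, not floating-point noise, decides the displayed value.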
Readiness trend
% of files approved on first pass vs. target
What the numbers mean
Effort is paying off
Automated profiling dropped average prep time from 3.6 to 2.4 hours per file while volume grew 18%.
One gap remains
Readiness sits at 71% against an 80% target. The bottleneck is PII redaction and licensing sign-off, not data quality.
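To put the gap in file terms, here is a small Python sketch. It assumes readiness is approved files divided by received files (1,521 of 2,140) and that volume stays at this quarter's level; `approvals_needed` is a hypothetical helper, not a dashboard function.

```python
def approvals_needed(received: int, approved: int, target_pct: int) -> int:
    """Extra approved files required to reach target_pct of received files."""
    # Integer ceiling division avoids floating-point error on the target count.
    required = -(-received * target_pct // 100)
    return max(0, required - approved)


gap = approvals_needed(received=2140, approved=1521, target_pct=80)
print(f"Additional approvals needed to reach 80%: {gap}")
```

Under those assumptions, 191 more files would have needed to pass the ingestion gate this quarter to reach the target.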
Recommended focus
Standardize redaction and pre-clear common source licenses to lift first-pass approval above the 80% target, which means fewer conflicting answers downstream in the RAG Evaluator.
ROI calculator
What automating data prep is worth at your volume
Effort / file
1.9 hrs
was 3.6 hrs
Hours saved / mo
840
analyst hours
Saved / year
$655k
at $65/hr
Automation here = the share of profiling, cleaning, and PII clearance the Data Lab handles before an analyst touches the file.
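The annual figure follows from the monthly hours saved and the hourly rate. The Python sketch below checks that arithmetic. The calculator's volume input is not shown here, so the 840 hours per month is taken as given rather than derived; `annual_savings` and `format_thousands` are illustrative names.

```python
def annual_savings(hours_saved_per_month: int, hourly_rate: int) -> int:
    """Yearly savings in dollars: monthly hours saved x 12 months x rate."""
    return hours_saved_per_month * 12 * hourly_rate


def format_thousands(amount: int) -> str:
    """Format a dollar amount the way the calculator displays it, e.g. $655k."""
    return f"${round(amount / 1000)}k"


saved = annual_savings(hours_saved_per_month=840, hourly_rate=65)
print(saved, format_thousands(saved))
```

With the displayed inputs this gives $655,200 a year, which matches the $655k shown.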