Data


Demo data: the figures below are sample values, not computed from your own activity.

Files in pipeline: 2,140 (+18%, Q2 submissions)
Volume rose as three new source systems were onboarded.

Ingestion-ready: 71% (+6 pts; target ≥ 80%)
Most files pass; PII redaction is the main gap.

Avg prep effort: 2.4 hrs per file (−33%; was 3.6)
Automated profiling cut hands-on cleanup time.

Blocked / rejected: 214 (−4%; 10% of intake)
Mostly sensitive-data and licensing holds.

Ingestion funnel

Files received this quarter and where they end up

Received: 2,140 · 100.0%
Raw files submitted from all sources
Profiled: 2,086 · 97.5%
Readable & structure detected
Cleaned: 1,842 · 86.1%
Dedup, missing-value & format fixes
Guideline-cleared: 1,597 · 74.6%
PII, provenance & metadata resolved
Approved: 1,521 · 71.1%
Passed the ingestion gate
Handed to RAG: 1,521 · 71.1%
Embedded into the vector database
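Each funnel percentage is that stage's count divided by the 2,140 files received. A minimal Python sketch, assuming the stage counts above are final for the quarter, reproduces those percentages and also reports how many files drop out at each step. The names `FUNNEL` and `funnel_report` are illustrative, not part of the Data Lab.

```python
# Stage counts as shown in the ingestion funnel (Q2).
FUNNEL = [
    ("Received", 2140),
    ("Profiled", 2086),
    ("Cleaned", 1842),
    ("Guideline-cleared", 1597),
    ("Approved", 1521),
    ("Handed to RAG", 1521),
]


def funnel_report(stages):
    """Return (stage, count, % of received, files lost at this step) rows."""
    received = stages[0][1]
    rows = []
    prev = received
    for name, count in stages:
        share = round(100 * count / received, 1)  # percent of all files received
        lost = prev - count                       # drop-off from the previous stage
        rows.append((name, count, share, lost))
        prev = count
    return rows


if __name__ == "__main__":
    for name, count, share, lost in funnel_report(FUNNEL):
        print(f"{name:<18} {count:>5,}  {share:5.1f}%  lost {lost}")
```

The step-loss column makes visible what the cumulative percentages hide: which individual stage removes the most files.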

Where the effort goes

Avg analyst hours per file

Total: 2.4 hrs/file
  • Profiling: 0.3 hrs (13%)
  • Cleaning: 0.8 hrs (33%)
  • PII review: 0.6 hrs (25%)
  • Guideline sign-off: 0.4 hrs (17%)
  • Chunk tuning: 0.3 hrs (13%)
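Each share above is a stage's hours divided by the 2.4-hour total. The sketch below assumes the dashboard rounds half-up, which reproduces its 13% figures for the two 0.3-hour stages (0.3 / 2.4 is exactly 12.5%). It uses `Decimal` so that 12.5 stays exact; Python's built-in `round` would round 12.5 to 12. `EFFORT_HRS` and `effort_shares` are hypothetical names.

```python
from decimal import Decimal, ROUND_HALF_UP

# Average analyst hours per file by stage, as shown on the dashboard.
# Decimal keeps 0.3 / 2.4 at exactly 12.5% so half-up rounding is predictable.
EFFORT_HRS = {
    "Profiling": Decimal("0.3"),
    "Cleaning": Decimal("0.8"),
    "PII review": Decimal("0.6"),
    "Guideline sign-off": Decimal("0.4"),
    "Chunk tuning": Decimal("0.3"),
}


def effort_shares(effort):
    """Return (total hours, {stage: whole-number percent of total})."""
    total = sum(effort.values(), Decimal("0"))
    shares = {
        stage: int((100 * hrs / total).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
        for stage, hrs in effort.items()
    }
    return total, shares


if __name__ == "__main__":
    total, shares = effort_shares(EFFORT_HRS)
    print(f"Total: {total} hrs/file")
    for stage, pct in shares.items():
        print(f"  {stage}: {EFFORT_HRS[stage]} hrs ({pct}%)")
    print(f"Shares sum to {sum(shares.values())}% after rounding")
```

Because each share is rounded independently, the five percentages add up to 101% rather than 100%.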

Readiness trend

% of files approved on first pass vs. target

What the numbers mean

Effort is paying off

Automated profiling dropped average prep time from 3.6 to 2.4 hours per file while volume grew 18%.
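To put the per-file drop in quarterly terms, the sketch below applies both rates to this quarter's 2,140 files. It assumes the old 3.6-hour rate would have applied uniformly to the same volume, which is a counterfactual rather than a measured figure; `prep_savings` is a hypothetical helper.

```python
from decimal import Decimal

# Figures quoted on the dashboard; Decimal keeps 3.6 and 2.4 exact.
FILES_THIS_QUARTER = 2140
OLD_HRS_PER_FILE = Decimal("3.6")
NEW_HRS_PER_FILE = Decimal("2.4")


def prep_savings(files, old_rate, new_rate):
    """Return (hours at old rate, hours at new rate, % change in per-file effort).

    Hypothetical counterfactual: assumes the old rate would have applied
    to the same number of files.
    """
    old_total = files * old_rate
    new_total = files * new_rate
    change_pct = 100 * (new_rate - old_rate) / old_rate
    return old_total, new_total, change_pct


if __name__ == "__main__":
    old_total, new_total, change_pct = prep_savings(
        FILES_THIS_QUARTER, OLD_HRS_PER_FILE, NEW_HRS_PER_FILE
    )
    print(f"At 3.6 hrs/file: {old_total:,.0f} hrs")
    print(f"At 2.4 hrs/file: {new_total:,.0f} hrs")
    print(f"Saved: {old_total - new_total:,.0f} hrs ({change_pct:.1f}% per file)")
```

Under that assumption, the quarter needed 5,136 analyst hours instead of 7,704, a difference of 2,568 hours, with per-file effort down 33.3%.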

One gap remains

Readiness sits at 71% against an 80% target. The bottleneck is PII redaction and licensing sign-off, not data quality.
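The gap can also be expressed in files. Assuming the 80% target is measured the same way as the 71% figure (approved files over files received, 1,521 / 2,140), a short sketch computes how many more approvals the quarter needed. `approvals_needed` and `files_short_of_target` are hypothetical helpers; they use integer ceiling division so no floating-point rounding creeps in.

```python
# Quarter figures from the dashboard.
RECEIVED = 2140
APPROVED = 1521
TARGET_PCT = 80  # target share of received files that pass the ingestion gate


def approvals_needed(received, target_pct):
    """Smallest approval count with approved / received >= target_pct / 100.

    -(-a // b) is integer ceiling division, so no floats are involved.
    """
    return -(-received * target_pct // 100)


def files_short_of_target(received, approved, target_pct):
    """Extra approvals still needed to reach the target (0 if already met)."""
    return max(0, approvals_needed(received, target_pct) - approved)


if __name__ == "__main__":
    print(f"Approvals needed for {TARGET_PCT}%: {approvals_needed(RECEIVED, TARGET_PCT):,}")
    print(f"Short by: {files_short_of_target(RECEIVED, APPROVED, TARGET_PCT)} files")
```

At 2,140 files received, 80% is 1,712 approvals, so the quarter fell 191 files short.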

Recommended focus

Standardize redaction and pre-clear common source licenses to lift first-pass approval above the 80% target. That also means fewer conflicting answers downstream in the RAG Evaluator.

ROI calculator

What automating data prep is worth at your volume

Effort per file: 1.9 hrs (was 3.6 hrs)
Hours saved per month: 840 analyst hours
Saved per year: $655k at $65/hr

Here, "automation" means the share of profiling, cleaning, and PII clearance that the Data Lab handles before an analyst touches the file.
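The annual figure follows from monthly hours saved × 12 months × the hourly rate. The sketch below reproduces it and, as a labeled assumption, back-solves the monthly volume the calculator would imply if every saved hour came from the drop from 3.6 to 1.9 hours per file. The calculator's actual volume input is not shown on this page, and all names here are hypothetical.

```python
from decimal import Decimal

# Inputs shown on the ROI card.
HOURS_SAVED_PER_MONTH = Decimal("840")
HOURLY_RATE = Decimal("65")  # dollars per analyst hour
OLD_HRS_PER_FILE = Decimal("3.6")
NEW_HRS_PER_FILE = Decimal("1.9")


def annual_savings(hours_per_month, rate):
    """Dollars saved per year: monthly hours saved x 12 months x hourly rate."""
    return hours_per_month * 12 * rate


def implied_files_per_month(hours_per_month, old_rate, new_rate):
    """Hypothetical back-solve: assumes every saved hour comes from the
    per-file effort drop (old_rate - new_rate)."""
    return hours_per_month / (old_rate - new_rate)


if __name__ == "__main__":
    dollars = annual_savings(HOURS_SAVED_PER_MONTH, HOURLY_RATE)
    files = implied_files_per_month(
        HOURS_SAVED_PER_MONTH, OLD_HRS_PER_FILE, NEW_HRS_PER_FILE
    )
    print(f"Saved per year: ${dollars:,.0f}")
    print(f"Implied volume: ~{files:.0f} files/month")
```

840 × 12 × $65 = $655,200, which the card rounds to $655k; under the stated assumption, the implied volume is about 494 files per month.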