Executive Overview · Sudeep Lalka

Demo data — toggle to recompute from your own activity.

Files in pipeline

2,140

+18%

Q2 submissions

Volume up as three new source systems onboarded.

Ingestion-ready

71%

+6 pts

target ≥ 80%

Most files pass; PII redaction is the main gap.

Avg prep effort

2.4 hrs

−33%

per file · was 3.6 hrs

Automated profiling cut hands-on cleanup time.

Blocked / rejected

214

−4%

10% of intake

Mostly sensitive-data and licensing holds.

Ingestion funnel

Files received this quarter and where they end up

Received: 2,140 · 100.0%

Raw files submitted from all sources

Profiled: 2,086 · 97.5%

Readable & structure detected

Cleaned: 1,842 · 86.1%

Dedup, missing-value & format fixes

Guideline-cleared: 1,597 · 74.6%

PII, provenance & metadata resolved

Approved: 1,521 · 71.1%

Passed the ingestion gate

Handed to RAG: 1,521 · 71.1%

Embedded into the vector database
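
The funnel percentages above can be reproduced from the stage counts. The following is a minimal sketch, assuming each percentage is the stage count divided by files received and rounded to one decimal; the function and variable names are illustrative and not part of the dashboard.

```python
# Stage counts copied from the funnel above.
funnel = [
    ("Received", 2140),
    ("Profiled", 2086),
    ("Cleaned", 1842),
    ("Guideline-cleared", 1597),
    ("Approved", 1521),
    ("Handed to RAG", 1521),
]


def funnel_rows(stages):
    """Return (name, count, pct_of_received, dropped_since_previous_stage)."""
    received = stages[0][1]
    rows = []
    previous = received
    for name, count in stages:
        pct = round(100 * count / received, 1)
        rows.append((name, count, pct, previous - count))
        previous = count
    return rows


rows = funnel_rows(funnel)
for name, count, pct, dropped in rows:
    print(f"{name:<18} {count:>5,}  {pct:5.1f}%  dropped: {dropped}")
```

The last column shows how many files fall out at each step, which is the figure to watch when deciding where to focus.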

Where the effort goes

Avg analyst hours per file

2.4

hrs/file

Profiling: 0.3 hrs · 13%
Cleaning: 0.8 hrs · 33%
PII review: 0.6 hrs · 25%
Guideline sign-off: 0.4 hrs · 17%
Chunk tuning: 0.3 hrs · 13%
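
As a check on the breakdown above, the sketch below recomputes each step's share of the 2.4 hrs/file total and the change from the earlier 3.6 hrs. It assumes shares are rounded half-up to whole percentages, which would explain why the displayed shares add up to 101% rather than 100%. Variable names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hours per file by step, copied from the breakdown above.
effort_hours = {
    "Profiling": Decimal("0.3"),
    "Cleaning": Decimal("0.8"),
    "PII review": Decimal("0.6"),
    "Guideline sign-off": Decimal("0.4"),
    "Chunk tuning": Decimal("0.3"),
}


def whole_percent(value):
    """Round a Decimal to a whole number, rounding halves up (assumed)."""
    return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


total = sum(effort_hours.values())
shares = {step: whole_percent(100 * h / total) for step, h in effort_hours.items()}

# Change versus the earlier average of 3.6 hrs per file.
previous_total = Decimal("3.6")
change_pct = whole_percent(100 * (total - previous_total) / previous_total)

print(f"total: {total} hrs/file, change: {change_pct}%")
for step, pct in shares.items():
    print(f"{step:<20} {effort_hours[step]} hrs  {pct}%")
```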

Readiness trend

% of files approved on first pass vs. target

What the numbers mean

Effort is paying off

Automated profiling cut average prep time from 3.6 to 2.4 hours per file (−33%) while volume grew 18%.

One gap remains

Readiness sits at 71% against an 80% target. The bottleneck is PII redaction and licensing sign-off, not data quality.
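
To put the gap in file terms, here is a rough sketch using the counts shown in the funnel (2,140 received, 1,521 approved). It assumes the 80% target applies to the same received total; names are illustrative.

```python
received = 2140        # files received this quarter
approved = 1521        # files that passed the ingestion gate
target_percent = 80    # readiness target, in whole percent

# Smallest approved count that meets the target (integer ceiling division).
needed = (target_percent * received + 99) // 100
shortfall = needed - approved

print(f"need {needed:,} approved files; {shortfall} short of target")
```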

Recommended focus

Standardize redaction and pre-clear common source licenses to lift first-pass approval above target, which means fewer conflicting answers downstream in the RAG Evaluator.

ROI calculator

What automating data prep is worth at your volume

Files / month: 500 · Automation level: 60%

Effort / file

1.9 hrs

was 3.6 hrs

Hours saved / mo

840

analyst hours

Saved / year

$655k

at $65/hr

Automation level here means the share of profiling, cleaning, and PII clearance that the Data Lab handles before an analyst touches the file.
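
The calculator's figures fit together as simple arithmetic, sketched below using the values shown above. The page does not say how the 60% automation level becomes the hours-saved figure, so the sketch takes the displayed 840 hours as an input rather than guessing that formula. Function and variable names are illustrative.

```python
files_per_month = 500
hours_saved_per_month = 840       # as displayed by the calculator
hourly_rate_usd = 65
baseline_hours_per_file = 3.6     # effort per file before automation


def annual_savings_usd(hours_saved_monthly, rate_usd):
    """Annual saving: monthly hours saved x 12 months x hourly rate."""
    return hours_saved_monthly * 12 * rate_usd


savings = annual_savings_usd(hours_saved_per_month, hourly_rate_usd)
saved_per_file = hours_saved_per_month / files_per_month
effort_per_file = baseline_hours_per_file - saved_per_file

print(f"saved / year: ${savings / 1000:.0f}k")
print(f"effort / file: {effort_per_file:.1f} hrs (saves {saved_per_file:.2f} hrs per file)")
```

The displayed 1.9 hrs per file appears to be rounded: 840 hours across 500 files is 1.68 hours saved per file, leaving 1.92 hours.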