

What matters most for this initiative?

Pick a priority profile and the ranking re-weights live. There is no universally best model, only the best fit for your constraints. Adjust individual criterion weights below, or inspect the full comparison matrix at the bottom.

Criterion weights
#1

Open-weights — small (self-hosted / edge)

Best fit

Cheap, fast, private — for narrower tasks.

78 fit
Self-hosted (VPC / on-prem) · Open weights · Very low $/query · Fast on modest hardware · Smaller context window

Best for — On-prem or edge deployment, cost-critical volume, and well-scoped tasks you can fine-tune for.

Watch out — Limited raw reasoning; usually needs fine-tuning or tight retrieval to reach acceptable quality.

e.g. a small open-weights model on commodity hardware
#2

Fine-tuned small specialist

An open base trained for one repeatable task.

76 fit
Self-hosted (VPC / on-prem) · Open weights · Very low $/query at scale · Fast · Smaller context window

Best for — A narrow, high-volume task you can train for — often the best $/quality once it's dialed in.

Watch out — Upfront labeled data and a training/eval loop; brittle outside its trained domain.

e.g. a fine-tuned open base for one workflow
#3

Open-weights — large (self-hosted)

Strong model you run inside your own boundary.

73 fit
Self-hosted (VPC / on-prem) · Open weights · Infra cost, no per-token bill · Depends on your GPUs · Good context window

Best for — Regulated or sovereign data, deep customization, and avoiding a per-token vendor bill at scale.

Watch out — Real GPU + MLOps burden — you own capacity, uptime, patching, and evals.

e.g. a large open-weights family hosted in your VPC
#4

Multi-model router

A gateway that sends each query to the cheapest model that can handle it.

67 fit
Hosted API · Closed / API · Blended: cheap on easy queries · Moderate (routing hop) · Varies by target model

Best for — Mixed workloads — route easy questions to a cheap model and hard ones to a flagship, optimizing cost and quality together.

Watch out — You own the routing logic, evals, and fallbacks; a misroute quietly costs quality or money.

e.g. a model gateway in front of several providers
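A minimal sketch of the routing idea on this card: send each query to the cheapest tier whose ceiling covers the query's estimated difficulty, with a fallback to the most capable tier. Everything here is an assumption for illustration: the three tiers, their relative costs and difficulty ceilings, and the toy length-plus-keyword heuristic standing in for a real difficulty classifier are all hypothetical, not part of the lab or any gateway product.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_query: float  # hypothetical relative cost units
    max_difficulty: float  # hardest estimated difficulty (0..1) this tier is trusted with


# Hypothetical tiers; names and numbers are illustrative, not measured.
TIERS: tuple[Tier, ...] = (
    Tier("small-open", cost_per_query=0.1, max_difficulty=0.3),
    Tier("hosted-fast", cost_per_query=0.5, max_difficulty=0.7),
    Tier("hosted-flagship", cost_per_query=3.0, max_difficulty=1.0),
)

# Hypothetical keywords treated as signals of multi-step reasoning.
HARD_MARKERS = {"compare", "why", "plan", "prove", "analyze"}


def estimate_difficulty(query: str) -> float:
    """Toy heuristic: longer queries and reasoning keywords score harder.
    A production router would use a trained classifier checked against evals."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    score = min(len(words) / 60, 0.6)  # length contributes at most 0.6
    if HARD_MARKERS & set(words):
        score += 0.4
    return min(score, 1.0)


def route(query: str, tiers: tuple[Tier, ...] = TIERS) -> Tier:
    """Pick the cheapest tier whose ceiling covers the estimated difficulty;
    fall back to the most capable tier if none does."""
    difficulty = estimate_difficulty(query)
    for tier in sorted(tiers, key=lambda t: t.cost_per_query):
        if difficulty <= tier.max_difficulty:
            return tier
    return max(tiers, key=lambda t: t.max_difficulty)


print(route("What is our refund window?").name)
print(route("Compare the 2023 and 2024 retention policies and explain why they changed").name)
```

With these made-up thresholds the short lookup lands on `small-open` and the comparison question on `hosted-fast`. Every threshold and keyword is exactly the kind of routing logic the card warns you end up owning and evaluating.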
#5

Frontier hosted — fast / mini

Most of the quality, a fraction of the cost and latency.

66 fit
Hosted API · Closed / API · Low–moderate $/query · Fast · Large context window

Best for — High-volume, latency-sensitive workloads that still need solid reasoning — the workhorse default.

Watch out — Lower ceiling on the genuinely hard queries; still hosted, so residency and lock-in remain.

e.g. the fast/mini tier of a major hosted provider
#6

Regional / sovereign hosted

Managed convenience with a residency guarantee.

66 fit
Regional hosted · Closed / API · Moderate $/query · Moderate speed · Good context window

Best for — Meeting data-residency rules without running infrastructure yourself.

Watch out — Smaller model menu, still a managed dependency, and partial lock-in.

e.g. an in-region cloud or sovereign AI offering
#7

Frontier hosted — flagship

Top of the capability curve, via a managed API.

60 fit
Hosted API · Closed / API · Highest $/query · Moderate speed · Very large context window

Best for — The hardest reasoning, long-document synthesis, and agentic chains where quality is non-negotiable.

Watch out — Priciest per call, data leaves your boundary, and you inherit vendor lock-in and rate limits.

e.g. the flagship tier of a major hosted provider
#8

Multimodal generalist

Handles text plus images, scans, charts, and audio in one pipeline.

59 fit
Hosted API · Closed / API · Moderate–high $/query · Moderate speed · Large, multimodal context

Best for — Inputs beyond text — scanned documents, screenshots, diagrams, or audio that retrieval needs to read.

Watch out — You pay a premium for vision/audio capability; text-only quality can trail a text-specialized peer at the same price.

e.g. a vision + text hosted model
#9

Reasoning-specialized tier

Spends more compute to think through hard, multi-step problems.

54 fit
Hosted API · Closed / API · Very high $/answer · Slow (deliberate) · Large context window

Best for — Genuinely hard analysis, planning, math, or code where a slower, deeper answer is worth it.

Watch out — Slowest and priciest per answer — overkill and frustrating for simple lookups.

e.g. a reasoning-optimized hosted tier

Recommended

For: Balanced

Open-weights — small (self-hosted / edge)

78 fit

Strongest on data residency & control (96) and portability (95) under this profile. Watch its operational simplicity, its lowest score (46).

Pick an engine to lock it in

Choosing a model carries its cost, latency, and deployment profile downstream into AI Ops and Govern — and frames the retrieval tuning you do next in this lab.

Choose, then revisit

Model choice and retrieval design co-evolve. Treat this as the starting gate — re-run it once your evaluations show where quality, cost, or latency actually bind.

How weighting changes the pick

See each candidate's profile, and exactly which criteria are driving the leader's fit right now.

Capability profiles

The shape of the top three candidates across all criteria. A bigger, rounder shape is a stronger all-rounder; a spiky shape is a specialist.

What drives the fit of Open-weights — small (self-hosted / edge)

Each bar shows how much a criterion contributes to the 78-point fit right now: its weight × score, as a share of the total weighted score. Change a weight or profile and the bars re-proportion live; this is the lever that turns priorities into a pick.

Data residency & control: 15% (weight 3 × score 96)
Portability: 15% (weight 3 × score 95)
Customizability: 15% (weight 3 × score 92)
Cost efficiency: 14% (weight 3 × score 90)
Speed / latency: 14% (weight 3 × score 86)
Capability & quality: 10% (weight 3 × score 60)
Context headroom: 9% (weight 3 × score 56)
Operational simplicity: 7% (weight 3 × score 46)

Bar length shows the criterion's share of the fit; color shows how strong this model is on that axis (green = strong, red = weak). A long red bar is a heavily weighted weakness; a long green bar is why it’s winning.
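The numbers in this breakdown are consistent with a simple weighted mean: fit = Σ(wᵢ·sᵢ) / Σwᵢ, with each criterion's share = wᵢ·sᵢ / Σ(wᵢ·sᵢ). The sketch below assumes that formula (the lab does not state it) and uses the Balanced profile's weight of 3 on every criterion; the function name `fit_and_shares` is hypothetical. Under those assumptions it reproduces the 78-point fit and the percentages listed for each criterion.

```python
# Scores for the small open-weights candidate, copied from the comparison matrix.
SCORES = {
    "Data residency & control": 96,
    "Portability": 95,
    "Customizability": 92,
    "Cost efficiency": 90,
    "Speed / latency": 86,
    "Capability & quality": 60,
    "Context headroom": 56,
    "Operational simplicity": 46,
}

# Balanced profile: every criterion has weight 3 (the "w3" on each bar).
WEIGHTS = {name: 3 for name in SCORES}


def fit_and_shares(
    scores: dict[str, int], weights: dict[str, int]
) -> tuple[float, dict[str, float]]:
    """Assumed formula: fit is the weighted mean of the scores, and each
    criterion's share is its weight * score over the summed weight * score."""
    contributions = {name: weights[name] * scores[name] for name in scores}
    total = sum(contributions.values())
    fit = total / sum(weights[name] for name in scores)
    shares = {name: c / total for name, c in contributions.items()}
    return fit, shares


fit, shares = fit_and_shares(SCORES, WEIGHTS)
print(f"fit = {fit:.2f} (displayed as {round(fit)})")
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<26} {share:.0%}")
```

Because all weights are equal here, each share reduces to score / Σscores; raising one weight is what lets a single criterion dominate the bars.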

Full comparison matrix

Every candidate across every criterion. Higher is always better: cost, latency, and ops scores are already inverted, so a high number means cheap, fast, or simple to run.

Model | Capability & quality | Cost efficiency | Speed / latency | Context headroom | Data residency | Portability | Fine-tuning | Ops simplicity | Fit
Open-weights — small (self-hosted / edge) | 60 | 90 | 86 | 56 | 96 | 95 | 92 | 46 | 78
Fine-tuned small specialist | 70 | 88 | 86 | 52 | 92 | 84 | 98 | 40 | 76
Open-weights — large (self-hosted) | 85 | 60 | 50 | 72 | 96 | 95 | 96 | 30 | 73
Multi-model router | 88 | 80 | 68 | 80 | 38 | 62 | 64 | 54 | 67
Frontier hosted — fast / mini | 80 | 76 | 92 | 76 | 30 | 28 | 50 | 96 | 66
Regional / sovereign hosted | 75 | 56 | 66 | 70 | 80 | 45 | 50 | 86 | 66
Frontier hosted — flagship | 96 | 32 | 55 | 88 | 30 | 25 | 55 | 96 | 60
Multimodal generalist | 84 | 46 | 56 | 82 | 30 | 30 | 50 | 92 | 59
Reasoning-specialized tier | 98 | 24 | 30 | 86 | 30 | 26 | 45 | 95 | 54
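Recomputing the Fit column from the scores is a quick way to see how a priority profile turns into a ranking. This sketch assumes a weighted mean of the criterion scores (equal weights of 3 for Balanced), which matches every Fit value in the matrix. It also adds a hypothetical "quality-first" weight vector that is not one of the lab's presets; model labels are shortened for readability. With Balanced weights the sketch reproduces the Fit column; with the hypothetical weights, the flagship tier moves to the top and the small open-weights model drops to the bottom.

```python
# Criterion order matches the matrix columns.
CRITERIA = ["capability", "cost", "speed", "context",
            "residency", "portability", "fine_tuning", "ops"]

# Scores copied from the comparison matrix (labels shortened); Fit is recomputed.
MATRIX = {
    "open-small": [60, 90, 86, 56, 96, 95, 92, 46],
    "ft-specialist": [70, 88, 86, 52, 92, 84, 98, 40],
    "open-large": [85, 60, 50, 72, 96, 95, 96, 30],
    "router": [88, 80, 68, 80, 38, 62, 64, 54],
    "hosted-fast": [80, 76, 92, 76, 30, 28, 50, 96],
    "regional": [75, 56, 66, 70, 80, 45, 50, 86],
    "hosted-flagship": [96, 32, 55, 88, 30, 25, 55, 96],
    "multimodal": [84, 46, 56, 82, 30, 30, 50, 92],
    "reasoning": [98, 24, 30, 86, 30, 26, 45, 95],
}


def rank(matrix: dict[str, list[int]], weights: list[int]) -> list[tuple[str, float]]:
    """Weighted-mean fit per candidate, sorted best first (assumed formula)."""
    if len(weights) != len(CRITERIA):
        raise ValueError("need exactly one weight per criterion")
    total_w = sum(weights)
    fits = {
        name: sum(w * s for w, s in zip(weights, scores)) / total_w
        for name, scores in matrix.items()
    }
    return sorted(fits.items(), key=lambda kv: kv[1], reverse=True)


BALANCED = [3] * len(CRITERIA)
# Hypothetical profile, not a lab preset: favors capability, context, and ops simplicity.
QUALITY_FIRST = [5, 1, 2, 4, 1, 1, 1, 3]

for label, weights in [("Balanced", BALANCED), ("Quality-first (hypothetical)", QUALITY_FIRST)]:
    print(label)
    for name, fit in rank(MATRIX, weights):
        print(f"  {round(fit):>3}  {name}")
```

Nothing in the scores changes between the two runs; only the weights do, which is the whole point of re-running the pick once evaluations show which constraints actually bind.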