Model Fit · Sudeep Lalka

What matters most for this initiative?

Pick a priority profile and the ranking re-weights live. There's no universally best model, only the best fit for your constraints. Adjust the criterion weights below, or inspect the full matrix at the bottom.

Criterion weights

Capability & quality 3 · Cost efficiency 3 · Speed / latency 3 · Context headroom 3 · Data residency & control 3 · Portability 3 · Customizability 3 · Operational simplicity 3

Open-weights — small (self-hosted / edge)

Best fit

Cheap, fast, private — for narrower tasks.

Fit: 78

Self-hosted (VPC / on-prem) · Open weights · Very low $/query · Fast on modest hardware · Smaller context window

Best for — On-prem or edge deployment, cost-critical volume, and well-scoped tasks you can fine-tune for.

Watch out — Limited raw reasoning; usually needs fine-tuning or tight retrieval to hit quality.

e.g. a small open-weights model on commodity hardware

Fine-tuned small specialist

An open base trained for one repeatable task.

Fit: 76

Self-hosted (VPC / on-prem) · Open weights · Very low $/query at scale · Fast · Smaller context window

Best for — A narrow, high-volume task you can train for — often the best $/quality once it's dialed in.

Watch out — Upfront labeled data and a training/eval loop; brittle outside its trained domain.

e.g. a fine-tuned open base for one workflow

Open-weights — large (self-hosted)

Strong model you run inside your own boundary.

Fit: 73

Self-hosted (VPC / on-prem) · Open weights · Infra cost, no per-token bill · Depends on your GPUs · Good context window

Best for — Regulated or sovereign data, deep customization, and avoiding a per-token vendor bill at scale.

Watch out — Real GPU + MLOps burden — you own capacity, uptime, patching, and evals.

e.g. a large open-weights family hosted in your VPC

Multi-model router

A gateway that sends each query to the cheapest model that can handle it.

Fit: 67

Hosted API · Closed / API · Blended, cheap on easy queries · Moderate (routing hop) · Varies by target model

Best for — Mixed workloads — route easy questions to a cheap model and hard ones to a flagship, optimizing cost and quality together.

Watch out — You own the routing logic, evals, and fallbacks; a misroute quietly costs quality or money.

e.g. a model gateway in front of several providers
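The routing idea above can be sketched in a few lines. This is a minimal illustration, not a production gateway: the target names, relative costs, difficulty ceilings, and the keyword-based `estimate_difficulty` heuristic are all hypothetical placeholders. A real router would typically use a trained classifier or evals to decide what each model can handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    name: str              # hypothetical model label
    cost_per_query: float  # hypothetical relative cost
    max_difficulty: int    # hardest query (0-100) this target is trusted with


# Illustrative targets only; none of these values come from the page.
TARGETS = [
    Target("small-open-model", 0.1, 40),
    Target("fast-mini-tier", 1.0, 70),
    Target("flagship-tier", 10.0, 100),
]

HARD_KEYWORDS = ("prove", "plan", "compare", "why")


def estimate_difficulty(query: str) -> int:
    """Toy heuristic: longer queries and certain keywords count as harder."""
    score = min(len(query.split()) * 2, 60)
    # Naive substring match; good enough for a sketch, too crude for real use.
    if any(keyword in query.lower() for keyword in HARD_KEYWORDS):
        score += 40
    return min(score, 100)


def route(query: str, targets=TARGETS) -> Target:
    """Return the cheapest target whose ceiling covers the query."""
    difficulty = estimate_difficulty(query)
    capable = [t for t in targets if t.max_difficulty >= difficulty]
    if not capable:
        # Fallback: nothing is rated high enough, so use the most capable target.
        return max(targets, key=lambda t: t.max_difficulty)
    return min(capable, key=lambda t: t.cost_per_query)


print(route("What is our refund window?").name)
```

The fallback branch is where the "misroute" risk lives: if the difficulty estimate is wrong, the router either overpays or quietly returns a weaker answer, which is why routing logic needs its own evals.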

Frontier hosted — fast / mini

Most of the quality, a fraction of the cost and latency.

Fit: 66

Hosted API · Closed / API · Low–moderate $/query · Fast · Large context window

Best for — High-volume, latency-sensitive workloads that still need solid reasoning — the workhorse default.

Watch out — Lower ceiling on the genuinely hard queries; still hosted, so residency and lock-in remain.

e.g. the fast/mini tier of a major hosted provider

Regional / sovereign hosted

Managed convenience with a residency guarantee.

Fit: 66

Regional hosted · Closed / API · Moderate $/query · Moderate · Good context window

Best for — Meeting data-residency rules without running infrastructure yourself.

Watch out — Smaller model menu, still a managed dependency, and partial lock-in.

e.g. an in-region cloud or sovereign AI offering

Frontier hosted — flagship

Top of the capability curve, via a managed API.

Fit: 60

Hosted API · Closed / API · Highest $/query · Moderate · Very large context window

Best for — The hardest reasoning, long-document synthesis, and agentic chains where quality is non-negotiable.

Watch out — Priciest per call, data leaves your boundary, and you inherit vendor lock-in and rate limits.

e.g. the flagship tier of a major hosted provider

Multimodal generalist

Handles text plus images, scans, charts, and audio in one pipeline.

Fit: 59

Hosted API · Closed / API · Moderate–high $/query · Moderate · Large, multimodal context

Best for — Inputs beyond text — scanned documents, screenshots, diagrams, or audio that retrieval needs to read.

Watch out — Pays a premium for vision/audio; text-only quality can trail a text-specialized peer at the same price.

e.g. a vision + text hosted model

Reasoning-specialized tier

Spends more compute to think through hard, multi-step problems.

Fit: 54

Hosted API · Closed / API · Very high $/answer · Slow (deliberate) · Large context window

Best for — Genuinely hard analysis, planning, math, or code where a slower, deeper answer is worth it.

Watch out — Slowest and priciest per answer — overkill and frustrating for simple lookups.

e.g. a reasoning-optimized hosted tier

How weighting changes the pick

See each candidate's profile, and exactly which criteria are driving the leader's fit right now.

Capability profiles

The shape of the top three candidates across all criteria. A bigger, rounder shape is a stronger all-rounder; a spiky shape is a specialist.
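One way to put numbers on "bigger" and "rounder" is to treat the mean score as the size of the shape and the standard deviation as its spikiness. The page does not define either term, so this is only an assumed reading. The scores are those of the top three candidates from the full comparison matrix, with shortened labels.

```python
from statistics import mean, pstdev

# Scores in matrix column order: capability, cost, speed, context,
# residency, portability, customizability, ops simplicity.
TOP_THREE = {
    "Open-weights small": [60, 90, 86, 56, 96, 95, 92, 46],
    "Fine-tuned small specialist": [70, 88, 86, 52, 92, 84, 98, 40],
    "Open-weights large": [85, 60, 50, 72, 96, 95, 96, 30],
}


def shape_summary(scores):
    """Assumed metrics: mean as overall size, population std dev as spikiness."""
    return {"size": mean(scores), "spikiness": pstdev(scores)}


for name, scores in TOP_THREE.items():
    summary = shape_summary(scores)
    print(f"{name}: size={summary['size']:.1f}, spikiness={summary['spikiness']:.1f}")
```

Under this reading, a candidate with a high size and low spikiness is the all-rounder, while a high spikiness flags a specialist whose strengths come with clear weak axes.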

What drives Open-weights — small (self-hosted / edge)’s fit

Each bar is how much a criterion contributes to the 78-point fit right now (weight × score). Change a weight or profile and the bars re-proportion live — that’s the lever turning priorities into a pick.

Data residency & control: 15% (weight 3 × score 96)

Portability: 15% (weight 3 × score 95)

Customizability: 15% (weight 3 × score 92)

Cost efficiency: 14% (weight 3 × score 90)

Speed / latency: 14% (weight 3 × score 86)

Capability & quality: 10% (weight 3 × score 60)

Context headroom: 9% (weight 3 × score 56)

Operational simplicity: 7% (weight 3 × score 46)

Bar length = share of the fit it drives; color = how strong this model is on that axis (green strong → red weak). A long red bar is a heavily-weighted weakness; a long green bar is why it’s winning.
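The page does not spell out its formula, but the numbers shown are consistent with a weighted average of the scores for the fit and with each criterion's weight × score divided by the weighted total for the bar shares. The sketch below reproduces the leader's figures under that assumption; the function names are illustrative.

```python
# Scores for the current leader, copied from the driver bars above.
SCORES = {
    "Data residency & control": 96,
    "Portability": 95,
    "Customizability": 92,
    "Cost efficiency": 90,
    "Speed / latency": 86,
    "Capability & quality": 60,
    "Context headroom": 56,
    "Operational simplicity": 46,
}
WEIGHTS = {criterion: 3 for criterion in SCORES}  # every weight at 3


def fit(scores, weights):
    """Assumed formula: weighted average of the criterion scores."""
    total_weight = sum(weights[c] for c in scores)
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(weights[c] * scores[c] for c in scores) / total_weight


def contribution_shares(scores, weights):
    """Each criterion's share of the weighted total (the bar lengths)."""
    weighted_total = sum(weights[c] * scores[c] for c in scores)
    return {c: weights[c] * scores[c] / weighted_total for c in scores}


print(round(fit(SCORES, WEIGHTS)))  # 78
for criterion, share in contribution_shares(SCORES, WEIGHTS).items():
    print(f"{criterion}: {share:.0%}")
```

With equal weights the shares depend only on the scores, which is why the three strongest axes each land at about 15% and operational simplicity, the weakest, at about 7%.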

Full comparison matrix

Every candidate across every criterion. Higher is always better: cost, latency, and ops are already inverted, so a high score means cheaper, faster, or simpler to operate.

Model	Capability & quality	Cost efficiency	Speed / latency	Context headroom	Data residency & control	Portability	Customizability	Operational simplicity	Fit
Open-weights — small (self-hosted / edge)	60	90	86	56	96	95	92	46	78
Fine-tuned small specialist	70	88	86	52	92	84	98	40	76
Open-weights — large (self-hosted)	85	60	50	72	96	95	96	30	73
Multi-model router	88	80	68	80	38	62	64	54	67
Frontier hosted — fast / mini	80	76	92	76	30	28	50	96	66
Regional / sovereign hosted	75	56	66	70	80	45	50	86	66
Frontier hosted — flagship	96	32	55	88	30	25	55	96	60
Multimodal generalist	84	46	56	82	30	30	50	92	59
Reasoning-specialized tier	98	24	30	86	30	26	45	95	54
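To see how re-weighting changes the pick, the matrix can be re-ranked under different weights. The sketch below assumes the fit is a weighted average of the criterion scores, which matches the Fit column when every weight is 3. Model labels are shortened, and the `QUALITY_FIRST` profile is a hypothetical example, not one of the page's built-in profiles.

```python
# Column order: capability, cost, speed, context, residency,
# portability, customizability, ops simplicity.
MATRIX = {
    "Open-weights small": [60, 90, 86, 56, 96, 95, 92, 46],
    "Fine-tuned small specialist": [70, 88, 86, 52, 92, 84, 98, 40],
    "Open-weights large": [85, 60, 50, 72, 96, 95, 96, 30],
    "Multi-model router": [88, 80, 68, 80, 38, 62, 64, 54],
    "Frontier hosted fast/mini": [80, 76, 92, 76, 30, 28, 50, 96],
    "Regional / sovereign hosted": [75, 56, 66, 70, 80, 45, 50, 86],
    "Frontier hosted flagship": [96, 32, 55, 88, 30, 25, 55, 96],
    "Multimodal generalist": [84, 46, 56, 82, 30, 30, 50, 92],
    "Reasoning-specialized tier": [98, 24, 30, 86, 30, 26, 45, 95],
}

EQUAL = [3, 3, 3, 3, 3, 3, 3, 3]          # the default weights
QUALITY_FIRST = [5, 1, 1, 4, 1, 1, 1, 4]  # hypothetical profile


def weighted_fit(scores, weights):
    """Assumed formula: weighted average of the criterion scores."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def rank(weights, matrix=MATRIX):
    """Return (model, fit) pairs sorted from best fit to worst."""
    fits = [(name, weighted_fit(scores, weights)) for name, scores in matrix.items()]
    return sorted(fits, key=lambda pair: -pair[1])


for label, weights in (("equal", EQUAL), ("quality-first", QUALITY_FIRST)):
    leader, score = rank(weights)[0]
    print(f"{label}: {leader} ({score:.1f})")
```

Under equal weights the small open-weights model leads; shifting weight toward capability, context, and operational simplicity moves the flagship hosted tier to the top, which illustrates the page's point that the best fit depends on the constraints.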
