What matters most for this initiative?
Pick a priority profile — the ranking re-weights live. There's no universally best model, only the best fit for your constraints. Fine-tune the criterion weights below, or inspect the full matrix at the bottom.
Open-weights — small (self-hosted / edge)
Best fitCheap, fast, private — for narrower tasks.
Best for — On-prem or edge deployment, cost-critical volume, and well-scoped tasks you can fine-tune for.
Watch out — Limited raw reasoning; usually needs fine-tuning or tight retrieval to hit quality.
Fine-tuned small specialist
An open base trained for one repeatable task.
Best for — A narrow, high-volume task you can train for — often the best $/quality once it's dialed in.
Watch out — Upfront labeled data and a training/eval loop; brittle outside its trained domain.
Open-weights — large (self-hosted)
Strong model you run inside your own boundary.
Best for — Regulated or sovereign data, deep customization, and avoiding a per-token vendor bill at scale.
Watch out — Real GPU + MLOps burden — you own capacity, uptime, patching, and evals.
Multi-model router
A gateway that sends each query to the cheapest model that can handle it.
Best for — Mixed workloads — route easy questions to a cheap model and hard ones to a flagship, optimizing cost and quality together.
Watch out — You own the routing logic, evals, and fallbacks; a misroute quietly costs quality or money.
Frontier hosted — fast / mini
Most of the quality, a fraction of the cost and latency.
Best for — High-volume, latency-sensitive workloads that still need solid reasoning — the workhorse default.
Watch out — Lower ceiling on the genuinely hard queries; still hosted, so residency and lock-in remain.
Regional / sovereign hosted
Managed convenience with a residency guarantee.
Best for — Meeting data-residency rules without running infrastructure yourself.
Watch out — Smaller model menu, still a managed dependency, and partial lock-in.
Frontier hosted — flagship
Top of the capability curve, via a managed API.
Best for — The hardest reasoning, long-document synthesis, and agentic chains where quality is non-negotiable.
Watch out — Priciest per call, data leaves your boundary, and you inherit vendor lock-in and rate limits.
Multimodal generalist
Handles text plus images, scans, charts, and audio in one pipeline.
Best for — Inputs beyond text — scanned documents, screenshots, diagrams, or audio that retrieval needs to read.
Watch out — Pays a premium for vision/audio; text-only quality can trail a text-specialized peer at the same price.
Reasoning-specialized tier
Spends more compute to think through hard, multi-step problems.
Best for — Genuinely hard analysis, planning, math, or code where a slower, deeper answer is worth it.
Watch out — Slowest and priciest per answer — overkill and frustrating for simple lookups.
Recommended
For: Balanced
Open-weights — small (self-hosted / edge)
Leads on Data residency & control and Portability for this profile. Watch its operational simplicity.
Pick an engine to lock it in
Choose, then revisit
How weighting changes the pick
See each candidate's profile, and exactly which criteria are driving the leader's fit right now.
Capability profiles
The shape of the top three candidates across all criteria. A bigger, rounder shape is a stronger all-rounder; a spiky shape is a specialist.
What drives Open-weights — small (self-hosted / edge)’s fit
Each bar is how much a criterion contributes to the 78-point fit right now (weight × score). Change a weight or profile and the bars re-proportion live — that’s the lever turning priorities into a pick.
Bar length = share of the fit it drives; color = how strong this model is on that axis (green strong → red weak). A long red bar is a heavily-weighted weakness; a long green bar is why it’s winning.
Full comparison matrix
Every candidate across every criterion (higher is better; cost, latency and ops are already inverted).
| Model | Capability & quality | Cost efficiency | Speed / latency | Context headroom | Data residency | Portability | Fine-tuning | Ops simplicity | Fit |
|---|---|---|---|---|---|---|---|---|---|
| Open-weights — small (self-hosted / edge) | 60 | 90 | 86 | 56 | 96 | 95 | 92 | 46 | 78 |
| Fine-tuned small specialist | 70 | 88 | 86 | 52 | 92 | 84 | 98 | 40 | 76 |
| Open-weights — large (self-hosted) | 85 | 60 | 50 | 72 | 96 | 95 | 96 | 30 | 73 |
| Multi-model router | 88 | 80 | 68 | 80 | 38 | 62 | 64 | 54 | 67 |
| Frontier hosted — fast / mini | 80 | 76 | 92 | 76 | 30 | 28 | 50 | 96 | 66 |
| Regional / sovereign hosted | 75 | 56 | 66 | 70 | 80 | 45 | 50 | 86 | 66 |
| Frontier hosted — flagship | 96 | 32 | 55 | 88 | 30 | 25 | 55 | 96 | 60 |
| Multimodal generalist | 84 | 46 | 56 | 82 | 30 | 30 | 50 | 92 | 59 |
| Reasoning-specialized tier | 98 | 24 | 30 | 86 | 30 | 26 | 45 | 95 | 54 |