Stage 03 · Build — Under the Hood
Under the Hood: model internals
A lightweight, optional explanation of transformers, attention, embeddings, and ML frameworks — and where they fit in an enterprise AI lifecycle. This is explanation, not implementation: the Command Center operates above the raw model layer.
This Command Center focuses on enterprise AI delivery decisions, not low-level model training. This optional layer explains the model concepts underneath the lifecycle: how transformers use attention, how embeddings support retrieval, how fine-tuning changes behavior, and where frameworks like PyTorch or TensorFlow fit in real AI engineering.
Architecture
What is a transformer?
A transformer is the architecture behind most modern LLMs. It converts tokens into contextual representations and uses attention to weigh which parts of the input matter most when predicting the next token.
1. Tokens. Text is split into pieces the model can process.
2. Embeddings. Tokens are converted into numeric representations.
3. Attention. The model weighs relationships between tokens.
4. Layers. Multiple transformer layers refine context.
5. Output probabilities. The model predicts likely next tokens.
6. Generated response. The response is assembled token by token.
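The last two steps in the list above (output probabilities, then token-by-token generation) can be sketched in a few lines. This is a toy illustration and not a transformer: `toy_logits` is a hypothetical stand-in for a model's forward pass, and the four-word vocabulary is invented for the example.

```python
import math

VOCAB = ["<eos>", "expenses", "are", "reimbursed"]  # hypothetical toy vocabulary


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(tokens):
    """Hypothetical stand-in for a model forward pass.

    Gives the highest score to the word that follows the last token
    in a fixed phrase; a real model computes these scores from learned weights.
    """
    order = ["expenses", "are", "reimbursed", "<eos>"]
    nxt = order[(order.index(tokens[-1]) + 1) % len(order)]
    return [3.0 if word == nxt else 0.0 for word in VOCAB]


def generate(prompt, max_new_tokens=5):
    """Greedy decoding: repeatedly append the most probable next token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(tokens))
        best = VOCAB[probs.index(max(probs))]
        if best == "<eos>":  # stop when the end-of-sequence token wins
            break
        tokens.append(best)
    return tokens


print(generate(["expenses"]))
```

Real systems usually sample from the probabilities (temperature, top-p) rather than always taking the single most likely token; greedy decoding is used here only because it is the simplest case.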
This product does not implement a transformer. It shows how enterprise teams evaluate, operate, govern, and measure systems built with models that often use transformer architectures.
Inside the context window
What does attention do?
Attention helps a model decide which parts of the input are most relevant to the current generation step — assigning different weights to different tokens based on context.
Example question: “Can I reimburse travel expenses after 30 days?”
[Token relevance visual: the question's tokens grouped as more relevant and less relevant.]
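The weighting idea described above can be sketched as scaled dot-product attention for a single query. This is a minimal sketch under stated assumptions: the 2-dimensional vectors are invented for illustration, and a real transformer adds learned query/key/value projections, multiple heads, and many layers.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Returns the attention weights and the weighted sum of the values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Hypothetical vectors: "travel" and "expenses" point in a similar direction,
# while "the" points elsewhere.
tokens = ["travel", "expenses", "the"]
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
values = keys  # reuse keys as values to keep the sketch small
query = [1.0, 0.0]

weights, output = attention(query, keys, values)
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.2f}")
```

Tokens whose key vectors align with the query receive larger weights, so they contribute more to the output vector.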
Attention is not retrieval. Attention helps the model use information already inside its context window. Retrieval decides which external evidence enters the context window in the first place.
| Concept | What it does | Where this product shows it |
|---|---|---|
| Attention | Weighs relationships inside the model context | Under the Hood |
| Retrieval | Selects external evidence before generation | Build/RAG |
| Re-ranking | Improves which retrieved evidence is sent forward | Retrieval modes |
| Evaluation | Tests whether the final answer is grounded and correct | Evaluations |
Semantic space
What are embeddings?
Embeddings are numeric representations of text or documents. Texts with similar meanings sit closer together in embedding space, which is why embeddings power semantic search and RAG.
In this portfolio demo, vector retrieval uses deterministic local representations to demonstrate retrieval behavior without an external embedding API or vector database. In production, embeddings might come from OpenAI, MiniLM, Voyage, or Cohere models and be stored in a vector database. See the embedding projector and retrieval modes.
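A deterministic local representation can be as simple as a hashed bag-of-words vector compared by cosine similarity. The sketch below is one possible approach and is not necessarily what this demo does: `embed`, `DIM`, and the sample documents are all hypothetical. It captures word overlap only, whereas learned embedding models also capture meaning.

```python
import hashlib
import math

DIM = 64  # hypothetical vector size


def embed(text):
    """Deterministic hashed bag-of-words vector (illustrative, not a learned embedding)."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        # hashlib gives the same bucket on every run, unlike the built-in hash()
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, docs, top_k=2):
    """Return the top_k (score, document) pairs, highest similarity first."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


docs = ["travel expenses", "parking"]  # invented sample corpus
for score, doc in search("travel expenses", docs):
    print(f"{score:.2f}  {doc}")
```

Swapping `embed` for a call to a real embedding model, and the list scan for a vector database query, gives the production shape described above.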
| Method | Good for | Limitation |
|---|---|---|
| BM25 | Exact keyword and term matching | Misses semantic matches |
| Embeddings | Semantic similarity | Can retrieve vague/broad neighbors |
| Hybrid | Lexical + semantic balance | Needs tuning |
| Re-ranking | Final, governance-aware ordering | Adds latency and complexity |
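One common way to build the hybrid row in the table above is reciprocal rank fusion (RRF), which merges a lexical ranking and a semantic ranking using only their rank positions. This is a sketch of that technique, not a description of this product's retrieval modes; the document ids are invented, and `k=60` is a commonly used default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in;
    a larger k reduces the advantage of top-ranked documents.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical outputs of a BM25 search and an embedding search
bm25_ranking = ["policy-7", "policy-2", "faq-1"]
embedding_ranking = ["policy-2", "faq-9", "policy-7"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, which is the lexical plus semantic balance the table refers to. The "needs tuning" limitation shows up in choices such as `k` or per-list weights.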
Decisioning
Why RAG often comes before fine-tuning
For enterprise knowledge workflows, answers must be grounded in current, approved sources. Fine-tuning changes behavior, tone, and task performance, but it does not solve source freshness, citations, or policy version control.
| Need | Better first choice | Why |
|---|---|---|
| Current policy answers | RAG | Sources change and citations matter |
| Consistent tone/format | Prompting or fine-tuning | Behavior pattern is stable |
| Domain classification | Fine-tuning or traditional ML | Labeled examples can teach the task |
| Evidence-backed answers | RAG | Retrieval provides source grounding |
| Multi-step action workflow | Agent/tools + governance | Needs permissions, logs, approval |
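To make the RAG rows in the table concrete, the sketch below shows how retrieved passages might be placed into a prompt with their source ids and versions, so that answers can cite them. The `Passage` fields, the prompt wording, and the sample policy text are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source_id: str  # hypothetical document identifier
    version: str    # hypothetical policy version label
    text: str


def build_grounded_prompt(question, passages):
    """Assemble a prompt that restricts the model to the cited sources."""
    lines = [
        "Answer using only the sources below. Cite source ids in brackets.",
        "If the sources do not contain the answer, say so.",
        "",
        "Sources:",
    ]
    for p in passages:
        lines.append(f"[{p.source_id} v{p.version}] {p.text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Invented example passage; a retriever would supply these in practice.
passages = [Passage("travel-policy", "3", "Claims are due within 30 days.")]
print(build_grounded_prompt("Can I claim after 30 days?", passages))
```

Because the sources are supplied at request time, updating a policy means updating the corpus, not retraining the model.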
Fine-tuning can be valuable, but it raises requirements around labeled data, train/validation/test splits, overfitting, regression testing, monitoring, and governance, which is why Training Readiness assesses these requirements before recommending fine-tuning.
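As one illustration of the splits requirement, examples can be assigned to train, validation, and test sets by hashing a stable id, so the assignment does not change between runs or as the dataset grows. This is a sketch under stated assumptions: `assign_split`, the percentages, and the example ids are hypothetical, and it does not describe how Training Readiness works.

```python
import hashlib


def assign_split(example_id, val_pct=10, test_pct=10):
    """Map a stable example id to 'train', 'validation', or 'test'.

    The same id always lands in the same split, which helps keep
    evaluation data out of the training set across reruns.
    """
    digest = hashlib.sha256(example_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "validation"
    return "train"


for example_id in ["ticket-001", "ticket-002", "ticket-003"]:  # invented ids
    print(example_id, assign_split(example_id))
```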
Framework placement
Where PyTorch and TensorFlow fit
PyTorch and TensorFlow are ML frameworks used to build, train, fine-tune, and experiment with models. They sit below this Command Center's operating layer.
This product does not use PyTorch or TensorFlow directly — it is a static portfolio demo focused on lifecycle decisions, not GPU-backed training. In production, outputs here could connect to ML workflows using PyTorch, TensorFlow, Hugging Face, MLflow, W&B, SageMaker, Vertex AI, or Azure ML.
| Layer | Examples | Role |
|---|---|---|
| Model development | PyTorch, TensorFlow, JAX | Build / train / fine-tune models |
| Experiment tracking | MLflow, W&B | Track runs, metrics, artifacts |
| Model registry | MLflow, SageMaker, Vertex AI | Version and promote models |
| Retrieval infrastructure | Pinecone, Weaviate, pgvector, Milvus | Store / search embeddings |
| Observability | Arize, WhyLabs, LangSmith | Monitor drift, cost, quality |
| AI Command Center | This product | Coordinate decisions, readiness, governance, value |
Explanatory content only — none of these tools are dependencies of this product.
Tie-back
How model internals connect to the lifecycle
| Stage | Model-internals relevance |
|---|---|
| Strategy | Choose prompting, RAG, tools, fine-tuning, traditional ML, or hybrid |
| Data | Prepare RAG corpus, eval datasets, training data, labels, metadata, telemetry |
| Build/RAG | Configure retrieval, embeddings, prompts, evals, and model behavior |
| Operate | Monitor latency, cost, drift, regression, incidents, rollback |
| Govern | Require evidence, controls, human review, auditability, risk decisions |
| Realize | Translate model quality + operational risk into adoption, ROI, leakage |
For reviewers
What this layer demonstrates
- Transformer awareness
- Attention vs retrieval
- Embedding literacy
- Fine-tuning judgment
- Framework placement
- Lifecycle integration
Model internals boundary
This product does not train a transformer, implement an attention layer, or run PyTorch/TensorFlow workloads. It explains these concepts only where they affect enterprise AI decisions. The Command Center operates at the lifecycle layer: deciding what to build, what data is ready, how the system is evaluated, operated, and governed, and whether it creates business value.