Stage 03 · Build — Under the Hood

Under the Hood: model internals

A lightweight, optional explanation of transformers, attention, embeddings, and ML frameworks — and where they fit in an enterprise AI lifecycle. This is explanation, not implementation: the Command Center operates above the raw model layer.

This Command Center focuses on enterprise AI delivery decisions, not low-level model training. This optional layer explains the model concepts underneath the lifecycle: how transformers use attention, how embeddings support retrieval, how fine-tuning changes behavior, and where frameworks like PyTorch or TensorFlow fit in real AI engineering.

Architecture

What is a transformer?

The architecture behind most modern LLMs. It converts tokens into contextual representations and uses attention to estimate which parts of the input matter most when generating the next output.

1. Tokens. Text is split into pieces the model can process.
2. Embeddings. Tokens are converted into numeric representations.
3. Attention. The model weighs relationships between tokens.
4. Layers. Multiple transformer layers refine context.
5. Output probabilities. The model predicts likely next tokens.
6. Generated response. The response is assembled token by token.
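
The six steps above can be sketched as a toy forward pass. This is an illustration only, not part of this product and not a real transformer: the embeddings are random and untrained, the raw embeddings stand in for the learned query/key/value projections, and there is a single attention pass instead of a stack of layers. The names `toy_next_token_probs` and `generate` are hypothetical.

```python
import math
import random


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def toy_next_token_probs(text, vocab, dim=4, seed=0):
    rng = random.Random(seed)
    # 1. Tokens: a whitespace split stands in for a real subword tokenizer.
    tokens = text.lower().split()
    # 2. Embeddings: one random (untrained) vector per vocabulary entry.
    table = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}
    x = [table[t] for t in tokens]
    # 3. Attention: the last position scores every position, and the
    #    softmax weights mix their vectors into one contextual vector.
    query = x[-1]
    weights = softmax([dot(query, key) / math.sqrt(dim) for key in x])
    context = [sum(w * vec[i] for w, vec in zip(weights, x)) for i in range(dim)]
    # 4. Layers: a real model repeats step 3 many times; this sketch does one pass.
    # 5. Output probabilities: score the context against every vocab embedding.
    logits = [dot(context, table[w]) for w in vocab]
    return dict(zip(vocab, softmax(logits)))


def generate(text, vocab, steps=3):
    # 6. Generated response: append the most likely token, then repeat.
    for _ in range(steps):
        probs = toy_next_token_probs(text, vocab)
        text += " " + max(probs, key=probs.get)
    return text
```

Because the weights are random, the generated tokens are meaningless; the point is the shape of the loop, where each new token is chosen from a probability distribution and fed back in as input.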

This product does not implement a transformer. It shows how enterprise teams evaluate, operate, govern, and measure systems built with models that often use transformer architectures.

Inside the context window

What does attention do?

Attention helps a model decide which parts of the input are most relevant to the current generation step — assigning different weights to different tokens based on context.

Example question

“Can I reimburse travel expenses after 30 days?”

More relevant tokens

reimburse · travel expenses · after 30 days · policy date · exception rule

Less relevant

filler words · unrelated sections · retired policy refs

Attention is not retrieval. Attention helps the model use information already inside its context window. Retrieval decides which external evidence enters the context window in the first place.
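
To make the boundary concrete, here is a minimal sketch of the retrieval side: a step that runs before generation and decides which documents are placed in the context at all. It assumes a simple word-overlap score and a hypothetical three-document corpus (`DOCS`); real systems would use BM25, embeddings, or both. Attention only ever sees whatever `build_context` returns.

```python
def retrieve(question, documents, k=2):
    """Pick up to k documents sharing the most words with the question."""
    q_words = set(question.lower().replace("?", "").split())
    scored = []
    for doc_id, text in documents.items():
        overlap = len(q_words & set(text.lower().split()))
        scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for overlap, doc_id in scored[:k] if overlap > 0]


def build_context(question, documents, k=2):
    """Assemble the text that would be sent to the model."""
    chosen = retrieve(question, documents, k)
    evidence = "\n".join(f"[{d}] {documents[d]}" for d in chosen)
    return f"{evidence}\n\nQuestion: {question}"


# Hypothetical corpus, for illustration only.
DOCS = {
    "travel-policy": "travel expenses must be submitted within 30 days of the trip",
    "exceptions": "late travel expenses need manager approval as an exception",
    "parking": "visitor parking passes are issued at the front desk",
}
```

With the example question above, the parking document never enters the context, so no amount of attention inside the model can use it or be distracted by it.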

Concept	What it does	Where this product shows it
Attention	Weighs relationships inside the model context	Under the Hood
Retrieval	Selects external evidence before generation	Build/RAG
Re-ranking	Improves which retrieved evidence is sent forward	Retrieval modes
Evaluation	Tests whether the final answer is grounded and correct	Evaluations

Semantic space

What are embeddings?

Numeric representations of text or documents. Similar meanings sit closer together in embedding space — which is why embeddings power semantic search and RAG.

In this portfolio demo, vector retrieval uses deterministic local representations to demonstrate retrieval behavior without an external embedding API or vector database. In production, embeddings might come from hosted models from OpenAI, Voyage, or Cohere, or from open models such as MiniLM, and be stored in a vector database. See the embedding projector and retrieval modes.
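
One way to get deterministic local vectors without any external service is the hashing trick, sketched below. This is an assumption for illustration, not necessarily what the demo does, and the function names are hypothetical. Note that hashed word counts capture word overlap, not learned meaning; a trained embedding model is what places paraphrases close together.

```python
import hashlib
import math


def hashed_vector(text, dim=256):
    """Map text to a fixed-size count vector using a stable hash."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # sha256 is stable across runs and machines, unlike Python's hash().
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, 0.0 for no overlap."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
```

Ranking documents by `cosine(hashed_vector(query), hashed_vector(doc))` gives repeatable results on every run, which is the property a static demo needs.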

Method	Good for	Limitation
BM25	Exact keyword and term matching	Misses semantic matches
Embeddings	Semantic similarity	Can retrieve vague/broad neighbors
Hybrid	Lexical + semantic balance	Needs tuning
Re-ranking	Final, governance-aware ordering	Adds latency and complexity
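
The BM25 and hybrid rows can be sketched as follows. This is a minimal from-scratch Okapi BM25 with common default parameters (`k1=1.5`, `b=0.75`), fused with semantic scores by reciprocal rank fusion. The fusion method and the constant `k=60` are assumptions chosen for simplicity; the semantic scores are assumed to come from an embedding model elsewhere and must cover the same document ids.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(w for toks in tokenized.values() for w in set(toks))
    scores = {}
    for d, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[d] = score
    return scores


def ranks(scores):
    """Convert scores to 1-based ranks; ties are broken by document id."""
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return {d: i + 1 for i, d in enumerate(ordered)}


def hybrid_rrf(lexical, semantic, k=60):
    """Fuse two score dicts by reciprocal rank fusion and return ordered ids."""
    lex_r, sem_r = ranks(lexical), ranks(semantic)
    fused = {d: 1 / (k + lex_r[d]) + 1 / (k + sem_r[d]) for d in lexical}
    return sorted(fused, key=lambda d: (-fused[d], d))
```

Rank fusion sidesteps the problem that BM25 and cosine scores live on different scales, but the "needs tuning" limitation still applies: the value of `k` and the relative weight of each ranker change the outcome.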

Decisioning

Why RAG often comes before fine-tuning

For enterprise knowledge workflows, answers must be grounded in current, approved sources. Fine-tuning changes behavior, tone, and task performance, but it does not solve source freshness, citations, or policy version control.

Need	Better first choice	Why
Current policy answers	RAG	Sources change and citations matter
Consistent tone/format	Prompting or fine-tuning	Behavior pattern is stable
Domain classification	Fine-tuning or traditional ML	Labeled examples can teach the task
Evidence-backed answers	RAG	Retrieval provides source grounding
Multi-step action workflow	Agent/tools + governance	Needs permissions, logs, approval
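
The table above can be read as an ordered set of rules. A minimal sketch, assuming hypothetical boolean flags to describe a need and an assumed fallback of plain prompting when no rule matches:

```python
def first_choice(need):
    """Return a starting approach for a need described by boolean flags.

    The flag names are illustrative, not a product API.
    """
    if need.get("takes_actions"):
        # Multi-step action workflows need permissions, logs, and approval.
        return "Agent/tools + governance"
    if need.get("sources_change") or need.get("needs_citations"):
        # Changing sources and citation requirements point to retrieval.
        return "RAG"
    if need.get("has_labeled_examples"):
        # Labeled examples can teach a classification task.
        return "Fine-tuning or traditional ML"
    if need.get("stable_style"):
        # A stable behavior pattern such as tone or format.
        return "Prompting or fine-tuning"
    # Assumed default, not stated in the table.
    return "Prompting"
```

For example, `first_choice({"needs_citations": True})` returns `"RAG"`. The rule order is itself a judgment call: here an action workflow takes priority even if it also needs citations.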

Fine-tuning can be valuable, but it raises requirements around labeled data, train/validation/test splits, overfitting, regression against the current baseline, monitoring, and governance. That is why Training Readiness evaluates these requirements before recommending it.
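
A few of these requirements can be expressed as simple checks. The sketch below is hypothetical and is not the product's Training Readiness logic: the function names, the 80/10/10 split, and the thresholds (`min_labeled=500`, `max_gap=0.05`) are all illustrative assumptions.

```python
import random


def split_examples(examples, seed=0, train=0.8, val=0.1):
    """Shuffle deterministically, then cut into train/validation/test."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def readiness_issues(n_labeled, train_acc, val_acc, baseline_acc,
                     min_labeled=500, max_gap=0.05):
    """List blocking issues; an empty list means no issue was detected."""
    issues = []
    if n_labeled < min_labeled:
        issues.append("too few labeled examples")
    if train_acc - val_acc > max_gap:
        # A large train/validation gap suggests memorization.
        issues.append("possible overfitting")
    if val_acc < baseline_acc:
        # The candidate is worse than what is already deployed.
        issues.append("regression against baseline")
    return issues
```

An empty result is not an approval by itself; monitoring and governance requirements sit outside what accuracy numbers can show.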

Framework placement

Where PyTorch and TensorFlow fit

ML frameworks used to build, train, fine-tune, and experiment with models. They sit below this Command Center's operating layer.

This product does not use PyTorch or TensorFlow directly — it is a static portfolio demo focused on lifecycle decisions, not GPU-backed training. In production, outputs here could connect to ML workflows using PyTorch, TensorFlow, Hugging Face, MLflow, W&B, SageMaker, Vertex AI, or Azure ML.

Layer	Examples	Role
Model development	PyTorch, TensorFlow, JAX	Build / train / fine-tune models
Experiment tracking	MLflow, W&B	Track runs, metrics, artifacts
Model registry	MLflow, SageMaker, Vertex AI	Version and promote models
Retrieval infrastructure	Pinecone, Weaviate, pgvector, Milvus	Store / search embeddings
Observability	Arize, WhyLabs, LangSmith	Monitor drift, cost, quality
AI Command Center	This product	Coordinate decisions, readiness, governance, value

Explanatory content only — none of these tools are dependencies of this product.

Tie-back

How model internals connect to the lifecycle

Stage	Model-internals relevance
Strategy	Choose prompting, RAG, tools, fine-tuning, traditional ML, or hybrid
Data	Prepare RAG corpus, eval datasets, training data, labels, metadata, telemetry
Build/RAG	Configure retrieval, embeddings, prompts, evals, and model behavior
Operate	Monitor latency, cost, drift, regression, incidents, rollback
Govern	Require evidence, controls, human review, auditability, risk decisions
Realize	Translate model quality + operational risk into adoption, ROI, leakage

For reviewers

What this layer demonstrates

Transformer awareness

Understands the architecture behind modern LLMs without pretending to rebuild one.

Attention vs retrieval

Separates what happens inside the context window from how external evidence is selected.

Embedding literacy

Connects embeddings to semantic search, vector/hybrid retrieval, and RAG quality.

Fine-tuning judgment

Picks RAG, prompting, fine-tuning, traditional ML, or hybrid by use case, not by default.

Framework placement

Shows where PyTorch/TensorFlow belong without becoming a training platform.

Lifecycle integration

Ties model concepts to evaluation, operations, governance, and value.

Model internals boundary

This product does not train a transformer, implement an attention layer, or run PyTorch/TensorFlow workloads. It explains these concepts only where they affect enterprise AI decisions. The Command Center operates at the lifecycle layer: deciding what to build, what data is ready, how the system is evaluated, operated, and governed, and whether it creates business value.