LLM Internals

Learn through engaging narratives and interactive challenges

6 lessons available

Lessons

1. First Contact

A Large Language Model (LLM) is a neural network with billions of parameters, trained on massive amounts of text. It generates text by predicting the next token (a word piece) from everything that came before. It doesn't 'know' facts; it recognizes patterns. Models range from 7B parameters (fast, cheap) to 400B+ (powerful, expensive).
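The predict-the-next-token loop can be sketched with a toy bigram model. `train_bigrams` and `generate` are illustrative names invented for this sketch, and greedy "pick the likeliest" stands in for a real model's learned probabilities and sampling:

```python
from collections import Counter, defaultdict

def train_bigrams(tokens):
    # Count which token follows which in the training text.
    # A real LLM learns far richer statistics with billions of parameters.
    follows = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1
    return follows

def generate(follows, start, n):
    # Repeatedly predict the next token from the one before it (greedy decoding).
    out = [start]
    for _ in range(n):
        candidates = follows.get(out[-1])
        if not candidates:
            break
        out.append(candidates.most_common(1)[0][0])
    return out

follows = train_bigrams("the cat sat on the mat".split())
print(generate(follows, "the", 3))  # → ['the', 'cat', 'sat', 'on']
```

The difference between this toy and an LLM is scale and context: a bigram model looks back one token, while an LLM conditions on everything before it.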

2. Decoding the Signal

LLMs don't read words; they read tokens. Tokenization splits text into subword pieces, typically with Byte Pair Encoding (BPE). 'Hello' is one token, but 'tokenization' might be split into ['token', 'ization']. Spaces often attach to the word that follows them. This is why API pricing is per token, and why context windows are measured in tokens (e.g., 128K tokens ≈ 300 pages).
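A toy sketch of how BPE segments a word: start from single characters and apply learned merge rules in order. The merge table below is hypothetical and hand-picked to reproduce the ['token', 'ization'] split; a real vocabulary holds tens of thousands of merges learned from data:

```python
def bpe_segment(word, merges):
    # Toy BPE: begin with single characters, then apply merge rules in
    # the order they were learned, fusing adjacent pairs as they match.
    pieces = list(word)
    for a, b in merges:
        i = 0
        while i < len(pieces) - 1:
            if pieces[i] == a and pieces[i + 1] == b:
                pieces[i:i + 2] = [a + b]
            else:
                i += 1
    return pieces

# Hypothetical merge table for illustration only.
merges = [("t", "o"), ("to", "k"), ("e", "n"), ("tok", "en"),
          ("i", "o"), ("io", "n"), ("t", "ion"), ("a", "tion"),
          ("z", "ation"), ("i", "zation")]

print(bpe_segment("tokenization", merges))  # → ['token', 'ization']
```

Counting the pieces this produces is, conceptually, what per-token pricing and context-window limits measure.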

3. Mapping ARIA's Meaning Space

After tokenization, each token is converted to an embedding: a vector (a list of numbers) that captures its meaning. Similar meanings produce similar vectors. 'Happy' and 'joyful' have nearby vectors; 'happy' and 'database' are far apart. Cosine similarity measures this: 1.0 means identical direction, 0.0 means unrelated. This is the foundation of semantic search and RAG.
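Cosine similarity is simple to compute by hand. The 4-dimensional vectors below are made up for illustration; real embedding models produce vectors with hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    # Angle-based similarity: 1.0 = same direction, 0.0 = orthogonal (unrelated).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 4-dimensional vectors for illustration.
happy    = [0.9, 0.8, 0.1, 0.0]
joyful   = [0.85, 0.75, 0.2, 0.05]
database = [0.0, 0.1, 0.9, 0.95]

print(cosine_similarity(happy, joyful))    # high: similar meaning
print(cosine_similarity(happy, database))  # low: unrelated
```

Semantic search and RAG boil down to running this comparison between a query vector and many document vectors, then keeping the highest-scoring matches.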

4. The Transmission Protocol

Every LLM API call is a JSON request with: 'model' (which LLM to use), 'messages' (an array of role/content pairs, where the system message sets behavior and the user message asks the question), 'temperature' (0 = near-deterministic, 1 = creative, 2 = wild), and 'max_tokens' (a limit on response length). The system prompt is invisible to the end user but controls everything. This is the OpenAI-compatible format used by Moonshot, Deepseek, and many others.
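A minimal sketch of such a request body in Python; the model name is a placeholder, and the message contents are invented for illustration:

```python
import json

# A minimal OpenAI-compatible chat request; "example-model" is a placeholder.
request = {
    "model": "example-model",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},  # sets behavior
        {"role": "user", "content": "What is a token?"},                # the question
    ],
    "temperature": 0.2,   # low = mostly deterministic
    "max_tokens": 256,    # cap on response length
}
body = json.dumps(request, indent=2)
print(body)
```

Serializing the dict with `json.dumps` gives exactly the body that gets POSTed, regardless of which client language sends it.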

5. Live Transmission

The same JSON request works everywhere: curl, Java HttpClient, Python requests, JavaScript fetch. This is the OpenAI-compatible REST API format used by Moonshot, Deepseek, OpenAI, and many others. You POST to /v1/chat/completions with an Authorization header and a JSON body; the response contains choices[0].message.content. Different providers, same format: just change the base URL.
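A sketch of the call using only Python's standard library. The URL, key, and model name are placeholders to swap for your provider's values; the actual network send is commented out so the sketch stays self-contained:

```python
import json
import urllib.request

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder base URL
API_KEY = "YOUR_API_KEY"                                 # placeholder key

payload = {
    "model": "example-model",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Switching providers means changing only `API_URL`, `API_KEY`, and the model name; the payload and response shape stay the same.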

6. The Augmented Signal

Production AI systems don't send your raw question to the LLM. They enhance it: a system prompt sets persona and rules, retrieved documents provide context (RAG, Retrieval Augmented Generation), and your question comes last. The user types 'What's our refund policy?' but the real prompt includes the entire policy document. RAG uses embeddings (lesson 3) to find relevant docs, then injects them into the prompt. This is how AI chatbots answer questions about YOUR data without retraining the model.
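One way the assembly step might look. `build_rag_prompt` is an invented helper name, the policy document is made up, and the retrieval step itself (embedding the query and ranking documents) is omitted:

```python
def build_rag_prompt(question, retrieved_docs, persona):
    # Assemble the augmented prompt: persona first, retrieved context next,
    # and the user's question last.
    context = "\n\n".join(retrieved_docs)
    return [
        {"role": "system",
         "content": f"{persona}\n\nAnswer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ]

docs = ["Refunds are accepted within 30 days of purchase with a receipt."]  # made-up doc
messages = build_rag_prompt("What's our refund policy?", docs,
                            "You are a support bot.")
```

The user only ever typed the question; everything else in `messages` was injected by the application before the API call.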
