Mapping ARIA's Meaning Space
Concept:
After tokenization, each token is converted to an embedding: a vector (list of numbers) that captures its meaning. Similar meanings produce similar vectors. 'Happy' and 'joyful' have nearby vectors; 'happy' and 'database' are far apart. Cosine similarity measures this on a scale from -1 to 1: a score near 1.0 means the vectors point in almost the same direction (nearly identical meaning), and a score near 0.0 means the meanings are unrelated. This is the foundation of semantic search and retrieval-augmented generation (RAG).
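As a minimal sketch of the measurement described above, cosine similarity is the dot product of two vectors divided by the product of their lengths. The three-number vectors below are hand-picked for illustration and are an assumption; they are not real embeddings, which typically have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors chosen by hand, NOT output from a real model.
happy = [0.9, 0.8, 0.1]
joyful = [0.8, 0.9, 0.2]
database = [0.1, 0.0, 0.9]

print(f"happy vs joyful:   {cosine_similarity(happy, joyful):.4f}")
print(f"happy vs database: {cosine_similarity(happy, database):.4f}")
```

With these toy values the first pair scores close to 1 and the second much lower. The function only looks at direction, not length, so scaling a vector does not change its score.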
Science Officer Chen:
Commander, I've made a breakthrough. I've mapped how ARIA organizes concepts internally. It's not a dictionary; it's a space. A vast, multi-dimensional space.
Commander Vega:
A space? Explain.
Science Officer Chen:
Every token gets converted into a vector: a list of hundreds of numbers. Think of it as coordinates. 'Star' and 'sun' have similar coordinates, so they're neighbors in ARIA's meaning space. But 'star' and 'breakfast' are galaxies apart.
Commander Vega:
So similar meanings are close together?
Science Officer Chen:
Exactly. We measure closeness with cosine similarity, a score from -1 to 1. 'Cat on the windowsill' and 'kitten sleeping on the couch' score about 0.90, which means almost the same meaning. 'Cat on the windowsill' and 'stock market crash' score 0.04, which means essentially unrelated.
Commander Vega:
This is how it understands what we mean, even when we use different words.
Science Officer Chen:
Precisely. And it gets more fascinating. Remember from lesson 00 how ARIA responded to your question? It used these vectors to understand your meaning. And in lesson 05, we'll see how this powers something called RAG: finding relevant documents by meaning, not keywords.
Commander Vega:
Show me the meaning map. Let me compare some concepts.
Example Code:
Sentence A: "The cat sat on the warm windowsill"
Sentence B: "A kitten was sleeping on the couch"
Cosine similarity: 0.8966
Very similar! Both describe a small feline resting.
Sentence A: "The cat sat on the warm windowsill"
Sentence C: "The stock market crashed on Monday"
Cosine similarity: 0.0411
Very different meanings.
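The comparisons above extend naturally to search: score one sentence against several others and sort by similarity. The sketch below assumes hand-written four-number vectors in a hypothetical `toy_embeddings` table, so its scores will not match the 0.8966 and 0.0411 reported above, which would come from a real embedding model. The helper name `rank_by_similarity` is also illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical vectors chosen by hand for illustration only.
# A real embedding model would supply these, with far more dimensions.
toy_embeddings = {
    "The cat sat on the warm windowsill": [0.9, 0.7, 0.1, 0.0],
    "A kitten was sleeping on the couch": [0.8, 0.8, 0.2, 0.1],
    "The stock market crashed on Monday": [0.0, 0.1, 0.9, 0.8],
}

def rank_by_similarity(query, embeddings):
    """Return (sentence, score) pairs for every other sentence, best match first."""
    query_vec = embeddings[query]
    scores = [
        (sentence, cosine_similarity(query_vec, vec))
        for sentence, vec in embeddings.items()
        if sentence != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranking = rank_by_similarity("The cat sat on the warm windowsill", toy_embeddings)
for sentence, score in ranking:
    print(f"{score:.4f}  {sentence}")
```

Sorting by score rather than matching words is what lets the kitten sentence rank first even though it shares almost no vocabulary with the query.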
Your Assignment
Compare two sentences by entering their numbers (e.g., '0 3'). Type 'list' first to see all available sentences, then pick two and see how similar the LLM thinks they are.
LLM Console