AI Zero to Hero (Part 2): Fixing Hallucinations with RAG
Why do AIs lie? How can we give them memory? A practical guide to Context Windows, Vector Databases, and Retrieval-Augmented Generation (RAG).
Retrieval-Augmented Generation (RAG) is a technique that gives an AI model access to external, private, or up-to-date information before it answers a question. Instead of relying solely on what it learned during training, the AI "reads" relevant documents you provide and uses them to formulate an accurate response.
In Part 1, we learned that LLMs are essentially advanced autocompletes predicting the next word. Because they are just predicting text, they don't actually know what is true. If they don't have the right information in their training data, they will confidently invent plausible-sounding nonsense to finish the sentence. This is called a hallucination.
If you are building enterprise software, hallucinations are unacceptable. You can't have a customer service bot inventing new return policies. So, how do we fix this?
The Context Window
Every time you talk to an LLM, you are placing text into its Context Window. This is the AI's short-term memory. If you paste a 10-page PDF into the chat box and ask "Summarize this," you are putting that PDF into the context window. The AI can read it directly, so it is far less likely to hallucinate the summary.
So, why not just put your company's entire database into the context window every time you ask a question?
- Cost: You are billed per token. Sending millions of tokens for every single question would bankrupt you (the sketch after this list shows the math).
- Speed: Processing massive amounts of text takes time.
- Limits: Even the best models have hard limits on their context windows (e.g., 128k or 200k tokens).
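To make the billing point concrete, here is a minimal sketch using OpenAI's tiktoken library to count tokens before you send text to a model. The encoding name and the per-token price are assumptions; check your provider's docs and pricing page for the real numbers.

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the tokenizer used by many recent OpenAI models;
# assumption: swap in the encoding that matches the model you call.
enc = tiktoken.get_encoding("cl100k_base")

# In practice this would be your whole PDF or database dump.
document = "Our remote work policy allows two days per week at home."

num_tokens = len(enc.encode(document))
print(f"Document is {num_tokens} tokens.")

# At a hypothetical $3 per million input tokens, a 500,000-token database
# sent with EVERY question costs ~$1.50 per question -- before you even
# pay for the answer's tokens.
print(f"Cost to send once: ${num_tokens / 1_000_000 * 3.00:.6f}")
```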
We need a way to only send the relevant parts of our database to the AI.
Enter the Vector Database
Remember embeddings from Part 1? They are lists of numbers that represent the mathematical "meaning" of text.
When you want to build an AI app that knows your company's data, you don't use a normal SQL database. You use a Vector Database.
- You take all your company's documents and chop them into small chunks (a paragraph or so each).
- You convert each paragraph into an embedding (numbers).
- You save these numbers into a Vector Database.
Unlike SQL, which looks for exact keyword matches (e.g., SELECT * FROM docs WHERE text LIKE '%vacation%'), a Vector Database searches by meaning. It calculates the distance between the numbers to find concepts that are mathematically close to each other.
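To see that distance math in action, here is a minimal, self-contained sketch. The 3-dimensional vectors are made up by hand for illustration (real embeddings come from a model and have hundreds or thousands of dimensions), but the nearest-neighbor logic is exactly what a vector database runs at scale.

```python
# A minimal sketch of "search by meaning": cosine similarity over embeddings.
import numpy as np

# Hypothetical document chunks with hand-made toy embeddings.
chunks = {
    "Employees may work from home two days per week.": np.array([0.9, 0.1, 0.0]),
    "Vacation requests require manager approval.": np.array([0.1, 0.9, 0.0]),
    "The office closes at 6pm on Fridays.": np.array([0.2, 0.2, 0.9]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: closer to 1.0 = closer in meaning."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend this is the embedding of "What is our remote work policy?".
# Note: it shares no keywords with the winning chunk -- only meaning.
query_vector = np.array([0.85, 0.15, 0.05])

best_match = max(chunks, key=lambda text: cosine_similarity(chunks[text], query_vector))
print(best_match)  # -> Employees may work from home two days per week.
```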
Putting it Together: The RAG Workflow
RAG stands for Retrieval-Augmented Generation. Here is how you build a production AI app that keeps hallucinations in check (a full code sketch follows the steps):
- User Asks: The user types, "What is our policy on remote work?"
- Embed the Query: Your app converts that question into an embedding (numbers).
- Retrieve: Your app queries the Vector Database: "Find the paragraphs in our database that are mathematically closest in meaning to this question."
- Augment: The database returns the top 3 most relevant paragraphs (e.g., a snippet from the employee handbook about working from home).
- Generate: Behind the scenes, your app pastes those 3 paragraphs into the LLM's Context Window along with the user's original question, plus an instruction: "Answer the user's question using ONLY the provided context."
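Wired together, the whole loop fits on one page. Below is a sketch using the OpenAI Python client; the model names are assumptions (swap in whatever you use), and search_vector_db is a hypothetical stand-in for your real vector database query, i.e. the same similarity search shown in the previous section.

```python
# pip install openai
# A sketch of the full RAG loop. Assumptions: the model names, and
# search_vector_db(), a stand-in for your real vector database query.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    """Step 2: convert text into an embedding (a list of numbers)."""
    response = client.embeddings.create(
        model="text-embedding-3-small",  # assumption: use your embedding model
        input=text,
    )
    return response.data[0].embedding

def search_vector_db(query_vector: list[float], top_k: int = 3) -> list[str]:
    """Step 3 stand-in (hypothetical): a real version would run the
    similarity search from the previous section against your database."""
    return [
        "Employees may work remotely up to two days per week.",
        "Remote days must be agreed with your manager in advance.",
        "Core hours of 10am-4pm apply regardless of location.",
    ][:top_k]

def answer(question: str) -> str:
    query_vector = embed(question)               # Step 2: Embed the query
    top_chunks = search_vector_db(query_vector)  # Step 3: Retrieve
    context = "\n\n".join(top_chunks)            # Step 4: Augment

    # Step 5: Generate -- instruct the model to stick to the retrieved facts.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: use your chat model
        messages=[
            {"role": "system",
             "content": "Answer the user's question using ONLY the provided "
                        "context. If the answer isn't in the context, say so."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer("What is our policy on remote work?"))  # Step 1: User asks
```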
The LLM now acts as a reading comprehension engine rather than a memory engine. It reads the facts you gave it and generates an answer grounded in those facts instead of in its fuzzy training memory.
In Part 3, we'll look at how to take the training wheels off. What if the AI doesn't just read documents, but actually clicks buttons, runs code, and takes actions? Welcome to the world of Agentic AI.