RAG By Pantheraa Space 18 Jun 2026 Updated 01 Aug 2026 1 min read

How RAG actually works (with code)

Retrieval-Augmented Generation grounds an LLM in your own data. The core loop in four steps — plus a tiny Python retriever.

RAG (Retrieval-Augmented Generation) grounds an LLM in your own data instead of relying only on what it memorized during training.

The core idea

1. Embed your documents into vectors.
2. Retrieve the most similar chunks for a query.
3. Augment the prompt with those chunks.
4. Generate an answer grounded in them.
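To make the embed step concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `chunk`, `toy_embed`, the vocabulary, and the sample text are hypothetical, and normalized word counts stand in for a real embedding model, which would return dense learned vectors instead.

```python
import math
import re
from collections import Counter

def chunk(text, size=6):
    """Split text into chunks of at most `size` words (a deliberately naive splitter)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def toy_embed(text, vocab):
    """Hypothetical stand-in for an embedding model: L2-normalized word counts over `vocab`."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    # Text with no vocabulary words maps to the zero vector, which is left as-is.
    return [x / norm for x in vec] if norm else vec

# Hypothetical vocabulary and source text, chosen only for this example.
vocab = ["rag", "retrieval", "chunk", "vector", "prompt"]
source = ("RAG pairs retrieval with generation. Each chunk becomes a vector. "
          "The prompt carries retrieved text.")

# Each stored document keeps its text next to its vector.
docs = [{"text": c, "vec": toy_embed(c, vocab)} for c in chunk(source)]
```

Keeping the text alongside the vector matters later: retrieval ranks by the vector, but it is the text that ends up in the prompt.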

Similarity is usually cosine similarity between embeddings:

$$ \text{sim}(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} $$
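As a worked example of the formula, take two illustrative vectors (chosen here only for easy arithmetic), $a = (1, 2, 2)$ and $b = (2, 0, 1)$:

$$ a \cdot b = 1 \cdot 2 + 2 \cdot 0 + 2 \cdot 1 = 4, \qquad \lVert a \rVert = \sqrt{1 + 4 + 4} = 3, \qquad \lVert b \rVert = \sqrt{4 + 0 + 1} = \sqrt{5} $$

$$ \text{sim}(a, b) = \frac{4}{3\sqrt{5}} \approx 0.596 $$

The score always lies between -1 and 1, and it depends only on the directions of the vectors, not their lengths. If both vectors are already normalized to unit length, the denominator is 1 and the similarity is just the dot product.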

Here's a minimal retrieval loop in Python:

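import numpy as np

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the vector norms.
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # A zero vector has no direction, so score it 0 instead of dividing by zero.
    return float(a @ b / denom) if denom else 0.0

def retrieve(query_vec, docs, k=3):
    # Score every document against the query, then keep the k highest-scoring ones.
    scored = [(cosine(query_vec, d["vec"]), d) for d in docs]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [d for _, d in scored[:k]]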
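With more than a handful of documents, scoring them one at a time in a Python loop gets slow. One common alternative, sketched here under assumptions, stacks the document vectors into a matrix and scores them all with a single matrix product. The name `retrieve_batch` and the sample vectors are hypothetical, and the sketch assumes no vector is all zeros.

```python
import numpy as np

def retrieve_batch(query_vec, doc_matrix, k=3):
    """Return (indices, scores) of the k rows of doc_matrix most similar to query_vec.

    Assumes every row and the query are non-zero vectors.
    """
    q = query_vec / np.linalg.norm(query_vec)      # unit-length query
    norms = np.linalg.norm(doc_matrix, axis=1)     # one norm per document row
    scores = (doc_matrix @ q) / norms              # cosine similarity for every row
    top = np.argsort(-scores)[:k]                  # indices sorted by descending score
    return top, scores[top]

# Hypothetical 2-D vectors, chosen only to keep the example readable.
doc_matrix = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.2])
idx, scores = retrieve_batch(query, doc_matrix, k=2)
```

This returns row indices rather than document dicts, so the caller maps them back to the stored chunks. A vector database does the same job at larger scale, typically with approximate search rather than an exact scan.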

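The retrieved chunks are placed in the context window, ahead of the user's question, before generation. The mechanism is simple, but it lets the model answer from your current documents rather than only from what it memorized during training.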
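The augment step might look like the sketch below. It is one possible layout, not a fixed recipe: `build_prompt`, the instruction wording, the `"text"` key on each chunk, and the `max_chars` budget are all assumptions made for illustration.

```python
def build_prompt(question, chunks, max_chars=2000):
    """Assemble a prompt that places retrieved chunks ahead of the question.

    Assumes each chunk is a dict with a "text" key (a hypothetical field).
    """
    context_parts = []
    used = 0
    for i, c in enumerate(chunks, start=1):
        entry = f"[{i}] {c['text']}"
        if used + len(entry) > max_chars:
            break  # stop once the rough character budget is reached
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical retrieved chunks, in ranked order.
chunks = [
    {"text": "RAG retrieves relevant chunks before generation."},
    {"text": "Similarity is measured between embeddings."},
]
prompt = build_prompt("What does RAG retrieve?", chunks)
```

Numbering the chunks lets the model cite which passage it used, and the character budget is a crude stand-in for counting tokens against the model's context limit.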

#rag #embeddings #vector-db

