AI Glossary: RAG, Retrieval & Knowledge
RAG architecture has become fundamental to AI products, enabling grounded, up-to-date, and domain-specific responses.
RAG (Retrieval-Augmented Generation)
Enhancing LLM responses by retrieving relevant external information before generating output. Addresses outdated training data, hallucinations, and domain-specific knowledge gaps. Often more cost-effective than fine-tuning for keeping knowledge current, because updating the indexed documents updates what the model can draw on without retraining.
Vector Database
Specialized database for storing high-dimensional embeddings and querying them by similarity, typically with approximate nearest-neighbor (ANN) indexes. Backbone of many RAG systems, enabling semantic search across millions of documents with sub-second queries. Popular options: Pinecone, Qdrant, Milvus, Weaviate.
Reference: Johnson, J., Douze, M., & Jégou, H., "Billion-scale similarity search with GPUs" (FAISS), IEEE Transactions on Big Data, 2019
Additional: Douze, M. et al., "The Faiss library", 2024
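To make the store-and-search idea concrete, here is a minimal sketch of an in-memory index that does exact (brute-force) cosine search. The class name `TinyVectorIndex`, the document ids, and the 3-dimensional vectors are all hypothetical. Production vector databases typically replace the linear scan with approximate nearest-neighbor indexes so that queries stay fast at scale.

```python
import math

class TinyVectorIndex:
    """Hypothetical in-memory index: exact (brute-force) cosine search."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        """Store one embedding under its document id."""
        self._items.append((doc_id, vector))

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        """Score every stored vector against the query and return the top k."""
        scored = [(doc_id, self._cosine(query, vec)) for doc_id, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

index = TinyVectorIndex()
index.add("doc-a", [0.9, 0.1, 0.0])  # toy vectors, not real embeddings
index.add("doc-b", [0.0, 1.0, 0.0])
index.add("doc-c", [0.7, 0.3, 0.1])
hits = index.search([1.0, 0.0, 0.0], k=2)
```

Here `hits` holds the two stored vectors that point most nearly in the same direction as the query, each paired with its similarity score.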
Embeddings (for Search)
Dense numerical representations capturing semantic meaning, where similar concepts are mathematically close. Enable semantic search: a query for "car mechanic" can find "automobile repair" even without keyword overlap.
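"Mathematically close" is usually measured with cosine similarity. The sketch below uses hand-made 3-dimensional vectors purely for illustration; they are assumptions, not the output of any real embedding model, and the name `toy_embeddings` is hypothetical.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hand-made vectors for illustration only; real embedding models
# produce hundreds or thousands of dimensions.
toy_embeddings = {
    "car mechanic":      [0.9, 0.8, 0.1],
    "automobile repair": [0.8, 0.9, 0.2],
    "banana bread":      [0.1, 0.0, 0.9],
}

query = toy_embeddings["car mechanic"]
related = cosine_similarity(query, toy_embeddings["automobile repair"])
unrelated = cosine_similarity(query, toy_embeddings["banana bread"])
```

With these toy values, `related` comes out far higher than `unrelated`, even though "car mechanic" and "automobile repair" share no words.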
Semantic Search
Search that matches on meaning and intent rather than just keywords. Uses embeddings to find the documents whose meaning is closest to the query. Modern RAG systems often combine semantic search with keyword search (e.g. BM25) in "hybrid search" configurations.
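Hybrid search needs a way to merge the two result lists. One common option, not prescribed by this glossary, is reciprocal rank fusion (RRF), sketched below. The document ids and both input rankings are hypothetical, and `k = 60` is a conventional default rather than a required value.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical result lists for one query, best match first.
semantic_hits = ["doc-2", "doc-5", "doc-1"]  # from embedding similarity
keyword_hits = ["doc-5", "doc-3", "doc-2"]   # from a keyword ranker such as BM25

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

Documents that rank well in both lists rise to the top of `fused`. Because only rank positions are used, the two retrievers' raw scores never need to be put on a common scale.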
Chunking
Breaking documents into smaller pieces for embedding and retrieval. Strategies: fixed-length (e.g. 500 tokens with overlap), sentence-based, semantic (grouping related content). Chunk size affects retrieval quality: chunks that are too large dilute relevance, chunks that are too small lose context, and poor chunking undermines the entire RAG system.
Reference: ⚠️ No single authoritative foundational paper — practical engineering technique evolved from IR practices.
Recent: Li et al., "Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception", 2024
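The fixed-length strategy can be sketched as a sliding window. This version assumes whitespace-separated words as a stand-in for model tokens (a real system would count tokens with the model's tokenizer), and the function name `chunk_fixed` and its defaults are hypothetical.

```python
def chunk_fixed(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `chunk_size` words; neighbors share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()  # whitespace words stand in for model tokens
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

# Small numbers so the overlap is easy to see.
sample = " ".join(f"w{i}" for i in range(12))
pieces = chunk_fixed(sample, chunk_size=5, overlap=2)
```

Each chunk repeats the last two words of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk when it is shorter than the overlap.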
Retrieval Pipeline
End-to-end system that processes queries, retrieves information, and prepares LLM context. Typical flow: query processing, retrieval (vector + keyword search), re-ranking, context assembly. Pipeline design significantly impacts RAG performance, since the LLM only sees what survives every stage.
Reference: ⚠️ No single authoritative foundational paper — general architectural concept in Information Retrieval.
Reference: Manning, Raghavan, & Schütze, "Introduction to Information Retrieval", Cambridge University Press, 2008
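The four stages of the typical flow can be sketched as plain functions chained together. Everything here is a simplified assumption: the `CORPUS` contents are hypothetical, and term overlap stands in for both the vector + keyword retrieval and the re-ranking model a real pipeline would use.

```python
import re

CORPUS = {  # hypothetical documents keyed by id
    "d1": "Reset your password from the account settings page.",
    "d2": "Password rules: at least 12 characters and one symbol.",
    "d3": "Invoices are emailed on the first day of each month.",
}

def process_query(raw: str) -> list[str]:
    """Stage 1: normalize text into lowercase terms."""
    return re.findall(r"[a-z0-9]+", raw.lower())

def retrieve(terms: list[str], corpus: dict[str, str]) -> list[str]:
    """Stage 2: keep documents sharing at least one term with the query."""
    return [doc_id for doc_id, text in corpus.items()
            if set(terms) & set(process_query(text))]

def rerank(terms: list[str], candidates: list[str], corpus: dict[str, str]) -> list[str]:
    """Stage 3: order candidates by how many query terms they contain."""
    def score(doc_id: str) -> int:
        return len(set(terms) & set(process_query(corpus[doc_id])))
    return sorted(candidates, key=score, reverse=True)

def assemble_context(doc_ids: list[str], corpus: dict[str, str], max_docs: int = 2) -> str:
    """Stage 4: join the top documents into the text handed to the LLM."""
    return "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in doc_ids[:max_docs])

terms = process_query("How do I reset my password?")
ranked = rerank(terms, retrieve(terms, CORPUS), CORPUS)
context = assemble_context(ranked, CORPUS)
```

Keeping each stage as a separate function makes it possible to swap one out, for example replacing the overlap-based `rerank` with a learned re-ranker, without touching the others.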
Grounding
Anchoring responses in retrieved factual information rather than relying solely on the model's parametric knowledge. Grounded responses cite their sources, which makes claims verifiable and helps reduce hallucinations. Enterprise systems emphasize grounding for compliance and trustworthiness.
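One lightweight way to support source citation is to number the retrieved passages and then check the answer for citation markers. The sketch below assumes a `[n]` citation convention and a hard-coded example answer; the function names are hypothetical. It only checks that citations exist and point at a real passage, not that the passage actually supports the claim.

```python
import re

def format_sources(passages: list[str]) -> str:
    """Number each retrieved passage so the answer can cite it as [1], [2], ..."""
    return "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))

def uncited_or_invalid(answer: str, num_sources: int) -> dict[str, list]:
    """Flag sentences with no citation and citations that point at no source."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    missing = [s for s in sentences if not re.search(r"\[\d+\]", s)]
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    invalid = sorted(n for n in cited if not 1 <= n <= num_sources)
    return {"missing_citation": missing, "invalid_citation": invalid}

passages = [
    "The refund window is 30 days from delivery.",
    "Refunds are issued to the original payment method.",
]
sources_block = format_sources(passages)  # would be placed in the LLM prompt

# Hypothetical model output, hard-coded for illustration.
answer = ("Refunds are accepted within 30 days [1]. "
          "They take a week to arrive. "
          "Store credit is possible [3].")
report = uncited_or_invalid(answer, num_sources=len(passages))
```

In this example the second sentence is flagged for having no citation, and `[3]` is flagged because only two passages were retrieved.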
This glossary is part of a series covering AI and LLM concepts for product designers.