AI Glossary: RAG, Retrieval & Knowledge

Fearghal

21 May 2026 — 2 min read

RAG architecture has become fundamental to AI products, enabling grounded, up-to-date, and domain-specific responses.

RAG (Retrieval-Augmented Generation)

Improves LLM responses by retrieving relevant external information before generating output. Addresses outdated training data, hallucinations, and domain-specific knowledge gaps. Often more cost-effective than fine-tuning for adding knowledge, since the knowledge base can be updated at any time without retraining the model.
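
As a rough illustration of the retrieve-then-generate flow, here is a minimal sketch. The word-overlap retriever is a toy stand-in for real retrieval, and `generate` is a hypothetical callable representing whatever LLM client you use; neither comes from the Lewis et al. paper.

```python
from typing import Callable, List


def retrieve(query: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Toy retriever: rank documents by how many query words they share."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(doc.lower().split())), doc) for doc in documents
    ]
    # Keep only documents with at least one shared word, best first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]


def build_prompt(query: str, passages: List[str]) -> str:
    """Place the retrieved passages ahead of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


def answer(query: str, documents: List[str], generate: Callable[[str], str]) -> str:
    """Retrieve first, then generate from the augmented prompt."""
    return generate(build_prompt(query, retrieve(query, documents)))
```

Because the documents are passed in at query time, changing what the model "knows" only requires changing the document list, not the model.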

Reference: Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks", NeurIPS 2020

Vector Database

Specialized database for storing and searching high-dimensional embeddings using similarity search. Backbone of RAG systems, enabling semantic search across millions of documents with sub-second queries. Popular options: Pinecone, Qdrant, Milvus, Weaviate.
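
To make the idea concrete, here is a tiny in-memory stand-in for a vector database. It is only a sketch: `TinyVectorStore` is a hypothetical name, and it compares the query against every stored vector, whereas production systems typically rely on approximate nearest-neighbor indexes to stay fast at scale.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


class TinyVectorStore:
    """In-memory stand-in for a vector database (exact search, no index)."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    @staticmethod
    def _normalize(vector: Sequence[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("zero vectors cannot be indexed or searched")
        return [x / norm for x in vector]

    def add(self, doc_id: str, vector: Sequence[float]) -> None:
        # Store unit-length vectors so a dot product equals cosine similarity.
        self._vectors[doc_id] = self._normalize(vector)

    def search(self, query: Sequence[float], top_k: int = 3) -> List[Tuple[str, float]]:
        """Return the top_k (doc_id, similarity) pairs, most similar first."""
        unit = self._normalize(query)
        scored = (
            (doc_id, sum(a * b for a, b in zip(unit, vec)))
            for doc_id, vec in self._vectors.items()
        )
        return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])
```

Usage would look like `store.add("doc-1", embedding)` followed by `store.search(query_embedding, top_k=5)`, where the embeddings come from whichever model you have chosen.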

Reference: Johnson, J., Douze, M., & Jégou, H., "Billion-scale similarity search with GPUs" (FAISS), IEEE Transactions on Big Data, 2019
Additional: Douze, M. et al., "The Faiss library", 2024

Embeddings (for Search)

Dense numerical vectors that capture semantic meaning, so that similar concepts sit close together in vector space. They let semantic search return "automobile repair" for the query "car mechanic" even though the two phrases share no keywords.
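
"Mathematically close" is usually measured with cosine similarity. The sketch below uses hand-written three-dimensional vectors that are invented purely for illustration; a real embedding model would produce the vectors, typically with hundreds or thousands of dimensions.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors (assumption): the values are chosen by hand so that the
# two car-related phrases point in a similar direction.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "car mechanic": [0.9, 0.8, 0.1],
    "automobile repair": [0.8, 0.9, 0.2],
    "banana bread recipe": [0.1, 0.2, 0.9],
}

query = TOY_EMBEDDINGS["car mechanic"]
for phrase, vector in TOY_EMBEDDINGS.items():
    print(f"{phrase}: {cosine_similarity(query, vector):.3f}")
```

With these toy values, "automobile repair" scores far closer to the query than "banana bread recipe", even though it shares no words with "car mechanic".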

Reference: Mikolov, T., Chen, K., Corrado, G., & Dean, J., "Efficient Estimation of Word Representations in Vector Space" (Word2Vec), 2013

Semantic Search

Search that matches on meaning and intent rather than exact keywords. Uses embeddings to find the documents whose meaning is closest to the query. Modern RAG systems often combine semantic search with keyword search (BM25) in "hybrid search" configurations.
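
Hybrid search needs some way to merge the semantic and keyword result lists. One commonly used option is reciprocal rank fusion (RRF), sketched below. The choice of RRF and the constant `k = 60` are assumptions made for illustration, not something this glossary prescribes, and the document ids are made up.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document ids; each list adds 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic_hits = ["d2", "d1", "d3"]  # hypothetical output of a vector search
keyword_hits = ["d1", "d4", "d2"]   # hypothetical output of a BM25 search
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

RRF only looks at rank positions, so it sidesteps the problem that cosine similarities and BM25 scores live on different scales.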

Reference: Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., & Harshman, R., "Indexing by Latent Semantic Analysis", JASIS 41(6), 1990

Chunking

Breaking documents into smaller pieces for embedding and retrieval. Strategies: fixed-length (e.g., 500 tokens with overlap), sentence-based, semantic (grouping related content). Chunk size affects retrieval quality: chunks that are too large dilute the relevant passage, chunks that are too small lose context, and poor chunking undermines the entire RAG system.
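
The fixed-length strategy is the simplest to sketch. In the example below, the 50-token overlap is an assumed value, and the caller is expected to supply the tokens (splitting on whitespace is a rough approximation of a real tokenizer).

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 500, overlap: int = 50) -> List[List[str]]:
    """Split tokens into fixed-length windows that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks
```

For example, `chunk_tokens(text.split(), chunk_size=500, overlap=50)` yields windows where each chunk repeats the last 50 tokens of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk if it is short enough.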

Reference: ⚠️ No single authoritative foundational paper — practical engineering technique evolved from IR practices.
Recent: Zhao, J. et al., "Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception", 2024

Retrieval Pipeline

End-to-end system that processes queries, retrieves information, and prepares context for the LLM. Typical flow: query processing, retrieval (vector + keyword search), re-ranking, context assembly. Pipeline design significantly impacts RAG performance, because a weakness at any stage limits what the LLM can do with the result.
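
The four stages can be sketched as plain functions chained together. Everything here is a simplified stand-in chosen for illustration: keyword matching replaces vector + keyword search, term overlap replaces a learned re-ranker, and a word budget replaces a token budget.

```python
from typing import Dict, List


def process_query(query: str) -> str:
    """Stage 1: normalize the query (here: collapse whitespace, lowercase)."""
    return " ".join(query.lower().split())


def retrieve(query: str, corpus: Dict[str, str], limit: int = 10) -> List[str]:
    """Stage 2: fetch candidate ids; keyword matching stands in for real search."""
    terms = set(query.split())
    hits = [d for d, text in corpus.items() if terms & set(text.lower().split())]
    return hits[:limit]


def rerank(query: str, candidates: List[str], corpus: Dict[str, str]) -> List[str]:
    """Stage 3: re-order candidates by term overlap, best first."""
    terms = set(query.split())
    return sorted(
        candidates,
        key=lambda d: len(terms & set(corpus[d].lower().split())),
        reverse=True,
    )


def assemble_context(ranked: List[str], corpus: Dict[str, str], max_words: int = 50) -> str:
    """Stage 4: add passages in rank order until the word budget is used up."""
    parts: List[str] = []
    used = 0
    for doc_id in ranked:
        words = len(corpus[doc_id].split())
        if used + words > max_words:
            break
        parts.append(corpus[doc_id])
        used += words
    return "\n".join(parts)


def run_pipeline(query: str, corpus: Dict[str, str], max_words: int = 50) -> str:
    q = process_query(query)
    return assemble_context(rerank(q, retrieve(q, corpus), corpus), corpus, max_words)
```

Keeping each stage behind its own function makes it easier to swap in a better retriever or re-ranker later without touching the rest of the pipeline.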

Reference: ⚠️ No single authoritative foundational paper — general architectural concept in Information Retrieval.
Additional: Manning, C.D., Raghavan, P., & Schütze, H., "Introduction to Information Retrieval", Cambridge University Press, 2008

Grounding

Anchoring responses in retrieved factual information rather than relying solely on the model's parametric knowledge (what it absorbed during training). Grounded responses cite their sources, which makes claims checkable and helps reduce hallucinations. Enterprise systems emphasize grounding for compliance and trustworthiness.
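
One lightweight way to encourage and check source citation is to number the retrieved passages in the prompt and then validate the `[n]` markers in the answer. This is a sketch under assumptions: the prompt wording and function names are hypothetical, and the check only confirms that citations point at supplied sources, not that the sources actually support the claims.

```python
import re
from typing import List, Set


def build_grounded_prompt(question: str, sources: List[str]) -> str:
    """Number each source so the model can cite it as [1], [2], ..."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(sources, start=1))
    return (
        "Answer from the sources below and cite them like [1]. "
        "Say so if the sources do not contain the answer.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}"
    )


def cited_sources(answer: str) -> Set[int]:
    """Collect every [n] citation marker that appears in the answer."""
    return {int(n) for n in re.findall(r"\[(\d+)\]", answer)}


def citations_are_valid(answer: str, source_count: int) -> bool:
    """True when the answer cites at least one source and none out of range."""
    cited = cited_sources(answer)
    return bool(cited) and all(1 <= n <= source_count for n in cited)
```

An answer that fails `citations_are_valid` could be flagged for review or regenerated; verifying that each cited passage truly backs the claim needs a separate step.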

Reference: Jacovi, A., Wang, A., Alberti, C. et al. (Google DeepMind), "The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input", 2024

This glossary is part of a series covering AI and LLM concepts for product designers.