Retrieval Augmented Generation (RAG)
An architecture that enhances LLM outputs by retrieving relevant information from external knowledge sources before generating responses.
Also known as: RAG, Retrieval-Augmented Generation
Category: Systems
Tags: ai, llm-architecture, knowledge-retrieval, vector-databases, embeddings
Explanation
Retrieval Augmented Generation (RAG) is an AI architecture that combines the generative capabilities of Large Language Models with external knowledge retrieval. It addresses a fundamental limitation of LLMs: their knowledge is frozen at training time, so it may be outdated, incomplete, or missing domain-specific information.
How RAG works (sketched in code after the list):
1. **Query encoding**: The user's question is converted into an embedding (numerical representation)
2. **Retrieval**: The embedding is used to search a vector database for semantically similar content
3. **Context augmentation**: Retrieved documents are added to the prompt context
4. **Generation**: The LLM generates a response informed by both its training and the retrieved information
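A minimal, self-contained Python sketch of these four steps. The hash-based `embed()` is a toy stand-in for a real embedding model, and `generate()` stubs out the LLM call; every name here is illustrative rather than taken from any particular library:

```python
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy stand-in for a real embedding model: hash each word into
    # one of `dim` buckets and count occurrences (bag-of-words).
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def generate(prompt: str) -> str:
    # Placeholder for a real LLM API call.
    return f"[LLM response conditioned on a {len(prompt)}-char prompt]"

# Offline: index the document collection.
documents = [
    "RAG retrieves documents before generation.",
    "Vector databases index embeddings for similarity search.",
    "Transformers apply attention over token sequences.",
]
index = [(doc, embed(doc)) for doc in documents]

def answer(question: str, k: int = 2) -> str:
    q_vec = embed(question)                              # 1. query encoding
    ranked = sorted(index, key=lambda d: cosine(q_vec, d[1]), reverse=True)
    context = "\n".join(doc for doc, _ in ranked[:k])    # 2. retrieval
    prompt = (f"Context:\n{context}\n\n"                 # 3. context augmentation
              f"Question: {question}\nAnswer:")
    return generate(prompt)                              # 4. generation

print(answer("How does RAG find relevant documents?"))
```

In production, the sorted scan over every document would be replaced by an approximate nearest-neighbour index, since exact scoring against the full collection does not scale.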
Key components (see the vector store sketch after this list):
- **Embedding model**: Converts text into vector representations
- **Vector database**: Stores and indexes document embeddings for fast similarity search
- **Retriever**: Finds the most relevant documents for a given query
- **Generator**: The LLM that produces the final response
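These components map naturally onto small interfaces. Below is a sketch of the vector database piece as an in-memory store; the class and method names are hypothetical, and random vectors stand in for real embeddings. A production system would use a dedicated vector database and a learned embedding model instead:

```python
import numpy as np

class InMemoryVectorStore:
    """Minimal vector database: stores (text, embedding) pairs and
    answers nearest-neighbour queries by cosine similarity."""

    def __init__(self, dim: int):
        self.dim = dim
        self.texts: list[str] = []
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def add(self, text: str, embedding: np.ndarray) -> None:
        # Normalize on insert so search reduces to a dot product.
        unit = embedding / np.linalg.norm(embedding)
        self.vectors = np.vstack([self.vectors, unit.astype(np.float32)])
        self.texts.append(text)

    def search(self, query_embedding: np.ndarray, k: int = 3) -> list[tuple[str, float]]:
        unit = query_embedding / np.linalg.norm(query_embedding)
        scores = self.vectors @ unit          # cosine similarity in one matmul
        top = np.argsort(scores)[::-1][:k]    # highest-scoring documents first
        return [(self.texts[i], float(scores[i])) for i in top]

# Usage with random stand-in embeddings (a real system would call an
# embedding model here):
rng = np.random.default_rng(0)
store = InMemoryVectorStore(dim=8)
for doc in ["doc about RAG", "doc about attention", "doc about databases"]:
    store.add(doc, rng.standard_normal(8))
print(store.search(rng.standard_normal(8), k=2))
```

Normalizing vectors at insert time is a common design choice: it turns cosine similarity at query time into a single matrix-vector product.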
Benefits of RAG:
- Access to current information beyond training cutoff
- Grounded responses with traceable sources
- Domain-specific knowledge without fine-tuning
- Reduced (though not eliminated) hallucinations through factual grounding
- Cost-effective compared to training custom models
RAG is particularly valuable for enterprise applications where accuracy, currency, and source attribution are critical.