Retrieval-Augmented Generation (RAG)

RAG is a context engineering pattern that augments LLM prompts with passages retrieved at inference time from a vector store, search index, or knowledge base. RAG keeps facts outside the model and is one of the most widely used context engineering techniques.
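The retrieve-then-augment loop described above can be sketched in a few lines of plain Python. This is a toy illustration, not a production pipeline: the corpus, the bag-of-words "embedding", and the prompt template are all stand-ins for a real document store, dense encoder, and prompt format.

```python
import math
import re
from collections import Counter

# Toy corpus standing in for a vector store or search index (hypothetical content).
DOCS = [
    "The Eiffel Tower is 330 metres tall and located in Paris.",
    "RAG retrieves passages at inference time and adds them to the prompt.",
    "Python was created by Guido van Rossum.",
]

def embed(text):
    """Bag-of-words term counts; a real system would use a dense encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Augment the prompt with retrieved passages so facts stay outside the model."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"

query = "How tall is the Eiffel Tower?"
passages = retrieve(query)
prompt = build_prompt(query, passages)
print(prompt)
```

The augmented prompt, not the model's weights, now carries the fact needed to answer, which is what lets a RAG system stay current by re-indexing documents rather than retraining.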

API entry from apis.yml

aid: context-engineering:retrieval-augmented-generation
name: Retrieval-Augmented Generation (RAG)
description: RAG is a context engineering pattern that augments LLM prompts with passages retrieved at
  inference time from a vector store, search index, or knowledge base. RAG keeps facts outside the model
  and is one of the most widely used context engineering techniques.
humanURL: https://arxiv.org/abs/2005.11401
baseURL: https://arxiv.org
tags:
- Embeddings
- Knowledge Base
- RAG
- Retrieval
properties:
- type: Specification
  url: https://arxiv.org/abs/2005.11401
- type: Reference
  url: https://docs.llamaindex.ai/
- type: Reference
  url: https://python.langchain.com/docs/concepts/rag/
x-features:
- Pluggable retrievers over vector and keyword indexes
- Pre-retrieval rewriting and post-retrieval re-ranking
- Hybrid retrieval combining BM25 and dense vectors
- Citations and grounding for answer auditing
x-useCases:
- Domain-specific question answering over private documents
- Customer support agents with up-to-date knowledge
- Long-tail factual recall outside model training
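The hybrid retrieval named in x-features is often implemented by merging a keyword (BM25-style) ranking with a dense-vector ranking. One common merge strategy is reciprocal rank fusion (RRF); the sketch below assumes two pre-computed rankings of hypothetical document ids and fuses them by rank alone, so no score normalization across retrievers is needed.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1/(k + rank) per doc.

    The constant k (60 is a conventional default) damps the dominance of
    top-ranked items so broad agreement across retrievers wins.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings: "b" is only second in each list, but because both
# retrievers agree on it, it fuses to the top.
bm25_ranking = ["a", "b", "c"]   # keyword retriever
dense_ranking = ["d", "b", "e"]  # dense-vector retriever
fused = rrf_fuse([bm25_ranking, dense_ranking])
print(fused)
```

Fusing by rank rather than raw score sidesteps the fact that BM25 scores and cosine similarities live on incompatible scales, which is why RRF is a popular default for hybrid retrieval.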