Retrieval-Augmented Generation (RAG) is a widely used technique for improving the accuracy and relevance of LLM responses. Instead of relying solely on the information stored in the model's weights, RAG retrieves relevant external documents or data during response generation. This leads to more accurate and contextually appropriate answers, especially for tasks that require specific or up-to-date knowledge.
Here is a list of 12 types of RAG that can be used for different purposes:
The original RAG model combines a pre-trained language model (parametric memory) with an external knowledge source (non-parametric memory), such as a dense vector index of documents like Wikipedia. RAG retrieves relevant information from this source during generation, enhancing the model's accuracy and specificity.
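To make the retrieve-then-generate loop concrete, here is a minimal sketch. The bag-of-words cosine similarity is a stand-in for a real dense encoder, the tiny `DOCS` list stands in for a Wikipedia-scale index, and `rag_answer` only builds the prompt that a real system would send to an LLM.

```python
import math
from collections import Counter

# Toy corpus standing in for a dense vector index of documents.
DOCS = [
    "The Eiffel Tower is located in Paris, France.",
    "Python is a programming language created by Guido van Rossum.",
    "The Great Wall of China is visible in satellite imagery.",
]

def embed(text):
    """Bag-of-words vector; a real system would use a dense encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Return the top-k documents most similar to the query."""
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def rag_answer(query):
    """Retrieve context, then hand it to a generator (stubbed here)."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(retrieve("Where is the Eiffel Tower?"))
```

The key property this illustrates: the answer-bearing document is selected at query time, so the knowledge source can be updated without retraining the model.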
Graph RAG, an approach presented by Microsoft, organizes data into a graph structure that represents text and its interrelations. Graph RAG is a valuable addition to RAG systems for handling query-focused summarization at scale.
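A rough sketch of the graph idea, under heavy simplifying assumptions: capitalized words stand in for a real entity extractor, and a one-sentence co-occurrence rule stands in for relation extraction. Retrieval then becomes collecting an entity's graph neighborhood rather than matching raw text.

```python
import re
from collections import defaultdict

# Hypothetical mini-corpus; sentences are the co-occurrence unit.
SENTENCES = [
    "Alice works at Acme and reports to Bob.",
    "Bob leads the Platform team at Acme.",
    "Carol maintains the billing service for the Platform team.",
]

def entities(sentence):
    """Naive entity extraction: capitalized words (a stand-in for NER)."""
    return re.findall(r"\b[A-Z][a-z]+\b", sentence)

# Build an undirected co-occurrence graph: entities in one sentence are linked.
graph = defaultdict(set)
for s in SENTENCES:
    ents = entities(s)
    for a in ents:
        for b in ents:
            if a != b:
                graph[a].add(b)

def graph_context(entity, hops=1):
    """Collect an entity's neighborhood to use as retrieval context."""
    frontier, seen = {entity}, {entity}
    for _ in range(hops):
        frontier = {n for e in frontier for n in graph[e]} - seen
        seen |= frontier
    return sorted(seen)

print(graph_context("Alice", hops=2))
```

Note how multi-hop structure emerges for free: "Carol" is reachable from "Alice" only through the "Platform" node, a connection plain text similarity would miss.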
LongRAG is an improved version of the RAG model that processes larger text units (4,000 tokens instead of 100 words), reducing the number of units to search through. This "long retriever" and "long reader" approach enhances accuracy and performance in extracting answers from large texts without extra training.
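The core mechanical change can be sketched in a few lines: merge many short chunks into few long retrieval units. Whitespace word counts stand in for real tokenization, and the 4,000 figure is taken from the description above.

```python
def to_long_units(chunks, max_tokens=4000):
    """Greedily merge short chunks into long retrieval units,
    so the retriever searches far fewer, larger units."""
    units, current, size = [], [], 0
    for chunk in chunks:
        n = len(chunk.split())  # crude stand-in for a token count
        if current and size + n > max_tokens:
            units.append(" ".join(current))
            current, size = [], 0
        current.append(chunk)
        size += n
    if current:
        units.append(" ".join(current))
    return units

# 120 chunks of 100 "words" each collapse into 3 units of 4,000 words.
chunks = [" ".join(["w"] * 100) for _ in range(120)]
units = to_long_units(chunks)
print(len(units))
```

With 40x fewer units, the retriever's job gets easier; the trade-off is that the "long reader" must then locate the answer inside a much bigger unit.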
Self-RAG (a self-reflective approach) allows the model to retrieve and reflect on information only when needed. It outperforms models such as ChatGPT on tasks requiring reasoning and fact-checking.
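The two decisions Self-RAG makes, "do I need to retrieve?" and "is my answer supported?", can be sketched with crude heuristics. In the real model both are learned and expressed via special reflection tokens; the keyword cues and substring check here are purely illustrative stand-ins.

```python
def needs_retrieval(question):
    """Toy reflection gate: retrieve only for fact-seeking questions.
    A real Self-RAG model emits learned reflection tokens instead."""
    fact_cues = ("who", "when", "where", "how many", "which year")
    return any(cue in question.lower() for cue in fact_cues)

def supported(answer, passages):
    """Toy critique step: is every answer token grounded in a passage?"""
    text = " ".join(passages).lower()
    return all(tok in text for tok in answer.lower().split())

passages = ["the battle of hastings took place in 1066"]
print(needs_retrieval("When was the Battle of Hastings?"))  # True
print(needs_retrieval("Write me a short poem."))            # False
print(supported("1066", passages))                          # True
```

Skipping retrieval for the poem request is the point: reflection saves latency on queries where external knowledge adds nothing.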
Corrective RAG (CRAG) uses an external retrieval evaluator to assess the quality of retrieved documents. It selectively focuses on key information, enhancing the accuracy and robustness of generated content.
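A minimal sketch of the evaluator-then-correct loop. Term overlap stands in for CRAG's trained retrieval evaluator, and the `"fallback"` action is a placeholder for the corrective step (such as web search) CRAG triggers when nothing retrieved looks relevant.

```python
def evaluate_retrieval(query, doc):
    """Lightweight retrieval evaluator: score a document by query-term
    overlap. CRAG uses a trained evaluator; this is a stand-in."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)

def corrective_filter(query, docs, threshold=0.5):
    """Keep documents judged relevant; when nothing passes, signal a
    fallback action instead of generating from bad context."""
    kept = [d for d in docs if evaluate_retrieval(query, d) >= threshold]
    action = "use" if kept else "fallback"
    return action, kept

docs = ["paris is the capital of france", "bananas are rich in potassium"]
print(corrective_filter("capital of france", docs))
```

The important behavior is the second branch: rather than letting the generator hallucinate over irrelevant passages, a low-scoring retrieval changes the system's action.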
EfficientRAG handles multi-hop questions efficiently by generating new queries without calling an LLM at each step and by filtering out irrelevant information.
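Here is a hedged sketch of the per-hop step. A trained labeler tags which retrieved tokens are useful; the hand-written `known_terms` set below is a crude stand-in for that labeler, and the question/document pair is invented for illustration.

```python
def next_hop_query(question, retrieved_doc, known_terms):
    """EfficientRAG-style hop: instead of calling an LLM at each step,
    keep only novel informative tokens from the retrieved document and
    splice them into the next-hop query."""
    q_tokens = set(question.lower().split())
    novel = [t for t in retrieved_doc.lower().split()
             if t not in known_terms and t not in q_tokens]
    return question + " " + " ".join(novel)

# `known_terms` stands in for the trained token labeler that discards
# uninformative words.
question = "who directed the film starring the actor born in 1956?"
doc = "Tom Hanks was born in 1956"
known = {"was", "born", "in", "1956", "the"}
print(next_hop_query(question, doc, known))
```

The first hop resolves the bridge entity ("tom hanks"), and the augmented query can drive the second retrieval step without an LLM call in between.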
Golden-Retriever is a RAG model that uses reflection-based question augmentation to handle domain-specific jargon and context in industrial knowledge bases, ensuring the retrieval of the most relevant documents.
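The augmentation idea can be sketched with a hand-written jargon dictionary. In Golden-Retriever the clarifications come from an LLM reflection step over the knowledge base's own glossaries; the `JARGON` mapping and the example question here are assumptions for illustration.

```python
# Assumed in-house jargon dictionary; a real system would derive these
# clarifications via an LLM reflection step.
JARGON = {
    "mes": "manufacturing execution system",
    "oee": "overall equipment effectiveness",
}

def augment_question(question):
    """Spot jargon in the question and append its definition, so the
    retriever can match documents written in plain terms."""
    notes = [f"{term.upper()} means {gloss}"
             for term, gloss in JARGON.items()
             if term in question.lower().split()]
    return question + (" (" + "; ".join(notes) + ")" if notes else "")

print(augment_question("How do we improve OEE on line 4?"))
```

Expanding "OEE" before retrieval is what lets a dense retriever find documents that never use the acronym at all.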
Adaptive RAG for conversational systems assesses the conversation context and decides whether retrieval is necessary, instead of always fetching external knowledge. Using RAG only when it is beneficial improves response quality and leads to more accurate, confident answers.
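A toy version of the gate: skip retrieval when the conversation history already covers the question's key terms. Real adaptive systems train this decision; the coverage heuristic and the support-chat example below are assumptions.

```python
def should_retrieve(turns, question):
    """Toy adaptive gate: retrieve only when the conversation so far
    covers too few of the question's content words."""
    history = " ".join(turns).lower()
    q_terms = [t for t in question.lower().split() if len(t) > 3]
    covered = sum(t in history for t in q_terms)
    return covered < len(q_terms) / 2  # low coverage -> go retrieve

turns = ["my order #123 shipped from warsaw yesterday"]
print(should_retrieve(turns, "when did order #123 ship?"))   # False: answerable from context
print(should_retrieve(turns, "what is your refund policy?")) # True: needs external knowledge
```

The first question is answerable from the dialogue itself, so the system skips the retrieval round-trip; the second triggers RAG because the history says nothing about refunds.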
Modular RAG is an advanced framework that breaks complex RAG systems down into independent modules and specialized components. Unlike traditional RAG's simple "retrieve-then-generate" process, Modular RAG offers flexible, customizable configurations such as routing, scheduling, and fused processes.
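A sketch of the modular idea: each stage is an interchangeable function, and a routing module picks a retrieval strategy per query. The `"latest"` routing rule and both retriever stubs are hypothetical, chosen only to show the plumbing.

```python
def route(query):
    """Routing module: pick a retrieval strategy per query
    (hypothetical keyword rule; real routers are learned)."""
    return "web" if "latest" in query.lower() else "index"

def retrieve_index(query):
    return [f"indexed doc for: {query}"]   # stub for a vector-index module

def retrieve_web(query):
    return [f"web result for: {query}"]    # stub for a live-search module

MODULES = {"index": retrieve_index, "web": retrieve_web}

def pipeline(query):
    """Compose independent modules: route -> retrieve -> (generate stub)."""
    docs = MODULES[route(query)](query)
    return {"route": route(query), "docs": docs}

print(pipeline("latest GPU prices"))
```

Because every stage sits behind a plain function interface, swapping a retriever or inserting a reranking module changes one dictionary entry, not the whole pipeline.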
Speculative RAG combines two types of LMs: a smaller, specialized LM that produces multiple drafts in parallel, and a larger generalist LM that verifies these drafts and selects the best answer. This improves both the effectiveness and the speed of the system.
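The draft-then-verify split can be sketched with two overlap heuristics standing in for the two LMs. Each "draft" comes from a different evidence subset, and the "verifier" just picks the draft that best matches the question; the passages are invented for illustration.

```python
def draft(passage, question):
    """Stand-in for the small specialist LM: draft the sentence of a
    passage that best matches the question."""
    q = set(question.lower().split())
    sents = [s.strip() for s in passage.split(".") if s.strip()]
    return max(sents, key=lambda s: len(q & set(s.lower().split())))

def verify(drafts, question):
    """Stand-in for the large generalist LM: score drafts, keep the best."""
    q = set(question.lower().split())
    return max(drafts, key=lambda d: len(q & set(d.lower().split())))

passages = [
    "Mount Everest is the highest mountain. It lies in the Himalayas.",
    "K2 is the second highest mountain. Everest is higher than K2.",
]
question = "what is the highest mountain"
drafts = [draft(p, question) for p in passages]  # drafted in parallel in practice
print(verify(drafts, question))
```

The speed argument lives in the drafting step: the cheap model does one pass per evidence subset concurrently, and the expensive model only scores short drafts instead of generating over all the evidence.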
RankRAG is a framework that trains the model both to rank relevant contexts and to use them to answer questions. It excels at knowledge-intensive tasks.
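A sketch of the rank-then-answer flow. In RankRAG a single instruction-tuned model performs both steps; here term overlap stands in for the learned ranking, and `answer` merely returns the top-ranked contexts where the real model would generate from them.

```python
def rank_contexts(question, contexts):
    """Rank retrieved contexts by relevance (term overlap stands in
    for the model's learned ranking ability)."""
    q = set(question.lower().split())
    return sorted(contexts,
                  key=lambda c: len(q & set(c.lower().split())),
                  reverse=True)

def answer(question, contexts, top_k=1):
    """Generate from only the top-ranked contexts (generation stubbed)."""
    return rank_contexts(question, contexts)[:top_k]

contexts = [
    "the mitochondria is the powerhouse of the cell",
    "paris hosted the 2024 summer olympics",
    "the cell membrane controls what enters the cell",
]
print(answer("what is the powerhouse of the cell", contexts))
```

Reranking matters because first-stage retrievers return noisy candidates; pushing the distractor contexts below `top_k` keeps them out of the generator's prompt entirely.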
Multi-Head RAG uses different parts of the model's attention mechanism to capture various aspects of a query, making it easier to find and use relevant information. It improves retrieval accuracy, especially for complex queries.
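A very loose sketch of the multi-head intuition: each "head" scores documents against a different aspect of the query, and the results are aggregated by voting. The hand-written aspect vocabularies below are a stand-in for real attention-head embedding spaces, and the corpus is invented.

```python
from collections import Counter

DOCS = [
    "apple releases a new phone with a better camera",
    "apple pie recipe with cinnamon and butter",
    "camera tips for low light photography",
]

# Each "head" attends to a different aspect of the query (a crude
# stand-in for using separate attention heads as embedding spaces).
HEADS = {
    "tech": {"phone", "camera", "photography", "releases"},
    "food": {"pie", "recipe", "cinnamon", "butter"},
}

def head_retrieve(query, head_vocab):
    """One head's vote: best document by overlap within its aspect."""
    q = set(query.lower().split()) & head_vocab
    if not q:
        return None  # this head sees no relevant aspect of the query
    return max(DOCS, key=lambda d: len(q & set(d.split())))

def multi_head_retrieve(query):
    """Aggregate the votes of all heads, most-voted documents first."""
    votes = Counter(d for vocab in HEADS.values()
                    if (d := head_retrieve(query, vocab)) is not None)
    return [d for d, _ in votes.most_common()]

print(multi_head_retrieve("apple camera phone"))
```

The ambiguous word "apple" is disambiguated by which heads fire: the tech head votes for the phone document here, while a cooking query would activate the food head instead.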