What is HybridRAG?

Did you miss RAG? Retrieval-Augmented Generation (RAG) keeps growing as one of the most popular methods for enhancing LLMs with external knowledge. But it remains challenging to use the original RAG in specialized domains such as finance, where documents use specialized language and complicated formats. A new approach, HybridRAG, was designed to address this problem. It combines two ways of fetching relevant information, one based on similarity (VectorRAG) and one based on structured relationships (GraphRAG), resulting in more accurate and contextually rich answers. Tests show that HybridRAG is especially useful in fields like finance, where both data formats and relationships matter. It also has the potential to be useful beyond finance. Curious if that’s what you might need? Let’s discuss why the HybridRAG method can be a relevant solution for handling specific tasks.

In today’s episode, we will cover:

Limitations of LLMs and existing RAG systems for the financial sector
Here comes HybridRAG
How does HybridRAG work?
Is HybridRAG really good?
Advantages of HybridRAG
Not without limitations
Implementation and potential of HybridRAG
Clarification of terms: different HybridRAGs
Bonus: Resources

Limitations of LLMs and existing RAG systems for the financial sector

The financial sector relies on various sources like news articles and earnings reports to make investment decisions and predictions. As these documents are often disorganized, traditional analysis struggles to make sense of them.

LLMs help process large amounts of data for tasks like trend prediction or report generation. But when it comes to specialized language and complex document structures, an LLM alone can’t handle them well.

VectorRAG, or just RAG, is a common method that addresses LLMs’ limitations. It searches for similar chunks of text in an external database to provide context for generating answers. Despite this, VectorRAG struggles with capturing the structure and relationships within the data.

Knowledge graphs (KGs) organize data into entities and their relationships. GraphRAG simplifies building and maintaining KGs from large datasets and combines them with RAG for more accurate answers. But GraphRAG has its own limitation: it struggles with questions that don't directly mention relevant entities.

What if we marry VectorRAG and GraphRAG?

Here comes HybridRAG

Bingo! Researchers from NVIDIA and BlackRock followed a popular strategy for addressing the limitations of earlier methods: they built a hybrid system. They combined VectorRAG and GraphRAG to pull relevant information from both text databases and knowledge graphs, getting the advantages of both methods in one approach.

Their method, formalized in the paper HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction and published in August, makes analyzing financial documents more effective. Let’s explore it in detail.

How does HybridRAG work?

As HybridRAG is a combination of two approaches, we'll first explore how each part works and then discuss the whole system.

VectorRAG

VectorRAG works by combining information from external documents with the knowledge already present in a language model. Here’s how it works:

Image Credit: VectorRAG, HybridRAG paper

Query: You start with a query (a question or search term).
Search: The system looks through external documents that weren’t part of the model’s original training, using a vector database. This database breaks down the documents into smaller chunks and stores them in a way that makes them easy to search.
Processing chunks: It finds and retrieves the chunks of text most similar to the query.
Generating response: The language model then uses these retrieved chunks along with its own knowledge to generate a response.

By pulling in outside information, VectorRAG ensures that the answers it generates are more up-to-date and relevant. The retrieved context helps the model provide more detailed and accurate responses than it could with its training data alone.
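To make these steps concrete, here is a minimal sketch of the retrieval loop. It is not the paper's implementation: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and the documents (including "Acme Corp") are invented for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def chunk(document, size=8):
    """Split a document into chunks of at most `size` words."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(documents):
    """The 'vector database' here is just a list of (chunk, vector) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]

def retrieve(index, query, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

def build_prompt(query, chunks):
    """Put the retrieved chunks in front of the question for the language model."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Invented example documents.
documents = [
    "Acme Corp reported quarterly revenue of 5 billion dollars, driven by cloud services.",
    "The central bank kept interest rates unchanged and signaled caution on inflation.",
]
index = build_index(documents)
query = "What was Acme revenue this quarter?"
top_chunks = retrieve(index, query, k=1)
prompt = build_prompt(query, top_chunks)
print(prompt)
```

The resulting prompt would then be sent to a language model. Note that the naive word-count chunking here can split a sentence in an awkward place; real systems usually chunk more carefully.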

GraphRAG

GraphRAG works similarly to VectorRAG but retrieves information by searching a Knowledge Graph (KG) instead of text-based documents. Here’s how it works step by step:

Image Credit: GraphRAG, HybridRAG paper

Query: The system takes a user’s question.
Search the KG: The system looks through the KG, which is a structured map of entities, or nodes (such as companies, people, or products), and their relationships, or edges (like "owns" or "works at").
Subgraph creation: A small part of the graph (a subgraph) is retrieved based on the query.
Context and response: The retrieved subgraph is then processed and encoded into a format that the language model can understand. It is combined with the language model’s existing knowledge to generate an accurate, context-aware response.

If the query is about a specific entity (e.g., a particular company), additional metadata is used to filter the information and retrieve only the relevant parts, ensuring the answer is focused.

By using structured data from the KG, GraphRAG provides answers that are more accurate and grounded in well-organized, verifiable information compared to purely text-based retrieval systems.
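The same flow can be sketched with a tiny in-memory graph. Everything here is an assumption made for illustration: the triples and company names are invented, entity matching is a naive substring check, and the `company` metadata field is a hypothetical stand-in for the metadata filtering described above.

```python
# A tiny knowledge graph stored as (head, relation, tail, metadata) edges.
# All entities and facts are invented for this example.
TRIPLES = [
    ("Acme Corp", "acquired", "Widget Inc", {"company": "Acme Corp"}),
    ("Jane Doe", "is CEO of", "Acme Corp", {"company": "Acme Corp"}),
    ("Globex", "competes with", "Acme Corp", {"company": "Globex"}),
    ("Globex", "is headquartered in", "Springfield", {"company": "Globex"}),
]

def find_entities(query, triples):
    """Return KG entities whose names appear in the query (naive substring match)."""
    entities = {t[0] for t in triples} | {t[2] for t in triples}
    q = query.lower()
    return {e for e in entities if e.lower() in q}

def retrieve_subgraph(query, triples, company=None):
    """Keep edges touching a mentioned entity, optionally filtered by metadata."""
    mentioned = find_entities(query, triples)
    subgraph = [t for t in triples if t[0] in mentioned or t[2] in mentioned]
    if company is not None:
        subgraph = [t for t in subgraph if t[3].get("company") == company]
    return subgraph

def encode_subgraph(subgraph):
    """Serialize edges as plain sentences so they can go into an LLM prompt."""
    return "\n".join(f"{h} {r} {t}." for h, r, t, _ in subgraph)

subgraph = retrieve_subgraph("Who runs Acme Corp?", TRIPLES, company="Acme Corp")
graph_context = encode_subgraph(subgraph)
print(graph_context)

# A question that names no entity retrieves nothing from the graph.
print(retrieve_subgraph("Which firms made acquisitions?", TRIPLES))
```

The last call returns an empty list, which illustrates the weakness mentioned earlier: when a question doesn't directly mention a relevant entity, graph retrieval has nothing to start from.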

HybridRAG

HybridRAG combines these two methods, VectorRAG and GraphRAG, to improve how information is retrieved and used. Here's how it works:

Combining methods:
- VectorRAG retrieves information by looking for similar text, offering broad context.
- GraphRAG pulls structured information from a KG, focusing on relationships between entities.
Stronger together:
- By combining these two, HybridRAG uses the strengths of both: broad, similarity-based retrieval from VectorRAG and detailed, relationship-based data from GraphRAG.
Generating responses:
- The system takes this combined context and uses it to generate more accurate and context-rich responses.

This approach makes the final answers more detailed and relevant by leveraging both text-based and structured data.
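As a rough sketch of the combination step, the function below concatenates the two contexts into a single prompt, with the VectorRAG context first and the GraphRAG context last. The retrievers and the `echo_llm` function are hypothetical stand-ins (any callables with the same shape would do), and the prompt wording is an assumption, not the paper's template.

```python
def build_hybrid_prompt(query, vector_chunks, graph_facts):
    """Concatenate both contexts (vector first, graph last) into one prompt."""
    vector_context = "\n".join(f"- {c}" for c in vector_chunks)
    graph_context = "\n".join(f"- {f}" for f in graph_facts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Text context (VectorRAG):\n{vector_context}\n\n"
        f"Graph context (GraphRAG):\n{graph_context}\n\n"
        f"Question: {query}\nAnswer:"
    )

def hybrid_rag(query, vector_retriever, graph_retriever, llm):
    """Run both retrievers, merge their contexts, and ask the language model."""
    vector_chunks = vector_retriever(query)
    graph_facts = graph_retriever(query)
    return llm(build_hybrid_prompt(query, vector_chunks, graph_facts))

# Hypothetical stand-ins so the sketch runs without external services.
def fake_vector_retriever(query):
    return ["Acme Corp reported quarterly revenue of 5 billion dollars."]

def fake_graph_retriever(query):
    return ["Acme Corp acquired Widget Inc."]

def echo_llm(prompt):
    return prompt  # returns the prompt itself so the merged context can be inspected

print(hybrid_rag("What did Acme Corp buy?", fake_vector_retriever, fake_graph_retriever, echo_llm))
```

In a real system, `echo_llm` would be replaced by a call to an actual model, and the two retrievers by a vector database query and a knowledge graph query.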

Is HybridRAG really good?

To explore how good HybridRAG is and how its performance differs from VectorRAG and GraphRAG, the researchers evaluated all three types of RAG on four metrics. The results show some clear differences in their performance:

Faithfulness (how accurately the answer reflects the provided context): GraphRAG and HybridRAG scored the highest (0.96), while VectorRAG was slightly lower at 0.94.
Answer relevance (how well the answer addresses the original question): HybridRAG performed best with a score of 0.96, followed by VectorRAG (0.91) and GraphRAG (0.89).
Context precision (how well the retrieved information matches the correct answer): GraphRAG had the highest score (0.96), while VectorRAG (0.84) and HybridRAG (0.79) scored lower.
Context recall (how well the retrieved context aligns with the ground truth answer): VectorRAG and HybridRAG both scored a perfect 1, but GraphRAG was lower at 0.85.

Image Credit: Turing Post

While there may be some trade-offs in context precision, HybridRAG shows itself as the most balanced approach, excelling in faithfulness, answer relevance, and context recall.
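For a quick side-by-side view, the scores listed above can be put into a small table and compared per metric. The numbers are the ones reported in this article; the helper names and the "count the wins" summary are only an illustrative way to read them, not part of the paper's evaluation.

```python
from collections import Counter

# Scores as listed above.
SCORES = {
    "VectorRAG": {"faithfulness": 0.94, "answer_relevance": 0.91,
                  "context_precision": 0.84, "context_recall": 1.00},
    "GraphRAG": {"faithfulness": 0.96, "answer_relevance": 0.89,
                 "context_precision": 0.96, "context_recall": 0.85},
    "HybridRAG": {"faithfulness": 0.96, "answer_relevance": 0.96,
                  "context_precision": 0.79, "context_recall": 1.00},
}

def best_per_metric(scores):
    """For each metric, list every system that reaches the top score (ties included)."""
    metrics = next(iter(scores.values())).keys()
    result = {}
    for metric in metrics:
        top = max(s[metric] for s in scores.values())
        result[metric] = [name for name, s in scores.items() if s[metric] == top]
    return result

best = best_per_metric(SCORES)
wins = Counter(name for names in best.values() for name in names)

for metric, names in best.items():
    print(f"{metric}: {', '.join(names)}")
print(dict(wins))
```

Read this way, HybridRAG is best or tied for best on three of the four metrics, with context precision as the one exception.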

Advantages of HybridRAG

These features make HybridRAG a promising solution for handling complex, specialized domains:

Combining broad and structured information: HybridRAG brings together the wide context from VectorRAG (which retrieves similar text) and the detailed, structured data from GraphRAG (which focuses on relationships between entities).
Improving accuracy: By using both methods, HybridRAG generates more accurate and contextually rich answers, especially useful for complex queries.
Handling complex information: It excels at answering questions involving complicated, domain-specific information, like financial or technical documents.
Generating reliable responses: HybridRAG reduces errors or incomplete answers, ensuring the final response is well-grounded in data.

Not without limitations

Despite demonstrating significant advantages, HybridRAG is a complex system and has some limitations, including:

Influence of context order: The combined context is larger, and the order in which it is assembled can affect the accuracy of the generated response. If the answer comes from the GraphRAG context, it may be less precise because that context is added last, while answers drawn from the VectorRAG context tend to be more precise because it comes first.
Balancing both methods: Integrating results from both VectorRAG and GraphRAG can sometimes result in conflicts or redundancy, making it difficult to balance the strengths of both approaches for optimal answers.
Increased complexity: Combining both VectorRAG and GraphRAG makes the system more complex, which can lead to longer processing times and higher computational costs.
KG maintenance: The Knowledge Graph (KG) component requires regular updates and maintenance to ensure that the structured data stays accurate and up-to-date.
Limited to available data: If the relevant information is not present in the vector database or KG, HybridRAG may struggle to provide accurate responses, especially for highly specialized queries.
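Two of these limitations, context order and redundancy, are easy to experiment with in code. The sketch below is not from the paper; it is one possible way to merge the two context lists while dropping items that are identical after simple normalization, with a hypothetical `graph_first` flag for trying the opposite ordering.

```python
import re

def normalize(text):
    """Lowercase and strip punctuation so trivially different duplicates match."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def merge_contexts(vector_chunks, graph_facts, graph_first=False):
    """Merge two context lists in a chosen order, keeping the first copy of each duplicate."""
    ordered = graph_facts + vector_chunks if graph_first else vector_chunks + graph_facts
    seen, merged = set(), []
    for item in ordered:
        key = normalize(item)
        if key not in seen:
            seen.add(key)
            merged.append(item)
    return merged

# Invented example contexts with one overlapping fact.
vector_chunks = ["Acme Corp acquired Widget Inc.", "Revenue rose."]
graph_facts = ["acme corp acquired widget inc", "Jane Doe is CEO of Acme Corp."]

print(merge_contexts(vector_chunks, graph_facts))
print(merge_contexts(vector_chunks, graph_facts, graph_first=True))
```

This only catches near-exact repeats. Detecting contexts that actually conflict with each other is a harder problem that this sketch does not attempt.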

Implementation and potential of HybridRAG

The main purpose of developing HybridRAG was to address the limitations of the original RAG approach in handling financial sector tasks. The research shows that HybridRAG can be effective for financial analysis, as it copes with specialized language and complicated document formats. This could make financial analysis accessible to a broader range of people.

The potential impact of this research goes beyond financial tasks. Because HybridRAG can handle a wide range of input data, researchers can extend it to new kinds of inputs.

For example, HybridRAG could potentially handle numerical data for better analysis of financial figures. The researchers also plan to explore integrating real-time financial data to make the system even more useful in dynamic financial environments.

We also think that handling different types of inputs opens the path to other specialized sectors, such as engineering or the legal domain.

Clarification of terms: different HybridRAGs

HybridRAG is a term that’s gaining traction in various AI applications, but it can be confusing because different implementations use it in slightly different ways. To help our readers navigate the evolving landscape of this technology, here’s a brief overview of the three major uses of HybridRAG and how they differ:

HybridRAG (Integrating Knowledge Graphs and Vector Retrieval, discussed above): Combines VectorRAG and GraphRAG to improve retrieval accuracy, especially in domain-specific tasks like financial document analysis. It merges vector database retrieval with structured data from knowledge graphs, enhancing both retrieval and generation stages.
Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning: Focuses on improving reasoning for complex queries by using a combination of vector-based and knowledge graph retrieval. It enhances tasks that require multi-hop reasoning and numerical calculations by leveraging structured and unstructured data (the paper is here).
Hybrid RAG (NVIDIA AI Workbench): This version focuses on infrastructure, combining local and cloud computational resources. It performs retrieval on local systems while utilizing remote GPUs for model inference, offering a scalable solution for hybrid RAG applications across various domains (the blog is here).

Resources (links to papers)

HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction
Original RAG models combine a pre-trained language model (parametric memory) with an external knowledge source (non-parametric memory), like a dense vector index of documents such as Wikipedia. RAG retrieves relevant information from this source during generation, enhancing the model's accuracy and specificity.
Graph RAG approach presented by Microsoft organizes data into a graph structure, representing text data and its interrelations. Graph RAG is a valuable addition to RAG systems to handle query-focused summarization at scale.
LongRAG is an improved version of the RAG model that processes larger text units (4,000 tokens instead of 100 words), reducing the number of units to search through. This "long retriever" and "long reader" approach enhances accuracy and performance in extracting answers from large texts without extra training.
Self-RAG (Self-Reflective approach) allows the model to retrieve and reflect on information only when needed. It outperforms other models like ChatGPT in tasks requiring reasoning and fact-checking.
Corrective RAG (CRAG) uses an external retrieval evaluator to refine the quality of retrieved documents. It selectively focuses on key information, enhancing the accuracy and robustness of generated content.
EfficientRAG efficiently handles multi-hop questions by generating new queries without needing LLMs at each step and filtering out irrelevant information.
Golden-Retriever is a RAG model that uses reflection-based question augmentation to handle domain-specific jargon and context in industrial knowledge bases, ensuring the retrieval of the most relevant documents.
Adaptive RAG for conversational systems, instead of always retrieving external knowledge, assesses the conversation context and decides whether RAG is necessary. This approach improves response quality by using RAG only when it is beneficial, leading to more accurate and confident answers.
Modular RAG is an advanced framework that breaks down complex RAG systems into independent modules and specialized components. Unlike traditional RAG's simple "retrieve-then-generate" process, Modular RAG offers flexible and customizable configurations like routing, scheduling, and combining processes.
Speculative RAG combines two types of LMs: a smaller, specialized LM for producing multiple drafts in parallel, and a larger generalist LM that verifies these drafts to find the best answer. It enhances both effectiveness and speed of the system.
RankRAG is a framework that trains the model both to rank relevant contexts and to use them to answer questions. It excels at knowledge-intensive tasks.
Multi-Head RAG uses different parts of the model’s attention mechanism to capture various aspects of a query, making it easier to find and use relevant information. It improves retrieval accuracy, especially for complex queries.
