Turing Post
Token 1.3: What is Retrieval-Augmented Generation (RAG)?

We discuss the origins of RAG, which LLM limitations it tries to fix, its architecture, and why it is so popular. Enjoy the collection of helpful links.

Ksenia Se & Valeriia Kuka
October 04, 2023

One term that has become a buzzing topic recently is RAG.

What is it, and how can you use it to improve an LLM's performance? Let’s dive in!

We discuss the origins of RAG, which LLM limitations it tries to fix, RAG’s architecture, and why it is so popular. You will also get a curated collection of helpful links for your RAG experiments.

Introduction

Although it has been circulating very actively lately, the term itself dates back to 2020, when researchers at Meta AI introduced it in their paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks".

Retrieval-Augmented Generation (RAG) is an architecture designed to harness the capabilities of large language models (LLMs) while providing the freedom to incorporate and update custom data at will. Compared with the resource-intensive process of building bespoke language models, or repeatedly fine-tuning them whenever the data changes, RAG offers developers and businesses a more streamlined and efficient approach.

As you probably know, pre-trained language models undergo training using vast amounts of unlabeled text data in a self-supervised* manner. Consequently, these models acquire a significant depth of knowledge, leveraging the statistical relationships underlying the language data they have been trained on.

*Self-supervised learning uses unlabeled data to generate its own supervisory signal for training models.
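
To make the footnote concrete, here is a minimal sketch of how unlabeled text can supply its own supervisory signal. It assumes a toy whitespace tokenizer and a hypothetical helper named `next_token_pairs`; real LLM pre-training uses subword tokenizers and far larger contexts, so treat this only as an illustration of the idea.

```python
def next_token_pairs(text, context_size=3):
    """Turn raw, unlabeled text into (context, target) training pairs.

    The "label" for each example is simply the next word in the text,
    so no human annotation is needed.
    """
    tokens = text.split()  # toy whitespace tokenizer (an assumption)
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tokens[i - context_size:i]  # the words the model sees
        target = tokens[i]                    # the word it must predict
        pairs.append((context, target))
    return pairs


pairs = next_token_pairs("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
```

Every training target here comes from the text itself, which is why this kind of training scales to vast unlabeled corpora.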

The knowledge acquired during pre-training is stored in the model's parameters and can be used to perform various language-related tasks without any external knowledge source. A model used this way is commonly described as a parameterized implicit knowledge base.

Although this parameterized implicit knowledge base is very impressive and lets the model perform surprisingly well on some queries and tasks, the approach is still prone to errors and to so-called hallucinations*.

*Hallucination in language models occurs when false information is generated and presented as true.

Why do errors happen in LLMs?

It is essential to recognize that LLMs do not possess a genuine understanding of language in the human sense. They rely on statistical patterns in the text they were trained on. Recent research suggests that no matter how much implicit knowledge a model holds, it can still struggle with logical reasoning. While LLMs have achieved significant success in text generation, they still have trouble making use of the knowledge they already hold, which often results in hallucinations.

How do we deal with this? Perhaps surprisingly, by introducing more external data. That data can be used to expand or revise the model’s memory, and it can serve as a basis for assessing and interpreting the model's predictions. This is precisely what the Meta AI researchers implemented in a new type of model called the RAG model.
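
The idea of bringing in external data can be sketched in a few lines. The example below is a toy illustration, not the method from the paper: it assumes a simple word-overlap retriever, and the names `tokenize`, `overlap_score`, `retrieve`, and `build_prompt`, as well as the sample passages, are hypothetical. It only shows the general shape: retrieve relevant text first, then hand it to the model together with the question.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (toy tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, passage):
    """Count how many query tokens also appear in the passage."""
    query_counts = Counter(tokenize(query))
    passage_counts = Counter(tokenize(passage))
    return sum((query_counts & passage_counts).values())


def retrieve(query, passages, k=1):
    """Return the k passages with the highest overlap score."""
    ranked = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:k]


def build_prompt(query, passages, k=1):
    """Prepend retrieved passages to the question before calling an LLM."""
    context = "\n".join(retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical external knowledge, not part of the model's parameters.
passages = [
    "The office cafeteria opens at 8 am on weekdays.",
    "RAG was introduced in a 2020 paper on knowledge-intensive NLP tasks.",
    "Vacation requests must be filed two weeks in advance.",
]

prompt = build_prompt("When was RAG introduced?", passages)
print(prompt)
```

The resulting prompt would then be passed to a language model, which can ground its answer in the retrieved passage instead of relying only on what it memorized during training.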

Other limitations of current LLMs

Apart from hallucinations, contemporary language models have another significant shortcoming for companies that want to adopt them – they lack the context of the company's internal data. Addressing this through fine-tuning means ML practitioners must re-tune the model every time the data changes. RAG addresses these limitations as well.
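
As a rough illustration of why this matters, the sketch below keeps company data in a separate store and updates it without touching any model. The `DocumentStore` class, its word-overlap `search`, and the pricing example are all hypothetical stand-ins; a real system would typically use a proper search index or vector database.

```python
class DocumentStore:
    """Toy in-memory store keyed by document id (hypothetical helper)."""

    def __init__(self):
        self.docs = {}

    def upsert(self, doc_id, text):
        """Add a new document or overwrite an existing one."""
        self.docs[doc_id] = text

    def search(self, query):
        """Return the document sharing the most words with the query."""
        query_words = set(query.lower().split())
        return max(
            self.docs.values(),
            key=lambda text: len(query_words & set(text.lower().split())),
        )


store = DocumentStore()
store.upsert("pricing", "the premium plan costs 20 dollars per month")
store.upsert("support", "support is available by email on weekdays")

before = store.search("how much does the premium plan cost")
# The company changes its pricing: only the stored document is updated.
store.upsert("pricing", "the premium plan costs 25 dollars per month")
after = store.search("how much does the premium plan cost")

print(before)
print(after)
```

The same query returns the new price as soon as the document is replaced, with no retraining step involved. That is the practical difference from fine-tuning, where fresh data only reaches the model after another training run.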
