DataStax's CTO discusses how Retrieval Augmented Generation (RAG) enhances AI reliability, reduces hallucinations, and transforms information retrieval.
Retrieval Augmented Generation (RAG) has become essential for IT leaders and enterprises looking to implement generative AI. By pairing a large language model (LLM) with RAG, enterprises can ground the model in their own data, improving the accuracy of its outputs.
But how does RAG work? What are the use cases for RAG? And are there any real alternatives?
TechRepublic sat down with Davor Bonaci, chief technology officer and executive vice president at database and AI company DataStax, to find out how RAG is being leveraged in the market during the rollout of generative AI in 2024 and what he sees as the technology's next step in 2025.
RAG is a technique that improves the relevance and accuracy of an LLM's outputs by supplying additional context drawn from an enterprise's own data. It is what allows IT leaders to apply generative AI to enterprise use cases.
Bonaci explained that while LLMs have "basically been trained on all the information available on the internet," up to a certain cut-off date, depending on the model, their language and general knowledge strengths are offset by significant and well-known problems, such as AI hallucinations.
"If you want to use it in an enterprise setting, you must ground it in enterprise data. Otherwise, you get a lot of hallucinations," he said. "With RAG, instead of just asking the LLM to produce something, you say, 'I want you to produce something, but please consider these things that I know to be accurate.'"
RAG gives an LLM access to an enterprise information set, such as a knowledge base, a database, or a document set. For instance, DataStax's main product is its vector database, Astra DB, which enterprises use to build AI applications.
In practice, a user's query first goes through a retrieval step -- a vector search -- that identifies the most relevant documents or pieces of information in a pre-defined knowledge source. This could include enterprise documents, academic papers, or FAQs.
The retrieved information is then fed into the generative model as additional context alongside the original query, allowing the model to ground its response in real-world, up-to-date, or domain-specific knowledge. This grounding reduces the risk of hallucinations that could be deal breakers for an enterprise.
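The two steps described above, retrieval followed by prompt augmentation, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DataStax's or any vendor's implementation: the word-count "embedding" is a toy stand-in for a learned embedding model, the in-memory list stands in for a vector database, and the names `embed`, `retrieve`, `build_prompt` and `KNOWLEDGE_BASE` are hypothetical. The final call to an LLM is left out.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge source; a real deployment would hold these in a vector database.
KNOWLEDGE_BASE = [
    "Refunds are issued within 14 days of a returned order.",
    "To cancel a support ticket, open the account dashboard and choose cancel.",
    "The office cafeteria opens at 8 am on weekdays.",
]

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Retrieval step: return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Augmentation step: place the retrieved text alongside the original query."""
    facts = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the facts below.\n"
        f"Facts:\n{facts}\n"
        f"Question: {query}"
    )

query = "How do I cancel a support ticket?"
context = retrieve(query, KNOWLEDGE_BASE, k=1)
prompt = build_prompt(query, context)
print(prompt)  # this augmented prompt is what would be sent to the LLM
```

The design point is that the model is never retrained: the knowledge source can change at any time, and only the text placed in the prompt changes with it.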
The difference between using generative AI with and without RAG is "night and day," Bonaci said. For an enterprise, an LLM's propensity to hallucinate essentially means the models are "unusable" or suitable only for very limited use cases. The RAG technique is what opens the door to generative AI for enterprises.
"At the end of the day, they [LLMs] have knowledge from seeing things on the internet," Bonaci explained. "But if you ask a question that is kind of out of the left field, they're going to give you a very confident answer that may ... be completely wrong."
Bonaci noted that RAG techniques can boost the accuracy of LLM outputs to over 90% for non-reasoning tasks, depending on the models and the benchmarks used. For complex reasoning tasks, accuracy with RAG is more likely to fall between 70% and 80%.
RAG is used across several typical generative AI use cases for organisations, including:
Using LLMs augmented with RAG, enterprises can automate repeatable tasks. A common use case for automation is customer support, where the system can be empowered to search documentation, provide answers, and take actions like cancelling a ticket or making a purchase.
RAG can be leveraged to synthesise and summarise large amounts of information. Bonaci gave the example of customer reviews, which can be summarised in a personalised way that is relevant to the user's context, such as their location, past purchases, or travel preferences.
RAG can be applied to improve search results in an enterprise, making them more relevant and context-specific. Bonaci noted how RAG helps streaming service users find movies or content relevant to their location or interests, even if the search terms don't exactly match available content.
Using knowledge graphs with RAG is an "advanced version" of basic RAG. Bonaci explained that while a vector search in basic RAG identifies similarities in a vector database -- making it well-suited for general knowledge and natural human language -- it has limitations for certain enterprise use cases.
In a scenario where a mobile phone company offers multiple tiered plans with varying inclusions, a customer inquiry -- such as whether international roaming is included -- would require the AI to decide which plan's terms apply to that customer. A knowledge graph can organise the information so the system can work out what applies.
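A rough sketch of the idea, assuming a knowledge graph stored as simple (subject, relation, object) triples: the customer names, plan names and inclusions below are invented for illustration, and the helpers `objects`, `plan_includes` and `grounded_facts` are hypothetical rather than part of any product Bonaci described. The graph links each customer to a plan and each plan to its inclusions, so only the facts that apply to the person asking are passed on as context.

```python
# Hypothetical triples: (subject, relation, object). All names are invented.
TRIPLES = [
    ("alice", "subscribes_to", "Basic"),
    ("bob", "subscribes_to", "Premium"),
    ("Basic", "includes", "domestic calls"),
    ("Premium", "includes", "domestic calls"),
    ("Premium", "includes", "international roaming"),
]

def objects(subject: str, relation: str) -> set[str]:
    """Follow one relation outward from a node in the graph."""
    return {o for s, r, o in TRIPLES if s == subject and r == relation}

def plan_includes(customer: str, feature: str) -> bool:
    """Walk customer -> plan -> inclusions to check whether a feature applies."""
    plans = objects(customer, "subscribes_to")
    return any(feature in objects(plan, "includes") for plan in plans)

def grounded_facts(customer: str) -> list[str]:
    """Collect only the facts that apply to this customer, for use as RAG context."""
    facts = []
    for plan in sorted(objects(customer, "subscribes_to")):
        for feature in sorted(objects(plan, "includes")):
            facts.append(f"{customer}'s plan {plan} includes {feature}.")
    return facts

print(plan_includes("alice", "international roaming"))
print(plan_includes("bob", "international roaming"))
print(grounded_facts("alice"))
```

A plain similarity search could return the roaming clause from either plan document because both mention the same terms; following the explicit links narrows the context to the plan the customer actually holds.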
"The problem is the content in those plan documents are conflicting with each other," Bonaci said. "So the system doesn't know which one is true. So you could use a knowledge graph to help you organise and relate information correctly, to help you resolve these conflicts."
The main alternative to RAG is fine-tuning a generative AI model. With fine-tuning, instead of supplying enterprise data in the prompt, the data is used to further train the model itself, priming it to draw on that enterprise data when it generates output.
Bonaci said that, to date, RAG has been the method widely agreed upon in the industry as the most effective way to make generative AI relevant for an enterprise.
"We do see people fine-tuning models, but it just solves a small niche of problems, and so it has not been widely accepted as a solution," he said.