
# 🌟 Most Impactful RAG Papers

*(March 2023 to Present)*

The concept of Retrieval-Augmented Generation (RAG) was introduced in 2020 in a seminal paper. Since then, RAG research has grown significantly, particularly in the past year, driven by the emergence of numerous LLMs, and RAG has become one of the most widely used applications of LLMs. The table below summarizes top papers published from March 2023 to the present, covering a range of topics in RAG research. These topics are:

  1. RAG Survey: Comprehensive overview of existing methods in RAG.
  2. RAG Enhancement (Advanced Techniques): Proposals for improving the efficiency and effectiveness of the RAG pipeline.
  3. Retrieval Improvement: Techniques focused on enhancing the retrieval component of RAG.
  4. Comparison Papers: Studies comparing RAG with other methods or approaches.
  5. Domain-Specific RAG: Adaptation of RAG techniques for specific domains or applications.
  6. RAG Evaluation: Assessment of the performance and effectiveness of RAG models.
  7. RAG Embeddings: Methods for developing better embedding techniques optimized for RAG or retrieval in RAG.
  8. Input Processing for RAG: Techniques for preprocessing input data to optimize the performance and effectiveness of RAG models.
  9. RAG Framework: Open-source tools and frameworks that can be used to implement RAG pipelines.

This table will continue to be updated regularly, so stay tuned for more updates!

| Title | Description | Tags | Month |
|-------|-------------|------|-------|
| Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation | FRAMES is a newly proposed evaluation dataset aimed at testing large language models (LLMs) in end-to-end Retrieval-Augmented Generation (RAG) scenarios, focusing on factuality, retrieval, and reasoning. Unlike previous benchmarks that assess these abilities separately, FRAMES provides a unified framework to evaluate LLMs' performance in generating accurate, multi-hop responses that require synthesizing information from multiple sources. Baseline results show that state-of-the-art LLMs struggle with this task, achieving 0.40 accuracy without retrieval. However, the accuracy improves to 0.66 with a multi-step retrieval pipeline, highlighting the dataset's value in advancing RAG system development. | RAG Evaluation | September 2024 |
| Boosting Healthcare LLMs Through Retrieved Context | This paper examines the limitations and potential of context retrieval methods to improve factuality and reliability in large language models (LLMs), specifically within healthcare. By optimizing retrieval components, the research shows that open LLMs can perform on par with private solutions in healthcare benchmarks, such as multiple-choice question answering. To address the unrealistic inclusion of possible answers in benchmark setups, the study introduces OpenMedPrompt, a pipeline designed to generate more reliable open-ended answers, moving LLM technology closer to practical use in healthcare settings. | Retrieval Improvement | September 2024 |
| Enhancing Structured-Data Retrieval with GraphRAG: Soccer Data Case Study | Structured-GraphRAG is a new framework designed to improve information retrieval from structured datasets for natural language queries by utilizing multiple knowledge graphs. These graphs capture complex relationships between entities, enabling more accurate and comprehensive information retrieval compared to traditional methods. By grounding responses in structured data, Structured-GraphRAG enhances the reliability of language model outputs. In a case study on soccer data, it showed improved query processing efficiency and reduced response times, demonstrating its broad applicability across various structured data domains. | Retrieval Improvement | September 2024 |
| MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery | MemoRAG is a new approach to Retrieval-Augmented Generation (RAG) that enhances long-term memory capabilities for handling complex tasks, where conventional RAG struggles. It uses a dual-system setup: a lightweight long-range model to generate draft answers and guide retrieval, and a more powerful model to generate the final answer. This design allows MemoRAG to perform better not only on straightforward tasks but also on those involving ambiguous information and unstructured knowledge. MemoRAG shows superior performance in experiments, outperforming traditional RAG systems across a range of tasks. | RAG Enhancement | September 2024 |
| RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval | This paper introduces RetrievalAttention, a training-free method to speed up attention computation in large language models by addressing the challenge of high GPU memory consumption and inference latency in long-context scenarios. It uses approximate nearest neighbor search (ANNS) to retrieve relevant key-value vectors during generation, significantly reducing the memory needed while maintaining accuracy. With an attention-aware vector search algorithm, RetrievalAttention reduces the data accessed to just 1-3%, achieving faster performance with sub-linear time complexity, and requires only 16GB GPU memory for serving 128K tokens on models with 8B parameters. A minimal sketch of the top-k attention idea appears below the table. | Retrieval Improvement | September 2024 |
| Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models | Promptriever is a new retrieval model designed to be prompted like a language model, providing a more intuitive interface for users. Trained on nearly 500k instances from MS MARCO, it excels in standard retrieval tasks and instruction-following. Key results include achieving state-of-the-art performance on relevance tasks, increased robustness to query phrasing, and the ability to improve performance through prompting for hyperparameter search. This work bridges LM prompting techniques with information retrieval, opening up new possibilities for future research. | Retrieval Improvement | September 2024 |
| Graph Retrieval-Augmented Generation: A Survey | This paper surveys GraphRAG, an approach that enhances RAG by leveraging the structural relationships among entities in databases to improve the precision and context-awareness of LLM outputs. Unlike traditional RAG systems, GraphRAG captures relational knowledge to address challenges like hallucination, lack of domain-specific knowledge, and outdated information. The paper provides the first comprehensive overview of GraphRAG methodologies, formalizing its workflow, key technologies, and training methods. It also reviews application domains, evaluation strategies, and industrial use cases, and suggests future research directions to advance the field. | Domain-Specific RAG | August 2024 |
| Agentic Retrieval-Augmented Generation for Time Series Analysis | This paper introduces a novel agentic Retrieval-Augmented Generation framework for time series analysis, designed to overcome challenges like complex spatio-temporal dependencies and distribution shifts. The framework uses a hierarchical, multi-agent architecture where a master agent coordinates specialized sub-agents, each fine-tuned for specific time series tasks. These sub-agents leverage smaller, pre-trained language models (SLMs) and retrieve relevant prompts from a shared repository to enhance predictions. The proposed modular RAG approach offers flexibility and achieves state-of-the-art performance across various time series tasks, outperforming traditional task-specific methods. | Domain-Specific RAG | August 2024 |
| Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models | This paper explores the impact of different noise types on Retrieval-Augmented Generation (RAG) systems, challenging the assumption that all noise is detrimental to large language models (LLMs). By defining seven distinct linguistic noise types, the authors introduce NoiserBench, a benchmark framework for evaluating RAG systems across various datasets and reasoning tasks. Empirical analysis of eight LLMs reveals that noise can be categorized into beneficial and harmful types, with beneficial noise potentially enhancing model performance. The findings provide insights for developing more robust RAG solutions and reducing hallucinations in diverse retrieval scenarios. | RAG Survey | August 2024 |
| RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation | This paper presents RAG Foundry, an open-source framework designed to simplify the implementation of Retrieval-Augmented Generation (RAG) systems. RAG Foundry integrates data creation, training, inference, and evaluation into a unified workflow, enabling rapid prototyping and experimentation with various RAG techniques. The framework is demonstrated by augmenting and fine-tuning Llama-3 and Phi-3 models, resulting in consistent improvements across multiple knowledge-intensive datasets. The code is available on GitHub. | RAG Framework | August 2024 |
| Searching for Best Practices in Retrieval-Augmented Generation | The paper explores the effectiveness of Retrieval-Augmented Generation techniques in providing up-to-date information, reducing hallucinations, and improving response quality, especially in specialized fields. Despite their benefits, RAG methods often face challenges with complexity and slow response times. Through comprehensive experiments, the authors propose strategies to optimize RAG practices, balancing performance and efficiency. Additionally, the study highlights how multimodal retrieval techniques can enhance question-answering for visual inputs and expedite multimodal content generation using a "retrieval as generation" approach. | RAG Survey | July 2024 |
| RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | The paper introduces RankRAG, an innovative instruction fine-tuning framework for LLMs that optimizes both context ranking and answer generation in Retrieval-Augmented Generation. By incorporating a small amount of ranking data, RankRAG surpasses traditional expert ranking models and even performs better than LLMs fine-tuned exclusively on extensive ranking data. The Llama3-RankRAG model outperforms Llama3-ChatQA-1.5 and GPT-4 across nine knowledge-intensive benchmarks and matches GPT-4's performance on five biomedical RAG benchmarks without domain-specific fine-tuning, showcasing its strong generalization abilities. | Retrieval Improvement | July 2024 |
| Context Embeddings for Efficient Answer Generation in RAG | The paper introduces COCOM, a context compression method that accelerates the performance of Retrieval-Augmented Generation (RAG) by reducing lengthy contextual inputs to a few Context Embeddings. This approach significantly decreases decoding time, allowing for different compression rates that balance speed and answer quality. COCOM outperforms previous methods by effectively managing multiple contexts and demonstrates a speed-up of up to 5.69× while maintaining superior performance. | RAG Enhancement | July 2024 |
| Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach | The paper compares Retrieval Augmented Generation (RAG) and long-context (LC) capabilities of modern LLMs, such as Gemini-1.5 and GPT-4, which excel in understanding extended contexts. The findings indicate that LC models generally outperform RAG in average performance when adequately resourced, although RAG is more cost-effective. To optimize efficiency and maintain performance, the authors propose "Self-Route," a method that directs queries to either RAG or LC based on model self-reflection, reducing computation costs while achieving results similar to LC. | Comparison Papers | July 2024 |
| RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models | The emergence of Medical Large Vision Language Models (Med-LVLMs) has improved medical diagnosis, but these models often produce factually inaccurate responses. The paper introduces RULE, a method to enhance factual accuracy by calibrating the number of retrieved contexts in Retrieval-Augmented Generation (RAG) and fine-tuning the model with a preference dataset. RULE significantly improves factual accuracy, achieving an average improvement of 20.8% on three medical VQA datasets. The benchmark and code are publicly available on GitHub. | Domain-Specific RAG | July 2024 |
| ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities | The paper introduces ChatQA 2, a Llama3-based model designed to rival proprietary models like GPT-4-Turbo in long-context understanding and retrieval-augmented generation. By extending Llama3's context window from 8K to 128K tokens and employing a three-stage instruction tuning process, ChatQA 2 achieves comparable accuracy to GPT-4-Turbo and surpasses it on RAG benchmarks. The study highlights how state-of-the-art retrievers can mitigate context fragmentation, improving performance on long-context tasks. | RAG Enhancement | July 2024 |
| Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems | LLMs and RAG systems can now handle large input sizes, but evaluating their performance on long-context tasks remains difficult. To address this, the "Summary of a Haystack" (SummHay) task is introduced, requiring systems to summarize insights from synthesized document collections, with precise citations. This approach allows for automatic evaluation based on coverage and citation. Testing across multiple domains revealed that current systems struggle with SummHay, often scoring below human performance benchmarks. SummHay can also be used to examine enterprise RAG systems and position bias in long-context models. | RAG Evaluation | July 2024 |
| Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework | The paper addresses challenges in evaluating Retrieval-Augmented Generation (RAG) QA systems, focusing on domain-specific knowledge hallucination and the lack of suitable benchmarks for internal tasks at Infineon Technologies. To tackle these issues, the authors propose a comprehensive evaluation framework using Large Language Models (LLMs) to generate synthetic queries, assess retrieved documents and answers with LLM-based judging, and rank RAG variants using an automated Elo-based competition called RAGElo. | RAG Evaluation | June 2024 |
| LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs | In the traditional RAG framework, retrievers such as DPR work with short retrieval units (100-word Wikipedia paragraphs), leading to inefficiencies. To address this, LongRAG introduces a "long retriever" and "long reader" framework, processing Wikipedia into 4K-token units, 30 times longer than before. This reduces the number of units significantly while achieving higher retrieval scores: answer recall@1=71% on NQ and answer recall@2=72% on HotpotQA (full-wiki). LongRAG feeds these retrieved units to a long-context LLM for zero-shot answer extraction, achieving EM scores of 62.7% on NQ and 64.3% on HotpotQA, showcasing state-of-the-art performance without training. A minimal sketch of the unit-grouping step appears below the table. | RAG Enhancement | June 2024 |
| PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers | This paper introduces the Decision QA benchmark (DQA) for video game scenarios like Europa Universalis IV and Victoria 3, addressing decision-making tasks. It also proposes PlanRAG, a new RAG technique where a language model generates decision plans and uses a retriever for data queries. PlanRAG outperforms existing iterative RAG methods by 15.8% in the Locating scenario and 7.4% in the Building scenario. | RAG Enhancement | June 2024 |
| From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries | In this paper, the authors mechanistically examine RAG, revealing that language models predominantly rely on contextual information rather than parametric memory to answer questions. They employ Causal Mediation Analysis to illustrate minimal utilization of parametric memory and analyze Attention Contributions and Knockouts to show that the last token residual stream enriches from informative context tokens rather than directly from the question's subject token. | RAG Enhancement | June 2024 |
| Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models | The paper introduces Buffer of Thoughts (BoT), a novel approach enhancing LLMs by using a meta-buffer to store and adapt informative thought-templates for efficient reasoning across tasks. BoT achieves significant performance improvements over SOTA methods on 10 reasoning-intensive tasks, demonstrating superior generalization and robustness while maintaining lower computational costs compared to multi-query prompting methods. | RAG Enhancement | June 2024 |
| SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation | This paper introduces Self-aware Knowledge Retrieval (SeaKR), a novel adaptive RAG model that leverages LLMs' self-aware uncertainty to enhance knowledge retrieval and integration. SeaKR activates retrieval when LLMs exhibit high uncertainty, re-ranking retrieved snippets based on their potential to reduce this uncertainty. For tasks requiring multiple retrievals, SeaKR uses self-aware uncertainty to select optimal reasoning strategies. Experimental results on diverse Question Answering datasets demonstrate SeaKR's superiority over existing adaptive RAG methods. A minimal sketch of uncertainty-gated retrieval appears below the table. | RAG Enhancement | June 2024 |
| RE-AdaptIR: Improving Information Retrieval through Reverse Engineered Adaptation | This paper explores the use of reverse engineered adaptation (RE-AdaptIR) to enhance large language models (LLMs) for information retrieval (IR) without the need for numerous labeled examples. By applying RE-AdaptIR, the research demonstrates improved performance in both training domains and zero-shot scenarios where models encounter previously unseen queries. The findings highlight significant performance improvements and provide actionable insights for practitioners in the field. | Retrieval Improvement | June 2024 |
| CRAG -- Comprehensive RAG Benchmark | The Comprehensive RAG Benchmark (CRAG) addresses the limitations of existing RAG datasets by providing a diverse and dynamic set of 4,409 question-answer pairs and mock APIs for simulating web and Knowledge Graph searches. CRAG evaluates LLMs' QA capabilities across various domains and question categories, revealing that even state-of-the-art RAG solutions struggle with accuracy, especially on questions with higher dynamism, lower popularity, or higher complexity. The benchmark has already fostered significant engagement, paving the way for future research and improvements in RAG and general QA solutions. | RAG Evaluation | June 2024 |
| A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems | Contrary to common practices that favor "instructed" LLMs fine-tuned for instruction-following, this study finds that base models outperform instructed ones by 20% on average in RAG tasks. This challenges prevailing assumptions about instructed LLMs' superiority in RAG applications and highlights the need for further investigation and discussion. | RAG Enhancement | June 2024 |
| Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts | The work highlights the limitations of current large language models in knowledge-intensive tasks due to issues like untimeliness, high costs of knowledge updates, and hallucinations. It introduces METRAG, a Multi-layered Thoughts enhanced Retrieval-Augmented Generation framework, which goes beyond traditional similarity-oriented methods by incorporating both similarity- and utility-oriented thoughts, and uses an LLM as a task-adaptive summarizer. Extensive experiments demonstrate that METRAG significantly improves the performance of retrieval-augmented generation in knowledge-intensive tasks. | RAG Enhancement | May 2024 |
| HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models | The paper introduces HippoRAG, a retrieval framework inspired by the hippocampal indexing theory to enhance knowledge integration in large language models. By combining LLMs, knowledge graphs, and the Personalized PageRank algorithm, HippoRAG mimics human memory processes. Experiments show it significantly outperforms existing methods in multi-hop question answering, offering improved performance, cost efficiency, and speed. A minimal sketch of the Personalized PageRank step appears below the table. | RAG Enhancement | May 2024 |
| Don't Forget to Connect! Improving RAG with Graph-based Reranking | The paper addresses challenges in Retrieval Augmented Generation when documents have partial information or less obvious connections to the context. Introducing G-RAG, a reranker based on graph neural networks (GNNs), the method combines document connections and semantic information to enhance RAG. G-RAG outperforms state-of-the-art approaches with a smaller computational footprint, and significantly outperforms PaLM 2 as a reranker, highlighting the importance of effective reranking in RAG. | Retrieval Improvement | May 2024 |
| GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning | The paper introduces GNN-RAG, a method that combines LLMs and Graph Neural Networks (GNNs) for Knowledge Graph Question Answering (KGQA). GNN-RAG uses GNNs to retrieve answer candidates from dense KG subgraphs and LLMs to reason over extracted paths. This approach significantly improves performance on KGQA benchmarks, outperforming state-of-the-art models, including GPT-4, especially in multi-hop and multi-entity questions. | Domain-Specific RAG | May 2024 |
| Observations on Building RAG Systems for Technical Documents | Retrieval-augmented generation (RAG) for technical documents poses challenges because embeddings often fail to capture domain-specific information. The paper reviews prior art on the factors that affect RAG and performs experiments to highlight best practices and potential challenges in building RAG systems for technical documents. | RAG Survey | May 2024 |
| RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing | The paper surveys how LLMs tackle NLP challenges, integrating external information to boost performance. It explores Retrieval-Augmented Language Models (RALMs) like RAG and RAU, detailing their evolution, taxonomy, and applications in various NLP tasks. Key components and evaluation methods are discussed, emphasizing strengths, limitations, and avenues for future research to enhance retrieval quality and efficiency. Overall, it offers structured insights into RALMs' potential for advancing NLP. | RAG Survey | April 2024 |
| When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively | The paper illustrates how LLMs can effectively integrate with information retrieval (IR) systems, especially when additional context is necessary for answering questions. It suggests that while popular questions are often answered by LLMs' parametric memory, less popular ones benefit from IR usage. A tailored training approach introduces a special token, ⟨RET⟩, for questions where LLMs lack answers, leading to improvements demonstrated by the Adaptive Retrieval LLM (ADAPT-LLM) on the PopQA dataset. Evaluation reveals ADAPT-LLM's ability to use ⟨RET⟩ for questions needing IR, while maintaining high accuracy relying solely on parametric memory. | RAG Enhancement | April 2024 |
| A Survey on Retrieval-Augmented Text Generation for Large Language Models | The paper introduces Retrieval-Augmented Generation, which combines retrieval methods with deep learning to overcome the static limitations of large language models by integrating real-time external information. Focusing on text, RAG mitigates LLMs' tendency to generate inaccurate responses, enhancing reliability through real-world data. Organized into pre-retrieval, retrieval, post-retrieval, and generation stages, the paper outlines RAG's evolution and evaluates its performance, aiming to consolidate research, clarify its technology, and broaden LLMs' applicability. | RAG Survey | April 2024 |
| RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback | RA-ISF proposes Retrieval Augmented Iterative Self-Feedback to enhance large language models' problem-solving abilities by iteratively decomposing tasks and processing them in three submodules. Experiments demonstrate its superiority over existing models like GPT-3.5 and Llama 2, notably improving factual reasoning and reducing hallucinations. | RAG Enhancement | March 2024 |
| RAFT: Adapting Language Model to Domain Specific RAG | This paper introduces RAFT (Retrieval Augmented FineTuning), a training approach designed to enhance a pre-trained Large Language Model's ability to answer questions in domain-specific contexts. RAFT focuses on adapting the model to gain new knowledge by fine-tuning it to ignore irrelevant documents retrieved during the question-answering process. By selectively citing relevant information from retrieved documents, RAFT improves the model's reasoning capabilities and performance across various datasets like PubMed, HotpotQA, and Gorilla. | RAG Enhancement | March 2024 |
| Fine Tuning vs. Retrieval Augmented Generation for Less Popular Knowledge | This paper investigates the effectiveness of Retrieval Augmented Generation and fine-tuning (FT) approaches in improving the performance of Large Language Models on low-frequency entities in question answering tasks. While FT shows significant improvement across entities of different popularity levels, RAG outperforms other methods. Furthermore, advancements in retrieval and data augmentation techniques enhance the success of both RAG and FT approaches in customizing LLMs for handling low-frequency entities. | Comparison Papers | March 2024 |
| Improving language models by retrieving from trillions of tokens | This paper introduces RETRO, a Retrieval-Enhanced Transformer, which enhances auto-regressive language models by conditioning on document chunks retrieved from a massive corpus. Despite using significantly fewer parameters compared to existing models like GPT-3 and Jurassic-1, RETRO achieves comparable performance on tasks like question answering after fine-tuning. By combining a frozen Bert retriever, a differentiable encoder, and a chunked cross-attention mechanism, RETRO leverages an order of magnitude more data during prediction. This approach presents new possibilities for improving language models through explicit memory at an unprecedented scale. | RAG Enhanced LLMs | March 2024 |
| RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | The RAT method enhances large language models' reasoning and generation capabilities in long-horizon tasks by iteratively revising a chain of thoughts with relevant information retrieved through information retrieval. By incorporating retrieval-augmented thoughts into models like GPT-3.5, GPT-4, and CodeLLaMA-7b, RAT significantly improves performance across various tasks, including code generation, mathematical reasoning, creative writing, and embodied task planning, with average rating score increases of 13.63%, 16.96%, 19.2%, and 42.78%, respectively. | RAG Enhancement | March 2024 |
| Instruction-tuned Language Models are Better Knowledge Learners | This paper introduces pre-instruction-tuning (PIT), a method that instruction-tunes on questions before training on documents, contrary to the standard approach. PIT significantly enhances LLMs' ability to absorb knowledge from new documents, outperforming standard instruction-tuning by 17.8%, as demonstrated in extensive experiments and ablation studies. | Instruction Tuning | February 2024 |
| Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models | Hallucinations present a significant challenge for large language models, often resulting from limited internal knowledge. While incorporating external information can mitigate this, it also risks introducing irrelevant details, leading to external hallucinations. In response, the authors introduce Rowen, which selectively augments LLMs with retrieval when detecting inconsistencies across languages, indicative of hallucinations. This semantic-aware process balances internal reasoning with external evidence, effectively mitigating hallucinations. Empirical analysis shows Rowen surpasses existing methods in detecting and mitigating hallucinated content in LLM outputs. | RAG Enhancement | February 2024 |
| G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering | The paper introduces GraphQA, a framework enabling users to interactively query textual graphs through conversational interfaces for various real-world applications. They propose G-Retriever, which combines graph neural networks, large language models, and Retrieval-Augmented Generation to navigate large textual graphs effectively. Through soft prompting and optimization techniques, G-Retriever achieves superior performance and scalability while mitigating issues like hallucination. Empirical evaluations across multiple domains demonstrate its effectiveness, showcasing its potential for practical applications. | Retrieval Improvement | February 2024 |
| Retrieval-Augmented Data Augmentation for Low-Resource Domain Tasks | Retrieval-Augmented Data Augmentation (RADA) is a method aimed at improving model performance in low-resource settings with limited training data. RADA addresses the challenge of suboptimal and less diverse synthetic data generation by incorporating examples from other datasets. It retrieves relevant instances based on similarities with the given seed data and prompts Large Language Models to generate new samples with contextual information from both original and retrieved samples. Experimental results demonstrate the effectiveness of RADA in training and test-time data augmentation scenarios, outperforming existing LLM-powered data augmentation methods. | Domain-Specific RAG | February 2024 |
| RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval | RAPTOR presents a new approach to retrieval-augmented language modeling by introducing a method that constructs a hierarchical summary tree from large documents, enabling more nuanced and comprehensive retrieval of information. Unlike conventional methods that pull short, direct excerpts from texts, RAPTOR's recursive process embeds, clusters, and summarizes text chunks at multiple abstraction levels. This structured retrieval allows for a deeper understanding and integration of information across entire documents, significantly enhancing performance on complex tasks requiring multi-step reasoning. Demonstrated improvements on various benchmarks, including a remarkable 20% absolute accuracy increase on the QuALITY benchmark with GPT-4, underline RAPTOR's potential to revolutionize how models access and leverage extensive knowledge bases, setting new standards for question-answering and beyond. A minimal sketch of the cluster-and-summarize recursion appears below the table. | RAG Enhancement | January 2024 |
| RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture | The paper explores two methods used by developers to integrate proprietary and domain-specific data into Large Language Models: Retrieval-Augmented Generation and Fine-Tuning. It presents a detailed pipeline for applying these methods to LLMs like Llama2-13B, GPT-3.5, and GPT-4, focusing on extracting information, generating questions and answers, fine-tuning, and evaluation. The paper demonstrates the capacity of fine-tuned models to leverage cross-geographic information, enhancing answer similarity significantly, and underscores the broader applicability and benefits of LLMs in various industrial domains. | Comparison Papers | January 2024 |
| Corrective Retrieval Augmented Generation | CRAG introduces a novel strategy to enhance the robustness and accuracy of large language models during retrieval-augmented generation processes. Addressing the potential pitfalls of relying on the relevance of retrieved documents, CRAG employs a retrieval evaluator to gauge the quality and relevance of documents for a given query, enabling adaptive retrieval strategies based on confidence scores. To overcome the limitations of static databases, CRAG integrates large-scale web searches, providing a richer pool of documents. Additionally, its unique decompose-then-recompose algorithm ensures the model focuses on pertinent information while discarding the irrelevant, thereby refining the quality of generation. Designed as a versatile, plug-and-play solution, CRAG significantly enhances RAG-based models' performance across a range of generation tasks, demonstrated through substantial improvements in four diverse datasets. A minimal sketch of the confidence-gated routing appears below the table. | RAG Enhancement | January 2024 |
| UniMS-RAG: A Unified Multi-source Retrieval-Augmented Generation for Personalized Dialogue Systems | The paper introduces UniMS-RAG, a novel framework designed to address the personalization challenge in dialogue systems by incorporating multiple knowledge sources. It decomposes the task into three sub-tasks: Knowledge Source Selection, Knowledge Retrieval, and Response Generation, and unifies them into a single sequence-to-sequence paradigm during training. This allows the model to dynamically retrieve and evaluate relevant evidence using special tokens, facilitating interaction with diverse knowledge sources. Furthermore, a self-refinement mechanism is proposed to iteratively refine generated responses based on consistency and relevance scores. | Domain-Specific RAG | January 2024 |
| Retrieval-Augmented Generation for Large Language Models: A Survey | This survey delves into Retrieval-Augmented Generation as a solution to challenges faced by Large Language Models, including hallucination and outdated knowledge. RAG integrates external databases to enhance accuracy and credibility, particularly for knowledge-intensive tasks, and enables continuous knowledge updates. The paper reviews the evolution of RAG paradigms, covering Naive RAG, Advanced RAG, and Modular RAG, while examining the retrieval, generation, and augmentation techniques. It discusses state-of-the-art technologies and introduces an updated evaluation framework and benchmark, concluding with insights into current challenges and future research directions. | RAG Survey | December 2023 |
| Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models | Chain-of-Noting (CoN) introduces an innovative approach to enhance the robustness and reliability of retrieval-augmented language models (RALMs) by addressing the issue of processing irrelevant or noisy information and improving the model's ability to recognize when it lacks sufficient knowledge to answer a question. CoN's strategy involves creating sequential reading notes on retrieved documents, facilitating a more detailed assessment of their relevance and integrating this evaluation into the answer generation process. This method not only helps in filtering out unhelpful information but also empowers RALMs to more confidently identify and admit when an answer is beyond their current knowledge or data scope. Leveraging ChatGPT for training data creation and implementing CoN on a LLaMa-2 7B model, this approach has demonstrated significant performance improvements over traditional RALMs in open-domain question answering tasks. The results include a notable increase in Exact Match (EM) scores amidst noisy document retrieval and enhanced rejection rates for questions outside the model's pre-training knowledge, underscoring CoN's potential in making RALMs more reliable and trustworthy. | RAG Enhanced LLMs | November 2023 |
| From Classification to Generation: Insights into Crosslingual Retrieval Augmented ICL | The paper introduces CREA-ICL, an innovative method designed to enhance the zero-shot learning capabilities of multilingual pre-trained language models (MPLMs) in low-resource languages through cross-lingual retrieval-augmented in-context learning. By retrieving semantically similar prompts from high-resource languages, this approach seeks to bolster the models' performance across a range of tasks. The findings indicate consistent improvements in classification tasks; however, the approach encounters obstacles when applied to generation tasks. These outcomes provide valuable insights into the distinctions in effectiveness between classification and generation domains when utilizing retrieval-augmented in-context learning, highlighting the nuanced challenges and potential strategies for advancing the application of MPLMs in multilingual settings. | Domain-Specific RAG | November 2023 |
| REST: Retrieval-Based Speculative Decoding | The paper introduces REST, a novel algorithm called Retrieval-Based Speculative Decoding, aimed at accelerating language model generation. Unlike prior methods, REST leverages retrieval to generate draft tokens based on common phrases and patterns observed during text generation. It seamlessly integrates with existing language models without additional training, achieving notable speedups of 1.62X to 2.36X on code or text generation tasks when benchmarked against 7B and 13B language models in a single-batch setting. A minimal sketch of the suffix-matching draft step appears below the table. | RAG Enhancement | November 2023 |
| Learning to Filter Context for Retrieval-Augmented Generation | The FILCO method is introduced to enhance the quality of context provided to generation models in retrieval-augmented systems. By identifying useful context and training context filtering models, FILCO aims to mitigate issues arising from irrelevant passages during generation. Experimental results across various knowledge-intensive tasks demonstrate the effectiveness of FILCO in improving output quality, surpassing existing approaches in tasks such as question answering, fact verification, and dialog generation. This method proves beneficial regardless of whether the retrieved context aligns perfectly with the desired output. | RAG Enhancement | November 2023 |
| Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | Self-RAG introduces a novel approach to enhance the quality and accuracy of large language models by incorporating a process of retrieval and self-reflection. Unlike traditional retrieval-augmented generation models that may retrieve and use external passages indiscriminately, Self-RAG employs a more dynamic method. It enables an LLM to adaptively decide when to retrieve information and critically assess the relevance of retrieved content and its own generated responses through the use of special "reflection tokens." This innovative mechanism allows the model to adjust its behavior based on the specifics of the task at hand, offering a higher degree of control during inference. Testing on a variety of tasks, including open-domain question answering, reasoning, and fact verification, demonstrates that Self-RAG models (with 7B and 13B parameters) surpass both conventional LLMs and other retrieval-augmented models in performance, showcasing notable improvements in generating factual and accurately cited long-form content. | RAG Enhancement | October 2023 |
| Benchmarking Large Language Models in Retrieval-Augmented Generation | This paper tackles the critical task of evaluating how Retrieval-Augmented Generation influences the performance of large language models across a spectrum of capabilities essential for effective RAG application. Through the establishment of the Retrieval-Augmented Generation Benchmark (RGB), a novel corpus designed for RAG evaluation in both English and Chinese, the study meticulously assesses LLMs against four core abilities: noise robustness, negative rejection, information integration, and counterfactual robustness. The analysis of six representative LLMs using RGB exposes their relative strengths and weaknesses, revealing that while these models demonstrate resilience against noise, they falter significantly in rejecting irrelevant information, integrating diverse information sources, and countering false information. The findings underscore the need for further advancements in LLMs to harness the full potential of RAG, highlighting the complexity and challenges of improving LLMs' factual accuracy and decision-making processes. | RAG Evaluation | October 2023 |
| Knowledge-Augmented Language Model Verification | The paper introduces a novel method aimed at improving the factual accuracy of language model responses by incorporating a verification step into the knowledge-augmentation process. Recognizing that LMs often produce factually incorrect answers due to the limitations of their internalized knowledge, this approach enhances text generation by identifying and correcting errors in both the retrieval of relevant external knowledge and the reflection of this knowledge in the generated text. A specialized verifier, a smaller LM trained via instruction-finetuning, is employed to detect inaccuracies in both retrieval and generation. Errors identified by the verifier can be corrected by updating the retrieved knowledge or modifying the generated text. Moreover, the use of an ensemble of outputs guided by different instructions, combined with a single verifier, boosts the verification's reliability. Tested across multiple question answering benchmarks, this method significantly increases the factual accuracy of responses, demonstrating the verifier's effectiveness in pinpointing and addressing errors in knowledge retrieval and text generation. | RAG Enhancement | October 2023 |
| Optimizing Retrieval-augmented Reader Models via Token Elimination | This study introduces an approach to enhance the efficiency of Fusion-in-Decoder (FiD), a retrieval-augmented language model widely used in open-domain tasks like question answering and fact checking. By analyzing the importance of each retrieved passage to the model's performance, the researchers propose a method for selectively eliminating non-critical information at the token level. This token elimination strategy significantly reduces decoding time—by up to 62.2%—with minimal impact on the model's effectiveness, only reducing performance by 2%. Surprisingly, in some instances, this approach not only maintains but also improves the model's performance. This method offers a promising direction for optimizing the balance between computational efficiency and accuracy in retrieval-augmented reader models. | RAG Enhanced LLMs | October 2023 |
| Self-Knowledge Guided Retrieval Augmentation for Large Language Models | SKR (Self-Knowledge guided Retrieval) is a novel method designed to enhance the performance of large language models by intelligently incorporating external knowledge. Recognizing the limitations of LLMs in terms of the completeness and updatability of their knowledge, SKR focuses on improving LLMs' ability to discern what they know and what they don't, allowing them to selectively seek external information. This approach aims to mitigate the issues with retrieval-based methods that sometimes detract from the model's original responses. By enabling LLMs to refer to previously encountered questions and judiciously utilize external resources for new queries, SKR has shown to outperform existing methods in various datasets, leveraging models like InstructGPT or ChatGPT for improved question-answering capabilities. | Retrieval Improvement | October 2023 |
| Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models | The "Tree of Clarifications" (ToC) framework addresses the challenge of ambiguous questions in open-domain question answering by creating a structured tree of potential interpretations, allowing for the generation of comprehensive long-form answers. This method leverages few-shot prompting and external knowledge to recursively disambiguate questions and gather relevant information. ToC not only surpasses other few-shot methods across various metrics but also outperforms fully-supervised approaches in Disambig-F1 and Disambig-ROUGE scores, offering a robust solution to understanding and answering ambiguously posed questions effectively. | RAG Enhanced LLMs | October 2023 |
| Retrieval-Generation Synergy Augmented Large Language Models | The paper introduces a novel iterative framework that combines retrieval and generation processes to enhance large language models for knowledge-intensive tasks. This collaborative approach allows the model to access both parametric knowledge (built into the model itself) and non-parametric knowledge (from external sources) and iteratively refine its understanding and output through interactions between retrieval and generation phases. This synergy is particularly beneficial for complex tasks requiring multi-step reasoning. Tested across single-hop and multi-hop question-answering datasets, the method demonstrates a marked improvement in LLMs' reasoning capabilities, surpassing existing approaches in performance. | RAG Enhancement | October 2023 |
| RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation | RECOMP introduces a method to optimize the efficiency of retrieval-augmented language models by compressing retrieved documents into concise summaries before integrating them into the model's context. This approach aims to make the inference process less resource-intensive and helps LMs more effectively discern pertinent information from retrieved documents. RECOMP employs two types of compressors: an extractive compressor, which identifies and uses key sentences from documents, and an abstractive compressor, which creates summaries by combining information from various sources. These compressors are designed to enhance LMs' task performance while generating brief summaries, even capable of omitting augmentation when retrieved documents are not beneficial. A minimal sketch of the extractive compressor appears below the table. | RAG Enhancement | October 2023 |
| Retrieval meets Long Context Large Language Models | The paper delves into the comparative benefits of retrieval-augmentation and extended context windows in large language models, and whether their combination could yield superior results for various downstream tasks. Using two advanced LLMs for analysis, the findings reveal that a model with a smaller context window (4K tokens) supplemented by retrieval-augmentation can match the performance of a model with a larger context window (16K tokens) fine-tuned for long-context tasks, but with significantly lower computational demand. Moreover, incorporating retrieval into LLMs enhances performance across all context window sizes. The standout model, a retrieval-augmented Llama2-70B with a 32K context window, notably outperformed leading models like GPT-3.5-turbo-16k and Davinci003 across various tasks, including question answering and summarization, while also achieving faster generation speeds. This research underscores the effectiveness of retrieval-augmentation in improving LLMs' efficiency and accuracy, offering valuable guidance for future model development strategies. | Comparison Papers | October 2023 |
| Making Retrieval-Augmented Language Models Robust to Irrelevant Context | This paper addresses the challenge of ensuring that retrieval-augmented language models (RALMs) remain effective and accurate, especially when confronted with irrelevant information during multi-hop reasoning tasks. Through an extensive analysis across five open-domain question answering benchmarks, the authors identify instances where retrieval augmentation actually hampers model performance. To combat this, they introduce two strategies: first, a baseline approach using a natural language inference model to filter out passages that don't support the question-answer pairs, ensuring the model isn't misled by irrelevant data. While effective in reducing inaccuracies, this method risks excluding useful information. To refine this approach, the authors develop a technique for enhancing the language model's ability to discern and appropriately use retrieved passages, by training with a combination of relevant and irrelevant contexts. Remarkably, they demonstrate that a modest dataset of just 1,000 examples can significantly improve the model's resilience to irrelevant information without compromising its performance on pertinent examples. | RAG Enhanced LLMs | October 2023 |
| RA-DIT: Retrieval-Augmented Dual Instruction Tuning | RA-DIT presents a novel approach to enhancing retrieval-augmented language models (RALMs) by introducing a two-step, lightweight fine-tuning process that can be applied to any large language model to equip it with retrieval capabilities. The first step focuses on fine-tuning the LLM to better utilize retrieved information, while the second step optimizes the retriever to fetch more relevant information as determined by the LLM's needs. This method stands out by not requiring costly modifications to the model's pre-training phase or relying on less effective post-hoc integration of data stores. Tested across various zero- and few-shot learning benchmarks, RA-DIT achieves unprecedented performance improvements, showcasing its effectiveness in knowledge-intensive tasks and significantly surpassing existing models in both zero-shot and few-shot scenarios. | RAG Enhanced LLMs | October 2023 |
| InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining | InstructRetro builds on the idea of enhancing auto-regressive large language models through retrieval-augmented pretraining, presenting the largest model of its kind, Retro 48B. This model, an expansion of a 43B GPT model pretrained with an additional 100 billion tokens and leveraging Retro's method from 1.2 trillion tokens, demonstrates remarkable improvements in perplexity and factual accuracy while requiring minimal additional computational resources. The process not only showcases the scalability of retrieval-augmented pretraining but also significantly enhances instruction tuning and zero-shot generalization capabilities. InstructRetro, when fine-tuned with instructions, surpasses its GPT counterpart across various tasks, including short-form QA, reading comprehension, long-form QA, and summarization, with notable margins. Interestingly, the study also reveals that removing the encoder and utilizing only the decoder of InstructRetro yields comparable results, suggesting a promising route for optimizing GPT decoders through retrieval-augmented pretraining followed by instruction tuning. | RAG Enhanced LLMs | October 2023 |
| GAR-meets-RAG Paradigm for Zero-Shot Information Retrieval | The GAR-meets-RAG approach innovatively combines two paradigms—generation-augmented retrieval (GAR) and retrieval-augmented generation (RAG)—to address the zero-shot information retrieval challenge, where no labeled data from the target domain is available. This method iteratively enhances both the retrieval and rewriting stages, significantly improving recall and precision in document ranking without requiring domain-specific training data. By integrating the generative capabilities of large language models with embedding-based retrieval, the proposed methodology not only addresses the common pitfalls of high-recall retrieval and high-precision ranking in a zero-shot context but also sets new benchmarks on the BEIR and TREC-DL datasets. It achieves remarkable improvements in key metrics like Recall@100 and nDCG@10, showing up to 17% relative gains over prior state-of-the-art results, demonstrating its effectiveness in zero-shot passage retrieval tasks. | Retrieval Improvement | October 2023 |
| Retrieve Anything To Augment Large Language Models | The paper proposes LLM-Embedder, a unified model designed to address the challenges faced by large language models by leveraging retrieval augmentation. Unlike conventional methods, LLM-Embedder optimizes retrieval for diverse LLM needs with one model. Training this unified model poses challenges due to the varied semantic relationships targeted by different retrieval tasks. To overcome this, the paper presents optimized training methodologies, including reward formulation, stabilized knowledge distillation, multi-task fine-tuning, and homogeneous negative sampling. These strategies lead to outstanding empirical performance, offering a promising solution for enhancing LLM capabilities through retrieval augmentation. | Retrieval Improvement | October 2023 |
| DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | DSPy introduces a systematic approach for developing and optimizing language model (LM) pipelines, abstracting them as text transformation graphs. These imperative computational graphs enable declarative modules to invoke LMs, which can then learn to apply various techniques through parameterization. With a compiler that optimizes DSPy pipelines, it maximizes given metrics, allowing for sophisticated LM pipelines to be expressed and optimized efficiently. Case studies demonstrate DSPy's effectiveness in outperforming standard prompting and expert-created demonstrations across various tasks, showcasing its competitive performance even with smaller LM models. A minimal usage sketch appears below the table. | RAG Enhancement | October 2023 |
| RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling | RegaVAE, a novel retrieval-augmented language model, addresses the challenges of determining relevant information retrieval and effective integration during generation. By considering both source and target text, it encodes them into a latent space using a variational auto-encoder (VAE). Leveraging this compact representation, RegaVAE outperforms existing models in text generation quality and hallucination removal, as demonstrated through theoretical analysis and experiments across diverse datasets. | RAG Enhanced LLMs | October 2023 |
| Text Embeddings Reveal (Almost) As Much As Text | The paper explores text embedding inversion, aiming to reconstruct original text from embeddings. While a basic model performs poorly, a multi-step approach achieves 92% accuracy in recovering 32-token text inputs. This method, trained on two embedding models, successfully retrieves personal information like full names from clinical notes, highlighting potential privacy risks associated with text embeddings. | RAG Embeddings | October 2023 |
Understanding Retrieval Augmentation for Long-Form Question Answering This paper investigates the effects of retrieval-augmented language models on long-form question answering. By comparing answers generated from LMs using the same evidence documents, the impact of retrieval augmentation on different LMs is analyzed. The study also examines various attributes of generated answers and evaluates methods for automatically judging attribution to evidence documents. Insights are provided on how retrieval augmentation influences long, knowledge-rich text generation, including attribution patterns and analysis of attribution errors, offering directions for future research in this area RAG Enhanced LLMs October 2023
Generate rather than Retrieve: Large Language Models are Strong Context Generators This study introduces GenRead, a novel approach for handling knowledge-intensive tasks like open-domain question answering, by leveraging large language models to generate rather than retrieve contextual documents. This method prompts the language model to produce context relevant to the given question, which is then used to determine the final answer. Additionally, GenRead employs a novel clustering-based prompting technique that ensures the diversity of generated documents, covering a broader range of perspectives and thereby enhancing the accuracy of answers. Through rigorous testing across multiple tasks, including QA, fact checking, and dialogue systems, GenRead has shown to significantly surpass traditional retrieval-based methods, achieving notably higher exact match scores on benchmarks like TriviaQA and WebQ without relying on external knowledge sources. This marks a significant advancement in efficiently accessing and utilizing knowledge for AI tasks. RAG Enhancement September 2023
RAGAS: Automated Evaluation of Retrieval Augmented Generation RAGAs introduces a new way to evaluate Retrieval Augmented Generation systems without the need for human-annotated references. RAG systems enhance language models by fetching information from textual databases, which helps to minimize inaccuracies or "hallucinations" in generated text. Evaluating these systems is complex due to the need to assess the retrieval's relevance, the LLM's ability to use the retrieved information accurately, and the overall quality of the generated text. RAGAs offers a comprehensive set of metrics for assessing these aspects quickly and without human annotations, facilitating more efficient development and refinement of RAG technologies. This is particularly valuable in the rapidly evolving field of large language models. RAG Evaluation September 2023
RaLLe: A Framework for Developing and Evaluating Retrieval-Augmented Large Language Models RaLLe introduces an open-source framework aimed at enhancing the development and evaluation of retrieval-augmented large language models (R-LLMs), specifically for tasks requiring a high degree of factual accuracy, like question-answering. Addressing the lack of transparency in current tools, RaLLe provides a detailed view into each step of the R-LLM process, from retrieval to generation. This enables developers to refine prompts, evaluate the efficacy of different components, and quantitatively measure the performance improvements in their models. Essentially, RaLLe offers a comprehensive toolkit for boosting the effectiveness and precision of R-LLMs in handling complex, knowledge-based tasks. RAG Enhanced LLMs August 2023
RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models The paper presents RAVEN, an approach to improving in-context learning in encoder-decoder language models through retrieval augmentation. By analyzing the ATLAS model, the authors pinpoint challenges like mismatches between training and usage, and limited context availability. RAVEN addresses these by integrating retrieval-augmented masked and prefix language modeling, alongside a novel technique called Fusion-in-Context Learning. This method boosts few-shot learning capabilities without extra training or changes to the model structure. Testing shows RAVEN surpassing ATLAS and holding its ground against some of the most sophisticated models, even with fewer parameters. This study highlights the efficacy and potential of retrieval-augmented models in enhancing in-context learning, paving the way for future advancements in the field. RAG Enhanced LLMs August 2023
KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases KnowledGPT introduces a novel framework aimed at overcoming the limitations of large language models regarding completeness, timeliness, faithfulness, and adaptability by integrating them with knowledge bases. This integration allows for enhanced retrieval and storage of knowledge, making LLMs more powerful and versatile. The framework uses "program of thought" prompting to generate search queries in code format, facilitating precise operations within KBs. Additionally, KnowledGPT enables the creation of personalized KBs to store user-specific knowledge. Through comprehensive testing, KnowledGPT has been shown to significantly expand the range of questions LLMs can answer by utilizing both public and personalized knowledge sources, marking a significant step forward in making LLMs more informed and adaptable. Input Preprocessing August 2023
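A toy version of "program of thought" prompting against a knowledge base: the model emits a code-format query, which is then executed against the KB. The triplet store, the query format, and the `llm` stub are assumptions for illustration, not KnowledGPT's actual interface.

```python
# Toy knowledge base keyed on (entity, relation) pairs.
KB = {("Marie Curie", "field"): "physics and chemistry"}

def llm(prompt: str) -> str:
    # A real model would emit the code-format query; we stub it here.
    return 'search("Marie Curie", "field")'

def search(entity: str, relation: str) -> str:
    return KB.get((entity, relation), "not found")

def answer(question: str) -> str:
    program = llm(f"Write a search(entity, relation) call for: {question}")
    # Execute the generated one-line program in a restricted namespace.
    return eval(program, {"__builtins__": {}}, {"search": search})

print(answer("What field did Marie Curie work in?"))
```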
Learning to Retrieve In-Context Examples for Large Language Models This paper introduces a novel framework for improving in-context learning for large language models by iteratively training dense retrievers to identify high-quality examples. The framework involves training a reward model based on LLM feedback to evaluate candidate examples, followed by knowledge distillation to train a bi-encoder-based dense retriever. Experimental results across 30 tasks demonstrate significant performance enhancements, showcasing the framework's generalization ability to unseen tasks. Analysis reveals that the model improves performance by retrieving examples with similar patterns, consistently benefiting LLMs of different sizes. Retriever Improvement July 2023
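The distillation step can be pictured as aligning two distributions over candidate examples: the retriever's similarity scores (student) and the reward scores derived from LLM feedback (teacher). A compact PyTorch sketch with toy values follows; the scores and shapes are placeholders.

```python
import torch
import torch.nn.functional as F

# Retriever (bi-encoder) similarity scores for 4 candidate examples.
retriever_scores = torch.randn(4, requires_grad=True)
# Reward-model scores derived from LLM feedback on the same candidates.
reward_scores = torch.tensor([2.0, 0.5, -1.0, 0.1])

target = F.softmax(reward_scores, dim=-1)           # teacher distribution
log_pred = F.log_softmax(retriever_scores, dim=-1)  # student distribution
loss = F.kl_div(log_pred, target, reduction="batchmean")
loss.backward()  # in a real setup, gradients flow into the bi-encoder
print(loss.item())
```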
Active Retrieval Augmented Generation This paper explores how to enhance large language models through Active Retrieval Augmented Generation, addressing the common issue of factual inaccuracies or "hallucinations" in generated content. The proposed method, FLARE (Forward-Looking Active REtrieval augmented generation), innovates by not just retrieving information once before generation but actively deciding when and what to retrieve as the generation progresses. This process involves predicting future content needs and using those predictions to fetch relevant information dynamically. Tested across four long-form, knowledge-intensive generation tasks, FLARE shows either superior or competitive performance compared to baseline methods. This approach proves particularly useful in generating lengthy texts where the need for external information can arise at multiple points, showcasing a significant advancement in generating more accurate and reliable content. Retriever Improvement May 2023
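A minimal sketch of the FLARE-style loop, assuming stub `llm`, `confidence`, and `retrieve` helpers: generate a tentative next sentence, and if confidence in it is low, use the draft itself as a forward-looking query, retrieve, and regenerate.

```python
def llm(prompt: str) -> str:
    return "A tentative next sentence."  # stub for a real generation call

def confidence(sentence: str) -> float:
    return 0.4  # stub: real systems use token-level probabilities

def retrieve(query: str) -> str:
    return "[retrieved passage]"  # stub retriever

def flare_generate(question: str, max_sentences: int = 3,
                   thresh: float = 0.6) -> str:
    answer = ""
    for _ in range(max_sentences):
        draft = llm(f"{question}\n{answer}\nNext sentence:")
        if confidence(draft) < thresh:
            # Low confidence: the draft becomes a forward-looking query.
            evidence = retrieve(draft)
            draft = llm(f"{evidence}\n{question}\n{answer}\nNext sentence:")
        answer += " " + draft
    return answer.strip()

print(flare_generate("Explain the causes of the French Revolution."))
```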
Augmented Large Language Models with Parametric Knowledge Guiding The paper presents a novel Parametric Knowledge Guiding (PKG) framework aimed at improving the performance of Large Language Models on domain-specific tasks. By integrating a knowledge-guiding module, PKG allows LLMs to access specialized knowledge without needing to modify the original model parameters. This approach is particularly advantageous for enhancing "black-box" LLMs, which are typically not open for modification or fine-tuning. The PKG framework leverages open-source models for creating an offline knowledge base, addressing both the transparency issues and data privacy concerns associated with proprietary LLMs. The effectiveness of PKG is showcased through significant performance improvements across a variety of knowledge-intensive tasks. Domain Specific RAG May 2023
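The division of labor in PKG can be sketched in a few lines: a small, tunable knowledge module produces background text that is simply prepended to the black-box LLM's prompt. Both functions below are placeholder stubs, not the paper's released components.

```python
def knowledge_module(question: str) -> str:
    # Stand-in for the fine-tuned open-source model that supplies
    # domain-specific background knowledge.
    return "[domain background knowledge]"

def blackbox_llm(prompt: str) -> str:
    return "answer"  # stand-in for a closed API model we cannot fine-tune

def pkg_answer(question: str) -> str:
    background = knowledge_module(question)  # parametric knowledge guiding
    return blackbox_llm(
        f"Background: {background}\nQuestion: {question}\nAnswer:"
    )

print(pkg_answer("What does this statute cover?"))
```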
Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory This paper introduces "selfmem," a framework for retrieval-augmented text generation that addresses the limitations of traditional memory retrieval methods by leveraging the model's own outputs as an unbounded memory pool. This self-memory approach allows for iterative improvements in text generation tasks by using the model's generated content as new memory sources for subsequent generations. Tested across neural machine translation, abstractive text summarization, and dialogue generation tasks, the selfmem framework has shown remarkable performance, setting new benchmarks in several domains. The study also provides a detailed analysis of the framework's components, offering valuable insights for future research in retrieval-augmented text generation. Memory Improvement May 2023
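A schematic of the self-memory loop, with stub `generate` and `score` functions standing in for the paper's generator and memory selector: each round's output joins the memory pool, and the best memory conditions the next round.

```python
def generate(source: str, memory: str) -> str:
    # Stand-in for a retrieval-augmented generator conditioned on memory.
    return f"output given ({source!r}, memory={memory!r})"

def score(candidate: str) -> float:
    return len(candidate)  # stub: a real selector scores candidate quality

def selfmem(source: str, rounds: int = 3) -> str:
    memory_pool = [""]  # start with an empty memory
    output = ""
    for _ in range(rounds):
        best_memory = max(memory_pool, key=score)  # select from the pool
        output = generate(source, best_memory)
        memory_pool.append(output)  # the model's own output becomes memory
    return output

print(selfmem("Translate: 'Bonjour le monde'"))
```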
Query Rewriting for Retrieval-Augmented Large Language Models The study proposes a framework for improving retrieval-augmented Large Language Models through query rewriting, named Rewrite-Retrieve-Read. Unlike conventional approaches that focus on enhancing either the retrieval process or the reading comprehension capabilities of LLMs, this framework emphasizes refining the search queries themselves to bridge the gap between the input text and the knowledge needed for retrieval. By generating an initial query with an LLM and then refining it using a trainable small language model, the approach uses web search engines for more accurate context retrieval. The rewriter is further optimized with reinforcement learning based on feedback from the LLM reader. Demonstrated across open-domain and multiple-choice QA tasks, this method shows significant performance improvements, highlighting its effectiveness and scalability for retrieval-augmented LLM applications. Input Preprocessing May 2023
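The three stages map directly onto a pipeline; in this sketch `llm` and `web_search` are placeholders, and the rewriter is shown as a single LLM call rather than the paper's RL-tuned small model.

```python
def llm(prompt: str) -> str:
    return "stub output"  # placeholder for a real model call

def web_search(query: str) -> str:
    return "[search results]"  # placeholder for a search engine API

def rewrite_retrieve_read(question: str) -> str:
    # 1) Rewrite: reformulate the question into a search-friendly query.
    query = llm(f"Rewrite as a web search query: {question}")
    # 2) Retrieve: fetch context with the rewritten query.
    context = web_search(query)
    # 3) Read: answer conditioned on the retrieved context.
    return llm(f"Context: {context}\nQuestion: {question}\nAnswer:")

print(rewrite_retrieve_read(
    "Before 2010, when did the World Cup winner last win the title?"))
```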
Knowledge Graph-Augmented Language Models for Knowledge-Grounded Dialogue Generation The paper introduces SURGE, a framework designed to enhance knowledge-grounded dialogue generation by integrating Knowledge Graphs (KGs) into the language model's response process. SURGE improves the relevance and factual accuracy of dialogue responses by retrieving context-specific subgraphs from KGs and ensuring consistency in the generated text through innovative word embedding perturbations and contrastive learning. This approach guarantees that the dialogue is grounded in accurate and relevant knowledge. Tested on the OpendialKG and KOMODIS datasets, SURGE demonstrates its ability to produce high-quality, knowledge-rich dialogues, addressing the challenge of ensuring the use of pertinent knowledge in dialogue generation. Retriever Improvement May 2023
Structure-Aware Language Model Pretraining Improves Dense Retrieval on Structured Data The SANTA model focuses on improving the retrieval of structured data through a unique approach that educates language models on the intricacies of structured content. By aligning structured and unstructured data and homing in on entities within structured data, SANTA creates a shared embedding space for both types of data, enhancing its retrieval capabilities. This method has shown impressive results in tasks like code and product searches, even in scenarios where it hasn't been directly trained, thanks to its specialized pretraining techniques. Essentially, SANTA stands out by teaching language models to better understand and utilize structured data's distinct characteristics. Retriever Improvement May 2023
Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In The paper introduces a new approach to retrieval augmentation for language models through the Augmentation-Adapted Retriever (AAR). Unlike previous methods that tightly integrate the retriever and LM, AAR acts as a flexible plug-in, capable of working with various LMs without requiring joint fine-tuning. This adaptability allows AAR to provide relevant external information to enhance LMs on knowledge-intensive tasks, even if these LMs were not part of its initial training set. Tested across a range of model sizes, AAR shows remarkable ability to boost zero-shot generalization capabilities of LMs from small to very large, demonstrating that learning from one LM's preferences can benefit a wide array of others. This research highlights the potential of making retrieval augmentation more universally applicable across different LMs. Retriever Improvement May 2023
Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy The paper introduces Iter-RetGen, a method that enhances retrieval-augmented large language models by initiating a dynamic interaction between retrieval and generation processes. This iterative synergy allows the model to refine its search for external knowledge based on initial outputs and then improve subsequent generations using the newly retrieved information. Unlike other methods that may impose structural constraints by interleaving retrieval with generation, Iter-RetGen treats retrieved knowledge as a unified whole, maintaining generation flexibility. Tested on tasks like multi-hop question answering, fact verification, and commonsense reasoning, Iter-RetGen not only efficiently combines parametric and non-parametric knowledge but also shows superior or competitive results compared to leading models, all while minimizing retrieval and generation overheads. RAG Enhanced LLMs May 2023
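The retrieval-generation synergy reduces to a short loop: append the previous draft to the retrieval query, retrieve, and regenerate. The `retrieve` and `llm` functions below are illustrative stubs.

```python
def llm(prompt: str) -> str:
    return "draft answer"  # placeholder for a real generation call

def retrieve(query: str) -> str:
    return "[passages]"  # placeholder retriever

def iter_retgen(question: str, iterations: int = 2) -> str:
    answer = ""
    for _ in range(iterations):
        # The previous generation enriches the retrieval query.
        passages = retrieve(f"{question} {answer}".strip())
        answer = llm(f"{passages}\nQuestion: {question}\nAnswer:")
    return answer

print(iter_retgen("Which country is the composer of 'Clair de Lune' from?"))
```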
Prompt-Guided Retrieval Augmentation for Non-Knowledge-Intensive Tasks This paper introduces PGRA, a two-stage framework designed to enhance non-knowledge-intensive (NKI) tasks using retrieval-augmented methods. Unlike previous research focused on knowledge-intensive tasks, PGRA addresses the unique challenges of NKI tasks by first using a task-agnostic retriever to efficiently select candidate evidence from a shared static index. Then, a prompt-guided reranker tailors the evidence to the specific task needs. The approach not only surpasses existing retrieval-augmented methods in performance but also showcases flexibility across different tasks, marking a significant step forward in applying retrieval augmentation to a broader range of NLP tasks. Retriever Innovation May 2023
RET-LLM: Towards a General Read-Write Memory for Large Language Models RET-LLM introduces a framework that integrates a general read-write memory unit into Large Language Models, addressing their limitation in explicitly storing and retrieving knowledge. This approach, rooted in Davidsonian semantics, allows LLMs to handle information more dynamically, storing knowledge in scalable, updatable triplets. The framework enhances LLMs' performance on question answering tasks, particularly those requiring an understanding of time-dependent information, and outperforms traditional models in both effectiveness and interpretability. Memory Improvement May 2023
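A toy read-write triplet memory of the kind RET-LLM describes; the storage and matching logic here are simplified assumptions, and in the paper the LLM itself decides when to emit the write and read calls.

```python
class TripletMemory:
    """Stores knowledge as (subject, relation, object) triplets."""

    def __init__(self):
        self.triplets = []

    def write(self, subj, rel, obj):
        self.triplets.append((subj, rel, obj))

    def read(self, subj=None, rel=None):
        # Return all triplets matching the given subject/relation filters.
        return [
            t for t in self.triplets
            if (subj is None or t[0] == subj) and (rel is None or t[1] == rel)
        ]

mem = TripletMemory()
mem.write("Alice", "works_at", "Acme")    # the LLM emits a memory write
mem.write("Alice", "works_at", "Globex")  # later, updated information
print(mem.read(subj="Alice", rel="works_at"))  # both facts, newest last
```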
Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources Chain-of-Knowledge (CoK) is a framework designed to enhance Large Language Models by dynamically integrating grounding information from diverse sources, aiming to produce more accurate and hallucination-free content. CoK operates through a three-stage process: starting with reasoning preparation, it moves to dynamic knowledge adapting where it corrects initial rationales by incorporating knowledge from relevant domains, and concludes with answer consolidation. Unique to CoK is its ability to utilize both structured (e.g., Wikidata, tables) and unstructured knowledge, facilitated by an adaptive query generator capable of handling various query languages. This methodology ensures a robust foundation for generating factual responses by minimizing errors through a step-by-step rationale correction process. CoK has demonstrated its effectiveness in improving LLMs' performance on a broad spectrum of knowledge-intensive tasks. Retriever Improvement May 2023
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study The paper investigates whether large autoregressive language models should be pretrained with retrieval. The authors conduct a comprehensive analysis using RETRO, a scalable retrieval-augmented LM, compared to standard GPT models. Findings reveal that RETRO outperforms GPT in text generation, demonstrating less degeneration, higher factual accuracy, and lower toxicity. Additionally, RETRO excels in knowledge-intensive tasks on the LM Evaluation Harness benchmark. They introduce RETRO++, a variant improving open-domain QA results, showcasing the potential of pretraining autoregressive LMs with retrieval. RAG Enhanced LLMs April 2023
UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation UPRISE aims to enhance the versatility of Large Language Models by introducing a method that automatically retrieves suitable prompts for any given zero-shot task without the need for model or task-specific adjustments. This approach proves effective across various tasks and models, even those not seen during training, and demonstrates its capability to reduce the occurrence of hallucinations in models like ChatGPT. UPRISE's lightweight retriever is trained with GPT-Neo-2.7B but shows remarkable performance improvements on a wide range of larger LLMs, highlighting its potential to universally enhance LLM performance. LLM Generalization March 2023
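The core retrieval step can be sketched with a toy bag-of-words similarity standing in for UPRISE's trained retriever: embed the task input, pick the nearest prompt from a frozen pool, and prepend it.

```python
def embed(text: str) -> set[str]:
    return set(text.lower().split())  # toy stand-in for a dense encoder

def similarity(a: set[str], b: set[str]) -> float:
    return len(a & b) / max(len(a | b), 1)  # Jaccard overlap

# A frozen pool of candidate prompts (illustrative examples).
PROMPT_POOL = [
    "Classify the sentiment of the following review:",
    "Answer the question using common sense:",
    "Summarize the following passage:",
]

def retrieve_prompt(task_input: str) -> str:
    q = embed(task_input)
    return max(PROMPT_POOL, key=lambda p: similarity(embed(p), q))

task = "Question: why do we sleep? Answer using common sense."
print(retrieve_prompt(task) + "\n" + task)  # prepend the retrieved prompt
```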