Skip to content

mruiyangyou/RAG-enhanced-LLM-Agent-for-text-to-SQL-generation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Retrieval-Augmented-Generation-Enhanced Large Language Model for Text-to-SQL

This repository contains all the relevant codes for building a RAG enhanced LLM for Text-to-SQL, evaluation data and also instructions on how to evaluate the performance by test-suite-sql-eval through Docker and customize your Text-to-SQL evaluation pipeline based on own data by Langsmith.

Tech Stack

My Skills

Architecture

The text-to-SQL architecture, as shown in the figure below, receives two pivotal inputs: the user's natural language question and the database's Data Definition Language (DDL), which outlines the database's schema. The DDL plays a critical role, in equipping the Language Model (LLM) with the necessary context of the available tables and fields for formulating queries. The process commences with a Question Masking step that selectively conceals schema-specific keywords, thus preserving the integral framework of the question. This masked version then acts as a query to fetch the three most structurally similar question-and-answer pairs from a vector database. Retrieving these pairs facilitates the LLM in generating the SQL commands and accessing related documentation relevant to the user's question. Subsequently, a comprehensive prompt is crafted, integrating DDL, SQL Q&A pairs and documentation, which is presented to the LLM for SQL query generation. After the LLM generates the SQL query, it is executed within the user’s database. In the event of execution errors, the LLM is equipped with a self-corrective feature that allows for reiteration, refining the query until a successful execution is achieved. Architecture

Play with the codes

Install relevant packages

git clone /~https://github.com/mruiyangyou/RAG-enhanced-LLM-Agent-for-text-to-SQL-generation.git
cd RAG-enhanced-LLM-Agent-for-text-to-SQL-generation
pip install -r requirements.txt

Prepare api keys:

  1. OpenAI(Chat)
  2. Cohere(rerank)
  3. Pinecone(vectordb)
  4. Langsmith(monitoring)
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
export LANGCHAIN_API_KEY="xxx"
export LANGCHAIN_PROJECT="xxx"
export PINECONE_API_KEY="xxx"
export COHERE_API_KEY="xxx"
export OPENAI_API_KEY="xxx"
export VANNA_API_KEY="xxx"

Project Structure

data

code

  • baseline: baseline model only relies on schema
  • chain: containing our RAG and RAG-rerank models
  • self-rag: self correct RAG model
  • routing: RAG based router(multi indexes)
  • pinconedb: a built class wrapped Pinecone-client's and langchain-pinecone's api.
  • utils: database related utils function
  • evaluation:

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published