Your AI-Powered Intelligent Search Assistant for Insurance Documents
- Introduction
- Problem Statement
- Objectives
- Approach
- Features
- Technologies/Libraries Used
- Installation
- Usage
- Demonstration
- Conclusions
- Glossary
- Acknowledgements
- Contributing
- License
- Author
Generative Search Help Mate AI is a system designed to perform intelligent document searches using semantic search and re-ranking techniques. It processes PDF documents, retrieves relevant information, and generates responses based on the search results.
Traditional keyword-based searches often fail to retrieve accurate information from complex insurance policy documents due to ambiguous terms and lack of contextual understanding.
This project aims to build a Retrieval-Augmented Generation (RAG) based generative search system that enhances search accuracy by:
- Using efficient text chunking for better document processing.
- Leveraging semantic search and re-ranking for relevant results.
- Generating context-aware answers using a robust AI model.
This system will provide precise, efficient, and user-friendly access to policy information, overcoming the limitations of conventional search methods.
- To develop an AI-powered search system that retrieves contextually relevant information.
- To integrate semantic search and re-ranking techniques for better search results.
- To provide an efficient and user-friendly document search experience.
- Extract and preprocess text from PDF documents.
- Generate embeddings for document chunks using transformer-based models.
- Perform semantic search with caching for efficiency.
- Re-rank search results to improve relevance.
- Generate responses based on retrieved information.
- Semantic Search: Retrieves contextually relevant document excerpts.
- Re-ranking: Enhances search result relevance using AI-based ranking models.
- Efficient Caching: Stores frequently searched results for faster retrieval.
- Automated Response Generation: Provides summarized answers based on search results.
- Python 3.11 or higher
- pdfplumber
- tiktoken
- OpenAI API
- ChromaDB
- Sentence-Transformers
- Python 3.11 or higher
- pip (Python package installer)
- Clone the repository:
git clone /~https://github.com/yourusername/Generative-Search-Help-Mate-AI.git cd Generative-Search-Help-Mate-AI
- Create a virtual environment:
python -m venv venv source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install the required packages:
pip install -r requirements.txt
- Set up environment variables:
- Create a
.env
file in the root directory. - Add your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key
- Create a
The notebook installs and imports the necessary libraries such as pdfplumber
, tiktoken
, openai
, chromadb
, and sentence-transformers
.
The project uses pdfplumber
to read and process PDF files. The path to the PDF file is defined, and the file is read and chunked for further processing.
The project performs a semantic search of a query in the collection embeddings to retrieve the top semantically similar results. The search results are cached for efficiency.
The top search results are re-ranked based on their relevance to the input query.
The project generates responses based on the top re-ranked search results.
Sample queries are evaluated to demonstrate the functionality of the generative search system.
Performs a semantic search for the given input query and returns the search results.
Re-ranks the search results based on their relevance to the input query.
Combines semantic search and re-ranking to generate the final response for the input query.
Generate a response using GPT-3.5's ChatCompletion based on the user query and retrieved information.
You can view the demo materials
Generative Search Help Mate AI enhances document search accuracy by utilizing semantic search and re-ranking techniques. This project enables efficient retrieval of relevant information from large document repositories.
- Semantic Search: AI-based search method that considers contextual meaning rather than just keywords.
- Re-ranking: A technique to improve search result relevance using AI-based ranking models.
- ChromaDB: A database optimized for embedding-based search.
- Transformer Models: Advanced deep learning models used for NLP tasks.
- The project references presentations in upGrad’s recorded module given by Aditya Bhattacharya.
- The project references presentations in upGrad’s recorded module given by Akshay Ginodia.
- The project references insights and inferences from presentations in upGrad’s doubt clear session given by Shridhar Galande.
- The project references presentations in upGrad's live class given by Sheshanth AS.
Contributions are welcome! Please fork the repository and submit a pull request for improvements or bug fixes.
This project is licensed under the MIT License. See the LICENSE
file for details.