The goal of this repo is to develop and deliver GenAI solutions that enable and accelerate customer GenAI PoC projects on Databricks.
- Databricks workspace with Serverless and Unity Catalog enabled
- Python 3.9+
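If you run any tooling from a local machine rather than inside a Databricks notebook, a check like the following confirms the Python 3.9+ prerequisite. It is a minimal sketch; the function name `meets_python_requirement` is hypothetical and not part of this repo.

```python
import sys

# Minimum interpreter version from the prerequisites list.
MIN_PYTHON = (3, 9)


def meets_python_requirement(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if (major, minor) of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum
```

Calling `meets_python_requirement()` with no arguments checks the interpreter that is currently running.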
Input Data Types | Input Data Store | Chunking Performed | OSS Technology | Component Asset |
---|---|---|---|---|
JSON Text Transcripts | Unity Catalog Volume | None | None | JSON Data Ingestion with DLT (Python), DLT (SQL) |
PDF Doc (with tables) | Unity Catalog Volume | Unstructured chunking strategy | Unstructured | PDF Doc Ingestion |
Image Doc (text extraction) | Unity Catalog Volume | Unstructured chunking strategy | Unstructured | Image Doc Ingestion |
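The PDF and image rows above rely on the Unstructured library's chunking strategy, which splits extracted content into pieces sized for embedding and retrieval. To make the general idea concrete, here is a minimal fixed-size chunker with overlap. It is a hypothetical, deliberately simple stand-in for illustration; the templates use Unstructured's own strategy, not this function.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    A generic illustration only; this is NOT the Unstructured library's chunking.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")
    step = max_chars - overlap  # how far each new window advances
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", max_chars=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks, at the cost of storing some text twice.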
Input Data | Model | Tasks | GenAI Use Case | Orchestration | Customer Persona | PoC Template |
---|---|---|---|---|---|---|
JSON Text Transcripts | Foundation LLM (e.g. Llama3p1) | Summarization, Sentiment, Classification | AI Functions, DBSQL Agent | DLT, LangChain | Data Analyst, Data Scientist | Call Center Transcript Analytics with AI |
JSON Text Transcripts | Foundation LLM (e.g. Llama3p1) | Summarization, Sentiment | RAG | DLT, LangChain | Data Scientist, MLE, Data Engineer | Call Center Transcript RAG Apps |
WAV Audio | Foundation LLM (e.g. Llama3p1) | Speech Transcription, Summarization, Sentiment | RAG | DLT, LangChain | Data Scientist, MLE, Data Engineer | Call Center Audio to Text RAG Apps |
PDF Documents | Foundation LLM (e.g. Llama3p1) | Unstructured Data Processing, Named Entity Recognition | NER | DLT, Function Calling | Data Scientist, MLE, Data Engineer | PDF_Doc Ingestion |
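The first template row pairs a foundation LLM with AI Functions for summarization, sentiment, and classification. As a rough sketch of that idea, the helper below builds a Databricks SQL query that calls the `ai_summarize`, `ai_analyze_sentiment`, and `ai_classify` AI Functions on a transcript column. The helper name, table name, column name, and labels are hypothetical, and this is not necessarily how the Call Center Transcript Analytics template structures its queries.

```python
from typing import Sequence


def build_transcript_analysis_sql(table: str, text_column: str, labels: Sequence[str]) -> str:
    """Build a Databricks SQL query that applies AI Functions to a text column.

    `table`, `text_column`, and `labels` are caller-supplied and hypothetical;
    they are interpolated as-is, so pass trusted identifiers only.
    """
    # This sketch asks for at least two candidate labels for ai_classify.
    if len(labels) < 2:
        raise ValueError("provide at least two classification labels")
    # Reject characters that would need escaping inside a SQL string literal.
    for label in labels:
        if "'" in label or "\\" in label:
            raise ValueError(f"unsupported character in label: {label!r}")
    label_array = ", ".join(f"'{label}'" for label in labels)
    return (
        f"SELECT {text_column},\n"
        f"       ai_summarize({text_column}) AS summary,\n"
        f"       ai_analyze_sentiment({text_column}) AS sentiment,\n"
        f"       ai_classify({text_column}, ARRAY({label_array})) AS category\n"
        f"FROM {table}"
    )


# Example with hypothetical Unity Catalog names.
query = build_transcript_analysis_sql(
    "main.call_center.transcripts", "transcript", ["billing", "technical issue", "cancellation"]
)
print(query)
```

In a notebook you could then run the statement with `spark.sql(query)`, assuming AI Functions are available in your workspace. Because table and column names are interpolated directly, only pass identifiers you control.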
You have a business use case that could benefit from generative AI and that fits one of the PoC accelerator templates. You have access to a Unity Catalog-enabled Databricks workspace.
You may already have data in the workspace to use as input. If you don't, the PoC accelerator templates contain synthetic sample datasets that demonstrate the GenAI application's functionality.
Clone this repo and add it to your Databricks workspace. Refer to Databricks Repo Setup for instructions on how to create a Databricks repo in your workspace.
- Go into the folder of the selected PoC accelerator template.
- Review the architecture diagram in the README.
- Start with the instruction notebook and follow the instructions in it.
- Most notebooks can be run by clicking Run/Run All, but some require additional steps in the Databricks UI, so be sure to read the instructions.
- Databricks GenAI Cookbook
- Databricks Foundation Model
- Model Serving
- Vector Search
- Inference Tables
- Delta Live Tables
- Databricks Python SDK
- MLflow
- The PoC accelerator templates are designed for use with Unity Catalog-managed workspaces only.
- The synthetic datasets provided by Databricks are generated algorithmically based on assumptions; they are not real-life data.
- Delta Live Tables is used in some of the PoC accelerator templates. Currently, the live tables (a.k.a. materialized views) created by Delta Live Tables can only be accessed from shared clusters, so some notebooks use a copy of the materialized views instead. This limitation will be addressed in future product releases.
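One way to produce such a copy is a `CREATE OR REPLACE TABLE ... AS SELECT` statement executed from compute that can read the materialized view; notebooks on other cluster types then read the copy. The sketch below only builds that statement. The helper name and table names are hypothetical, and the templates may create their copies differently. The copy is a snapshot, so it has to be rebuilt after the pipeline refreshes the view.

```python
import re

# Simple three-level Unity Catalog name: catalog.schema.table (no backtick quoting).
_UC_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){2}")


def build_mv_copy_sql(source_mv: str, target_table: str) -> str:
    """Return a CTAS statement that snapshots a materialized view into a regular table."""
    for name in (source_mv, target_table):
        if not _UC_NAME.fullmatch(name):
            raise ValueError(f"expected catalog.schema.table, got {name!r}")
    if source_mv == target_table:
        raise ValueError("target table must differ from the source materialized view")
    return f"CREATE OR REPLACE TABLE {target_table} AS SELECT * FROM {source_mv}"


# Hypothetical names; run the resulting statement from compute that can read the view.
print(build_mv_copy_sql("main.call_center.transcripts_mv", "main.call_center.transcripts_copy"))
```

Validating the names up front keeps the sketch from interpolating arbitrary text into SQL; quoted identifiers with backticks are intentionally not supported here.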