qian-yu-db/Fins-SSA-GenAi-Offerings


Fins SSA GenAI Offering

The goal of this repo is to develop and deliver GenAI solutions that enable and accelerate customer GenAI PoC projects on Databricks.

Requirements

  • Databricks workspace with Serverless and Unity Catalog enabled
  • Python 3.9+
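
As a quick sanity check of the Python requirement, a minimal sketch like the following can be run in a notebook cell or local shell. The helper name `python_version_ok` is hypothetical and not part of this repo; it only compares the interpreter version against the 3.9 minimum stated above.

```python
import sys

# Minimum interpreter version stated in the Requirements list.
MIN_PYTHON = (3, 9)


def python_version_ok(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if python_version_ok():
        print("Python version is sufficient.")
    else:
        print("Python 3.9 or newer is required.")
```

This does not check the workspace requirements (Serverless and Unity Catalog); those need to be confirmed in the Databricks workspace itself.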

PoC Accelerator Short Assets

Data Ingestion and Preprocessing Architecture Patterns

| Input Data Types | Input Data Store | Chunking Performed | OSS Technology Component | Asset |
|---|---|---|---|---|
| JSON Text Transcripts | Unity Catalog Volume | None | None | Json Data Ingestion with DLT (python), DLT (SQL) |
| PDF Doc (with tables) | Unity Catalog Volume | Unstructured chunking strategy | Unstructured | PDF Doc Ingestion |
| Image Doc (text extraction) | Unity Catalog Volume | Unstructured chunking strategy | Unstructured | Image Doc Ingestion |

End to End GenAI Application Architecture Patterns

| Input Data | Model | Tasks | GenAI Use Case | Orchestration | Customer Persona | PoC Template |
|---|---|---|---|---|---|---|
| JSON Text Transcripts | Foundation LLM (e.g. Llama3p1) | Summarization, Sentiment, Classification | AI Function, DBSQL Agent | DLT, LangChain | Data Analyst, Data Scientist | Call Center Transcript Analytics with AI |
| JSON Text Transcripts | Foundation LLM (e.g. Llama3p1) | Summarization, Sentiment | RAG | DLT, LangChain | Data Scientist, MLE, Data Engineer | Call Center Transcript RAG Apps |
| wav Audio | Foundation LLM (e.g. Llama3p1) | Speech Transcription, Summarization, Sentiment | RAG | DLT, LangChain | Data Scientist, MLE, Data Engineer | Call Center Audio to Text RAG Apps |
| PDF Documents | Foundation LLM (e.g. Llama3p1) | Unstructured Data Processing, Named Entity Recognition | NER | DLT, Function calling | Data Scientist, MLE, Data Engineer | PDF_Doc Ingestion |

When to Use

You have a business use case that could apply generative AI technology and that falls into one of the PoC accelerator templates. You have access to a Unity Catalog enabled Databricks workspace.

You may have existing data in the workspace to use as input. If you don't have any data, the PoC accelerator templates contain synthetic sample datasets that demonstrate the GenAI application's functionality.
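
To make the template choice concrete, here is a minimal sketch that encodes the end-to-end pattern table above as a lookup keyed by input data type. The names `PocTemplate`, `TEMPLATES`, and `find_templates` are hypothetical helpers for illustration and do not exist in this repo; the table values are copied from the README.

```python
from typing import List, NamedTuple


class PocTemplate(NamedTuple):
    input_data: str   # "Input Data" column
    use_case: str     # "GenAI Use Case" column
    template: str     # "PoC Template" column


# Rows transcribed from the end-to-end architecture pattern table.
TEMPLATES = [
    PocTemplate("JSON Text Transcripts", "AI Function, DBSQL Agent",
                "Call Center Transcript Analytics with AI"),
    PocTemplate("JSON Text Transcripts", "RAG",
                "Call Center Transcript RAG Apps"),
    PocTemplate("wav Audio", "RAG",
                "Call Center Audio to Text RAG Apps"),
    PocTemplate("PDF Documents", "NER",
                "PDF_Doc Ingestion"),
]


def find_templates(input_data: str) -> List[str]:
    """Return the PoC template names listed for a given input data type."""
    wanted = input_data.strip().lower()
    return [t.template for t in TEMPLATES if t.input_data.lower() == wanted]


if __name__ == "__main__":
    for name in find_templates("JSON Text Transcripts"):
        print(name)
```

An input type that returns an empty list is not covered by the current templates.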

Getting Started

Clone this repo and add it to your Databricks workspace. Refer to Databricks Repo Setup for instructions on how to create a Databricks repo in your workspace.

  1. Go into the folder of the selected PoC accelerator template.
  2. Review the architecture diagram in the README.
  3. Start with the instruction notebook.
  4. Follow the instructions in the instruction notebook.
  5. Most notebooks can be run by clicking Run / Run All, but some require additional steps in the Databricks UI, so be sure to read the instructions.

Resources

Limitations

  • The PoC accelerator templates are designed for Unity Catalog managed workspaces only.
  • The synthetic datasets provided by Databricks are generated algorithmically based on assumptions; they are not real-life data.
  • Delta Live Tables (DLT) is used in some of the PoC accelerator templates. Currently, a live table (a.k.a. materialized view) produced by Delta Live Tables can only be accessed from shared clusters; therefore, copies of the materialized views are used in some of the notebooks. This limitation will be addressed in future product releases.
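
The materialized view workaround in the last bullet can be sketched as a plain `CREATE OR REPLACE TABLE ... AS SELECT` statement. This is only an illustration under stated assumptions: the helper names and the three-part names in the example (`main.poc.transcripts_mv`, `main.poc.transcripts_copy`) are hypothetical, and the notebooks in this repo may create their copies differently.

```python
import re

# Conservative identifier check: letters, digits, underscores only.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_name(three_part_name: str) -> str:
    """Validate and backtick-quote a catalog.schema.object name."""
    parts = three_part_name.split(".")
    if len(parts) != 3 or not all(_IDENT.match(p) for p in parts):
        raise ValueError(f"expected catalog.schema.name, got {three_part_name!r}")
    return ".".join(f"`{p}`" for p in parts)


def build_copy_statement(source_view: str, target_table: str) -> str:
    """Build SQL that snapshots a materialized view into a regular table."""
    return (
        f"CREATE OR REPLACE TABLE {quote_name(target_table)} "
        f"AS SELECT * FROM {quote_name(source_view)}"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_copy_statement("main.poc.transcripts_mv", "main.poc.transcripts_copy"))
```

In a Databricks notebook the resulting string could be passed to `spark.sql(...)` on compute that is able to read the materialized view. The copy is a point-in-time snapshot, so it needs to be refreshed after each pipeline update.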
