Project description: This project aims to extract and organize structured recipe information from unstructured YouTube videos using audio, text, and visual cues. Combining ML, LLMs, and video analysis, DishDecode delivers PDF-formatted recipes and a searchable database with single-click data download, ensuring easy access to complete recipe details.
Details: Our preferred case is a YouTube video with an English audio description of the recipe (this is the case for most well-known chefs' videos, e.g. Gordon Ramsay: https://www.youtube.com/watch?v=xIQXLTFup6M and https://www.youtube.com/watch?v=mhDJNfV7hjk ).
We will extract the following details from the input video:
- Recipe name
- Ingredients used
- Methodology
The above three can be extracted directly from the video.
The following will be derived using an LLM, such as Gemini or the free-tier Llama 3.1/3.2 models:
4. Nutritional value (macros only)
5. Serving size (estimated from the ingredient quantities used in other recipes with similar quantities)
6. Other recipe-specific details (extracted from both the video and LLM knowledge)
Deliverables. Goal: create structured data from unstructured data such as videos and text.
- Output a PDF-formatted recipe
- Database management of the curated recipe details (either MongoDB or SQL will be used)
- Single-click recipe data download, letting users get recipe data without much scraping hassle
- A searchable database to allow users to search for specific recipes based on the recipe name.
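To illustrate the search deliverable, here is a minimal sketch, assuming MongoDB is the chosen store and that recipe documents carry a recipe_name field (both are assumptions, since the stack below is not yet fixed):

```python
# Hypothetical sketch of the search deliverable, assuming MongoDB is chosen
# and each recipe document stores a "recipe_name" field.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
recipes = client["dishdecode"]["recipes"]  # assumed database/collection names

def search_recipes(query: str):
    """Case-insensitive substring match on the recipe name."""
    return list(recipes.find({"recipe_name": {"$regex": query, "$options": "i"}}))
```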
Team members and their expertise:
- Aniket Panchal: Python, ML and LLMs, Streamlit, PostgreSQL, MongoDB
- Drashy: Python, MongoDB, SQL, MERN, web hosting
- Aryan Sharma: Python, MongoDB, MERN, SQL, ML algorithms, web hosting, Django
Tech stack to be used:
- Programming language: Python
- Frontend: MERN, Streamlit, or Django
- Hosting: localhost, Vercel/Render, or Hugging Face
- Database: MongoDB or SQL
- LLM tasks: Gemini APIs, Llama APIs
- Information retrieval from video: Whisper (OpenAI), Google APIs, or custom code
(We will see which stack provides the most seamless integration with the ML and LLM APIs.)
Possible Challenges
If there is no audio description in the video, we observed two cases.
- There are no text blocks in the video that can give us information about the recipe. We can use OpenCV or YOLO models for image detection and get further details using the GPT or Gemini models.
- Text blocks are present in the video (like this one: https://www.youtube.com/watch?v=j1Jq8JjvSMc ) that can provide some information about the recipe. We can extract those blocks using OpenCV object- and text-detection models, then use GPT- or Gemini-based LLM approaches to complete the relevant information in a structured format.
If time allows, we will try to overcome these challenges.
Setup:
- Download the latest Python version
- Download an IDE (VS Code, PyCharm)
- Create a new Python environment
- Run this command: pip install openai-whisper torch
- Download FFmpeg from the official website
- Install the executable zip file as per your OS
- Extract it in the C directory: C:/ffmpeg< version number >
- Open this folder, go into the bin folder, and copy the path of this location
- Open the environment variables settings on your system -> Edit environment variables -> Under system variables -> click Path -> click Edit -> a new window appears -> click New -> paste the location path you copied -> click OK to save
- Restart the IDE
- Run the files (a minimal Whisper usage sketch follows this list)
- Next, get the Google Gemini 1.5 Flash API key
- Run this command: pip install -q -U google-generativeai
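With the environment above in place, transcription with openai-whisper might look like the sketch below (the model size and file name are placeholders, not project settings):

```python
# Minimal Whisper sketch: transcribe a recipe video's audio track.
# Requires FFmpeg on the PATH, as installed in the steps above.
import whisper

model = whisper.load_model("base")             # "base" is a placeholder; larger models are more accurate
result = model.transcribe("recipe_audio.wav")  # placeholder file name
print(result["text"])                          # plain-text transcript
```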
Step 1: Get the API key from https://console.deepgram.com/
Step 2: pip install "deepgram-sdk>=3.0,<4.0"
Step 3: Convert the video to .wav on the backend using an ffmpeg subprocess command, send it over the API to Deepgram's server, and it returns a transcript for that video.
Step 4: This transcript is then sent to the Gemini Pro API (a Google LLM) along with a prompt that tells the LLM to structure the unstructured transcript.
Step 5: This structured information is then sent to the backend, which sends the response to the frontend.
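A rough sketch of Steps 3-5, based on the public Deepgram SDK v3 and google-generativeai APIs; the file names, model choices, and prompt wording are assumptions:

```python
# Sketch of Steps 3-5: transcribe a pre-converted .wav with Deepgram,
# then ask Gemini to structure the transcript. Names and models are assumptions.
import google.generativeai as genai
from deepgram import DeepgramClient, PrerecordedOptions

deepgram = DeepgramClient("<Your Deepgram API Key>")
with open("recipe_audio.wav", "rb") as f:  # produced by the ffmpeg step
    source = {"buffer": f.read()}
options = PrerecordedOptions(model="nova-2", smart_format=True)
response = deepgram.listen.prerecorded.v("1").transcribe_file(source, options)
transcript = response.results.channels[0].alternatives[0].transcript

genai.configure(api_key="<Your Gemini API Key>")
model = genai.GenerativeModel("gemini-1.5-flash")
prompt = (
    "Structure this cooking-video transcript into: recipe name, ingredients, "
    "steps, techniques, equipment, nutrition (macros), serving size, notes.\n\n"
    + transcript
)
structured = model.generate_content(prompt)
print(structured.text)  # structured recipe, ready to send to the frontend
```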
For videos that have a text description on screen, we follow the text detection and extraction approach using OpenCV and OCR described here: https://www.geeksforgeeks.org/text-detection-and-extraction-using-opencv-and-ocr/
Download the Tesseract-OCR executable from its GitHub (/~https://github.com/UB-Mannheim/tesseract/wiki), install it in the desired directory, and add that directory to the Path variable in the system environment variables.
This handles the case of videos where no audio description is present: the OCR extracts the textual data shown in the video, pulling it from frames at an interval of 2 seconds.
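A minimal sketch of that 2-second frame sampling with OpenCV and pytesseract (pytesseract is assumed here as the Tesseract wrapper; the actual implementation may differ):

```python
# Sketch: OCR text from video frames sampled every 2 seconds.
# Assumes pytesseract as the Tesseract wrapper.
import cv2
import pytesseract

def extract_text_from_video(video_path: str, interval_s: float = 2.0) -> str:
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back if FPS is unreported
    step = max(1, int(fps * interval_s))   # frames between OCR samples
    texts, frame_idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # grayscale helps OCR
            text = pytesseract.image_to_string(gray).strip()
            if text:
                texts.append(text)
        frame_idx += 1
    cap.release()
    return "\n".join(texts)
```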
By combining information from the audio transcription (Deepgram Nova) and the video text extraction (Tesseract OCR), we handle a much wider range of recipe-video types. The combined information is then sent to the Gemini 1.5 Flash API to get the structured information we desire; with the LLM's help, the data comes back in a proper format and meaningful extraction is possible.
This project is a Flask-based API that extracts structured recipe information from cooking tutorial videos! It uses the Deepgram API for audio transcription, Tesseract OCR for text extraction from video frames, and the Gemini API to generate a well-structured recipe document. 🚀
Follow these steps to set up and run the project on your local machine.
git clone <your-repo-url>
cd <your-repo-folder>
Make sure you have Python installed (Python 3.8 or above is recommended). Install the required libraries using pip:
pip install -r requirements.txt
Ensure Tesseract OCR is installed on your system. You can download it here: /~https://github.com/UB-Mannheim/tesseract/wiki
Add Tesseract to your system path and note its installation location.
Windows: add the path to tesseract.exe to your environment variables, e.g.:
C:\Program Files\Tesseract-OCR
macOS:
brew install tesseract
Linux (Debian/Ubuntu):
sudo apt-get install tesseract-ocr
Create a .env file in the root directory and add your API keys:
FIRST_API_KEY=<Your Gemini API Key>
SECOND_API_KEY=<Your Deepgram API Key>
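A short sketch of how the app might read those keys (this assumes python-dotenv is in requirements.txt):

```python
# Sketch: load the API keys from .env (assumes python-dotenv is installed).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root
GEMINI_KEY = os.getenv("FIRST_API_KEY")     # Gemini
DEEPGRAM_KEY = os.getenv("SECOND_API_KEY")  # Deepgram
```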
This project uses FFmpeg for converting MP4 videos to WAV audio. Install it via the following:
macOS:
brew install ffmpeg
Linux (Debian/Ubuntu):
sudo apt-get install ffmpeg
Windows: download FFmpeg from FFmpeg.org and add it to your system path.
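The conversion itself can be a single subprocess call; a rough sketch follows (the sample-rate and channel flags are common defaults for speech APIs, not confirmed project settings):

```python
# Sketch: MP4 -> WAV via an ffmpeg subprocess (flags are assumptions).
import subprocess

def mp4_to_wav(mp4_path: str, wav_path: str) -> None:
    subprocess.run(
        ["ffmpeg", "-y", "-i", mp4_path,  # -y: overwrite output if it exists
         "-ac", "1", "-ar", "16000",      # mono, 16 kHz: common for speech APIs
         wav_path],
        check=True,
    )
```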
Start the Flask server with the following command:
python app.py
If everything is set up correctly, you should see:
* Running on http://127.0.0.1:5000/
Endpoint: GET /
Check if the API is running.
curl http://127.0.0.1:5000/
Response:
{
"status": "success",
"message": "API is running successfully!"
}
Endpoint: POST /process-video
Send a JSON payload with a video URL:
{
"videoUrl": "<URL-of-the-cooking-video>"
}
curl -X POST http://127.0.0.1:5000/process-video \
-H "Content-Type: application/json" \
-d '{"videoUrl": "https://example.com/video.mp4"}'
{
"**1. Recipe Name:**": "Beef Wellington",
"**2. Ingredients List:**": "* Fillet of beef\n* Olive oil\n* Salt\n* Pepper",
"**3. Steps for Preparation:**": "1. Sear the beef fillet\n2. Brush with mustard",
"**4. Cooking Techniques Used:**": "* Searing\n* Wrapping",
"**5. Equipment Needed:**": "* Hot pan\n* Blender",
"**6. Nutritional Information:**": "High in protein and fat",
"**7. Serving size:**": "2-4 people",
"**8. Special Notes or Variations:**": "Use horseradish instead of mustard",
"**9. Festive or Thematic Relevance:**": "Christmas alternative to roast turkey"
}
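For orientation, a hypothetical skeleton of how such an endpoint could wire the pipeline together; this is illustrative only, and the actual app.py may differ:

```python
# Hypothetical skeleton of the API routes; run_pipeline is a stand-in,
# not the project's actual code.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Placeholder pipeline stage; in the real app this would call FFmpeg,
# Deepgram, Tesseract OCR, and Gemini as sketched earlier in this README.
def run_pipeline(video_url: str) -> dict:
    raise NotImplementedError("wire up FFmpeg + Deepgram + OCR + Gemini here")

@app.route("/")
def health():
    return jsonify(status="success", message="API is running successfully!")

@app.route("/process-video", methods=["POST"])
def process_video():
    video_url = request.get_json().get("videoUrl")
    return jsonify(run_pipeline(video_url))

if __name__ == "__main__":
    app.run()
```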
- Deepgram API for accurate audio transcription.
- Tesseract OCR for extracting text from video frames.
- Gemini API for generating structured recipe information.
- FFmpeg for seamless MP4-to-WAV conversion.
- Supports both audio and video analysis for enhanced accuracy. 🎯
Use tools like Postman or curl to test the API endpoints.
Contributions are welcome! Feel free to submit a pull request or open an issue for any enhancements or bug fixes.
This project is licensed under the MIT License.