Merge pull request #84 from AI4WA/doc

Doc
AI4WA · Jul 25, 2024 · 74b3e4c
2 parents c728350 + d87999b
commit 74b3e4c

Showing 19 changed files with 199 additions and 9 deletions.
diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml
@@ -25,12 +25,6 @@ jobs:
           source: "omni.tar"
           target: "/root/"
 
-      - name: Archive code deployment artifacts
-        uses: actions/upload-artifact@v3
-        with:
-          name: code-zip
-          path: omni.tar
-
       - name: Setup AWS Credentials
         uses: appleboy/ssh-action@master
         with:

diff --git a/docs/LeaderBoard.md b/docs/LeaderBoard.md
diff --git a/docs/Tutorial.md b/docs/Tutorial.md
@@ -0,0 +1,195 @@
+# Tutorial
+
+This tutorial shows how to set up and run the end-to-end pipeline.
+
+It covers the following sections:
+
+- [Setup and run the pipeline successfully](#setup-and-run-the-pipeline-successfully)
+- [Evaluation and Annotation Benchmark](#evaluation-and-annotation-benchmark)
+- [Pipeline customisation](#pipeline-customisation)
+- [Annotation customisation](#annotation-customisation)
+
+---
+
+## Setup and run the pipeline successfully
+
+For demonstration purposes, the deployment mode will be **All in One Local Machine**.
+This means all components run on your local machine.
+To get started, you will need a reasonably powerful machine (as we will run some local LLMs) with a camera, microphone
+and speaker, which most laptops have.
+
+You will also need Python and Docker installed on your machine.
+
+### How to get started?
+
+**Step 1**: Clone the repository
+
+```bash
+# switch to a proper directory
+git clone git@github.com:AI4WA/OpenOmniFramework.git
+```
+
+**Step 2**: Get the API running
+
+```bash
+cd ./OpenOmniFramework
+cd ./API
+# Run it inside Docker; this is the easiest way to get started
+docker compose up
+```
+
+After this, you should be able to access the API at `http://localhost:8000`.
+The username/password is `admin/password`.
+
+**Step 3**: Grab the Token for Authentication
+
+Log in to the API admin, go to `http://localhost:8000/authtoken/tokenproxy/` and click `Add Token`.
+
+![Add Token](./images/grab_token.png)
+
+**Step 4**: Collect Audio and Video Data
+
+```bash
+cd ./OpenOmniFramework
+cd ./Client/Listener
+
+# create the virtual environment if this is your first time running this
+python3 -m venv venv
+source venv/bin/activate
+pip3 install -r requirements.txt
+pip3 install -r requirements.dev.txt # if you are doing further development
+
+# run video acquire
+python3 videos_acquire.py --token your_token_from_step_3
+```
+
+You should be able to see something like this:
+![video_cli](./images/video_cli.png)
+
+Then open a new terminal and run:
+
+```bash
+cd ./OpenOmniFramework
+cd ./Client/Listener
+
+# create the virtual environment if this is your first time running this
+python3 -m venv venv
+source venv/bin/activate
+pip3 install -r requirements.txt
+pip3 install -r requirements.dev.txt # if you are doing further development
+
+# run audio acquire
+python3 audios_acquire.py --token your_token_from_step_3 --track_cluster CLUSTER_GPT_4O_ETE_CONVERSATION 
+# you can change the cluster to the one you need
+```
+
+You will see something like this:
+![audio_cli](./images/audio_cli.png)
+
+If everything works, you should be able to see the newly created `Data Audios`, `Data Videos` and `Speech2Text` `Tasks`
+in the API admin page.
+They should look something like this:
+![tasks](./images/Tasks.png)
+![audio](./images/Audio.png)
+![video](./images/video.png)
+
+**Step 5**: Run AI models
+Now we need to start the AI module to consume the `Tasks`.
+
+```bash
+cd ./OpenOmniFramework
+cd ./AI
+
+python3 -m venv venv
+source venv/bin/activate
+pip3 install -r requirements.txt
+pip3 install -r requirements.dev.txt # if you are doing further development
+```
+
+Before we start the AI module, some configuration is required.
+
+The AI module supports OpenAI calls and HuggingFace calls, and it also includes our own
+emotion detection module.
+
+We need to set these up first.
+
+*Set up the OpenAI and HuggingFace environment variables*
+
+Create a `.env` file in the `./AI` folder, and add the following content:
+
+```bash
+HF_TOKEN=Your_HuggingFace_Token
+OPENAI_API_KEY=Your_OpenAI_API_KEY
+```
+
+Alternatively, you can export the variables in your shell:
+
+```bash
+export HF_TOKEN=Your_HuggingFace_Token
+export OPENAI_API_KEY=Your_OpenAI_API_KEY
+```
+
+If you want to run our emotion detection model, you will need to download it
+from this [download link](https://openomni.s3.eu-west-1.amazonaws.com/models/emotion_detection.zip).
+
+Put it in the folder `./AI/data/models/emotion_detection/model_data`.
+The folder should then look like this:
+
+![emotion_model](./images/model_data.png)
+
+Then you should be ready to run the AI module.
+
+```bash
+# run the AI module
+python3 main.py --token your_token_from_step_3
+```
+
+You can also skip installing the requirements and run the AI module directly with Docker.
+
+```bash
+TOKEN=XXX docker compose up
+```
+
+This will allow you to utilise the GPU resources on your machine if you have one.
+
+![ai_running](./images/ai_running.png)
+
+At this point, the client side feeds the video/audio data to the API, and the AI module consumes that data.
+
+**Step 6**: Play speech audio on the client side
+
+```bash
+cd ./OpenOmniFramework
+cd ./Client/Responder
+
+# create the virtual environment if this is your first time running this
+python3 -m venv venv
+source venv/bin/activate
+pip3 install -r requirements.txt
+pip3 install -r requirements.dev.txt # if you are doing further development
+
+# run the audio player
+
+python3 play_speech.py --token your_token_from_step_3
+```
+
+You will see something like this:
+
+![audio_play](./images/audio_speech.png)
+
+At this point, you should have the whole pipeline running on your local machine.
+
+You should see the new tasks created as expected on the `Tasks` page of the API admin,
+as shown below:
+
+![tasks](./images/full_tasks.png)
+
+On the Detailed Latency Benchmark page, you should be able to see the latency of each round of conversation.
+
+![latency](./images/detailed_latency.png)
+
+## Evaluation and Annotation Benchmark
+
+## Pipeline customisation
+
+## Annotation customisation
diff --git a/docs/images/Audio.png b/docs/images/Audio.png
diff --git a/docs/images/GPT-4o.jpg b/docs/images/GPT-4o.jpg
diff --git a/docs/images/Tasks.png b/docs/images/Tasks.png
diff --git a/docs/images/Triangle.jpg b/docs/images/Triangle.jpg
diff --git a/docs/images/ai_running.png b/docs/images/ai_running.png
diff --git a/docs/images/audio_cli.png b/docs/images/audio_cli.png
diff --git a/docs/images/audio_speech.png b/docs/images/audio_speech.png
diff --git a/docs/images/client.jpg b/docs/images/client.jpg
diff --git a/docs/images/detailed_latency.png b/docs/images/detailed_latency.png
diff --git a/docs/images/full_tasks.png b/docs/images/full_tasks.png
diff --git a/docs/images/grab_token.png b/docs/images/grab_token.png
diff --git a/docs/images/model_data.png b/docs/images/model_data.png
diff --git a/docs/images/video.png b/docs/images/video.png
diff --git a/docs/images/video_cli.png b/docs/images/video_cli.png
diff --git a/docs/index.md b/docs/index.md
@@ -19,7 +19,7 @@ Multimodal Open Source Framework for Conversational AI Research and Development.
         - [AI](#ai)
 - [Benchmark Examples](#benchmark-examples)
 - [Deployment Options](#deployment-options)
-- [Tutorial](#tutorial)
+- [Tutorial](./Tutorial.md)
 
 ----
 

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -12,14 +12,15 @@ theme:
 repo_url: https://github.com/AI4WA/OpenBenang
 nav:
   - Home: index.md
-  - Leaderboard: LeaderBoard.md
+  - Tutorial: Tutorial.md
+  - Deployment: Deployment/index.md
   - Modules:
       - Client:
           - Introduction: Client/main.md
           - Setup: Client/setup.md
       - API: API/main.md
       - AI: AI.md
-  - Deployment: Deployment/index.md
+
   - Source:
       - Client:
           - Introduction: Source/index.md