The repository provides a collection of vision language models, benchmarks, and related applications, released as part of Project MONAI (Medical Open Network for Artificial Intelligence).
- [2024/12/04] The arXiv version of VILA-M3 is now available (arXiv:2411.12915).
- [2024/10/31] We released the VILA-M3-3B, VILA-M3-8B, and VILA-M3-13B checkpoints on Hugging Face.
- [2024/10/24] We presented VILA-M3 and the VLM module in MONAI at MONAI Day (slides, recording)
- [2024/10/24] Interactive VILA-M3 Demo is available online!
VILA-M3 is a vision-language model designed specifically for medical applications. It addresses the challenges that general-purpose vision-language models face in the medical domain, and it integrates with existing expert segmentation and classification models.
For details, see here.
Please visit the VILA-M3 Demo to try out a preview version of the model.
- To run the demo, we recommend building a Docker container with all the requirements.
We use a base image with CUDA preinstalled.
docker build --network=host --progress=plain -t monai-m3:latest -f m3/demo/Dockerfile .
- Run the container
docker run -it --rm --ipc host --gpus all --net host monai-m3:latest bash
Note: To load your own VILA checkpoint in the demo, mount a folder by adding
-v <your_ckpts_dir>:/data/checkpoints
to your docker run command.
- Next, follow the steps to start the Gradio Demo.
- Linux Operating System
- CUDA Toolkit 12.2 (with nvcc) for VILA. To verify the CUDA installation, run:
nvcc --version
If CUDA is not installed, use one of the following methods:
- Recommended: Use the Docker image nvidia/cuda:12.2.2-devel-ubuntu22.04:
docker run -it --rm --ipc host --gpus all --net host nvidia/cuda:12.2.2-devel-ubuntu22.04 bash
- Manual installation (not recommended): Download the appropriate package from the official NVIDIA page.
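The CUDA check above can also be scripted. The following is a minimal sketch, not part of the repository: the helper names `parse_nvcc_release` and `detect_cuda_release` are hypothetical, and the parser assumes the usual `release X.Y` wording in the `nvcc --version` output.

```python
import re
import shutil
import subprocess

REQUIRED_CUDA = (12, 2)  # CUDA Toolkit version named in the prerequisites


def parse_nvcc_release(output: str):
    """Return (major, minor) parsed from `nvcc --version` output, or None."""
    # Assumes a line such as "Cuda compilation tools, release 12.2, V12.2.140".
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def detect_cuda_release():
    """Run `nvcc --version` if nvcc is on PATH; return (major, minor) or None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    found = detect_cuda_release()
    if found is None:
        print("nvcc not found; install CUDA using one of the methods above")
    elif found != REQUIRED_CUDA:
        print(f"found CUDA {found[0]}.{found[1]}, expected 12.2")
    else:
        print("CUDA 12.2 detected")
```

Keeping the parser separate from the subprocess call makes it possible to exercise the version logic on a machine without a GPU.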
- Python 3.10, Git, Wget, and Unzip:
To install these, run:
sudo apt-get update
sudo apt-get install -y wget python3.10 python3.10-venv python3.10-dev git unzip
NOTE: These commands are tailored for the Docker image nvidia/cuda:12.2.2-devel-ubuntu22.04. If using a different setup, adjust the commands accordingly.
- GPU Memory: Ensure that the GPU has sufficient memory to run the models:
- VILA-M3: 8B: ~18GB, 13B: ~30GB
- CXR: This expert dynamically loads various TorchXRayVision models and performs ensemble predictions. The memory requirement is roughly 1.5GB in total.
- VISTA3D: This expert model dynamically loads the VISTA3D model to segment a 3D-CT volume. The memory requirement is roughly 12GB, and peak memory usage can be higher, depending on the input size of the 3D volume.
- BRATS: (TBD)
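As a rough planning aid, the figures above can be added up to see whether a given GPU is likely to be large enough. This sketch is illustrative only: the function names are hypothetical, and it assumes the VLM and every selected expert are resident at the same time, which gives a conservative total. Because the experts load dynamically, real usage may be lower, while VISTA3D peaks can exceed the listed 12GB for large volumes.

```python
# Approximate GPU memory in GB, copied from the list above.
VLM_MEMORY_GB = {"8B": 18.0, "13B": 30.0}
EXPERT_MEMORY_GB = {"CXR": 1.5, "VISTA3D": 12.0}  # BRATS is listed as TBD


def estimate_memory_gb(model_size, experts):
    """Sum the VLM requirement and the requirement of each named expert."""
    total = VLM_MEMORY_GB[model_size]
    for name in experts:
        total += EXPERT_MEMORY_GB[name]
    return total


def fits(model_size, experts, gpu_memory_gb):
    """Return True if the summed estimate does not exceed the GPU's memory."""
    return estimate_memory_gb(model_size, experts) <= gpu_memory_gb


if __name__ == "__main__":
    needed = estimate_memory_gb("8B", ["CXR", "VISTA3D"])
    print(f"8B with CXR and VISTA3D: about {needed} GB")
```

For example, the 8B model with both experts sums to 18 + 1.5 + 12 = 31.5GB under this assumption.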
- Setup Environment: Clone the repository, set up the environment, and download the experts' checkpoints:
git clone https://github.com/Project-MONAI/VLM --recursive
cd VLM
python3.10 -m venv .venv
source .venv/bin/activate
make demo_m3
- Navigate to the demo directory:
cd m3/demo
- Start the Gradio demo:
This will automatically download the default VILA-M3 checkpoint from Hugging Face.
python gradio_m3.py
- Alternative: Start the Gradio demo with a local checkpoint, e.g.:
python gradio_m3.py \
    --source local \
    --modelpath /data/checkpoints/<8B-checkpoint-name> \
    --convmode llama_3
For details, see the available command-line arguments.
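When launching the demo from a script, it can help to confirm that the checkpoint folder exists before starting. The sketch below is an assumption-laden illustration: `build_demo_command` is a hypothetical helper, it treats the checkpoint as a directory, and it uses only the flags shown in the example above. The default `llama_3` value is taken from the 8B example, so other checkpoints may need a different conversation mode.

```python
from pathlib import Path


def build_demo_command(checkpoint_dir=None, conv_mode="llama_3"):
    """Return the argument list for launching gradio_m3.py."""
    command = ["python", "gradio_m3.py"]
    if checkpoint_dir is None:
        # Default launch: the checkpoint is downloaded from Hugging Face.
        return command
    path = Path(checkpoint_dir)
    if not path.is_dir():
        # Assumption: a local checkpoint is a directory on disk.
        raise FileNotFoundError(f"checkpoint directory not found: {path}")
    return command + [
        "--source", "local",
        "--modelpath", str(path),
        "--convmode", conv_mode,
    ]
```

The returned list can be passed to `subprocess.run` from inside the m3/demo directory.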
- This is still a work in progress. Please refer to the README for more details.
To lint the code, please install these packages:
pip install -r requirements-ci.txt
Then run the following commands:
isort --check-only --diff . # using the configuration in pyproject.toml
black . --check # using the configuration in pyproject.toml
ruff check . # using the configuration in ruff.toml
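To run all three checks in one pass and see which ones fail, a small wrapper can help. This is a sketch rather than a script shipped with the repository: `failed_checks` is a hypothetical name, and the commands are copied from the lines above. It assumes the tools from requirements-ci.txt are already installed, since `subprocess.run` raises `FileNotFoundError` for a missing executable.

```python
import subprocess

# Check commands copied from the lint instructions above.
LINT_COMMANDS = [
    ["isort", "--check-only", "--diff", "."],
    ["black", ".", "--check"],
    ["ruff", "check", "."],
]


def failed_checks(commands=LINT_COMMANDS, run=subprocess.run):
    """Run every command and return those that exit with a non-zero status."""
    # The `run` parameter exists so the logic can be tested with a stand-in.
    return [cmd for cmd in commands if run(cmd).returncode != 0]


if __name__ == "__main__":
    failures = failed_checks()
    for cmd in failures:
        print("failed:", " ".join(cmd))
    raise SystemExit(1 if failures else 0)
```

Unlike chaining the commands with `&&`, this runs every check even when an earlier one fails, so all problems are reported together.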
To auto-format the code, run the following command:
isort . && black . && ruff format .
If you find this work useful in your research, please consider citing:
@article{nath2024vila,
title={VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge},
author={Nath, Vishwesh and Li, Wenqi and Yang, Dong and Myronenko, Andriy and Zheng, Mingxin and Lu, Yao and Liu, Zhijian and Yin, Hongxu and Law, Yee Man and Tang, Yucheng and others},
journal={arXiv preprint arXiv:2411.12915},
year={2024}
}