Experimenting with public breast cancer mammography datasets (InBreast, MIAS, CBIS-DDSM). Deep learning project focused on mammography screening and expert aid.
- Deep learning model for mass detection (using Detectron2 and Ultralytics)
- Web GUI
- Train and evaluate
- Explainable AI
- ResNet
- Chatbot LLM API
- Combining previous model prediction with expert opinion
- Radiomics
- Nvidia CUDA drivers
- Install a PyTorch compatible version of CUDA from:
- Your Linux repository
apt install nvidia-cuda-toolkit
- NVIDIA website for Windows and Linux
- PyTorch with CUDA support
- Visit PyTorch website for more information
Install both of the above manually before anything else; otherwise, installing the other requirements later on will fail.
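Before installing the remaining requirements, it can help to confirm that PyTorch actually sees the GPU. The following is a minimal sketch and not part of this repository; the function name `cuda_status` is hypothetical, and the only PyTorch calls used are `torch.cuda.is_available()` and `torch.version.cuda`.

```python
def cuda_status():
    """Return a short, human-readable summary of the PyTorch/CUDA setup."""
    try:
        # Imported lazily so the check also runs before PyTorch is installed.
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        return f"CUDA available (PyTorch built with CUDA {torch.version.cuda})"
    return "PyTorch is installed, but CUDA is not available"


print(cuda_status())
```

If the last message appears on a machine with an NVIDIA GPU, the installed PyTorch build and the CUDA toolkit or driver are likely mismatched.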
- Linux (Ubuntu/Debian)
sudo apt install libmagic1
- Windows
pip install python-magic-bin
(optional) If you plan to run inside Docker, install nvidia-container-toolkit on the host. On Linux:
sudo apt install nvidia-container-toolkit
Supported datasets:
- InBreast
- CBIS-DDSM (Curated Breast Imaging Subset of DDSM)
- MIAS (Mammography Image Analysis Society)
Supported models:
- Generally supported models
- Faster R-CNN (Detectron)
- Any model that supports YOLO / COCO style dataset
- Customized UaNet for 2D mammography images
- Use the download_datasets_colab.ipynb Jupyter notebook in Google Colab to download all datasets.
- Upload your 'kaggle.json' when the notebook shows an upload dialog.
- After logging in to Kaggle, you can get your kaggle.json from the API section of https://www.kaggle.com/settings.
- The notebook will clone this repository and download all datasets.
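A malformed 'kaggle.json' is a common reason for the download to fail. As a rough sketch (not part of the notebook), the file can be checked locally before uploading. The function name `load_kaggle_credentials` is hypothetical; the `username` and `key` fields are the ones Kaggle writes into the token file.

```python
import json
from pathlib import Path


def load_kaggle_credentials(path="kaggle.json"):
    """Load kaggle.json and check that it holds the two expected fields."""
    data = json.loads(Path(path).read_text())
    missing = [field for field in ("username", "key") if not data.get(field)]
    if missing:
        raise ValueError(f"kaggle.json is missing: {', '.join(missing)}")
    return data["username"], data["key"]
```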
Dataset links:
- https://www.kaggle.com/datasets/ramanathansp20/inbreast-dataset
- https://www.kaggle.com/datasets/awsaf49/cbis-ddsm-breast-cancer-image-dataset
- https://www.kaggle.com/datasets/kmader/mias-mammography
Download the datasets above, clone this repository, and then create the following directory layout:
- breast_cancer_detection/
- datasets/
- all-mias/
- mdb001.pgm
- ...
- csv/
- jpeg/
- INbreast Release 1.0/
- AllDICOMs/
- ...
Copy datasets to directories accordingly.
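After copying, a quick check can catch a misplaced or misnamed directory before conversion. This is a minimal sketch under stated assumptions: the helper `missing_dataset_dirs` is hypothetical, and `EXPECTED_DIRS` only lists the two top-level dataset directories named in this README, so extend it to match your own layout.

```python
from pathlib import Path

# Directory names taken from this README; adjust if your copy differs.
EXPECTED_DIRS = ("all-mias", "INbreast Release 1.0")


def missing_dataset_dirs(root="datasets", expected=EXPECTED_DIRS):
    """Return the expected dataset directories that are absent under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


missing = missing_dataset_dirs()
if missing:
    print("Missing dataset directories:", ", ".join(missing))
```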
After converting the datasets to COCO / YOLO style in the next section (Usage), you may visualize the standardized dataset using the following methods.
python visualizer.py -m coco -d train/images -l train.json
python visualizer.py -m yolo -d train/images -l train/labels
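The two label styles store the same box differently: COCO uses `[x_min, y_min, width, height]` in pixels, while YOLO uses the box center and size normalized by the image dimensions. The sketch below illustrates that relationship; the function name `coco_to_yolo` is hypothetical and this is not the code used by convert_dataset.py.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, w, h] in pixels to YOLO
    (x_center, y_center, w, h), each normalized to the range 0..1."""
    x, y, w, h = bbox
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


# A 200x100 box whose top-left corner is at (50, 100) in a 400x400 image.
print(coco_to_yolo((50, 100, 200, 100), 400, 400))  # (0.375, 0.375, 0.5, 0.25)
```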
1. Clone this repository
git clone https://github.com/monajemi-arman/breast_cancer_detection
2. Install prerequisites
cd breast_cancer_detection
pip install --no-build-isolation -r requirements.txt
3. Download the datasets listed under "Dataset links"
4. Move dataset files
First create the 'datasets' directory:
mkdir datasets/
Then extract the files and move them into this directory, so that datasets/ contains the following:
- INbreast Release 1.0/
- all-mias/
5. Convert datasets to YOLO (and COCO) format
python convert_dataset.py
After completion, images/, labels/, dataset.yaml, and annotations.json will be present in the working directory.
6. (optional) Apply additional filters to images
If necessary, you can apply one of the following filters to the images using our script: canny, clahe, gamma, histogram, unsharp
Pass the filter name on the command line with -f.
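To give an idea of what such a filter does, here is a minimal pure-Python sketch of gamma correction on an 8-bit grayscale image, using the common form out = 255 * (in / 255) ** gamma. This is an illustration only: the function names are hypothetical, and the repository's script may use a different convention or an image library instead.

```python
def gamma_lut(gamma):
    """Build a 256-entry lookup table for out = 255 * (v / 255) ** gamma."""
    return [round(255 * (v / 255) ** gamma) for v in range(256)]


def apply_gamma(image, gamma):
    """Apply gamma correction to an 8-bit grayscale image given as rows of ints."""
    lut = gamma_lut(gamma)
    return [[lut[v] for v in row] for row in image]


# With this convention, gamma > 1 darkens mid-tones and gamma < 1 brightens them.
print(apply_gamma([[0, 128, 255]], 2.0))  # [[0, 64, 255]]
```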
detectron.py trains and evaluates a Faster R-CNN model and runs prediction with it on the Detectron2 platform.
python detectron.py -c train
- Visualize model prediction
- Show ground truth and labels
- Filter predictions by confidence score
# After training is complete
python detectron.py -c predict -w output/model_final.pth -i <image path>
# -w: path to model weights
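Filtering by confidence score simply drops detections below a threshold. The sketch below shows the idea on predictions in COCO results style; the helper `filter_by_score`, the default threshold of 0.5, and the sample data are all hypothetical and not taken from detectron.py.

```python
def filter_by_score(predictions, min_score=0.5):
    """Keep predictions whose confidence is at least min_score, best first."""
    kept = [p for p in predictions if p["score"] >= min_score]
    return sorted(kept, key=lambda p: p["score"], reverse=True)


# Made-up predictions; bbox is [x_min, y_min, width, height] in pixels.
preds = [
    {"image_id": 1, "bbox": [10, 20, 50, 40], "score": 0.31},
    {"image_id": 1, "bbox": [200, 120, 80, 60], "score": 0.92},
    {"image_id": 1, "bbox": [90, 300, 30, 30], "score": 0.55},
]
for p in filter_by_score(preds, min_score=0.5):
    print(p["score"], p["bbox"])
```

A lower threshold raises sensitivity at the cost of more false positives, which is usually the trade-off of interest in a screening setting.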
- Run train step as explained above
- Copy 'detectron.cfg.pkl' and the last model checkpoint to webapp/ directory.
* Last model checkpoint file name is written in output/last_checkpoint
- Run the following:
cd webapp/
python web.py
- Then visit http://localhost:33517 in your browser (the same host and port as in the API examples)
- (optional) Use API
An API is also available. Example:
# Run server
cd webapp/
python web.py
# Get predictions
curl -X POST \
-F "file=@input.jpg" \
http://localhost:33517/api/v1/predict \
| jq -r '.data.inferred_image' | base64 --decode > prediction.jpg
# You may also pass several files for batch prediction
curl -X POST \
-F "file=@sample1.jpg" \
-F "file=@sample2.jpg" \
http://localhost:33517/api/v1/predict # Returns prediction array
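The `jq` and `base64` pipeline in the single-file example can also be done in Python once the response body is available. This is a minimal sketch assuming the response is a JSON object with the base64 image at `data.inferred_image`, as the `jq` filter above implies; the function names are hypothetical, and the shape of the batch response is not covered here.

```python
import base64
import json


def extract_inferred_image(response_text):
    """Decode the base64 image stored at data.inferred_image in the API response."""
    payload = json.loads(response_text)
    return base64.b64decode(payload["data"]["inferred_image"])


def save_prediction(response_text, out_path="prediction.jpg"):
    """Write the decoded image bytes to out_path."""
    with open(out_path, "wb") as fh:
        fh.write(extract_inferred_image(response_text))
```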
- Calculate mAP
- Uses test dataset by default
python detectron.py -c evaluate -w output/model_final.pth
- Suitable for later offline metrics calculation
- All predictions of the test dataset will be written to predicions.json
- Follows COCO format
python detectron.py -c evaluate_test_to_coco -w output/model_final.pth
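Offline metrics such as mAP match predicted boxes to ground-truth boxes by intersection over union (IoU). As a starting point for working with the exported COCO-format predictions, the sketch below computes IoU for two boxes in COCO `[x_min, y_min, width, height]` form. The function name `iou_xywh` is hypothetical, and this is not the evaluation code used by detectron.py.

```python
def iou_xywh(a, b):
    """Intersection over union of two boxes given as [x_min, y_min, width, height]."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping region (non-positive if the boxes are disjoint).
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union


print(iou_xywh([0, 0, 10, 10], [0, 0, 10, 10]))  # 1.0
```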
First, train a classification model on the data.
The datasets are annotated for object detection, so they must first be converted into a classification dataset:
- Convert dataset for classification
python coco_to_classification.py train.json train_class.json
- Train classification model
python classification_model.py -a train_class.json -d train/images --save_dir classification_output -c train
- Generate XAI
python classification_model.py --save_dir classification_output -c predict -i train/images/cb_1.jpg
- Run & Test API
python classification_model.py --save_dir classification_output -c api
# Save sample to heatmap.jpg
curl -X POST -F "file=@test/images/20586986.jpg" http://localhost:33519/predict | jq -r '.activation_map' | base64 -d >~/heatmap.jpg
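The conversion step at the start of this list turns box-level annotations into image-level labels. The exact rules used by coco_to_classification.py are not described here, so the following is only a sketch of one plausible scheme, under the assumption that an image is labeled 1 when it has at least one annotation and 0 otherwise. The function name `image_level_labels` is hypothetical; the `images`, `annotations`, `id`, `image_id`, and `file_name` fields are standard COCO.

```python
def image_level_labels(coco):
    """Map each image file name to 1 if it has any annotation, else 0.

    Assumption for illustration only: "has at least one box" means positive.
    """
    positive_ids = {ann["image_id"] for ann in coco.get("annotations", [])}
    return {img["file_name"]: int(img["id"] in positive_ids) for img in coco["images"]}


# Tiny made-up COCO dictionary.
coco = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "annotations": [{"id": 10, "image_id": 1, "bbox": [5, 5, 20, 20], "category_id": 0}],
}
print(image_level_labels(coco))  # {'a.jpg': 1, 'b.jpg': 0}
```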
Setup config
Inside the llm/ directory, create 'config.json' based on the 'config.json.default' template.
Run & Test LLM API
python llm/llm_api_server.py
curl -X POST http://localhost:33518/generate-response \
-H "Content-Type: application/json" \
-d '{
"prompt": "What is BI-RADS 4?", "predictions": "Some preds"
}'
- Install Ultralytics
pip install ultralytics
- Train your desired YOLO model
yolo train data=dataset.yaml model=yolov8n
Example of prediction using YOLO ultralytics framework:
yolo predict model=runs/detect/train/weights/best.pt source=images/cb_1.jpg conf=0.1
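The same two steps can also be run from Python through the Ultralytics API rather than the `yolo` command. This is a minimal sketch, assuming the `yolov8n.pt` pretrained weights and the default output path shown in the prediction command above; the wrapper names `train_yolo` and `predict_yolo` are hypothetical.

```python
def train_yolo(data="dataset.yaml", weights="yolov8n.pt"):
    """Train a YOLO model on the converted dataset (requires `pip install ultralytics`)."""
    from ultralytics import YOLO  # imported lazily so this file loads without the package

    model = YOLO(weights)
    return model.train(data=data)


def predict_yolo(source="images/cb_1.jpg",
                 weights="runs/detect/train/weights/best.pt",
                 conf=0.1):
    """Run prediction with trained weights, keeping boxes above the confidence threshold."""
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.predict(source=source, conf=conf)
```

Calling `train_yolo()` and then `predict_yolo()` mirrors the two commands above; the low `conf=0.1` threshold keeps weak detections visible for inspection.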
- Clone UaNet repository (patched)
# Make sure you cd to breast_cancer_detection first
# cd breast_cancer_detection
git clone https://github.com/monajemi-arman/UaNet_2D
- Prepare dataset
# Convert datasets to images/ masks/
python convert_dataset.py -m mask
# Convert to 3D NRRD files
python to_3d_nrrd.py
- Move dataset to model directory
# While in breast_cancer_detection directory
mv UaNet-dataset/* UaNet_2D/data/preprocessed/
# Replace UaNet's old default split configs with ours
mv split/* UaNet_2D/src/split/
- Start training
cd UaNet_2D/src
python train.py