[Semantic Segmentation] - Add SegFormer Architecture, Weight Conversion Script and Presets #1883

DavidLandup0 · 2024-09-26T13:05:08Z

This PR adds:

SegFormerBackbone and presets
Preprocessor flow
SegFormerImageSegmenter and presets for Cityscapes and ADE20k (B0...B5 each)
Conversion script
Tests

Basic Usage

import numpy as np
import keras_hub

preprocessor = keras_hub.models.ImageSegmenterPreprocessor.from_preset("segformer_b0_ade20k_512")
segmenter = keras_hub.models.SegFormerImageSegmenter.from_preset("segformer_b0_ade20k_512")
segmenter(np.random.rand(1, 512, 512, 3))

End-to-end example with preprocessor:

import urllib.request
from PIL import Image
import numpy as np
import keras_hub

preprocessor = keras_hub.models.ImageSegmenterPreprocessor.from_preset("segformer_b0_ade20k_512")
segmenter = keras_hub.models.SegFormerImageSegmenter.from_preset("segformer_b0_ade20k_512")

# Download a sample image; the URL points to a JPEG file.
img_url = "https://www.vanorohotel.com/wp-content/uploads/2021/07/drz-vanoro_6737.jpg"
urllib.request.urlretrieve(img_url, "image.jpg")

# Load as RGB, resize to the preset's 512x512 input size, and add a batch axis.
img = np.array(Image.open("image.jpg").convert("RGB").resize((512, 512)))
img = np.expand_dims(img, 0)
inputs = preprocessor(img)
outs = segmenter(inputs)
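
To turn the segmenter output into a per-pixel class map, one common approach is an argmax over the class axis. The sketch below uses random stand-in logits rather than a real model call. It assumes the output is channels-last with shape (batch, height, width, num_classes); the 150-class count matches ADE20k, while the spatial size and the helper name `logits_to_class_map` are only illustrative.

```python
import numpy as np

# Stand-in for `outs` from the example above. The shape is an assumption:
# (batch, height, width, num_classes), channels-last, 150 ADE20k classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1, 128, 128, 150)).astype("float32")


def logits_to_class_map(logits):
    """Return an integer class ID per pixel from per-class scores."""
    return np.argmax(logits, axis=-1)


class_map = logits_to_class_map(logits)
print(class_map.shape)  # (1, 128, 128)
```

With a real model, `logits` would be replaced by the segmenter output converted to a NumPy array.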

With Image Converter

converter = keras_hub.layers.ImageConverter(image_size=(512, 512))
preprocessor = keras_hub.models.ImageSegmenterPreprocessor.from_preset("segformer_b0_ade20k_512", image_converter=converter)
segmenter = keras_hub.models.SegFormerImageSegmenter.from_preset("segformer_b0_ade20k_512")

Training Pipeline Example

A few examples in the notebook below:

Instantiation of Backbone and Segmenter with MiT Encoder
Running on input images
Training pipeline with TFDS on Oxford IIIT Pets as an example

https://colab.research.google.com/drive/1EBNg6nPKx_KzyRuQQtHZ_PG_Nsf2pAg2#scrollTo=V9Ub4NHKCx9e

After a few minutes of training from scratch (both encoder and segmenter):

DavidLandup0 · 2024-10-16T04:55:44Z

Found the issue: a transpose call in the encoder was incorrectly shuffling the order of a latent. I'll get the presets up on Kaggle now.
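
The thread does not show the offending call, so the following is only an illustration of this class of bug, not the actual fix. It uses a hypothetical channels-last feature map and shows how an extra transpose before a reshape keeps the output shape intact while scrambling which values end up in each token, which is why such errors tend to surface as numerical mismatches rather than shape errors.

```python
import numpy as np

# Hypothetical feature map: batch=1, height=2, width=3, channels=4.
x = np.arange(1 * 2 * 3 * 4).reshape(1, 2, 3, 4)

# Flatten spatial positions into a token sequence: (B, H*W, C).
tokens = x.reshape(1, 2 * 3, 4)

# Extra transpose before the reshape: the output shape is identical,
# but each token now mixes values from different spatial positions.
shuffled = x.transpose(0, 3, 1, 2).reshape(1, 2 * 3, 4)

same_shape = tokens.shape == shuffled.shape
same_values = np.array_equal(tokens, shuffled)
print(same_shape, same_values)  # True False
```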

divyashreepathihalli

Amazing!! The PR looks great. Just a few comments.

divyashreepathihalli · 2024-10-16T20:24:26Z

keras_hub/src/models/segformer/segformer_image_segmenter_preprocessor.py

+    image_converter_cls = SegFormerImageConverter
+
+    @preprocessing_function
+    def call(self, x, y=None, sample_weight=None):


How about transformations to y, which would be the masks in the training set? If the images are resized/transformed, the masks should be too.

DavidLandup0 · 2024-10-17

Good point. In this case, since we don't want to apply either normalization or rescaling to the masks, the image converter would just resize the images. Do we have a standard practice of defining a default converter, or do we let the user decide this?

divyashreepathihalli · 2024-10-17

For SegFormer, I think we should add default resizing of the masks here if we are resizing the images.

divyashreepathihalli · 2024-10-17

The deeplabv3 code did something similar.

DavidLandup0 · 2024-10-18

Added. If a converter is present, it resizes both the images and the masks. It also rescales and normalizes the images to match the HuggingFace preprocessor.
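
One detail worth keeping in mind when resizing masks alongside images: masks hold integer class IDs, so an interpolation that blends neighboring values can produce labels that do not exist. The sketch below is a minimal NumPy illustration with a hypothetical `resize_nearest` helper. It is not the converter's implementation, and the thread does not state which interpolation the converter uses for masks.

```python
import numpy as np


def resize_nearest(mask, out_h, out_w):
    """Nearest-neighbor resize of an (H, W) integer mask."""
    in_h, in_w = mask.shape
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return mask[rows[:, None], cols[None, :]]


# Toy mask with class IDs 0 and 7 (IDs chosen arbitrarily).
mask = np.zeros((4, 4), dtype="int32")
mask[:, 2:] = 7

up = resize_nearest(mask, 8, 8)
print(np.unique(up))  # [0 7]

# Averaging neighbors, as linear interpolation does at a class boundary,
# produces a value that is not a valid class ID.
blended = (mask[:, 1] + mask[:, 2]) / 2
print(np.unique(blended))  # [3.5]
```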

divyashreepathihalli · 2024-10-18T18:07:35Z

keras_hub/src/models/segformer/segformer_image_segmenter_preprocessor.py

+    def call(self, x, y=None, sample_weight=None):
+        if self.image_converter:
+            x = self.image_converter(x)
+            y = self.image_converter(y)


Wouldn't this also rescale the masks if rescale is part of image_converter?

DavidLandup0 · 2024-10-20

Yes, it would. The ImageConverter should only resize; rescaling and normalization are enabled in the preprocessor by default, since they're required for the weights to function as intended. This should at the very least be documented, or ideally set as the default to avoid confusion.

If I add a default ImageConverter with just resizing and export the SegFormer model as a task again, would the ImageConverter's from_preset() restore it to that state as well? In that case, it may make the most sense to export the SegFormer models with the ImageConverter configs already baked in. Thoughts?
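
The concern raised here can be made concrete with a small NumPy sketch: value transforms (rescaling, normalization) belong on the image only, while the mask must keep its integer class IDs. The function names are hypothetical, and the mean/std values are the ImageNet statistics commonly paired with SegFormer checkpoints, which is an assumption rather than something confirmed in this thread.

```python
import numpy as np

# ImageNet channel statistics (assumed, not taken from this PR).
MEAN = np.array([0.485, 0.456, 0.406], dtype="float32")
STD = np.array([0.229, 0.224, 0.225], dtype="float32")


def rescale_and_normalize(image):
    """Map uint8 pixels to [0, 1], then standardize per channel."""
    return (image.astype("float32") / 255.0 - MEAN) / STD


def preprocess(image, mask=None):
    """Apply value transforms to the image only; pass the mask through."""
    return rescale_and_normalize(image), mask


image = np.full((2, 2, 3), 255, dtype="uint8")
mask = np.array([[0, 1], [2, 149]], dtype="int32")  # toy class IDs

# Rescaling a mask like an image squashes class IDs into fractions below 1.
broken = mask.astype("float32") / 255.0
print(broken.max() < 1.0)  # True

x, y = preprocess(image, mask)
print(np.array_equal(y, mask))  # True
```

Resizing is left out of this sketch on purpose; it is the one transform that should apply to both inputs.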

…on Script and Presets (keras-team#1883)

* initial commit - tf-based, kcv
* porting to keras_hub structure - removing aliases, presets, etc.
* enable instantiation of segformer backbone with custom MiT backbone
* remove num_classes from backbone
* fix input
* add imports to __init__
* update preset
* update docstrings
* add basic tests
* remove redundant imports
* update docstrings
* remove unused import
* running api_gen.py
* undo refactor of mit
* update docstrings
* add presets for mit
* add standin paths
* add presets for segformer backbone
* register presets in __init__.py
* addressing comments
* addressing comments
* addressing comments
* update most tests
* add remaining tests
* remove copyright
* fix test
* override from_config
* fix op in overlapping patching and embedding, start adding conversion utils
* style
* add padding to MiT patchingandembedding
* update to support other presets
* update conversin script
* fix link for b5
* add cityscapes weights
* update presets
* update presets
* update conversion script to make directories
* use save_preset
* change name of output dir
* add preprocessor flow
* api gen and add preprocessor to mits
* conform to new image classifier style
* format
* resizing image converter -> ImageConverter
* merge mit branch into segformer branch
* add preprocessor and converter
* address comments
* clarify backbone usage
* add conversion script
* numerical equivalence changes
* fix numerical inaccuracies
* update conversion script
* update conversion script
* remove transpose
* add preprocessor to segformer class
* fix preset path
* update test shape
* update presets
* update test shape
* expand docstrings
* add rescaling and normalization to preprocessor
* remove backbone presets, remove copyrights, remove backbone cls from segmenter
* remove copyright and unused import
* apply same transformation to masks as input images
* fix import
* fix shape in tests

initial commit - tf-based, kcv

716ae64

DavidLandup0 marked this pull request as draft September 26, 2024 13:05

DavidLandup0 and others added 6 commits September 27, 2024 17:35

porting to keras_hub structure - removing aliases, presets, etc.

71bd40b

enable instantiation of segformer backbone with custom MiT backbone

8894a86

remove num_classes from backbone

b66c659

fix input

392ec36

add imports to __init__

d80d8d0

Merge branch 'master' into feature/segformer

538adf7

DavidLandup0 marked this pull request as ready for review September 29, 2024 09:52

DavidLandup0 requested review from mattdangerw, divyashreepathihalli and fchollet September 29, 2024 09:52

DavidLandup0 changed the title ~~[Semantic Segmentation] - SegFormer (and MiTs)~~ [Semantic Segmentation] - SegFormer (MixTransformer-based) Sep 29, 2024

DavidLandup0 added 4 commits September 29, 2024 18:55

update preset

1571677

update docstrings

4b82a16

add basic tests

9b260e7

remove redundant imports

b93954f

DavidLandup0 changed the title ~~[Semantic Segmentation] - SegFormer (MixTransformer-based)~~ [Semantic Segmentation] - Add SegFormer Sep 29, 2024

DavidLandup0 added 7 commits September 29, 2024 19:20

update docstrings

159dca5

remove unused import

3ec02dd

running api_gen.py

7b6286e

undo refactor of mit

c40fdcd

update docstrings

9a13544

add presets for mit

4dc3fff

add standin paths

191656c

DavidLandup0 commented Sep 30, 2024

keras_hub/src/models/segformer/segformer_backbone.py

DavidLandup0 commented Sep 30, 2024

keras_hub/src/models/segformer/segformer_image_segmenter.py

DavidLandup0 commented Sep 30, 2024

keras_hub/src/models/segformer/segformer_image_segmenter.py

DavidLandup0 added 2 commits September 30, 2024 09:36

add presets for segformer backbone

9e47564

register presets in __init__.py

98bb69d

remove transpose

04ba1eb

DavidLandup0 added 6 commits October 16, 2024 14:13

add preprocessor to segformer class

9e04b6e

fix preset path

e9e8ed5

update test shape

a7a21f6

update presets

28e1297

update test shape

fc8fffe

expand docstrings

fa89a09

DavidLandup0 requested a review from divyashreepathihalli October 16, 2024 06:31

DavidLandup0 changed the title ~~[Semantic Segmentation] - Add SegFormer Architecture (+ configs for random initialization)~~ [Semantic Segmentation] - Add SegFormer Architecture, Weight Conversion Script and Presets Oct 16, 2024

add rescaling and normalization to preprocessor

6e3d3d1

divyashreepathihalli reviewed Oct 16, 2024

divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Oct 16, 2024

kokoro-team removed the kokoro:force-run Runs Tests on GPU label Oct 16, 2024

DavidLandup0 added 3 commits October 17, 2024 13:16

remove backbone presets, remove copyrights, remove backbone cls from segmenter

5092d71

remove copyright and unused import

ec3e1ec

apply same transformation to masks as input images

54c24a9

DavidLandup0 requested a review from divyashreepathihalli October 17, 2024 07:22

divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Oct 18, 2024

kokoro-team removed the kokoro:force-run Runs Tests on GPU label Oct 18, 2024

divyashreepathihalli reviewed Oct 18, 2024

DavidLandup0 added 3 commits October 20, 2024 15:36

merge master into feature branch

cbb6f8a

fix import

225942d

fix shape in tests

c7a6166

divyashreepathihalli approved these changes Oct 22, 2024

divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Oct 22, 2024

kokoro-team removed the kokoro:force-run Runs Tests on GPU label Oct 22, 2024

divyashreepathihalli merged commit 55da400 into keras-team:master Oct 22, 2024
10 checks passed
