Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separating MBConvBlock, FusedMBConvBlock and Refactoring EfficientNetV2 #1146

Merged
merged 20 commits into from
Dec 21, 2022

Conversation

DavidLandup0
Copy link
Contributor

@DavidLandup0 DavidLandup0 commented Dec 15, 2022

What does this PR do?

There's a design issue that's not aligned with KCV's principles. MBConv blocks, as well as well as Fused MBConv blocks are general-purpose blocks, introduced with MobileNets and then re-used in practically all mobile-oriented/efficient-desired architectures since. Currently, we only have them as part of EfficientNets, in which they're proprietary methods.

If we don't separate them, we'll accumulate tech debt that'll be only harder to recover from down the line.

Since MaxViT (currently WIP), MobileNets and various other future architectures will use them (and we'd want to expose these to the user as well as a common building block), I propose that we separate them into standalone layers, which is this PR.

I've refactored EfficientNetV2 to use layers instead of methods and they're identical in param counts, as validation that they behave the same. One issue to figure out is:

  • Original effnets list each layer within each MBConvBlock
  • New MBConvBlock layer encapsulates them within a single MBConvBlock

Because of this, the total number of layers differs (151 vs 26). We can:

  1. Expose all layers within the MBConvBlock and FusedMBConvBlock and add them to the EffNet while it's being built (so as to be able to load weights as they are right now)
  2. Keep layers encapsulated and port our own weights (or re-train effnets)

Here's the difference:

input_3 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 rescaling_2 (Rescaling)     (None, 224, 224, 3)       0         
                                                                 
 stem_conv (Conv2D)          (None, 112, 112, 32)      864       
                                                                 
 stem_bn (BatchNormalization  (None, 112, 112, 32)     128       
 )                                                               
                                                                 
 stem_activation (Activation  (None, 112, 112, 32)     0         
 )                                                               
                                                                 
 block1a_ (FusedMBConvBlock)  (None, 112, 112, 16)     576       
                                                                 
 block2a_ (FusedMBConvBlock)  (None, 112, 112, 32)     3456      
...

VS:

input_1 (InputLayer)           [(None, 224, 224, 3  0           []                               
                                )]                                                                
                                                                                                  
 rescaling (Rescaling)          (None, 224, 224, 3)  0           ['input_1[0][0]']                
                                                                                                  
 stem_conv (Conv2D)             (None, 112, 112, 32  864         ['rescaling[0][0]']              
                                )                                                                 
                                                                                                  
 stem_bn (BatchNormalization)   (None, 112, 112, 32  128         ['stem_conv[0][0]']              
                                )                                                                 
                                                                                                  
 stem_activation (Activation)   (None, 112, 112, 32  0           ['stem_bn[0][0]']                
                                )                                                                 
# NOW PART OF FusedMBConvBlock                                      
 block1a_project_conv (Conv2D)  (None, 112, 112, 16  4608        ['stem_activation[0][0]']        
                                )                                                                 
                                                                                                  
 block1a_project_bn (BatchNorma  (None, 112, 112, 16  64         ['block1a_project_conv[0][0]']   
 lization)                      )                                                                 
                                                                                                  
 block1a_project_activation (Ac  (None, 112, 112, 16  0          ['block1a_project_bn[0][0]']    

@LukeWood @tanzhenyu @ianstenbit @bhack

@DavidLandup0 DavidLandup0 mentioned this pull request Dec 15, 2022
Copy link
Contributor

@tanzhenyu tanzhenyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!
Now that they will become public API, please add:

  1. unit test files, regular case + corner case
  2. serialization test under serialization_test.py
  3. better documentation for each layer, i.e., what's the paper reference, what does it do, what inputs / outputs shape does it require, what is the python example usage, etc

@DavidLandup0
Copy link
Contributor Author

Deal! Wanted to check whether this was an okay change to begin with :)
Docs and tests coming up tomorrow

@tanzhenyu
Copy link
Contributor

tanzhenyu commented Dec 15, 2022

We can still easily remap some of the layers weights when the total number of layer weights changes, I guess?
i.e., new_model.new_layer.mb_conv_1.set_weights(original_model.mb_conv_1.get_weights()

@DavidLandup0
Copy link
Contributor Author

DavidLandup0 commented Dec 15, 2022

Yeah, I think that's the best approach (what I meant by porting our own weights). No need to re-train since it's identical, just wrapped. Has a nicer summary call now too :)

@DavidLandup0
Copy link
Contributor Author

Should I do the remapping or should someone here do it instead, since you can directly upload the new weights and verify on the same eval set as before?
I'd have to test on V2 and open a PR which unnecessarily slows the process down

@ianstenbit
Copy link
Contributor

Should I do the remapping or should someone here do it instead, since you can directly upload the new weights and verify on the same eval set as before? I'd have to test on V2 and open a PR which unnecessarily slows the process down

I will re-map the weights today and upload the new ones to GCS

@ianstenbit
Copy link
Contributor

ianstenbit commented Dec 15, 2022

I've re-mapped the weights to the new implementation (with changes I made to your PR based on my comments here). You can see my updates version of your new ConvBlock+FusedMBConvBlocks here: /~https://github.com/ianstenbit/keras-cv/tree/mbconv

Feel free to copy those changes in. The only change I commented here and didn't update in my branch is the unused depthwise conv filter in the Fused block.

My script to convert the weights can be found here, and I've uploaded the weights to GCS. Once this PR is merged, I will update our pre-trained weights config to point to the converted weights.

ianstenbit added a commit to ianstenbit/keras-cv that referenced this pull request Dec 16, 2022
@ianstenbit
Copy link
Contributor

PR to update our pretrained weights after this is merged: #1151

@DavidLandup0
Copy link
Contributor Author

Thanks for the changes and porting Ian!
My bad for not catching them myself. I didn't want to fiddle with the arguments since they come from the current EffNetV2s, so I a priori assumed that they're okay.

I'll copy the changes and attribute them to you and update this PR. Excited to see these disentangled, making EffNets more modular! We should do a sweep like this from time to time to reasses whether architectures are modular enough. :)

@tanzhenyu
Copy link
Contributor

All tests pass locally and the PR doesn't change anything regarding CutMix and BaseAugmenter. Could you test this locally as well @tanzhenyu?

I don't think this PR breaks any CutMix related test, is there any way you can re-sync and run it again?

@DavidLandup0
Copy link
Contributor Author

Merged the latest changes, tried a new PR, ran tests locally again - the CI is still showing failed tests on some preprocessing layers. No clue why 🤔

@tanzhenyu
Copy link
Contributor

Merged the latest changes, tried a new PR, ran tests locally again - the CI is still showing failed tests on some preprocessing layers. No clue why 🤔

Any luck with the new PR 1168?

@DavidLandup0
Copy link
Contributor Author

Nope, same issue...
Any ideas on why this might be happening?

@bhack
Copy link
Contributor

bhack commented Dec 20, 2022

Merged the latest changes, tried a new PR, ran tests locally again - the CI is still showing failed tests on some preprocessing layers. No clue why thinking

@DavidLandup0 You can reproduce the same issue/failures locally with the full test sequence:

pytest keras_cv/ --ignore keras_cv/models --ignore keras_cv/datasets/waymo

@DavidLandup0
Copy link
Contributor Author

DavidLandup0 commented Dec 20, 2022

@DavidLandup0 You can reproduce the same issue/failures locally with the full test sequence:

pytest keras_cv/ --ignore keras_cv/models --ignore keras_cv/datasets/waymo

Hmm, the individual test runs:

keras_cv/layers/preprocessing/base_image_augmentation_layer_test.py .............s

But it fails when done with the rest of the tests. I'll create a fresh branch and make the changes there...

Edit: I identified what files cause it to fail. the mbconv_test.py and fusedmbconv_test.py, existing, somehow makes other preprocessing tests fail, when run with the entire sequence, but not individually (individually, they all run fine). Looking further into it

@DavidLandup0
Copy link
Contributor Author

DavidLandup0 commented Dec 20, 2022

Found it. When calling:

output = layer(inputs)

It has to be:

output = layer(inputs, training=True)

Otherwise, the tests pass individually, but don't pass in the suite, presumably because some state might be reused between unit tests. When run for a single file, the state is fine because it's localized and can't affect other files. When run in the suite, the change in the state triggers a failure in other tests.

If this is true, we might want to add a before clause in the unit tests to refresh whatever caused this? @LukeWood @bhack @tanzhenyu

P.S. Might be worth adding this in the contributor guidelines if it's unfixable for whatever reason, for contributors to know upfront, because it's a very silent failure across 150 tests caused by one keyword in a fully separate test. 🤔

@tanzhenyu
Copy link
Contributor

Found it. When calling:

output = layer(inputs)

It has to be:

output = layer(inputs, training=True)

Otherwise, the tests pass individually, but don't pass in the suite, presumably because some state might be reused between unit tests. When run for a single file, the state is fine because it's localized and can't affect other files. When run in the suite, the change in the state triggers a failure in other tests.

If this is true, we might want to add a before clause in the unit tests to refresh whatever caused this? @LukeWood @bhack @tanzhenyu

P.S. Might be worth adding this in the contributor guidelines if it's unfixable for whatever reason, for contributors to know upfront, because it's a very silent failure across 150 tests caused by one keyword in a fully separate test. 🤔

I suspect this might be caused by a bug. The preprocessing layer uses training=True by default, whereas the regular layers uses training=None by default.
Other preprocessing layers don't use this argument in test: /~https://github.com/keras-team/keras-cv/blob/master/keras_cv/layers/preprocessing/cut_mix_test.py
Other regular layers don't use this argument in test: /~https://github.com/keras-team/keras-cv/blob/master/keras_cv/layers/object_detection/non_max_suppression_test.py

@DavidLandup0
Copy link
Contributor Author

When was training set to True by default for the KPLs?
A few months back, I remember having an issue where they'd be True for the first execution, but False for all subsequent ones, so when visualizing training batches, the augmentations would be off during training, and I had to manually set training as True during batch visualization to avoid this.

It was tricky to reproduce, and only happened some times on Google Colab (but not locally) so I never opened an issue, and it eventually stopped. Might be tied to this somehow?

@tanzhenyu
Copy link
Contributor

When was training set to True by default for the KPLs? A few months back, I remember having an issue where they'd be True for the first execution, but False for all subsequent ones, so when visualizing training batches, the augmentations would be off during training, and I had to manually set training as True during batch visualization to avoid this.

It was tricky to reproduce, and only happened some times on Google Colab (but not locally) so I never opened an issue, and it eventually stopped. Might be tied to this somehow?

/~https://github.com/keras-team/keras-cv/blob/master/keras_cv/layers/preprocessing/base_image_augmentation_layer.py#L362

@tanzhenyu
Copy link
Contributor

tanzhenyu commented Dec 20, 2022

When was training set to True by default for the KPLs? A few months back, I remember having an issue where they'd be True for the first execution, but False for all subsequent ones, so when visualizing training batches, the augmentations would be off during training, and I had to manually set training as True during batch visualization to avoid this.

It was tricky to reproduce, and only happened some times on Google Colab (but not locally) so I never opened an issue, and it eventually stopped. Might be tied to this somehow?

Can you try something like this? (instead of training=True)
/~https://github.com/keras-team/keras-cv/blob/master/keras_cv/models/object_detection/retina_net/retina_net_test.py#L27

@bhack
Copy link
Contributor

bhack commented Dec 20, 2022

I think It is better to introduce a pytest fixture

/~https://github.com/BenWhetton/keras-surgeon/blob/master/tests/test_surgeon.py#L31-L38

@DavidLandup0
Copy link
Contributor Author

Added fixture to clean the session - good call 👍
@bhack @tanzhenyu

@tanzhenyu
Copy link
Contributor

/gcbrun

@tanzhenyu
Copy link
Contributor

Consider adding this cmd to the contribution guideline :-)

@bhack
Copy link
Contributor

bhack commented Dec 21, 2022

@tanzhenyu Can we find a more general placement? Is this still used in keras keras-team/keras#11170?

@bhack
Copy link
Contributor

bhack commented Dec 21, 2022

@tanzhenyu
Copy link
Contributor

The failing gcp test is due to custom ops build issue. Manually merging

@tanzhenyu tanzhenyu merged commit 52a53d5 into keras-team:master Dec 21, 2022
@tanzhenyu
Copy link
Contributor

@tanzhenyu Can we find a more general placement? Is this still used in keras keras-team/keras#11170?

Looks like a good thing to add to our codebase

ghost pushed a commit to y-vectorfield/keras-cv that referenced this pull request Nov 16, 2023
…V2 (keras-team#1146)

* initial port

* fusedmb and mb

* update

* refactored

* fixed args

* formatting

* removed import

* Fix kernel size + filter count issues

* strides

* docs for mbconv

* fixed SE block, updated docs

* serialization test

* added basic tests

* fixed test

* renamed test case

* fixed tests

* fixtures

Co-authored-by: ianjjohnson <3072903+ianstenbit@users.noreply.github.com>
ghost pushed a commit to y-vectorfield/keras-cv that referenced this pull request Nov 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants