Separating MBConvBlock, FusedMBConvBlock and Refactoring EfficientNetV2 #1146
Conversation
Thanks for the PR!
Now that they will become public API, please add:
- unit test files, regular case + corner case
- serialization test under serialization_test.py
- better documentation for each layer, e.g. what the paper reference is, what it does, what input/output shapes it requires, what the Python example usage looks like, etc.
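As a concrete illustration of the kind of test being asked for, a serialization check is usually just a get_config/from_config round trip. A minimal sketch, where the import path and constructor arguments are placeholders rather than the final API:

```python
import tensorflow as tf

from keras_cv.layers import MBConvBlock  # placeholder import path


def test_mbconv_block_serialization():
    # Placeholder arguments; the real signature is whatever this PR exposes.
    layer = MBConvBlock(input_filters=32, output_filters=32)
    restored = MBConvBlock.from_config(layer.get_config())

    x = tf.random.uniform((2, 64, 64, 32))
    # The restored layer should accept the same inputs and produce the same shape.
    assert layer(x).shape == restored(x).shape
```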
Deal! Wanted to check whether this was an okay change to begin with :)
We can still easily remap some of the layers' weights when the total number of layer weights changes, I guess?
Yeah, I think that's the best approach (what I meant by porting our own weights). No need to re-train since it's identical, just wrapped. Has a nicer summary call now too :)
Should I do the remapping, or should someone here do it instead, since you can directly upload the new weights and verify on the same eval set as before?
I will re-map the weights today and upload the new ones to GCS.
I've re-mapped the weights to the new implementation (with changes I made to your PR based on my comments here). You can see my updated version of your new MBConvBlock + FusedMBConvBlock layers here: /~https://github.com/ianstenbit/keras-cv/tree/mbconv Feel free to copy those changes in. The only change I commented here and didn't update in my branch is the unused depthwise conv filter in the Fused block. My script to convert the weights can be found here, and I've uploaded the weights to GCS. Once this PR is merged, I will update our pre-trained weights config to point to the converted weights.
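Since the refactor is mathematically identical and the blocks just wrap the existing layers, a conversion like this essentially transfers the weights in flattened order. Here's a minimal sketch of the idea with toy models; this is not the actual conversion script, which operates on the EfficientNetV2 checkpoints:

```python
import numpy as np
import tensorflow as tf


# Toy stand-ins: an "old" model built from flat layers vs. a "new" model
# that wraps the same two convolutions inside a single custom layer.
class WrappedBlock(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.conv1 = tf.keras.layers.Conv2D(8, 3, padding="same")
        self.conv2 = tf.keras.layers.Conv2D(8, 3, padding="same")

    def call(self, inputs):
        return self.conv2(self.conv1(inputs))


inputs = tf.keras.Input((32, 32, 3))
old_model = tf.keras.Model(
    inputs,
    tf.keras.layers.Conv2D(8, 3, padding="same")(
        tf.keras.layers.Conv2D(8, 3, padding="same")(inputs)
    ),
)
new_model = tf.keras.Model(inputs, WrappedBlock()(inputs))

# Different layer counts (3 vs. 2 here, 151 vs. 26 for EfficientNetV2)...
print(len(old_model.layers), len(new_model.layers))

# ...but the flattened weight lists line up one-to-one, so a direct
# transfer works and the outputs match.
new_model.set_weights(old_model.get_weights())
x = tf.random.uniform((1, 32, 32, 3))
np.testing.assert_allclose(old_model(x), new_model(x), rtol=1e-5)
```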
PR to update our pretrained weights after this is merged: #1151
Thanks for the changes and the porting, Ian! I'll copy the changes, attribute them to you, and update this PR. Excited to see these disentangled, making EffNets more modular! We should do a sweep like this from time to time to reassess whether architectures are modular enough. :)
I don't think this PR breaks any CutMix-related test, is there any way you can re-sync and run it again?
Merged the latest changes, tried a new PR, ran tests locally again - the CI is still showing failed tests on some preprocessing layers. No clue why 🤔
Any luck with the new PR #1168?
Nope, same issue...
@DavidLandup0 You can reproduce the same issue/failures locally with the full test sequence: pytest keras_cv/ --ignore keras_cv/models --ignore keras_cv/datasets/waymo
Hmm, the individual test runs:
But it fails when run with the rest of the tests. I'll create a fresh branch and make the changes there... Edit: I identified which files cause it to fail.
Found it. When calling:
It has to be:
Otherwise, the tests pass individually but don't pass in the suite, presumably because some state is reused between unit tests. When run for a single file, the state is fine because it's localized and can't affect other files; when run as part of the suite, the change in state triggers failures in other tests. If this is true, we might want to add a setup step before each unit test to refresh whatever caused this? @LukeWood @bhack @tanzhenyu P.S. It might be worth adding this to the contributor guidelines if it's unfixable for whatever reason, so contributors know upfront - it's a very silent failure across 150 tests caused by one keyword in a fully separate test. 🤔
I suspect this might be caused by a bug. The preprocessing layer uses
When was training set to True by default for the KPLs? It was tricky to reproduce, and only happened sometimes on Google Colab (but not locally), so I never opened an issue, and it eventually stopped. Might be tied to this somehow?
Can you try something like this? (instead of |
I think it is better to introduce a pytest fixture: /~https://github.com/BenWhetton/keras-surgeon/blob/master/tests/test_surgeon.py#L31-L38
Added a fixture to clear the session - good call 👍
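For reference, a session-clearing fixture along the lines of the linked keras-surgeon example boils down to something like this. A minimal sketch; exactly where it lives in the repo (e.g. a conftest.py) is a detail:

```python
import pytest
import tensorflow as tf


@pytest.fixture(autouse=True)
def clear_keras_session():
    # Reset global Keras state (layer name counters, cached graphs, etc.)
    # before each test so that state can't leak across test files.
    tf.keras.backend.clear_session()
    yield
```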
/gcbrun |
Consider adding this command to the contribution guidelines :-)
@tanzhenyu Can we find a more general placement? Is this still used in Keras? (keras-team/keras#11170)
Have we switched over?
The failing GCP test is due to a custom-ops build issue. Manually merging.
Looks like a good thing to add to our codebase.
Separating MBConvBlock, FusedMBConvBlock and Refactoring EfficientNetV2 (keras-team#1146)
* initial port
* fusedmb and mb
* update
* refactored
* fixed args
* formatting
* removed import
* Fix kernel size + filter count issues
* strides
* docs for mbconv
* fixed SE block, updated docs
* serialization test
* added basic tests
* fixed test
* renamed test case
* fixed tests
* fixtures
Co-authored-by: ianjjohnson <3072903+ianstenbit@users.noreply.github.com>
What does this PR do?
There's a design issue that's not aligned with KCV's principles. MBConv blocks, as well as Fused MBConv blocks, are general-purpose blocks, introduced with MobileNets and re-used in practically all mobile-oriented/efficiency-focused architectures since. Currently, we only have them as part of EfficientNets, where they're implemented as methods internal to that model.
If we don't separate them, we'll accumulate tech debt that will only get harder to recover from down the line.
Since MaxViT (currently WIP), MobileNets and various other future architectures will use them (and we'd want to expose them to the user as a common building block as well), I propose that we separate them into standalone layers, which is what this PR does.
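To make the proposal concrete, here is roughly what a standalone MBConv-style layer looks like: a simplified sketch of the usual expand -> depthwise -> squeeze-and-excite -> project structure, not the exact API this PR adds (argument names and defaults are illustrative):

```python
import tensorflow as tf


class MBConvBlockSketch(tf.keras.layers.Layer):
    """Inverted-residual (MBConv) block in the MobileNet/EfficientNet style.

    Simplified sketch only; the real layer exposes more options
    (SE ratio, stochastic depth, BN parameters, etc.).
    """

    def __init__(self, input_filters, output_filters, expand_ratio=4,
                 strides=1, se_ratio=0.25, **kwargs):
        super().__init__(**kwargs)
        self.input_filters = input_filters
        self.output_filters = output_filters
        self.strides = strides
        expanded = input_filters * expand_ratio

        # 1x1 expansion convolution
        self.expand_conv = tf.keras.layers.Conv2D(expanded, 1, padding="same", use_bias=False)
        self.expand_bn = tf.keras.layers.BatchNormalization()
        # 3x3 depthwise convolution
        self.depthwise = tf.keras.layers.DepthwiseConv2D(3, strides=strides, padding="same", use_bias=False)
        self.depthwise_bn = tf.keras.layers.BatchNormalization()
        # Squeeze-and-excitation
        se_filters = max(1, int(input_filters * se_ratio))
        self.se_squeeze = tf.keras.layers.GlobalAveragePooling2D(keepdims=True)
        self.se_reduce = tf.keras.layers.Conv2D(se_filters, 1, activation="swish", padding="same")
        self.se_expand = tf.keras.layers.Conv2D(expanded, 1, activation="sigmoid", padding="same")
        # 1x1 projection convolution (no activation)
        self.project_conv = tf.keras.layers.Conv2D(output_filters, 1, padding="same", use_bias=False)
        self.project_bn = tf.keras.layers.BatchNormalization()

    def call(self, inputs, training=None):
        x = tf.keras.activations.swish(self.expand_bn(self.expand_conv(inputs), training=training))
        x = tf.keras.activations.swish(self.depthwise_bn(self.depthwise(x), training=training))
        x = x * self.se_expand(self.se_reduce(self.se_squeeze(x)))
        x = self.project_bn(self.project_conv(x), training=training)
        # Residual connection only when the block preserves the feature shape.
        if self.strides == 1 and self.input_filters == self.output_filters:
            x = x + inputs
        return x


# Example usage: a shape-preserving block.
block = MBConvBlockSketch(input_filters=32, output_filters=32)
print(block(tf.random.uniform((1, 64, 64, 32))).shape)  # (1, 64, 64, 32)
```

The Fused variant replaces the expansion + depthwise pair with a single regular 3x3 convolution, which is the main structural difference between the two blocks.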
I've refactored EfficientNetV2 to use layers instead of methods, and the param counts are identical, as validation that they behave the same. One issue to figure out: MBConvBlock and FusedMBConvBlock now wrap what used to be individual layers, so the total number of layers differs (151 vs. 26). We can:
- remap the existing weights to the new layer structure, or
- take the layers from MBConvBlock and FusedMBConvBlock and add them to the EffNet while it's being built (so as to be able to load weights as they are right now)

Here's the difference:

VS:
@LukeWood @tanzhenyu @ianstenbit @bhack