Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds efficientnet2 presets #1983

Merged
merged 19 commits into from
Dec 7, 2024

Conversation

pkgoogle
Copy link
Collaborator

Adds these variants:

efficientnet2_rw_m_agc_imagenet
efficientnet2_rw_s_ra2_imagenet
efficientnet2_rw_t_ra2_imagenet

Branched from Edge Presets: #1976

Can just merge this one and close that one or merge that one first for more modularity.

@pkgoogle pkgoogle added the kokoro:force-run Runs Tests on GPU label Nov 12, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Nov 12, 2024
@pkgoogle pkgoogle added the kokoro:force-run Runs Tests on GPU label Nov 18, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Nov 18, 2024
Copy link
Collaborator

@divyashreepathihalli divyashreepathihalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Piseth! Left a few comments

batch_norm_epsilon=1e-3,
activation="swish",
dropout=0.2,
nores=False,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add missing args in docstring - example : nores

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -0,0 +1,136 @@
import keras
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename file to be more readable - convolution_batch_norm_activation.py

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this conflicts w/ the below review, so I chose the later review.

@@ -0,0 +1,22 @@
import keras
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename file to be more readable

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this conflicts w/ the below review, so I chose the later review.

@@ -77,6 +77,7 @@ def __init__(
activation="swish",
dropout=0.2,
nores=False,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the docstring args are not matching the arg list here - please make sure they are consistent

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

@mattdangerw mattdangerw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a partially "stackwise" arg that isn't name "stackwise_" is confusing. Left a suggestion to be more consistent in our arg names here. Less if branching.

}


class ConvBNActBlock(keras.layers.Layer):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just call this CBABlock? And rename the file to match?

Kind of confusing the different forms of abbreviation we are using here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return x

def get_config(self):
config = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -260,11 +271,26 @@ def __init__(
name=block_name,
)
x = block(x)
else: # cba block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do this? you extend get_conv_constructor to handle "cba" then don't use it. I'm confused.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing my mind mid-change 😄. Refactored to make block_kwargs more dynamic now.

for i in range(len(stackwise_kernel_sizes)):
num_stacks = len(stackwise_kernel_sizes)

if isinstance(depth_coefficient, tuple):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing, most that can be passed "stackwise" have stackwise_ as a prefix, except depth_coefficient now.

It's probably better UX to just rename these first arguments stackwise_depth_coefficient and stackwise_width_coefficient. At least then our argument names are consistent. Remember to update docstrings.

You could keep backwards compat by adding a few lines at the top of the constructor here. Something like this. You have to allow stackwise_depth_coefficient=None and stackwise_width_coefficient=None in the arg list.

num_stacks = len(stackwise_kernel_sizes)
if "depth_coefficient" in kwargs:
    stackwise_depth_coefficient = kwargs.pop("depth_coefficient") * num_stacks
if "width_coefficient" in kwargs:
    stackwise_width_coefficient = kwargs.pop("width_coefficient") * num_stacks

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -189,15 +236,22 @@ def port_batch_normalization(keras_layer, hf_weight_prefix):

# Stages
num_stacks = len(backbone.stackwise_kernel_sizes)

depth_coefficient = VARIANT_MAP[variant]["depth_coefficient"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we follow the suggestion about arg renaming above, we can get rid of this whole block. Just update above from

"width_coefficient": 0.8,
"depth_coefficient": 0.9,

to

"stackwise_width_coefficient": [0.8] * 6,
"stackwise_depth_coefficient": [0.9] * 6,

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it so both ways should work.

@pkgoogle pkgoogle self-assigned this Nov 21, 2024
@pkgoogle
Copy link
Collaborator Author

I should note I'm noticing a different prediction for one variant:

python tools/checkpoint_conversion/convert_efficientnet_checkpoints.py --preset efficientnet2_rw_s_ra2_imagenet
✅ Loaded TIMM model.
I1121 13:45:18.860764 8329934912 _builder.py:196] Loading pretrained weights from Hugging Face hub (timm/efficientnetv2_rw_s.ra2_in1k)
I1121 13:45:19.120539 8329934912 _hub.py:184] [timm/efficientnetv2_rw_s.ra2_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
✅ Loaded KerasHub model.
1/1 ━━━━━━━━━━━━━━━━━━━━ 2s 2s/step
🔶 Keras output: [-0.575618   -0.4144789  -0.73163635 -0.8395867  -0.5637497   0.87436163
 -1.3687949  -0.1337489  -0.0662252   0.01332632]
🔶 TIMM output: [-0.7259099   0.3929606  -0.05580516  0.11954936  0.02080455  0.996287
  0.3669875   0.5366461   0.01827425 -0.37024903]
🔶 Keras label: 349
🔶 TIMM label: 345
🔶 Modeling difference: 0.52596486
🔶 Preprocessing difference: 0.0044535464
🏁 Preset saved to ./efficientnet2_rw_s_ra2_imagenet

Which I'm looking into

@mattdangerw
Copy link
Member

@pkgoogle thanks! Yeah a different label and that level of different output floats worth looking at!

@pkgoogle pkgoogle added the kokoro:force-run Runs Tests on GPU label Nov 25, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Nov 25, 2024
@pkgoogle
Copy link
Collaborator Author

Should be good now:

python tools/checkpoint_conversion/convert_efficientnet_checkpoints.py     --preset efficientnet2_rw_s_ra2_imagenet
✅ Loaded TIMM model.
I1125 14:46:30.043703 8329934912 _builder.py:196] Loading pretrained weights from Hugging Face hub (timm/efficientnetv2_rw_s.ra2_in1k)
I1125 14:46:30.252438 8329934912 _hub.py:184] [timm/efficientnetv2_rw_s.ra2_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
✅ Loaded KerasHub model.
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 1s/step
🔶 Keras output: [-0.37625     0.6832866   0.28495154  0.5878901   0.20605645  1.0627874
  0.40687525  0.40137056 -0.118939   -0.30322495]
🔶 TIMM output: [-0.7259099   0.3929606  -0.05580516  0.11954936  0.02080455  0.996287
  0.3669875   0.5366461   0.01827425 -0.37024903]
🔶 Keras label: 345
🔶 TIMM label: 345
🔶 Modeling difference: 0.15455398
🔶 Preprocessing difference: 0.0044535464
🏁 Preset saved to ./efficientnet2_rw_s_ra2_imagenet

Copy link
Member

@mattdangerw mattdangerw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Just little comments.


BN_AXIS = 3

CONV_KERNEL_INITIALIZER = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kinda funky to return this as a dict, any particular reason for it?

More in line with other models would be to just add a function here...

def conv_kernel_initializer(scale=2.):
    return keras.initializers.VarianceScaling(
        scale=scale,
        ...
    )

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particular reason, I was following the original style of the fusedmbconv/mbconv blocks. Done.

@@ -99,8 +106,8 @@ class EfficientNetBackbone(FeaturePyramidBackbone):
def __init__(
self,
*,
width_coefficient,
depth_coefficient,
stackwise_width_coefficient=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably make these plural, in keeping with other args? stackwise_width_coefficients and stackwise_depth_coefficients

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"num_features": 1792,
},
"rw_s": {
"width_coefficient": 1.0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd pass these the new way instead of the legacy and now undocumented path. "stackwise...": [1.0] * 6

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

},
"rw_s": {
"width_coefficient": 1.0,
"depth_coefficient": 1.0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, and elsewhere in this file

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

activation: activation function to use between each convolutional layer.
input_shape: optional shape tuple, it should have exactly 3 input
channels.
stackwise_width_coefficient: list[float] or float, scaling coefficient
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this need to be a list[float]?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct, done

@pkgoogle pkgoogle added the kokoro:force-run Runs Tests on GPU label Dec 3, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Dec 3, 2024
@pkgoogle
Copy link
Collaborator Author

pkgoogle commented Dec 4, 2024

Remaining CI/CD failure appears unrelated to this PR.

Copy link
Member

@mattdangerw mattdangerw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@mattdangerw mattdangerw merged commit 62b3834 into keras-team:master Dec 7, 2024
9 of 10 checks passed
@pkgoogle pkgoogle deleted the add_efficientnet2_presets branch December 9, 2024 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants