Fix parameter naming: Separate l0_lambda from l1_coefficient #376
I'm a bit worried about adding lots of architecture-specific parameters to the config, as this seems like it will get out of hand pretty quickly. I agree that using …

If we do decide to add this parameter, I'm not a fan of the name …

The benefit of an approach like in #360, where we just have a …

What do others think? cc @curt-tigges @hijohnnylin @anthonyduong9 @jbloomAus
Thanks for the comments, and I totally agree with add …
Refactor l1/l0 variable names to coefficient
@chanind I've made the adjustment; it should work fine now.
Thank you for doing this! There are a few more complications, though:
I'm not sure it's worth it, though, if Curt is going to refactor this soon anyway. Up to you if you think it's worth doing!
Good to hear there will be a refactoring of the configs. In that case, I think I'll just wait for the new version! Closing the PR.
Resolves #360
Description
Currently, when using the JumpReLU architecture, which involves L0 regularization, the code uses the `l1_coefficient` parameter name for the L0 loss calculation. This naming is confusing and makes the configuration less intuitive. This PR introduces a dedicated `l0_lambda` parameter to clearly separate these two regularization terms.

Changes
- Added `l0_lambda` to the config for the L0 loss calculation (default: 0.0)
- Required `l0_lambda` to be specified when `architecture="jumprelu"` is selected
- … `l0_lambda` will raise an error to prevent silent failures

Testing
- Tested `l0_lambda` with the JumpReLU architecture

Notes
This change improves code clarity while ensuring backward compatibility is maintained through proper error handling rather than silent parameter reuse.
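The separation and the error handling described in this PR can be sketched roughly as follows. This is a minimal, hypothetical example, not the repository's actual config or loss code: the class `SparsityConfig`, the function `sparsity_penalty`, and the rule that a JumpReLU config must set a positive `l0_lambda` (treating the 0.0 default as "not set") are all assumptions. The penalties are also simplified (no decoder-norm weighting, no straight-through estimator). What it shows is that the two coefficients weight different quantities, so the config raises an error instead of silently reusing `l1_coefficient` for the L0 term.

```python
from dataclasses import dataclass


@dataclass
class SparsityConfig:
    """Hypothetical config separating the L1 and L0 sparsity coefficients."""

    architecture: str = "standard"
    l1_coefficient: float = 0.0  # weight of the L1 penalty (non-JumpReLU SAEs)
    l0_lambda: float = 0.0  # weight of the L0 penalty (JumpReLU SAEs)

    def __post_init__(self) -> None:
        # Fail loudly rather than silently reusing l1_coefficient for the L0 loss.
        # Assumption: the 0.0 default is treated as "not specified".
        if self.architecture == "jumprelu" and self.l0_lambda <= 0.0:
            raise ValueError(
                "architecture='jumprelu' requires a positive l0_lambda; "
                "l1_coefficient is not used for the L0 loss"
            )


def sparsity_penalty(cfg: SparsityConfig, activations: list[float]) -> float:
    """Return the sparsity penalty for one example's feature activations."""
    if cfg.architecture == "jumprelu":
        # L0 term: number of active (nonzero) features. Real JumpReLU training
        # needs a straight-through estimator for gradients; omitted here.
        return cfg.l0_lambda * sum(1 for a in activations if a != 0.0)
    # L1 term: sum of absolute feature activations.
    return cfg.l1_coefficient * sum(abs(a) for a in activations)


acts = [0.0, 0.5, 2.0, 0.0, 1.5]
print(sparsity_penalty(SparsityConfig(l1_coefficient=0.5), acts))  # 2.0
print(sparsity_penalty(SparsityConfig(architecture="jumprelu", l0_lambda=0.5), acts))  # 1.5

try:
    # A JumpReLU config that only sets l1_coefficient is rejected.
    SparsityConfig(architecture="jumprelu", l1_coefficient=0.5)
except ValueError as err:
    print(f"rejected: {err}")
```

With the same coefficient value (0.5), the L1 and L0 penalties differ (2.0 vs. 1.5) because they measure different things, which is why sharing one parameter name between them is misleading.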