
Fix parameter naming: Separate l0_lambda from l1_coefficient #376

Closed · muyo8692 wants to merge 7 commits

Conversation

@muyo8692 commented Nov 14, 2024

Resolves #360

Description

Currently, when using the JumpReLU architecture, which involves L0 regularization, the code reuses the l1_coefficient parameter name for the L0 loss calculation. This naming is confusing and makes the configuration less intuitive. This PR introduces a dedicated l0_lambda parameter to clearly separate the two regularization terms.

Changes

  • Added a new config parameter, l0_lambda, for the L0 loss calculation (default: 0.0)
  • Added validation that enforces an explicit l0_lambda when architecture="jumprelu" is selected (see the sketch after this list)
  • Preserved existing training behavior: correctly configured models train as before, while using JumpReLU without specifying l0_lambda raises an error to prevent silent failures
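
A minimal sketch of what the validation described above might look like, assuming a dataclass-style config; the class name, defaults, and error message here are illustrative assumptions, not the actual sae_lens source:

```python
from dataclasses import dataclass


@dataclass
class RunnerConfigSketch:
    # Hypothetical stand-in for the real runner config class.
    architecture: str = "standard"
    l1_coefficient: float = 5.0  # sparsity penalty for L1-based architectures
    l0_lambda: float = 0.0       # new: dedicated coefficient for the L0 loss

    def __post_init__(self) -> None:
        # JumpReLU uses an L0 penalty, so require an explicit l0_lambda
        # rather than silently reusing l1_coefficient.
        if self.architecture == "jumprelu" and self.l0_lambda == 0.0:
            raise ValueError(
                "architecture='jumprelu' requires an explicit l0_lambda; "
                "l1_coefficient is no longer reused for the L0 loss."
            )
```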

Testing

  • Verified numerical equivalence: models trained with the original code and with the modified code produce identical metrics
  • Added an error-handling test for a missing l0_lambda with the JumpReLU architecture (see the test sketch below)
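
The error-handling test might look roughly like this, reusing the hypothetical RunnerConfigSketch from the sketch above; the test name and match pattern are assumptions:

```python
import pytest


def test_jumprelu_requires_explicit_l0_lambda():
    # Constructing a JumpReLU config without l0_lambda should fail loudly.
    with pytest.raises(ValueError, match="l0_lambda"):
        RunnerConfigSketch(architecture="jumprelu")
```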

Notes

This change improves code clarity while maintaining backward compatibility through explicit error handling rather than silent parameter reuse.

[Screenshot attached, 2024-11-14]

@chanind (Collaborator) commented Nov 14, 2024

I'm a bit worried about adding lots of architecture-specific parameters to the config, as this seems like it will get out of hand pretty quickly. I agree that using l1_coefficient for JumpReLU doesn't feel great, but I worry about the confusion burden of having lots of different parameters in the config that each apply to only one architecture. Longer-term, we should probably move to a config structure with something like an architecture_params dictionary, where each architecture can have its own settings.

If we do decide to add this parameter, I'm not a fan of the name l0_lambda, since we already have l1_coefficient and the two should be named consistently.

The benefit of an approach like the one in #360, where we just have a single sparsity_coefficient, is that we don't need multiple params and multiple schedulers when only one is actually relevant (see the sketch below).
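
For illustration, the two shapes under discussion might look like this; both sparsity_coefficient and architecture_params are proposals in this thread, not existing sae_lens config fields:

```python
# Option A (issue #360): one shared coefficient, applied to whichever
# sparsity penalty (L0 or L1) the chosen architecture uses.
config_a = dict(
    architecture="jumprelu",
    sparsity_coefficient=1e-3,
)

# Option B (longer-term idea): per-architecture settings grouped in a
# dictionary, keeping the top-level config architecture-agnostic.
config_b = dict(
    architecture="jumprelu",
    architecture_params={"l0_lambda": 1e-3},
)
```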

What do others think? cc @curt-tigges @hijohnnylin @anthonyduong9 @jbloomAus

Review comments on .vscode/settings.json and sae_lens/training/optim.py (outdated, resolved)
@muyo8692 (Author) commented
> (quoting @chanind's comment above)

Thanks for the comments! I totally agree with adding sparsity_coefficient instead of individual configs. I should have time to work on it this weekend.

@muyo8692 (Author) commented
@chanind I've made the adjustment; it should work fine now.

@chanind (Collaborator) commented Nov 20, 2024

Thank you for doing this! There are a few more complications, though:

  • @curt-tigges might be refactoring the configs / SAE classes soon to have separate architecture-specific config sections, which would make this unnecessary. @curt-tigges can speak to the timeline for that.
  • We should keep the old options working for backwards compatibility, with a deprecation message instructing the user to use sparsity_coefficient if their config still sets l1_coefficient. Otherwise this would be a breaking change (see the sketch after this list).
  • Docs/notebooks also need to be updated to talk about sparsity_coefficient instead of l1_coefficient. This is probably just a find/replace.
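
A backwards-compatibility shim along those lines might look like this; the field handling and warning wording are illustrative assumptions, not the eventual implementation:

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class RenamedConfigSketch:
    # Hypothetical config fragment showing only the renamed field.
    sparsity_coefficient: Optional[float] = None
    l1_coefficient: Optional[float] = None  # deprecated alias

    def __post_init__(self) -> None:
        if self.l1_coefficient is not None:
            warnings.warn(
                "l1_coefficient is deprecated; use sparsity_coefficient instead.",
                DeprecationWarning,
            )
            # Keep old configs working by mapping the legacy value across.
            if self.sparsity_coefficient is None:
                self.sparsity_coefficient = self.l1_coefficient
```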

I'm not sure it's worth it, though, if Curt is going to be refactoring this soon anyway. Up to you if you think it's worth doing!

@muyo8692 (Author) commented
Good to hear that there will be a refactoring of the configs; in that case I think I'll just wait for the new version! Closing the PR.

@muyo8692 closed this Nov 21, 2024
Linked issue: [Proposal] Rename l1_coefficient to sparsity_coefficient (#360)