This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Populate the Config with default values before saving #5367

Closed

gabeorlanski opened this issue Aug 19, 2021 · 6 comments · Fixed by #5403

Comments

@gabeorlanski
Contributor

Is your feature request related to a problem? Please describe.
An issue I have encountered when using AllenNLP with Weights and Biases, and more generally, is that it is hard to replicate results without knowing the default values for non-specified keys. One solution is to comb the logs for the default values, but that is very laborious. Take, for example, a config that uses the TransformerSquadReader:

"dataset_reader": {
    "type": "transformer_squad"
}

This is then saved as-is during allennlp train. Unfortunately, it does not reflect the reader's default values, such as transformer_model_name="bert-base-cased", skip_impossible_questions=False, etc.

Describe the solution you'd like

Before the config is saved and training begins, populate the Params object params with the default key-value pairs. The earlier example would become:

"dataset_reader": {
    "type": "transformer_squad",
    "transformer_model_name": "bert-base-cased",
    "length_limit": 384,
    "stride": 128,
    "skip_impossible_questions": false,
    "max_query_length": 64,
    "tokenizer_kwargs": null
}

This would help both reproducibility and experiment tracking. For Weights and Biases (which is what I have been using), the saved config is used to organize runs. Adding the default values before the rest of the train command runs would populate these fields.
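
For context, a rough sketch of how a flattened training config typically ends up in Weights and Biases (the project name and flattened keys here are placeholders, not AllenNLP's actual logging code):

import wandb

# The run's config is what W&B uses to group, filter, and compare runs, so a
# config with defaults filled in makes these fields searchable.
wandb.init(
    project="squad-experiments",  # hypothetical project name
    config={
        "dataset_reader.type": "transformer_squad",
        "dataset_reader.transformer_model_name": "bert-base-cased",
        "dataset_reader.skip_impossible_questions": False,
    },
)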

Describe alternatives you've considered

I have previously looked at the logs, where the default values appear, and entered them manually. I have also tried to be as verbose as possible with my configs, but that leads to bloated configs.

I have also considered creating a new command, compose or populate-config, to accomplish this, but the extra step of running it before train and managing the resulting configs would likely lead to more confusion.

Additional context

Ideally, this would be done without creating objects, but I have yet to figure out how to do that.
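
For illustration, one rough way to read constructor defaults without instantiating anything is to inspect the __init__ signature (a sketch only, not AllenNLP's API; it ignores nested FromParams objects, Registrable lookups, and Lazy construction, which is where the real difficulty lies):

import inspect
from typing import Any, Dict


def constructor_defaults(cls: type) -> Dict[str, Any]:
    # Pull default values straight from the class's __init__ signature,
    # skipping `self` and parameters that have no default.
    defaults = {}
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == "self" or param.default is inspect.Parameter.empty:
            continue
        defaults[name] = param.default
    return defaults


# e.g. constructor_defaults(TransformerSquadReader) would surface values like
# the length_limit=384 and stride=128 shown above without building the reader.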

@amitkparekh
Contributor

This seems useful, but if implemented I'd ask that it be non-default behaviour, as it would likely break my current training configs.

I also tried to be verbose with my configuration. However, with all the repeated code, I went all in on using the features of Jsonnet, including functions and object-oriented compositions. I now compose my configs with functions and have a separate config file containing global options and flags to control multiple settings, like when debugging.

If populating the training config file were default behaviour, I can only assume it wouldn't know how to propagate back to all the other configurations and functions, meaning it would likely break the configuration of anyone who, like me, went all in on Jsonnet.
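
For illustration, a minimal hypothetical sketch of this style of composition (file layout, names, and fields are made up); a tool that only fills defaults into the flattened output has no obvious way to push those values back into the functions or the shared globals:

// main.jsonnet (sketch) -- composed from functions and a shared globals file,
// here inlined as `globals` for brevity instead of `import 'config.jsonnet'`.
local globals = {
  debug: false,
  transformer_model: 'bert-base-cased',
};

local makeReader(model) = {
  type: 'transformer_squad',
  transformer_model_name: model,
};

{
  dataset_reader: makeReader(globals.transformer_model),
  trainer: { num_epochs: if globals.debug then 1 else 10 },
}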

@gabeorlanski
Contributor Author

Breaking older configs is definitely problematic, so I now think this should not be part of the default train loop. It would probably be better to have it as a compose command that users can run before train.

@AkshitaB
Contributor

@gabeorlanski That's a good idea -- especially the point about reproducibility! When the config is saved in the model.tar.gz, things like std.extVar(...) are resolved to their actual values. We can make sure that the default values are similarly populated.
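
For reference, a small hypothetical example of that resolution step (the variable name is made up):

// In the source config, a value can come from an external variable:
{ dataset_reader: { transformer_model_name: std.extVar('MODEL_NAME') } }

// In the config.json stored inside model.tar.gz it is already resolved, e.g.:
// { "dataset_reader": { "transformer_model_name": "bert-base-cased" } }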

@amitkparekh Thank you for bringing up the point about using jsonnet functions and compositions. Can you give a short example of a config that would break with this new addition?

@amitkparekh
Contributor

@AkshitaB I created an example config which is very similar to what I am currently using. It's a multitask model with multiple heads, some of which I turn on/off for ablation studies and for finding/controlling confounding variables.

The main.jsonnet file is the one passed to allennlp train. I use config.jsonnet to handle globals and flags.

It won't run out-of-the-box because I've got custom modules that aren't in the core AllenNLP library or in models, but the Jsonnet does compile without error (into the all-together.json file).

@github-actions

github-actions bot commented Sep 2, 2021

This issue is being closed due to lack of activity. If you think it still needs to be addressed, please comment on this thread 👇

@github-actions github-actions bot added the stale label Sep 2, 2021
@github-actions github-actions bot closed this as completed Sep 2, 2021
@gabeorlanski
Contributor Author

@AkshitaB so I got a basic version working here: /~https://github.com/gabeorlanski/allennlp-hydra/blob/main/allennlp_hydra/config/fill_defaults.py

But some weird things started occurring when actually training with the full config: instances would no longer be read, among other odd behaviour. I think the only real alternative to the messy recursive inspect implementation would be to save the params passed to each call of from_params, and then have to_params return the saved values.

It would look something like:

# Sketch of a change to AllenNLP's existing from_params.py, where Params,
# CustomDetHash, T, and the typing imports are already defined; kwargs is
# built in the elided section of from_params below.
from copy import deepcopy


class FromParams(CustomDetHash):

    @classmethod
    def from_params(
        cls: Type[T],
        params: Params,
        constructor_to_call: Callable[..., T] = None,
        constructor_to_inspect: Union[Callable[..., T], Callable[[T], None]] = None,
        **extras,
    ) -> T:
        ...
        # Copy the fully resolved params before the constructor consumes them,
        # then stash the copy on the constructed object.
        params_copy = deepcopy(params)
        out = constructor_to_call(**kwargs)
        out._called_params = params_copy
        return out

    def _to_params(self) -> Dict[str, Any]:
        # Return the params that were recorded at construction time.
        return self._called_params

Something along those lines; it's only a rough idea.
