Experiment cannot reserve trial of parent experiment #576

bouthilx · 2021-03-23T14:43:42Z

This is very problematic: non-completed trials of parent experiments can no longer be executed unless the environment state is reverted to the one used for the parent experiment (e.g., by resetting the code). The experiment should look for executable trials across the whole EVC tree.

From @hbertrand

bouthilx · 2021-07-23T13:44:56Z

Running trials from parent experiments may cause issues if the child experiment has a different script path, a different code version, or a different command-line call. We should attempt to run each trial with the configuration of the experiment it belongs to. It is not clear what to do if that fails: if we simply leave the trial status as interrupted, the child experiment will try it again.

Perhaps another option would be to copy the trial to the child experiment and run it with the child configuration. This may cause issues, however, if the trial was partially executed and the user checkpointed it using trial.hash_params as the identifier: the trial may then be resumed with a different code version.

During execution of the experiment, the producer verifies that suggested trials do not already exist in parent or child experiments, but race conditions can still lead to duplicates. Also, in an attempt to solve Epistimio#576, we will need to duplicate trials that are not completed in parents into the executed experiments, so that those trials can be reserved and executed. This will lead to more potential duplicates and raises the importance of handling duplicates properly.

When fetching from the EVC, we should ignore duplicates from parents or children if the trials are available in the current experiment. Because fetching from the EVC is recursive, this solves the issue at every level of the tree. It will also simplify the handling of potential duplicates during {naive-}algorithm updates, as there will simply be no duplicates.

How: during the call to adaptors, a set of hashes is generated from the trials of the current node, based on hyperparameter values (the experiment id is ignored). Any trial from a parent or child whose hash is found in this set is filtered out. When there is a duplicate, only the trial of the current node is kept. This also applies recursively to calls from children experiments to grand-children.
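The filtering described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Orion implementation: the `Trial` dataclass, `params_hash`, `filter_duplicates`, and `fetch_with_evc` are hypothetical names, and the hash is assumed to be computed from the hyperparameter values alone.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Trial:
    """Hypothetical stand-in for a trial; not Orion's Trial class."""
    experiment: str
    params: dict = field(default_factory=dict)
    status: str = "new"


def params_hash(trial):
    """Hash the hyperparameter values only; the experiment id is ignored."""
    payload = json.dumps(trial.params, sort_keys=True)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def filter_duplicates(current_trials, related_trials):
    """Drop parent/child trials whose hash matches a trial of the current node."""
    seen = {params_hash(trial) for trial in current_trials}
    return [trial for trial in related_trials if params_hash(trial) not in seen]


def fetch_with_evc(current_trials, parent_trials, child_trials):
    """Merge trials across the tree, keeping the current node's copy of duplicates."""
    return (
        list(current_trials)
        + filter_duplicates(current_trials, parent_trials)
        + filter_duplicates(current_trials, child_trials)
    )
```

For example, if the current experiment holds a copy of a parent trial with `{"lr": 0.1}`, only the current node's copy is returned, while a parent trial with `{"lr": 0.01}` that has no local copy is still fetched. In a real EVC tree the same filter would be applied at each node during the recursive fetch.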

bouthilx added the bug (indicates an unexpected problem or unintended behavior) and high (the bug makes a feature unusable) labels Mar 23, 2021

bouthilx added this to the v0.1.16 milestone May 26, 2021

bouthilx mentioned this issue Jul 23, 2021

Avoid fetching duplicates in EVC #628

Closed

This was referenced Jul 29, 2021

Feature/filter duplicate in evc #630

Merged

Duplicate pending trials from parent/child for exc #631

Merged

bouthilx closed this as completed in #631 Aug 20, 2021
