Experiment cannot reserve trial of parent experiment #576
Comments
Running trials from parent experiments may cause issues if the child experiment has a different script path, code version, or command-line call. We should attempt to run the trial with the corresponding experiment configuration. It is not clear what to do if that fails. If we simply leave the trial status as interrupted, the child experiment will try it again. Another option would be to copy the trial to the child experiment and run it with the child configuration. This may cause issues, however, if the trial was partially executed and the user checkpointed it using the id |
During execution of the experiment, the producer verifies that suggested trials do not already exist in the parent or children, but race conditions can lead to duplicates. Also, in an attempt to solve Epistimio#576, we will need to duplicate trials that are not completed in parents into the executed experiments so that they can be reserved and executed. This will lead to more potential duplicate trials and raises the importance of handling duplicates properly. When fetching from the EVC, we should ignore duplicates from parents or children if the trials are available in the current experiment. Because the fetch from the EVC is recursive, this resolves the issue at every level of the tree. It also simplifies the handling of potential duplicates during {naive-}algorithm updates, as there will simply be no duplicates. How: During the call to adaptors, a set of hashes is generated from the trials of the current node based on hyperparameter values (ignoring the experiment id). Any trial from a parent or child whose hash is found in this set is filtered out. When there is a duplicate, only the trial of the current node is kept. This also applies recursively to calls from children experiments to grand-children.
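The hash-based filtering described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the `Trial` class, `params_hash`, and `filter_duplicates` are hypothetical names introduced here, and the hash is assumed to depend only on hyperparameter values, not on the experiment id.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Trial:
    """Hypothetical stand-in for a trial record (not the real class)."""
    experiment: str          # id of the experiment that owns the trial
    params: Dict[str, Any]   # hyperparameter name -> value
    status: str = "new"


def params_hash(trial: Trial) -> str:
    """Hash hyperparameter values only; the experiment id is ignored."""
    payload = json.dumps(sorted(trial.params.items()))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def filter_duplicates(current: List[Trial], related: List[Trial]) -> List[Trial]:
    """Drop trials from parents or children that duplicate a current trial."""
    seen = {params_hash(trial) for trial in current}
    return [trial for trial in related if params_hash(trial) not in seen]


# Assumed example data: the interrupted parent trial was copied into the child.
current_trials = [Trial("child", {"lr": 0.1, "momentum": 0.9})]
parent_trials = [
    Trial("parent", {"momentum": 0.9, "lr": 0.1}, status="interrupted"),
    Trial("parent", {"lr": 0.01, "momentum": 0.9}, status="completed"),
]

kept = filter_duplicates(current_trials, parent_trials)
all_trials = current_trials + kept
```

In this sketch the interrupted parent trial is dropped because the current node already holds a trial with the same hyperparameters, so only the current node's copy remains. Applying the same filter at each node during a recursive fetch would extend this to grand-children.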
This is very problematic, as non-completed trials of parents can no longer be executed unless the environment state is reverted to the one used for the parent experiment (e.g., by resetting the code). The experiment should look for executable trials across the EVC tree.
From @hbertrand