Experiment cannot reserve trial of parent experiment #576

Closed
bouthilx opened this issue Mar 23, 2021 · 1 comment · Fixed by #631
Labels
bug (Indicates an unexpected problem or unintended behavior), high (The bug makes a feature unusable)

bouthilx commented Mar 23, 2021

This is very problematic, as non-completed trials of parent experiments cannot be executed anymore unless the environment state is reverted to the one used for the parent experiment (e.g. by resetting the code). The experiment should look for executable trials across the EVC tree.

From @hbertrand
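
A minimal sketch of what "looking for executable trials across the EVC tree" could mean, assuming a simplified in-memory model. `ExperimentNode`, `Trial`, `RESERVABLE` and `reservable_trials` are hypothetical names for illustration and are not Orion's API; the set of statuses treated as reservable is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

# Assumption: statuses from which a trial may still be reserved.
RESERVABLE = {"new", "interrupted", "suspended"}


@dataclass
class Trial:
    params: dict
    status: str


@dataclass
class ExperimentNode:
    """Hypothetical stand-in for one experiment version in the EVC tree."""
    name: str
    trials: List[Trial] = field(default_factory=list)
    parent: Optional["ExperimentNode"] = None


def reservable_trials(node: ExperimentNode) -> Iterator[Tuple[ExperimentNode, Trial]]:
    """Yield (owner, trial) pairs, starting at `node` and walking up to the root."""
    current = node
    while current is not None:
        for trial in current.trials:
            if trial.status in RESERVABLE:
                yield current, trial
        current = current.parent


parent = ExperimentNode(
    "exp-v1",
    [Trial({"lr": 0.1}, "completed"), Trial({"lr": 0.01}, "interrupted")],
)
child = ExperimentNode("exp-v2", [Trial({"lr": 0.001}, "new")], parent=parent)

found = [(owner.name, trial.params) for owner, trial in reservable_trials(child)]
```

Returning the owning node along with each trial keeps open the question of which experiment configuration the trial should be run with.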

bouthilx added the bug and high labels Mar 23, 2021
bouthilx added this to the v0.1.16 milestone May 26, 2021
@bouthilx

Running trials from parent experiments may cause issues if the child experiment has a different script path, a different code version or a different command-line call. We should attempt to run the trial with the configuration of the experiment it belongs to. It's not clear what to do if that fails. If we simply leave the trial status as interrupted, the child experiment will try it again.

Perhaps another option would be to copy the trial to the child experiment and run it with the child configuration. This may cause issues, however, if the trial was partially executed and the user checkpointed it using the id trial.hash_params: the trial may then be resumed with a different code version.
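
To illustrate the checkpoint concern, here is a small sketch assuming the user script derives its checkpoint directory from `trial.hash_params` alone. `checkpoint_path` and the `code_version` argument are hypothetical; adding the code version to the key is only one possible mitigation, not something decided in this issue.

```python
import os
from typing import Optional


def checkpoint_path(root: str, hash_params: str, code_version: Optional[str] = None) -> str:
    """Build a checkpoint directory for a trial (hypothetical helper)."""
    # Keyed on hash_params only, a copied trial maps to the parent's directory.
    key = hash_params if code_version is None else f"{hash_params}-{code_version}"
    return os.path.join(root, key)


# Same hyperparameters in parent and child: the paths collide, so the child
# would resume from a checkpoint written by the parent's code version.
parent_ckpt = checkpoint_path("checkpoints", "abc123")
child_ckpt = checkpoint_path("checkpoints", "abc123")

# With a code version in the key (an assumption), the two no longer collide.
parent_versioned = checkpoint_path("checkpoints", "abc123", "v1")
child_versioned = checkpoint_path("checkpoints", "abc123", "v2")
```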

bouthilx added a commit to bouthilx/orion that referenced this issue Jul 29, 2021
During execution of the experiment the producer verifies that suggested
trials do not already exist in parents or children, but race conditions
can lead to duplicates. Also, in an attempt to solve Epistimio#576, we will
need to duplicate trials that are not completed in parents into executed
experiments to allow reserving and executing the trials. This will lead
to more potential duplicate trials and raises the importance of handling
duplicates properly.

When fetching from the EVC, we should ignore duplicates from parents or
children if the trials are available in the current experiment. Applied
at every level, this solves the issue during the recursive fetch from
the EVC. It will also simplify the handling of potential duplicates
during {naive-}algorithm updates, as there will simply be no duplicates.

How:

During the call to adaptors, a set of hashes is generated from the trials
of the current node based on hyperparameter values (ignoring experiment
id). Any trial from the parents or children whose hash is found in this
set is filtered out. When there is a duplicate, only the trial of the
current node is kept. This also applies recursively to calls from child
experiments to grandchildren.
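
The filtering described above could be sketched roughly as follows. This is an illustrative stand-alone version, not the code from the commit: `params_hash` and `filter_duplicates` are hypothetical names, trials are plain dictionaries, and SHA-256 over sorted JSON is an assumed hashing scheme.

```python
import hashlib
import json


def params_hash(params: dict) -> str:
    """Hash hyperparameter values only; the experiment id is deliberately left out."""
    payload = json.dumps(params, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def filter_duplicates(current_trials: list, related_trials: list) -> list:
    """Keep every trial of the current node, plus trials from parents or
    children whose hyperparameters do not already appear in the current node."""
    seen = {params_hash(trial["params"]) for trial in current_trials}
    kept = [t for t in related_trials if params_hash(t["params"]) not in seen]
    return current_trials + kept


current = [{"experiment": "child", "params": {"lr": 0.1}}]
related = [
    {"experiment": "parent", "params": {"lr": 0.1}},   # duplicate, dropped
    {"experiment": "parent", "params": {"lr": 0.01}},  # unique, kept
]
merged = filter_duplicates(current, related)
```

Because the hash ignores the experiment id, a trial copied from a parent into the current experiment shadows the parent's original, and the current node's copy wins.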
lebrice pushed a commit to lebrice/orion that referenced this issue Jun 27, 2022