Release v0.1.16rc1 #639

Merged
merged 54 commits into from
Aug 23, 2021

@bouthilx bouthilx commented Aug 23, 2021

Experiment Version Control (EVC) will now be disabled by default. When the EVC is disabled, any changes in the experiment will be saved to the DB, overriding the previous version. See configuration doc to enable the EVC: https://orion.readthedocs.io/en/stable/user/config.html#experiment-version-control.

🏗 Enhancements

🐛 Bug Fixes

📜 Documentation

bouthilx and others added 30 commits May 19, 2021 18:51
Merge back master in develop after release
fix benchmark ranking and regret visualization
To simplify documentation of algorithm plugins, they have been moved to
separate docs, with only pointers in core documentation.

The algorithms class documentation is also reused to avoid rewriting the
documentation of the arguments in sphinx.
Why:

Suppose we want to branch from a parent that was run on a different
computer, or for which we lost the execution script. The branching
should not fail because the script is missing; we do not need it because
we won't execute trials from the parent anyway.

How:

Use allow_non_existing_files=True when building the cmdline parser to
compare cmdlines of experiments.
The parent may have a script configuration file that is missing at the
time of branching. Branching should not fail in such a case; it should
rely on the saved content of the configuration file to verify changes.
Why:

The space upgrade relies on local files if the experiment's search space
is defined in a configuration file. Parsing these files during the DB
upgrade can break the DB because not all experiments are necessarily
executed on the same file system, so some configuration files may not
be present. The space should only be upgraded when the user attempts to
run an experiment, in which case the configuration file is available.

The space does not need to be upgraded during the DB upgrade anyway,
because experiment building is backward compatible with experiments
lacking an explicit space definition in the DB (relying on cmdargs and
the config file to define the space at run-time).

How:

Remove space upgrade from db upgrade script.
…ng_files

Allow branching when parent script isn't available
The DB upgrade does not update the space and priors anymore. They are
handled at runtime anyway, so there is no need to update them in the DB.
The tests for the different EVC options were all inter-dependent. This
commit makes them independent, all starting from the same DB state using
the same command. This makes it much easier to make modifications in
these tests without affecting all following tests.
The EVC is a constant source of confusion for users. It should be
disabled by default, with warning messages when different versions of
experiments are used. Users who want to use advanced features of the EVC
can still do so by enabling it.

Making it false by default is a breaking change and may cause issues for
users currently using the EVC. Based on discussions with users, the EVC
does not seem to be used much so far, so this breaking change should be
relatively harmless. Avoiding further confusion by disabling it by
default is worth the breaking change.
Why:

If the EVC is not enabled, the consumer should always ignore code
changes. It would not make sense to raise an error between 2 trials
because the user's script code changed while the EVC is disabled.

How:

If EVC is disabled, force ignore_code_changes to True in Consumer.
Add enable option for EVC, with default = false
During execution of the experiment, the producer verifies that suggested
trials do not already exist in the parent or children, but race
conditions can lead to duplicates. Also, in an attempt to solve #576, we
will need to duplicate trials that are not completed in parents into
executed experiments to allow reserving and executing the trials. This
will lead to more potential duplicate trials and raises the importance
of handling duplicates properly.

When fetching from the EVC, we should ignore duplicates from the parent
or children if the trials are available in the current experiment. This
will recursively solve the issue during recursive fetches from the EVC.
It will also simplify the handling of potential duplicates during
{naive-}algorithm updates, as there will simply be no duplicates.

How:

During the call to adaptors, a set of hashes is generated from the
trials of the current node based on hyperparameter values (ignoring the
experiment id). Any trial from the parents or children whose hash is
found in this set is filtered out. When there is a duplicate, only the
trial of the current node is kept. This also applies recursively to
calls from children experiments to grand-children.
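The filtering described above can be sketched roughly as follows. This is a minimal illustration, not Orion's implementation: trials are modelled as plain dicts, and `params_hash` and `filter_duplicates` are hypothetical names introduced here.

```python
import hashlib
import json


def params_hash(trial):
    """Hash a trial from its hyperparameter values only (experiment id ignored)."""
    payload = json.dumps(trial["params"], sort_keys=True)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def filter_duplicates(current_trials, related_trials):
    """Drop parent/child trials that duplicate a trial of the current node."""
    seen = {params_hash(trial) for trial in current_trials}
    return [trial for trial in related_trials if params_hash(trial) not in seen]


# Hypothetical data: experiment 2 is the current node, experiment 1 its parent.
current = [{"experiment": 2, "params": {"lr": 0.1}}]
parent = [
    {"experiment": 1, "params": {"lr": 0.1}},   # duplicate of a current trial
    {"experiment": 1, "params": {"lr": 0.01}},  # only exists in the parent
]
kept = filter_duplicates(current, parent)
```

Because the hash ignores the experiment id, the parent's copy of `lr=0.1` is dropped and only the current node's trial survives.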
Why:

When a dimension is deleted or added, the adaptor should not transform
trials with a default value of None if there was no default value. This
would lead to invalid trials if None is an invalid value for this
dimension.

How:

If the default value is the unique NO_DEFAULT_VALUE object, then the
trials should be filtered out.
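A sentinel object makes it possible to tell "no default was given" apart from "the default is None". The sketch below assumes trials are plain dicts of parameters; `adapt_added_dimension` is a hypothetical helper, and `NO_DEFAULT_VALUE` here is only a stand-in for the unique object mentioned above.

```python
NO_DEFAULT_VALUE = object()  # stand-in for the unique sentinel object


def adapt_added_dimension(trials, name, default=NO_DEFAULT_VALUE):
    """Adapt trials to a space where dimension `name` was added."""
    adapted = []
    for params in trials:
        if name in params:
            adapted.append(dict(params))
        elif default is not NO_DEFAULT_VALUE:
            # A real default exists (possibly None), so the trial stays valid.
            adapted.append({**params, name: default})
        # Otherwise no default is known and the trial is filtered out.
    return adapted


without_default = adapt_added_dimension([{"lr": 0.1}], "momentum")
with_default = adapt_added_dimension([{"lr": 0.1}], "momentum", default=0.9)
```

The identity check (`is not`) matters: comparing against `None` instead would wrongly discard trials for dimensions where `None` is a legitimate default.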
bouthilx added 24 commits July 29, 2021 16:50
Why:

An experiment cannot reserve trials of its parent experiment. This is
very problematic, as non-completed trials of parents cannot be executed
anymore unless the environment state is reverted to the one used for the
parent experiment (ex: resetting code). It should look for executable
trials across the EVC tree.

Running trials from parent experiments may cause issues if the child
experiment has a different script path, different code version or
different cmdline call. We should attempt running the trial with the
corresponding experiment configuration. It's not clear what to do if it
fails. If we simply leave the trial status as interrupted, the child
experiment will try it again.

Another option is to copy the trial to the child experiment and run it
with child configuration. If the user checkpointed the trial state with
trial.hash_params, the checkpoint will be lost as trial.hash_params
will change based on the experiment id. This is safe, protecting users
from resuming with a different code version.

How:

Fetch trials from EVC tree and duplicate any pending trials to current
experiment. A hash of the params is used to avoid duplicating trials
that are already available in the current experiment.
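A rough sketch of this duplication step, under stated assumptions: trials are plain dicts, the set of statuses counted as "pending" is a guess made for illustration, and `params_key` and `duplicate_pending_trials` are hypothetical names rather than Orion functions.

```python
import copy

# Assumption: statuses treated as "pending" for the purpose of this sketch.
PENDING_STATUSES = {"new", "interrupted", "suspended"}


def params_key(trial):
    """Hashable key built from hyperparameter values only."""
    return tuple(sorted(trial["params"].items()))


def duplicate_pending_trials(current_id, current_trials, evc_trials):
    """Copy pending trials of the EVC tree that the current experiment lacks."""
    known = {params_key(trial) for trial in current_trials}
    copies = []
    for trial in evc_trials:
        key = params_key(trial)
        if trial["status"] in PENDING_STATUSES and key not in known:
            duplicate = copy.deepcopy(trial)
            duplicate["experiment"] = current_id
            copies.append(duplicate)
            known.add(key)
    return copies


current_trials = [{"experiment": 2, "status": "completed", "params": {"lr": 0.1}}]
evc_trials = [
    {"experiment": 1, "status": "interrupted", "params": {"lr": 0.1}},  # already known
    {"experiment": 1, "status": "new", "params": {"lr": 0.01}},         # gets copied
    {"experiment": 1, "status": "completed", "params": {"lr": 0.5}},    # not pending
]
copies = duplicate_pending_trials(2, current_trials, evc_trials)
```

The copies carry the current experiment's id, so they can be reserved and executed with the child configuration while the parent's trials are left untouched.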
Max and mean strategies were failing when all observed trials had no
valid objectives.
Why:

We cannot use the Python debugger (or pytest.set_trace()) during the
execution of the workers with the joblib backend. We should have a
simple executor backend that does not use multithreading or
multiprocessing to enable simple debugging. Also, since the client's
`workon()` helper function does not support parallelism, it should use
this simple executor.

How:

Use functools.partial to wrap submitted functions for future execution.
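As a rough illustration of the idea, a single-process executor can store each submitted call as a `functools.partial` and run it in the calling process when the result is requested. The class and method names below (`SingleExecutor`, `submit`, `wait`, `_Future`) are assumptions made for this sketch and are not taken from Orion's code.

```python
import functools


class _Future:
    """Holds a deferred call; the work runs in-process when `get()` is called."""

    def __init__(self, function):
        self._function = function
        self._result = None
        self._done = False

    def get(self):
        if not self._done:
            self._result = self._function()
            self._done = True
        return self._result


class SingleExecutor:
    """Executor without threads or processes, so breakpoints work as usual."""

    def submit(self, function, *args, **kwargs):
        # Wrap the call for future execution instead of running it right away.
        return _Future(functools.partial(function, *args, **kwargs))

    def wait(self, futures):
        return [future.get() for future in futures]


executor = SingleExecutor()
futures = [executor.submit(pow, 2, exponent) for exponent in range(3)]
results = executor.wait(futures)
```

Since everything runs sequentially in the caller's process, a debugger breakpoint inside the submitted function is hit directly.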
Why:

With loguniform, the number of possible values is limited if precision
is used. Cardinality computation should account for this, otherwise
algorithms may get stuck in suggest(). It happened to a user with a
prior loguniform(1e-4, 1e-2, precision=2). This gives only 181 possible
values.

How:

If a real dimension has a precision and a loguniform prior, then compute
its cardinality.

There is a problem with transformed spaces, however. A linearized
dimension, for instance, would attempt to compute the cardinality with
the linearized bounds. What matters is the smallest cardinality between
the transformed space and the original space. The only case where the
cardinality is smaller in the transformed space is when real values are
discretized. Therefore, we only compute the cardinality of transformed
dimensions if the transformation leads to integers; otherwise we use the
cardinality of the original dimension.
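The count of 181 for loguniform(1e-4, 1e-2, precision=2) can be checked by brute force: 90 two-digit mantissas in each of the two decades, plus the upper bound itself. The function below is a hypothetical helper written for this check, assuming precision means significant digits; it is not Orion's cardinality computation.

```python
import math
from fractions import Fraction


def loguniform_cardinality(low, high, precision):
    """Count values in [low, high] that have `precision` significant digits."""
    # Exact rationals avoid float rounding at the bounds (e.g. 1e-4).
    low, high = Fraction(str(low)), Fraction(str(high))
    first_exp = math.floor(math.log10(low))
    last_exp = math.floor(math.log10(high))
    count = 0
    # Pad by one decade on each side; out-of-range values are filtered below.
    for exponent in range(first_exp - 1, last_exp + 2):
        step = Fraction(10) ** (exponent - precision + 1)
        for mantissa in range(10 ** (precision - 1), 10 ** precision):
            if low <= mantissa * step <= high:
                count += 1
    return count


cardinality = loguniform_cardinality(1e-4, 1e-2, precision=2)
```

Enumerating every mantissa costs on the order of 10^precision per decade, which is fine for a sanity check but not something to run for large precisions.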
Duplicate pending trials from parent/child for exc
Why:

When running an experiment, the parents may have lost trials that are
stuck in status reserved. If users cannot run the parents, they must
then use `orion db set` to fix this manually. This should be done
automatically instead.

How:

Loop over the EVC and call `fix_lost_trials` for each experiment. Note
that this can significantly increase the cost of the command for a large
EVC.

An ugly hack is used to allow running `fix_lost_trials` on the parents
that are in read mode. It would be great to find a better workaround...
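The loop over the EVC can be pictured as a tree traversal. In this sketch the `Node` class, the heartbeat-based rule for deciding that a reserved trial is lost, and the `fix_lost_trials` signature are all assumptions made for illustration; only the function name comes from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical EVC tree node: an experiment with trials and children."""
    name: str
    trials: list
    children: list = field(default_factory=list)


def fix_lost_trials(node, now, max_idle):
    """Mark reserved trials with a stale heartbeat as interrupted."""
    fixed = 0
    for trial in node.trials:
        if trial["status"] == "reserved" and now - trial["heartbeat"] > max_idle:
            trial["status"] = "interrupted"
            fixed += 1
    return fixed


def fix_lost_trials_in_tree(root, now, max_idle):
    """Visit every experiment of the tree; cost grows with the tree size."""
    return fix_lost_trials(root, now, max_idle) + sum(
        fix_lost_trials_in_tree(child, now, max_idle) for child in root.children
    )


child = Node("exp-v2", [{"status": "reserved", "heartbeat": 90}])
root = Node("exp-v1", [{"status": "reserved", "heartbeat": 0}], [child])
n_fixed = fix_lost_trials_in_tree(root, now=100, max_idle=60)
```

Each experiment in the tree gets one full pass over its trials, which is where the extra cost for a large EVC comes from.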
Compute cardinality for loguniform with precision
A warning about conflicts was always printed when building an
experiment. There should be no warning if there are no differences.
Warn only if diffs exists during exp build
@bouthilx bouthilx added this to the v0.1.16 milestone Aug 23, 2021
@bouthilx bouthilx merged commit 6bc3b79 into master Aug 23, 2021
@bouthilx bouthilx deleted the release-v0.1.16rc1 branch August 23, 2021 19:33
2 participants