Releases: mle-infrastructure/mle-toolbox
Minor fixes 🔧
[v0.3.5] - [08/2024]
- Adds TF CUDA deterministic behavior flag
Robustify & fix OS resource setting 🚂
[v0.3.4] - [03/2023]
Added
- Adds PBT, Successive Halving & Hyperband experiment support
- Adds report support for non-search experiments.
- Adds robust `local -> remote` experiment launching.
Changed
- Restructures the experiment wrapper `launch_experiment`.
- Moves PBT experiment utilities to `mle-hyperopt`.
- Fixes versions of subpackages so that dependencies are static.
- Renames `get_jax_os_ready` to `get_os_env_ready` and includes `device_config` in the auto-setup of `MLExperiment`.
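To give a rough sense of what an OS/environment-setup helper of this kind does, here is a minimal sketch. It is illustrative only: `set_os_env`, its arguments, and the chosen environment variables are assumptions for the example, not the toolbox's actual `get_os_env_ready` implementation.

```python
import os


def set_os_env(num_devices: int = 1, device_type: str = "cpu") -> None:
    """Illustrative sketch: prepare XLA/CUDA environment variables.

    Mimics the kind of setup a helper like this performs; the real
    toolbox function may set different variables.
    """
    if device_type == "cpu":
        # Make XLA expose multiple host devices (useful for pmap testing).
        os.environ["XLA_FLAGS"] = (
            f"--xla_force_host_platform_device_count={num_devices}"
        )
    elif device_type == "gpu":
        # Restrict the visible GPUs to the requested number.
        os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(
            str(i) for i in range(num_devices)
        )


set_os_env(num_devices=2, device_type="cpu")
print(os.environ["XLA_FLAGS"])  # --xla_force_host_platform_device_count=2
```

Folding such a call into the auto-setup means the user no longer has to remember to configure devices before constructing an experiment object.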
Fixed
- Updates `mle init` to work with the vim editor.
- Fixes all broken links in the README.
Restructure imports - run in colab etc.
[v0.3.3]
mle-monitor, GCS sync refactor, examples
Added
- Introduces experimental notebook-friendly `MLELauncher`, which allows you to schedule experiments from within a local notebook.
- Adds `mle protocol` subcommand to get a quick view of the last experiments and their status.
- Adds `mle project` to initialize a new project based on cloning the `mle-project` repository.
Changed
- Refactors out resource monitoring and the protocol database to the `mle-monitor` sub-package.
- Refactors out job launching and status monitoring to the `mle-launcher` sub-package.
- Moves population-based training and hypothesis testing into the `experimental` submodule.
- Moves the documentation page to the `mle-docs` sub-repository.
mle-hyperopt, mini features & bug fixes
Added
- Adds 3D animation post-processing helpers (`animate_3D_scatter` and `animate_3D_surface`) and test coverage for visualizations (static/dynamic).
- Adds `nevergrad` multi-objective hyperparameter optimization. Check out the toy example.
- Adds `@experiment` decorator for easy integration:
```python
from mle_toolbox import experiment


@experiment("configs/abc.json", model_config={"num_layers": 2})
def run(mle, a):
    print(mle.model_config)
    print(mle.log)
    print(a)


if __name__ == "__main__":
    run(a=2)
```
- Adds `combine_experiments`, which loads different `meta_log` and `hyper_log` objects and makes them "dot"-accessible:

```python
experiment_dirs = ["../tests/unit/fixtures/experiment_1",
                   "../tests/unit/fixtures/experiment_2"]
meta, hyper = combine_experiments(experiment_dirs, aggregate_seeds=False)
```
- Adds option to run grid search for multiple base configurations without having to create individual experiment configuration files.
Changed
- Configuration loading is now more toolbox-specific. `load_json_config` and `load_yaml_config` are now part of `mle-logging`. The toolbox has two "new" functions, `load_job_config` and `load_experiment_config`, which prepare the raw configs for future usage.
- The `job_config` file no longer has to be a `.json` file, but can (and probably should) be a `.yaml` file. This makes formatting easier. The hyperoptimization pipeline will generate configuration files of the same file type.
- Moves core hyperparameter optimization functionality to `mle-hyperopt`. At this point the toolbox wraps around the search strategies and handles the `mle-logging` log loading/data retrieval.
- Reduces the test suite since all hyperopt strategy-internal tests are taken care of in `mle-hyperopt`.
Fixed
- Fixed unique file naming of zip files stored in GCS bucket. Now based on the time string.
- Grid engine monitoring now also tracks waiting/pending jobs.
- Fixes a bug in the random seed setting for synchronous batch jobs. Previously a new set of seeds was sampled for each batch. This led to problems when aggregating different logs by their seed id. Now the first set of seeds is stored and provided as an input to all subsequent `JobQueue` startups.
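The pattern behind this fix can be illustrated with a small sketch (class and method names below are hypothetical, not the toolbox's internals): sample the seed set once, cache it, and hand the same list to every subsequent batch instead of resampling.

```python
import random


class SeedProvider:
    """Sample a seed set once and reuse it for every batch of jobs."""

    def __init__(self, num_seeds: int, rng_seed: int = 0):
        self._rng = random.Random(rng_seed)
        self._seeds = None  # cached after the first call
        self.num_seeds = num_seeds

    def get_seeds(self):
        # The buggy behavior resampled here on every call. Instead we
        # sample once and return the cached list on all later calls.
        if self._seeds is None:
            self._seeds = [
                self._rng.randint(0, 10_000) for _ in range(self.num_seeds)
            ]
        return self._seeds


provider = SeedProvider(num_seeds=3)
batch_1 = provider.get_seeds()
batch_2 = provider.get_seeds()
assert batch_1 == batch_2  # identical seeds => logs aggregate cleanly by seed id
```

With identical seed ids across batches, logs from different configurations can be matched seed-by-seed during aggregation.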
mle-logging, PBT & fixes
Added
- Adds a general processing job, which generalizes the post-processing job and enables 'shared'/centralized data pre-processing before a (search) experiment and results post-processing/figure generation afterwards. Check out the MNIST example.
- Adds population-based training experiment type (still experimental). Check out the MNIST example and the simple quadratic from the paper.
- Adds a set of unit/integration tests for more robustness and `flake8` linting.
- Adds code coverage with secrets token.
- Adds `mle.ready_to_log` based on `log_every_k_updates` in `log_config`. No more modulo confusion.
- Adds Slack clusterbot integration, which allows for notifications and report upload.
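A minimal sketch of what a `ready_to_log`-style check can look like (a hypothetical implementation, assuming `log_every_k_updates` counts update steps; the toolbox's actual logic may differ):

```python
class Logger:
    """Tiny sketch of a logger exposing a ready-to-log check."""

    def __init__(self, log_every_k_updates: int):
        self.log_every_k_updates = log_every_k_updates

    def ready_to_log(self, update_counter: int) -> bool:
        # Log on the k-th, 2k-th, ... update (and never on update 0),
        # so the training loop never has to do the modulo check itself.
        return (
            update_counter > 0
            and update_counter % self.log_every_k_updates == 0
        )


log = Logger(log_every_k_updates=10)
print([t for t in range(1, 31) if log.ready_to_log(t)])  # [10, 20, 30]
```

Centralizing the check in the logger means every training loop asks one question instead of re-deriving the same modulo condition.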
Changed
- Allows logging of array data in the meta log `.hdf5` file by making `tolerant_mean` work for matrices.
- Changes the configuration `.yaml` to use `experiment_type` instead of `job_type` for a clear distinction:
  - job: Single submission process on a resource (e.g. single seed for a single configuration)
  - eval: Single parameter configuration, which can be executed/trained for multiple seeds (individual jobs!)
  - experiment: Refers to the entire sequence of jobs to be executed (e.g. grid search with pre/post-processing)
- Restructures the `experiment` subdirectory into `job` for consistent naming.
- Refactors out `MLELogger` into the separate `mle-logging` package. It is a core ingredient that should stand alone.
Fixed
- Fixed `mle retrieve` to be actually useful and work robustly.
- Fixed `mle report` to retrieve results if they don't exist (or to use a local directory provided by the user).
- Fixed `mle report` to generate reports via an `.html` file and the dependency `xhtml2pdf`.
- Fixed the unique hash for experiment results storage. Previously this only used the content of `base_config.json`, which did not result in a unique hash when running different searches via `job_config.yaml`. Now the hash is generated based on a merged dictionary of the time string, `base_config` and `job_config`.
Async Job Scheduling
Added
- Adds a monitoring panel for GCP in the `mle monitor` dashboard.
- Adds asynchronous job launching via the new `ExperimentQueue` and monitoring based on a `max_running_jobs` budget. This release changes the previous job launching infrastructure: we no longer rely on one process per job, but monitor all scheduled jobs passively in a for-loop.
- Adds GitHub Pages hosted documentation using mkdocs and the Material framework. The documentation is hosted at roberttlange.github.io/mle-toolbox. It is still very much a work in progress.
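The scheduling pattern described above can be sketched as follows. Everything here is illustrative (the job class and function names are invented, not the `ExperimentQueue` API); the point is the single passive loop that tops up launches under a `max_running_jobs` budget instead of spawning one process per job.

```python
import time


def run_queue(jobs, max_running_jobs: int, poll_every: float = 0.0):
    """Launch jobs up to a budget and passively poll them in one loop."""
    queued, running, done = list(jobs), [], []
    while queued or running:
        # Top up the running set while the budget allows.
        while queued and len(running) < max_running_jobs:
            job = queued.pop(0)
            job.launch()
            running.append(job)
        # One passive sweep over all scheduled jobs - no process per job.
        still_running = []
        for job in running:
            if job.completed():
                done.append(job)
            else:
                still_running.append(job)
        running = still_running
        if poll_every:
            time.sleep(poll_every)
    return done


class FakeJob:
    """Stand-in job that reports completion after a fixed number of polls."""

    def __init__(self, ticks: int):
        self.ticks = ticks

    def launch(self):
        pass

    def completed(self) -> bool:
        self.ticks -= 1
        return self.ticks <= 0


finished = run_queue([FakeJob(2) for _ in range(5)], max_running_jobs=2)
print(len(finished))  # 5
```

A single polling loop like this scales to many jobs with constant overhead, which is why it replaces the one-process-per-job design.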
Changed
- Adds support for additional setup bash files when launching GCP VMs in `single_job_args`.
- Adds Q/A for upload/deletion of directories to the GCS bucket.
- All GCP-CPU resources are now queried via custom machine types (default: cheap n1).
- Separates different `requirements.txt` files for minimal installation, examples and testing.
- Restructures the search experiment API in the `.yaml` file. We now differentiate between 3 pillars:
  - `search_logging`: General options such as reloading of a previous log, verbosity, and which metrics in the `.hdf5` log to monitor and how to do so.
  - `search_resources`: How many jobs, batches, maximum number of simultaneously running jobs, etc.
  - `search_config`: Options regarding the search type (random, grid, smbo) and the parameters to search over (spaces, resolution, etc.).
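Purely as a hedged illustration of this three-pillar layout (every field name below except the three top-level keys is invented for the example, not the toolbox's actual schema), a search `.yaml` might be organized like:

```yaml
search_logging:
  reload_log: false        # reload a previous search log?
  eval_metrics: ["loss"]   # .hdf5 metrics to monitor
search_resources:
  num_search_batches: 5
  num_evals_per_batch: 4
  max_running_jobs: 10
search_config:
  search_type: "grid"
  search_params:
    lrate: {begin: 0.0001, end: 0.01, bins: 5}
```

Splitting logging, resources and search options into separate blocks keeps each concern independently editable.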
Google Cloud Platform Experiment Support
Added
- Adds `HypothesisTester`: Simple time average difference comparison between individual runs, with multiple testing correction and p-value plotting. Example `hypothesis_testing.ipynb` notebook.
- Adds `MetaLog` and `HyperLog` classes: Implement convenient functionalities like `hyper_log.filter()` and ease the post-processing analysis.
and ease the post-processing analysis. - Adds GCP job launch/monitor support for all experiment types and organizes GCS syncing of results.
Changed
- `load_result_logs` is now directly imported with `import mle_toolbox` since it is part of the core functionality.
- Major restructuring of the `experiment` sub-directory (`local`, `cluster`, `cloud`) with an easy 3-part extension for new resources: `monitor`, `launch`, `check_job_args`.
Fixed
- Fixes plotting with the new `MetaLog` and `HyperLog` classes.
Bash Experiment, Encryption & Extra cmd inputs
Added
- Allows multi-config + multi-seed bash experiments. The user needs to take care of the input arguments (`-exp_dir`, `-config_fname`, `-seed_id`) themselves and within the bash script. We provide a minimal example of how to do so in examples/bash_configs.
- Adds backend functions for `monitor_slurm_cluster` and a local version to get resource utilisation.
- Adds SHA-256 encryption/decryption of ssh credentials. Also part of the initialization setup.
- Adds `extra_cmd_line_inputs` to `single_job_args` so that you can add a static input via the command line. This will also be incorporated in the `MLExperiment` as an `extra_config` dotmap dictionary.
Changed
- Changes plots of monitored resource utilisation to `plotext` to avoid the gnuplot dependency.
- Changes the logger interface: Now one has to provide dictionaries as inputs to `update_log`. This is supposed to make the logging more robust.
- Changes template files and refactors/renames files in the `utils` subdirectory:
  - `core_experiment`: Includes helpers used in (almost) every experiment
  - `core_files_load`: Helpers used to load various core components (configs)
  - `core_files_merge`: Helpers used to merge meta-logs
  - `helpers`: Random small functionalities (not crucial)
- Renames files in the `hyperopt` subdirectory: `hyperopt_<type>`, `hyperspace`, `hyperlogger`
- Changes the naming of the config from `cc` to `mle_config` for easy readability.
- Changes the naming of files to be more intuitive: E.g. `abc_1_def.py`, `abc_2_def.py` are changed to `abc_def_1.py`, `abc_def_2.py`.
Fixed
- Fixed local launch of remote projects via a `screen` session and piping to `qrsh` or `srun --pty bash`. If you are on a local machine and run `mle run`, you will get to choose the remote resource and can later reattach to that resource.
- Fixed the 2D plot with `fixed_params`. The naming as well as the subtitle of the `.png` files/plots accounts for the fixed parameter.
`mle init`, `MLE_Experiment` & refactoring
- Adds `mle init` to configure the template toml. The command first searches for an existing config to update. If none is found, we go through the process of updating values in a default config.
- Prints configuration and protocol summary with rich. This gets rid of the `tabulate` dependency.
- Updates `monitor_slurm_cluster` to work with the new `mle monitor`. This gets rid of the `colorclass` and `terminaltables` dependencies.
- Fixes report generation bug (everything has to be a string for markdown-ification!).
- Fixes monitor bug: No longer reloads the local database at each update call.
- Adds `get_jax_os_ready` helper for setting up JAX environment variables.
- Adds `load_model_ckpt` for smooth reloading of stored checkpoints.
- Adds `MLE_Experiment` abstraction for minimal imports and a smooth workflow.
- A lot of internal refactoring: E.g. getting rid of the `multi_runner` subdirectory.