Release 0.3.0
RobertTLange committed Aug 21, 2021
1 parent 9de3d9b commit ab038b5
Showing 9 changed files with 42 additions and 120 deletions.
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -1,8 +1,8 @@
- ### v0.3.0 - TBD
+ ### v0.3.0 - 08/21/2021

##### Added
- Adds general processing job, which generalizes the post-processing job and enables 'shared'/centralized data pre-processing before a (search) experiment and results post-processing/figure generation afterwards. Checkout the [MNIST example](/~https://github.com/RobertTLange/mle-toolbox/blob/main/examples/torch_mnist/mnist_single.yaml).
- - Adds population-based training experiment type. Checkout the [MNIST example](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/pbt_mnist) and the [simple quadratic from the paper](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/pbt_quadratic).
+ - Adds population-based training experiment type (still experimental). Checkout the [MNIST example](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/pbt_mnist) and the [simple quadratic from the paper](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/pbt_quadratic).
- Adds a set of unit/integration tests for more robustness and `flake8` linting.
- Adds code coverage with secrets token.
- Adds `mle.ready_to_log` based on `log_every_k_updates` in `log_config`. No more modulo confusion.
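The `ready_to_log` flag mentioned in the last changelog entry can be sketched roughly as follows — a minimal illustration assuming the logger tracks its own update counter; the class and attribute names here (`MiniLog`, `update_counter`) are hypothetical stand-ins, not the toolbox's actual API:

```python
class MiniLog:
    """Hypothetical mini-logger: the modulo check lives inside the
    logger, so user code only queries `ready_to_log`."""

    def __init__(self, log_every_k_updates: int):
        self.log_every_k_updates = log_every_k_updates
        self.update_counter = 0

    def update(self) -> None:
        # Called once per training update.
        self.update_counter += 1

    @property
    def ready_to_log(self) -> bool:
        # True on every k-th update -- no modulo logic in user code.
        return self.update_counter % self.log_every_k_updates == 0


log = MiniLog(log_every_k_updates=3)
flags = []
for _ in range(6):
    log.update()
    flags.append(log.ready_to_log)
# flags == [False, False, True, False, False, True]
```

User code then reads `if log.ready_to_log: ...` instead of repeating `step % k == 0` checks at every call site.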
5 changes: 3 additions & 2 deletions Makefile
@@ -42,8 +42,9 @@ type-check:

testing:
# Run unit tests: File loading, job template generation
- # Run integration tests: Different experiment types
- pytest -vv --durations=0 --cov=./ --cov-report=term-missing --cov-report=xml
+ pytest -vv --durations=0 --cov=./ --cov-report=term-missing --cov-report=xml tests/unit
+ # Run integration tests: Different experiment types [ignore report test]
+ pytest -vv --durations=0 --cov=./ --cov-report=term-missing --cov-report=xml tests/integration/experiment

deploy-docs:
# Deploy documentation homepage: https://roberttlange.github.io/mle-toolbox/
48 changes: 24 additions & 24 deletions README.md
@@ -1,4 +1,4 @@
- ![MLE_Toolbox_Banner](https://github.com/RobertTLange/mle-toolbox/blob/main/docs/thumbnails/mle_thumbnail.png?raw=true)
+ ![MLE_Toolbox_Banner](https://roberttlange.github.io/mle-toolbox/thumbnails/mle_thumbnail.png)
[![Pyversions](https://img.shields.io/pypi/pyversions/mle-toolbox.svg?style=flat-square)](https://pypi.python.org/pypi/mle-toolbox)
[![Docs Latest](https://img.shields.io/badge/docs-dev-blue.svg)](https://roberttlange.github.io/mle-toolbox/)
[![PyPI version](https://badge.fury.io/py/mle-toolbox.svg)](https://badge.fury.io/py/mle-toolbox)
@@ -10,9 +10,6 @@
ML researchers need to coordinate different types of experiments on separate remote resources. The *Machine Learning Experiment (MLE)-Toolbox* is designed to facilitate the workflow by providing a simple interface, standardized logging, many common ML experiment types (multi-seed/configurations, grid-searches and hyperparameter optimization pipelines). You can run experiments on your local machine, high-performance compute clusters ([Slurm](https://slurm.schedmd.com/overview.html) and [Sun Grid Engine](http://bioinformatics.mdc-berlin.de/intro2UnixandSGE/sun_grid_engine_for_beginners/README.html)) as well as on cloud VMs ([GCP](https://cloud.google.com/gcp/)). The results are archived (locally/[GCS bucket](https://cloud.google.com/products/storage/)) and can easily be retrieved or automatically summarized/reported as `.md`/`.html` files.

- <span style="color:red">Add **basic example GIF** for toolbox application</span>.
-
-
## What Does The `mle-toolbox` Provide?

1. API for launching jobs on cluster/cloud computing platforms (Slurm, GridEngine, GCP).
@@ -30,7 +27,7 @@ ML researchers need to coordinate different types of experiments on separate rem
1. Follow the [instructions below](/~https://github.com/RobertTLange/mle-toolbox#installation-memo) to install the `mle-toolbox` and set up your credentials/configurations.
2. Read the [docs](https://roberttlange.github.io/mle-toolbox) explaining the pillars of the toolbox & the experiment meta-configuration job `.yaml` files .
3. Check out the [examples 📄](/~https://github.com/RobertTLange/mle-toolbox#examples-school_satchel) to get started: Toy [ODE integration](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/numpy_ode), training [PyTorch MNIST-CNNs](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/torch_mnist) or [VAEs in JAX](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/jax_vae).
- 5. Run your own experiments using the [template files, project](/~https://github.com/RobertTLange/mle-project-template) and [`mle run`](https://roberttlange.github.io/mle-toolbox/core_api/mle_run/).
+ 4. Run your own experiments using the [template files, project](/~https://github.com/RobertTLange/mle-project-template) and [`mle run`](https://roberttlange.github.io/mle-toolbox/core_api/mle_run/).


## Installation ⏳
@@ -70,28 +67,31 @@ The configuration procedure consists of 3 optional steps, which depend on your n

You are now ready to dive deeper into the specifics of [job configuration](https://roberttlange.github.io/mle-toolbox) and can start running your first experiments from the cluster (or locally on your machine) with the following commands:

- 1. Setup of credentials & toolbox settings: [`mle init`](https://roberttlange.github.io/mle-toolbox/core_api/mle_init/)
- 2. Start up an experiment: [`mle run`](https://roberttlange.github.io/mle-toolbox/core_api/mle_run/)
- 3. Monitor resource utilisation: [`mle monitor`](https://roberttlange.github.io/mle-toolbox/core_api/mle_monitor/)
- 4. Retrieve an experiment result: [`mle retrieve`](https://roberttlange.github.io/mle-toolbox/core_api/mle_retrieve/)
- 5. Create an experiment report with figures: [`mle report`](https://roberttlange.github.io/mle-toolbox/core_api/mle_report/)
- 6. Extract all GCS-stored results to your local drive: [`mle sync-gcs`](https://roberttlange.github.io/mle-toolbox/core_api/mle_sync_gcs/)
+ | | Command | Description |
+ |-----------| -------------------------- | -------------------------------------------------------------- |
+ || [`mle init`](https://roberttlange.github.io/mle-toolbox/core_api/mle_init/) | Start up an experiment. |
+ |🚀| [`mle run`](https://roberttlange.github.io/mle-toolbox/core_api/mle_run/) | Setup of credentials & toolbox settings. |
+ |🖥️| [`mle monitor`](https://roberttlange.github.io/mle-toolbox/core_api/mle_monitor/) | Monitor resource utilisation. |
+ |📥 | [`mle retrieve`](https://roberttlange.github.io/mle-toolbox/core_api/mle_retrieve/) | Retrieve an experiment result. |
+ |💌| [`mle report`](https://roberttlange.github.io/mle-toolbox/core_api/mle_report/) | Create an experiment report with figures. |
+ |🔄| [`mle sync-gcs`](https://roberttlange.github.io/mle-toolbox/core_api/mle_sync_gcs/) | Extract all GCS-stored results to your local drive. |


## Examples 🎒

- | Example/Notebook | Description |
- | -------------------------- | -------------------------------------------------------------- |
- | 📄 **[Euler PDE](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/numpy_pde)** | Integrate a PDE using forward Euler for different initial conditions. |
- | 📄 **[MNIST CNN](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/mnist)** | Train CNNs on multiple random seeds & different training configs. |
- | 📄 **[JAX VAE](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/jax_vae)** | Search through the hyperparameter space of a MNIST VAE. |
- | 📄 **[Sklearn SVM](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/sklearn_svm)** | Train a SVM classifier to classify low-dimensional digits. |
- | 📄 **[Multi Bash](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/bash_configs)** | Launch multi-configuration experiments for bash based jobs. |
- | 📄 **[MNIST PBT](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/pbt_mnist)** | Population-Based Training for a MNIST MLP network. |
- | 📓 **[Evaluation](/~https://github.com/RobertTLange/mle-toolbox/tree/main/notebooks/evaluate_results.ipynb)** | Evaluation of gridsearch results (load/visualize). |
- | 📓 **[Testing](/~https://github.com/RobertTLange/mle-toolbox/tree/main/notebooks/hypothesis_testing.ipynb)** | Compare different config logs & perform hypothesis tests. |
- | 📓 **[GIF Animations](/~https://github.com/RobertTLange/mle-toolbox/tree/main/notebooks/animate_results.ipynb)** | Walk through set of animation helpers. |
- |📓 **[PBT Evaluation](/~https://github.com/RobertTLange/mle-toolbox/tree/main/notebooks/inspect_pbt.ipynb)** | Inspect the result from Population-Based Training |
+ | | Job Types| Description |
+ | -------------------------- |-------------- | -------------------------------------------------------------- |
+ | 📄 **[Euler PDE](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/numpy_pde)** | `multi-configs`, `hyperparameter-search` | Integrate a PDE using forward Euler. |
+ | 📄 **[MNIST CNN](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/mnist)** | `multi-configs`, `hyperparameter-search` |Train PyTorch MNIST-CNNs. |
+ | 📄 **[JAX VAE](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/jax_vae)** | `hyperparameter-search` | Train a JAX-based MNIST VAE. |
+ | 📄 **[Sklearn SVM](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/sklearn_svm)** | `single-config` | Train a Sklearn SVM classifier. |
+ | 📄 **[Multi Bash](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/bash_configs)** | `multi-configs` | Bash based jobs. |
+ | 📄 **[Quadratic PBT](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/pbt_quadratic)** | `population-based-training` | PBT on toy quadratic surrogate. |
+ | 📄 **[MNIST PBT](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/pbt_mnist)** | `population-based-training` | PBT for a MNIST MLP network. |
+ | 📓 **[Evaluation](/~https://github.com/RobertTLange/mle-toolbox/tree/main/notebooks/evaluate_results.ipynb)** | - | Evaluation of gridsearch results. |
+ | 📓 **[Testing](/~https://github.com/RobertTLange/mle-toolbox/tree/main/notebooks/hypothesis_testing.ipynb)** | - | Perform hypothesis tests on logs. |
+ | 📓 **[GIF Animations](/~https://github.com/RobertTLange/mle-toolbox/tree/main/notebooks/animate_results.ipynb)** | - | Walk through a set of animation helpers. |
+ |📓 **[PBT Evaluation](/~https://github.com/RobertTLange/mle-toolbox/tree/main/notebooks/inspect_pbt.ipynb)** | - | Inspect the result from PBT. |

### Acknowledgements & Citing `mle-toolbox` ✏️

@@ -102,7 +102,7 @@ To cite this repository:
author = {Robert Tjarko Lange},
title = {{MLE-Toolbox}: A Reproducible Workflow for Machine Learning Experiments},
url = {http://github.com/RobertTLange/mle-toolbox},
- version = {1.0.0},
+ version = {0.3.0},
year = {2021},
}
```
2 changes: 2 additions & 0 deletions config_template.toml
@@ -58,6 +58,7 @@ random_seed = 42
# Default Slurm job arguments (if not supplied in job .yaml config)
[slurm.default_job_args]
num_logical_cores = 2
+ gpu_tpye = "tesla"
partition = '<partition1>'
job_name = 'temp'
log_file = 'log'
@@ -96,6 +97,7 @@ random_seed = 42
# Default SGE job arguments (if not differently supplied)
[sge.default_job_arguments]
num_logical_cores = 2
+ gpu_tpye = "RTX2080"
queue = '<sge-queue-name>'
job_name = 'temp'
log_file = 'log'
14 changes: 8 additions & 6 deletions docs/index.md
@@ -33,9 +33,11 @@ ML researchers need to coordinate different types of experiments on separate rem

You are now ready to dive deeper into the specifics of [job configuration](setup/infrastructure/) and can start running your first experiments from the cluster (or locally on your machine) with the commands:

- 1. Setup of credentials & toolbox settings: [`mle init`](https://roberttlange.github.io/mle-toolbox/core_api/mle_init/)
- 2. Start up an experiment: [`mle run`](https://roberttlange.github.io/mle-toolbox/core_api/mle_run/)
- 3. Monitor resource utilisation: [`mle monitor`](https://roberttlange.github.io/mle-toolbox/core_api/mle_monitor/)
- 4. Retrieve an experiment result: [`mle retrieve`](https://roberttlange.github.io/mle-toolbox/core_api/mle_retrieve/)
- 5. Create an experiment report with figures: [`mle report`](https://roberttlange.github.io/mle-toolbox/core_api/mle_report/)
- 6. Extract all GCS-stored results to your local drive: [`mle sync-gcs`](https://roberttlange.github.io/mle-toolbox/core_api/mle_sync_gcs/)
+ | | Command | Description |
+ |-----------| -------------------------- | -------------------------------------------------------------- |
+ || [`mle init`](https://roberttlange.github.io/mle-toolbox/core_api/mle_init/) | Start up an experiment. |
+ |🚀| [`mle run`](https://roberttlange.github.io/mle-toolbox/core_api/mle_run/) | Setup of credentials & toolbox settings. |
+ |🖥️| [`mle monitor`](https://roberttlange.github.io/mle-toolbox/core_api/mle_monitor/) | Monitor resource utilisation. |
+ |📥 | [`mle retrieve`](https://roberttlange.github.io/mle-toolbox/core_api/mle_retrieve/) | Retrieve an experiment result. |
+ |💌| [`mle report`](https://roberttlange.github.io/mle-toolbox/core_api/mle_report/) | Create an experiment report with figures. |
+ |🔄| [`mle sync-gcs`](https://roberttlange.github.io/mle-toolbox/core_api/mle_sync_gcs/) | Extract all GCS-stored results to your local drive. |
2 changes: 1 addition & 1 deletion mle_toolbox/job/cluster/sge/helpers_launch_sge.py
@@ -10,7 +10,7 @@ def sge_generate_startup_file(job_arguments: dict) -> str:
# Add desired number of requested gpus
if "num_gpus" in job_arguments:
if job_arguments["num_gpus"] > 0:
-         base_template += '#$ -l cuda="{num_gpus}(RTX2080)" \n'
+         base_template += '#$ -l cuda="{num_gpus}({gpu_type})" \n'

# Exclude specific nodes from the queue
if "exclude_nodes" in job_arguments:
2 changes: 1 addition & 1 deletion mle_toolbox/job/cluster/slurm/helpers_launch_slurm.py
@@ -9,7 +9,7 @@ def slurm_generate_startup_file(job_arguments: dict) -> str:
# Add desired number of requested gpus
if "num_gpus" in job_arguments:
if job_arguments["num_gpus"] > 0:
-         base_template += "#SBATCH --gres=gpu:tesla:{num_gpus} \n"
+         base_template += "#SBATCH --gres=gpu:{gpu_type}:{num_gpus} \n"

# Set the max required memory per job
if "memory_per_cpu" in job_arguments:
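The SGE and Slurm template changes follow the same pattern: a hard-coded GPU model (`RTX2080`, `tesla`) is replaced by a `{gpu_type}` placeholder, which is later filled from the job arguments (now defaulting via the new `gpu_tpye` config keys). A rough sketch of the idea for the Slurm case — `build_gpu_directive` is a hypothetical helper name, since the full function bodies are not shown in the diff:

```python
def build_gpu_directive(job_arguments: dict) -> str:
    """Hypothetical helper mirroring the diff: emit a --gres line only
    when GPUs are requested, with the GPU model taken from the job
    arguments instead of being hard-coded."""
    base_template = ""
    if job_arguments.get("num_gpus", 0) > 0:
        base_template += "#SBATCH --gres=gpu:{gpu_type}:{num_gpus} \n"
    # str.format ignores unused keys, so the full job-arguments dict
    # can be passed straight through.
    return base_template.format(**job_arguments)


line = build_gpu_directive({"num_gpus": 2, "gpu_type": "tesla"})
# line == "#SBATCH --gres=gpu:tesla:2 \n"
```

With no `num_gpus` entry (or `num_gpus = 0`) the helper returns an empty string, so CPU-only jobs get no `--gres` directive at all.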
2 changes: 1 addition & 1 deletion mle_toolbox/job/cluster/slurm/startup_script_slurm.py
@@ -1,6 +1,6 @@
# Useful string lego building blocks for Slurm startup file formatting

- # Base qsub template
+ # Base sbatch template
slurm_base_job_config = """#!/bin/bash
#SBATCH --job-name={job_name} # job name (not id)
#SBATCH --output={log_file}.txt # output file
83 changes: 0 additions & 83 deletions run_pbt.py

This file was deleted.
