Release 0.3.0
RobertTLange committed Aug 21, 2021
1 parent 9de3d9b commit ab038b5
Showing 9 changed files with 42 additions and 120 deletions.
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -1,8 +1,8 @@
- ### v0.3.0 - TBD
+ ### v0.3.0 - 08/21/2021

##### Added
- Adds general processing job, which generalizes the post-processing job and enables 'shared'/centralized data pre-processing before a (search) experiment and results post-processing/figure generation afterwards. Checkout the [MNIST example](/~https://github.com/RobertTLange/mle-toolbox/blob/main/examples/torch_mnist/mnist_single.yaml).
- - Adds population-based training experiment type. Checkout the [MNIST example](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/pbt_mnist) and the [simple quadratic from the paper](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/pbt_quadratic).
+ - Adds population-based training experiment type (still experimental). Checkout the [MNIST example](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/pbt_mnist) and the [simple quadratic from the paper](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/pbt_quadratic).
- Adds a set of unit/integration tests for more robustness and `flake8` linting.
- Adds code coverage with secrets token.
- Adds `mle.ready_to_log` based on `log_every_k_updates` in `log_config`. No more modulo confusion.
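The `ready_to_log` flag mentioned in the last changelog entry can be sketched roughly as follows — a minimal illustration assuming the logger tracks its own update counter; the class and attribute names here (`MiniLog`, `update_counter`) are hypothetical stand-ins, not the toolbox's actual API:

```python
class MiniLog:
    """Hypothetical mini-logger: the modulo check lives inside the
    logger, so user code only queries `ready_to_log`."""

    def __init__(self, log_every_k_updates: int):
        self.log_every_k_updates = log_every_k_updates
        self.update_counter = 0

    def update(self) -> None:
        # Called once per training update.
        self.update_counter += 1

    @property
    def ready_to_log(self) -> bool:
        # True on every k-th update -- no modulo logic in user code.
        return self.update_counter % self.log_every_k_updates == 0


log = MiniLog(log_every_k_updates=3)
flags = []
for _ in range(6):
    log.update()
    flags.append(log.ready_to_log)
# flags == [False, False, True, False, False, True]
```

User code then reads `if log.ready_to_log: ...` instead of repeating `step % k == 0` checks at every call site.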
5 changes: 3 additions & 2 deletions Makefile
@@ -42,8 +42,9 @@ type-check:

testing:
# Run unit tests: File loading, job template generation
- # Run integration tests: Different experiment types
- pytest -vv --durations=0 --cov=./ --cov-report=term-missing --cov-report=xml
+ pytest -vv --durations=0 --cov=./ --cov-report=term-missing --cov-report=xml tests/unit
+ # Run integration tests: Different experiment types [ignore report test]
+ pytest -vv --durations=0 --cov=./ --cov-report=term-missing --cov-report=xml tests/integration/experiment

deploy-docs:
# Deploy documentation homepage: https://roberttlange.github.io/mle-toolbox/
48 changes: 24 additions & 24 deletions README.md
@@ -1,4 +1,4 @@
- ![MLE_Toolbox_Banner](https://github.com/RobertTLange/mle-toolbox/blob/main/docs/thumbnails/mle_thumbnail.png?raw=true)
+ ![MLE_Toolbox_Banner](https://roberttlange.github.io/mle-toolbox/thumbnails/mle_thumbnail.png)
[![Pyversions](https://img.shields.io/pypi/pyversions/mle-toolbox.svg?style=flat-square)](https://pypi.python.org/pypi/mle-toolbox)
[![Docs Latest](https://img.shields.io/badge/docs-dev-blue.svg)](https://roberttlange.github.io/mle-toolbox/)
[![PyPI version](https://badge.fury.io/py/mle-toolbox.svg)](https://badge.fury.io/py/mle-toolbox)
@@ -10,9 +10,6 @@
ML researchers need to coordinate different types of experiments on separate remote resources. The *Machine Learning Experiment (MLE)-Toolbox* is designed to facilitate the workflow by providing a simple interface, standardized logging, many common ML experiment types (multi-seed/configurations, grid-searches and hyperparameter optimization pipelines). You can run experiments on your local machine, high-performance compute clusters ([Slurm](https://slurm.schedmd.com/overview.html) and [Sun Grid Engine](http://bioinformatics.mdc-berlin.de/intro2UnixandSGE/sun_grid_engine_for_beginners/README.html)) as well as on cloud VMs ([GCP](https://cloud.google.com/gcp/)). The results are archived (locally/[GCS bucket](https://cloud.google.com/products/storage/)) and can easily be retrieved or automatically summarized/reported as `.md`/`.html` files.

- <span style="color:red">Add **basic example GIF** for toolbox application</span>.
-
-
## What Does The `mle-toolbox` Provide?

1. API for launching jobs on cluster/cloud computing platforms (Slurm, GridEngine, GCP).
@@ -30,7 +27,7 @@ ML researchers need to coordinate different types of experiments on separate rem
1. Follow the [instructions below](/~https://github.com/RobertTLange/mle-toolbox#installation-memo) to install the `mle-toolbox` and set up your credentials/configurations.
2. Read the [docs](https://roberttlange.github.io/mle-toolbox) explaining the pillars of the toolbox & the experiment meta-configuration job `.yaml` files .
3. Check out the [examples 📄](/~https://github.com/RobertTLange/mle-toolbox#examples-school_satchel) to get started: Toy [ODE integration](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/numpy_ode), training [PyTorch MNIST-CNNs](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/torch_mnist) or [VAEs in JAX](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/jax_vae).
- 5. Run your own experiments using the [template files, project](/~https://github.com/RobertTLange/mle-project-template) and [`mle run`](https://roberttlange.github.io/mle-toolbox/core_api/mle_run/).
+ 4. Run your own experiments using the [template files, project](/~https://github.com/RobertTLange/mle-project-template) and [`mle run`](https://roberttlange.github.io/mle-toolbox/core_api/mle_run/).


## Installation ⏳
@@ -70,28 +67,31 @@ The configuration procedure consists of 3 optional steps, which depend on your n

You are now ready to dive deeper into the specifics of [job configuration](https://roberttlange.github.io/mle-toolbox) and can start running your first experiments from the cluster (or locally on your machine) with the following commands:

- 1. Setup of credentials & toolbox settings: [`mle init`](https://roberttlange.github.io/mle-toolbox/core_api/mle_init/)
- 2. Start up an experiment: [`mle run`](https://roberttlange.github.io/mle-toolbox/core_api/mle_run/)
- 3. Monitor resource utilisation: [`mle monitor`](https://roberttlange.github.io/mle-toolbox/core_api/mle_monitor/)
- 4. Retrieve an experiment result: [`mle retrieve`](https://roberttlange.github.io/mle-toolbox/core_api/mle_retrieve/)
- 5. Create an experiment report with figures: [`mle report`](https://roberttlange.github.io/mle-toolbox/core_api/mle_report/)
- 6. Extract all GCS-stored results to your local drive: [`mle sync-gcs`](https://roberttlange.github.io/mle-toolbox/core_api/mle_sync_gcs/)
+ | | Command | Description |
+ |-----------| -------------------------- | -------------------------------------------------------------- |
+ || [`mle init`](https://roberttlange.github.io/mle-toolbox/core_api/mle_init/) | Start up an experiment. |
+ |🚀| [`mle run`](https://roberttlange.github.io/mle-toolbox/core_api/mle_run/) | Setup of credentials & toolbox settings. |
+ |🖥️| [`mle monitor`](https://roberttlange.github.io/mle-toolbox/core_api/mle_monitor/) | Monitor resource utilisation. |
+ |📥 | [`mle retrieve`](https://roberttlange.github.io/mle-toolbox/core_api/mle_retrieve/) | Retrieve an experiment result. |
+ |💌| [`mle report`](https://roberttlange.github.io/mle-toolbox/core_api/mle_report/) | Create an experiment report with figures. |
+ |🔄| [`mle sync-gcs`](https://roberttlange.github.io/mle-toolbox/core_api/mle_sync_gcs/) | Extract all GCS-stored results to your local drive. |


## Examples 🎒

- | Example/Notebook | Description |
- | -------------------------- | -------------------------------------------------------------- |
- | 📄 **[Euler PDE](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/numpy_pde)** | Integrate a PDE using forward Euler for different initial conditions. |
- | 📄 **[MNIST CNN](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/mnist)** | Train CNNs on multiple random seeds & different training configs. |
- | 📄 **[JAX VAE](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/jax_vae)** | Search through the hyperparameter space of a MNIST VAE. |
- | 📄 **[Sklearn SVM](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/sklearn_svm)** | Train a SVM classifier to classify low-dimensional digits. |
- | 📄 **[Multi Bash](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/bash_configs)** | Launch multi-configuration experiments for bash based jobs. |
- | 📄 **[MNIST PBT](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/pbt_mnist)** | Population-Based Training for a MNIST MLP network. |
- | 📓 **[Evaluation](/~https://github.com/RobertTLange/mle-toolbox/tree/main/notebooks/evaluate_results.ipynb)** | Evaluation of gridsearch results (load/visualize). |
- | 📓 **[Testing](/~https://github.com/RobertTLange/mle-toolbox/tree/main/notebooks/hypothesis_testing.ipynb)** | Compare different config logs & perform hypothesis tests. |
- | 📓 **[GIF Animations](/~https://github.com/RobertTLange/mle-toolbox/tree/main/notebooks/animate_results.ipynb)** | Walk through set of animation helpers. |
- |📓 **[PBT Evaluation](/~https://github.com/RobertTLange/mle-toolbox/tree/main/notebooks/inspect_pbt.ipynb)** | Inspect the result from Population-Based Training |
+ | | Job Types| Description |
+ | -------------------------- |-------------- | -------------------------------------------------------------- |
+ | 📄 **[Euler PDE](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/numpy_pde)** | `multi-configs`, `hyperparameter-search` | Integrate a PDE using forward Euler. |
+ | 📄 **[MNIST CNN](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/mnist)** | `multi-configs`, `hyperparameter-search` |Train PyTorch MNIST-CNNs. |
+ | 📄 **[JAX VAE](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/jax_vae)** | `hyperparameter-search` | Train a JAX-based MNIST VAE. |
+ | 📄 **[Sklearn SVM](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/sklearn_svm)** | `single-config` | Train a Sklearn SVM classifier. |
+ | 📄 **[Multi Bash](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/bash_configs)** | `multi-configs` | Bash based jobs. |
+ | 📄 **[Quadratic PBT](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/pbt_quadratic)** | `population-based-training` | PBT on toy quadratic surrogate. |
+ | 📄 **[MNIST PBT](/~https://github.com/RobertTLange/mle-toolbox/tree/main/examples/pbt_mnist)** | `population-based-training` | PBT for a MNIST MLP network. |
+ | 📓 **[Evaluation](/~https://github.com/RobertTLange/mle-toolbox/tree/main/notebooks/evaluate_results.ipynb)** | - | Evaluation of gridsearch results. |
+ | 📓 **[Testing](/~https://github.com/RobertTLange/mle-toolbox/tree/main/notebooks/hypothesis_testing.ipynb)** | - | Perform hypothesis tests on logs. |
+ | 📓 **[GIF Animations](/~https://github.com/RobertTLange/mle-toolbox/tree/main/notebooks/animate_results.ipynb)** | - | Walk through a set of animation helpers. |
+ |📓 **[PBT Evaluation](/~https://github.com/RobertTLange/mle-toolbox/tree/main/notebooks/inspect_pbt.ipynb)** | - | Inspect the result from PBT. |

### Acknowledgements & Citing `mle-toolbox` ✏️

@@ -102,7 +102,7 @@ To cite this repository:
author = {Robert Tjarko Lange},
title = {{MLE-Toolbox}: A Reproducible Workflow for Machine Learning Experiments},
url = {http://github.com/RobertTLange/mle-toolbox},
- version = {1.0.0},
+ version = {0.3.0},
year = {2021},
}
```
2 changes: 2 additions & 0 deletions config_template.toml
@@ -58,6 +58,7 @@ random_seed = 42
# Default Slurm job arguments (if not supplied in job .yaml config)
[slurm.default_job_args]
num_logical_cores = 2
+ gpu_tpye = "tesla"
partition = '<partition1>'
job_name = 'temp'
log_file = 'log'
@@ -96,6 +97,7 @@ random_seed = 42
# Default SGE job arguments (if not differently supplied)
[sge.default_job_arguments]
num_logical_cores = 2
+ gpu_tpye = "RTX2080"
queue = '<sge-queue-name>'
job_name = 'temp'
log_file = 'log'
14 changes: 8 additions & 6 deletions docs/index.md
@@ -33,9 +33,11 @@ ML researchers need to coordinate different types of experiments on separate rem

You are now ready to dive deeper into the specifics of [job configuration](setup/infrastructure/) and can start running your first experiments from the cluster (or locally on your machine) with the commands:

- 1. Setup of credentials & toolbox settings: [`mle init`](https://roberttlange.github.io/mle-toolbox/core_api/mle_init/)
- 2. Start up an experiment: [`mle run`](https://roberttlange.github.io/mle-toolbox/core_api/mle_run/)
- 3. Monitor resource utilisation: [`mle monitor`](https://roberttlange.github.io/mle-toolbox/core_api/mle_monitor/)
- 4. Retrieve an experiment result: [`mle retrieve`](https://roberttlange.github.io/mle-toolbox/core_api/mle_retrieve/)
- 5. Create an experiment report with figures: [`mle report`](https://roberttlange.github.io/mle-toolbox/core_api/mle_report/)
- 6. Extract all GCS-stored results to your local drive: [`mle sync-gcs`](https://roberttlange.github.io/mle-toolbox/core_api/mle_sync_gcs/)
+ | | Command | Description |
+ |-----------| -------------------------- | -------------------------------------------------------------- |
+ || [`mle init`](https://roberttlange.github.io/mle-toolbox/core_api/mle_init/) | Start up an experiment. |
+ |🚀| [`mle run`](https://roberttlange.github.io/mle-toolbox/core_api/mle_run/) | Setup of credentials & toolbox settings. |
+ |🖥️| [`mle monitor`](https://roberttlange.github.io/mle-toolbox/core_api/mle_monitor/) | Monitor resource utilisation. |
+ |📥 | [`mle retrieve`](https://roberttlange.github.io/mle-toolbox/core_api/mle_retrieve/) | Retrieve an experiment result. |
+ |💌| [`mle report`](https://roberttlange.github.io/mle-toolbox/core_api/mle_report/) | Create an experiment report with figures. |
+ |🔄| [`mle sync-gcs`](https://roberttlange.github.io/mle-toolbox/core_api/mle_sync_gcs/) | Extract all GCS-stored results to your local drive. |
2 changes: 1 addition & 1 deletion mle_toolbox/job/cluster/sge/helpers_launch_sge.py
@@ -10,7 +10,7 @@ def sge_generate_startup_file(job_arguments: dict) -> str:
# Add desired number of requested gpus
if "num_gpus" in job_arguments:
if job_arguments["num_gpus"] > 0:
-         base_template += '#$ -l cuda="{num_gpus}(RTX2080)" \n'
+         base_template += '#$ -l cuda="{num_gpus}({gpu_type})" \n'

# Exclude specific nodes from the queue
if "exclude_nodes" in job_arguments:
2 changes: 1 addition & 1 deletion mle_toolbox/job/cluster/slurm/helpers_launch_slurm.py
@@ -9,7 +9,7 @@ def slurm_generate_startup_file(job_arguments: dict) -> str:
# Add desired number of requested gpus
if "num_gpus" in job_arguments:
if job_arguments["num_gpus"] > 0:
-         base_template += "#SBATCH --gres=gpu:tesla:{num_gpus} \n"
+         base_template += "#SBATCH --gres=gpu:{gpu_type}:{num_gpus} \n"

# Set the max required memory per job
if "memory_per_cpu" in job_arguments:
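The SGE and Slurm template changes follow the same pattern: a hard-coded GPU model (`RTX2080`, `tesla`) is replaced by a `{gpu_type}` placeholder, which is later filled from the job arguments (now defaulting via the new `gpu_tpye` config keys). A rough sketch of the idea for the Slurm case — `build_gpu_directive` is a hypothetical helper name, since the full function bodies are not shown in the diff:

```python
def build_gpu_directive(job_arguments: dict) -> str:
    """Hypothetical helper mirroring the diff: emit a --gres line only
    when GPUs are requested, with the GPU model taken from the job
    arguments instead of being hard-coded."""
    base_template = ""
    if job_arguments.get("num_gpus", 0) > 0:
        base_template += "#SBATCH --gres=gpu:{gpu_type}:{num_gpus} \n"
    # str.format ignores unused keys, so the full job-arguments dict
    # can be passed straight through.
    return base_template.format(**job_arguments)


line = build_gpu_directive({"num_gpus": 2, "gpu_type": "tesla"})
# line == "#SBATCH --gres=gpu:tesla:2 \n"
```

With no `num_gpus` entry (or `num_gpus = 0`) the helper returns an empty string, so CPU-only jobs get no `--gres` directive at all.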
2 changes: 1 addition & 1 deletion mle_toolbox/job/cluster/slurm/startup_script_slurm.py
@@ -1,6 +1,6 @@
# Useful string lego building blocks for Slurm startup file formatting

- # Base qsub template
+ # Base sbatch template
slurm_base_job_config = """#!/bin/bash
#SBATCH --job-name={job_name} # job name (not id)
#SBATCH --output={log_file}.txt # output file
83 changes: 0 additions & 83 deletions run_pbt.py

This file was deleted.
