Detailed Documentation
The launch mode specifies where to run a job. There are currently four modes available: local, SSH, EC2, and GCP.
doodad.mode.LocalMode()
This mode runs a script from the command line using your shell's default python command. Its constructor takes no arguments.
This mode is useful for debugging experiments before launching remotely.
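As a rough illustration of what "running a script via the command line" amounts to, the sketch below starts a script in a subprocess and captures its output. It is not doodad's implementation: `run_script_locally` is a hypothetical helper, and it uses `sys.executable` as a stand-in for the shell's default python command.

```python
import subprocess
import sys


def run_script_locally(script_path):
    """Run a Python script in a child process and return its stdout.

    Hypothetical helper for illustration only; doodad's LocalMode may
    build and execute its command differently.
    """
    result = subprocess.run(
        [sys.executable, script_path],  # stand-in for the shell's `python`
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    return result.stdout
```

Checking that a script runs cleanly this way before launching remotely catches import errors and typos without paying for a remote instance.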
doodad.mode.SSHDocker(
credentials=[SSHCredentials],
)
This mode launches scripts inside a remote docker instance, using the specified docker image. Docker must be installed on the host machine for this to work.
The recommended way to specify credentials is to point to an identity file, for example:
credentials = doodad.credentials.ssh.SSHCredentials(
hostname=[str],
username=[str],
identity_file=[str]
)
EC2 is supported via spot instances.
The easiest way to set up EC2 is to run the scripts/setup_ec2.py file (detailed instructions) and then use the EC2AutoconfigDocker constructor (use the EC2Mode class to fill in AWS arguments manually):
doodad.mode.EC2Autoconfig(
s3_log_path=[str], # Folder to store log files under, under root directory of bucket
region=[str:'us-west-1'], # EC2 region
instance_type=[str:'m3.medium'], # EC2 instance type
spot_price=[float:0.02], # Maximum bid price
terminate_on_end=[bool:True], # Whether to terminate on finishing job
)
Output files will be stored on S3 under the folder s3://<bucket_name>/<s3_log_path>/
EC2 instance types can be found here and spot prices here. Generally, the c5 instances offer good performance, adequate memory, and a good price.
Currently there is no automated setup script for GCP. You will have to follow the setup instructions.
doodad.mode.GCPDocker(
zone=[str:'us-west1-a'], # GCP zone
instance_type=[str:'n1-standard-2'], # GCP instance type
image_name=[str:'your-image'], # GCP image name
image_project=[str:'your-project'], # GCP image project
gcp_log_path=[str:'experiment'], # Folder to store log files under
terminate_on_end=[bool:True], # Whether to terminate on finishing job
use_gpu=[bool:False], # Whether to use GPUs
gpu_model=[str:'nvidia-tesla-t4'], # GPU type
num_gpu=[int:1] # Number of GPUs to use
)
Output files will be stored on Google cloud storage under the folder gs://<bucket_name>/<gcp_log_path>/XXXXX
The GPU model must be one of the models listed on the Google cloud website. Additionally, the docker image must have the nvidia container extension for docker installed. See the GCP setup instructions for more details.
All input and output data is handled by mount objects.
doodad.mount.MountLocal(
local_dir=[str], # The name of this directory on disk
mount_point=[str], # The name of this directory as visible to the script
pythonpath=[bool:False], # Whether to add this directory to the pythonpath
output=[bool:False], # Whether this directory is an empty directory for storing outputs.
filter_ext=[tuple(str):('.pyc', '.log', '.git', '.mp4')], # File extensions to not include
filter_dir=[tuple(str):('data',)] # Directories to ignore
)
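To make the effect of filter_ext and filter_dir concrete, here is a minimal sketch of how such filters could select the files to copy. The function `list_files_to_copy` is hypothetical and only mirrors the defaults listed above; doodad's actual matching rules may differ.

```python
import os


def list_files_to_copy(local_dir,
                       filter_ext=('.pyc', '.log', '.git', '.mp4'),
                       filter_dir=('data',)):
    """Return sorted paths (relative to local_dir) that pass both filters.

    Illustrative only: assumes extensions are matched as name suffixes
    and directories are matched by exact name.
    """
    kept = []
    for root, dirs, files in os.walk(local_dir):
        # Prune ignored directories in place so os.walk skips their contents.
        dirs[:] = [d for d in dirs
                   if d not in filter_dir and not d.endswith(tuple(filter_ext))]
        for name in files:
            if name.endswith(tuple(filter_ext)):
                continue
            kept.append(os.path.relpath(os.path.join(root, name), local_dir))
    return sorted(kept)
```

Note that a one-element tuple needs a trailing comma: `('data',)` is a tuple, while `('data')` is just the string `'data'`.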
For remote launch modes (EC2, SSH), non-output directories will be copied to the remote server. Output directories will not be copied.
For SSH, output directories will not be copied back automatically. They will also be owned by root on the remote disk, so you must copy the data back manually. I am currently working on a fix for this.
You can create mounts that directly point to specific branches in Git repositories. This feature is useful if you do not want to store a repository locally (and use MountLocal), or if you need to work with different versions of the same repository.
doodad.mount.MountGit(
git_url=[str], # Git URL
branch=[str:"master"], # Git branch
ssh_identity=[str], # SSH identity file for git clone
mount_point=[str],
pythonpath=[bool]
)
For EC2, all output mounts must be replaced by S3 mounts:
doodad.mount.MountS3(
s3_path=[str],
mount_point=[str], # Directory visible to the running job.
)
The contents of this folder will be synced to s3://<bucket_name>/<s3_log_path>/outputs/<s3_path>, where the bucket and log paths are specified by the launch mode.
To pull all results for an experiment, you can use the following aws-cli command:
aws s3 sync s3://<bucket_name>/path/to/your/logs .
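The path layout described above can be written out as a small helper. This is a sketch, not part of doodad: `s3_output_uri` and `s3_sync_command` are hypothetical names, and the bucket and paths in the usage note are placeholders.

```python
import posixpath


def s3_output_uri(bucket_name, s3_log_path, s3_path):
    """Build s3://<bucket_name>/<s3_log_path>/outputs/<s3_path>."""
    key = posixpath.join(s3_log_path.strip("/"), "outputs", s3_path.strip("/"))
    return "s3://{}/{}".format(bucket_name, key)


def s3_sync_command(bucket_name, s3_log_path, dest="."):
    """Build the aws-cli argument list that pulls an experiment's logs."""
    src = "s3://{}/{}".format(bucket_name, s3_log_path.strip("/"))
    return ["aws", "s3", "sync", src, dest]
```

For example, `s3_output_uri("my-bucket", "experiments/run1", "results")` gives `s3://my-bucket/experiments/run1/outputs/results`, and the list from `s3_sync_command` can be passed to `subprocess.run` once aws-cli is configured.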
For GCP, all output mounts must be replaced by GCP mounts:
doodad.mount.MountGCP(
gcp_path=[str],
mount_point=[str], # Directory visible to the running job.
)
The contents of this folder will be synced to gs://<bucket_name>/<gcp_log_path>/logs/<gcp_path>, where the bucket and log paths are specified by the launch mode.
To pull all results for an experiment, you can use the following gsutil command:
gsutil rsync -r gs://<bucket_name>/path/to/your/logs .
With the launch mode and mounts specified, we can now launch a python script using the launch_python function:
doodad.launch.launch_api.launch_python(
target=[str],
mode=[LaunchMode],
mounts=[list(Mount)],
docker_image=[str:"ubuntu:18.04"],
)
The target argument should be an absolute filepath to the target script. You can add python command-line arguments after the script. mounts should be a list of Mount objects.
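Putting the pieces together, the sketch below builds a target string and launches it in local mode. It assumes doodad is installed and that the constructors accept the keyword arguments listed in this document. `build_target` and `launch_locally` are hypothetical helpers, the `/code` mount point is a placeholder, and whether doodad expects arguments to be shell-quoted in the target string is an assumption.

```python
import os
import shlex


def build_target(script, *args):
    """Return '<absolute script path> <arg> <arg> ...' as one string.

    Hypothetical helper: quoting with shlex assumes the target string
    is eventually interpreted by a shell.
    """
    parts = [os.path.abspath(script)] + [str(a) for a in args]
    return " ".join(shlex.quote(p) for p in parts)


def launch_locally(script, code_dir, *args):
    """Launch `script` with LocalMode, mounting `code_dir` on the pythonpath."""
    # Imported lazily so build_target stays usable without doodad installed.
    import doodad.mode
    import doodad.mount
    from doodad.launch import launch_api

    mounts = [
        doodad.mount.MountLocal(
            local_dir=code_dir,
            mount_point="/code",  # placeholder mount point
            pythonpath=True,
        ),
    ]
    launch_api.launch_python(
        target=build_target(script, *args),
        mode=doodad.mode.LocalMode(),
        mounts=mounts,
    )
```

Swapping `doodad.mode.LocalMode()` for one of the remote modes should leave the rest of the call unchanged, which is what makes local mode convenient for debugging.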
For non-python programs, you can run shell commands directly:
doodad.launch.launch_api.run_command(
command=[str],
mode=[LaunchMode],
mounts=[list(Mount)],
docker_image=[str:"ubuntu:18.04"],
)