Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 15 changed files with 163 additions and 105 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
# Basic array operations | ||
|
||
The following examples show how to run a few basic Array API operations on Cubed arrays. | ||
|
||
## Adding two small arrays | ||
|
||
The first example adds two small 4x4 arrays together, and is useful for checking that the runtime is working. | ||
|
||
```{eval-rst} | ||
.. literalinclude:: ../../examples/add-asarray.py | ||
``` | ||
|
||
Paste the code into a file called `add-asarray.py`, or [download](https://github.com/cubed-dev/cubed/blob/main/examples/add-asarray.py) it from GitHub, then run it with: | ||
|
||
```shell | ||
python add-asarray.py | ||
``` | ||
|
||
If successful, it will print a 4x4 array: | ||
|
||
``` | ||
[[ 2 4 6 8] | ||
[10 12 14 16] | ||
[18 20 22 24] | ||
[26 28 30 32]] | ||
``` | ||
|
||
## Adding two larger arrays | ||
|
||
The next example generates two random 20GB arrays and then adds them together. | ||
|
||
```{eval-rst} | ||
.. literalinclude:: ../../examples/add-random.py | ||
``` | ||
|
||
Paste the code into a file called `add-random.py`, or [download](https://github.com/cubed-dev/cubed/blob/main/examples/add-random.py) it from GitHub, then run it with: | ||
|
||
```shell | ||
python add-random.py | ||
``` | ||
|
||
This example demonstrates how we can use callbacks to gather information about the computation. | ||
|
||
- `RichProgressBar` shows a progress bar for the computation as it is running. | ||
- `TimelineVisualizationCallback` produces a plot (after the computation has completed) showing the timeline of events in the task lifecycle. | ||
- `HistoryCallback` produces various stats about the computation once it has completed. | ||
|
||
The plots and stats are written to a timestamped subdirectory of the `history` directory. You can open the latest plot with: | ||
|
||
```shell | ||
open $(ls -d history/compute-* | tail -1)/timeline.svg | ||
``` | ||
|
||
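The shell one-liner above can also be sketched in Python, which may be handy on systems without an `open` command. This is a minimal sketch, not part of Cubed: the helper name `latest_timeline` is hypothetical, and it assumes (as the shell version does) that the timestamps in the `compute-*` directory names sort chronologically.

```python
from pathlib import Path
from typing import Optional


def latest_timeline(history_dir: str = "history") -> Optional[Path]:
    """Return the timeline.svg path in the most recent compute-* directory, if any."""
    # Sorting by name mirrors `ls -d history/compute-* | tail -1`; it assumes the
    # timestamp embedded in each directory name sorts chronologically.
    runs = sorted(p for p in Path(history_dir).glob("compute-*") if p.is_dir())
    if not runs:
        return None
    return runs[-1] / "timeline.svg"


# Example usage: print the path of the latest plot (or None if there are no runs).
print(latest_timeline())
```

The function only builds the path; it does not check that `timeline.svg` was actually written for that run.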
## Matmul | ||
|
||
The next example generates two random 5GB arrays and then multiplies them together. This is a more intensive computation than addition, and will take a few minutes to run locally. | ||
|
||
```{eval-rst} | ||
.. literalinclude:: ../../examples/matmul-random.py | ||
``` | ||
|
||
Paste the code into a file called `matmul-random.py`, or [download](https://github.com/cubed-dev/cubed/blob/main/examples/matmul-random.py) it from GitHub, then run it with: | ||
|
||
```shell | ||
python matmul-random.py | ||
``` | ||
|
||
## Trying different executors | ||
|
||
You can run these scripts using different executors by setting environment variables to control the Cubed configuration. | ||
|
||
For example, this will use the `processes` executor to run the example: | ||
|
||
```shell | ||
CUBED_SPEC__EXECUTOR_NAME=processes python add-random.py | ||
``` | ||
|
||
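If you want to try several executors from a script rather than the shell, the same environment variable can be set from Python. The following is a minimal sketch using only the standard library; the helper names `env_with_executor` and `run_example` are hypothetical, and the only Cubed-specific detail assumed is the `CUBED_SPEC__EXECUTOR_NAME` variable shown above.

```python
import os
import subprocess
import sys


def env_with_executor(executor_name: str) -> dict:
    """Copy the current environment and set the Cubed executor name in the copy."""
    env = dict(os.environ)
    # Same variable as in the shell example above.
    env["CUBED_SPEC__EXECUTOR_NAME"] = executor_name
    return env


def run_example(script: str, executor_name: str) -> int:
    """Run an example script with the given executor and return its exit code."""
    result = subprocess.run(
        [sys.executable, script], env=env_with_executor(executor_name)
    )
    return result.returncode


# Example usage (assumes add-random.py is in the current directory):
#     run_example("add-random.py", "processes")
```

Passing a copied environment to the child process leaves the calling process's own configuration untouched, so successive runs with different executors do not interfere with each other.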
For cloud executors, it's usually best to put all of the configuration in one YAML file, and set the `CUBED_CONFIG` environment variable to point to it: | ||
|
||
```shell | ||
export CUBED_CONFIG=/path/to/lithops/aws/cubed.yaml | ||
python add-random.py | ||
``` | ||
|
||
You can read more about how [configuration](../configuration.md) works in Cubed in general, and find detailed steps for running on a particular cloud service [here](#cloud-set-up). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
# How to run | ||
|
||
## Local machine | ||
|
||
All the examples can be run on your laptop, so you can try them out in a familiar environment before moving to the cloud. | ||
No extra setup is necessary in this case. | ||
|
||
(cloud-set-up)= | ||
## Cloud set up | ||
|
||
If you want to run using a cloud executor, first read <project:#which-cloud-service>. | ||
|
||
Then follow the instructions for your chosen executor runtime from the table below. They assume that you have cloned the Cubed GitHub repository locally so that you have access to files needed for setting up the cloud executor. | ||
|
||
```shell | ||
git clone https://github.com/cubed-dev/cubed | ||
cd cubed/examples | ||
cd lithops/aws # or whichever executor/cloud combination you are using | ||
``` | ||
|
||
| Executor | Cloud | Set up instructions | | ||
|-----------|--------|------------------------------------------------| | ||
| Lithops | AWS | [lithops/aws/README.md](https://github.com/cubed-dev/cubed/blob/main/examples/lithops/aws/README.md) | | ||
| | Google | [lithops/gcp/README.md](https://github.com/cubed-dev/cubed/blob/main/examples/lithops/gcp/README.md) | | ||
| Modal | AWS | [modal/aws/README.md](https://github.com/cubed-dev/cubed/blob/main/examples/modal/aws/README.md) | | ||
| | Google | [modal/gcp/README.md](https://github.com/cubed-dev/cubed/blob/main/examples/modal/gcp/README.md) | | ||
| Coiled | AWS | [coiled/aws/README.md](https://github.com/cubed-dev/cubed/blob/main/examples/coiled/aws/README.md) | | ||
| Beam | Google | [dataflow/README.md](https://github.com/cubed-dev/cubed/blob/main/examples/dataflow/README.md) | |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# Examples | ||
|
||
Various examples demonstrating what you can do with Cubed. | ||
|
||
```{toctree} | ||
--- | ||
maxdepth: 2 | ||
--- | ||
how-to-run | ||
basic-array-ops | ||
pangeo | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# Pangeo | ||
|
||
## Notebooks | ||
|
||
The following example notebooks demonstrate the use of Cubed with Xarray to tackle some challenging Pangeo workloads: | ||
|
||
1. [Pangeo Vorticity Workload](https://github.com/cubed-dev/cubed/blob/main/examples/pangeo-1-vorticity.ipynb) | ||
2. [Pangeo Quadratic Means Workload](https://github.com/cubed-dev/cubed/blob/main/examples/pangeo-2-quadratic-means.ipynb) | ||
3. [Pangeo Transformed Eulerian Mean Workload](https://github.com/cubed-dev/cubed/blob/main/examples/pangeo-3-tem.ipynb) | ||
4. [Pangeo Climatological Anomalies Workload](https://github.com/cubed-dev/cubed/blob/main/examples/pangeo-4-climatological-anomalies.ipynb) | ||
|
||
## Running the notebook examples | ||
|
||
Before running these notebook examples, you will need to install some additional dependencies (besides Cubed). | ||
|
||
`conda install rich pydot flox cubed-xarray` | ||
|
||
`cubed-xarray` is necessary to wrap Cubed arrays as Xarray DataArrays or Xarray Datasets. | ||
`flox` is for supporting efficient groupby operations in Xarray. | ||
`pydot` allows plotting the Cubed execution plan. | ||
`rich` is for showing progress of array operations within callbacks applied to Cubed plan operations. |
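
Before opening a notebook, it can help to confirm that these packages are importable in the active environment. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the import names (in particular `cubed_xarray` for the `cubed-xarray` package) are assumptions based on the usual naming convention.

```python
from importlib.util import find_spec

# Import names assumed for the packages installed above.
NOTEBOOK_MODULES = ["rich", "pydot", "flox", "cubed_xarray"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Example usage: an empty list means every dependency was found.
print(missing_modules(NOTEBOOK_MODULES))
```

`find_spec` only locates each module without importing it, so the check stays fast and has no side effects.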
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,92 +1,5 @@ | ||
# Examples | ||
|
||
## Running on a local machine | ||
This directory contains Cubed examples in the form of Python scripts and Jupyter notebooks. There are also instructions for setting up Cubed executors to run on various cloud services. | ||
|
||
The `processes` executor is the recommended executor for running on a single machine, since it can use all the cores on the machine. | ||
|
||
## Which cloud service executor should I use? | ||
|
||
When it comes to scaling out, there are a number of executors that work in the cloud. | ||
|
||
[**Lithops**](https://lithops-cloud.github.io/) is the executor we recommend for most users, since it has had the most testing so far (~1000 workers). | ||
If your data is in Amazon S3 then use Lithops with AWS Lambda, and if it's in GCS use Lithops with Google Cloud Functions. You have to build a runtime environment as a part of the setting up process. | ||
|
||
[**Modal**](https://modal.com/) is very easy to get started with because it handles building a runtime environment for you automatically (note that it requires that you [sign up](https://modal.com/signup) for a free account). **At the time of writing, Modal does not guarantee that functions run in any particular cloud region, so it is not currently recommended that you run large computations since excessive data transfer fees are likely.** | ||
|
||
[**Coiled**](https://www.coiled.io/) is also easy to get started with ([sign up](https://cloud.coiled.io/signup)). It uses [Coiled Functions](https://docs.coiled.io/user_guide/usage/functions/index.html) and has a 1-2 minute overhead to start a cluster. | ||
|
||
[**Google Cloud Dataflow**](https://cloud.google.com/dataflow) is relatively straightforward to get started with. It has the highest overhead for worker startup (minutes compared to seconds for Modal or Lithops), and although it has only been tested with ~20 workers, it is a mature service and therefore should be reliable for much larger computations. | ||
|
||
## Set up | ||
|
||
Follow the instructions for setting up Cubed to run on your executor runtime: | ||
|
||
| Executor | Cloud | Set up instructions | | ||
|-----------|--------|------------------------------------------------| | ||
| Processes | N/A | `pip install 'cubed[diagnostics]'` | | ||
| Lithops | AWS | [lithops/aws/README.md](lithops/aws/README.md) | | ||
| | Google | [lithops/gcp/README.md](lithops/gcp/README.md) | | ||
| Modal | AWS | [modal/aws/README.md](modal/aws/README.md) | | ||
| | Google | [modal/gcp/README.md](modal/gcp/README.md) | | ||
| Coiled | AWS | [coiled/aws/README.md](coiled/aws/README.md) | | ||
| Beam | Google | [dataflow/README.md](dataflow/README.md) | | ||
|
||
## Examples | ||
|
||
The `add-asarray.py` script is a small example that adds two small 4x4 arrays together, and is useful for checking that the runtime is working. | ||
Export `CUBED_CONFIG` as described in the set up instructions, then run the script. This is for running on the local machine using the `processes` executor: | ||
|
||
```shell | ||
export CUBED_CONFIG=$(pwd)/processes/cubed.yaml | ||
python add-asarray.py | ||
``` | ||
|
||
This is for Lithops on AWS: | ||
|
||
```shell | ||
export CUBED_CONFIG=$(pwd)/lithops/aws/cubed.yaml | ||
python add-asarray.py | ||
``` | ||
|
||
If successful it should print a 4x4 array. | ||
|
||
The other examples are run in a similar way: | ||
|
||
```shell | ||
export CUBED_CONFIG=... | ||
python add-random.py | ||
``` | ||
|
||
and | ||
|
||
```shell | ||
export CUBED_CONFIG=... | ||
python matmul-random.py | ||
``` | ||
|
||
These will take longer to run as they operate on more data. | ||
|
||
The last two examples use `TimelineVisualizationCallback`, which produces a plot showing the timeline of events in the task lifecycle, and `HistoryCallback`, which produces stats about memory usage. | ||
The plots are SVG files and are written to a timestamped subdirectory of the `history` directory. Open the latest one with | ||
|
||
```shell | ||
open $(ls -d history/compute-* | tail -1)/timeline.svg | ||
``` | ||
|
||
The memory usage stats are in a CSV file which you can view with | ||
|
||
|
||
```shell | ||
open $(ls -d history/compute-* | tail -1)/stats.csv | ||
``` | ||
|
||
## Running the notebook examples | ||
|
||
Before running these notebook examples, you will need to install some additional dependencies (besides Cubed). | ||
|
||
`mamba install rich pydot flox cubed-xarray` | ||
|
||
`cubed-xarray` is necessary to wrap Cubed arrays as Xarray DataArrays or Xarray Datasets. | ||
`flox` is for supporting efficient groupby operations in Xarray. | ||
`pydot` allows plotting the Cubed execution plan. | ||
`rich` is for showing progress of array operations within callbacks applied to Cubed plan operations. | ||
See the [documentation](https://cubed-dev.github.io/cubed/examples/index.html) for details. |