How to write Aurora configuration files, including feature descriptions
and best practices. When writing a configuration file, make use of
aurora job inspect
. It takes the same job key and configuration file
arguments as aurora job create
or aurora job update
. It first ensures the
configuration parses, then outputs it in human-readable form.
You should read this after going through the general Aurora Tutorial.
- Aurora Configuration Tutorial
To run a job on Aurora, you must specify a configuration file that tells
Aurora what it needs to know to schedule the job, what Mesos needs to
run the tasks the job is made up of, and what Thermos needs to run the
processes that make up the tasks. This file must have
a.aurora
suffix.
A configuration file defines a collection of objects, along with parameter values for their attributes. An Aurora configuration file contains the following three types of objects:
- Job
- Task
- Process
A configuration also specifies a list of Job
objects assigned
to the variable jobs
.
- jobs (list of defined Jobs to run)
The .aurora
file format is just Python. However, Job
, Task
,
Process
, and other classes are defined by a type-checked dictionary
templating library called Pystachio, a powerful tool for
configuration specification and reuse. Pystachio objects are tailored
via {{}} surrounded templates.
When writing your .aurora
file, you may use any Pystachio datatypes, as
well as any objects shown in the Aurora+Thermos Configuration
Reference, without import
statements - the
Aurora config loader injects them automatically. Other than that, an .aurora
file works like any other Python script.
Aurora+Thermos Configuration Reference has a full reference of all Aurora/Thermos defined Pystachio objects.
A well-structured configuration starts with structural templates (if
any). Structural templates encapsulate in their attributes all the
differences between Jobs in the configuration that are not directly
manipulated at the Job
level, but typically at the Process
or Task
level. For example, if certain processes are invoked with slightly
different settings or input.
After structural templates, define, in order, Process
es, Task
s, and
Job
s.
Structural template names should be UpperCamelCased and their
instantiations are typically UPPER_SNAKE_CASED. Process
, Task
,
and Job
names are typically lower_snake_cased. Indentation is typically 2
spaces.
The following is a typical configuration file. Don't worry if there are parts you don't understand yet, but you may want to refer back to this as you read about its individual parts. Note that names surrounded by curly braces {{}} are template variables, which the system replaces with bound values for the variables.
# --- templates here ---
class Profile(Struct):
package_version = Default(String, 'live')
java_binary = Default(String, '/usr/lib/jvm/java-1.7.0-openjdk/bin/java')
extra_jvm_options = Default(String, '')
parent_environment = Default(String, 'prod')
parent_serverset = Default(String,
'/foocorp/service/bird/{{parent_environment}}/bird')
# --- processes here ---
main = Process(
name = 'application',
cmdline = '{{profile.java_binary}} -server -Xmx1792m '
'{{profile.extra_jvm_options}} '
'-jar application.jar '
'-upstreamService {{profile.parent_serverset}}'
)
# --- tasks ---
base_task = SequentialTask(
name = 'application',
processes = [
Process(
name = 'fetch',
cmdline = 'curl -O
https://packages.foocorp.com/{{profile.package_version}}/application.jar'),
]
)
# not always necessary but often useful to have separate task
# resource classes
staging_task = base_task(resources =
Resources(cpu = 1.0,
ram = 2048*MB,
disk = 1*GB))
production_task = base_task(resources =
Resources(cpu = 4.0,
ram = 2560*MB,
disk = 10*GB))
# --- job template ---
job_template = Job(
name = 'application',
role = 'myteam',
contact = 'myteam-team@foocorp.com',
instances = 20,
service = True,
task = production_task
)
# -- profile instantiations (if any) ---
PRODUCTION = Profile()
STAGING = Profile(
extra_jvm_options = '-Xloggc:gc.log',
parent_environment = 'staging'
)
# -- job instantiations --
jobs = [
job_template(cluster = 'cluster1', environment = 'prod')
.bind(profile = PRODUCTION),
job_template(cluster = 'cluster2', environment = 'prod')
.bind(profile = PRODUCTION),
job_template(cluster = 'cluster1',
environment = 'staging',
service = False,
task = staging_task,
instances = 2)
.bind(profile = STAGING),
]
Processes are handled by the Thermos system. A process is a single executable step run as a part of an Aurora task, which consists of a bash-executable statement.
The key (and required) Process
attributes are:
name
: Any string which is a valid Unix filename (no slashes, NULLs, or leading periods). Thename
value must be unique relative to other Processes in aTask
.cmdline
: A command line run in a bash subshell, so you can use bash scripts. Nothing is supplied for command-line arguments, so$*
is unspecified.
Many tiny processes make managing configurations more difficult. For example, the following is a bad way to define processes.
copy = Process(
name = 'copy',
cmdline = 'curl -O https://packages.foocorp.com/app.zip'
)
unpack = Process(
name = 'unpack',
cmdline = 'unzip app.zip'
)
remove = Process(
name = 'remove',
cmdline = 'rm -f app.zip'
)
run = Process(
name = 'app',
cmdline = 'java -jar app.jar'
)
run_task = Task(
processes = [copy, unpack, remove, run],
constraints = order(copy, unpack, remove, run)
)
Since cmdline
runs in a bash subshell, you can chain commands
with &&
or ||
.
When defining a Task
that is just a list of Processes run in a
particular order, use SequentialTask
, as described in the Defining
Task
Objects section. The following simplifies and combines the
above multiple Process
definitions into just two.
stage = Process(
name = 'stage',
cmdline = 'curl -O https://packages.foocorp.com/app.zip && '
'unzip app.zip && rm -f app.zip')
run = Process(name = 'app', cmdline = 'java -jar app.jar')
run_task = SequentialTask(processes = [stage, run])
Process
also has optional attributes to customize its behaviour. Details can be found in the Aurora+Thermos Configuration Reference.
When using Aurora, you need to get your executable code into its "sandbox", specifically the Task sandbox where the code executes for the Processes that make up that Task.
Each Task has a sandbox created when the Task starts and garbage collected when it finishes. All of a Task's processes run in its sandbox, so processes can share state by using a shared current working directory.
Typically, you save this code somewhere. You then need to define a Process
in your .aurora
configuration file that fetches the code from that somewhere
to where the slave can see it. For a public cloud, that can be anywhere public on
the Internet, such as S3. For a private cloud internal storage, you need to put in
on an accessible HDFS cluster or similar storage.
The template for this Process is:
<name> = Process(
name = '<name>'
cmdline = '<command to copy and extract code archive into current working directory>'
)
Note: Be sure the extracted code archive has an executable.
Tasks are handled by Mesos. A task is a collection of processes that runs in a shared sandbox. It's the fundamental unit Aurora uses to schedule the datacenter; essentially what Aurora does is find places in the cluster to run tasks.
The key (and required) parts of a Task are:
-
name
: A string giving the Task's name. By default, if a Task is not given a name, it inherits the first name in its Process list. -
processes
: An unordered list of Process objects bound to the Task. The value of the optionalconstraints
attribute affects the contents as a whole. Currently, the only constraint,order
, determines if the processes run in parallel or sequentially. -
resources
: AResource
object defining the Task's resource footprint. AResource
object has three attributes: -cpu
: A Float, the fractional number of cores the Task requires. -ram
: An Integer, RAM bytes the Task requires. -disk
: An integer, disk bytes the Task requires.
A basic Task definition looks like:
Task(
name="hello_world",
processes=[Process(name = "hello_world", cmdline = "echo hello world")],
resources=Resources(cpu = 1.0,
ram = 1*GB,
disk = 1*GB))
A Task has optional attributes to customize its behaviour. Details can be found in the Aurora+Thermos Configuration Reference
By default, a Task with several Processes runs them in parallel. There are two ways to run Processes sequentially:
-
Include an
order
constraint in the Task definition'sconstraints
attribute whose arguments specify the processes' run order:Task( ... processes=[process1, process2, process3], constraints = order(process1, process2, process3), ...)
-
Use
SequentialTask
instead ofTask
; it automatically runs processes in the order specified in theprocesses
attribute. Noconstraint
parameter is needed:SequentialTask( ... processes=[process1, process2, process3] ...)
For quickly creating simple tasks, use the SimpleTask
helper. It
creates a basic task from a provided name and command line using a
default set of resources. For example, in a .aurora
configuration
file:
SimpleTask(name="hello_world", command="echo hello world")
is equivalent to
Task(name="hello_world",
processes=[Process(name = "hello_world", cmdline = "echo hello world")],
resources=Resources(cpu = 1.0,
ram = 1*GB,
disk = 1*GB))
The simplest idiomatic Job configuration thus becomes:
import os
hello_world_job = Job(
task=SimpleTask(name="hello_world", command="echo hello world"),
role=os.getenv('USER'),
cluster="cluster1")
When written to hello_world.aurora
, you invoke it with a simple
aurora job create cluster1/$USER/test/hello_world hello_world.aurora
.
Tasks.concat
(synonym,concat_tasks
) and
Tasks.combine
(synonym,combine_tasks
) merge multiple Task definitions
into a single Task. It may be easier to define complex Jobs
as smaller constituent Tasks. But since a Job only includes a single
Task, the subtasks must be combined before using them in a Job.
Smaller Tasks can also be reused between Jobs, instead of having to
repeat their definition for multiple Jobs.
With both methods, the merged Task takes the first Task's name. The difference between the two is the result Task's process ordering.
-
Tasks.combine
runs its subtasks' processes in no particular order. The new Task's resource consumption is the sum of all its subtasks' consumption. -
Tasks.concat
runs its subtasks in the order supplied, with each subtask's processes run serially between tasks. It is analogous to theorder
constraint helper, except at the Task level instead of the Process level. The new Task's resource consumption is the maximum value specified by any subtask for each Resource attribute (cpu, ram and disk).
For example, given the following:
setup_task = Task(
...
processes=[download_interpreter, update_zookeeper],
# It is important to note that {{Tasks.concat}} has
# no effect on the ordering of the processes within a task;
# hence the necessity of the {{order}} statement below
# (otherwise, the order in which {{download_interpreter}}
# and {{update_zookeeper}} run will be non-deterministic)
constraints=order(download_interpreter, update_zookeeper),
...
)
run_task = SequentialTask(
...
processes=[download_application, start_application],
...
)
combined_task = Tasks.concat(setup_task, run_task)
The Tasks.concat
command merges the two Tasks into a single Task and
ensures all processes in setup_task
run before the processes
in run_task
. Conceptually, the task is reduced to:
task = Task(
...
processes=[download_interpreter, update_zookeeper,
download_application, start_application],
constraints=order(download_interpreter, update_zookeeper,
download_application, start_application),
...
)
In the case of Tasks.combine
, the two schedules run in parallel:
task = Task(
...
processes=[download_interpreter, update_zookeeper,
download_application, start_application],
constraints=order(download_interpreter, update_zookeeper) +
order(download_application, start_application),
...
)
In the latter case, each of the two sequences may operate in parallel.
Of course, this may not be the intended behavior (for example, if
the start_application
Process implicitly relies
upon download_interpreter
). Make sure you understand the difference
between using one or the other.
A job is a group of identical tasks that Aurora can run in a Mesos cluster.
A Job
object is defined by the values of several attributes, some
required and some optional. The required attributes are:
-
task
: Task object to bind to this job. Note that a Job can only take a single Task. -
role
: Job's role account; in other words, the user account to run the job as on a Mesos cluster machine. A common value isos.getenv('USER')
; using a Python command to get the user who submits the job request. The other common value is the service account that runs the job, e.g.www-data
. -
environment
: Job's environment, typical values aredevel
,test
, orprod
. -
cluster
: Aurora cluster to schedule the job in, defined in/etc/aurora/clusters.json
or~/.clusters.json
. You can specify jobs where the only difference is thecluster
, then at run time only run the Job whose job key includes your desired cluster's name.
You usually see a name
parameter. By default, name
inherits its
value from the Job's associated Task object, but you can override this
default. For these four parameters, a Job definition might look like:
foo_job = Job( name = 'foo', cluster = 'cluster1',
role = os.getenv('USER'), environment = 'prod',
task = foo_task)
In addition to the required attributes, there are several optional attributes. Details can be found in the Aurora+Thermos Configuration Reference.
At the end of your .aurora
file, you need to specify a list of the
file's defined Jobs to run in the order listed. For example, the
following runs first job1
, then job2
, then job3
.
jobs = [job1, job2, job3]
The .aurora
file format is just Python. However, Job
, Task
,
Process
, and other classes are defined by a templating library called
Pystachio, a powerful tool for configuration specification and reuse.
Aurora+Thermos Configuration Reference has a full reference of all Aurora/Thermos defined Pystachio objects.
When writing your .aurora
file, you may use any Pystachio datatypes, as
well as any objects shown in the Aurora+Thermos Configuration
Reference without import
statements - the Aurora config loader
injects them automatically. Other than that the .aurora
format
works like any other Python script.
Pystachio uses the visually distinctive {{}} to indicate template variables. These are often called "mustache variables" after the similarly appearing variables in the Mustache templating system and because the curly braces resemble mustaches.
If you are familiar with the Mustache system, templates in Pystachio have significant differences. They have no nesting, joining, or inheritance semantics. On the other hand, when evaluated, templates are evaluated iteratively, so this affords some level of indirection.
Let's start with the simplest template; text with one
variable, in this case name
;
Hello {{name}}
If we evaluate this as is, we'd get back:
Hello
If a template variable doesn't have a value, when evaluated it's replaced with nothing. If we add a binding to give it a value:
{ "name" : "Tom" }
We'd get back:
Hello Tom
Every Pystachio object has an associated .bind
method that can bind
values to {{}} variables. Bindings are not immediately evaluated.
Instead, they are evaluated only when the interpolated value of the
object is necessary, e.g. for performing equality or serializing a
message over the wire.
Objects with and without mustache templated variables behave differently:
>>> Float(1.5)
Float(1.5)
>>> Float('{{x}}.5')
Float({{x}}.5)
>>> Float('{{x}}.5').bind(x = 1)
Float(1.5)
>>> Float('{{x}}.5').bind(x = 1) == Float(1.5)
True
>>> contextual_object = String('{{metavar{{number}}}}').bind(
... metavar1 = "first", metavar2 = "second")
>>> contextual_object
String({{metavar{{number}}}})
>>> contextual_object.bind(number = 1)
String(first)
>>> contextual_object.bind(number = 2)
String(second)
You usually bind simple key to value pairs, but you can also bind three other objects: lists, dictionaries, and structurals. These will be described in detail later.
Most Aurora/Thermos users don't ever (knowingly) interact with String
,
Float
, or Integer
Pystashio objects directly. Instead they interact
with derived structural (Struct
) objects that are collections of
fundamental and structural objects. The structural object components are
called attributes. Aurora's most used structural objects are Job
,
Task
, and Process
:
class Process(Struct):
cmdline = Required(String)
name = Required(String)
max_failures = Default(Integer, 1)
daemon = Default(Boolean, False)
ephemeral = Default(Boolean, False)
min_duration = Default(Integer, 5)
final = Default(Boolean, False)
Construct default objects by following the object's type with (). If you want an attribute to have a value different from its default, include the attribute name and value inside the parentheses.
>>> Process()
Process(daemon=False, max_failures=1, ephemeral=False,
min_duration=5, final=False)
Attribute values can be template variables, which then receive specific values when creating the object.
>>> Process(cmdline = 'echo {{message}}')
Process(daemon=False, max_failures=1, ephemeral=False, min_duration=5,
cmdline=echo {{message}}, final=False)
>>> Process(cmdline = 'echo {{message}}').bind(message = 'hello world')
Process(daemon=False, max_failures=1, ephemeral=False, min_duration=5,
cmdline=echo hello world, final=False)
A powerful binding property is that all of an object's children inherit its bindings:
>>> List(Process)([
... Process(name = '{{prefix}}_one'),
... Process(name = '{{prefix}}_two')
... ]).bind(prefix = 'hello')
ProcessList(
Process(daemon=False, name=hello_one, max_failures=1, ephemeral=False, min_duration=5, final=False),
Process(daemon=False, name=hello_two, max_failures=1, ephemeral=False, min_duration=5, final=False)
)
Remember that an Aurora Job contains Tasks which contain Processes. A Job level binding is inherited by its Tasks and all their Processes. Similarly a Task level binding is available to that Task and its Processes but is not visible at the Job level (inheritance is a one-way street.)
When you define a Struct
schema, one powerful, but confusing, feature
is that all of that structure's attributes are Mustache variables within
the enclosing scope once they have been populated.
For example, when Process
is defined above, all its attributes such as
{{name
}}, {{cmdline
}}, {{max_failures
}} etc., are all immediately
defined as Mustache variables, implicitly bound into the Process
, and
inherit all child objects once they are defined.
Thus, you can do the following:
>>> Process(name = "installer", cmdline = "echo {{name}} is running")
Process(daemon=False, name=installer, max_failures=1, ephemeral=False, min_duration=5,
cmdline=echo installer is running, final=False)
WARNING: This binding only takes place in one direction. For example,
the following does NOT work and does not set the Process
name
attribute's value.
>>> Process().bind(name = "installer")
Process(daemon=False, max_failures=1, ephemeral=False, min_duration=5, final=False)
The following is also not possible and results in an infinite loop that
attempts to resolve Process.name
.
>>> Process(name = '{{name}}').bind(name = 'installer')
Do not confuse Structural attributes with bound Mustache variables. Attributes are implicitly converted to Mustache variables but not vice versa.
A second templating method is both as powerful as the aforementioned and often confused with it. This method is due to automatic conversion of Struct attributes to Mustache variables as described above.
Suppose you create a Process object:
>>> p = Process(name = "process_one", cmdline = "echo hello world")
>>> p
Process(daemon=False, name=process_one, max_failures=1, ephemeral=False, min_duration=5,
cmdline=echo hello world, final=False)
This Process
object, "p
", can be used wherever a Process
object is
needed. It can also be reused by changing the value(s) of its
attribute(s). Here we change its name
attribute from process_one
to
process_two
.
>>> p(name = "process_two")
Process(daemon=False, name=process_two, max_failures=1, ephemeral=False, min_duration=5,
cmdline=echo hello world, final=False)
Template creation is a common use for this technique:
>>> Daemon = Process(daemon = True)
>>> logrotate = Daemon(name = 'logrotate', cmdline = './logrotate conf/logrotate.conf')
>>> mysql = Daemon(name = 'mysql', cmdline = 'bin/mysqld --safe-mode')
As described above, .bind()
binds simple strings or numbers to
Mustache variables. In addition to Structural types formed by combining
atomic types, Pystachio has two container types; List
and Map
which
can also be bound via .bind()
.
The bind()
function can take Python dictionaries or kwargs
interchangeably (when "kwargs
" is in a function definition, kwargs
receives a Python dictionary containing all keyword arguments after the
formal parameter list).
>>> String('{{foo}}').bind(foo = 'bar') == String('{{foo}}').bind({'foo': 'bar'})
True
Bindings done "closer" to the object in question take precedence:
>>> p = Process(name = '{{context}}_process')
>>> t = Task().bind(context = 'global')
>>> t(processes = [p, p.bind(context = 'local')])
Task(processes=ProcessList(
Process(daemon=False, name=global_process, max_failures=1, ephemeral=False, final=False,
min_duration=5),
Process(daemon=False, name=local_process, max_failures=1, ephemeral=False, final=False,
min_duration=5)
))
>>> fibonacci = List(Integer)([1, 1, 2, 3, 5, 8, 13])
>>> String('{{fib[4]}}').bind(fib = fibonacci)
String(5)
>>> first_names = Map(String, String)({'Kent': 'Clark', 'Wayne': 'Bruce', 'Prince': 'Diana'})
>>> String('{{first[Kent]}}').bind(first = first_names)
String(Clark)
>>> String('{{p.cmdline}}').bind(p = Process(cmdline = "echo hello world"))
String(echo hello world)
Use structural templates when binding more than two or three individual values at the Job or Task level. For fewer than two or three, standard key to string binding is sufficient.
Structural binding is a very powerful pattern and is most useful in
Aurora/Thermos for doing Structural configuration. For example, you can
define a job profile. The following profile uses HDFS
, the Hadoop
Distributed File System, to designate a file's location. HDFS
does
not come with Aurora, so you'll need to either install it separately
or change the way the dataset is designated.
class Profile(Struct):
version = Required(String)
environment = Required(String)
dataset = Default(String, hdfs://home/aurora/data/{{environment}}')
PRODUCTION = Profile(version = 'live', environment = 'prod')
DEVEL = Profile(version = 'latest',
environment = 'devel',
dataset = 'hdfs://home/aurora/data/test')
TEST = Profile(version = 'latest', environment = 'test')
JOB_TEMPLATE = Job(
name = 'application',
role = 'myteam',
cluster = 'cluster1',
environment = '{{profile.environment}}',
task = SequentialTask(
name = 'task',
resources = Resources(cpu = 2, ram = 4*GB, disk = 8*GB),
processes = [
Process(name = 'main', cmdline = 'java -jar application.jar -hdfsPath
{{profile.dataset}}')
]
)
)
jobs = [
JOB_TEMPLATE(instances = 100).bind(profile = PRODUCTION),
JOB_TEMPLATE.bind(profile = DEVEL),
JOB_TEMPLATE.bind(profile = TEST),
]
In this case, a custom structural "Profile" is created to self-document
the configuration to some degree. This also allows some schema
"type-checking", and for default self-substitution, e.g. in
Profile.dataset
above.
So rather than a .bind()
with a half-dozen substituted variables, you
can bind a single object that has sensible defaults stored in a single
place.
When creating your .aurora
configuration, try to keep all versions of
a particular job within the same .aurora
file. For example, if you
have separate jobs for cluster1
, cluster1
staging, cluster1
testing, andcluster2
, keep them as close together as possible.
Constructs shared across multiple jobs owned by your team (e.g.
team-level defaults or structural templates) can be split into separate
.aurora
files and included via the include
directive.
If you see repetition or find yourself copy and pasting any parts of your configuration, it's likely an opportunity for templating. Take the example below:
redundant.aurora
contains:
download = Process(
name = 'download',
cmdline = 'wget http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tar.bz2',
max_failures = 5,
min_duration = 1)
unpack = Process(
name = 'unpack',
cmdline = 'rm -rf Python-2.7.3 && tar xzf Python-2.7.3.tar.bz2',
max_failures = 5,
min_duration = 1)
build = Process(
name = 'build',
cmdline = 'pushd Python-2.7.3 && ./configure && make && popd',
max_failures = 1)
email = Process(
name = 'email',
cmdline = 'echo Success | mail feynman@tmc.com',
max_failures = 5,
min_duration = 1)
build_python = Task(
name = 'build_python',
processes = [download, unpack, build, email],
constraints = [Constraint(order = ['download', 'unpack', 'build', 'email'])])
As you'll notice, there's a lot of repetition in the Process
definitions. For example, almost every process sets a max_failures
limit to 5 and a min_duration
to 1. This is an opportunity for factoring
into a common process template.
Furthermore, the Python version is repeated everywhere. This can be bound via structural templating as described in the Advanced Binding section.
less_redundant.aurora
contains:
class Python(Struct):
version = Required(String)
base = Default(String, 'Python-{{version}}')
package = Default(String, '{{base}}.tar.bz2')
ReliableProcess = Process(
max_failures = 5,
min_duration = 1)
download = ReliableProcess(
name = 'download',
cmdline = 'wget http://www.python.org/ftp/python/{{python.version}}/{{python.package}}')
unpack = ReliableProcess(
name = 'unpack',
cmdline = 'rm -rf {{python.base}} && tar xzf {{python.package}}')
build = ReliableProcess(
name = 'build',
cmdline = 'pushd {{python.base}} && ./configure && make && popd',
max_failures = 1)
email = ReliableProcess(
name = 'email',
cmdline = 'echo Success | mail {{role}}@foocorp.com')
build_python = SequentialTask(
name = 'build_python',
processes = [download, unpack, build, email]).bind(python = Python(version = "2.7.3"))
Many tiny Processes makes for harder to manage configurations.
copy = Process(
name = 'copy',
cmdline = 'rcp user@my_machine:my_application .'
)
unpack = Process(
name = 'unpack',
cmdline = 'unzip app.zip'
)
remove = Process(
name = 'remove',
cmdline = 'rm -f app.zip'
)
run = Process(
name = 'app',
cmdline = 'java -jar app.jar'
)
run_task = Task(
processes = [copy, unpack, remove, run],
constraints = order(copy, unpack, remove, run)
)
Each cmdline
runs in a bash subshell, so you have the full power of
bash. Chaining commands with &&
or ||
is almost always the right
thing to do.
Also for Tasks that are simply a list of processes that run one after
another, consider using the SequentialTask
helper which applies a
linear ordering constraint for you.
stage = Process(
name = 'stage',
cmdline = 'rcp user@my_machine:my_application . && unzip app.zip && rm -f app.zip')
run = Process(name = 'app', cmdline = 'java -jar app.jar')
run_task = SequentialTask(processes = [stage, run])
90% of the time you define a function in a .aurora
file, you're
probably Doing It Wrong(TM).
def get_my_task(name, user, cpu, ram, disk):
return Task(
name = name,
user = user,
processes = [STAGE_PROCESS, RUN_PROCESS],
constraints = order(STAGE_PROCESS, RUN_PROCESS),
resources = Resources(cpu = cpu, ram = ram, disk = disk)
)
task_one = get_my_task('task_one', 'feynman', 1.0, 32*MB, 1*GB)
task_two = get_my_task('task_two', 'feynman', 2.0, 64*MB, 1*GB)
This one is more idiomatic. Forced keyword arguments prevents accidents, e.g. constructing a task with "32*MB" when you mean 32MB of ram and not disk. Less proliferation of task-construction techniques means easier-to-read, quicker-to-understand, and a more composable configuration.
TASK_TEMPLATE = SequentialTask(
user = 'wickman',
processes = [STAGE_PROCESS, RUN_PROCESS],
)
task_one = TASK_TEMPLATE(
name = 'task_one',
resources = Resources(cpu = 1.0, ram = 32*MB, disk = 1*GB) )
task_two = TASK_TEMPLATE(
name = 'task_two',
resources = Resources(cpu = 2.0, ram = 64*MB, disk = 1*GB)
)