Currently Tajo is supported on EMR through a bootstrap action.
Installing the AWS Command Line Interface
s3 path
- script: s3://beta.elasticmapreduce/bootstrap-actions/tajo/
- template: s3://beta.elasticmapreduce/bootstrap-actions/tajo/template/tajo-0.10.0
Usage: [-t|--tar] [-c|--conf] [-l|--lib] [-h|--help] [-e|--env] [-s|--site] [-T|--test-home] [-H|--test-hadoop-home]
-t, --tar
Tajo binary tarball URL.
Ex: --tar s3://[your_bucket]/[your_path]/tajo-0.10.0.tar.gz or an HTTP URL
-c, --conf [S3_PATH_TO_TAJO_CONF_DIR]
Tajo conf directory URL.
Ex: --conf s3://beta.elasticmapreduce/bootstrap-actions/tajo/template/tajo-0.10.0/c3.xlarge/conf
-l, --lib
Tajo third party lib URL.
Ex: --lib s3://{your_bucket}/{your_lib_dir} or http://{lib_url}/{lib_file_name.jar}
-v, --tajo-version [INSTALL_TAJO_VERSION]
Default: Apache Tajo stable version.
Ex: --tajo-version x.x.x
-h, --help
Display help message
-e, --env
Environment variable items (space delimited)
Ex: --env "TAJO_PID_DIR=/home/hadoop/tajo/pids TAJO_WORKER_HEAPSIZE=1024"
-s, --site
Items of tajo-site.xml (space delimited)
Ex: --site "tajo.rootdir=s3://mybucket/tajo tajo.worker.start.cleanup=true"
-T, --test-home [LOCAL_PATH_TO_TEST_ROOT] (only used for local test)
Local test directory path
-H, --test-hadoop-home [LOCAL_PATH_TO_HADOOP_HOME_FOR_TEST] (only used for local test)
Local test HADOOP_HOME
- Note that all arguments are optional.
- -T and -H are only used for local tests.
- -t allows a user to specify a custom Tajo binary archive file through an S3 URL or an HTTP URL.
- -e allows a user to specify environment variables. Multiple environment variables can be combined in a space delimited list. Please refer to the above example.
- -s allows a user to specify config properties in tajo-site.xml. Multiple properties can be combined in a space delimited list. Please refer to the above example.
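Because the -e and -s values are space delimited lists, each list has to reach the bootstrap action as a single quoted argument. The sketch below shows one way to assemble the `Args=[...]` part of `--bootstrap-action`. The helper name `build_args` is hypothetical and not part of the bootstrap script, and it assumes the values themselves contain no double quotes.

```shell
# Hypothetical helper (not part of the Tajo bootstrap): join its arguments
# into the Args=[...] form, double-quoting each one so that a
# space delimited list stays a single argument.
build_args() {
  out=""
  for arg in "$@"; do
    # Append a comma only when something is already in the list.
    out="${out}${out:+,}\"${arg}\""
  done
  printf 'Args=[%s]\n' "$out"
}

# Example values taken from the option descriptions above.
build_args -e "TAJO_PID_DIR=/home/hadoop/tajo/pids TAJO_WORKER_HEAPSIZE=1024" \
           -s "tajo.rootdir=s3://mybucket/tajo tajo.worker.start.cleanup=true"
```

The printed string can then be appended after `Path=...,` in the `--bootstrap-action` value.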
- It uses EMR HDFS as the Tajo root directory, which includes the warehouse directory.
- It uses all default heap and concurrency configs.
- It is good for a simple test.
$ aws emr create-cluster \
--name="[CLUSTER_NAME]" \
--ami-version=3.3 \
--no-auto-terminate \
--ec2-attributes KeyName=[KEY_PAIR_NAME] \
--instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=1,InstanceType=c3.xlarge \
--bootstrap-action Name="Install tajo",Path=s3://beta.elasticmapreduce/bootstrap-actions/tajo/
- To use your own Tajo tarball, use the -t option to specify its S3 URL.
- To change the configuration, make your own tajo-site.xml and use the -c option to specify the S3 URL of the config directory.
- You can find appropriate config templates in /~
- If you need a third party (external) library like xxx.jar, use the -l option to specify the S3 URL of the directory that contains the third party jars.
aws emr create-cluster \
--name="[CLUSTER_NAME]" \
--ami-version=3.3 \
--no-auto-terminate \
--ec2-attributes KeyName=[KEY_PAIR_NAME] \
--instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=1,InstanceType=c3.xlarge \
--bootstrap-action Name="Install tajo",Path=s3://beta.elasticmapreduce/bootstrap-actions/tajo/,Args=["-t","s3://[your_bucket]/tajo-0.10.0.tar.gz","-c","s3://[your_bucket]/conf","-l","s3://[your_bucket]/lib"]
You need to remember your cluster id when you launch a Tajo cluster. Please replace <CLUSTER_ID>
with your cluster id.
aws emr terminate-clusters --cluster-ids "<CLUSTER_ID>"
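Since the cluster id is needed by later commands, it can help to capture it at launch time. A minimal sketch, assuming `aws emr create-cluster` prints JSON containing a `ClusterId` field; the helper name `extract_cluster_id` is hypothetical.

```shell
# Hypothetical helper: read create-cluster JSON output on stdin and
# print the value of the "ClusterId" field.
extract_cluster_id() {
  sed -n 's/.*"ClusterId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sample input shaped like the CLI output; the id here is only an example.
printf '{\n    "ClusterId": "j-FC5DVH3RI6AA"\n}\n' | extract_cluster_id
```

In practice the function would be fed by the launch command, for example `CLUSTER_ID=$(aws emr create-cluster ... | extract_cluster_id)`. The AWS CLI global options `--query ClusterId --output text` should give the same result without a helper.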
You also need your cluster id to list the instances of a Tajo cluster. Please replace <CLUSTER_ID>
with your cluster id.
aws emr list-instances --cluster-id "<CLUSTER_ID>"
The bootstrap can be tested on a local machine without EMR instances. To do so, use the -T and -H options.
- -T: Testing root dir, which is temporarily used for testing.
- -H: Hadoop binary directory, which stands in for the EMR Hadoop home.
$ ./ -t /[your_local_binary_path]/tajo-0.10.0.tar.gz -c /[your_test_conf_dir]/conf -l /[your_test_lib_dir]/lib -T /[LOCAL_PATH_TO_TEST_ROOT] -H /[LOCAL_PATH_TO_HADOOP_HOME_FOR_TEST]
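Before a local test run, it may be worth checking the two directories passed to -T and -H. The following is only a sketch: the helper name `prepare_local_test` is hypothetical, and creating the test root in advance is an assumption, since the bootstrap may create it on its own.

```shell
# Hypothetical pre-flight check for a local test run.
# $1: test root (the -T value), created if it does not exist (assumption)
# $2: Hadoop home (the -H value), must already exist
prepare_local_test() {
  test_root=$1
  hadoop_home=$2
  if [ ! -d "$hadoop_home" ]; then
    echo "Hadoop home not found: $hadoop_home" >&2
    return 1
  fi
  mkdir -p "$test_root" || return 1
  # Print the options in the form expected on the command line.
  echo "-T $test_root -H $hadoop_home"
}
```

The printed options can be pasted into the local test command shown above.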
Tajo can use RDS. To do so:
- Make sure you already have a running RDS instance, then pass the RDS information through the -s option.
- To use RDS, you need appropriate JDBC jars like mysql-connector.jar. The -l option allows you to specify the S3 URL of a directory that contains the third party jars.
aws emr create-cluster \
--name="[CLUSTER_NAME]" \
--ami-version=3.3 \
--ec2-attributes KeyName=[KEY_PAIR_NAME] \
--instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=1,InstanceType=c3.xlarge \
--bootstrap-action Name="Install tajo",Path=s3://beta.elasticmapreduce/bootstrap-actions/tajo/,Args=["-t","s3://[your_bucket]/tajo-0.10.0.tar.gz","-c","s3://[your_bucket]/conf","-l", \
"", \
"-s","{id} tajo.catalog.jdbc.connection.password={password} tajo.catalog.jdbc.uri=jdbc:mysql://{RDS_URL}:3306/tajo?createDatabaseIfNotExist=true"]
Please refer to the Catalog configuration documentation in the Tajo docs.
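The `tajo.catalog.jdbc.uri` value in the example above follows a fixed pattern, so it can be assembled from the RDS endpoint. A minimal sketch: the helper name `rds_jdbc_uri` and the host name are hypothetical, and the defaults (port 3306, database `tajo`) are simply taken from the example.

```shell
# Hypothetical helper: build the catalog JDBC URI for a MySQL RDS endpoint.
# $1: RDS endpoint host name
# $2: port (defaults to 3306, as in the example)
# $3: database name (defaults to tajo, as in the example)
rds_jdbc_uri() {
  printf 'jdbc:mysql://%s:%s/%s?createDatabaseIfNotExist=true\n' \
    "$1" "${2:-3306}" "${3:-tajo}"
}

# Placeholder host name; replace it with your own RDS endpoint.
rds_jdbc_uri "mydb.example.rds.amazonaws.com"
```

Keep the resulting URI inside double quotes when passing it through -s, so that the shell does not treat the `?` as a glob character.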