Getting Started with Millstone

Daniel Bryan Goodman edited this page Feb 4, 2015 · 25 revisions

This guide walks you through cloning the latest stable Amazon Machine Image (AMI) configured with Millstone. Most new users will want to use this guide. Docs for individuals wishing to configure their instance or modify source code are coming soon.

Prerequisites

You need to create an Amazon Web Services (AWS) account. Brad Chapman's getting started guide for CloudBioLinux has a solid first chapter with instructions on getting everything set up: https://github.com/chapmanb/cloudbiolinux/blob/master/doc/intro/gettingStarted_CloudBioLinux.pdf?raw=true

Cloning the AMI

  1. In the EC2 console, navigate to Instances, then press the Launch Instance button. This will launch a wizard that walks you through setting up a new instance. The following steps provide instructions for configuring this instance.

  2. In the new instance wizard, search for 'Millstone' and choose the latest Millstone AMI. As of this writing, the most current is millstone_combined_2015_02_03. Make sure you're in the N. Virginia region (upper right dropdown).

  3. On the 'Choose instance type' tab, select an instance according to your needs. We recommend m3.medium (select General Purpose on the left).

  4. In 'Configure instance', the only setting we recommend changing is the Availability Zone, which you should set explicitly (we always use us-east-1a). EBS volumes (Amazon's hard drives) can only be moved between instances in the same zone, so keeping everything in one zone makes things easier.

  5. In 'Add storage', increase the size of the root drive to the amount of space that you'll need. For bacterial genomes, about 2 GB per sample should be more than enough (e.g. 100 samples = 200 GB).

  6. In 'Tag instance', fill in an informative value for the 'Name' key. I like the name to include the date it was created and a description of what the instance is running (e.g. 2014_04_01_mutate_all_the_things).

  7. For security group, configure a group appropriate to your needs. Most users will want to create a security group with all of the following open (NOTE: This will make your instance publicly visible, but login is still required.):

    • All ICMP
    • All TCP
    • All UDP
    • SSH
  8. Continue to the final tab where you'll press 'Launch the instance'. Select or create a key. If you create the key, download and save the private key. (NOTE: If you lose the private key there's no way to ssh back into your instance. You'll have to terminate it and create a new one.)
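The sizing rule in step 5 is simple arithmetic, and a minimal shell sketch of it is shown here. GB_PER_SAMPLE and estimate_storage_gb are illustrative names, not Millstone settings, and the 2 GB figure is the guide's rough number for bacterial genomes. The result counts sample data only.

```shell
#!/bin/sh
# Rough root-volume size for step 5, using the guide's ~2 GB per bacterial sample.
# GB_PER_SAMPLE and estimate_storage_gb are illustrative names (assumptions).
GB_PER_SAMPLE=2

estimate_storage_gb() {
    # $1: number of samples you plan to load
    echo $(( $1 * GB_PER_SAMPLE ))
}

estimate_storage_gb 100   # prints 200
```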

It takes about 5-10 minutes for the instance to launch and all bootstrapping to finish, after which your Millstone is ready to grind!

Accessing your instance

From the web, visit ec2-xx-xx-xx-xx.compute-1.amazonaws.com (replacing the x's with your instance's address). To find it, go to "Instances" in the EC2 console, select the instance you created, and look under Public DNS. Your instance may take some time to initialize; wait until all status checks have completed before attempting to log in.

To ssh in:

ssh -i ~/.ssh/your-key.pem ubuntu@ec2-xx-xx-xx-xx.compute-1.amazonaws.com

(If ssh rejects the key because its permissions are too open, restrict the key file to its owner, e.g. chmod 400 ~/.ssh/your-key.pem; 700 also works.)
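As a small sketch of the permissions fix, the helper below locks a key file down and prints the resulting mode so you can confirm it. lock_down_key is a hypothetical name, and mode 400 (read-only for the owner) is just one of the modes ssh accepts. The paths and hostname are the same placeholders used above.

```shell
#!/bin/sh
# Restrict a private key so only its owner can read it, then print the mode.
# lock_down_key is a hypothetical helper name used for illustration.
lock_down_key() {
    # $1: path to the .pem file downloaded from AWS
    chmod 400 "$1" && ls -l "$1" | cut -c1-10
}

# Usage (placeholders as in the guide):
#   lock_down_key ~/.ssh/your-key.pem
#   ssh -i ~/.ssh/your-key.pem ubuntu@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
```

A mode string of -r-------- means only the owner can read the key, which is what ssh wants to see.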

Initial Millstone start

(Currently, account creation does not exist. This section is written against the current build, which ships with a prebuilt admin account that you can log into and use.)

Confirm that Millstone is installed and working properly by going to the web page you found above. You should be greeted with a page that asks you to create a project. Click 'create a project', or log in (upper right) to the admin account that comes with a fresh Millstone install. Logging in returns you to the main page, where you can choose to create a new project.

Moving your data over

Once you've confirmed that Millstone is working properly, ssh into your server to confirm that your key pair works and to get comfortable with Millstone's structure (see GitHub's main discussion of how to use ssh keys for a primer if you run into key issues). If you followed the default creation guidelines, you likely have an EBS (Elastic Block Store) volume as your main storage. This is already mounted and ready to go.

Place the files that you wish to put on the Millstone server into a single folder for ease of use, and use rsync or scp to transfer the files using the URL and login you found earlier for ssh logins. Briefly, you want something like this:

scp -r -i ~/.ssh/YOUR-KEY /PATH-TO-FOLDER-YOUR-COMPUTER ubuntu@ec2-YOUR-NODE.compute-1.amazonaws.com:/PATH-TO-FOLDER-INSTANCE 

where -r copies the folder recursively, -i specifies the private key to authenticate with, ~/.ssh/YOUR-KEY is the location of your .pem key, /PATH-TO-FOLDER-YOUR-COMPUTER is where you put the files you want Millstone to use, ec2-YOUR-NODE.compute-1.amazonaws.com is the Public DNS name that Amazon gives you, and /PATH-TO-FOLDER-INSTANCE is the folder on the instance where the sequences should go. This may take some time, so go get a coffee and start working on setting up that project.

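We recommend rsync because it is a bit smarter than scp: it skips files that are already up to date on the destination, so an interrupted transfer can be rerun without starting over. Using the same conventions, it should look like this: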

rsync -av --progress -e "ssh -i /FULL-PATH-TO-KEY" /PATH-TO-FOLDER-YOUR-COMPUTER ubuntu@ec2-YOUR-NODE.compute-1.amazonaws.com:/PATH-TO-FOLDER-INSTANCE

Once this is done your files should be on the cloud! Check using ls to make sure everything is there and move on.
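If ls alone doesn't feel like enough of a check, one option is to compare checksums on both ends. The sketch below is an assumption about how you might do that, not part of Millstone: dir_manifest is a hypothetical helper that lists a checksum, size, and path for every file in a folder, so the local and remote listings can be compared with diff.

```shell
#!/bin/sh
# Print "checksum size path" for every file under a directory, sorted by path,
# so listings made on two machines can be compared with diff.
# dir_manifest is a hypothetical helper name.
dir_manifest() {
    # $1: directory to summarize
    ( cd "$1" && find . -type f -exec cksum {} + | LC_ALL=C sort -k 3 )
}

# Usage (placeholders as in the guide). Point the remote cd at the directory
# that actually holds the copied files; scp -r and rsync without a trailing
# slash place the source folder inside the destination.
#   dir_manifest /PATH-TO-FOLDER-YOUR-COMPUTER > local.txt
#   ssh -i ~/.ssh/YOUR-KEY ubuntu@ec2-YOUR-NODE.compute-1.amazonaws.com \
#       "cd /PATH-TO-FOLDER-INSTANCE && find . -type f -exec cksum {} + | LC_ALL=C sort -k 3" > remote.txt
#   diff local.txt remote.txt && echo "transfer verified"
```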

Creating a new project

You will be prompted to give a short name to a project - this is currently unchangeable, so choose wisely! If you get stuck or don't like what you picked, exit back to the main Millstone page and click on your project's new name to delete it.

Once you've picked a name for the project, you'll be asked to set a name for the alignment. The main difference here is that each alignment is paired one-to-one with a reference genome, so name it accordingly. (Current versions lack a next button here to page through the alignment creation steps, but your name for the alignment is saved, and you can advance by clicking along the bar under Create Alignment.)

Add a reference genome to your project by clicking the new button and selecting load file from NCBI. Simply fill in the accession number (of the form NC_XXXXXX.X or similar) and give the reference genome a name. If you'd like to use a custom reference, place the file on the server using scp or similar and give Millstone the absolute server path to your custom file. Check that you've got the right accession number by comparing your genome's size to the number of nucleotides present in the reference genome.

Once that's done, move on to the samples section. After you've moved your samples over to the instance, preferably onto an EBS volume or other stable storage, you can tell Millstone where to find them using the template provided (a less typing-intensive version of this is coming). The targets template can be intimidating, but only three things are needed: a sample name, read_1_path, and read_2_path. Set these by hand (pay careful attention to where you put your files!) and upload the completed template to the server. If all goes well, your samples will be imported into Millstone. If you're coming from Mac or Windows, don't upload straight from your Excel export; open the file in TextWrangler or a similar editor and save it with Unix line breaks. Otherwise the import may fail silently (as of the current build). If it still doesn't upload properly, check the troubleshooting section below for hints and common bugs with the importation process.
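If you'd rather fix line breaks from the command line than from a text editor, something like the following sketch should work. to_unix_newlines is a hypothetical helper name and the file names in the usage comment are placeholders. It handles Windows-style CRLF files and CR-only files, and makes no assumptions about the template's columns.

```shell
#!/bin/sh
# Rewrite a text file with Unix (LF) line breaks.
# to_unix_newlines is a hypothetical helper name.
to_unix_newlines() {
    # $1: input file (e.g. a spreadsheet export), $2: output file
    lf_count=$(( $(tr -cd '\n' < "$1" | wc -c) ))
    if [ "$lf_count" -gt 0 ]; then
        tr -d '\r' < "$1" > "$2"      # CRLF (Windows) -> LF
    else
        tr '\r' '\n' < "$1" > "$2"    # CR-only -> LF
    fi
}

# Usage (file names are placeholders):
#   to_unix_newlines template_export.txt template_unix.txt
```

Upload the converted file rather than the original export.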

Troubleshooting

I can log in via SSH but the web interface doesn't load!

You've probably forgotten to allow access to your instance through web interfaces. This can be fixed by adding the following connections to your security group:

  • All ICMP
  • All TCP
  • All UDP

You can do this by going to the Network & Security -> Security Groups section of the EC2 dashboard and editing the security group that you created for your instance. If you've forgotten which group that is, it's listed in the main instance dashboard on the far right under security groups. Click on it and you should be able to edit inbound rules by right-clicking on the Group ID.

I've managed to load the webpage but get a 502 bad gateway error!

Millstone is probably still loading up; try again in a few minutes. If it still doesn't work, contact Gleb :D.

Help! Registration is closed

This shouldn't be seen once Millstone is open to the public, so contact Gleb or Dan about this. You might have used the pre-release instance.

Millstone just sits there after importing a template file

This could be any number of things. If your template file is formatted correctly, the drive may simply be out of space, so check that there's room on the drive containing Millstone. File formatting is often the biggest problem at this stage, so make sure you've escaped any spaces in file names. Also make sure you're saving your file with Unix line breaks. This seems to be the cause of 90% of silent failures during the template upload process (n=3).
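To rule out the out-of-space case quickly, you can check free space from an ssh session. The sketch below uses standard df and awk; free_gb is a hypothetical helper name, and checking / assumes Millstone lives on the root volume as in the default setup described above.

```shell
#!/bin/sh
# Report free space, in whole GB, on the filesystem holding a given path.
# free_gb is a hypothetical helper name.
free_gb() {
    # df -P gives POSIX-format output; -k reports 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

free_gb /    # free GB on the root volume
```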

I want to make sure everything's going right, where can I find the logs?

By default, the logs are in /var/log/supervisor.
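For a quick look at what's in there, a sketch like this prints the tail of every file in the log directory. recent_logs is a hypothetical helper name. The names of the individual log files aren't assumed here, so it simply loops over whatever it finds.

```shell
#!/bin/sh
# Show the last lines of every regular file in a log directory.
# recent_logs is a hypothetical helper name; file names inside the directory
# are not assumed, so it loops over everything it finds.
recent_logs() {
    # $1: log directory, $2: number of lines per file (default 20)
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        echo "==> $f <=="
        tail -n "${2:-20}" "$f"
    done
}

# Usage on the instance (may need sudo if the directory is not readable):
#   recent_logs /var/log/supervisor 50
```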