Getting Started with Millstone

mnapolitano89 edited this page Apr 10, 2014 · 25 revisions

This guide walks you through cloning the latest stable Amazon Machine Image (AMI) configured with Millstone. Most new users will want to use this guide. Docs for individuals wishing to configure their instance or modify source code are coming soon.

Prerequisites

You need an Amazon Web Services (AWS) account. Brad Chapman's getting started guide for CloudBioLinux has a solid first chapter with instructions on getting everything set up: https://github.com/chapmanb/cloudbiolinux/blob/master/doc/intro/gettingStarted_CloudBioLinux.pdf?raw=true

Cloning the AMI

  1. In the EC2 console, navigate to Instances, then press the Launch Instance button. This will launch a wizard that walks you through setting up a new instance. The following steps provide instructions for configuring this instance.

  2. In the new instance wizard, choose the latest Millstone AMI. Make sure you're in the N. Virginia region (upper right dropdown).

  3. On the 'Choose instance type' tab, select an instance according to your needs. We recommend m3.medium (select General Purpose on the left).

  4. In 'Configure instance', the only setting we recommend changing is the Availability Zone: set it explicitly (we always use us-east-1a). EBS volumes (Amazon's hard drives) can only be attached to instances in the same zone, so keeping everything in one zone makes things easier.

  5. In 'Add storage', increase the size of the root drive to the amount of space that you'll need. For bacterial genomes, about 2 GB per sample should be more than enough (e.g. 100 samples = 200 GB).

  6. In 'Tag instance', fill in an informative value for the 'Name' key. I like the name to include the date it was created and a description of what the instance is running (e.g. 2014_04_01_mutate_all_the_things).

  7. For security group, configure a group appropriate to your needs. Most users will want to create a security group with all of the following open (NOTE: This will make your instance publicly visible, but login is still required.):

    • All ICMP
    • All TCP
    • All UDP
    • SSH

  8. Continue to the final tab, where you'll press 'Launch the instance'. Select or create a key pair. If you create one, download and save the private key. (NOTE: If you lose the private key, there's no simple way to ssh back into your instance. The easiest fix is to terminate it and create a new one.)

It takes about 5-10 minutes for the instance to launch and all bootstrapping to finish, after which your Millstone is ready to grind!
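
The sizing rule from step 5 is simple arithmetic. A minimal sketch, assuming the 2 GB per bacterial sample estimate given above (the function name is just for illustration):

```shell
# root_volume_gb SAMPLES [GB_PER_SAMPLE]
# Estimate the storage to request in 'Add storage'.
# The default of 2 GB per sample is the bacterial-genome estimate from step 5.
root_volume_gb() {
    echo $(( $1 * ${2:-2} ))
}

root_volume_gb 100   # prints 200
```

Pass a second argument if your samples are larger, e.g. root_volume_gb 50 4.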

Accessing your instance

From the web, visit ec2-xx-xx-xx-xx.compute-1.amazonaws.com (replacing the x's with your instance's address). To find the address, go to "Instances" in the EC2 console, select the instance you created, and look under Public DNS. Your instance may take some time to initialize; wait until all status checks have completed before attempting to log in.

To ssh in:

ssh -i ~/.ssh/your-key.pem ubuntu@ec2-xx-xx-xx-xx.compute-1.amazonaws.com

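(If ssh refuses the key because its permissions are too open, restrict them to the owner, e.g. chmod 700 ~/.ssh/your-key.pem. The key only needs to be readable by you, so chmod 400 also works.)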
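
If you get tired of typing the key path and hostname, an ssh config entry can shorten the ssh, scp, and rsync commands. This is only a sketch: the alias "millstone" and the helper function name are our own inventions, and the ubuntu user is taken from the ssh command above.

```shell
# ssh_config_entry ALIAS HOSTNAME KEYFILE
# Print an ssh config stanza for the instance. ALIAS is a nickname of your
# choosing (hypothetical); the login user ubuntu matches the ssh command above.
ssh_config_entry() {
    printf 'Host %s\n' "$1"
    printf '    HostName %s\n' "$2"
    printf '    User ubuntu\n'
    printf '    IdentityFile %s\n' "$3"
}

# Example: append the stanza to your ssh config, then log in with the alias.
# ssh_config_entry millstone ec2-xx-xx-xx-xx.compute-1.amazonaws.com ~/.ssh/your-key.pem >> ~/.ssh/config
# ssh millstone
```

The function only prints the stanza, so you can inspect it before appending it to ~/.ssh/config.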

Initial Millstone start

(Account creation does not exist yet. This section is written against the current build, which ships with a prebuilt admin account that you can log into and use.)

Confirm that Millstone is installed and working properly by going to the web page you found above. You should be greeted by a page that asks you to create a project. Click 'create a project', or log in (upper right) to the admin account that comes with a fresh Millstone install. Logging in returns you to the main page, where you can select 'create a new project'.

Moving your data over

Once you've confirmed that Millstone is working properly, ssh into your server to confirm that your key pair works and to get comfortable with Millstone's structure (see GitHub's main discussion of how to use ssh keys for a primer if you run into key issues). If you followed the default creation guidelines, you likely have an Elastic Block Store (EBS) volume created along with your main storage. We recommend placing your sequence storage on the EBS volume for future use and stability reasons. As Millstone is Linux based, follow Amazon's instructions on how to mount an EBS volume for use (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html). Usually the volume will be named xvdb, but this may be different for your instance. Once your drive is mounted, create a new folder for your sequences to reside in (here, Seq). In this case, we have a folder at /mnt/xvdb/Seq.
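
As a rough sketch of the mount step, the commands below would produce the /mnt/xvdb/Seq folder used in this guide. Several things here are assumptions rather than Millstone specifics: the device name /dev/xvdb, the mount point, the chown to the ubuntu login user, and the helper names. The script only prints the commands unless you set DRY_RUN=0, so you can check them against Amazon's instructions first.

```shell
# Assumptions: the data volume shows up as /dev/xvdb (confirm with lsblk) and
# already holds a filesystem. A brand-new empty volume must be formatted first
# (e.g. sudo mkfs -t ext4 DEVICE), which erases anything on it.
DEVICE="${DEVICE:-/dev/xvdb}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/xvdb}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print a command, and execute it only when DRY_RUN is 0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Mount the volume, create the Seq folder, and hand it to the login user
# so that scp or rsync can write into it.
mount_data_volume() {
    run sudo mkdir -p "$MOUNT_POINT" &&
    run sudo mount "$DEVICE" "$MOUNT_POINT" &&
    run sudo mkdir -p "$MOUNT_POINT/Seq" &&
    run sudo chown ubuntu:ubuntu "$MOUNT_POINT/Seq"
}

mount_data_volume
```

Run it on the instance, not on your own computer. A mount made this way does not survive a reboot; Amazon's guide covers making it permanent.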

Place the files that you wish to put on the Millstone server into a single folder for ease of use, and use rsync or scp to transfer them using the URL and login you found earlier for ssh logins. Briefly, you want something like this:

scp -r -i ~/.ssh/YOUR-KEY /PATH-TO-FOLDER-YOUR-COMPUTER ubuntu@ec2-YOUR-NODE.compute-1.amazonaws.com:/PATH-TO-FOLDER-INSTANCE 

where -r copies a folder recursively, -i gives the key to authenticate with, ~/.ssh/YOUR-KEY is the location of your .pem key, /PATH-TO-FOLDER-YOUR-COMPUTER is where you put the files Millstone will use, YOUR-NODE is the address that Amazon gives you, and /PATH-TO-FOLDER-INSTANCE is the folder where we want our sequences to go. This may take some time, so go get a coffee and start working on setting up that project.

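We recommend rsync, as it is a bit less dumb than scp: it skips files that are already up to date on the destination, so an interrupted transfer can be rerun without recopying files that already arrived. Using the same conventions, it should look like this: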

rsync -av --progress -e "ssh -i /FULL-PATH-TO-KEY" /PATH-TO-FOLDER-YOUR-COMPUTER ubuntu@ec2-YOUR-NODE.compute-1.amazonaws.com:/PATH-TO-FOLDER-INSTANCE

Once this is done your files should be on the cloud! Check using ls to make sure everything is there and move on.
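
If you'd like something stronger than eyeballing ls, one option is to compare checksums on both sides. The sketch below is an assumption of ours, not part of Millstone: it uses the standard cksum utility, and the function name is made up for illustration.

```shell
# make_manifest DIR
# Print "checksum size path" for every file under DIR, sorted by path.
# Paths are relative to DIR, so manifests from two machines are comparable.
make_manifest() {
    ( cd "$1" && find . -type f -exec cksum {} + | sort -k 3 )
}

# Example, using the placeholders from the transfer commands above:
# make_manifest /PATH-TO-FOLDER-YOUR-COMPUTER > local.txt
# Then define the same function on the instance, run it on the copied folder,
# save the output as remote.txt, and compare:
# diff local.txt remote.txt && echo "transfer verified"
```

Both scp -r and rsync (without a trailing slash on the source) create your folder inside the destination, so point make_manifest at the copied folder itself on the instance.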

Creating a new project

You will then be prompted to give your project a short name. This is currently unchangeable, so choose wisely! If you get stuck or don't like what you picked, go back to the main Millstone page and click on your project's new name to delete it.

Once you've picked a name for the project, you'll be asked to name the alignment. The main difference is that an alignment is paired one-to-one with a reference genome, so name it accordingly. (Current versions lack a next button to page through the alignment creation steps, but your alignment name is saved, and you can advance by clicking along the bar under Create Alignment.)

Add a reference genome to your project by clicking the new button and selecting 'load file from NCBI'. Simply fill in the accession number (of the form NC_XXXXXX.X or similar) and give the reference genome a name. If you'd like to use a custom reference, place the file on the server using scp or similar and give Millstone the absolute server path to your custom file. Check that you have the right accession number by comparing your genome's expected size to the number of nucleotides in the reference genome.
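
For a custom reference, you can count the nucleotides yourself before comparing sizes. A minimal sketch, assuming the file is in FASTA format (the function name is made up for illustration):

```shell
# fasta_length FILE
# Count sequence characters in a FASTA file, ignoring header lines
# (those starting with '>') and line breaks.
fasta_length() {
    grep -v '^>' "$1" | tr -d '\r\n' | wc -c | tr -d ' '
}

# Example: fasta_length /mnt/xvdb/Seq/my_reference.fasta
```

The count covers every sequence in the file, so a reference with plasmids or multiple contigs reports the combined length.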

Once that's done, move on to the samples section. After you've moved your samples over to the instance, preferably onto an EBS volume or other stable storage, you can tell Millstone where to find them using the template provided (a less typing-intensive version of this is coming). The targets template can be intimidating, but only three things are needed: a sample name, read_1_path, and read_2_path. Set these by hand (pay careful attention to where you put your files!) and upload the completed template to the server. If all goes well, your samples will be imported into Millstone. If not, check the troubleshooting section below for hints and common bugs with the import process.
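
Until the less typing-intensive version arrives, a small script can draft the rows for you. This sketch rests on assumptions that you should check against the template you download: that the columns are tab-separated in the order sample name, read_1_path, read_2_path, and that your reads are named SAMPLE_R1.fastq and SAMPLE_R2.fastq. It prints rows only, with no header, so you can paste them under the template's own header line.

```shell
# make_targets DIR
# Print one tab-separated row (sample name, read 1 path, read 2 path) for each
# read pair in DIR. The _R1.fastq / _R2.fastq naming and the column order are
# assumptions, not something Millstone requires.
make_targets() {
    for r1 in "$1"/*_R1.fastq; do
        [ -e "$r1" ] || continue              # no matching files in DIR
        r2="${r1%_R1.fastq}_R2.fastq"
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no matching read 2 file" >&2
            continue
        fi
        sample=$(basename "$r1" _R1.fastq)
        printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
    done
}

# Example: make_targets /mnt/xvdb/Seq > target_rows.tsv
```

Pass an absolute folder path so that the paths written into the rows are absolute as well.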

Troubleshooting

I can log in via SSH but the web interface doesn't load!

You've probably forgotten to allow web access to your instance. This can be fixed by adding the following inbound rules to your security group:

  • All ICMP
  • All TCP
  • All UDP

You can do this by going to the Network & Security -> Security Groups section of the EC2 dashboard and editing the security group that you created for your instance. If you've forgotten which one it is, it's listed in the main instance dashboard on the far right under Security Groups. Click on it, and you should be able to edit inbound rules by right-clicking on the Group ID.

I've managed to load the webpage but get a 502 bad gateway error!

Millstone is probably still loading; try again in a few minutes. If it still doesn't work, contact Gleb :D.
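
Rather than reloading the page by hand, you could poll it from a terminal. This is only a sketch: the function name, retry count, and delay are arbitrary choices of ours, and it assumes curl is installed on your computer. It also helps tell the two problems above apart: no response at all suggests a blocked port or a stopped instance, while a 502 suggests Millstone is still starting.

```shell
# wait_for_millstone URL [TRIES] [DELAY_SECONDS]
# Poll URL until it returns something other than 502 or no response.
# curl's -w '%{http_code}' prints the HTTP status (000 when nothing answered).
wait_for_millstone() {
    url=$1
    tries=${2:-20}
    delay=${3:-30}
    i=1
    while [ "$i" -le "$tries" ]; do
        code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$url" || true)
        case "$code" in
            000) echo "try $i: no response (port blocked or instance down?)" ;;
            502) echo "try $i: 502 Bad Gateway (Millstone still starting?)" ;;
            *)   echo "try $i: HTTP $code"; return 0 ;;
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example: wait_for_millstone http://ec2-xx-xx-xx-xx.compute-1.amazonaws.com/
```

With the defaults it gives up after about ten minutes of waiting, which matches the launch time quoted earlier in this guide.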

Help! Registration is closed

This shouldn't appear once Millstone is open to the public, so contact Gleb or Dan about it. You might have used the pre-release instance.