Getting Started with Millstone
This guide walks you through cloning the latest stable Amazon Machine Image (AMI) configured with Millstone. Most new users will want to use this guide. Docs for individuals wishing to configure their instance or modify source code are coming soon.
You need to create an Amazon Web Services (AWS) account. Brad Chapman's getting started guide for cloudbiolinux has a solid first chapter with instructions on getting everything set up: https://github.com/chapmanb/cloudbiolinux/blob/master/doc/intro/gettingStarted_CloudBioLinux.pdf?raw=true
-
In the EC2 console, navigate to Instances, then press the Launch Instance button. This will launch a wizard that walks you through setting up a new instance. The following steps provide instructions for configuring this instance.
-
In the new instance wizard, choose the latest Millstone AMI. Make sure you're in the N. Virginia region (upper right dropdown).
-
On the 'Choose instance type' tab, select an instance according to your needs. We recommend m3.medium (select General Purpose on the left).
-
In 'Configure instance', the only setting we recommend changing is explicitly setting the Availability Zone (we always use us-east-1a). You can only move EBS volumes (Amazon hard drives) between instances in the same zone, so it's easier to consistently create everything in the same zone.
-
In 'Add storage', increase the size of the root drive to the amount of space that you'll need. For bacterial genomes, about 2 GB per sample should be more than enough (e.g. 100 samples = 200 GB).
-
In 'Tag instance', fill in an informative value for the 'Name' key. I like the name to include the date it was created and a description of what the instance is running (e.g. 2014_04_01_mutate_all_the_things).
-
For security group, configure a group appropriate to your needs. Most users will want to create a security group with all of the following open (NOTE: This will make your instance publicly visible, but login is still required.):
- All ICMP
- All TCP
- All UDP
- SSH
-
Continue to the final tab where you'll press 'Launch the instance'. Select or create a key. If you create the key, download and save the private key. (NOTE: If you lose the private key there's no way to ssh back into your instance. You'll have to terminate it and create a new one.)
It takes about 5-10 minutes for the instance to launch and all bootstrapping to finish, after which your Millstone is ready to grind!
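As a rough sizing aid for the 'Add storage' step, the 2 GB per sample rule of thumb can be turned into a small shell helper. This is only a sketch: the function name `estimate_storage_gb` is made up for illustration, and the default of 2 GB per sample is just the bacterial-genome estimate given above, so adjust it for larger genomes.

```shell
# Hypothetical helper: estimate the root volume size (in GB) to request.
# $1 = number of samples
# $2 = GB per sample (optional; defaults to the guide's 2 GB rule of thumb)
estimate_storage_gb() {
    samples=$1
    gb_per_sample=${2:-2}
    echo $(( samples * gb_per_sample ))
}

estimate_storage_gb 100   # prints 200
```

The result covers sample data only, so you may want to leave some headroom on top of it.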
From the web, visit: ec2-xx-xx-xx-xx.compute-1.amazonaws.com (replacing the x's). You can find this address by going to "Instances", selecting the instance you created, and looking under Public DNS. It may take some time for your instance to initialize; wait until all status checks are completed before attempting to log in.
To ssh in:
ssh -i ~/.ssh/your-key.pem ubuntu@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
(If ssh rejects the key because its permissions are too open, restrict them with chmod, e.g. chmod 600 ~/.ssh/your-key.pem)
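If you are unsure whether your key's permissions are the problem, a check along these lines may help. It is a minimal sketch: the function names `key_is_private` and `fix_key_permissions` are hypothetical, and it relies on ssh refusing private keys that are accessible to group or other users.

```shell
# Succeeds only if the file grants no permissions to group or others.
# Characters 5-10 of the `ls -l` mode string are the group/other bits.
key_is_private() {
    perms=$(ls -ld "$1" | cut -c5-10)
    [ "$perms" = "------" ]
}

# Restrict the key to owner read/write, then confirm the result.
fix_key_permissions() {
    chmod 600 "$1" && key_is_private "$1"
}
```

For example, `fix_key_permissions ~/.ssh/your-key.pem` should return success, after which the ssh command above can be retried.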
(Account creation is not yet available. This guide was written against the current build, which ships with a prebuilt admin account that you can log in to and use.)
Confirm that Millstone is installed and working properly by going to the web page you found above. You should be greeted with a webpage that asks you to create a project. Click create a project or log in (upper right) to the admin account that comes with a fresh Millstone install. This will put you back to the main page where you can select create a new project.
Once you've confirmed that Millstone is working properly, ssh into your server to confirm that your key pair is working properly and that you're comfortable with Millstone's structure (see GitHub's main discussion of how to use ssh keys for a primer if you run into key issues). If you followed the default creation guidelines, you likely have an EBS (Elastic Block Store) volume as your main storage. This is already mounted and ready to go.
Place the files that you wish to put on the Millstone server into a single folder for ease of use, and use rsync or scp to transfer the files using the URL and login you found earlier for ssh logins. Briefly, you want something like this:
scp -r -i ~/.ssh/YOUR-KEY /PATH-TO-FOLDER-YOUR-COMPUTER ubuntu@ec2-YOUR-NODE.compute-1.amazonaws.com:/PATH-TO-FOLDER-INSTANCE
where -r indicates that we want to copy a whole folder, -i indicates that we want to authenticate with a key, ~/.ssh/YOUR-KEY is the location of your .pem key, /PATH-TO-FOLDER-YOUR-COMPUTER is where you put the files you want Millstone to use, YOUR-NODE is the address that Amazon gives you, and /PATH-TO-FOLDER-INSTANCE is the location of the folder where we want our sequences to go. This may take some time, so go get a coffee and start working on setting up that project.
We recommend you use rsync, as it is a bit smarter than scp: it skips files that have already been transferred, so an interrupted copy can be restarted without starting over. Using the same conventions, it should look like this:
rsync -rav -e "ssh -i /FULL-PATH-TO-KEY" /PATH-TO-FOLDER-YOUR-COMPUTER ubuntu@ec2-YOUR-NODE.compute-1.amazonaws.com:/PATH-TO-FOLDER-INSTANCE --progress
Once this is done your files should be on the cloud! Check using ls to make sure everything is there and move on.
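If ls alone doesn't feel like enough reassurance, one option is to compare checksums on both sides. The sketch below is an assumption about how you might do that, not part of Millstone: `manifest` is a hypothetical helper that prints a checksum, size, and relative path for every file in a folder, using the standard `find`, `cksum`, and `sort` tools.

```shell
# Hypothetical helper: print "checksum size ./relative/path" for every
# file under directory $1, sorted by path so two manifests can be diffed.
manifest() {
    (cd "$1" && find . -type f -exec cksum {} + | LC_ALL=C sort -k 3)
}

# Usage sketch (placeholders as in the scp/rsync commands above):
#   manifest /PATH-TO-FOLDER-YOUR-COMPUTER > local.txt
#   ssh -i ~/.ssh/YOUR-KEY ubuntu@ec2-YOUR-NODE.compute-1.amazonaws.com \
#       "cd /PATH-TO-FOLDER-INSTANCE && find . -type f -exec cksum {} + | LC_ALL=C sort -k 3" > remote.txt
#   diff local.txt remote.txt && echo "transfer looks complete"
```

An empty diff suggests every file arrived intact; any line that differs points at a missing or corrupted file.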
You will then be prompted to give a short name to a project - this is currently unchangeable, so choose wisely! If you get stuck or don't like what you picked, exit back to the main Millstone page and click on your project's new name to delete it.
Once you've picked a name for the project, you'll be asked to set a name for the alignment. The main difference here is that an alignment consists of a one-to-one pair with a reference genome - name accordingly. (Current versions lack a next button here to page through the alignment creation steps, but your name for the alignment is saved, and can be advanced by clicking along the bar under Create Alignment.)
Add a reference genome to your project by clicking the new button and selecting load file from NCBI. Simply fill in the accession number (of the form NC_XXXXXX.X or similar) and give the reference genome a name. If you'd like to use a custom reference, place the file on the server using scp or similar and give Millstone the absolute server path to your custom file. Check that you've got the right accession number by comparing your genome's size to the number of nucleotides present in the reference genome.
Once that's done, move on to the samples section. After you've moved your samples over to the instance, preferably onto an EBS volume or other stable storage, you can tell Millstone where to find them using the template provided (a less typing-intensive version of this is coming). The targets template can be intimidating, but only three things are needed: a sample name, read_1_path, and read_2_path. Set these by hand (pay careful attention to where you put your files!) and upload the completed template to the server. If all goes well, your samples should be imported into Millstone. If you're coming from Mac or Windows, don't upload straight from your Excel export; open the file in TextWrangler or similar and use Save As, making sure to export with Unix line breaks. Otherwise the import may fail silently (as of the current build). If it still doesn't upload properly, check the troubleshooting section below for hints and common bugs with the importation process.
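If you'd rather fix the line breaks from a terminal than from a text editor, something like the following should work. This is a minimal sketch, not a Millstone tool: `to_unix_line_breaks` is a hypothetical helper built on awk, and the file names in the usage comment are placeholders.

```shell
# Hypothetical helper: write file $1 to stdout with Unix (LF) line breaks.
# Windows-style CRLF endings lose their trailing CR, and any remaining
# lone CRs (old Mac-style endings) are turned into LF.
to_unix_line_breaks() {
    awk '{ sub(/\r$/, ""); gsub(/\r/, "\n"); print }' "$1"
}

# Usage sketch (placeholder file names):
#   to_unix_line_breaks targets_from_excel.tsv > targets_unix.tsv
```

Upload the converted copy rather than the original export.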
I can log in via SSH but the web interface doesn't load!
You've probably forgotten to allow access to your instance through web interfaces. This can be fixed by adding the following connections to your security group:
- All ICMP
- All TCP
- All UDP
You can do this by going to the Network & Security -> Security Groups section of the EC2 dashboard and editing the security group that you created for your instance. If you've forgotten which group that is, it is listed in the main instance dashboard on the far right under Security Groups. Click on it and you should be able to edit the inbound rules by right-clicking on the Group ID.
I've managed to load the webpage but get a 502 bad gateway error!
Millstone is probably still loading up; try again in a few minutes. If it still doesn't work, contact Gleb :D.
Help! Registration is closed
This shouldn't be seen once Millstone is open to the public, so contact Gleb or Dan about this. You might have used the pre-release instance.
Millstone just sits there after importing a template file
This could be any number of things. If your template file is formatted correctly, the drive could be completely out of space, so check that you've got room on the drive containing Millstone. File formatting is often the biggest problem at this stage, so be careful that you've escaped spaces in file names. Also, make sure you're saving your file with Unix line breaks. This seems to be the cause of 90% of silent failures during the template upload process (n=3).
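To rule out the out-of-space case, you can check free space from an ssh session. The helper below is a sketch under assumptions: `free_kb` is a hypothetical name, and you need to pass it the directory where your Millstone data actually lives, which depends on your setup.

```shell
# Hypothetical helper: print the free space, in kilobytes, on the
# filesystem that holds directory $1 (defaults to the current directory).
# `df -P` forces the one-line-per-filesystem POSIX output format.
free_kb() {
    df -Pk "${1:-.}" | awk 'NR == 2 { print $4 }'
}

free_kb /   # free kilobytes on the root drive
```

For a human-readable overview of every mounted drive, `df -h` shows the same information.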
I want to make sure everything's going right, where can I find the logs?
By default, the logs are in /var/log/supervisor.
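To skim the most recent entries in every log at once, a small loop like the one below may be handy. It is only a sketch: `show_recent_logs` is a hypothetical helper, the individual log file names are not assumed, and depending on how your instance is set up you may need sudo to read the directory.

```shell
# Hypothetical helper: print the last few lines of each log file.
# $1 = log directory (defaults to the location given above)
# $2 = number of lines per file (defaults to 20)
show_recent_logs() {
    log_dir=${1:-/var/log/supervisor}
    lines=${2:-20}
    for f in "$log_dir"/*; do
        [ -f "$f" ] || continue      # skip subdirectories and empty globs
        echo "==> $f <=="
        tail -n "$lines" "$f"
    done
}
```

To watch a single log live while you retry an import, `tail -f` on that file works too.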