Capacity Planning

Galaxy CloudMan Capacity Planning for Amazon Web Services
http://aws.amazon.com/

This page offers advice on how much cloud infrastructure you will need to run your Galaxy instance on Amazon Web Services (AWS). See the general capacity planning page for advice that applies across different cloud infrastructures.

Amazon Web Services

CloudMan was initially developed for the Amazon Web Services (AWS) cloud platform. Before we cover AWS, we'll need to introduce some terminology:

Terminology

EC2

Amazon's Elastic Compute Cloud (EC2) provides the compute part of their cloud. The number of CPUs and the amount of memory an instance has are determined by that instance's EC2 Instance Type.

EBS

Amazon's Elastic Block Store (EBS) provides virtual disk drives for EC2 instances.

S3

Amazon's Simple Storage Service (S3) is "storage for the internet." It provides a web services interface to net-accessible storage. Galaxy cloud instances do not use it at runtime, but it can be used to create archives of EBS virtual disks.

How Much EC2?

Which EC2 instance type(s) should you use for your Galaxy?

EC2 Recommendations

Scenario                 Head                                          Worker
1: Light usage           Standard Large or Extra Large                 Standard Large or Extra Large
2: Occasional heavy      High-Memory Double or Quadruple Extra Large   High-Memory Extra Large
3: Continuous variable   High-Memory Double or Quadruple Extra Large   High-Memory Extra Large
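
The recommendations in the table above can be written down as a small lookup, which may be handy in a provisioning script. This is only an illustrative sketch: the dictionary layout and the `recommend` function are hypothetical names, not part of CloudMan, and the strings are the table's wording rather than AWS API instance type identifiers.

```python
# EC2 recommendations transcribed from the table above.
# The structure and names here are illustrative assumptions, not a CloudMan API.
RECOMMENDATIONS = {
    1: {
        "scenario": "Light usage",
        "head": "Standard Large or Extra Large",
        "worker": "Standard Large or Extra Large",
    },
    2: {
        "scenario": "Occasional heavy",
        "head": "High-Memory Double or Quadruple Extra Large",
        "worker": "High-Memory Extra Large",
    },
    3: {
        "scenario": "Continuous variable",
        "head": "High-Memory Double or Quadruple Extra Large",
        "worker": "High-Memory Extra Large",
    },
}


def recommend(scenario: int, role: str) -> str:
    """Return the recommended instance type for a scenario number and node role."""
    role = role.lower()
    if scenario not in RECOMMENDATIONS:
        raise ValueError(f"unknown scenario: {scenario!r}")
    if role not in ("head", "worker"):
        raise ValueError(f"role must be 'head' or 'worker', got {role!r}")
    return RECOMMENDATIONS[scenario][role]


if __name__ == "__main__":
    for number, entry in RECOMMENDATIONS.items():
        print(f"{number}: {entry['scenario']}")
        print(f"   head:   {recommend(number, 'head')}")
        print(f"   worker: {recommend(number, 'worker')}")
```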

EC2 Instance Type Comments

Instance Type                       Scenario 1  Scenario 2  Scenario 3  Comments
                                    H     W     H     W     H     W
Micro                               N     N     N     N     N     N     Galaxy may come up on these instances, but it can't run any analysis.
Small
Medium
Large                                           N     N     N     N     Recommended for Scenario 1: Light Usage, head and worker nodes.
Extra Large
High-Memory Extra Large             N     N                             Recommended for Scenarios 2 & 3: heavy or variable usage head nodes.
High-Memory Double Extra Large                                          Recommended head node for heavy/variable usage (Scenarios 2 & 3).
High-Memory Quadruple Extra Large                                       The Galaxy Team uses this head node in workshops that run TopHat.
                                                                        It can support ~30 concurrent TopHat jobs without significant
                                                                        slowdown, whereas the Double Extra Large option gets bogged down.
Compute Cluster (Any)               X     X     X     X     X     X     These are not supported by CloudMan.
GPU (Any)

H = head node, W = worker node.

Key:

Y  Recommended
N  Not recommended
X  Can't go there

How Much EBS?

Galaxy CloudMan comes with two standard volumes:

  1. Tools Volume (10GB): Contains the tools used by the instance
  2. Indices Volume (700GB): Contains reference data for a number of species

In addition, you will need a data volume to contain the data used by and produced in your analysis. You don't control the size of the tools and indices volumes, but you specify the size of the data volume at setup time. The size of your data volume is determined by the size of your datasets. Unfortunately, we don't have any hard and fast guidelines or multipliers for how much you will need, given the size of your datasets.

For Scenario 1, Light usage, it is fine to specify a large data volume (up to the 1 terabyte max). However, for Scenarios 2 and 3, where the storage may or will exist for a long time, allocating too much storage can incur significant cost. AWS charges by the hour for the storage you allocate, not for the storage you actually use.
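
Because the charge follows allocated rather than used storage, a rough estimate of what a data volume will cost over its lifetime can be sketched as below. This is a minimal sketch under stated assumptions: it assumes the price is quoted per GB-month and prorated by the hour, and the function name, the 730-hour average month, and the example price are hypothetical placeholders. Check the current AWS pricing page for the actual rate in your region.

```python
# Rough cost estimate for an allocated EBS data volume.
# Assumption: the price is quoted per GB-month and prorated by the hour.
HOURS_PER_MONTH = 730  # average month: 8760 hours per year / 12


def allocated_storage_cost(allocated_gb: float, hours: float,
                           price_per_gb_month: float) -> float:
    """Cost of keeping `allocated_gb` allocated for `hours` hours.

    The amount of data actually stored on the volume does not enter
    the calculation; only the allocated size does.
    """
    if allocated_gb < 0 or hours < 0 or price_per_gb_month < 0:
        raise ValueError("all arguments must be non-negative")
    return allocated_gb * price_per_gb_month * hours / HOURS_PER_MONTH


if __name__ == "__main__":
    example_price = 0.10  # hypothetical USD per GB-month, not a quoted AWS rate

    # Scenario 1: a 1000 GB volume kept for a one-week analysis.
    one_week = allocated_storage_cost(1000, 7 * 24, example_price)

    # Scenarios 2 and 3: the same volume kept for a full year.
    one_year = allocated_storage_cost(1000, 365 * 24, example_price)

    print(f"1000 GB for one week: ${one_week:.2f}")
    print(f"1000 GB for one year: ${one_year:.2f}")
```

Running the same calculation with a smaller allocation shows how much a long-lived instance saves by sizing the data volume closer to what the datasets need.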