PutPlace is hosted on Amazon’s grid, EC2 (Elastic Compute Cloud: C squared, geddit!). We store all our user data on Amazon’s storage service, S3 (Simple Storage Service), as they can offer us unlimited secure storage at a wholesale price of around $0.10 per GB per month. They also make it very cheap to move data between our grid and our S3 store.
Each EC2 node comes with 250GB of local storage, but that storage springs to life when the node is created and disappears when the node is shut down or crashes (although we have only had one node die on us in the 12 months we have been using EC2).
This is okay for user data, e.g. the files you back up, as we don’t mark those as secure until they have been written to stable storage on S3. Unfortunately, it doesn’t work very well for our database (Postgres), which expects to have stable local storage directly attached to the node and visible as a local disk device. So until now we have had to bake in a bunch of safety code to ensure that if the database node crashed we could recover sensibly and quickly.
However, this week Amazon announced Elastic Block Store (EBS). EBS combines the safety of S3 with the utility of a local disk. You can create an EBS volume of up to 1 terabyte in size and attach it to any Amazon EC2 instance. It just looks like a local disk to that node, but if the EC2 instance dies the disk survives.
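For anyone who wants to see what that looks like in practice, here is a minimal sketch of creating a volume and attaching it to a running node. It assumes a boto3-style EC2 client (boto3 is today’s AWS SDK for Python, not the tooling available when EBS launched), and the helper name provision_volume, the default device name and the IDs in the usage note are all made up for illustration. One detail the sketch makes explicit is that a volume has to be created in the same availability zone as the node it will be attached to.

```python
def provision_volume(ec2, instance_id, zone, size_gib, device="/dev/sdf"):
    """Create an EBS volume and attach it to a running instance.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the ID of the new volume.
    """
    # The volume must live in the same availability zone as the instance.
    volume = ec2.create_volume(AvailabilityZone=zone, Size=size_gib)
    volume_id = volume["VolumeId"]

    # Block until the new volume is ready to be attached.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # After this call the volume shows up on the node as a block device.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id
```

With real credentials this would be called as something like provision_volume(boto3.client("ec2"), "i-0123456789abcdef0", "us-east-1a", 100), where the instance ID and zone are placeholders. The new disk still needs a filesystem and a mount point before the database can use it.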
So we can now attach two EBS volumes and store our log and data on two stable devices, either of which can be used to recover the other.
It gets better though. You can take snapshots of your disk and write them to S3. These snapshots can be used to back up your disk or to copy it to a new EBS volume. Better still, when you create a new volume from a snapshot the data can be loaded lazily, so you don’t have to wait for a whole terabyte to stream into the volume before you can use it.
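To make the lazy loading idea concrete, here is a toy model of it. This is not Amazon’s implementation, just a sketch of the general technique: the volume starts out empty and pulls each block from the snapshot the first time that block is read. The class name, the block layout and the dict standing in for the snapshot in S3 are all assumptions for illustration.

```python
class LazyVolume:
    """Toy model of a volume restored lazily from a snapshot."""

    def __init__(self, snapshot):
        self._snapshot = snapshot   # block number -> bytes, stands in for S3
        self._local = {}            # blocks already copied onto the volume
        self.fetches = 0            # how many blocks were pulled from the snapshot

    def read(self, block_no):
        # Fetch from the snapshot only on first access; later reads are local.
        if block_no not in self._local:
            self._local[block_no] = self._snapshot[block_no]
            self.fetches += 1
        return self._local[block_no]

    def write(self, block_no, data):
        # A write replaces the block locally; the snapshot is never modified.
        self._local[block_no] = data


snapshot = {0: b"boot", 1: b"data", 2: b"logs"}
volume = LazyVolume(snapshot)      # usable immediately, nothing copied yet
first = volume.read(1)             # pulls block 1 from the snapshot
again = volume.read(1)             # served locally, no second fetch
```

The trade-off in a scheme like this is that the first read of each block is slower than later ones, in exchange for the volume being usable straight away.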
So what’s the catch? Why wouldn’t you just use EBS for everything and ignore S3? Well, for one thing you have to allocate all the space on an EBS disk at once, so you pay up front for the storage as opposed to paying for it as you use it in the S3 case. The other problem is that each EBS volume is tied to a single EC2 node, so if you want to share content between nodes you need to use something like S3 and/or SQS (Simple Queue Service) to provide shared storage.
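To put the first catch in numbers, here is a rough sketch comparing a bill based on allocated space with one based on space actually used. The $0.10 per GB per month figure is the S3 price quoted earlier in this post; applying the same rate to EBS is purely an assumption to keep the comparison simple, and the volume and usage sizes are hypothetical. Check Amazon’s price list before relying on any of it.

```python
def monthly_cost(gb, price_per_gb_month):
    """Storage bill for one month, in dollars."""
    return gb * price_per_gb_month


PRICE = 0.10          # $/GB/month; S3 figure from this post, assumed for EBS too
ALLOCATED_GB = 1000   # hypothetical: a 1 TB EBS volume allocated up front
USED_GB = 200         # hypothetical: data actually stored so far

ebs_bill = monthly_cost(ALLOCATED_GB, PRICE)   # pay for everything allocated
s3_bill = monthly_cost(USED_GB, PRICE)         # pay only for what is stored
print(f"allocated: ${ebs_bill:.2f}/month, used: ${s3_bill:.2f}/month")
```

At equal per-GB rates, the gap between the two bills is simply the price of the space you have allocated but not yet filled.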
A big step in the right direction for Amazon though, and something we have been asking for for quite a while.