Well, Amazon has certainly been on a roll recently. Even setting aside the fact that they are one of the largest online retailers, they are also way ahead of the game in terms of utility computing. They started with Amazon Simple Storage Service (more commonly called S3), which is still their most popular service. I even use it on this blog to host a lot of my images since storage and bandwidth are so cheap (15 cents per GB-month of storage and 18 cents per GB of transfer, plus 1 cent per 10,000 GET requests). Hosting images, though, is not what S3 is really for. Amazon S3 is all about generic storage. You create what are called "buckets", and within these buckets you can create a folder structure to hold all of the items you want to store. These items can be public or private, and you can control whether they can be modified, deleted, or added to publicly or only privately. Then you can access these files through APIs in an application or through URLs (which is how I use it to host images). It may sound simple, but there are some small online startups that host all of their data on S3, so for them this is a real game changer. Sites like the popular photo sharing application SmugMug use S3 to host all of their user images, because they know they could never host the images themselves for the low prices that Amazon provides.
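To get a feel for how cheap that is, here is a rough back-of-the-envelope calculator using only the three prices quoted above. The function name and the sample usage numbers are made up for illustration, and it ignores any charges I did not list, so treat it as a sketch rather than a real bill.

```python
# Prices quoted in this post, kept in cents to avoid rounding surprises.
STORAGE_CENTS_PER_GB_MONTH = 15
TRANSFER_CENTS_PER_GB = 18
CENTS_PER_10K_GET_REQUESTS = 1


def monthly_cost_dollars(storage_gb, transfer_gb, get_requests):
    """Estimate one month of S3 cost in dollars from the quoted prices only."""
    cents = (
        storage_gb * STORAGE_CENTS_PER_GB_MONTH
        + transfer_gb * TRANSFER_CENTS_PER_GB
        + get_requests / 10_000 * CENTS_PER_10K_GET_REQUESTS
    )
    return cents / 100


# Hypothetical small blog: 2 GB of images, 20 GB served, 500,000 image requests.
estimate = monthly_cost_dollars(storage_gb=2, transfer_gb=20, get_requests=500_000)
print(f"Estimated monthly cost: ${estimate:.2f}")
```

For that hypothetical blog the three components come to 30 cents of storage, $3.60 of transfer, and 50 cents of requests.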
Next, Amazon released the Simple Queue Service, which is just a distributed message queue, similar to what you would use MSMQ or any other message queuing middleware for. The beauty is that your queues are hosted at Amazon, are accessible from anywhere, and will have virtually 100% uptime since they are hosted in Amazon’s multiple huge datacenters. You would be hard pressed to provide an environment that is more scalable, performant, or reliable than what Amazon has built. Next time you are building a distributed application that needs reliable messaging, maybe you should consider Amazon’s Simple Queue Service. (Did that sound like an advertisement?) Ha.
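If you have never used a message queue, the idea is simply that one part of your application drops messages in and another part picks them up later, without the two talking directly. The sketch below shows that pattern with Python's in-process queue.Queue as a local stand-in. It is not the Simple Queue Service API, and the order names and the None sentinel are just assumptions for the demo.

```python
import queue
import threading


def producer(q, orders):
    """Put each order on the queue, then a sentinel to signal the end."""
    for order in orders:
        q.put(order)
    q.put(None)  # sentinel: tells the consumer there is nothing more to read


def consumer(q, processed):
    """Read messages until the sentinel arrives, 'processing' each one."""
    while True:
        message = q.get()
        if message is None:
            break
        processed.append(message.upper())


def run_demo():
    q = queue.Queue()
    processed = []
    worker = threading.Thread(target=consumer, args=(q, processed))
    worker.start()
    producer(q, ["order-1", "order-2", "order-3"])
    worker.join()
    return processed


print(run_demo())
```

With a hosted queue the producer and consumer could live on different machines entirely, which is the whole point, but the shape of the code stays about the same.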
Next they launched the Amazon Elastic Compute Cloud (usually referred to as EC2), which is a generic on-demand computing platform. It allows you to create and run your own virtual machine instances to run your own custom applications (even web apps), and you only pay for what you use. It is a true utility computing platform, but at this point it is still in beta.
Well, not content to rest on their laurels, Amazon recently announced Amazon SimpleDB. And you guessed it, they are closing the triangle on their cloud computing architecture. They have storage (S3), computing (EC2), and now they have a database (SimpleDB). While SimpleDB is in a limited beta, once it is fully released it will provide developers with a highly available, highly scalable database that they can use from either an EC2 instance or any other application. It will probably make the most sense when accessed from an EC2 instance, since the connection between application and database will be much faster, but you could access the DB from anywhere if, for instance, you had a desktop app that needed to query data from the database remotely.
The one interesting part about SimpleDB, which some people will see as a huge downside (and for some applications it is a huge downside), is that it isn’t a relational database. While most people are used to having to define databases and schemas, in SimpleDB you create domains, and these domains are essentially tables. Then you add items to these domains, and each item has its own attributes. Unlike relational databases, though, you can have multiple values for a particular attribute. To see why this is necessary, consider a t-shirt in an e-commerce database. It might come in three sizes: "small", "medium", and "large". In a relational database you would probably have one table with the shirt and a second table that holds the three sizes, and you would join them on a foreign key to get the sizes for the shirt. Since SimpleDB is not relational, there is no such thing as a join or a foreign key, so if you could not put multiple values into a single attribute you would have to make a bunch of separate calls to get all of this data out.
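Here is the t-shirt example side by side, using plain Python lists and dictionaries as stand-ins for both kinds of database. The table layouts, the item name "shirt-1", and the helper functions are all made up for illustration; neither half is real SQL or the real SimpleDB API.

```python
# Relational style: the shirt and its sizes live in two separate "tables",
# linked by a foreign key (shirt_id).
shirts = [{"id": 1, "name": "Logo Tee"}]
shirt_sizes = [
    {"shirt_id": 1, "size": "small"},
    {"shirt_id": 1, "size": "medium"},
    {"shirt_id": 1, "size": "large"},
]


def sizes_via_join(shirt_id):
    """Emulate a join: find the size rows whose foreign key matches the shirt."""
    return [row["size"] for row in shirt_sizes if row["shirt_id"] == shirt_id]


# SimpleDB style: one item in one domain, with a multi-valued "size" attribute.
domain = {
    "shirt-1": {
        "name": ["Logo Tee"],
        "size": ["small", "medium", "large"],
    }
}


def sizes_via_attribute(item_name):
    """A single lookup returns every size; no second table is involved."""
    return domain[item_name]["size"]


print(sizes_via_join(1))
print(sizes_via_attribute("shirt-1"))
```

Both calls give back the same three sizes, but the second one gets there with a single lookup against a single item.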
So, your item can have any number of attributes with any number of values per attribute. You can think of these attributes as columns. Well, kinda like columns, except that different items in one domain can have different sets of attributes with completely different values. So, your "shirt" item can have attributes called "size" and "price", while a "watch" item could have attributes "style", "color", and "price". You could even have two items with the same attribute name that hold different kinds of values. For example, a shirt would have "small", "medium", and "large" in a "size" attribute, while a belt might have "30 inch", "34 inch", etc. in an attribute called "size". But how does this work? Well, in SimpleDB there are no explicit types; every value is stored as a string. It is called SimpleDB after all.
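To make that a little more concrete, here is a tiny in-memory model of a domain. The Domain class and its put and get methods are hypothetical names of my own and not the SimpleDB API; the sketch only shows that items in one domain need not share attributes and that everything ends up as a string.

```python
class Domain:
    """Toy stand-in for a domain: item name -> attribute -> list of strings."""

    def __init__(self, name):
        self.name = name
        self.items = {}

    def put(self, item_name, attribute, value):
        """Add one value to an attribute, storing it as a string."""
        values = self.items.setdefault(item_name, {}).setdefault(attribute, [])
        text = str(value)  # no explicit types: numbers become strings too
        if text not in values:
            values.append(text)

    def get(self, item_name):
        """Return all attributes of an item (empty dict if it does not exist)."""
        return self.items.get(item_name, {})


def build_catalog():
    catalog = Domain("catalog")
    for size in ("small", "medium", "large"):
        catalog.put("shirt", "size", size)
    catalog.put("shirt", "price", 19.99)
    catalog.put("belt", "size", "30 inch")
    catalog.put("belt", "size", "34 inch")
    catalog.put("watch", "style", "analog")
    catalog.put("watch", "color", "silver")
    catalog.put("watch", "price", 120)
    return catalog


catalog = build_catalog()
for item in ("shirt", "belt", "watch"):
    print(item, catalog.get(item))
```

Notice that the shirt and the belt both use "size" with unrelated values, the watch has no "size" at all, and nothing had to be declared up front.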
Once you have all of this data in a domain you can query it back out, but you can only query one domain at a time. A query can use any of the attributes in a domain, since SimpleDB indexes all of the data that is put into it. You write these queries as "query expressions". Query expressions are a proprietary string format that Amazon came up with to allow you to easily produce simple queries. They allow for the operators "starts-with", "=", "<", ">", "<=", ">=", and "!=", which are all fairly obvious to those using C-style languages. Predicates are grouped by square brackets "[" and "]", and a simple example of one would be "['Size' = 'Large']". This would match any item in the domain with a size attribute equal to large. Then if we wanted to have "Size" "Large" and "Color" "Green" we would write it as "['Size' = 'Large'] intersect ['Color' = 'Green']". The "intersect" operator tells SimpleDB to pull back all items that match both predicates. So "['Size' = 'Large'] intersect ['Price' > '10.00']" would pull back all large items with prices higher than 10. Pretty easy, huh? If we wanted to pull back two disparate sets then we could say "['Size' = 'Large'] union ['Size' = 'Small']" and this would match all items that are "Large" or "Small". There are a few other simple options for the queries, but not too many, so you can see that this really is a simple query language. I’ll certainly get into these queries a lot more as Amazon opens up their beta!
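One way to think about these expressions is that each bracketed predicate produces a set of matching items, and "intersect" and "union" are ordinary set operations on those sets. The sketch below evaluates the examples above against a made-up in-memory domain. The match helper and the sample items are my own assumptions, not Amazon's implementation, and it does not parse the query strings themselves. Because values are compared as strings, the sample prices are zero-padded so that string order matches numeric order.

```python
import operator

# Comparison operators named in the post; every value is compared as a string.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "starts-with": str.startswith,
}

# Made-up sample domain: item name -> attribute -> list of string values.
DOMAIN = {
    "shirt-1": {"Size": ["Large"], "Color": ["Green"], "Price": ["012.50"]},
    "shirt-2": {"Size": ["Large"], "Color": ["Red"], "Price": ["008.00"]},
    "shirt-3": {"Size": ["Small"], "Color": ["Green"], "Price": ["015.00"]},
}


def match(domain, attribute, op, value):
    """Return the names of items where any value of the attribute satisfies op."""
    compare = OPERATORS[op]
    return {
        name
        for name, attributes in domain.items()
        if any(compare(stored, value) for stored in attributes.get(attribute, []))
    }


large = match(DOMAIN, "Size", "=", "Large")
small = match(DOMAIN, "Size", "=", "Small")
green = match(DOMAIN, "Color", "=", "Green")
over_ten = match(DOMAIN, "Price", ">", "010.00")

large_and_green = large & green   # ['Size' = 'Large'] intersect ['Color' = 'Green']
large_over_ten = large & over_ten  # ['Size' = 'Large'] intersect ['Price' > '10.00']
large_or_small = large | small     # ['Size' = 'Large'] union ['Size' = 'Small']

print(sorted(large_and_green), sorted(large_over_ten), sorted(large_or_small))
```

The string comparison is worth remembering: as plain strings, "9.00" sorts after "10.00", so unpadded prices would give surprising answers to a "greater than" query.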
So, overall Amazon SimpleDB looks pretty neat, but it is pretty limited in its database functionality. Since we are all used to relational databases with powerful stored procedures and triggers and the like, it may be hard for a lot of us to go back to a simple database structure like this. There are so many things that you simply cannot do in SimpleDB that I find it hard to imagine it being used for anything but very simple applications. The one place that I could see it making a difference is distributed application data storage. A rich-client application could certainly benefit from having a data store like this that is available anywhere, and I think it would be very interesting if Amazon provided some APIs for local caching and querying of data. So, is it a game changer like S3 or EC2? Probably not, but it certainly makes EC2 much more useful and further pushes Amazon to the forefront of utility computing (or cloud computing if you want to sound fancy). Come on Microsoft, Sun, Google, where you at?
As a side note, while writing my "Programmer Dress Code" post I discovered that utility computing was first described by John McCarthy in 1961! This is the same John McCarthy who developed Lisp, and he had quite the distinguished career. Here is what he said:
If computers of the kind I have advocated become the computers of the future, then computing may someday be organized as a public utility just as the telephone system is a public utility… The computer utility could become the basis of a new and important industry.
That is some pretty amazing stuff. Until next time…