SimpleCloud Part 5 – SimpleDB


I started this series back in December.  In fact, I wrote three or four blog posts the day before I took two weeks of vacation.  It’s now approaching the end of the next quarter, so I figured I should actually make some progress on this.

The last post dealt with the concept of storage in the cloud.  In this one we are going to talk about database access.  You have probably heard about document databases.  While an RDBMS is awesome when you have related data and need ACID compliance, it is hard to scale.  When I was a consultant I was onsite with a customer who had a large Oracle implementation with some performance issues; an Oracle consultant was there at the same time, and he was flabbergasted that I could get done in a week whereas his analysis could take several weeks to months.  The nature of a relational database dictates that it will require a LOT of logic, horsepower, and consultant dollars to handle larger-scale deployments.

So, accessing data in a scalable environment will generally be easier (possible?) if you use non-relational data.  Well… not NON-relational, just not enforcing those relations in the same way an ACID-compliant RDBMS would.  So a document database makes a lot of sense, and Amazon’s SimpleDB fits the bill nicely.  If you’re on EC2 it really makes the most sense, unless you need immediate consistency of data.  One of the ways you make data access scalable/highly available is by having many, many machines that can provide access to that data.  But it takes time to propagate data to those machines and, as with the relational database I was talking about earlier, if you need immediate consistency across those nodes you need a lot of logic and horsepower, plus a bit of luck that you don’t accidentally deadlock the whole thing.  It’s just not worth it.  SimpleDB has what’s called “eventual consistency”.  In other words, when you update, insert, or delete, the data will eventually (within 2 seconds according to AWS, I think) become consistent.  Most of the time you can stand having data out of date for a little bit.

We will create our configuration just like we did with the storage adapter.
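For reference, here is a minimal sketch of what that configuration might look like, using Zend_Cloud's SimpleDB document adapter.  The factory and adapter constants come from Zend Framework's Zend_Cloud component; the registry key 'document' and the placeholder credentials are assumptions.

```php
require_once 'Zend/Cloud/DocumentService/Factory.php';
require_once 'Zend/Cloud/DocumentService/Adapter/SimpleDb.php';

// Create the SimpleDB document adapter from configuration options.
$documentAdapter = Zend_Cloud_DocumentService_Factory::getAdapter(
    array(
        Zend_Cloud_DocumentService_Factory::DOCUMENT_ADAPTER_KEY =>
            'Zend_Cloud_DocumentService_Adapter_SimpleDb',
        Zend_Cloud_DocumentService_Adapter_SimpleDb::AWS_ACCESS_KEY => 'your-access-key',
        Zend_Cloud_DocumentService_Adapter_SimpleDb::AWS_SECRET_KEY => 'your-secret-key',
    )
);

// Stash it in the registry so both the front end and the jobs can get at it.
Zend_Registry::set('document', $documentAdapter);
```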

And when we want to get our document adapter we do just as we did before
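Pulling the adapter back out of the registry is a one-liner (again assuming the hypothetical 'document' key from the bootstrap):

```php
$documentAdapter = Zend_Registry::get('document');
```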

Now that we have our document adapter in the registry we can work with it.  I used it in two different places.  First, in the job itself, so that the job would be able to insert references to the completed images for querying later on.  Second, when we query them later on.

The code in the asynchronous job is
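A sketch of what that job code might look like follows.  The field names, the accessor methods on the job, and the use of an MD5 of the filename as the ID are all assumptions; the document API calls are Zend_Cloud's.

```php
// Inside the job's _execute() method: record the resized image in SimpleDB.
$adapter = Zend_Registry::get('document');

// Ask the adapter for its document class, in case the adapter needs
// adapter-specific functionality, then build the document from our data.
$docClass = $adapter->getDocumentClass();
$document = new $docClass(
    array(                          // first constructor arg: name => value data
        'filename' => $this->getFilename(),
        'width'    => $this->getWidth(),
        'height'   => $this->getHeight(),
    ),
    md5($this->getFilename())       // second arg: the optional primary key
);

// Insert the document into the "images" collection.
$adapter->insertDocument('images', $document);
```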

What this does is ask the document adapter for the document class (just in case there are some adapter-specific pieces of functionality), create the new document, and insert it into the DB.  When creating a new document object, the first parameter of the constructor is the name => value data you want to store and the second parameter is the optional primary key for the data.  When you insert the document you need to specify a collection for it to be inserted into ("images", in this case), followed by the actual document object.

When querying the collection we do so by simply… well… querying the collection.
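A query against the collection might be sketched like this.  The where clause and field name are assumptions; the select()/from()/query() calls are Zend_Cloud's document service API.

```php
$adapter = Zend_Registry::get('document');

// Ask the adapter for the select object rather than instantiating one directly.
$query = $adapter->select();
$query->from('images')                             // collection name on the query...
      ->where('filename like ?', array('%.jpg'));  // prepared statement-like syntax

// ...and the collection name again when running it against the adapter.
$images = $adapter->query('images', $query);
```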

Notice a few things.  First, we’re not creating our select object directly; we’re asking the adapter for it.  Just like the document object, the select object may have some adapter-specific logic.  Actually, that’s quite likely.  Then you provide your query parameters, which can be done in a prepared statement-like syntax.  Before passing the query object to the adapter, you must provide the collection name to the query object.  Then, to get your data, you need to pass in the collection name along with the query.  Why do you need to do that for both the query object and the adapter?  I dunno.  Maybe it’s a bug, or maybe it’s a feature.  I haven’t looked.

Once you have your data you can simply iterate over it and read each member like you would a stdClass object.
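For example (the field name is an assumption):

```php
// Each document's fields read like stdClass properties.
foreach ($images as $image) {
    echo $image->filename, "\n";
}
```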


SimpleCloud Part 1 – Setting the stage


Earlier in December I did a webinar on the Zend PHP Cloud Application Platform.  It's not some new product or anything like that, but rather a view of how our software is going to fit together.  It's not something that will be "released" in the typical software fashion.  Instead it is the mindset of our product development teams when they look at building new features.  Cloud-based pricing for Zend Server, AWS/Cloud integration in Zend Studio, and, of course, SimpleCloud.

SimpleCloud is an initiative started last year (2009) for the purpose of allowing you to build cloud-portable applications.  In other words, you would be able to build an application on your local machine and (mostly) transparently run it on any of the three supported cloud platforms.  The example application I built for that webinar was one that used not just "the Cloud", but all of the cloud services available in SimpleCloud, the Zend Server Job Queue (to scale data processing) and, of course, Studio with its AWS integration.

The application was one that took an uploaded image and resized it.  Simple enough, unless you want it to scale.  The example application that I wrote can theoretically scale quite high.  Not because I'm a great programmer, but because I utilized the underlying architecture of people smarter than me.  That's kind of what the cloud is.  Do you have the expertise to ward off a massive, worldwide DDoS?  Apparently Amazon does.  One of the prime rules of being human is to know not only your strengths, but also your weaknesses.  Humility is very hard for humans, and allowing for the fact that someone may be better than you at something is hard to admit.

The purpose of this application was to demonstrate how you can build an application a) for scalability, and, supplementally, b) for the cloud.  It's definitely not there to be pretty.  🙂  So what it does is implement several cloud-based features.  You could implement all of these on your own, but doing so (especially if you are a business) would probably cost you more.  Part of the cloud's appeal is that someone else is the specialist.  Could you use RabbitMQ?  Sure.  But then you have to manage it.  Could you have a massively distributed file system?  Sure! But then you have to manage it.

When you boil it all down; when you distill it to its essentials; when you reduce it to its finest ingredients, the cloud is just an on-demand managed service provider.  Nothing more.

So, what does this application do?

  1. It receives an uploaded image
  2. It stores the image on a file system
  3. It executes a job on the Zend Server Job Queue to resize the image
  4. It communicates with the browser, letting the end user know which image sizes have been processed
  5. It lets the user browse files with metadata
  6. It lets the user download the resized files

Could you do all of that on your own?  Sure.  Could you do it for a couple of thousand users?  Sure.  Could you do it for a couple of thousand users who all decided to upload their images at the same time?  Nope.  Probably not.  The cloud isn't just about scalability, but elastic scalability.  And the chances are pretty high that you are not good at that, unless you are a large company with loads of resources to call upon.

So let's, then, take a look at what this looks like in part 2.

SimpleCloud Part 2 – The Job Manager


In the previous installment I talked a little about the cloud, what Zend is doing in the cloud, and what the example application for my ZPCAP webinar did.  One of the primary characteristics of scalability is the ability to process data as resources become available.  To do that I implemented the Zend Server Job Queue with an abstraction layer that I’ve now written three different versions of.  I think the fourth will be the charm :-).

The Zend Server Job Queue works by making an HTTP call to a server, which executes a PHP script.  That HTTP request is the “job” which is going to be executed; the job is simply the Job Queue daemon pretending to be a browser.  While that works pretty well, I prefer a mechanism that is more structured than simply running an arbitrary script.  Having small, defined, structured tasks allows you to spread those jobs over many servers quite easily.

So what I did was write a relatively simple management system which allows me to define those tasks and execute them on pretty much any server that is behind a load balancer.  And on the cloud, that load balancer can have a thousand machines behind it AND it can be reconfigured without changing your application.  One of the keys of elastic scalability is that you can throw an application “out there” and it will “work”.  That is why the Zend Server Job Queue is a good idea in the cloud: it uses a protocol that requires only one entry point to be defined, and the rest is up to the infrastructure to work out.  (I personally am of the opinion that PHP developers are too dependent on config files.)

There are two parts to this manager: 1) the queueing mechanism and 2) the executing mechanism.  Both are handled in the same class, named com\zend\jobqueue\Manager.  When a job is executed, it does not actually execute; it sends a request to the load balancer using a REST-like API.  The Job Queue, by default, manages the queue on the local host, and I wanted each job server to manage its own queue.  This REST-like call goes to the load balancer, which sends it to a host.  Contained in that call is the serialized object of the job that needs to be executed, along with any dependent data/references to data.  That host then queues the job on itself and returns a serialized PHP object that provides the host name and the job number.  This result object can then be attached to a session so you can query that job queue server directly on subsequent requests.

The code for the manager is as follows.
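A condensed sketch of the manager, reconstructed from the description above.  The method names follow the post; the configuration keys, the status-polling endpoint, and the Response value object's accessors are assumptions.  sendJobQueueRequest() and getCompletedJob() run on the front end; createJob() and executeJob() run on the backend servers.

```php
class Manager
{
    const CONFIG_NAME = 'jobqueue';

    // Front end: serialize the job and POST it to the load balancer.
    public function sendJobQueueRequest($job)
    {
        $url    = Zend_Registry::get(self::CONFIG_NAME)->queueurl;
        $client = new Zend_Http_Client($url);
        $client->setParameterPost('job', serialize($job));
        $body = $client->request('POST')->getBody();
        return unserialize($body);   // a Response with host name + job number
    }

    // Back end: queue the job on this host and describe where it landed.
    public function createJob()
    {
        $queue = new ZendJobQueue();
        $jobId = $queue->createHttpJob(
            Zend_Registry::get(self::CONFIG_NAME)->executeurl,
            array('job' => $_POST['job'])   // pass the serialized job along
        );

        $response = new Response();
        $response->setServerName(php_uname('n'));
        $response->setJobNumber($jobId);
        echo serialize($response);
    }

    // Front end, later: ask the host that queued the job whether it is done.
    // The jobstatus.php endpoint name is an assumption.
    public function getCompletedJob(Response $response)
    {
        $url    = 'http://' . $response->getServerName() . '/jobstatus.php';
        $client = new Zend_Http_Client($url);
        $client->setParameterPost('jobNumber', $response->getJobNumber());
        $body = $client->request('POST')->getBody();
        return $body ? unserialize($body) : null;
    }

    // Back end: run the job that createJob() queued earlier.
    public function executeJob()
    {
        $job = unserialize($_POST['job']);
        try {
            $job->run();
            echo serialize($job);   // send the finished job back, state and all
        } catch (Exception $e) {
            echo serialize($e);     // or the exception, if one was thrown
        }
    }
}
```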


Sequence of Events

sendJobQueueRequest() is the first to be called.  The job is passed in as a parameter and is subsequently serialized.  A connection is made to the URL, which is stored in a Zend_Config object.  That URL can be a local host name or the load balancer’s host name.  Using this you can also set up different pools of servers quite easily, simply by creating multiple load balancers and having each pool managed based on its individual resource needs.

sendJobQueueRequest() called on the front end will cause createJob() to be called on the back end.  This queues the job locally by specifying a LOCAL URL that will be responsible for executing the job and creates a response object which contains the unique hostname of the machine and the unique job number on that machine.  It is serialized and echoed.  sendJobQueueRequest() then reads the response and unserializes it into a Response object which can be attached to a session.

This is the code on the backend URL that will be executed to queue the job.
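A sketch of that backend script (the file name is an assumption):

```php
<?php
// queue.php: the URL behind the load balancer that the front end posts
// serialized jobs to.  bootstrap.php sets up autoloading and configuration.
require_once 'bootstrap.php';

$manager = new Manager();
$manager->createJob();   // queues the job locally, echoes a serialized Response
```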


Don’t worry about the bootstrap.php yet.  It simply contains some configuration mechanisms and instantiates the SimpleCloud adapters.  We’ll cover that later.

This is the code for the response object (created in createJob()). The front end machine can call getCompletedJob() and pass the response object to check and see if the job is done.
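A minimal sketch of that response object: just a value object holding the unique host name and the job number assigned by that host's queue (accessor names are assumptions).

```php
class Response
{
    protected $_serverName;
    protected $_jobNumber;

    public function setServerName($name)  { $this->_serverName = $name; }
    public function getServerName()       { return $this->_serverName; }

    public function setJobNumber($number) { $this->_jobNumber = $number; }
    public function getJobNumber()        { return $this->_jobNumber; }
}
```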


At some point in the future, as resources are available, the URL, noted by Zend_Registry::get(self::CONFIG_NAME)->executeurl in createJob() will be executed.  The code of that URL is
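A sketch of that script (the file name is an assumption; setCurrentJobStatus() is the Zend Server Job Queue API for reporting the outcome):

```php
<?php
// execute.php: the URL the Job Queue daemon calls when resources allow.
require_once 'bootstrap.php';

$manager = new Manager();
$manager->executeJob();   // unserializes the job, runs it, echoes the result

// Tell the Job Queue daemon the script finished successfully.
ZendJobQueue::setCurrentJobStatus(ZendJobQueue::OK);
```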


Pretty simple, eh?  That’s because most of the magic happens in the Manager class.  This is when executeJob() is called.  It takes the serialized object, unserializes it, and executes the run() method.  We will look at the difference between execute() and run() in a subsequent post.  If the job executes fine, it is re-serialized and echoed.  If an exception is thrown, THAT is serialized instead.

That’s the manager.  Next we will look at the abstract job class and after that we will get into the SimpleCloud components.

SimpleCloud Part 3 – The Abstract Job


We have so far looked at setting the stage and managing the job.  How about executing the job itself?  The job we will look at here will be relatively generic.  I will get into more detail after I have talked about the SimpleCloud elements.  This, here, is simply to show you the theory behind how jobs are executed.

The abstract class is pretty simple.
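A sketch of it, matching the three methods described below: execute() queues, run() executes, and _execute() holds the actual logic.  The class name JobAbstract is an assumption.

```php
abstract class JobAbstract
{
    // Override this; it is the code that runs on the remote server.
    abstract protected function _execute();

    // Called on the front end: doesn't run anything, just queues the job
    // via the manager and returns the Response (host name + job number).
    public function execute()
    {
        $manager = new Manager();
        return $manager->sendJobQueueRequest($this);
    }

    // Called on the back end by Manager::executeJob().
    public function run()
    {
        $this->_execute();
    }
}
```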


There are only three methods.  The first is _execute().  This method needs to be overridden; it is the code that will be executed on the remote server.  And because the job will be serialized and executed on the remote host, the code for your job class will need to be deployed there as well.  You could actually send the source code for the class along with the serialized version and make the backend COMPLETELY stupid, but I would think that anyone remotely security-minded could see the problem with that.

To implement a new job, do something like this:
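A hypothetical job (the post later refers to a SendEmail job; the constructor and fields here are assumptions):

```php
class SendEmail extends JobAbstract
{
    protected $_to;
    protected $_message;

    public function __construct($to, $message)
    {
        $this->_to      = $to;
        $this->_message = $message;
    }

    // The actual work, executed on whichever backend server picks it up.
    protected function _execute()
    {
        mail($this->_to, 'Hello from the job queue', $this->_message);
    }
}
```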


Then to send the job to the queue call:
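Something like this, with the Response attached to the session for later polling (session key assumed):

```php
$job = new SendEmail('user@example.com', 'Your images are ready.');

$response = $job->execute();            // queues the job; does not run it
$_SESSION['jobResponse'] = $response;   // keep the host + job number around
```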


The execute() method is called on the front end.  But it doesn’t really execute.  It calls the queue manager and queues it on the backend servers.

Then on the backend servers (remember the executeJob() method?) the run() method is called, which actually calls the _execute() method containing the logic.  And while I didn’t show it here, because this job is re-serialized after execution you can store status information or any other data on the object, and it will still be there once the job is unserialized on the front end after calling getCompletedJob() on the job manager.  If the job is completed it will return the unserialized instance of, in this case, org\eschrade\jobs\SendEmail as it existed at the end of its run.

Now, to get to the SimpleCloud portion of this series: storage, in part 4.

SimpleCloud Part 4 – Storage


Now that we’ve gotten some job processing code done, let’s get into the good stuff.  The first thing we’re going to look at is the storage mechanism in SimpleCloud.  The example we used was uploading an image to the server so it could be resized for viewing at multiple resolutions.  Now, you could simply attach the file contents to the job class, serialize it, and unserialize it on the other side.  But the Job Queue server is really not designed for that (nor are most other queueing applications).  So what we’re going to do is use the Storage mechanism in SimpleCloud (in this case, S3) to store the uploaded files temporarily, and then to store the resized versions.

The first thing we need to do is create the adapter.  I am simply putting it into a Zend_Registry object for later retrieval.  It, along with the Document and Queue adapters, is created in the bootstrap file.  The bootstrap file loads the autoloader, creates the config objects, and then creates all of the cloud adapters.
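A sketch of the bootstrap's storage section.  The factory and adapter constants are Zend_Cloud's; the registry key 'storage', the bucket name, and the placeholder credentials are assumptions.

```php
require_once 'Zend/Cloud/StorageService/Factory.php';
require_once 'Zend/Cloud/StorageService/Adapter/S3.php';

// Create the S3 storage adapter from configuration options.
$storageAdapter = Zend_Cloud_StorageService_Factory::getAdapter(
    array(
        Zend_Cloud_StorageService_Factory::STORAGE_ADAPTER_KEY =>
            'Zend_Cloud_StorageService_Adapter_S3',
        Zend_Cloud_StorageService_Adapter_S3::AWS_ACCESS_KEY => 'your-access-key',
        Zend_Cloud_StorageService_Adapter_S3::AWS_SECRET_KEY => 'your-secret-key',
        Zend_Cloud_StorageService_Adapter_S3::BUCKET_NAME    => 'my-image-app',
    )
);

// Register it so the front end and the jobs can share it.
Zend_Registry::set('storage', $storageAdapter);
```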


The most important line is the getAdapter() line.  That takes the configuration options and creates an adapter based on those options.  It’s really quite simple.  In this case I’m using the S3 adapter.

A bucket name needs to be specified, and I believe it needs to be created ahead of time.  This allows you to separate your applications but still use the same account keys.  Easy, huh?  You haven’t even tried using it yet!  Here is the job (distilled to the essentials; full version will be downloadable) that is used to process the images.
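A distilled sketch of what that job might look like.  setSourceFile() uploads the local file to S3 (as the post describes below), and _execute() fetches it back on the worker, resizes it with GD, and stores each size.  The class name, paths, and target widths are all assumptions; fetchItem()/storeItem() are the Zend_Cloud storage calls.

```php
class ResizeImage extends JobAbstract
{
    protected $_sourcePath;

    public function setSourceFile($localFile)
    {
        // Upload the original to S3 so the backend worker can fetch it.
        $this->_sourcePath = 'uploads/' . uniqid() . '.jpg';
        Zend_Registry::get('storage')->storeItem(
            $this->_sourcePath,
            file_get_contents($localFile)
        );
    }

    protected function _execute()
    {
        $storage = Zend_Registry::get('storage');

        // Fetch the original file contents back out of S3.
        $image = imagecreatefromstring($storage->fetchItem($this->_sourcePath));

        foreach (array(100, 300, 600) as $width) {
            // Resize with GD, preserving the aspect ratio.
            $height  = (int) (imagesy($image) * $width / imagesx($image));
            $resized = imagecreatetruecolor($width, $height);
            imagecopyresampled(
                $resized, $image,
                0, 0, 0, 0,
                $width, $height, imagesx($image), imagesy($image)
            );

            // Store each resized version under a size-specific path.
            ob_start();
            imagejpeg($resized);
            $storage->storeItem(
                'resized/' . $width . '/' . $this->_sourcePath,
                ob_get_clean()
            );
        }
    }
}
```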


The parts pertaining to the storage adapter are the important ones.  The point here is that the storage and retrieval of file data is pretty much transparent.  Store/fetch.  Integrating between the front and back end is pretty easy, too.
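A sketch of the front-end upload handler (the file field name, session key, and status page are assumptions):

```php
// setSourceFile() is what uploads the file to S3.
$job = new ResizeImage();
$job->setSourceFile($_FILES['image']['tmp_name']);

// execute() queues the job and returns a Response (host + job number)...
$response = $job->execute();

// ...which we attach to the session so later requests can poll for status.
$_SESSION['jobResponse'] = $response;

// Forward to the status page.
header('Location: /status.php');
```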


So, what is going on here?  When we call setSourceFile(), that calls the code that uploads the file to S3.  Additionally, IIRC, there is also a stream API where you can pass a file resource and it uses that instead of the raw file contents.  That’s very useful for storing large files.  But remember in the earlier post where I said that calling execute() doesn’t actually execute the job, but queues it, and that the result is a response object that provides the job number and the server host name?  Here you see it getting attached to the session.  This code then forwards to another page, which we will look at in a bit.

But, as you can see, using SimpleCloud to upload files to a storage service is stupid easy when using Zend Framework.