Magento-based asynchronous execution

Working with an off-the-shelf shopping cart usually requires a little bit of patience. Scaling an e-commerce site does have its share of problems. There is a LOT of interactivity that needs to be implemented. This can be things along the lines of generating targeted ads, sending email or charging a credit card.

To charge a credit card, the e-commerce software will usually take the credit card information from the end user, put it into some form of web service request, and submit that request to a remote system.  While that web service request is taking place, the PHP process handling it is unable to take additional requests to serve regular pages.

One option that you have is to complain that PHP doesn’t have threading.  That’s not the best thing to do.  As Marco Tabini said recently on Twitter “Every time someone mentions threading in PHP, an angel’s wings enter a race condition”.  Threading solves some problems.  However, chances are that while you may want threading you probably don’t need it.

However, while you probably don’t need threading, there are plenty of times when being able to do things asynchronously would be beneficial.  The example that I started looking at was a credit card request.  While waiting for the credit card transaction to occur you have one of two options.  1) Let the screen be blank while you’re waiting for the transaction or, 2) use some kind of output buffering and progressive rendering to let the end user know that the transaction is, in fact, being processed.

However, there is another, better, option. Rather than either spending loads of CPU time processing loads of logic, such as personalized ads, or having long wait times, such as when processing a credit card, you can have this work processed “behind the scenes” so you can immediately respond to your customer.

A simple example of what a Job Queue architecture can look like is almost like a hub and spoke architecture except that instead of the hub being the center it is actually the outside.  Ok, so a simple Job Queue architecture is exactly the opposite of a hub and spoke architecture.  Sue me.

The way it works is that there is a backend server, or cluster of servers, that handles servicing Job Queue requests. The requests are made from your front end web servers and are sent to a URL on the backend.  That URL is where the logic that needs to be run lives.

Using a simple architecture you can just have that URL be a simple script that is run.  However, I prefer a more structured solution if I am going to integrate asynchronous processing in my application.

This is where the Magento connection starts.  I have already written about how to implement a structured asynchronous mechanism.  This is the same implementation that I use on this blog site.  What I’ve done is take that implementation and re-implement it so that it works within the context of a Magento application.  I have placed this implementation on Github.  It is not yet part of Magento Connect, though I intend to put it there and I intend for it to be provided free of charge.  However, what I also wanted to do was give others the chance to look at it and improve it prior to putting it on Magento Connect.

Implementing your own task, be it pre-processing advertisements or processing a credit card, is very easy.  Processing a credit card, however, should be done with the addition of encrypting the data, since that data is stored “as-is” in the Job Queue database.
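
A minimal sketch of what encrypting a value before it goes into the Job Queue database could look like.  It assumes PHP 7.1 or later with the OpenSSL extension and a 32-byte key that both the front end and the job server can read.  The function names encryptForQueue() and decryptFromQueue() are made up for illustration and are not part of the ZendServer_JobQueue module.

```php
<?php
// Hypothetical helpers; not part of the ZendServer_JobQueue module.
// Assumes PHP 7.1+ with the OpenSSL extension.

function encryptForQueue(string $plaintext, string $key): string
{
    $iv = random_bytes(12); // 96-bit nonce, the usual size for GCM
    $tag = '';
    $ciphertext = openssl_encrypt(
        $plaintext,
        'aes-256-gcm',
        $key,
        OPENSSL_RAW_DATA,
        $iv,
        $tag
    );
    if ($ciphertext === false) {
        throw new RuntimeException('Encryption failed');
    }
    // Base64 keeps the result safe to store as a plain string.
    return base64_encode($iv . $tag . $ciphertext);
}

function decryptFromQueue(string $encoded, string $key): string
{
    $raw = base64_decode($encoded, true);
    // 12 bytes of nonce plus 16 bytes of authentication tag.
    if ($raw === false || strlen($raw) < 28) {
        throw new RuntimeException('Malformed payload');
    }
    $iv = substr($raw, 0, 12);
    $tag = substr($raw, 12, 16);
    $ciphertext = (string) substr($raw, 28);
    $plaintext = openssl_decrypt(
        $ciphertext,
        'aes-256-gcm',
        $key,
        OPENSSL_RAW_DATA,
        $iv,
        $tag
    );
    if ($plaintext === false) {
        throw new RuntimeException('Decryption failed');
    }
    return $plaintext;
}
```

The task would call encryptForQueue() on the sensitive property before execute() is called, and decryptFromQueue() inside _execute() on the job server.  How the key is stored and shared is left open here.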

Defined in this library is a class called ZendServer_JobQueue_Job_Abstract.  This is the base class for defining a task.  There is only one method that you need to implement, though you can add as many of your own methods as you want, such as getters and setters.  The method is called _execute() and this is where you put the logic that you want to run.  However, it is important to note that the job runs on a completely different machine.  Once the task has been set to execute, no changes that you make to the object afterwards will be reflected in the job.
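
The point about later changes not reaching the job can be illustrated without Zend Server.  This sketch assumes the task object is serialized at the moment it is queued, which is how the generic implementation this module is based on works.  ExampleTask is a made-up class, and plain serialize() stands in for handing the task to the Job Queue.

```php
<?php
// Illustrative only: ExampleTask is a made-up class, and plain
// serialize() stands in for handing the task to the Job Queue.

class ExampleTask
{
    public $recipient;

    public function __construct($recipient)
    {
        $this->recipient = $recipient;
    }
}

$task = new ExampleTask('first@example.com');

// What gets queued is a snapshot of the object as it is right now.
$queued = serialize($task);

// Changing the local object afterwards does not touch the snapshot.
$task->recipient = 'second@example.com';

// The job server rebuilds the object from the snapshot.
$onServer = unserialize($queued);

echo $onServer->recipient, PHP_EOL; // prints first@example.com
```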

In the code download there is an example of how to implement this class.  It is called ZendServer_JobQueue_Model_Mock.  All it does is write to the PHP error log, but it does so asynchronously from the Job Queue URL.  The code looks like this:

class ZendServer_JobQueue_Model_Mock extends ZendServer_JobQueue_Job_Abstract
{
    protected function _execute()
    {
        error_log('Mock Model run');
    }
}

One thing to note.  It’s freaking easy to implement this!  If you want to run this, here is your code.

$task = new ZendServer_JobQueue_Model_Mock();
$task->execute();

Wham.  Bam.  Done.  It is now running on your Job Queue server.  I won’t get into all of the details on how it’s done, though.   You can take a look at the abstract class and understand the details yourself.  It is open source after all.

But if you were to run this code right now you would probably get an exception thrown.  That is because you have not configured your Job Queue yet.  In order to do that you need to look at the etc/config.xml file.  You need to edit the element config/modules/ZendServer_JobQueue/jobqueue/url and specify the URL of the job queue entry point.  Since there is an index controller for the ZendServer_JobQueue extension and I just used the standard router, the URL would be $HOST/jobqueue.  It is recommended (highly recommended) that you make this URL available only over localhost or a private network.  It is not restricted by default, so I recommend that you set this up using either a virtual host that only listens on 127.0.0.1 or a machine that is behind a firewall.
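
For orientation, here is roughly where that element sits.  The XML below is an assumed fragment built only from the element path named above, so the real etc/config.xml will contain more than this, and the host is a placeholder.  SimpleXML is used purely to show the path; it is not how Magento itself loads configuration.

```php
<?php
// Assumed fragment of etc/config.xml, built only from the element path
// config/modules/ZendServer_JobQueue/jobqueue/url. The host is a
// placeholder for your own job queue entry point.
$xml = <<<XML
<config>
    <modules>
        <ZendServer_JobQueue>
            <jobqueue>
                <url>http://127.0.0.1/jobqueue</url>
            </jobqueue>
        </ZendServer_JobQueue>
    </modules>
</config>
XML;

// SimpleXML is used here purely to show where the value lives.
$config = simplexml_load_string($xml);
$url = (string) $config->modules->ZendServer_JobQueue->jobqueue->url;

echo $url, PHP_EOL; // prints http://127.0.0.1/jobqueue
```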

So, that’s pretty much it.  Though I suppose you’ll need Zend Server as well. 

To install Zend Server you can go to zend.com and set up your system to install or download (for Windows) Zend Server.  It comes with a 30 day free trial.  Give it a shot.  If you have trouble feel free to post on the forums at forums.zend.com or you can post a comment here and I can try to answer it.

Happy coding!

Pre-caching PHP content with Zend_Cache_Manager and the Zend Server Job Queue

With the web being what it is today there can be a lot of times when you want to aggregate data from many different sources and bring them together in a single page.  I have not done much of that on my site simply because that means that I then need to learn a bunch of different APIs.  However, since YouTube is the #2 search engine I figured that it might not be a bad idea to aggregate some of my YouTube content on my page automatically.  I don't necessarily want to do a blog post about each individual video I post, but I wanted there to be some place where I could just list them out.

I have two places where I post content: YouTube and Facebook.  However, polling each site individually for each request is not conducive to having a page that renders quickly.  The thing you do NOT want to do is poll YouTube each time someone comes to an individual page.  The way around this is to cache the results of the YouTube or Facebook query so you don't have to do that.  Then people re-use the previously fetched data when they view that page.  What this does is make most of the new requests to that page much faster since they don't have to re-load that data from YouTube or Facebook.  However, there's a bit of a problem there as well.  Every X number of minutes, the cache will expire and someone will take the hit of connecting to YouTube.  With a moderately low traffic site such as mine, that hit is something I didn't want to make my users endure when they came to the site, since there is a decent probability that the cache will expire in between individual page requests.  And, working for Zend, I can't have a page that renders slowly, can I?

So what I did was create a new Zend Server Job Queue task, which I have detailed several times (and there should be a link to several on the side), that would connect to both YouTube and Facebook.  This task inserts the results into a cache (you could use a database if you liked) so that when someone comes to a page they see the cached data rather than polling YouTube.  From a settings perspective, the cache is set to never expire the content there.  But because I set the task to run once an hour the content is going to be refreshed.  Using this pre-population method I am able to keep requests snappy while at the same time providing mostly up to date content.

The task to do this is relatively simple.  First I edit my application.ini file to set up the cache manager.

resources.cachemanager.video.frontend.name = Core
resources.cachemanager.video.frontend.options.automatic_serialization = true
resources.cachemanager.video.frontend.options.lifetime = null
resources.cachemanager.video.backend.name = File

By defining these ini settings, Zend_Application will automatically instantiate an instance of Zend_Cache_Manager and set up a cache that is named "video" with the individual options as specified.  What this means is that I could create another cache interface by taking these configuration lines and giving it its own configuration settings.  It could be different settings or even a completely different backend, or a different front end.

Then I create my task class.

class Admin_Task_VideoPreCache extends Esc_Queue_TaskAbstract
{
    protected function _execute(Zend_Application $app)
    {
        $yt = new Zend_Gdata_YouTube();
        $options = $app->getOption('video');
        $uploads = $yt->getUserUploads($options['youtube']['id']);
        $manager = $app->getBootstrap()->getResource('cachemanager');
        /* @var $manager Zend_Cache_Manager */
        $manager->getCache('video')->save($uploads, 'youtube');
       
        $query = 'SELECT title, description, embed_html FROM video WHERE owner=' . $options['facebook']['id'];
        $url = 'https://api.facebook.com/method/fql.query?query='.urlencode($query);
        $data = simplexml_load_string(file_get_contents($url));
        $videos = array();
        foreach ($data->video as $video) {
            $videos[] = array(
                'title'    => (string)$video->title,    
                'description'    => (string)$video->description,
                'embed_html'    => (string)$video->embed_html
            );
        }
        $manager->getCache('video')->save($videos, 'facebook');
    }
}

Because the Zend_Application instance is always passed in I can easily get access to the predefined cache manager object in here for when I need to store the data at the end of the task.  Then in the task I use Zend_Gdata_YouTube to query YouTube and I do a simple FQL query to Facebook to get the Facebook videos (which stopped working between test, staging and production.  Go figure).

The next thing I have to do is make that data available to a view.  To do that I need to create a new controller action that queries the cache manager.

    public function myvideosAction()
    {
        $app = $this->getInvokeArg('bootstrap')->getApplication();
        /* @var $app Zend_Application */
        $cm = $app->getBootstrap()->getResource('cachemanager');
        /* @var $cm Zend_Cache_Manager */
        $this->view->youtube = $cm->getCache('video')->load('youtube');
        $this->view->facebook = $cm->getCache('video')->load('facebook');
    }

Then all I need to do in my view is iterate over the data and I'm pretty much good to go.  Because the cache data has been prepopulated, my visitors should never have to take the hit of populating the cache, and by using the Zend Server Job Queue the task of populating the cache is extremely easy to do.
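
A sketch of what that iteration could look like for the Facebook array built by the task.  It is written as a plain function so that it runs outside of Zend_View; renderVideoList() is a hypothetical name, and the sketch assumes that embed_html is markup you are willing to output as-is.  It also covers the case where the task has not run yet, since a Zend_Cache load() on a missing entry returns false.

```php
<?php
// Hypothetical helper; in a real view script the loop would run over
// $this->facebook instead of a function argument.
function renderVideoList(array $videos): string
{
    $html = '';
    foreach ($videos as $video) {
        $html .= '<div class="video">' . "\n";
        // Title and description are plain text, so escape them.
        $html .= '  <h3>' . htmlspecialchars($video['title']) . "</h3>\n";
        $html .= '  <p>' . htmlspecialchars($video['description']) . "</p>\n";
        // embed_html is already markup; this sketch assumes it is trusted.
        $html .= '  ' . $video['embed_html'] . "\n";
        $html .= "</div>\n";
    }
    return $html;
}

// Stand-in for a cache miss: nothing has been stored yet.
$cached = false;
echo renderVideoList($cached ?: array());
```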

Do you queue? Introduction to the Zend Server PHP Job Queue

There has been a lot of talk over the past several years about the difference between performance and scalability.  Never mind that the difference between the two will probably not really affect most developers.  Never mind that the “difference between performance and scalability” argument is often used when someone’s code performs poorly and their best argument is “Yeah, but my code scales”.  Yeah, sure it does.

But when talking about building a scalable application there is a big concept out there that many PHP developers are not overly familiar with.  That concept is queuing.  It is becoming much more prevalent in PHP-land but the concept of a queue is still relatively unused among PHP developers.

So, what is a queue?  Basically it means “take this and do something later”.  “This” could be anything, from a certain point of view (requisite Star Wars reference).  What that means is that “something” can be offloaded somewhere else (a queue) for further processing.  A queue is generally not an endpoint, but a conduit.  A pipe (requisite political reference).  But it is a pipe with a flow-control valve on it (requisite plumbing reference).  In other words the “something” will stay in the pipe until a) someone gets it, or b) it expires.  Hopefully, a.
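
To make the “stays in the pipe until someone gets it or it expires” idea concrete, here is a toy in-memory queue.  It is only an illustration: ToyQueue and its methods are invented for this example, the clock is passed in by hand to keep the behaviour predictable, and a real queue lives in a separate process rather than inside your script.

```php
<?php
// Toy illustration only: ToyQueue is an invented class, not a real
// queueing product. Items wait until they are fetched or they expire.
class ToyQueue
{
    private $items;

    public function __construct()
    {
        $this->items = new SplQueue();
    }

    // Put something in the pipe, with a time-to-live in seconds.
    public function push($payload, $ttl, $now)
    {
        $this->items->enqueue(array(
            'payload' => $payload,
            'expires' => $now + $ttl,
        ));
    }

    // Take the oldest item that has not expired, or null if none is left.
    public function pop($now)
    {
        while (!$this->items->isEmpty()) {
            $item = $this->items->dequeue();
            if ($item['expires'] > $now) {
                return $item['payload'];
            }
            // Expired items are silently dropped.
        }
        return null;
    }
}

$q = new ToyQueue();
$q->push('send welcome email', 60, 1000);  // expires at t=1060
$q->push('rebuild ad cache', 3600, 1000);  // expires at t=4600

var_dump($q->pop(2000)); // the first item has expired, so this yields 'rebuild ad cache'
var_dump($q->pop(2000)); // nothing left: NULL
```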

This “something” is sometimes data and sometimes it is functionality.  There are a lot of data queues out there and the nice thing about data queues is that they are pretty much language independent.  In other words you can connect to a Java-based data queue from a PHP-based application and as long as you agree upon the format, like Stomp or JMS (if using the Java Bridge) then you can pass data back and forth without much problem.

However, there can be a problem when it comes to queueing functionality.  You clearly are not language independent.  Not that it’s a problem, but you’re not.  What this means is that now you have to have a specific method for implementing the queueing functionality.  There are a couple of open source options available, Gearman for one, but not many.  What I’d like to do is provide an example using the Job Queue in Zend Server 5.

Queueing a job is actually very easy to do.  A job is run by calling a URL where that job resides.  The Job Queue daemon will receive the request from your application and will then call the URL that you specified in the API call.  Once the job has been queued your application can continue going on its merry way to finish serving up the request.

Serving the request

On your front end machine, the code to call the queue is pretty simple.  It consists of creating a ZendJobQueue object and calling the createHttpJob() method.  If you have any parameters that you need to pass to that job you can specify them in the second parameter of the call:

 

$q = new ZendJobQueue();
$q->createHttpJob(
    'http://localhost/sendemail',
    array(
        'email' => $_POST['email']
    )
);

Then on the “sendemail” side your code would be

 

$params = ZendJobQueue::getCurrentJobParams();
if (isset($params['email'])) {
    mail($params['email'], 'Welcome', 'Welcome to my nightmare');
    ZendJobQueue::setCurrentJobStatus(ZendJobQueue::OK); 
}

That’s really all there is to it.

Or is there…

Serving the request… cool-y

My problem with this method is that it really is not as structured as I would like.  Modern applications are not really “scripts” even if they are written in a scripting language.  So, what I like doing is taking this existing functionality and providing some structure.  What I did for this website is take the existing Job Queue functionality and add something kinda similar to Java’s RMI.  It’s not quite, but kinda.  Or kinda like threading.  Not really, but kinda.

What I start out with is a generic abstract task class.  It looks like this.

 

abstract class Esc_Queue_TaskAbstract
{
    const OPT_NAME = 'name';
    const OPT_SCHEDULE = 'schedule';
    const OPT_SCHEDULE_TIME = 'schedule_time';
 
    private $_options = array();
 
    protected abstract function _execute();
 
    public final function execute(Zend_Application $app, $qOptions = array())
    {
        $q = new ZendJobQueue();
        $jqOpts = $app->getOption('jobqueue');
        $qOptions = array_merge(
            array('name' => get_class($this)),
            $qOptions
        );
 
        $ret = $q->createHttpJob(
            $jqOpts['url'],
            array(
                'obj' => base64_encode(serialize($this))
            ),
            $qOptions
        );
        return $ret;
    }
 
    public final function run()
    {
        $this->_execute();
    }   
}

There are two defined methods and one abstract method.  The two defined methods are final because they need not and should not be overridden, for the sake of predictability (final is under-used IMHO).  The execute() function doesn’t really execute anything.  It just takes the current object, serializes it, base64 encodes the result (because the params don’t like binary data) and sets it as a parameter called “obj”.  From there it inserts it into the Job Queue, using the URL specified by a Zend_Application configuration setting.  That setting is

 

jobqueue.url = http://localhost/jq

Since queues generally contain privileged information it is a good idea to hide the queue endpoint from the outside world, either by putting it on another machine/VM or by restricting access with a web server directive.

The second method is called run().  It is not called on the front end machine.  The back end Job Queue will call that to execute the functionality that is defined in this class in the abstract method, called _execute().

So that’s the abstract class that our tasks are based on, but how about an individual task?  What does that look like?  Well, to take our code that we had previously written…

 

class Admin_Task_Mail extends Esc_Queue_TaskAbstract
{
 
    private $_email;
    private $_message;
    private $_subject;
 
    public function __construct($email, $subject, $message)
    {
        $this->_email     = $email;
        $this->_subject   = $subject;
        $this->_message   = $message;
    }
 
    public function _execute()
    {
        mail(
            $this->_email,
            $this->_subject,
            $this->_message
        );
    }
}

I put this code into my /application/modules/admin/tasks directory and added the following line to my bootstrap.

 

$al->addResourceType('task', 'tasks', 'Task');

That way the Zend_Application autoloader can easily autoload any tasks I have defined.

To execute this task in my controllers, I simply write:

 

$mail = new Admin_Task_Mail(
    $_POST['to'],
    $_POST['subject'],
    $_POST['message']
);
 
$mail->execute(
    $this->getInvokeArg('bootstrap')->getApplication()
);

This will then send the job to the Job Queue daemon.

Speaking of.  We need to now execute our job.  That is done by defining a controller with code similar to the following.

 

$params = ZendJobQueue::getCurrentJobParams();
if (isset($params['obj'])) {
    $obj = unserialize(base64_decode($params['obj']));
    if ($obj instanceof Esc_Queue_TaskAbstract) {
        try {
            $obj->run();
            ZendJobQueue::setCurrentJobStatus(ZendJobQueue::OK);
            exit;
        } catch (Exception $e) {}
    }
}
ZendJobQueue::setCurrentJobStatus(ZendJobQueue::FAILED);
exit;

It retrieves the parameters and checks for one called “obj”.  It then unserializes the base64 decoded data, which should recreate the object that you created on the front end server.  After testing to make sure that it is an instance of Esc_Queue_TaskAbstract we call the run() method, which in turn calls the actual functionality we defined in _execute().
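
Because the endpoint unserializes whatever arrives in the “obj” parameter, one possible hardening step is to restrict which classes may be recreated before the instanceof test even runs.  This is a sketch under assumptions: it needs PHP 7.0 or later for the allowed_classes option of unserialize(), decodeTask() is a hypothetical helper, and the two Demo classes are stand-ins for Esc_Queue_TaskAbstract and a concrete task so that the example runs on its own.

```php
<?php
// Stand-ins so the sketch runs on its own; in the application these
// would be Esc_Queue_TaskAbstract and a concrete task class.
abstract class DemoTaskAbstract
{
    abstract public function run();
}

class DemoMailTask extends DemoTaskAbstract
{
    public $to;

    public function __construct($to)
    {
        $this->to = $to;
    }

    public function run()
    {
        return 'mail to ' . $this->to;
    }
}

// Hypothetical helper: returns the task, or null if the payload is not
// a task of an explicitly allowed class.
function decodeTask($encoded, array $allowedClasses)
{
    $raw = base64_decode($encoded, true);
    if ($raw === false) {
        return null;
    }
    // Classes not on the list come back as __PHP_Incomplete_Class.
    $obj = @unserialize($raw, array('allowed_classes' => $allowedClasses));
    return ($obj instanceof DemoTaskAbstract) ? $obj : null;
}

$payload = base64_encode(serialize(new DemoMailTask('user@example.com')));

$task = decodeTask($payload, array('DemoMailTask'));
echo $task->run(), PHP_EOL;                   // prints mail to user@example.com
var_dump(decodeTask($payload, array()));      // NULL: class not on the list
var_dump(decodeTask('not base64!', array())); // NULL: malformed input
```

The allowed list has to name each concrete task class, since allowed_classes matches exact class names rather than parent classes.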

Sweet.

Summary

Key points on building super-cool job queue applications

  1. Create an abstract class to wrap around your tasks
  2. Use that abstract class to add itself to the Job Queue
  3. Write a controller script that is the queue endpoint
  4. Have that script recreate the object and execute the method you had defined in the code