Browser Communication with Nginx and WebSockets…


… was what I was going to write about.  I’ve been meaning to come up with a solution that combines regular HTTP requests, WebSockets, and Redis to send messages easily back and forth between the backend and the frontend.  My thought was that I might be able to take Nginx, which has WebSocket proxy support, and combine it with Redis to facilitate communication between a frontend browser user and any backend asynchronous tasks that might have pertinent messages to send to the frontend.

There are definitely ways of doing it.  But, basically, all of them involve having a web frontend, Redis, and then a bunch of Node code that you have to write and maintain.  The Node requirement is what has held my research on this back for several months.  I just don’t want to have yet another service running that requires yet more code for me to write in yet another programming language.

Fast forward to this week.  I started working on a load testing mechanism that allows Magium tests to be used as a means of load testing.  Obviously you cannot generate significant loads from a browser test on one server and so the solution REQUIRED some form of messaging and synchronization.  In my searching I found that ActiveMQ could be used as an embedded messaging service.  Well, I know Java, and I know that ActiveMQ has WebSockets support (so I could control the tests from a browser), so “why not?” I figured.

It turns out that it actually all worked really, really well.

So it turns out that Nginx + this + that + the kitchen sink wasn’t actually necessary.  Everything could be handled within two open source distributables: Jetty (an Eclipse project) and Apache ActiveMQ, with Apache Camel acting as a bridge between the two.  With that I get WebSockets, HTTP/1 & 2 (HTTP push maybe?), and an integrated async messaging system (I LOVE async!).  Add an ESB like MuleSoft’s Mule and you’re set for almost any kind of communication.

But I’m getting ahead of myself.

While I intend to answer deeper questions of how Jetty fits into a larger pattern of advanced web applications, including running PHP, I don’t have the time to do that today.  I owe some folks some deliverables and so I will continue my investigation into using Jetty in another post.

What I want to discuss at this point is static throughput.  Java has the reputation of being slow and, in particular, bloated.  I believe that this reputation is somewhat deserved.  But, as with the common assertion that the filesystem is slow, the bigger picture is sometimes a little more complicated.  Still, if Jetty turned out to be “enterprisey” (which means big, bloated and slow) then this wouldn’t work as a solution.

So like a good little boy, I tested to see if Jetty would be a blocker on the performance/bloat front.

The first graph shows the test results.  The test was retrieving the favicon.ico file on a 4x Intel(R) Xeon(R) CPU X3430 @ 2.40GHz with 16GB RAM.  All webservers were configured to run on their defaults.  That means that these numbers could be improved.  In fact, Nginx could possibly double its throughput (but, then again, so could Jetty, for reasons we’ll see in a bit).  Apache is pretty much at its limit.

[Figure: throughput]

What?  Jetty was faster than Nginx?  Yep.  In this test it was.  I was working under the assumption that, given the ratio of static content to PHP requests in a Magento site, I would consider Jetty a contender if it was simply better than Apache.  I was most definitely NOT expecting it to be faster than Nginx.

So let’s get down into some of the details.

Let’s dispense with Apache first, since we all know why it performed like this (though, 10k requests per second is definitely not bad).

[Figure: apache.static.500]

Apache hit the throughput limit pretty quickly and never got above 10k requests per second.

[Figure: apache-vmstat.500]

System resource usage pretty much maps to throughput.  The throughput was limited by CPU.

No surprises there.

Next up is Nginx.  Nginx was configured with worker_processes set to auto, which configured itself to 4.  The result of the throughput test is this:

[Figure: nginx-static.500]

Throughput went up very quickly and was serving upwards of 47,000 requests per second.

[Figure: nginx-vmstat.500]

This is the system usage during the Nginx test.  Clearly system time was the driver for throughput; it matches the throughput almost exactly.  Though, since throughput and system time slope downward together, I wonder if this is a function of network saturation.  If the throughput decline were due to increased system time then system time should increase while throughput decreases.  Instead we see them in lock step.  For that reason I believe that the overhead of the additional HTTP payload (headers and such) would explain the decline.  In other words, Nginx could probably do better.

But that’s not really the point of all this.  The point of all this, in my mind, was that Jetty was able to keep up with (and surpass) Nginx in the static throughput department.

[Figure: jetty-static.500]

The peak throughput was around 53,000 requests per second, compared to Nginx’s 47k.

[Figure: jetty-vmstat.500]

System usage was similar to Nginx, perhaps even a little better.  Nginx hit 70% and stayed there whereas Jetty peaked at 70%, dropped to 37% and then hit 60%.

In short, I was dumbfounded when my first test graphs were being rendered.  I had figured that there was an error; it was not what I was expecting at all.

Does this mean that you should switch to Jetty?

No.

Jetty has a lot of complexity and most Magento (my focus) system admins do not have the experience to administer Jetty.  If you have standard needs, Nginx + PHP-FPM is most likely the best option still.

However, do you have an application that needs WebSockets or some form of messaging (such as JMS)?  These test results make for some interesting thoughts.

That highlights one of the things I like about Java.  I like the language syntax, but what I like about Java more is that a lot of the stuff that “feels” cobbled together in the PHP world already exists in the Java world.  Instead of writing yet another abstraction layer in PHP that does about 20% of what you need, perhaps the Java infrastructure running on Jetty gets you 80% of the way there instead of 20%.

But, like I said, I have deliverables I need to deliver and so I need to wrap up this blog post.  There are three questions I still want to answer.

  1. How does Jetty work with PHP-FPM?  I expect this to be a slam dunk since PHP-FPM works fine with Nginx.
  2. How can a PHP developer put this all together and give themselves more features than they could ever know what to do with?
  3. I still need to directly answer the question of easy integration of WebSockets and messaging.  I expect that either Jetty + ActiveMQ or Jetty + Redis will provide an out-of-the-box(ish) solution.

 


4 charts that are guaranteed to make you a better performance detective


I was giving the Magento Performance and Optimization for System Administrators course today and I said something that is either borderline brilliant, stupid, or common knowledge.  What I said was something along the lines of “finding performance problems is about finding a) correlations, or b) deviations”.  In other words, when diagnosing a performance problem, especially when using instrumentation data as opposed to reviewing code, a prime goal is to find data that correlates or deviates.

To illustrate what I meant I rendered 4 charts.

[Figure: 2014-04-04_1125]

 

What I mean by this is that when you are determining performance problems whose cause is not readily apparent you should be looking for data that correlates either inversely or proportionally (top row) or deviates (bottom row).

What do you think?  We must always grant that there will be cases where this is not true.  However, it seems that in many scenarios finding either a) correlating data, or b) deviating data gets you about 3/4 of the way to discovering the source of a performance issue.
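For instrumentation data that is already in numeric form, both checks can be approximated with a little arithmetic.  The following is only a sketch under my own assumptions: the function names `pearson()` and `deviations()`, the sample numbers, and the threshold of 2 standard deviations are all made up for illustration and are not taken from the charts above.  A coefficient near 1 suggests a proportional correlation, one near -1 an inverse correlation, and the second function flags samples that stray far from the rest of their own series.

```php
<?php
// Illustrative helpers only; names, data and threshold are assumptions.

/** Pearson correlation coefficient between two equally sized series. */
function pearson(array $x, array $y): float
{
    $x = array_values($x);
    $y = array_values($y);
    $n = count($x);
    if ($n < 2 || $n !== count($y)) {
        throw new InvalidArgumentException('Series must have equal length of at least 2');
    }
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;
    $cov = $varX = $varY = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $dx = $x[$i] - $meanX;
        $dy = $y[$i] - $meanY;
        $cov  += $dx * $dy;
        $varX += $dx * $dx;
        $varY += $dy * $dy;
    }
    if ($varX == 0.0 || $varY == 0.0) {
        return 0.0; // a flat series carries no correlation information
    }
    return $cov / sqrt($varX * $varY);
}

/** Indexes of samples more than $threshold standard deviations from the mean. */
function deviations(array $series, float $threshold = 2.0): array
{
    $series = array_values($series);
    $n = count($series);
    if ($n === 0) {
        return [];
    }
    $mean = array_sum($series) / $n;
    $sumSquares = 0.0;
    foreach ($series as $value) {
        $sumSquares += ($value - $mean) ** 2;
    }
    $std = sqrt($sumSquares / $n);
    if ($std == 0.0) {
        return []; // every sample is identical, so nothing deviates
    }
    $flagged = [];
    foreach ($series as $i => $value) {
        if (abs($value - $mean) / $std > $threshold) {
            $flagged[] = $i;
        }
    }
    return $flagged;
}

// Made-up samples taken over the same intervals.
$throughput = [1000, 2000, 3000, 4000];
$cpuPercent = [12, 24, 35, 49];
printf("correlation: %.3f\n", pearson($throughput, $cpuPercent));
```

Neither number proves causation; they only point at which pairs of metrics (or which individual samples) deserve a closer look.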


I’m actually really excited about Hack and HHVM


When you get to be my age you start thinking about the future.  You start wondering about some of the choices you made when you were younger.  You wonder if those choices are going to come back and haunt you.  If you don’t have such thoughts you are either an idiot, or a young person without enough experience to make you think such things.

One of the things that I have been wondering about is whether my decision to focus on PHP several years ago was the right one.  Technology is always changing and eventually the guys writing Ruby, too, will start having children and then will start looking down upon people who use disposable diapers instead of more environmentally friendly cloth.  In other words, times change and one of the keys to navigating change is to know what to change, why to change and when, if ever, change is necessary.  Change for the sake of change is stupid.

There are several changes that are occurring in technology.  But one thing that has not changed is the venom that PHP has been on the receiving end of.  Much of the venom is, in fact, deserved, though the quantity in which it is spewed is not.  If you are the source of 75% of the traffic on the Internet then you might be doing something right.  But that 75% is today.  What about tomorrow?

And so my question has been “will there be enough innovation in PHP to keep it a force to be reckoned with several years down the road?”  I ask this because there are market and technology changes occurring that PHP is not well suited to navigate.

But before you think that I am totally down on PHP, that is not quite true.  PHP was a revolutionary force when the web first came out.  It, and not any other programming language, democratized the web in a way that was not otherwise possible at the time.  But with the web won, PHP kind of sat on its laurels for a bit.  Because of this several different languages have gained a foothold.  While PHP won the initial battle for the web (and did it convincingly) it did not protect its flank through continued innovation.  Innovation occurred to some extent, but PHP did not really mature.  In fact, I would argue that Ruby did what market forces are supposed to do to competitors: it forced them to step up their game.

And PHP did.  Within the span of just a few years framework after framework shot up.  The running joke was Q: “How many frameworks are there today?” A: “That depends.  Is it Wednesday or Thursday?”  Ruby threatened and PHP responded.  Symfony, Zend, Laravel, Cake and a host of others that I am not listing matured the PHP community very quickly.  What was once a hack language grew into a much more mature ecosystem.

But again, PHP seems to have stagnated a little.  There have been several people who have noted dysfunction in PHP internals.  I’m not that closely involved with internals and so I have no definite statement I am willing to make.  However, I have been concerned about what I perceive to be a slowdown of innovation since PHP 5.3 came out.

But the hack language may be on the verge of another jump forward via the Hack language and the HipHop VM.  While I, as a metalhead, object to the naming of the aforementioned VM, with the release of version 3 we may be seeing something that could possibly extend PHP’s life by 10-15 years and introduce another round of innovation for PHP.

With version 3 the excitement is almost palpable.  Davey Shafik and Padraic Brady have written blog posts recently that are the source of my optimism.  Resource utilization is a significant concern of mine at Magento (where I work now).  Magento does some really great things but does them at a fairly significant cost of CPU time.  If you’ve played with Customer Segments in EE you know what I’m talking about.  Or, for that matter, rendering the layout.  When I was working at Zend I was constantly asked “what switch do I need to make PHP faster?” and an opcode cache can only do so much.  With Hack and HHVM that, quite literally, is a possibility.  In the spirit of full disclosure, I have not had a chance to play with it in depth and I hope to be able to soon, but the potential is there.

And that potential is something to be excited about.  Facebook has demonstrated significant resolve in building the HHVM.  It, like PHP to begin with, was built to solve a real problem; the problem of moving an old language into new times.

And maybe, just maybe, solving that problem will help keep PHP in the position it is in.  Maybe an old dog can be taught new tricks.

 

Now if only someone could provide my wishes of core low level event APIs and easy messaging integration.


Google finally acknowledges that PHP exists


I read an article today about how PHP is exploding on Google App Engine.  How is it that one of the most despised programming languages in the world is running (as Google claims) up to 75% of the web?  Many nay-sayers will say “oh it’s just WordPress” or “oh, it’s just phpBB”.  But in doing that they are completely missing the point.

The proper response to this is not trying to dismiss it, but asking why it is that PHP-based applications just seem to always be the ones at the top of the list?  Some may answer that PHP’s ubiquity gives it a default advantage but that still dodges the question.  WHY is PHP running 75% of the web?  Hosters don’t just say “hey, let’s throw X programming language on our web servers!”

It comes down to demand.  There is a lot of demand for PHP.  That’s why hosters put it on their web servers.

In the article Venture Beat says “PHP is moving to the Enterprise very quickly”.  This is not true.  PHP IS in the enterprise and has been for a long time.  People just either don’t know it or refuse to admit it.

But, again, we have not answered the question “why”.

Many of the people who are nay-sayers of PHP are the people who have studied.  And in studying they have learned that programming languages need to do certain things in certain ways.  And of these things, PHP does none of them (ok, so this is hyperbole, to a point).  This is a major reason why PHP has such a bad reputation among cutting edge developers, CS grads and trend-setters.

But what it also does is expose the vacuousness of the ivory tower.  The ivory tower deals with validating the theoretical, testing the impractical from within an educational framework or methodology.  People will often say that this approach is a more pure way of approaching the problem rather than the dirty commercially-driven interests of the private world.  To which I say “big frigging deal!”.  Don’t get me wrong, I think that study is good.  Though I didn’t go to university I am under a continuous education program called “reading”, for the theoretical, and “practice” for the practical.  Study is good.  But study is not an end.  Real life occurs and it is not clean, pure and methodological.  What a bore if it were!

But this is real life.  PHP may not solve the problem in the purest of ways; in fact it will probably be pretty dirty.  But that is why it succeeds; it mirrors real life.  In real life you have a job to get done.  And if it takes more resources to do it properly, then the improper method will be used.  Commerce and business, at their most distilled, are simply an efficient means of utilizing and transferring resources.  Those resources could be money, time, knowledge, or any combination of those or other things.  It is the utilization and transfer of things that have “value”.  And when you have two things that both have worth, purity and practicality, a judgment call needs to be made on which is more valuable.

PHP is valuable not because WordPress is built on it, but because PHP solved the problem WordPress was solving more easily.  In other words, it solved the problem by consuming fewer resources.

Using PHP, I think, is also one of the smarter moves by the company I work for, Magento.  For those who don’t know, Magento is the most popular ecommerce platform in the world and it is written in PHP.  Magento is probably the most complicated application platform available for PHP and it’s STILL easier to build for than most Java web applications, with a wider range of programming skills that can be utilized.  In other words, it enables commerce by utilizing fewer resources than competing solutions, but still provides stunning extensibility.

An organization should require as few “top-end” developers for a solution implementation as possible.  When it comes to Magento, WordPress, Joomla, etc. you do not require a CS degree to do it.  Rather than being a failure, that is a monumental success!  Scarcity increases cost and so if you can decrease scarcity (developer skill required) you can decrease cost.  And the real world is about doing as much as possible for as little as possible.

So how is it that Google missed PHP?  That is a question that I cannot answer since I don’t work for Google.  But I would surmise that it has something to do with the fact that Google didn’t WANT it there.  For all their posturing about being “data driven” they completely missed PHP despite the fact that they have access to the best web data on the planet.  Therefore I must presume that it’s another iteration of The Invisible Postman; also called “having blinders on”.  Node, Ruby, Python; all great languages and can do some really cool things that PHP cannot.  But they do not solve the problem of resource scarcity on the same level that PHP does, when it comes to web-based applications.

For software companies that are looking to break into the web there is only one language to start with.  As long as HTTP is the de facto protocol of the web, PHP will be its de facto programming language.  Suck up your pride, build your stuff, and be successful.

 

… and let the trolling commence.


What I would love in PHP-FPM


I love most things about PHP, but what I don’t like is that in order for me to do any kind of asynchronous processing I need to create an infrastructure.  In other words, I need to build a queuing daemon or build some kind of interface.

It really shouldn’t be that much work for what is a simple task in many other languages.

So it would be really cool if PHP-FPM had a FIFO/delayed queue where you could inject a FastCGI request into the queue and either fire and forget or allow the executing process to wait on a queue selector.  So it would look kind of like this:

$j1 = new FpmRequest('/some/url', 'POST', array('var' => 1, 'var2' => 2));
 
$q = new FpmQueue();
$q->addJob($j1);
$q->execute();

Or, if you want to wait for the response, this:

$j1 = new FpmRequest('/some/url', 'POST', array('var' => 1, 'var2' => 2));
$j2 = new FpmRequest('/some2/url2', 'POST', array('var' => 1, 'var2' => 2));
$j3 = new FpmRequest('/some3/url3', 'POST', array('var' => 1, 'var2' => 2));
$q = new FpmQueue();
$q->addJob($j1);
$q->addJob($j2);
$q->addJob($j3);
 
$q->execute();
$q->wait();
 
echo $j1->getOutput();
echo $j2->getOutput();
echo $j3->getOutput();

It would be nice for the Apache SAPI to do this as well, so I could debug the requests more easily (I use Zend Server which, ATM, only supports Apache).  But it would seem that PHP-FPM would have an easier time of doing this: because it manages its own resources, it could do things like maintain a separate pool reserved for queued requests.

Maybe I have unique use cases or I just like making things more complicated for myself.  But it would be really nice to have some kind of queuing work out of the box.
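For comparison, the wait-for-response variant can be roughly approximated today with PHP's `curl_multi_*` functions, by sending the requests back through the web server that fronts PHP-FPM.  This is a swapped-in technique, not the queue described above: there is no fire and forget, no delayed execution and no reserved pool, and it needs the curl extension.  The function name `runAll()` and the job array layout are my own assumptions for the sketch.

```php
<?php
// Sketch only: runAll() and the job layout are hypothetical, not a PHP-FPM API.

/**
 * POST every job concurrently and return the response bodies,
 * keyed the same way as the input array.
 *
 * Each job is an array with a 'url' string and a 'params' array.
 */
function runAll(array $jobs, int $timeoutSeconds = 30): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($jobs as $key => $job) {
        $ch = curl_init($job['url']);
        curl_setopt_array($ch, [
            CURLOPT_POST           => true,
            CURLOPT_POSTFIELDS     => http_build_query($job['params']),
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT        => $timeoutSeconds,
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none of them is still running.
    $running = 0;
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // sleep until there is activity
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the bodies and release the handles.
    $outputs = [];
    foreach ($handles as $key => $ch) {
        $outputs[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $outputs;
}
```

Called with something like `runAll(['j1' => ['url' => 'http://localhost/some/url', 'params' => ['var' => 1, 'var2' => 2]]])` (the host is a placeholder), it blocks until every request has finished and then returns the bodies keyed by job name.  Every request still pays the full HTTP round trip, which is exactly the overhead a queue built into PHP-FPM could avoid.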