indeyets/appserver-in-php · GitHub

Lukas Smith responded on Twitter to a post I made on this blog about the possibility of having a precompiled bootstrap in PHP that would allow large sections of bootstrapping code to be bypassed, including autoloading, class definitions and certain objects.

His tweet linked to the project above, which solves the problem, but in a different way.  It requires a middle layer that needs to be running to process these requests.  I had thought about doing something similar using ESI locally, which did not work out so well since Varnish processed ESI requests synchronously.

I have not played with this library and I would prefer a solution that is closer to the engine and doesn’t add another layer.  That said, this looks like an interesting project that seems worth taking a look at.

Would this be a dumb idea for PHP core?

So, I have been playing around with an idea in my head for a while, a few years now.  It really came along as we started seeing more and more PHP applications rely on bootstrapping.  For me it was as I saw ZF applications becoming more and more complicated.  At the time I was consulting and I would see significant server resources consumed by bootstrapping the apps.  Loading config files, loading dependent classes, setting up dependencies, initializing ACLs, and the list goes on and on.

One of the ways to negate the effect would be to cache a bootstrap object and then pull that object from the cache at the start of the request.  However, the problem is that unserialization can actually end up taking more time than the bootstrap process itself.
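
The cached-bootstrap approach can be sketched roughly as follows.  This is only an illustration: the Bootstrap class, the cache file path and the loadBootstrap() helper are hypothetical names, and a real application would more likely keep the object in a shared cache than in a flat file.  Timing the warm path against the cold one is how you would find out whether unserialization is actually cheaper than just running the bootstrap.

```php
<?php

// Hypothetical stand-in for an expensive application bootstrap.
class Bootstrap
{
    public $config = array();

    public function __construct()
    {
        // Pretend this is config parsing, ACL setup, and so on.
        for ($i = 0; $i < 1000; $i++) {
            $this->config['key' . $i] = 'value' . $i;
        }
    }
}

// Return a bootstrap object, building it only when no cached copy exists.
function loadBootstrap($cacheFile)
{
    if (is_file($cacheFile)) {
        $cached = unserialize(file_get_contents($cacheFile));
        if ($cached instanceof Bootstrap) {
            return $cached;
        }
    }

    $bootstrap = new Bootstrap();
    file_put_contents($cacheFile, serialize($bootstrap), LOCK_EX);

    return $bootstrap;
}

$cacheFile = sys_get_temp_dir() . '/bootstrap-' . getmypid() . '.cache';

$start = microtime(true);
$first = loadBootstrap($cacheFile);   // cold: runs the constructor
$cold = microtime(true) - $start;

$start = microtime(true);
$second = loadBootstrap($cacheFile);  // warm: unserializes the cached copy
$warm = microtime(true) - $start;

printf("cold: %.6fs, warm: %.6fs\n", $cold, $warm);
unlink($cacheFile);
```

With a toy object like this the numbers will not mean much.  The point is that the warm path still has to rebuild every object from a string, which is where the time goes on a large bootstrap.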

So, I was wondering.  Perhaps there would be a way to provide a cacheable state of the Zend Engine.

Perhaps it would look something like this.

init_engine_state(Callback $init);

What this would do is call the callback and after the callback returns, but before init_engine_state() returns, the engine would take a snapshot of everything except the superglobals.  This would include classes, objects and opcodes.  The next time a request comes in the callback would not be executed, but the state of the engine would be set to the state that it was in during the previous run of the callback.

Internally, what would happen before init_engine_state() returns is that all of the pertinent hash tables would be copied to a different memory block for the initial request.  Then the next time a request comes in, memory for the copied hashtables would overwrite the existing ones.  As noted earlier this could also include opcodes for files which would mean that the reams of autoloading function calls that typically happen could be completely bypassed.

I have seen legitimate cases where bootstrapping is taking 50% of the wallclock time.  Perhaps by providing an engine hook like this PHP performance could be dramatically improved.

 

… or maybe I’m just speaking out of my buttocks.

 

Please comment.  (On the idea, not my buttocks)

 

[UPDATE]

Here is an additional snippet of code that might help explain what I’m thinking of

<?php
 
$app = init_engine_state(function() {
    // This code would only be executed once
    require_once 'lib/code/Application.php';
    require_once 'lib/code/Autoloader.php';
 
    $app = new Application();
    $app->createAutoloader();
    $app->bootstrap();
    return $app;
    /* 
    * At this point a snapshot would be taken of
    * - opcodes
    * - class definitions
    * - objects
    * 
    */
});
// No autoloading would need to be done
$adapter = Application::getAdapter();
$app->dispatch();

10 “what to do’s when setting up Magento” and file inclusion attacks

Found this list of things “to do” on Twitter this morning.  I went over the list and saw that there was one item missing, which I feel is very important to do.  I saw it in another post on Local File Inclusion, which seems to describe a local file inclusion vulnerability in Joomla (I think.  I didn’t read that far into it).

The thing on the list that was missing was securing your local file system when installing Magento.  The default installation asks for certain directories to be writable.  This is necessary for certain things.  But what we lazy installers sometimes do is just make the whole thing writable to make installation easier.  And while I am not aware of any specific Magento vulnerability like the one noted, it is definitely a good practice to deny write access to all but the necessary files.  This is done by changing the permission settings on the files and also changing the file ownership so that the web server user is unable to change the permissions to something more permissive.  And for the directories that you do need write access to, you should deny web access via either .htaccess or <Directory> settings in httpd.conf so the files in them can’t be called remotely.

So, the 11th thing to do is to secure your file system by denying write access to the server user that is running your Magento code.
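
A quick way to audit this is to walk the install directory and list anything that group or other users can write to outside the directories that genuinely need it.  The sketch below is an illustration under assumptions: findLooseFiles() is a hypothetical helper, and the allow-list of var and media is only an example of directories you might intentionally leave writable, so adjust it for your own install.

```php
<?php

// List paths under $root that group or "other" users can write to,
// skipping anything inside the allowed (intentionally writable) directories.
function findLooseFiles($root, array $allowedDirs = array())
{
    $loose = array();
    $root = rtrim($root, '/');

    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST
    );

    foreach ($iterator as $path => $info) {
        $relative = ltrim(substr($path, strlen($root)), '/');

        foreach ($allowedDirs as $dir) {
            if ($relative === $dir || strpos($relative, $dir . '/') === 0) {
                continue 2; // intentionally writable, skip it
            }
        }

        // 0022 masks the group-write and other-write permission bits.
        if (($info->getPerms() & 0022) !== 0) {
            $loose[] = $relative;
        }
    }

    sort($loose);

    return $loose;
}

// Usage: php audit.php /path/to/magento
if (PHP_SAPI === 'cli' && isset($argv[1]) && is_dir($argv[1])) {
    foreach (findLooseFiles($argv[1], array('var', 'media')) as $path) {
        echo $path, "\n";
    }
}
```

This only looks at permission bits.  It does not check ownership, so a file owned by the web server user still needs a separate look.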

Starting with Magento on Monday

Having spent several years as a consultant with Zend, working with highly scalable applications, developing many of Zend’s training courses, building mobile applications and doing my best to be a generally good guy I am making the move to Magento.  More specifically, MagentoU.  Magento has, for several years, been a company that I have been interested in.  Their product is technically quite interesting and very powerful, but my interest has been in watching the company’s meteoric rise.

This rise is because they did a lot of things right.  From the start the system was designed to be expandable.  It was designed to be easily built upon without having to rewrite portions of the core code.  This allowed a community of developers and companies to spring up around the software which, in turn, generates a tremendous amount of activity.  So Varien, now Magento, built not just a software package, but provided the groundwork for an ecosystem to sprout and grow.

And grow it did.

But now, a new chapter starts in Magento’s life.

Me.

hehe.  More likely it’s the other way around :-)

I will be a Technical Manager for Education and Consulting.  Sounds boring, right?  Not at all.  It is actually a very wide ranging position.  I will be teaching, developing courseware, driving self-education in the Magento community (blogging, videos, forums, etc.), providing support for training partners, executing consulting services, working with customers and partners and a whole bunch of other things.

This is an opportunity I am quite excited to be part of and I am quite thankful for having it.

… I’m also hoping that now that I have an “8-5” job, which I am well aware it will not be, that I will be able to carve out some time for writing a couple of tunes again.  When you’re trying to start your own company you tend to force those kinds of things to the backburner.  I’m hoping that I will be able to move it forward a little bit.  (and the world cheers!).

Generating secure cross site request forgery tokens (csrf)

I don’t talk much about security.  This is mostly because it’s such a moving target.  I’m also horrified that I might give bad advice and someone will be hacked because of me.

But in researching the second edition for the IBM i Programmer’s Guide to PHP Jeff and I decided to include a chapter on security since we really didn’t talk much about it in the first edition.  I’m talking about cross site request forgeries right now and I wanted to make sure that what I was going to suggest would not break the internet in some way.

I did some Google searching to see what other people were recommending.  Almost all of the pages I found for generating a CSRF token use code like this

$token = md5(uniqid(rand(), true));

The manual pages for rand() and uniqid() specifically state, and a look at the C code bears out, that these functions should not be used for generating secure tokens.  They tend to generate predictable values.  And the documentation for md5() states that it should not be used for password hashing.  Granted we’re not hashing passwords when creating a CSRF token, but with the tooling available shouldn’t we be using functions that are more cryptographically secure?  Like this?

$token = hash_hmac(
    'sha512',
    openssl_random_pseudo_bytes(32),
    openssl_random_pseudo_bytes(16)
);

Am I missing something or wouldn’t something like this be a whole lot better?

[UPDATE]

padraicb validated my thought on the matter.  The goal here is the random value.  As such the hashing using hash_hmac() does not buy you a whole lot extra.  The number of possible values in a 32 byte random string is 2^256, or 1.1579208923731619542357098500869e+77.  That alone would seem to be enough for a CSRF prevention token.  mt_rand() returns an integer no larger than mt_getrandmax(), which gives you about 2 billion possible numbers.  While that will probably protect you, the other value will offer you better protection.  There’s no sense in gambling with a smaller value if you have the ability to generate a larger value with virtually no additional cost.

So it would seem that, for generating a proper token the code that you would really need is this

$token = base64_encode(openssl_random_pseudo_bytes(32));

The only reason for the base64_encode() call is to make sure that the value provided will not break your HTML layout.
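
Putting the pieces together, issuing and checking the token might look something like the sketch below.  It assumes PHP 7 or later, where random_bytes() and hash_equals() are both available in core (on older versions openssl_random_pseudo_bytes() from the code above plays the same role).  The function names and the plain array standing in for $_SESSION are hypothetical.

```php
<?php

// Generate a token from 32 bytes of cryptographically secure randomness.
function generateCsrfToken()
{
    return base64_encode(random_bytes(32));
}

// Store a fresh token in the session-like array and return it for the form.
function issueCsrfToken(array &$session)
{
    $session['csrf_token'] = generateCsrfToken();

    return $session['csrf_token'];
}

// Compare the submitted token to the stored one in constant time.
function isValidCsrfToken(array $session, $submitted)
{
    return isset($session['csrf_token'])
        && is_string($submitted)
        && hash_equals($session['csrf_token'], $submitted);
}

$session = array();                 // stand-in for $_SESSION
$token = issueCsrfToken($session);  // embed this in a hidden form field

var_dump(isValidCsrfToken($session, $token));    // bool(true)
var_dump(isValidCsrfToken($session, 'forged'));  // bool(false)
```

The comparison uses hash_equals() rather than == so that the time it takes does not depend on how many leading characters of a guess happen to match.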

Synapses, Volume 2

Hey, it’s a second volume and I haven’t gotten bored yet!  I had intended to post one yesterday but I was painting.

5 Reasons Web Speed Is in the Eye of the Beholder

I would actually disagree with all 5 of the reasons.  Slow web apps suck, there are no two ways about it.  I don’t care if I’m doing a hyper complex OLAP query if the site is not responsive.

But I totally agree with the title.  I have been saying for years that performance is irrelevant.  It is the “perception of performance” that is important.  Does your app take 10 seconds to load?  It doesn’t really matter as long as the user is able to interact with it quickly.  As long as the user’s perception of the app’s speed is that it is fast, it does not matter how long it takes to render… from the user’s perspective.  It probably matters to Google Page Rank, though.

Anyway, I posted this because there’s a chart that shows various websites and their render times with, what is most important, the First Paint time.  The first paint time would be the first time that someone can see or interact with content from the app.  THAT is the most important number.

Breaking the Build is Not a Crime

I liked this article not so much because I like hearing stories about what people do when they “break the build” but because it covers things that you can do so that your mainline code doesn’t get foobarred in the first place.

Performance of Graph vs. Relational Databases

Another day, another X vs Relational Databases post.  I’m just waiting for a “Why NoSQL, Big Table, Columnar, Distributed, Sharded, Memory Resident databases are faster than SQLite” post.  But I posted this article because graph databases are a good solution when looking for connected, not just related, data.  Social is “the now” and so connected data often is as important as related data and relational databases just don’t do this well.  The performance difference documented in this post makes me want to spend some time actually learning a graph DB.

Optimization, suboptimization, and staggering toward education improvement

John Carmack posted this on Twitter.  The author is talking about something Bill Gates said about improving education.  He said something along the lines of having to measure things in order to figure out what to do.  Pournelle didn’t disagree, but noted that while measurement is important, knowing what to measure matters just as much, because the measurements you decide are important can lead to vastly different outcomes.  He gave the example of WWII Britain having the problem of their supply vessels being sunk by German subs.  So they used a tactic whereby they would maximize the number of subs sunk.  The problem was that it was done at the cost of British ships being sunk as well.  The problem that Britain had was not that they needed to sink German subs, but that they needed to feed their supply lines.  They changed tactics and while the number of sunk subs went down the tonnage of supplies getting through went up.

Focus is hugely important for any organization.  Focus on what is important by way of the goals of the organization first.

That’s enough for today.

<edit>Adding another

DevTools: Visually Re-engineering CSS For Faster Paint Times

Ran into this post after publishing.  But given that it pertains to article #1 I added it here.  Very interesting info on diagnosing poor render times for browsers.

Synapses, Volume 1

That word is a combination of synopsis and synapse.  Technically it’s the plural of synapse but in my case it’s intended to be a combination of the two words since I’m combining short commentary (synopsis) with witty commentary (synapse, or brain activity).

The reason I am doing this is because I do a lot of reading during the day.  A lot.  Much of it is technical but there is much that is not.  As I am shutting down my company and pursuing gainful employment I figured that I might benefit humanity somehow by providing a quick synopsis of the interesting, and pertinent, things that I’ve read.  It won’t be all of them, and I will omit various pieces that might be interesting and pertinent, but contentious.  The reason for this is that items of contentiousness should be discussed in person, where you have to face your opponent (who can also be your friend), and not on the Internet.  If someone uses the word “hater” to describe your position then it’s likely that they don’t understand your position and arguing with them over the Internet will be fruitless.

I am going to try and do this on a daily basis… until I get bored with it.  :-)

How to Optimize Every Decision in Your Life and Accomplish Nothing

I got this link from LinkedIn and I found it interesting because I have some distrust of decisions made purely on data.

“Fear of missing out is a paralyzing force”  I have found this quote to be true as I’m going through the search for employment.  I have several very good opportunities in front of me.  I am in a scenario that is quite enviable.  I actually get to choose the best option for me.  But one of my big problems is that I have so many options for which there are so many variables that I was (until yesterday) quite paralyzed by indecision.

“done is better than perfect”  Apparently this is written on Facebook’s walls.  I am the creative type.  One of the handicaps of the creative type is that we are interested in creating, not completing.  When working on a large(ish) project I need to intentionally focus my mind on the end product.  It can be done, but it takes discipline.

HAProxy and Varnish comparison

Doing some work on a system administrator’s course.  Found some useful lists here.  Basically, HAProxy is a load balancer, Varnish is a cache.  That does not mean that they don’t have features in common.  But “focus” is often important when making a determination about which software to use.

Why Michael Dell Really had to Take Dell Private

Let me start off by saying that I am a fan of the stock market.  The stock market has created tremendous wealth and expanded the middle class like nothing else.  Where else can moderately talented people who have moderate means earn a kingly living?

But that said, if you are publicly traded and you seem to have stagnated people get very nervous, very quickly.  Someone smarter than me once told me that it was the institutional investors, not the general trading public, that tend to dump a stock in a hurry.  They are the most impatient.

And I can understand this.  If you have $10 million invested in a company and you expect to make $2 million in two years why wouldn’t you dump that stock if you found an opportunity to make $3 million?  That’s not greed.  Greed is when you forsake things that are truly more important, such as friends, family or people in general, in the name of making money.  Dumping stock in a company for a company with better returns isn’t greed (though it can be).

But it can be really problematic if you need to take a risk that will take longer than a few quarters to implement.  If Michael Dell has some place interesting that he wants to take Dell it would seem to be a good move to buy out the people who would dump the stock on the slightest bad news while he executes his plan.

Or it may be ego, or greed.  We’ll see.

The one thing I can guarantee is that there will be tons of “analysis” on this move which will amount to a bunch of half-assed guesswork.

The Worst CEOs of 2012

I could have made this list.   RIP, App Happy.

WHY DELL GOING PRIVATE IS LESS RISK FOR CUSTOMERS THAN THEIR CURRENT PATH

There is one reason why I posted this link and it is this quote.

“What I often see in organizations struggling with changes are 3 things: 1/ When I ask five different people what the aim of the company is and what their role is in getting there, I’ll get five completely different answers”

This made me think of an important question you need to ask your boss.  “What is the aim of the company and what is my role in getting there?”  If the answer has more than three points to it you might need to look at how you might re-align or define your job more succinctly.

“Purpose” is incredibly important for an organization to succeed.

 

That’s it for today.

 

The IBM i Programmer’s Guide to PHP… second edition?

Yep!  PHP is still making strides on the IBM i and people are loving it.  But with the world’s premier book for the PHP on IBM i developer now several years old it is time to update it.  So Jeff Olen and I have decided to start work on a second edition.  We will be making some changes but we will mostly be adding new content.  Lots of new content.  Here is a list of things that we’ve come up with.

  • Web services/Mobile
    • JSON
    • REST
    • Mobile interfaces and considerations
  • Language features in PHP 5.3, 5.4 and 5.5 (It will probably be a while before 5.5 is available on the i but we still want to cover it, giving you even MORE value for your money)
  • SOLID principles
  • Expand OO
    • Basics
    • Advanced
  • Standards
    • PSR*
    • Autoloading
  • Beginning Test Driven Development
  • New Toolkit
  • Security

I’ve actually started writing one of the chapters this morning, BUT!  If you are an IBM i developer and there are topics that you would like to have covered we would love to hear from you.  You can either comment below or email me at [email protected] (Yes, I’ll share anything with Jeff :-) ).

Magento, ESI, Varnish and performance

I have been doing a little playing with Magento over the past couple of days.  I’ve been helping out eBay/Magento by delivering some of their performance training over the past few months.  I’m by no means the world’s best Magento person at the moment, but I know the architecture pretty well.

One of the things I’ve wanted to do is play more with Varnish.  There’s lots of hype, and a lot of the hype is true.  It really is as fast as they say it is and worth looking at for a full-page caching solution.

But what about when you “can’t” do full page caching?  Enter Magento (or any ecommerce platform).  Most of the time your pages are fully cacheable.  Right up until you click the “Add To Cart” button.  At that point full page caching doesn’t work and so the default behavior for most platforms is to simply not cache output at that point.

But the problem is that 90% of the page (or more) is still cacheable.  So you are doing full execution on a page that has a bare minimum of actual dynamic content.

This is where ESI comes in.  ESI, or Edge Side Includes, allow servers on the edge of a CDN to do full page caching but make dynamic callbacks to your website to fill in certain parts of the page.  You lose a lot of the benefit of having a full page cache in that the many orders of magnitude of performance improvement you gain with full page caching are reduced to how fast you can get the backend dynamic content.

But who says that ESI only needs to be on the edge?  I decided to take a look at how I might implement ESI, with partially dynamic content in Magento, using Varnish as a processor.

My results are in and they look pretty good, though they are preliminary.  As I write this blog post I’m dealing with some wildly erratic response times from Varnish.  Varnish as a full page cache is consistent.  The ESI pages are consistent.  But put them together and I have response times that vary by two orders of magnitude when I do a benchmark that uses concurrent connections.

But if I do the benchmark with sequential HTTP calls you can definitely see an improvement.

First the results without a full page cache.  This is the way you do it by default now once someone adds something to their shopping cart.

Concurrency Level: 1
Time taken for tests: 1.171843 seconds
Complete requests: 10
Failed requests: 0
Write errors: 0
Total transferred: 81030 bytes
HTML transferred: 76210 bytes
Requests per second: 8.53 [#/sec] (mean)
Time per request: 117.184 [ms] (mean)

Now the results when using Varnish with ESI.

Concurrency Level: 1
Time taken for tests: 0.591032 seconds
Complete requests: 10
Failed requests: 0
Write errors: 0
Total transferred: 86500 bytes
HTML transferred: 81000 bytes
Requests per second: 16.92 [#/sec] (mean)
Time per request: 59.103 [ms] (mean)

That cuts the mean time per request roughly in half (117 ms down to 59 ms), or about twice the throughput, with no loss of functionality.

In this example I simply removed the call to the Magento sidebar which renders the shopping cart and replaced it with an ESI tag.

<esi:include src="http://magento.loc/eschrade/esi/sidebar" />

That routes to a controller where I manually spit out the sidebar contents.
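
To make the mechanics concrete, here is a rough sketch of what an ESI processor does with that tag.  This is not how Varnish is implemented.  It is a toy in PHP where $fetch is a hypothetical callback standing in for the HTTP request to the backend, and the markup around the tag is made up for the example.  Note that it resolves the includes one after another.

```php
<?php

// Replace each <esi:include src="..."/> tag in a cached page with the
// fragment returned by $fetch, processing the tags one at a time.
function processEsi($cachedPage, $fetch)
{
    return preg_replace_callback(
        '/<esi:include\s+src="([^"]+)"\s*\/>/',
        function (array $match) use ($fetch) {
            return call_user_func($fetch, $match[1]);
        },
        $cachedPage
    );
}

// The static part of the page, as it would sit in the cache.
$cachedPage = '<div class="col-right">'
    . '<esi:include src="http://magento.loc/eschrade/esi/sidebar" />'
    . '</div>';

// Hypothetical backend call; a real one would issue an HTTP request.
$fetch = function ($url) {
    $fragments = array(
        'http://magento.loc/eschrade/esi/sidebar' => '<div class="block-cart">1 item</div>',
    );

    return isset($fragments[$url]) ? $fragments[$url] : '';
};

echo processEsi($cachedPage, $fetch), "\n";
```

Everything outside the tag comes straight from the cache.  Only the callback costs a trip to the application.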

Now, you might be thinking “OK, that’s one HTTP call.  What happens if you have 20 ESI calls?  Won’t your server be overloaded?”  Well, perhaps (though not definitely).  But what you need to remember is that performance and scalability are two different things.  Yes, you could overload that ONE server with 20 ESI calls.  But what if you have 20 servers running behind a load balancer?  Yes, you will be using much more by way of server resources.  But if your response time from each of those backend servers is 50 ms you still only have a total response time of 50ms, give or take a few nanoseconds to render the page.  So while CPU time will be greatly increased across your cluster, wall time will be greatly reduced because the processing is being done asynchronously with Varnish aggregating the results.

This, of course, assumes that Varnish does its ESI processing asynchronously, which I will confirm.  (I’d be surprised if it were synchronous)

[UPDATE]

As noted in the first comment below, it does look like Varnish processes ESI synchronously, meaning one at a time.  So these numbers would seem to hold if you only have one ESI call to make, but they drop off significantly as you add ESI calls.  So I will be looking at different options since the whole premise depends on asynchronous ESI.

Crap.

[/UPDATE]

But like I said, these are preliminary results.  I will do a fuller blog post on the subject when I have the inconsistent performance issue worked out.

 

Looking for some interesting work

So, if you remember from one of my earlier posts I had decided to stop working on mobile applications.  There were a number of things that I learned from the experience, many of which you can read about in that blog post.  But one of the interesting things that I learned was who I am not.

I learned that while I am tremendously entrepreneurial, I am not an Entrepreneur.  Believe me, that was a bitter pill to swallow.  It’s tough to be a risk taker by nature but lacking in either aptitude, connections or some mix of that.  The people who are proper Entrepreneurs are those who can take an idea, get the funding, find the people and bring that idea to market (not necessarily in that order).

I like to build things and I get my satisfaction from having built that “thing”.  Whether it’s writing music, writing software or writing stories, it is the final product from where I most derive my pride.  I can take (and have taken) abstract ideas and either built or communicated them in ways that make it easy for people who are on the other side of the spectrum to understand and work with it.  I have well over 200 posts on this blog and over 60 videos on YouTube that demonstrate this.

Having now “taken 6 months off” it is time for me to start taking the look for employment seriously again.  During that 6 months of “time off” I was working hard building mobile apps and their server-side kin and so I was hardly sitting around drinking Mai Tais.

I’ve done work as a system admin, architect, developer, designer, trainer, training developer, consultant, marketer, evangelist, conference organizer and MC, speaker, author and composer.  Some people are the type who strive to be the best at something.  In other words, finding a niche and becoming the best in that niche.  That is definitely a good way to become successful.  But for myself, I am best at doing things that require me to pull from multiple disciplines.

The kind of work I’m looking for is the antithesis of the job postings in my Deleted Items folder.  I am really not interested in robo-recruiters.  I’m already working with a great one and so I have no interest in cut-and-paste recruitment.  If you are doing cut-and-paste recruitment you can’t afford me.

There aren’t many companies that I am averse to, but my preference will lean towards funded, maturing start ups doing interesting things.  By “interesting”, I mean something for which there is an actual (or potentially actual) market and you aren’t relying on teenager traffic to pretend to make your company valuable.  I truly believe that for most businesses, even the Internet businesses, value is best determined by whether or not someone is willing to pay for what you’re doing.  A well defined monetization strategy is something I am interested in.  I am happy to talk to larger organizations (and I am doing so) but the regimented environment of a mature business is one that is more difficult for someone such as myself to successfully navigate.

North Dallas is where I call home (for now).  I would consider relocating for the right position, but working remotely with some travel would likely be the best arrangement for me.  I’ve done that for most of my career and it has, for the most part, worked well.

If you are in need of someone who knows PHP better than most, can communicate to techies and suits, can blog, do videos, training and speaking and who can build really good server-side software and the infrastructure to run it, I would love to hear from you.