How much memory does Magento use?

So, I was teaching the Magento Performance for System Administrators class the other day.  I’ve delivered the class several times but this is the first time that somebody asked me “How much memory does Magento actually use?”  Now, I know what you’re supposed to set it at, but I’ve never measured actual usage.  So I gave some bullcrap answer about how it really depends on a bunch of things and that I really shouldn’t give a precise answer.  But the individual persisted and I was forced to put my tail between my legs and admit that I didn’t know.

So I promised that I would take a look and here are my results.  I’m sure someone will get their panties in a wad about how it doesn’t do this, that, and the other thing.  The purpose of these tests was to give a baseline, and that’s it.  I expect that an actual catalog implementation will have consumption values higher than what is seen here.

The way I measured the memory usage was that I added some code to Mage_Core_Model_App::dispatchEvent() and took memory measurements each time an event was triggered.  I called both memory_get_usage() and memory_get_usage(true).  I also injected a marker at controller_action_predispatch, core_block_abstract_prepare_layout_before,  core_block_abstract_to_html_before, resource_get_tablename, and controller_front_send_response_after.  The reason for that is so that I could get visual cues as to which part of the execution (controller, layout prepare, layout render) was responsible for each.
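
For reference, the instrumentation was roughly along these lines (a minimal sketch, not my exact code; the helper name, log path and CSV format are arbitrary choices for illustration), called at the top of Mage_Core_Model_App::dispatchEvent():

function sampleMemory($eventName)
{
  // Append one sample per dispatched event: event name, script memory, allocated memory.
  file_put_contents(
    '/tmp/memory-samples.csv',
    sprintf(
      "%s,%d,%d\n",
      $eventName,
      memory_get_usage(),      // memory currently in use by the script
      memory_get_usage(true)   // memory actually allocated from the system
    ),
    FILE_APPEND
  );
}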

The Y axis is memory used in MB.  I divided the memory_get_usage() values by 1024 × 1024 to get the MB used.  Additionally, because the events were used as the sampling point, the X axis represents the event count, not elapsed time as your brain might assume.

The data that I used was the Magento sample data.  While it is unrealistic to have such a small catalog in Magento it is the only consistent data that is really available.

[Chart: memory-home-page]

The first mark is the start of the request.  The second is controller_action_predispatch.  The third is core_block_abstract_prepare_layout_before.  The fourth is core_block_abstract_to_html_before.

[Chart: memory-category-no-images]

At this point I had realized that I had not copied the sample media over.  So I re-ran the request with the media.

[Chart: memory-category-images]

There were only 5 products in this category so I wanted to run a page that had pagination.

[Chart: memory-category-28-9]

[Chart: memory-category-28-30]

Obviously there were some images being re-sized here. But actual usage, while higher, was not overly significant.

Then, of course, there is the question of product pages.

[Chart: memory-simple-product-page]

[Chart: memory-configurable-product]

There was no real difference between any of the other product types.

Adding to cart was relatively benign.

[Chart: memory-add-to-cart]

As was displaying the cart (the request that follows the 302 Found redirect issued after adding to cart).

[Chart: memory-display-cart]

While it is a little difficult to read, here is a chart that overlays several of the previous runs for comparison.

[Chart: memory-all-items]

A couple of takeaways from this:

  1. Image processing sucks up memory and could allocate more than memory_limit
  2. Layout generation generally uses the most memory
  3. The product type does not seem to have much effect on memory consumption
  4. The number of items in a category does not seem to have much impact on memory consumption
  5. If memory is an issue, layout XML optimizations might be a valid place to look to reduce usage

However, it bears mentioning:

  1. This test did not test very large collections
  2. This test did not test very complicated layouts
  3. This test did not test catalog or shopping cart rules
  4. This test did not test performance
  5. And a bunch of other things.

What is Apdex?

Ever since I started using New Relic I’ve been seeing a number for Apdex.  Given that whenever I see a floating point number I assume the calculation is too complex for me to understand, I just presumed that it was some kind of mystical number voodoo.

Turns out that it is not.  It’s actually really simple.  First of all, New Relic didn’t come up with it at all.  There is an apdex.org website created by a consortium of different companies.  New Relic just did what any good tech company does and used it to their advantage.

Calculating the Apdex starts with an arbitrary number that represents your SLA goal, or some number below it.  That is your baseline.  If the page response time is at or below that number the user is considered “Satisfied”.  If the response time is over that number, but less than four times it, the user is considered “Tolerating”.  If the response time is over the baseline by a factor of four or more they are considered “Frustrated”.  In other words, if your baseline is 500ms, any user below that is satisfied; above it, tolerating; above baseline * 4, frustrated.

The purpose of this is to get rid of raw response time measurements as the goal.  It, to some degree, gets rid of the 95% rule.

To calculate an Apdex score create an Excel spreadsheet, have the first column be your Satisfied count, the second your Tolerating, the third your Frustrated, and apply this formula: =(A2 + (B2 / 2))/SUM(A2:C2).  That is the Apdex score.  It is the number of satisfied users, plus half the number of tolerating users, divided by the sum of all three.

Here.  Let me show you what that looks like.

Satisfied  Tolerating  Frustrated  Apdex
95         12          3           0.918182
76         10          20          0.764151
60         20          50          0.538462
20         50          50          0.375
0          0           100         0

The score is on a scale of 1 (all users satisfied) to zero (all users frustrated).
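
If you prefer code to spreadsheet formulas, the same calculation looks like this as a small PHP function (my own illustration, not anything published by New Relic or apdex.org):

function apdex($satisfied, $tolerating, $frustrated)
{
  $total = $satisfied + $tolerating + $frustrated;
  if ($total == 0) {
    return 0; // no samples to score
  }
  return ($satisfied + ($tolerating / 2)) / $total;
}

echo apdex(95, 12, 3); // 0.91818..., the first row of the table above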

What is the time scale?  Whatever you choose.  Your Apdex score is calculated based off of whatever time frame you have specified.  Personally, I think a rolling Apdex is a good idea.  But I didn’t really get a proper view of the number until I took these numbers and put them into a multi-axis chart.  The bars are the total count for requests in each of the different categorizations and the line is the Apdex score.

[Chart: apdex]

 

Seeing that corresponding with the raw values helped me to understand what I was looking at for all these months.

Hash value sizes

For giggles, here are examples of hashes for the SHA1, SHA256 and SHA512 hashing mechanisms.

Code

echo hash_hmac(
  'sha1',
  openssl_random_pseudo_bytes(32),
  openssl_random_pseudo_bytes(32)
) . "\n";
 
echo hash_hmac(
  'sha256',
  openssl_random_pseudo_bytes(32),
  openssl_random_pseudo_bytes(32)
) . "\n";
 
echo hash_hmac(
  'sha512',
  openssl_random_pseudo_bytes(32),
  openssl_random_pseudo_bytes(32)
) . "\n";

Output

00666100c04543601c9de450b061b4bbc5538c50
5762b94cd40d3e62c7b343df1ca3511343dc00fd99a4f1ee64988bf523c13b8a
49cae01474cfd4adfa21e94d35dd93a1f808dff4538042d5140fc661773bc8d0019311ee3dcb7ed8e2a27b021ae47c006f9a477fb768f60256276cc99e8c4bd0

Have a good weekend.

More – The file system is slow

A while back I wrote one post on how the overhead of logging was so minimal that the performance impact was well worth the benefits of proper logging.  I also wrote another post about how deploying your application in tmpfs or on a RAM drive basically buys you nothing.  I had a conversation the other day with a person I respect (I respect any PHP developer who knows how to use strace) about the cost of file IO.  My assertion has been, for a long time, that file IO is not the boogeyman that it is claimed to be.

So I decided to test a cross between those two posts.  What is the performance cost of writing 1,000,000 log-sized entries onto a physical file system compared to a RAM drive?  As an added bonus I also wanted to show the difference between a repeated open/write/close cycle and holding a file handle open while writing the log entries, because I think that there is something worth learning there.

The first thing I needed to do was create my RAM drive.  My first test run ran out of disk space so I had to reboot the machine with the kernel parameter ramdisk_size=512000.  This allowed my RAM drive to be up to 512M (or thereabouts).  Then I created my RAM drive.

mke2fs -m 0 /dev/ram0
mkdir -p /ramdrive
mount /dev/ram0 /ramdrive

The code I used to test was the following PHP code.

$physicalLog = '/home/kschroeder/file.log';
$ramLog = '/ramdrive/file.log';
$iterations = 1000000;
$message = 'The Quick Brown Fox Jumped Over The Lazy, oh who really cares, anyway?';
 
$time = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
  file_put_contents($physicalLog, $message, FILE_APPEND);
}
 
echo sprintf("Physical: %s\n", (microtime(true) - $time));
 
$time = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
  file_put_contents($ramLog, $message, FILE_APPEND);
}
 
echo sprintf("RAM: %s\n", (microtime(true) - $time));
 
unlink($physicalLog);
unlink($ramLog);
 
$time = microtime(true);
$fh = fopen($physicalLog, 'w');
for ($i = 0; $i < $iterations; $i++) {
  fwrite($fh, $message);
}
fclose($fh);
 
echo sprintf("Physical (open fh): %s\n", (microtime(true) - $time));
 
$time = microtime(true);
$fh = fopen($ramLog, 'w');
for ($i = 0; $i < $iterations; $i++) {
  fwrite($fh, $message);
}
fclose($fh);
 
echo sprintf("RAM (open fh): %s\n", (microtime(true) - $time));

This code tests each of the four scenarios: file_put_contents to physical, file_put_contents to RAM, fwrite to physical and fwrite to RAM.  The test was run three times.  The test is a measure of the performance of a really crappy file system compared to RAM.

[Chart: physical-ram]

The Y axis is the number of seconds that it took to write 1,000,000 log entries.  As we can see from the chart the RAM drive provided us some benefit; it was about 12% faster.  But the interesting part is the last two tests.  The physical write with an open file handle is about 2.5 times faster than the file_put_contents() write to the RAM drive.  And with an open file handle the RAM drive, again, outperformed the physical disks, this time by 42%.

But if you compare the two sets of tests you will notice something interesting.  While the RAM drive is 42% faster in the latter test, the wall-clock difference is almost the same as in the first: 2.1 seconds for file_put_contents, 1.6 for fwrite.  So it would seem that the physical overhead of using the disks, spread over 1,000,000 log writes, works out to roughly 0.00000055 seconds per write.

Now, I KNOW my disks are not that fast, nor is the VM that I ran the tests on.  It is probably due to write caching.  But that is largely irrelevant.  My assertion is that the file system (not necessarily just the disk) is not your enemy.

This is an important distinction.  I ran two dd commands, one to the disk with fdatasync set and one to the RAM drive with fdatasync set.  The results are not surprising. 55MB per second for the disk, 290MB per second to the RAM drive.  But that is not the point.  The operating system, Linux in this case, does a LOT of things to make working with the physical layer as efficient as possible.  Therefore, things like logging or doing other operations on the file system are not necessarily a bad thing because the actual overhead involved is minimal compared with your application logic.

Please feel free to do similar tests and post results.  I would love to see data that contradicts me.  That would make this a much more interesting topic of conversation.  :-)

What SSL $_SERVER variables are available in PHP?

I found myself wondering what HTTPS variables were available in the $_SERVER variable today and didn’t find a specific list (and didn’t have mod_ssl installed).  So as a public service, here is what my server says.
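
The dump below came from something along these lines (a minimal sketch, not necessarily the exact script I used); note that the SSL_* keys typically only appear when mod_ssl exports them, e.g. via SSLOptions +StdEnvVars:

// Request this file over https:// on the SSL-enabled virtual host.
var_dump($_SERVER);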

array(58) {
  ["HTTPS"]=>
  string(2) "on"
  ["SSL_VERSION_INTERFACE"]=>
  string(13) "mod_ssl/2.2.3"
  ["SSL_VERSION_LIBRARY"]=>
  string(25) "OpenSSL/0.9.8e-fips-rhel5"
  ["SSL_PROTOCOL"]=>
  string(5) "TLSv1"
  ["SSL_SECURE_RENEG"]=>
  string(4) "true"
  ["SSL_COMPRESS_METHOD"]=>
  string(4) "NULL"
  ["SSL_CIPHER"]=>
  string(18) "DHE-RSA-AES256-SHA"
  ["SSL_CIPHER_EXPORT"]=>
  string(5) "false"
  ["SSL_CIPHER_USEKEYSIZE"]=>
  string(3) "256"
  ["SSL_CIPHER_ALGKEYSIZE"]=>
  string(3) "256"
  ["SSL_CLIENT_VERIFY"]=>
  string(4) "NONE"
  ["SSL_SERVER_M_VERSION"]=>
  string(1) "3"
  ["SSL_SERVER_M_SERIAL"]=>
  string(4) "6B5B"
  ["SSL_SERVER_V_START"]=>
  string(24) "Aug 30 13:53:57 2013 GMT"
  ["SSL_SERVER_V_END"]=>
  string(24) "Aug 30 13:53:57 2014 GMT"
  ["SSL_SERVER_S_DN"]=>
  string(139) "/C=--/ST=SomeState/L=SomeCity/O=SomeOrganization/OU=SomeOrganizationalUnit/CN=localhost.localdomain/emailAddress=root@localhost.localdomain"
  ["SSL_SERVER_S_DN_C"]=>
  string(2) "--"
  ["SSL_SERVER_S_DN_ST"]=>
  string(9) "SomeState"
  ["SSL_SERVER_S_DN_L"]=>
  string(8) "SomeCity"
  ["SSL_SERVER_S_DN_O"]=>
  string(16) "SomeOrganization"
  ["SSL_SERVER_S_DN_OU"]=>
  string(22) "SomeOrganizationalUnit"
  ["SSL_SERVER_S_DN_CN"]=>
  string(21) "localhost.localdomain"
  ["SSL_SERVER_S_DN_Email"]=>
  string(26) "root@localhost.localdomain"
  ["SSL_SERVER_I_DN"]=>
  string(139) "/C=--/ST=SomeState/L=SomeCity/O=SomeOrganization/OU=SomeOrganizationalUnit/CN=localhost.localdomain/emailAddress=root@localhost.localdomain"
  ["SSL_SERVER_I_DN_C"]=>
  string(2) "--"
  ["SSL_SERVER_I_DN_ST"]=>
  string(9) "SomeState"
  ["SSL_SERVER_I_DN_L"]=>
  string(8) "SomeCity"
  ["SSL_SERVER_I_DN_O"]=>
  string(16) "SomeOrganization"
  ["SSL_SERVER_I_DN_OU"]=>
  string(22) "SomeOrganizationalUnit"
  ["SSL_SERVER_I_DN_CN"]=>
  string(21) "localhost.localdomain"
  ["SSL_SERVER_I_DN_Email"]=>
  string(26) "root@localhost.localdomain"
  ["SSL_SERVER_A_KEY"]=>
  string(13) "rsaEncryption"
  ["SSL_SERVER_A_SIG"]=>
  string(21) "sha1WithRSAEncryption"
  ["SSL_SESSION_ID"]=>
  string(64) "BE411F57BA97B3C7D61FC07B0DA965B99BF448081CA8C936C2BDE0C320712F3E"
  ["HTTP_TE"]=>
  string(18) "deflate,gzip;q=0.3"
  ["HTTP_CONNECTION"]=>
  string(9) "TE, close"
  ["HTTP_HOST"]=>
  string(9) "localhost"
  ["HTTP_USER_AGENT"]=>
  string(16) "lwp-request/2.07"
  ["PATH"]=>
  string(29) "/sbin:/usr/sbin:/bin:/usr/bin"
  ["SERVER_SIGNATURE"]=>
  string(70) "<address>Apache/2.2.3 (CentOS) Server at localhost Port 443</address>
"
  ["SERVER_SOFTWARE"]=>
  string(21) "Apache/2.2.3 (CentOS)"
  ["SERVER_NAME"]=>
  string(9) "localhost"
  ["SERVER_ADDR"]=>
  string(9) "127.0.0.1"
  ["SERVER_PORT"]=>
  string(3) "443"
  ["REMOTE_ADDR"]=>
  string(9) "127.0.0.1"
  ["DOCUMENT_ROOT"]=>
  string(13) "/var/www/html"
  ["SERVER_ADMIN"]=>
  string(14) "root@localhost"
  ["SCRIPT_FILENAME"]=>
  string(23) "/var/www/html/index.php"
  ["REMOTE_PORT"]=>
  string(5) "41195"
  ["GATEWAY_INTERFACE"]=>
  string(7) "CGI/1.1"
  ["SERVER_PROTOCOL"]=>
  string(8) "HTTP/1.1"
  ["REQUEST_METHOD"]=>
  string(3) "GET"
  ["QUERY_STRING"]=>
  string(0) ""
  ["REQUEST_URI"]=>
  string(1) "/"
  ["SCRIPT_NAME"]=>
  string(10) "/index.php"
  ["PHP_SELF"]=>
  string(10) "/index.php"
  ["REQUEST_TIME_FLOAT"]=>
  float(1377871511.902)
  ["REQUEST_TIME"]=>
  int(1377871511)
}

How much does logging affect performance?

So, I was having a discussion with a person I respect about logging and they noted that often logging poses a prohibitive cost from a performance perspective.  This seemed a little odd to me and so I decided to run a quick series of benchmarks on my own system.  Following is the code I used.

require_once 'Zend/Loader/Autoloader.php';
require_once 'Zend/Loader.php';
Zend_Loader_Autoloader::getInstance();

$levels = array(
  Zend_Log::EMERG  => 10000,
  Zend_Log::ALERT  => 10000,
  Zend_Log::CRIT   => 10000,
  Zend_Log::ERR    => 10000,
  Zend_Log::WARN   => 10000,
  Zend_Log::NOTICE => 10000,
  Zend_Log::INFO   => 10000,
  Zend_Log::DEBUG  => 10000
);

echo '<table>';

foreach (array_keys($levels) as $priority) {
  // A fresh log file and a writer filtered at the current priority for each run.
  @unlink('/tmp/log');
  $format = '%timestamp% %priorityName% (%priority%): %message%' . PHP_EOL;
  $formatter = new Zend_Log_Formatter_Simple($format);
  $writer = new Zend_Log_Writer_Stream('/tmp/log');
  $writer->addFilter(new Zend_Log_Filter_Priority($priority));
  $writer->setFormatter($formatter);
  $logger = new Zend_Log($writer);

  $startTime = microtime(true);

  // 10,000 messages at each of the eight levels: 80,000 log() calls per filter level.
  foreach ($levels as $level => $count) {
    for ($i = 0; $i < $count; $i++) {
      $logger->log(
        'Warning: include(Redis.php): failed to open stream: No such file or directory in /var/www/ee1.13/release/lib/Varien/Autoload.php on line 93',
        $level
      );
    }
  }

  $endTime = microtime(true);

  echo sprintf("<tr><td>%d</td><td>%f</td></tr>\n", $priority, ($endTime - $startTime));
}

echo '</table>';

What this code does is iterate over the priority filter levels; for each filter level it writes 10,000 log entries at each of the eight logging levels.  So, basically, it writes 80,000 log entries per iteration, with each iteration filtering at a different priority, to see the performance overhead.

[Chart: logging-overhead-total]

You can see the total overhead for each level of logging.  This represents the total elapsed time to log 80,000 log events at the various levels of logging priority.

But nobody is logging 80,000 events (hopefully).  So what does this look like for a realistic approach?  Following is the breakdown based off of the elapsed time for 100 log entries for an individual request.

[Chart: logging-overhead-x100]

 

So, logging seems to cost you a grand total of about one thousandth of a second per request (assuming 100 log entries).

So this begs the question…

 

[meme image]

Google finally acknowledges that PHP exists

I read an article today about how PHP is exploding on Google App Engine.  How is it that one of the most despised programming languages in the world is running (as Google claims) up to 75% of the web?  Many nay-sayers will say “oh it’s just WordPress” or “oh, it’s just PHPbb”.  But in doing that they are completely missing the point.

The proper response to this is not trying to dismiss it, but asking why PHP-based applications always seem to be the ones at the top of the list.  Some may answer that PHP’s ubiquity gives it a default advantage but that still dodges the question.  WHY is PHP running 75% of the web?  Hosters don’t just say “hey, let’s throw X programming language on our web servers!”

It comes down to demand.  There is a lot of demand for PHP.  That’s why hosters put it on their web servers.

In the article VentureBeat says “PHP is moving to the Enterprise very quickly”.  This is not true.  PHP IS in the enterprise and has been for a long time.  People just either don’t know it or refuse to admit it.

But, again, we have not answered the question “why”.

Many of the people who are nay-sayers of PHP are the people who have studied.  And in studying they have learned that programming languages need to do certain things in certain ways.  And PHP does none of those things (ok, so this is hyperbole, to a point).  This is a major reason why PHP has such a bad reputation among cutting edge developers, CS grads and trend-setters.

But what it also does is expose the vacuousness of the ivory tower.  The ivory tower deals with validating the theoretical, testing the impractical from within an educational framework or methodology.  People will often say that this approach is a more pure way of approaching the problem rather than the dirty commercially-driven interests of the private world.  To which I say “big frigging deal!”.  Don’t get me wrong, I think that study is good.  Though I didn’t go to university I am under a continuous education program called “reading”, for the theoretical, and “practice”, for the practical.  Study is good.  But study is not an end.  Real life occurs and it is not clean, pure and methodological.  What a bore if it were!

But this is real life.  PHP may not solve the problem in the purest of ways; in fact it will probably be pretty dirty.  But that is why it succeeds; it mirrors real life.  In real life you have a job to get done.  And if it takes more resources to do it properly, then the improper method will be used.  Commerce and business, at their most distilled, are simply an efficient means of the utilization and transfer of resources.  Those resources could be money, time, knowledge, or any combination of those or other things.  It is the utilization and transfer of things that have “value”.  And when you have two things that both have worth, purity and practicality, a judgment call needs to be made on which is more valuable.

PHP is valuable not because WordPress is built on it, but because PHP solved the problem WordPress was solving more easily.  In other words, it solved the problem by consuming fewer resources.

Using PHP, I think, is also one of the smarter moves by the company I work for, Magento.  For those who don’t know, Magento is the most popular ecommerce platform in the world and it is written in PHP.  Magento is probably the most complicated application platform available for PHP and it’s STILL easier to build for than most Java web applications, with a wider range of programming skills that can be utilized.  In other words, it enables commerce by utilizing fewer resources than competing solutions, but still provides stunning extensibility.

An organization should require as few “top-end” developers for a solution implementation as possible.  When it comes to Magento, WordPress, Joomla, etc. you do not require a CS degree to do it.  Rather than being a failure, that is a monumental success!  Scarcity increases cost and so if you can decrease scarcity (developer skill required) you can decrease cost.  And the real world is about doing as much as possible for as little as possible.

So how is it that Google missed PHP?  That is a question that I cannot answer since I don’t work for Google.  But I would surmise that it has something to do with the fact that Google didn’t WANT it there.  For all their posturing about being “data driven” they completely missed PHP despite the fact that they have access to the best web data on the planet.  Therefore I must presume that it’s another iteration of The Invisible Postman; also called “having blinders on”.  Node, Ruby, Python; all great languages and can do some really cool things that PHP cannot.  But they do not solve the problem of resource scarcity on the same level that PHP does, when it comes to web-based applications.

For software companies that are looking to break into the web there is only one language to start with.  As long as HTTP is the de facto protocol of the web, PHP will be its de facto programming language.  Suck up your pride, build your stuff, and be successful.

 

… and let the trolling commence.

Is prevention the best security practice?

I read a post tweeted by Chris Cornutt today.  The basic gist of the article is that your security is only as strong as your most ethically-challenged developer.  That got me thinking that we spend so much time trying to prevent intrusions when detection might be a better priority.  Some defenses, such as guarding against SQL injection, are useful because they protect not just against intruders but against people who tend towards single-quote usage as well.  I would argue that protecting against SQL injection is just as much about inadvertent data entry as it is about security.  Same thing with XSS.

But this also got me thinking about laws.  We tend to (wrongly) view laws as a preventative measure.  The problem is that there are always people who are willing to skirt the law, whatever that law may be.  Sometimes it’s because laws are unjust.  But who is to decide when the perceived unjust-ness of a law is sufficient to permit civil disobedience?  Or the rejection of that law by an individual?

But what if we (getting back to developers) worked under the presumption that our code would be attacked and our security would be defeated?  If we presume that our software is vulnerable, does it make more sense to lock it down as much as we can, or to implement methods to detect, or at least collect, information in a way that makes prosecution or recovery easy?  Just like you cannot write a law to prevent all people from wrongdoing, you cannot guarantee that your code is 100% secure.  Given that, would it work to take an approach that focused more on detection (and recovery) than on prevention?

Would our approach be different?

What would it look like?

Would it work?

Would it matter?

It may sound a little silly to ask but consider that banks do something like this when it comes to financial transactions.  Banks use eventual consistency to maintain financial records.  They are not ACID compliant.  It is possible to overdraw your account if you do it in a manner that beats out the eventually consistent implementation they use.  It is the only way to maintain the scale that they require.  The position of the banks is that IF a circumstance occurs where there is a discrepancy in bank records it costs them less to fix the issue than to prevent it in the first place.

Likewise, Amazon allows items to be sold when they aren’t sure about stock (just look at a recent purchase of mine).  Their presumption, presumably, is that it will cost them more to ensure completely accurate inventory management than to send an apology letter to a waiting customer.  Is there a correlation in software development when it comes to security?

I don’t have any answers ATM, and it may be that any implementation may end up being more costly than prevention (my current thought is that it is).  I’m just thinking out loud and wondering if anyone else has given thought to this.

How Magento configuration merges work

A few days ago I wrote a blog post on how configuration works in Magento.  While it was fairly comprehensive it missed one very important piece of information: how the configuration files merge.  It is this merging that gives Magento much of its extensibility and configurability.  All over the place you will see code like

Mage::getStoreConfig('node/somechild');

You will see this or something similar EVERYWHERE.  The reason why this is powerful is because it allows multiple different configuration files to be loaded all at once and merged into one giant configuration tree, and then any node defined in any file can be accessed from any place in your application.  And, perhaps, once you see how it works, any mysticism you may have towards Magento configuration may be slightly de-mysticism-fied.

Let’s start with something basic.  The Magento base configuration object is based off of Varien_Simplexml_Element, which itself extends the PHP SimpleXMLElement class.  The problem is that SimpleXML does not merge XML files very well (at all).  So the core Magento devs created a mechanism that allows XML files to be merged.  To see how this works let’s take a look at a small example.

$config = new Varien_Simplexml_Element(
'<config>
  <child1>text</child1>
</config>'
);
 
$config2 = new Varien_Simplexml_Element(
'<config>
  <child2>text</child2>
</config>'
);
 
$config->extend($config2);
 
echo $config->asNiceXml();

When we run this code we get the output

<config>
   <child1>text</child1>
   <child2>text</child2>
</config>

See that?  By calling the extend() method on the $config object, the children of both documents ended up under the base <config> node.

How about multiple different nodes?

$config = new Varien_Simplexml_Element(
'<config>
  <base1>
    <child1>text</child1>
  </base1>
</config>'
);
 
$config2 = new Varien_Simplexml_Element(
'<config>
  <base2>
    <child2>text</child2>
  </base2>
</config>'
);
 
$config->extend($config2);
 
echo $config->asNiceXml();

This code renders

<config>
   <base1>
      <child1>text</child1>
   </base1>
   <base2>
      <child2>text</child2>
   </base2>
</config>

Where this is really cool is when you start dealing with multiple nested nodes.

$config = new Varien_Simplexml_Element(
'<config><base1><child1>text1</child1></base1></config>'
);
 
$config2 = new Varien_Simplexml_Element(
'<config><base1><child2>text2</child2></base1></config>'
);
 
$config->extend($config2);
 
echo $config->asNiceXml();

This renders

<config>
   <base1>
      <child1>text1</child1>
      <child2>text2</child2>
   </base1>
</config>

Then if you wanted to retrieve all of the children you could use the Mage_Core_Model_Config_Base class (which you would normally get via Mage::getConfig() or, preferably, Mage::getStoreConfig()).

$base = new Mage_Core_Model_Config_Base($config);
 
$node = $base->getNode('base1');
 
foreach ($node->children() as $name => $child) {
    echo sprintf("%s: %s\n", $name, $child);
}

So why is this cool? Because it allows you to hook your module into multiple other modules or system components without modifying the config files of said components. Could you have simpler syntax to register an observer? Sure. But it would require either a less consistent interface or explicitly loading config files any time a module needs to be used.

So when you are writing a node under global/events/controller_action_predispatch/observers/my_custom_observer what you are doing is working within the merging system in a way that puts your information in a predictable location.  For events, the dispatcher looks for children under global/events/{$event}/observers.  By giving your node a unique name there you are figuratively making it part of an associative array where the key is only used to maintain uniqueness.  The merging system does not create arrays of like-named nodes; the last node with a given name will always win.  That is why, if you are injecting your data into a previously defined node, such as events, you need to give it a unique node name.
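
To make that concrete, here is a sketch in the same style as the earlier examples; the module and observer names are made up for illustration, but it shows why unique observer node names survive the merge:

$core = new Varien_Simplexml_Element(
'<config><global><events><controller_action_predispatch><observers>
  <some_module_observer><class>Some_Module_Observer</class><method>preDispatch</method></some_module_observer>
</observers></controller_action_predispatch></events></global></config>'
);

$mine = new Varien_Simplexml_Element(
'<config><global><events><controller_action_predispatch><observers>
  <my_custom_observer><class>My_Module_Observer</class><method>preDispatch</method></my_custom_observer>
</observers></controller_action_predispatch></events></global></config>'
);

$core->extend($mine);

echo $core->asNiceXml();
// Both observers survive because their node names differ; had both modules used
// <my_custom_observer>, the one merged last would have overwritten the other.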

Hopefully these little examples can help you make sense of how you can work with the config system in Magento.

Magento Performance on PHP 5.3, 5.4 and 5.5RC3

Update: Magento 1 now supports PHP 5.4.

I woke up this morning with a burning desire to do load tests.  Actually, I woke up with a burning desire to not do the same thing I did yesterday and needed a slight change, so I decided to do a load test.  I wanted to see what the performance difference for Magento was between PHP versions 5.3, 5.4 and 5.5RC3.

As you may know, Magento only supports 5.3 and 5.2.  Personally, I would not even be thinking about running any kind of remotely serious ECommerce site on PHP 5.2.  But with work on PHP 5.5 pushing towards GA it means that some time soon support for 5.3 will be going away.  This might be a bit of a problem for software that isn’t supported on 5.4+.  One of the reasons for this is that there are bugs in PHP that are holding back support.  I don’t know what they all are but this one regarding XML processing is one.  There might be others, but that’s one that I know of.

But enough about bugs, what about performance?  For PHP 5.3 and 5.4 I used Zend Server with Optimizer+.  This is partially because I use Zend Server on my local machine and also because it would give a good comparison with PHP 5.5, since Optimizer+ has been open sourced and will be included.

The configure settings I used for PHP 5.5 were these:

./configure \
  --with-config-file-path=/etc/php-5.5 \
  --with-config-file-scan-dir=/etc/php-5.5/php.d \
  --disable-debug \
  --enable-inline-optimization \
  --disable-all \
  --enable-libxml \
  --enable-session \
  --enable-xml \
  --enable-hash \
  --with-pear \
  --with-layout=GNU \
  --enable-filter \
  --with-pcre-regex \
  --with-zlib \
  --enable-simplexml \
  --enable-dom \
  --with-openssl \
  --enable-pdo \
  --with-pdo-sqlite \
  --with-readline \
  --with-iconv \
  --with-sqlite3 \
  --disable-phar \
  --enable-xmlwriter \
  --enable-xmlreader \
  --enable-mysqlnd \
  --enable-json \
  --with-gd \
  --enable-soap \
  --with-curl \
  --with-apxs2 \
  --enable-ctype \
  --with-pdo-mysql \
  --prefix=/opt/php-5.5 \
  --enable-opcache

My 5.5 opcode cache settings were these:

zend_extension=opcache.so
opcache.memory_consumption=128
opcache.interned_strings_buffer=8
opcache.max_accelerated_files=4000
opcache.revalidate_freq=60
opcache.fast_shutdown=1
opcache.enable_cli=1

I make no guarantees that these are the optimal settings.

I used Siege as the load test tool, running against a single URL that had some relatively complex logic in it.  I was using Magento Enterprise 1.13, but had full page caching turned off.  I ran with only 4 concurrent sessions since that’s how many CPUs are on the machine I was testing.  This was not a test of server capacity, but of raw performance.  I suppose that a single concurrent session would have been better, but c’est la vie.  I didn’t see a drop in response time until I went to 5 concurrent sessions anyway, so I doubt this was an issue.

The first chart is the throughput per second, so higher is better.

[Chart: magento-php-line]

As you can see, PHP 5.4 and 5.5 fared better than 5.3.  5.5 fared just a little better than 5.4.

This next chart shows the slowest, fastest and average times for each.  Lower is better.

[Chart: magento-php-bar]

The slowest time is not really all that interesting since every load test will have a few hiccups.  I suspect that if I took the 95th percentile it would look pretty close to the average.  But overall 5.4 and 5.5RC3 did better in all the data points that matter.

Now to get those bugs fixed so Magento can support those two…

Web Analytics