A few weeks ago Twitter was a storm of activity around the TIOBE index. PHP had dropped two spots, having been overtaken by Python and C++. Most of the activity was not from the PHP side.
Having used the TIOBE index myself, I was a little surprised to see PHP drop so suddenly. So I decided to look behind the scenes and read up on the index definition to see if I could understand what happened.
After reading through the definition I still don’t know what happened, or even why it happened.
The TIOBE index is calculated by taking the top 6 sites on Alexa that have a search function reporting a hit count, and running the query +“<language> programming” on each. So, for PHP it would be +“php programming”. The results are weighted by search engine and expressed as a percentage so that the top 50 programming languages total 100%. Additionally, each query is given a confidence rating, assigned through a manual process, to weed out false positives. For example, searching for +“basic programming” could also return a result for “Improve your basic programming skills in Java”. If 10% of the hits in the first hundred pages of the search results are determined to be false positives, then the search results will be taken at 90% of their value.
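To make that description concrete, here is a minimal Python sketch of how I read it. This is my interpretation, not TIOBE’s actual formula: the function name, the engine names, the weights and every number below are made up for illustration, and I am assuming one confidence factor per language.

```python
def tiobe_style_ratings(hits, engine_weights, confidence):
    """Turn raw hit counts into percentage ratings that sum to 100.

    hits:           {language: {engine: hit_count}}
    engine_weights: {engine: weight}  (hypothetical weights)
    confidence:     {language: factor between 0 and 1}
    """
    adjusted = {}
    for lang, per_engine in hits.items():
        # Weight each engine's hit count, then discount false positives.
        weighted = sum(engine_weights[e] * n for e, n in per_engine.items())
        adjusted[lang] = confidence[lang] * weighted

    total = sum(adjusted.values())
    # Normalize so that all languages together add up to 100%.
    return {lang: 100.0 * score / total for lang, score in adjusted.items()}


# Invented numbers, NOT real search results.
hits = {
    "php": {"engine_a": 2000, "engine_b": 1000},
    "python": {"engine_a": 1000, "engine_b": 500},
}
engine_weights = {"engine_a": 0.5, "engine_b": 0.5}
# 10% false positives for PHP means its hits count at 90% of their value.
confidence = {"php": 0.9, "python": 1.0}

result = tiobe_style_ratings(hits, engine_weights, confidence)
for lang, pct in result.items():
    print(f"{lang}: {pct:.1f}%")
```

Even in this toy version, a change in the manually assigned confidence factor moves the final percentage just as much as a change in the hit counts does.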
So what was it that caused PHP to tank and Python and C++ to take over? In short: I don’t know. Also, for the record, I consider C++ to be in the same realm as C, so I’m not overly concerned with C++. Many of the search results for C also have C++ in there, so who knows how the confidence rating is calculated there.
The reason I don’t know is that I ran the searches +“php programming” and +“python programming” on most of the search engines and found that PHP usually returned double the Python results, or more.
[table id=1 /]
The only places where Python took over were Wikipedia and YouTube, and not by much.
Here’s where the TIOBE index breaks down. It claims that “The ratings are based on the number of skilled engineers world-wide, courses and third party vendors.”
No, they’re not.
The only way that TIOBE is calculated is by running a single query on a few search engines and combining the hit counts with a manual confidence factor. That query is not sufficient for determining “popularity”. No data is provided as to why THAT query alone is sufficient to determine popularity. It’s assumed.
PHP is an easy language to test that assumption with, because very few other things share the acronym. Let’s compare the YouTube results of +“php programming” and “PHP”. The TIOBE query on YouTube yielded 575 results, but simply searching for “PHP” returned 768,000 results. On Google, the TIOBE query got 1,480,000 hits whereas the simple “PHP” query yielded 14,870,000,000 results, and the first few pages had only one false positive. Here’s an interesting Google query to try: +“php programming” site:www.php.net. It has fewer results than “python site:www.php.net”.
In other words, when checking against a term that is quite specific (not much else is called PHP), we see a very wide gap between the TIOBE query for PHP and a simpler query. It seems like the index is more of a data retrieval experiment than a real-world representation of actual popularity.
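To put rough numbers on that gap, here is a small Python sketch using only the hit counts quoted above. The dictionary layout and key names are my own choice for illustration.

```python
# Hit counts as quoted in this post.
counts = {
    "YouTube": {"tiobe_query": 575, "plain_query": 768_000},
    "Google": {"tiobe_query": 1_480_000, "plain_query": 14_870_000_000},
}

# How many times more hits does the plain "PHP" query return?
ratios = {
    site: c["plain_query"] / c["tiobe_query"] for site, c in counts.items()
}

for site, ratio in ratios.items():
    print(f"{site}: plain query returns about {ratio:,.0f}x as many hits")
```

That works out to a factor of roughly 1,300 on YouTube and roughly 10,000 on Google.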
And I still can’t figure out why PHP dropped the way it did. Perhaps it has a high false-positive rating, which, I could argue, could matter more than the search results themselves. Why? Because the more popular you are, the more people will try to use your popularity for their own ends. So what to make of the TIOBE index’s PHP results? I simply don’t know, because the methodology seems to be pulled out of a hat. Perhaps the percentage increase for a given language is also taken into consideration (meaning that blog posts jumping 50% in a given month would be added to the score), but I didn’t see that noted in the index definition.
But I could be wrong. I could have completely misread the index definition, and perhaps the real-world popularity of Python DID jump over PHP. I know it did for a day or two after the results came out. But after going through the methodology and trying to match it up with what I saw in the search engines, I have my doubts as to its accuracy.