What to make of TIOBE’s PHP results

A few weeks ago Twitter was a storm of activity around the TIOBE index.  PHP had dropped two spots, having been overtaken by Python and C++.  Most of the activity was not from the PHP side.

Having used the TIOBE index myself, I was a little surprised to see PHP drop so suddenly.  So I decided to look behind the scenes by reading up on the index definition to see if I could understand what happened.

After reading through the definition I still don’t know what happened, or even why it happened.

The TIOBE index is calculated by looking at the top 6 sites on Alexa that have a search function which reports how many hits there are for the query +”<language> programming”.  So, for PHP it would be +”php programming”.  The results of the search are weighted by search engine and calculated as a percentage so that the total of the top 50 programming languages equals 100%.  Additionally, each query is given a confidence rating, assigned via a manual process, to weed out false positives.  For example, searching for +”basic programming” could also return a result for “Improve your basic programming skills in Java”.  If 10% of the hits in the first hundred pages of the search results are determined to be false positives then the search results will be taken at 90% of their value.
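To make that description concrete, here is a minimal sketch of how such a rating could be computed.  It is only one plausible reading of the index definition, not TIOBE's actual implementation: the function names, the engine names, the engine weights, the hit counts, and the choice to take each language's share per engine before weighting are all assumptions made for illustration.

```python
def adjusted_hits(raw_hits, confidence):
    """Scale a raw hit count by the manual confidence rating (0.0 to 1.0).

    A confidence of 0.9 means 10% of the sampled hits were false positives.
    """
    return raw_hits * confidence


def ratings(hits, confidence, engine_weights):
    """Return a rating per language as percentages that sum to 100.

    hits[engine][language]  -> raw hit count for the quoted query
    confidence[language]    -> fraction of hits judged to be genuine
    engine_weights[engine]  -> relative weight of that engine (hypothetical)
    """
    languages = list(confidence)
    scores = {lang: 0.0 for lang in languages}
    for engine, weight in engine_weights.items():
        # Apply the confidence cut, then take each language's share
        # of the hits on this engine (assumed normalization step).
        adjusted = {
            lang: adjusted_hits(hits[engine][lang], confidence[lang])
            for lang in languages
        }
        engine_total = sum(adjusted.values())
        if engine_total == 0:
            continue  # this engine returned nothing usable
        for lang in languages:
            scores[lang] += weight * adjusted[lang] / engine_total
    total = sum(scores.values())
    if total == 0:
        return {lang: 0.0 for lang in languages}
    # Rescale so that all languages together add up to 100%.
    return {lang: 100.0 * score / total for lang, score in scores.items()}


# Made-up numbers, purely for illustration.
example_hits = {
    "engine_a": {"php": 1000, "python": 500},
    "engine_b": {"php": 200, "python": 300},
}
example_confidence = {"php": 0.9, "python": 1.0}
example_weights = {"engine_a": 0.7, "engine_b": 0.3}

if __name__ == "__main__":
    for lang, pct in ratings(example_hits, example_confidence, example_weights).items():
        print(f"{lang}: {pct:.2f}%")
```

Even in this toy version, the outcome depends heavily on two inputs that are not derived from the search results themselves: the per-engine weights and the manually assigned confidence ratings.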

So what was it that caused PHP to tank and Python and C++ to take over?  In short: I don’t know.  Also, FTR, I consider C++ to be in the same realm as C, so I’m not overly concerned with C++.  Many of the search results for C also have C++ in them, so who knows how the confidence rating is calculated there.

The reason I don’t know is that I ran the searches +”php programming” and +”python programming” on most of the search engines and found that the PHP results were usually double the Python results, or more.

[table id=1 /]

The only places where Python took over were Wikipedia and YouTube, and not by much.

Here’s where the TIOBE index breaks down.  It claims that “The ratings are based on the number of skilled engineers world-wide, courses and third party vendors.”

No, they’re not.

The only way that TIOBE is calculated is by applying a single query on a few search engines and matching it up with a manual confidence factor for the results.  That query is not sufficient to determine “popularity”.  No data is provided as to why THAT query alone is sufficient to determine popularity.  It’s assumed.

PHP is an easy language to test the veracity of that statement because there are very few other acronyms that match up with the programming language so neatly.  Let’s compare the YouTube results of +”php programming” and “PHP”.  The TIOBE query on YouTube yielded 575 results, but simply searching for “PHP” returned 768,000 results.  On Google, the TIOBE query got 1,480,000 hits whereas the simple “PHP” query yielded 14,870,000,000 results, and the first few pages had only one false positive.  Here’s an interesting Google query to try: +”php programming” site:www.php.net.  It has fewer results than “python site:www.php.net”.

In other words, when checking against a query that is quite specific (not much else is called PHP) we see a very wide gap between the TIOBE query for PHP and a simpler query. It seems like the index is more of a data retrieval experiment than a real-world representation of actual popularity.
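To put a number on that gap, here is a small sketch that divides the hit count for the plain “PHP” query by the hit count for the TIOBE-style query, using the YouTube and Google figures quoted above.  The function name is made up for illustration, and the figures are only the ones reported in this post at the time of writing.

```python
def gap(broad_hits, tiobe_hits):
    """How many times more hits the plain query returns than the TIOBE-style query."""
    return broad_hits / tiobe_hits


# Hit counts quoted in the post: plain "PHP" versus +"php programming".
youtube_gap = gap(768_000, 575)
google_gap = gap(14_870_000_000, 1_480_000)

if __name__ == "__main__":
    print(f"YouTube: about {youtube_gap:,.0f}x")  # about 1,336x
    print(f"Google: about {google_gap:,.0f}x")    # about 10,047x
```

The two engines disagree with each other by roughly a factor of seven on how much the TIOBE-style query undercounts, which is part of why a single phrase is a shaky basis for a ranking.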

And I still can’t figure out why PHP dropped the way it did.  Perhaps it has a high false positive rating, which, I could argue, could be more important than the search results.  Why?  Because the more popular you are, the more people will try to use your popularity for their own ends. So what to make of the TIOBE index’s PHP results?  I simply don’t know, because the methodology seems to be pulled out of a hat.  Perhaps a percentage increase of a given language is also taken into consideration (meaning that blog posts jumping 50% in a given month would be added to the score) but I didn’t see that noted in the index definition.

But I could be wrong.  I could have completely misread the index definition and perhaps the real world popularity of Python DID jump over PHP.  I know it did for a day or two after the results came out.  But after going through the methodology and trying to match it up with what I saw in the search engines I have my doubts as to its accuracy.

18 Thoughts to “What to make of TIOBE’s PHP results”

  1. I think there’s one other major thing that’s missing here – is the TIOBE index indicative of anything meaningful at all?

    For instance, if you replace the ‘programming’ with ‘development’ (arguably a much more common word in this day and age), the results change radically:

    +”PHP development” – 1.5M
    +”Python development” – 97K
    (PHP 15x bigger)

    +”PHP development” – 376M
    +”Python development” – 19M
    (PHP 20x bigger)

    I would argue that this doesn’t mean much either, except for proving that the TIOBE index is not really indicative of anything, except for the trend of +”XYZ programming” search terms.

    For that reason (and the unexplainable fluctuations – probably as a result of search engine changes) I stopped referring to the TIOBE index several years ago, even when PHP looked amazing on it.

  2. The TIOBE Programming Community index is an “indicator of the popularity of programming languages”.

    The important part to remember is that it’s an *indicator* of some measure of popularity.

    Their definition of “popular” and their methodology are far from scientific, and there are no measurements of statistical significance.

    As a short experiment, you can google the following languages in the “past 24 hours”:

    PHP : 1,710,000,000 results (0.10 seconds)
    .NET : 1,440,000,000 results (0.11 seconds)
    Python: 195,000,000 results (0.08 seconds)
    JAVA : 1,500,000 results (0.06 seconds)

    In my opinion, how much content is written or updated about a language in the past 24 hours is a much better, and measurable, definition of “popular”.

    In short, the TIOBE index is indicative of something, but don’t confuse that with being significant and actually measuring and proving something.

    1. Yeah, but what is that “something” that it is indicative of? If it’s not reality, what’s the point of it? The whole purpose of an indicator is to be a measurement of “something”, either backward looking, forward looking or current. In order for it to be useful it needs to be accurate. According to the site, “The ratings are based on the number of skilled engineers world-wide, courses and third party vendors.” But the methodology, as defined on the website, doesn’t even look at that information.

      So if, as you say, the index is not significant, does not measure accurately and does not prove the thing it’s trying to prove, how can it be indicative of anything?

    1. I had linked to that report on the Zend Facebook page too. Our experience suggests that PHP demand is increasing not only in the overall market but in the high-tech “enterprise” organizations as well.

    1. That actually proves my point. OK, so PHP is trending downwards, but even so, its popularity according to THIS metric (which also includes news about snakes) is still 3-4 times that of Python, which goes against the TIOBE index’s claim that Python is now more popular than PHP. The only way this works in Python’s favor is if you do a direct comparison of the TIOBE search terms (http://www.google.com/trends?q=python+programming%2C+php+programming&ctab=0&geo=all&date=all&sort=0) which shows Python overtaking PHP two years ago. But as I noted earlier, the terms that the TIOBE index uses are arbitrary at best.

  3. Pingback: abcphp.com
  4. I wonder if the missing results are due to the compilers trying to take into account that PHP has a massive false positive rating: Google *still* hasn’t figured out that seeing “php”, with a dot in front of it, in the URL doesn’t mean the page is about PHP.

    1. Nope. With the query +”php programming” that is not going to happen. I also did a search for just “PHP” and in the first 10 pages (100 results) I believe I only found two false positives. For just “python” I found 12.

  5. It looks to me like the TIOBE index is just a really successful troll. Consider VB6 vs VB.NET. They are ranked at 5% vs .5%, a huge difference. If you know anything at all about the VB world you’d know that it’s just the opposite. So they are not even close enough to be a loose estimate!
