Generating secure cross site request forgery tokens (csrf)

I don’t talk much about security.  This is mostly because it’s such a moving target.  I’m also horrified that I might give bad advice and someone will be hacked because of me.

But in researching the second edition of the IBM i Programmer’s Guide to PHP, Jeff and I decided to include a chapter on security, since we really didn’t talk much about it in the first edition.  I’m writing about cross site request forgeries right now, and I wanted to make sure that what I was going to suggest would not break the internet in some way.

I did some Google searching to see what other people were recommending.  Almost all of the pages I found for generating a CSRF token use code like this:

$token = md5(uniqid(rand(), true));

The manual pages for rand() and uniqid() specifically state that these functions should not be used for generating secure tokens, and a look at the C code bears that out: they tend to generate predictable values.  The documentation for md5() also states that it should not be used for password hashing.  Granted, we’re not hashing passwords when creating a CSRF token, but with the tooling available shouldn’t we be using functions that are more cryptographically secure?  Like this?

$token = hash_hmac(

Am I missing something or wouldn’t something like this be a whole lot better?


padraicb validated my thought on the matter.  The goal here is the random value.  As such, hashing it with hash_hmac() does not buy you a whole lot extra.  The number of possible values in a 32 byte random string is 2^256, or 1.1579208923731619542357098500869e+77.  That alone would seem to be enough for a CSRF prevention token.  mt_rand() returns an integer, which gives you at most about 2 billion possible numbers (mt_getrandmax() is 2^31 - 1).  While that will probably protect you, the larger value will offer you better protection.  There’s no sense in gambling with a smaller value if you have the ability to generate a larger value with virtually no additional cost.
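To put the two pool sizes side by side, here is a minimal sketch of the arithmetic. It assumes nothing beyond core PHP; mt_rand() with no arguments tops out at mt_getrandmax(), so whichever way you count, the smaller pool is on the order of a few billion values.

```php
<?php
// Number of distinct values in 32 random bytes (256 bits).
// This is far beyond PHP's integer range, so pow() returns a float.
$tokenSpace = pow(2, 256);

// Number of distinct values mt_rand() can return when called with
// no arguments: 0 through mt_getrandmax(), inclusive.
$mtRandSpace = mt_getrandmax() + 1;

printf("32 random bytes: %.4e possible values\n", $tokenSpace);
printf("mt_rand():       %.4e possible values\n", $mtRandSpace);
```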

So it would seem that, for generating a proper token, the code that you really need is this:

$token = base64_encode( openssl_random_pseudo_bytes(32));

The only reason for the base64_encode() call is to make sure that the value provided will not break your HTML layout.
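For context, here is a rough sketch of how such a token might be issued and later checked. The function names and the 'csrf_token' key are made up for illustration, and two things are swapped in that the post does not use: random_bytes() (PHP 7+, in place of openssl_random_pseudo_bytes()) and hash_equals() for the comparison. The store is passed in as an array so the same code works with $_SESSION or with a plain array.

```php
<?php
/**
 * Issue a token and remember it in $store (typically $_SESSION).
 * Assumption: random_bytes() is available (PHP 7+); it throws if no
 * secure randomness source can be found.
 */
function csrf_issue_token(array &$store): string
{
    // base64 keeps the 32 raw bytes safe to embed in HTML.
    $token = base64_encode(random_bytes(32));
    $store['csrf_token'] = $token;
    return $token;
}

/**
 * Check a submitted value against the stored token.
 * hash_equals() compares in constant time.
 */
function csrf_check_token(array $store, $submitted): bool
{
    return isset($store['csrf_token'])
        && is_string($submitted)
        && hash_equals($store['csrf_token'], $submitted);
}

// Usage, with a plain array standing in for $_SESSION:
$session = [];
$token = csrf_issue_token($session);
echo '<input type="hidden" name="csrf_token" value="'
    . htmlspecialchars($token, ENT_QUOTES) . '">' . "\n";
```

On the receiving end, the form handler would call csrf_check_token($_SESSION, $_POST['csrf_token'] ?? null) and reject the request if it returns false.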

40 Thoughts on “Generating secure cross site request forgery tokens (csrf)”

  1. The real security problem in generating a secure CSRF token is the randomness of the seed. MD5 and SHA512 are not so different in this case from a security point of view. openssl_random_pseudo_bytes() is the most secure way to generate good random numbers in PHP. For instance, in ZF2 we used that function to generate the CSRF token in ZendForm.

    • kschroeder on February 11, 2013 at 12:16 pm said:

      Thanks. That’s a good point. In other words, using md5() or sha512 is not as important as getting the actual random bits. The hashing, itself, is really only there to make sure that the bits that come out do not break the format. One could almost say that when using openssl_random_pseudo_bytes() you could use md5(), hash_hmac() or base64_encode() without a loss of security, something that would not be possible to say about uniqid().

      • kschroeder on February 11, 2013 at 12:16 pm said:

        …I should say a *significant* loss in security.

        • I would suggest using a hashing algorithm (MD5 or SHA-*), instead of base64, for the final output of a token because it offers better obfuscation of the seed (hashes are not invertible).

      • padraicb on February 13, 2013 at 5:34 am said:

        @kschroeder The primary goal of the CSRF token is to be an unpredictable random string of sufficient length to defeat brute force attacks. So literally the OpenSSL PRNG is sufficient, 32 being a nice length (anything less than 8 being severely weak). Hashing or obscuring the token is unnecessary since the random number itself is not a secret – what is sent to the user is. If that’s a hash then the attacker only needs the hash. Base64 encoding is merely to ensure the token is a simple ASCII compatible string.
        Note: Tokens are generated securely as a standard practice. Also note the “pseudo” in the function name if concerned about entropy consumption ;).

    • I’d like to add (as I posted to pmjones’ blog) that it is a bit misleading to say that openssl_random_pseudo_bytes() is “better” (security-wise speaking) than any other method that relies on /dev/urandom (or the Windows equivalent). Reading straight from /dev/urandom and fetching bytes some other way (which uses /dev/urandom) are all practically equal.
      Care should be taken to avoid the quirks when fetching random bytes. For example, openssl_random_pseudo_bytes() blocks on certain versions, /dev/urandom is not available on Windows, and mcrypt_create_iv() (using DEV_URANDOM) has security issues on certain versions on Windows.
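As an illustration of reading straight from /dev/urandom while guarding against the quirks mentioned above, here is a minimal sketch. urandom_bytes() is a hypothetical helper, not part of PHP, and it simply refuses to continue when the device is missing (as it would be on Windows) or the read comes back short.

```php
<?php
/**
 * Read $length bytes from /dev/urandom, failing loudly on any problem.
 * Hypothetical helper for illustration only.
 */
function urandom_bytes(int $length): string
{
    $handle = @fopen('/dev/urandom', 'rb');
    if ($handle === false) {
        throw new RuntimeException('/dev/urandom is not available');
    }

    // Disable PHP's read buffering so only the requested bytes are pulled.
    stream_set_read_buffer($handle, 0);
    $bytes = fread($handle, $length);
    fclose($handle);

    if ($bytes === false || strlen($bytes) !== $length) {
        throw new RuntimeException('Short read from /dev/urandom');
    }
    return $bytes;
}
```

A token would then be base64_encode(urandom_bytes(32)), with the exception left uncaught so that a missing randomness source stops the request instead of silently weakening the token.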

    • kschroeder on February 11, 2013 at 1:34 pm said:

      It uses that as an example for generating a token, but that page also specifically states that it is based off of microtime. Because of that the value would be predictable.

      • Yes, it would be predictable – presumably that’s why that code was removed. I’m just saying that is why you see that code all over the Internet (and in various open source projects) – it is because everyone originally copied it from the PHP manual.

  2. I am not a cryptographic expert.
    In Aura.Session, uniqid(mt_rand(), true); is used.
    The problem with OpenSSL is that it needs to be installed and configured on the server. I have seen another one: hash('sha256', uniqid(mt_rand(), true), true);

    • ezimuel on February 12, 2013 at 3:42 am said:

      The problem with uniqid(mt_rand(), true); is that mt_rand() is not cryptographically secure. A more secure way to generate a random token is to use md5(openssl_random_pseudo_bytes(32)); or hash($algo, openssl_random_pseudo_bytes(128)); where $algo is sha-*. If you don’t have the OpenSSL extension enabled you can use mcrypt_create_iv($length, MCRYPT_DEV_URANDOM); where $length is the number of random bytes. We implemented a random generator in ZF2 based on these considerations:

  3. pmjones on February 12, 2013 at 8:54 pm said:

    I am not a security expert, so please be gentle.
    What does the extra cryptographic security buy us? For long-lived hashes that get used over and over, I can see the point, but for what are short-lived tokens, it seems a bit of overkill.
    Additionally, it seems like it would deplete the entropy available to the system more rapidly. Too many CSRF tokens that get used and thrown away means you don’t have the entropy when you need it for real security.

    • I think the same, since it is just a form token. Is the cryptography really needed? Maybe for some sorts of systems, but not for all, I guess.

    • kschroeder on February 13, 2013 at 7:53 am said:

      Cryptos (κρυπτός) and graphein (γράφειν) just mean “secret writing”. When we’re generating a token what we want to do is give a secret to the person on the web page that will be extremely difficult to predict. The examples that I’ve found tend to rely on uniqid(), which is based off of the time and, thus, predictable. So when you’re thinking about cryptography you are probably thinking about the actual act of encryption, which is not what we’re talking about. We are using the tool from one of the first steps in the chain for creating an “unpredictable” value.
      The 32 bytes (256 bits) of data give us 1.1579208923731619542357098500869e+77 values, which is a pretty big set of values for you to use and so I doubt that you would deplete entropy.
      However, mt_rand() returns an integer, not a series of bytes. That means that you have only 2 billion or so numbers to choose from. Given the choice between that and the huge number above, I would choose the huge one.

      • Thank you for making it clear.

      • pmjones on February 13, 2013 at 8:21 am said:

        You are making the assumption, though, that a CSRF token falls in the realm of “cryptography.” (Perhaps it is.)
        Is not a random shared value, sent along with the form, enough to defeat CSRF attacks? You say the random value is predictable and this may be true, but I’d like to see a demonstration of it. How much time and effort is required to predict it?

        • kschroeder on February 13, 2013 at 8:32 am said:

          There are parts of token generation that, on a basic level, do fall into the realm of cryptography since cryptography is about “writing secrets”. Beyond that the link to crypto is simply that the cryptographic tooling does a better job of providing more, better, pseudo-random values.
          When we’re talking about predictability it will depend on which function we’re talking about. If you have a timestamp, uniqid() is actually pretty easy to guess. It was designed to be unique, not unpredictable. And mt_rand() isn’t so much predictable as it has a significantly smaller pool of values to choose from. In other words, mt_rand() is good, but openssl_random_pseudo_bytes() is better.
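To see why uniqid() is easy to guess from a timestamp, here is a short sketch that takes one apart. It relies on the format uniqid() produces when called with no arguments: 13 hex characters, 8 for the Unix time in seconds followed by 5 for the microseconds.

```php
<?php
// uniqid() with no arguments: 8 hex chars of seconds + 5 hex chars of microseconds.
$id = uniqid();

$seconds      = hexdec(substr($id, 0, 8));
$microseconds = hexdec(substr($id, 8, 5));

printf("uniqid():  %s\n", $id);
printf("decoded:   %s UTC + %d microseconds\n",
    gmdate('Y-m-d H:i:s', $seconds), $microseconds);
```

Anyone who knows roughly when the token was created only has to search the microsecond field, which is at most a million candidates per second of uncertainty.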

    • When using cryptographically strong random bytes, you don’t have to worry about possible edge cases, attack vectors, etc. that may appear when using weak randomness, e.g. when the system is under an active attack. I’d make sure CSRF tokens are also generated using strong randomness (that way it is easy to make sure the system does not become vulnerable in any situation, edge cases included, because of weak randomness). If strong randomness is not available, just exit with an error.
      About “deplete the entropy available”: this is actually not the case with /dev/urandom and the like. System random number generators (like /dev/urandom) do not run out of entropy. Urandom _might_ be low on entropy immediately after a fresh OS install, but this is insignificant when talking about web apps.
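The “exit with an error” advice could look something like the sketch below. strong_bytes_or_fail() is a hypothetical name; the only real API it leans on is the second, by-reference argument of openssl_random_pseudo_bytes(), which reports whether a cryptographically strong algorithm was used.

```php
<?php
/**
 * Return $length strong random bytes or throw; never fall back to
 * a weaker source. Hypothetical helper for illustration only.
 */
function strong_bytes_or_fail(int $length): string
{
    if (!function_exists('openssl_random_pseudo_bytes')) {
        throw new RuntimeException('OpenSSL extension is not loaded');
    }

    $strong = false;
    $bytes  = openssl_random_pseudo_bytes($length, $strong);

    // $strong is set by OpenSSL; anything but true means "do not trust this".
    if ($bytes === false || $strong !== true) {
        throw new RuntimeException('No cryptographically strong randomness available');
    }
    return $bytes;
}
```

Letting the exception propagate means a misconfigured server produces a visible failure rather than a quietly predictable token.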

  4. pmjones on February 12, 2013 at 8:58 pm said:

    Additionally, the OWASP guys seem to think mt_rand() is sufficient for the purpose:
    I cannot say if their method is *actually* sufficient.

    • By the way @pmjones, they are using a hashing algorithm ( $token=hash("sha512",mt_rand(0,mt_getrandmax())); ) and at the top it mentions the code is not verified by OWASP experts.
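One thing worth spelling out about hashing a small integer: the hash does not add any possible values, so there are only as many tokens as there are inputs. The sketch below shows the idea with a deliberately tiny range ($max is an arbitrary demo value, far smaller than mt_getrandmax()) so that it finishes instantly.

```php
<?php
// Demo range only; the real mt_rand() range is 0..mt_getrandmax().
$max = 50000;

// "Server" side: the token is just the hash of a small random integer.
$secret = mt_rand(0, $max);
$token  = hash('sha512', (string) $secret);

// "Attacker" side: hash every candidate input until one matches.
$recovered = null;
for ($guess = 0; $guess <= $max; $guess++) {
    if (hash('sha512', (string) $guess) === $token) {
        $recovered = $guess;
        break;
    }
}

var_dump($recovered === $secret);
```

Scaling $max up to the full mt_rand() range only changes how long the loop takes, not whether it succeeds.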

    • ezimuel on February 13, 2013 at 6:00 am said:

      I just sent an email to the author of PHP_CSRF_Guard suggesting the use of openssl_random_pseudo_bytes() instead of mt_rand(). I agree with @padraicb: the random number provided by OpenSSL is enough for a CSRF token, and you can just use it without a hash function.

    • padraicb on February 18, 2013 at 6:43 am said:

      The OWASP version relies on two options as a token:

      A. The SHA512 hash of mt_rand().

      The MD5 hashes of all outputs from mt_rand() are online. SHA256 hashes can be brute forced at some incredible speeds on a GPU, making it fairly pointless for minimal entropy inputs – it’s only a number between 0 and 2^31 (mt_getrandmax()). SHA512 is much, much slower than SHA256 but I can’t help wondering if it’s so slow as to take TOO long running only 2.147B comparisons – most hashing tools have GPU support these days and the last GPU generation was marvellous for this task. It wouldn’t surprise me if it took

  5. Based on comments elsewhere, I see the point. Looks like I have to modify to use SSL when available, and only fall back to mt_rand when SSL is not available. Thanks, gentlemen.

  6. Pingback: For CSRF tokens, mt_rand() is ok-ish but openssl_random_pseudo_bytes() is a lot better | Paul M. Jones

  7. ircmaxell on February 14, 2013 at 7:30 am said:

    The perfect reason for not relying upon rand() or mt_rand() is that both are susceptible to seed poisoning:
    So to produce strong random numbers, rand() or mt_rand() should not be used in a predictable manner:
    I’m working on splitting out the RNG from CryptLib and PasswordLib into a stand-alone dependency so that you can use its strong random mixer to produce these kinds of tokens (it uses many sources to generate the randomness, and is secure as long as any one source is secure)…

  8. RenThraysk on February 15, 2013 at 10:33 am said:

    I don’t like the reliance on random numbers.
    I actually think your first suggestion of a HMAC is on the right path, but again not hashing random bytes.
    The $data argument to hash_hmac should be made up from serialised data. This should include the full URI to where the form is to be posted, the session id, and any hidden values in the form.
    This provides not only CSRF protection, but also another layer of validation to parts of the form.
    The $key parameter for the CSRF could be a site wide secret, and do away with needing to use $_SESSION at all.

    • kschroeder on February 15, 2013 at 10:47 am said:

      Could you explain why hashing values that are relatively easy to figure out is better than a pseudo random number generator?

    • ircmaxell on February 16, 2013 at 6:41 am said:

      If it’s 100% deterministic for the server (has no random per-session data), then it’s 100% deterministic for the client. And that means it’s 100% deterministic for an attacker as well. Which basically means that the protection is useless at stopping CSRF style attacks…

      • RenThraysk on February 16, 2013 at 8:21 am said:

        The key for the HMAC is server side secret. The client or attacker never knows it.

        • ircmaxell on February 16, 2013 at 8:27 am said:

          Well, you do disclose the derivative of it (via the HMAC), so if they know what goes into the left side, they can attempt to brute force the right side. Not a huge issue, but something to think about.
          But in the end, what does this gain you? Nonce is a proven technique that does not require storing cryptographic secrets (which is what your key really is), and has good forward security (breaching today implies nothing towards breaching tomorrow). Your method requires a cryptographic secret, and has poor forward security (a breach today means a breach tomorrow). The rest of the security industry recommends using a random nonce, typically per-request (but at least per session). So what major benefit does this add to that paradigm that it’s worth going against the rest of the industry?

        • kschroeder on February 16, 2013 at 9:32 am said:

          Additionally, 70% of all successful attacks come from inside an organization. Having a configurable value a) requires you to manage the key, and b) is something that an internal attacker may have knowledge of. Using a large pseudo-random number requires no configuration management and is not known by an internal individual. Defense in Depth, baby!

        • RenThraysk on February 19, 2013 at 8:54 am said:

          Even a large pseudo-random number gets written somewhere; that is what $_SESSION does. So an internal attacker can read it.

        • RenThraysk on February 19, 2013 at 9:36 am said:

          If an attacker can access the HMAC secret key on your server, you have more worrying concerns, like credentials to access databases directly.
          I wouldn’t say it was going against the rest of industry. The wider security field has created Message Authentication Codes as means to provide assurances about messages. The message in this case is a HTTP POST request.
          It’s stateless.
          Having multiple forms on the same page, or the user having multiple pages with multiple forms open, will work, and each would have a different token.
          It’s trivial to combine an expiration time within the token, [expires.hmac(expires + data)] so you can shorten the time that a token remains valid. Closing the window on replay attacks.
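A rough sketch of the [expires.hmac(expires + data)] layout described above follows. The function names, the '|' separator, the SHA-256 choice and the sample key and data are all assumptions made for illustration; the thread does not specify them. $data stands for whatever the server can recompute on submission (form URI, session id, hidden values), and $key is the site-wide secret.

```php
<?php
/**
 * Build a token of the form "expires.hmac(expires + data)".
 * Hypothetical helper; $now is injectable to make the logic testable.
 */
function hmac_token_create(string $data, string $key, int $ttl, ?int $now = null): string
{
    $expires = ($now ?? time()) + $ttl;
    $mac = hash_hmac('sha256', $expires . '|' . $data, $key);
    return $expires . '.' . $mac;
}

/**
 * Verify the token: well-formed, not expired, and MAC matches.
 */
function hmac_token_verify(string $token, string $data, string $key, ?int $now = null): bool
{
    $parts = explode('.', $token, 2);
    if (count($parts) !== 2 || !ctype_digit($parts[0])) {
        return false; // malformed
    }
    [$expires, $mac] = $parts;
    if ((int) $expires < ($now ?? time())) {
        return false; // expired
    }
    // Recompute the MAC over the same inputs and compare in constant time.
    $expected = hash_hmac('sha256', $expires . '|' . $data, $key);
    return hash_equals($expected, $mac);
}

// Usage with made-up values:
$key   = 'replace-with-a-long-random-server-side-secret';
$data  = 'https://example.com/transfer|session-id-abc';
$token = hmac_token_create($data, $key, 600);
var_dump(hmac_token_verify($token, $data, $key));
```

Because the expiry time is covered by the MAC, a client cannot extend a token's lifetime by editing the number in front of the dot. The trade-offs raised earlier in the thread still apply: the scheme depends on keeping $key secret.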

  9. Pingback: Stateless CSRF Tokens | Joseph Scott
