Generating secure cross-site request forgery (CSRF) tokens

I don’t talk much about security.  This is mostly because it’s such a moving target.  I’m also horrified that I might give bad advice and someone will be hacked because of me.

But in researching the second edition of the IBM i Programmer’s Guide to PHP, Jeff and I decided to include a chapter on security, since we really didn’t talk much about it in the first edition.  I’m writing about cross-site request forgeries right now, and I wanted to make sure that what I was going to suggest would not break the internet in some way.

I did some Google searching to see what other people were recommending.  Almost all of the pages I found for generating a CSRF token use code like this:

$token = md5(uniqid(rand(), true));

The manual pages for rand() and uniqid() specifically state that these functions should not be used for generating secure tokens, and looking at the C code shows why: they tend to generate predictable values.  And the documentation for md5() states that it should not be used for password hashing.  Granted, we’re not hashing passwords when creating a CSRF token, but with the tooling available shouldn’t we be using functions that are more cryptographically secure?  Like this?

$token = hash_hmac(
    'sha512',
    openssl_random_pseudo_bytes(32),
    openssl_random_pseudo_bytes(16)
);

Am I missing something or wouldn’t something like this be a whole lot better?

[UPDATE]

padraicb validated my thought on the matter.  The goal here is the random value, so hashing it with hash_hmac() does not buy you a whole lot extra.  The number of possible values in a 32-byte random string is 2^256, or 1.1579208923731619542357098500869e+77.  That alone would seem to be enough for a CSRF prevention token.  mt_rand() returns an integer between 0 and mt_getrandmax(), which gives you only about 2 billion possible numbers.  While that will probably protect you, the 32-byte value will offer you better protection.  There’s no sense in gambling with a smaller value if you have the ability to generate a larger value with virtually no additional cost.

So it would seem that, for generating a proper token, the code that you would really need is this:

$token = base64_encode(openssl_random_pseudo_bytes(32));

The only reason for the base64_encode() call is to make sure that the value provided will not break your HTML layout.
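
To show how such a token might be used end to end, here is a minimal sketch, not a definitive implementation. The helper names (csrf_issue, csrf_field, csrf_verify) and the csrf_token key are made up for illustration, and the $session array stands in for $_SESSION so the functions can be exercised outside a web request. The comparison uses hash_equals(), which assumes PHP 5.6 or later; the type declarations assume PHP 7.

```php
<?php
// Hypothetical helpers for illustration only; $session stands in for $_SESSION.

// Generate a 32-byte random token, remember it in the session, and return it.
function csrf_issue(array &$session): string
{
    $token = base64_encode(openssl_random_pseudo_bytes(32));
    $session['csrf_token'] = $token;
    return $token;
}

// Render the hidden form field that carries the token back to the server.
function csrf_field(string $token): string
{
    return '<input type="hidden" name="csrf_token" value="'
        . htmlspecialchars($token, ENT_QUOTES, 'UTF-8') . '">';
}

// Check a submitted value against the stored token in constant time.
function csrf_verify(array $session, $submitted): bool
{
    return isset($session['csrf_token'])
        && is_string($submitted)
        && hash_equals($session['csrf_token'], $submitted);
}
```

In a real page you would call csrf_issue($_SESSION) after session_start() when rendering the form, and reject the POST unless csrf_verify($_SESSION, $_POST['csrf_token'] ?? null) returns true.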

37 comments
RenThraysk

I don't like the reliance on random numbers. I actually think your first suggestion of an HMAC is on the right path, but again not hashing random bytes. The $data argument to hash_hmac() should be made up from serialised data. This should include the full URI to where the form is to be posted, the session id, and any hidden values in the form. This provides not only CSRF protection, but also another layer of validation for parts of the form. The $key parameter for the CSRF token could be a site-wide secret, doing away with the need to use $_SESSION at all.

ircmaxell

The perfect reason for not relying upon rand() or mt_rand() is that both are susceptible to seed poisoning (see http://www.suspekt.org/2008/08/17/mt_srand-and-not-so-random-numbers/ ). So to produce strong random numbers, rand() or mt_rand() should not be used in a predictable manner (see http://blog.ircmaxell.com/2011/07/random-number-generation-in-php.html ). I'm working on splitting out the RNG from CryptLib and PasswordLib into a stand-alone dependency so that you can use its strong random mixer to produce these kinds of tokens (it uses many sources to generate the randomness, and is secure as long as any one source is secure): https://github.com/ircmaxell/RandomLib

pmjones

Based on comments elsewhere, I see the point. Looks like I have to modify https://github.com/auraphp/Aura.Session to use SSL when available, and only fall back to mt_rand when SSL is not available. Thanks, gentlemen.

pmjones

I am not a security expert, so please be gentle. What does the extra cryptographic security buy us? For long-lived hashes that get used over and over, I can see the point, but for what are short-lived tokens, it seems a bit of overkill. Additionally, it seems like it would deplete the entropy available to the system more rapidly. Too many CSRF tokens that get used and thrown away means you don't have the entropy when you need it for real security.

ezimuel

The real security problem in generating a secure CSRF token is the randomness of the seed. MD5 or SHA512 are not so different in this case from a security point of view. The openssl_random_pseudo_bytes() is the most secure way to generate good random numbers in PHP. For instance, in ZF2 we used that function to generate CSRF token in Zend\Form.

ircmaxell

If it's 100% deterministic for the server (has no random per-session data), then it's 100% deterministic for the client. And that means it's 100% deterministic for an attacker as well. Which basically means that the protection is useless at stopping CSRF style attacks...

padraicb

The OWASP version relies on two options as a token: A. The SHA512 hash of mt_rand(). The MD5 hashes of all outputs from mt_rand() are online. SHA256 hashes can be brute forced at some incredible speeds on a GPU, making it fairly pointless for minimal entropy inputs - it's only a number between 0 and 2^31 (mt_getrandmax()). SHA512 is much, much slower than SHA256 but I can't help but wonder if it's so slow as to take TOO long running only 2.147B comparisons - most hashing tools have GPU support these days and the last GPU generation were marvellous for this task. It wouldn't surprise me if it took

ezimuel

I just sent an email to the author of PHP_CSRF_Guard suggesting the use of openssl_random_pseudo_bytes() instead of mt_rand(). I agree with @padraicb: the random number provided by OpenSSL is enough for a CSRF token, and you can just use it without a hash function.

harikt

By the way @pmjones they are using a hashing algorithm ( $token=hash("sha512",mt_rand(0,mt_getrandmax())); ) and in the top it mentions the code is not verified by OWASP experts.

timoh

When using cryptographically strong random bytes, you don't have to worry about possible edge cases and attack vectors that may appear when using weak randomness, e.g. when the system is under an active attack. I'd make sure CSRF tokens are also generated using strong randomness (that makes it easy to ensure the system does not become vulnerable in any situation, edge cases included, because of weak randomness). If strong randomness is not available, just exit with an error. About "deplete the entropy available", this is actually not the case with /dev/urandom and the like. System random number generators (like /dev/urandom) do not run out of entropy. Urandom _might_ be low on entropy immediately after a fresh OS install, but this is insignificant when talking about web apps.
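
As a rough sketch of the "just exit with an error" idea above: openssl_random_pseudo_bytes() accepts an optional second argument, passed by reference, that reports whether a cryptographically strong algorithm was used. The function name strong_token() is hypothetical, and throwing a RuntimeException is just one way to fail hard. Note that on PHP 8 the function throws its own exception on failure instead of returning false.

```php
<?php
// Hypothetical helper: return a base64 token, or fail loudly if the
// bytes did not come from a cryptographically strong source.
function strong_token(int $bytes = 32): string
{
    $strong = false;
    $raw = openssl_random_pseudo_bytes($bytes, $strong);

    // Older PHP versions return false or set $strong to false on weak output.
    if ($raw === false || $strong !== true) {
        throw new RuntimeException('No cryptographically strong random source available');
    }

    return base64_encode($raw);
}
```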

kschroeder moderator

Cryptos (κρυπτός) and graphein (γράφειν) just mean "secret writing". When we're generating a token, what we want to do is give a secret to the person on the web page that will be extremely difficult to predict. The examples that I've found tend to rely on uniqid(), which is based off of the time and, thus, predictable. So when you're thinking about cryptography you are probably thinking about the actual act of encryption, which is not what we're talking about. We are using the tool from one of the first steps in the chain for creating an "unpredictable" value. The 32 bytes (256 bits) of data give us 1.1579208923731619542357098500869e+77 values, which is a pretty big set of values for you to use, and so I doubt that you would deplete entropy. However, mt_rand() returns an integer, not a series of bytes. That means that you have only 2 billion or so numbers to choose from. Given the choice, I would take the huge number.

harikt

I think the same, since it is just a form token. Is the cryptography really needed? Maybe for that sort of system, but not for all, I guess.

ezimuel

The problem with uniqid(mt_rand(), true); is that mt_rand() is not cryptographically secure. A more secure way to generate a random token is to use md5(openssl_random_pseudo_bytes(32)); or hash($algo, openssl_random_pseudo_bytes(128)); where $algo is sha-*. If you don't have the OpenSSL extension enabled you can use mcrypt_create_iv($length, MCRYPT_DEV_URANDOM); where $length is the number of random bytes. We implemented a random generator in ZF2 based on these considerations: https://github.com/zendframework/zf2/blob/master/library/Zend/Math/Rand.php#L25

kschroeder moderator

It uses that as an example for generating a token, but that page also specifically states that it is based off of microtime. Because of that the value would be predictable.

timoh

I'd like to add (as I posted to pmjones' blog) that it is a bit misleading to say that openssl_random_pseudo_bytes() is “better” (security-wise) than any other method that relies on /dev/urandom (or the Windows equivalent). Reading straight from /dev/urandom, or fetching bytes some other way that uses /dev/urandom, are all practically equal. Care should be taken to avoid the quirks of each method when fetching random bytes: for example, openssl_random_pseudo_bytes() blocking on certain versions, /dev/urandom not being available on Windows, and security issues with mcrypt_create_iv() (using DEV_URANDOM) on certain versions on Windows.

kschroeder moderator

Thanks. That's a good point. In other words, using md5() or sha512 is not as important as getting the actual random bits. The hashing, itself, is really only there to make sure that the bits that come out do not break the format. One could almost say that when using openssl_random_pseudo_bytes() you could use md5(), hash_hmac() or base64_encode() without a loss of security, something that would not be possible to say about uniqid().

RenThraysk

The key for the HMAC is a server-side secret. The client or attacker never knows it.

pmjones

Cool. I see that the Math library does that; is it used in the Session library?

pmjones

You are making the assumption, though, that a CSRF token falls in the realm of "cryptography." (Perhaps it is.) Is not a random shared value, sent along with the form, enough to defeat CSRF attacks? You say the random value is predictable and this may be true, but I'd like to see a demonstration of it. How much time and effort is required to predict it?

harikt

Thank you for making it clear.

siliconforks

Yes, it would be predictable - presumably that's why that code was removed. I'm just saying that is why you see that code all over the Internet (and in various open source projects) - it is because everyone originally copied it from the PHP manual.

ezimuel

@timoh you are right, but we compared mt_rand() or rand() with openssl_random_pseudo_bytes(), and the latter is better from a security point of view because it uses a pseudo-random source like /dev/urandom. Moreover, openssl_random_pseudo_bytes() is also supported on Windows, where /dev/urandom is not available.

padraicb

@kschroeder The primary goal of the CSRF token is to be an unpredictable random string of sufficient length to defeat brute force attacks. So literally the OpenSSL PRNG is sufficient, 32 being a nice length (anything less than 8 being severely weak). Hashing or obscuring the token is unnecessary since the random number itself is not a secret - what is sent to the user is. If that's a hash then the attacker only needs the hash. Base64 encoding is merely to ensure the token is a simple ASCII compatible string. Note: Tokens are generated securely as a standard practice. Also note the "pseudo" in the function name if concerned about entropy consumption ;).

kschroeder moderator

...I should say a *significant* loss in security.

RenThraysk

If an attacker can access the HMAC secret key on your server, you have more worrying concerns, like credentials to access databases directly. I wouldn't say it was going against the rest of the industry. The wider security field has created Message Authentication Codes as a means to provide assurances about messages. The message in this case is an HTTP POST request. Benefits: It's stateless. Having multiple forms on the same page, or the user having multiple pages with multiple forms open, will work, and each would have a different token. It's trivial to combine an expiration time within the token, [expires.hmac(expires + data)], so you can shorten the time that a token remains valid, closing the window on replay attacks.
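
One way to read the [expires.hmac(expires + data)] layout described above is sketched below. This is only an illustration under stated assumptions: the function names, the "|" separator, the choice of sha512, and passing the current time in as a parameter are all hypothetical, and the thread itself disagrees about whether this scheme is preferable to a random per-session token.

```php
<?php
// Hypothetical sketch of a stateless token: "<expires>.<hmac>".
// $secret is the site-wide key; $sessionId and $formUri are the signed data.
function hmac_token_create(string $secret, string $sessionId, string $formUri, int $expires): string
{
    $mac = hash_hmac('sha512', $expires . '|' . $sessionId . '|' . $formUri, $secret);
    return $expires . '.' . $mac;
}

// Recompute the MAC from the claimed expiry and compare in constant time.
function hmac_token_verify(string $secret, string $sessionId, string $formUri, string $token, int $now): bool
{
    $parts = explode('.', $token, 2);
    if (count($parts) !== 2 || preg_match('/^[0-9]+$/D', $parts[0]) !== 1) {
        return false; // malformed token
    }

    $expires = (int) $parts[0];
    if ($expires < $now) {
        return false; // token has expired
    }

    $expected = hmac_token_create($secret, $sessionId, $formUri, $expires);
    return hash_equals($expected, $token);
}
```

A form handler would call hmac_token_create() with something like time() + 600 when rendering, and hmac_token_verify() with time() when processing the POST. Changing the session id, the target URI, or the expiry invalidates the token because the MAC no longer matches.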

RenThraysk

Even a large pseudo-random number gets written somewhere; that is what $_SESSION does. So an internal attacker can read it.

kschroeder moderator

Additionally, 70% of all successful attacks come from inside an organization. Having a configurable value a) requires you to manage the key, and b) is something that an internal attacker may have knowledge of. Using a large pseudo-random number requires no configuration management and is not known by an internal individual. Defense in Depth, baby!

ircmaxell

Well, you do disclose the derivative of it (via the HMAC), so if they know what goes into the left side, they can attempt to brute force the right side. Not a huge issue, but something to think about. But in the end, what does this gain you? Nonce is a proven technique that does not require storing cryptographic secrets (which is what your key really is), and has good forward security (breaching today implies nothing towards breaching tomorrow). Your method requires a cryptographic secret, and has poor forward security (a breach today means a breach tomorrow). The rest of the security industry recommends using a random nonce, typically per-request (but at least per session). So what major benefit does this add to that paradigm that it's worth going against the rest of the industry?

kschroeder moderator

There are parts of token generation that, on a basic level, do fall into the realm of cryptography since cryptography is about "writing secrets". Beyond that the link to crypto is simply that the cryptographic tooling does a better job of providing more, better, pseudo-random values. When we're talking about predictability it will depend on which function we're talking about. If you have a timestamp, uniqid() is actually pretty easy to guess. It was designed to be unique, not unpredictable. And mt_rand() isn't so much predictable as it has a significantly smaller pool of values to choose from. In other words, mt_rand() is good, but openssl_random_pseudo_bytes() is better.
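
To illustrate why a timestamp makes uniqid() easy to guess: when called with no prefix and without the more_entropy flag, uniqid() returns 13 hex characters, the first 8 encoding the current Unix time in seconds and the last 5 the microseconds. A small sketch (the helper names are hypothetical) that decodes them:

```php
<?php
// Hypothetical helpers: recover the clock reading hidden inside a plain uniqid().

// First 8 hex digits are the Unix timestamp in seconds.
function uniqid_seconds(string $id): int
{
    return (int) hexdec(substr($id, 0, 8));
}

// Next 5 hex digits are the microsecond part of that timestamp.
function uniqid_microseconds(string $id): int
{
    return (int) hexdec(substr($id, 8, 5));
}
```

Someone who knows roughly when a token was generated therefore only has to search the microsecond range, which is why a value built for uniqueness should not be relied on for unpredictability.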

ezimuel

I would suggest using a hashing algorithm (MD5 or SHA-*), instead of base64, as the final output for a token because it offers better obfuscation of the seed (hashes are not invertible).

Trackbacks

  1. […] Looks like we need to update Aura.Session to use openssl when available and fall back to mt_rand() when it’s not. Via Generating secure cross site request forgery tokens (csrf). […]

  2. […] few notes about this approach. First, use openssl_random_pseudo_bytes instead of mt_rand ( suggested by Kevin Schroeder ) when possible. Second, be sure to only use === when comparing the token value. You want to avoid […]
