Tuesday, 12 March 2013

The Email RegEx that (could have) DoSed a site

While I was writing the UnitTests for TeamMentor's NewUser validator (see Validating a POCO DataContract using .NET's DataAnnotations Validator), I got a weird result in one of the tests.

I basically got a 'never ending execution' scenario on this UnitTest:



image

The test shown above (not 100% complete at this stage) was supposed to check that we got the expected ‘The field xyz must be a string with a maximum length of nn’ errors on all string POCO properties.

Instead we got this:
image  and image

So I executed that test under the debugger, paused the execution (once in that ‘high CPU state’) and saw:

image

Which means that one of the 'to be validated' values is hanging the validator.

Looking at the NewUser class, the most likely culprit is the Email field (since that is the only one that has a RegEx).

image
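For readers who cannot see the screenshot, here is a rough sketch of how a POCO like this is typically wired up with DataAnnotations. The class name, property names and length limits are my assumptions for illustration (they are not TeamMentor's actual values); the attributes and the Validator.TryValidateObject call are the standard System.ComponentModel.DataAnnotations ones.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical stand-in for the NewUser POCO. Property names and
// length limits are illustrative assumptions only.
public class NewUserSketch
{
    [Required, StringLength(30)]
    public string Username { get; set; }

    // The Email property is the only one guarded by a RegEx
    // (this is the original pattern quoted later in this post).
    [Required, StringLength(50)]
    [RegularExpression(@"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$")]
    public string Email { get; set; }
}

public static class ValidationSketch
{
    // Runs every validation attribute on every property and returns the failures.
    public static List<ValidationResult> Validate(object target)
    {
        var results = new List<ValidationResult>();
        var context = new ValidationContext(target, null, null);
        Validator.TryValidateObject(target, context, results, true);
        return results;
    }

    public static void Main()
    {
        var user = new NewUserSketch
        {
            Username = new string('a', 5001),  // too long on purpose
            Email = "user@example.com"         // kept short so the RegEx returns quickly
        };
        foreach (var result in Validate(user))
            Console.WriteLine(result.ErrorMessage);
    }
}
```

Because validateAllProperties is true, the RegEx attribute runs on the Email value even when that value is already longer than the StringLength limit, which is how an oversized email can reach the pattern at all.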

To confirm it, let's run the same test with a small email value:

image

And bingo, the test runs in 200ms:

image

Note that the 200ms execution time is NOT caused by the 'validation step', since using 5001 random letters for each field:

image

produces a similar value (198ms):

image

What is the problem?

It is the RegEx used:
public class ValidationRegex
{
    public const string Email = @"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$";
}

because, if we change this regex to:
public class ValidationRegex
{        
    public const string Email = @"^[\w-\.]{1,}\@([\w]{1,}\.){1,}[a-z]{2,4}$";
}

and restore the test:

image

we will get the expected time:

image

and results

image
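My reading of why the first pattern hangs (an explanation I am adding here, not something measured in the screenshots): it nests one quantifier inside another, ([-.\w]*[0-9a-zA-Z])*, and both the inner and the outer part can match the same letters. When the input is a run of letters with no '@', a backtracking engine tries every way of splitting that run into groups before giving up, so the work roughly doubles with each extra character. The sketch below times both patterns on growing inputs. It assumes .NET 4.5 or later for the Regex constructor that takes a match timeout; the class and method names are mine.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class BacktrackingSketch
{
    // The two patterns quoted in this post.
    public const string Original =
        @"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$";
    public const string Replacement =
        @"^[\w-\.]{1,}\@([\w]{1,}\.){1,}[a-z]{2,4}$";

    // Times a single IsMatch call; returns null if the timeout was hit.
    public static TimeSpan? TimeMatch(string pattern, string input, TimeSpan timeout)
    {
        var regex = new Regex(pattern, RegexOptions.None, timeout);
        var watch = Stopwatch.StartNew();
        try
        {
            regex.IsMatch(input);
            return watch.Elapsed;
        }
        catch (RegexMatchTimeoutException)
        {
            return null;
        }
    }

    private static string Describe(TimeSpan? elapsed)
    {
        return elapsed.HasValue ? elapsed.Value.TotalMilliseconds + " ms" : "timed out";
    }

    public static void Main()
    {
        var timeout = TimeSpan.FromSeconds(2);
        foreach (var length in new[] { 10, 14, 18, 22, 26, 30 })
        {
            var input = new string('a', length);  // letters only, no '@'
            Console.WriteLine("{0,2} chars: original {1}, replacement {2}",
                length,
                Describe(TimeMatch(Original, input, timeout)),
                Describe(TimeMatch(Replacement, input, timeout)));
        }
    }
}
```

In the replacement pattern each repeated piece is followed by a character it cannot itself match ('@' or '.'), so there is only one way to split the input and a failed match is abandoned quickly.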


Final version of the test:

image

which executes in 173ms

image

Moral of the story and the power of fuzzing:

Always cover your RegExes with UnitTests, to ensure that they can cope with any data that can be sent to them (including very large or malformed values).
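One way to turn that advice into a repeatable check is sketched below: run the pattern against a handful of hostile inputs with a time budget and collect the ones that blow it. This is only a sketch under assumptions: it needs the match-timeout Regex constructor (.NET 4.5 or later), the helper names are hypothetical, and the input shapes are a small illustrative sample rather than a proper fuzzing corpus.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexDosGuard
{
    // Returns the inputs on which the pattern did not finish within the budget.
    public static List<string> FindSlowInputs(string pattern, IEnumerable<string> inputs, TimeSpan budget)
    {
        var regex = new Regex(pattern, RegexOptions.None, budget);
        var slow = new List<string>();
        foreach (var input in inputs)
        {
            try { regex.IsMatch(input); }
            catch (RegexMatchTimeoutException) { slow.Add(input); }
        }
        return slow;
    }

    // A few oversized, almost-valid email shapes (illustrative, not exhaustive).
    public static IEnumerable<string> HostileEmails(int length)
    {
        var letters = new string('a', length);
        yield return letters;                          // no '@' at all
        yield return letters + "@";                    // nothing after the '@'
        yield return "a@" + letters;                   // no dot in the domain
        yield return letters + "@" + letters + ".";    // missing top-level domain
    }

    public static void Main()
    {
        var pattern = @"^[\w-\.]{1,}\@([\w]{1,}\.){1,}[a-z]{2,4}$";
        var slow = FindSlowInputs(pattern, HostileEmails(5001), TimeSpan.FromSeconds(1));
        Console.WriteLine("Inputs over budget: " + slow.Count);
    }
}
```

A UnitTest can then simply assert that FindSlowInputs returns an empty list for every pattern in ValidationRegex, so a risky RegEx fails the build instead of hanging the site.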

Here is another good use for an ESTAPI, which would provide a set of RegExes for particular 'types of validation' and a batch of tests to check their functionality (and DoS-protection capabilities).

I wonder if this Email RegEx problem (a DoS on inputs of 20+ chars) only affects the .NET Framework?