Regular expressions are quite possibly the least enjoyable thing about programming, mostly because I can't read them. They're terrible. They're supposed to be used to search for and match patterns within text, but the more often I encounter them, the more I lament their very existence. Like this one:
^(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])(\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])){3}$
What the hell does this match? I've found regular expressions like this before in my day-to-day work, and there are only two possible outcomes from any such encounter:
- I figure out what this regex is supposed to be doing, delete it, and replace it with readable code OR
- I don't figure out what it is doing and have to leave it for fear of breaking something.
I hate regular expressions. I'll find one in my code, and Scenario #2 inevitably rears its ugly head.
To me, regular expressions are like meetings: sometimes we need them to do effective work, but damn do they suck. Which is why stumbling on libraries such as VerbalExpressions makes me feel all giddy inside, like rainbows and lollipops will spew forth from my mouth in a torrent of glee.
That happens to everyone, right? Not just me?
What Is VerbalExpressions?
VerbalExpressions is a library that builds regular expressions from readable code. For example, let's say we had this regex:
^(http)(s)?(://)([^\ ]*)$
This regex is designed to match simple URLs. Here are the rules for matching:
- The URL must start with either "http" or "https".
- The URL must then have "://".
- The URL can then have anything following "://", as long as it isn't a space.
VerbalExpressions allows us to write the following C# code to produce this regex:
var urlExp = new VerbalExpressions()
    .StartOfLine()
    .Then("http")
    .Maybe("s")
    .Then("://")
    .AnythingBut(" ")
    .EndOfLine();
Which, if I do say so myself, is a LOT better than trying to read through dense, impossible-to-parse regular expressions.
But let's say you don't believe me (and, honestly, you don't have to), and would like to test it yourself. In order to test that this regex is valid, we could use simple assertions.
var url = "http://www.exceptionnotfound.net";
Assert.IsTrue(urlExp.Test(url), "The URL is not valid!");
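If you'd like a baseline to compare against, here's a minimal sketch that runs the hand-written URL pattern from above through .NET's built-in Regex class. The class and method names (UrlRegexCheck, IsSimpleUrl) are made up for this example, and it doesn't touch VerbalExpressions at all, so it only shows what the raw pattern accepts and rejects.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlRegexCheck
{
    // The hand-written URL pattern quoted earlier in the post.
    private static readonly Regex UrlPattern = new Regex(@"^(http)(s)?(://)([^\ ]*)$");

    // Hypothetical helper: true when the input satisfies the three URL rules.
    public static bool IsSimpleUrl(string input)
    {
        return UrlPattern.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsSimpleUrl("http://www.exceptionnotfound.net"));  // True
        Console.WriteLine(IsSimpleUrl("https://www.exceptionnotfound.net")); // True
        Console.WriteLine(IsSimpleUrl("ftp://www.exceptionnotfound.net"));   // False: must start with http or https
        Console.WriteLine(IsSimpleUrl("http://exception not found.net"));    // False: spaces are not allowed after "://"
    }
}
```

Feeding the same inputs to urlExp.Test should be a quick way to see whether the generated expression agrees with the hand-written one.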
A Few More Examples
Let's walk through converting two more common regular expressions. First up is a regex that is designed to do simple validation on an email:
^(.*)(@)([^\ ]*)(\.)([^\ ]*)$
Here are the rules:
- The email may start with any text, followed by an '@' symbol.
- After the '@', the email may contain any text (except a blank space), followed by a '.'
- After the '.', the email address may contain any text (except a blank space).
Here's how we would write that using VerbalExpressions:
var emailExp = new VerbalExpressions()
    .StartOfLine()
    .Anything()
    .Then("@")
    .AnythingBut(" ")
    .Then(".")
    .AnythingBut(" ")
    .EndOfLine();
var email = "test@example.com";
var invalidEmail = "test@example";
Assert.IsTrue(emailExp.Test(email), "The email is not valid!");
Assert.IsTrue(emailExp.Test(invalidEmail), "The email is not valid!"); //This assert will fail!
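It's worth seeing just how "simple" this validation is. The sketch below runs the hand-written email pattern through .NET's Regex class; the names EmailRegexCheck and LooksLikeEmail are invented for illustration. Because every "any text" part of the rules is allowed to be empty, some pretty odd strings get through.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailRegexCheck
{
    // The hand-written email pattern from the post.
    private static readonly Regex EmailPattern = new Regex(@"^(.*)(@)([^\ ]*)(\.)([^\ ]*)$");

    // Hypothetical helper: true when the input satisfies the three email rules.
    public static bool LooksLikeEmail(string input)
    {
        return EmailPattern.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(LooksLikeEmail("test@example.com"));  // True
        Console.WriteLine(LooksLikeEmail("test@example"));      // False: no '.' after the '@'
        Console.WriteLine(LooksLikeEmail("@."));                // True: every "any text" part may be empty
        Console.WriteLine(LooksLikeEmail("not an email@x.y"));  // True: spaces are allowed before the '@'
    }
}
```

So this pattern is fine as a sanity check, but I wouldn't lean on it for real email validation.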
What about a phone number? For simplicity's sake, I'm assuming a United States ten-digit telephone number. Possible matches include:
(123) 456-7890
123 456-7890
1234567890
The regex for this looks like the following (this absolutely can be shortened):
^(\()?[0-9]{3}(\))?(\ )?[0-9]{3}(-)?[0-9]{4}$
Here are the rules:
- The phone number may start with "(".
- The phone number must then have 3 digits, each of which is in the range 0-9.
- The phone number may then have ")".
- Following the optional ")", the phone number may also have a space.
- Following the optional space, the phone number must have 3 digits, each in the range 0-9.
- Following this set of digits, the phone number may optionally include a dash ("-").
- Following the optional dash, the phone number must have 4 digits, each in the range 0-9.
Here's the VerbalExpressions code for this:
var phoneExp = new VerbalExpressions()
    .StartOfLine()
    .Maybe("(")
    .Range('0', '9')
    .RepeatPrevious(3)
    .Maybe(")")
    .Maybe(" ")
    .Range('0', '9')
    .RepeatPrevious(3)
    .Maybe("-")
    .Range('0', '9')
    .RepeatPrevious(4)
    .EndOfLine();
var phone = "(123) 456-7890";
var invalidPhone = "(123) 456-789";
Assert.IsTrue(phoneExp.Test(phone), "The phone number is invalid.");
Assert.IsTrue(phoneExp.Test(invalidPhone), "The phone number is invalid."); //This assert will fail!
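I mentioned that the phone regex can be shortened. Here's one possible shorter form (my own guess at it, not something VerbalExpressions generates) that drops the capture groups, along with a quick check that both patterns agree on a handful of inputs. The names PhoneRegexCheck, MatchesLong, and MatchesShort are made up for this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneRegexCheck
{
    // The hand-written phone pattern from the post.
    private static readonly Regex LongPattern =
        new Regex(@"^(\()?[0-9]{3}(\))?(\ )?[0-9]{3}(-)?[0-9]{4}$");

    // Assumed shorter equivalent: same rules, no capture groups.
    private static readonly Regex ShortPattern =
        new Regex(@"^\(?[0-9]{3}\)? ?[0-9]{3}-?[0-9]{4}$");

    public static bool MatchesLong(string input)
    {
        return LongPattern.IsMatch(input);
    }

    public static bool MatchesShort(string input)
    {
        return ShortPattern.IsMatch(input);
    }

    public static void Main()
    {
        string[] inputs = { "(123) 456-7890", "123 456-7890", "1234567890", "(123) 456-789", "(123 456-7890" };
        foreach (var input in inputs)
        {
            // Prints the input followed by the result from each pattern.
            Console.WriteLine(input + " -> " + MatchesLong(input) + " / " + MatchesShort(input));
        }
    }
}
```

Note the last input: because each parenthesis is optional on its own, both patterns happily accept an opening "(" with no closing ")".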
Testing the Generated Expressions
Let's say we don't trust this package and want to prove that it is creating regexes that actually match appropriate input. For simple testing, we can use Assert. Let's test all three of the above regexes:
var url = "http://www.exceptionnotfound.net";
var email = "test@example.com";
var invalidEmail = "test@example";
var phone = "(123) 456-7890";
Assert.IsTrue(urlExp.Test(url), "The URL is not valid!");
Assert.IsTrue(emailExp.Test(email), "The email is not valid!");
Assert.IsTrue(phoneExp.Test(phone), "The phone number is invalid.");
Assert.IsTrue(emailExp.Test(invalidEmail), "The email is not valid!"); //This assert will fail!
Easy enough, right? I'd like to see more complex testing examples, so if anyone out there comes up with some, let me know!
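As a starting point for something slightly more thorough, here's a rough sketch of a table-driven check: a dictionary of inputs and expected results, plus a helper that reports every input where the result differs. RegexCaseRunner and FindFailures are hypothetical names, and I'm using the hand-written phone pattern with .NET's Regex class as a stand-in predicate. Presumably a lambda wrapping phoneExp.Test could be passed in its place.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexCaseRunner
{
    // Runs every input through the predicate and returns the inputs
    // whose result differs from the expected value.
    public static List<string> FindFailures(Func<string, bool> test, Dictionary<string, bool> cases)
    {
        var failures = new List<string>();
        foreach (var pair in cases)
        {
            if (test(pair.Key) != pair.Value)
            {
                failures.Add(pair.Key);
            }
        }
        return failures;
    }

    public static void Main()
    {
        // Stand-in for the generated expression: the hand-written phone pattern.
        var phonePattern = new Regex(@"^(\()?[0-9]{3}(\))?(\ )?[0-9]{3}(-)?[0-9]{4}$");

        var cases = new Dictionary<string, bool>
        {
            { "(123) 456-7890", true },
            { "123 456-7890", true },
            { "1234567890", true },
            { "(123) 456-789", false },
            { "123-456-7890", false },
            { "", false }
        };

        var failures = FindFailures(input => phonePattern.IsMatch(input), cases);
        Console.WriteLine(failures.Count == 0
            ? "All cases passed"
            : "Failed: " + string.Join(", ", failures));
    }
}
```

The nice part of a table like this is that adding a new edge case is one line, and a failure tells you exactly which input broke.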
In addition to testing, one of the cooler little things about VerbalExpressions is that, during debugging, the actual generated regexes are shown for the instances of VerbalExpressions in Visual Studio's debugger windows.
An Important NuGet Note
As of this writing, there is a NuGet package for the C# edition of VerbalExpressions, but the package is woefully behind the most recent version of the code on GitHub. Here's hoping the creator of the package gets the current code onto NuGet so we can use it from there. For this demo, I just downloaded the code files and included them in my project (there are only two of them).
Summary
Regular expressions still suck, but now they suck less (at least in C#) thanks to VerbalExpressions! Use this package to build readable, easy-to-understand regular expressions that can still be used in everyday coding.
I have a sample project for this post over on GitHub, so go check it out!
As always, if I missed something or the code can be made better, feel free to let me know in the comments. If you hate regular expressions, feel free to vent your anger below.
Happy Coding!
NOTE: The example regular expression at the beginning of this post is for parsing IPv4 addresses, taken from this Stack Overflow answer.
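For the curious, here's a small sketch of an IPv4 pattern of that shape run through .NET's Regex class. I'm assuming the dot belongs inside the group that repeats three times, so each of the last three octets is preceded by a dot. The names Ipv4RegexCheck and IsIpv4 are invented for this example, and building the pattern from a named Octet piece is my own choice to make it a little easier on the eyes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Ipv4RegexCheck
{
    // One octet: 0-9, 10-99, 100-199, 200-249, or 250-255.
    private const string Octet = @"(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])";

    // One octet, followed by exactly three ".octet" groups.
    private static readonly Regex Ipv4Pattern =
        new Regex("^" + Octet + @"(\." + Octet + "){3}$");

    // Hypothetical helper: true when the input looks like a dotted-quad IPv4 address.
    public static bool IsIpv4(string input)
    {
        return Ipv4Pattern.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsIpv4("192.168.0.1"));      // True
        Console.WriteLine(IsIpv4("255.255.255.255"));  // True
        Console.WriteLine(IsIpv4("256.1.1.1"));        // False: 256 is out of range
        Console.WriteLine(IsIpv4("1.2.3"));            // False: only three octets
    }
}
```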