Monday 15 April 2013

Depending on the Java Matcher method you use, your regex may not get what you want

We had a system that could be configured with regexes to parse
incoming data. One of those regexes checked a user's email address,
and was configured like this:

@domain\.com

And it wasn't matching the values we had, even though the values were
of the form:


somebody@domain.com

someone.else@domain.com


Then I found that the code we used to find a match was:

pattern.matcher(value).matches();

Now the javadoc for matches() says:

"Attempts to match the entire region against the pattern."

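So if the pattern has to account for the ENTIRE string, then

somebody@domain.com

would never match, because the pattern only covers the "@domain.com"
part and leaves "somebody" unmatched.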

If you use Matcher.find() instead, it looks for any substring of the
value that matches the pattern, and so this would succeed.
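
To make the difference concrete, here is a minimal sketch that runs
both methods against the original pattern. The class and method names
(MatchesVsFind, wholeMatch, partialMatch) are made up for illustration
and are not from our system.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // The pattern as originally configured: only the domain part.
    static final Pattern DOMAIN = Pattern.compile("@domain\\.com");

    // matches(): the pattern must account for the whole input.
    public static boolean wholeMatch(String value) {
        return DOMAIN.matcher(value).matches();
    }

    // find(): the pattern may match any substring of the input.
    public static boolean partialMatch(String value) {
        return DOMAIN.matcher(value).find();
    }

    public static void main(String[] args) {
        String value = "somebody@domain.com";
        System.out.println(wholeMatch(value));   // false
        System.out.println(partialMatch(value)); // true
    }
}
```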

The fix was to change the regex to


.*@domain\.com

YES, it also means that invalid username values would be matched, but
we have other filters that check whether the entire string is a valid
email address. For this bit of functionality, all we care about is
the domain.
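
As a rough sketch of how the changed regex behaves with matches(),
assuming a hypothetical helper called isOurDomain (not the name used
in our code):

```java
import java.util.regex.Pattern;

public class DomainFilter {
    // The changed regex: anything at all, followed by the domain.
    static final Pattern ANY_USER_AT_DOMAIN = Pattern.compile(".*@domain\\.com");

    // Hypothetical helper: true when the whole value ends in @domain.com.
    public static boolean isOurDomain(String value) {
        return ANY_USER_AT_DOMAIN.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isOurDomain("somebody@domain.com"));          // true
        System.out.println(isOurDomain("someone.else@domain.com"));      // true
        System.out.println(isOurDomain("not a valid user@domain.com"));  // true
        System.out.println(isOurDomain("somebody@other.com"));           // false
        System.out.println(isOurDomain("somebody@domain.com.evil.org")); // false
    }
}
```

The third value shows the invalid-username case mentioned above. The
last one shows a side effect of sticking with matches(): the value has
to end with the domain, whereas find() with the original pattern would
also have accepted it.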


REF: http://stackoverflow.com/questions/4450045/difference-between-matches-and-find-in-java-regex
