Monday, November 21, 2005

regex package in java & handling wild card characters

Java offers you regex package to handle the regular expression. Suppose you take a string from the user and search for the string in the database using regular expression, you may have some problems in implementing wildcard characters handling. Regex package can generate regular expression for all the string that you pass. But wildcard character * in regular expression has a different meaning from our normal interpretation.

Take for an example,

The search string entered : a*b

In regular expression, a*b means any number of 'a's followed by a 'b'.
so in this case, valid match could be

ab, aab, aaab, (a)-n times followed by (b).

but what user wants could be a followed by any character, any number of times and finally a 'b'. To convert this search string in to a proper format to be passed to the regex method is to replace '*' with a '.*' . And similarly for '?', it should be replaced by '.'

So, a search string "a*b"( in our normal sense, meaning 'a' followed by any character any number of times, followed by 'b') should be modified as "a.*b" before passing on to the regex package. If this search string is passed to the regex and asked to generate a regular expression from it, then we can get this regular expression to do the normal wildcard operation.

Similarly, a search string "a?b" ( meaning 'a' followed by any one character, followed by 'b') should be converted into "a.b" before passing on to the regex for generating the regular expression.

No comments:

Post a Comment