Friday, July 15, 2011

RegEx Lesson 2 – Matching Several Characters – Quick Guide to Regular Expressions

2.1 Matching one of Several Characters
T: amith smith gmith
RegEx: [ag]mith
R: amith smith gmith (amith and gmith match; smith does not)


[Rr]eg[Ee]x matches both RegEx and regex.
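
The two sets above can be tried out with a short sketch. It assumes Python's built-in re module, which the lesson itself does not name; the extra test words Regex and REGEX are made up for illustration.

```python
import re

text = "amith smith gmith"

# [ag] matches exactly one character that is either 'a' or 'g'.
matches = re.findall(r"[ag]mith", text)
print(matches)  # ['amith', 'gmith']

# [Rr] and [Ee] each accept either case, independently of each other.
case_matches = re.findall(r"[Rr]eg[Ee]x", "RegEx regex Regex REGEX")
print(case_matches)  # ['RegEx', 'regex', 'Regex']
```

REGEX is not matched because only the R and the second E may vary in case.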
2.2 Using Character set ranges

    [ns]a[0123456789] == [ns]a[0-9]
    • A-Z matches all uppercase characters from A to Z.
    • a-z matches all lowercase characters from a to z.

    note:
    When you use ranges, be careful not to provide an end range that is less than the start range (such as [Z-A]).
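
    What happens with a reversed range depends on the engine. As a rough sketch, Python's re module (an assumption, since the lesson is engine-neutral) rejects it when the pattern is compiled. The helper is_valid_pattern is a hypothetical name used only for this illustration.

```python
import re

def is_valid_pattern(pattern):
    """Return True if the pattern compiles, False if the engine rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(is_valid_pattern("[A-Z]"))  # True
print(is_valid_pattern("[Z-A]"))  # False: the end of the range is below the start
```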

    Multiple ranges may be combined in a single set.
    [A-Za-z0-9] is shorthand for:
    [ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789]

    E.g.: a RegEx that validates CSS color codes:
    T: BGCOLOR="#336633" TEXT="#FFFFFF"
    RegEx: #[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]
    R: It checks that each of the six characters after the # is a hexadecimal digit, so both color codes match.
    BGCOLOR="#336633" TEXT="#FFFFFF"
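
    The color-code check can be sketched as follows, again assuming Python's re module. The pattern is built by repeating the same set six times, exactly as written above; the invalid value #GGGGGG is a made-up test input.

```python
import re

text = 'BGCOLOR="#336633" TEXT="#FFFFFF"'

# One hexadecimal digit, in either case.
hex_digit = "[0-9A-Fa-f]"

# A '#' followed by six hexadecimal digits.
pattern = "#" + hex_digit * 6

colors = re.findall(pattern, text)
print(colors)  # ['#336633', '#FFFFFF']

# 'G' is outside the A-F range, so nothing matches here.
invalid = re.findall(pattern, 'TEXT="#GGGGGG"')
print(invalid)  # []
```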


2.3 Anything but this (using the ^ metacharacter)
A ^ placed at the start of a character set negates it, so the set matches any character that is not listed. [0-9] matches all digits (and only digits); [^0-9] matches anything but the specified range of digits. As such,

[ns]a[^0-9]\.xls matches sam.xls but not na1.xls, na2.xls, or sa1.xls.
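
A quick way to confirm this is to run the pattern over the four file names. This is a minimal sketch that assumes Python's re module.

```python
import re

# n or s, then a, then any single character that is NOT a digit, then ".xls".
pattern = re.compile(r"[ns]a[^0-9]\.xls")

names = ["sam.xls", "na1.xls", "na2.xls", "sa1.xls"]
matched = [name for name in names if pattern.search(name)]
print(matched)  # ['sam.xls']
```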


2.4 Escaping metacharacters with a backslash (\):
Metacharacters have a special meaning inside a pattern, so they cannot be used as-is to match themselves. For example, a bare . does not match only a literal period, and a bare [ does not match a literal [. To match such a character literally, we have to state explicitly that it is not acting as a metacharacter.


    T: myArray[0]
    RegEx: myArray\[0\]
    R: myArray[0]


So any metacharacter can be escaped by preceding it with a backslash (\).
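
The difference escaping makes can be sketched as below, assuming Python's re module. re.escape is a helper from that module which adds the backslashes automatically; it is not part of the lesson.

```python
import re

text = "myArray[0]"

# Escaped brackets match the literal characters [ and ].
escaped = re.search(r"myArray\[0\]", text)
print(escaped.group())  # myArray[0]

# Unescaped, [0] is a character set containing only '0',
# so the pattern means "myArray0" and does not match the text.
unescaped = re.search(r"myArray[0]", text)
print(unescaped)  # None

# re.escape builds the escaped pattern from a literal string.
auto = re.escape(text)
print(auto)  # myArray\[0\]
```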


2.5 Matching Whitespace Characters:
When you are performing regular expression searches, you’ll often need to match nonprinting whitespace characters within your text.
Metacharacter   Description
[\b]            Backspace
\f              Form feed
\n              Line feed
\r              Carriage return
\t              Tab
\v              Vertical tab
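
A few of these can be sketched in use, assuming Python's re module. The sample strings are made up for illustration.

```python
import re

# Split a tab-separated line into its fields.
line = "name\tqty\tprice"
fields = re.split(r"\t", line)
print(fields)  # ['name', 'qty', 'price']

# \r?\n matches both Windows (\r\n) and Unix (\n) line endings.
lines = re.split(r"\r?\n", "first\r\nsecond\nthird")
print(lines)  # ['first', 'second', 'third']

# Inside a set, \b means the backspace character.
has_backspace = re.search(r"[\b]", "ab\bc") is not None
print(has_backspace)  # True
```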
Matching Specific Character types
\d   Any digit (same as [0-9])
\D   Any nondigit (same as [^0-9])
\w   Any alphanumeric character, in upper or lower case, or an underscore (same as [a-zA-Z0-9_])
\W   Any character that is neither alphanumeric nor an underscore (same as [^a-zA-Z0-9_])
\s   Any whitespace character (same as [ \f\n\r\t\v] in most engines; note that the set includes the space character)
\S   Any nonwhitespace character (same as [^ \f\n\r\t\v])
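
These classes can be tried out with a short sketch, assuming Python's re module. The re.ASCII flag is used because Python's \d, \w and \s otherwise also match non-ASCII Unicode characters, which goes beyond the sets listed here. The sample text is made up.

```python
import re

text = "id_42\tok\n"

digits = re.findall(r"\d", text, re.ASCII)
print(digits)  # ['4', '2']

# \w includes the underscore, so id_42 stays in one piece.
words = re.findall(r"\w+", text, re.ASCII)
print(words)  # ['id_42', 'ok']

# \W and \s both pick up the tab and the newline here.
non_words = re.findall(r"\W", text, re.ASCII)
spaces = re.findall(r"\s", text, re.ASCII)
print(non_words == spaces)  # True

# In Python, \s also matches a plain space character.
space_matches = re.fullmatch(r"\s", " ") is not None
print(space_matches)  # True
```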
