Friday, July 15, 2011

RegEx Lesson 3 – Repeating Matches – Quick Guide to Regular Expressions

3.1 Matching one or more characters
To match one or more instances of a character or set, append the + metacharacter to it.
Eg: matching an email
T: askamith@fmail.com
RegEx: \w+@\w+\.\w+
R: askamith@fmail.com
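
As a quick check of the example above, here is a minimal Python sketch using the standard re module. The variable names are only illustrative, and the pattern is the simple one from this lesson, not a complete email validator.

```python
import re

# Pattern from the example: one or more word characters, "@",
# one or more word characters, a literal dot, one or more word characters.
EMAIL_PATTERN = re.compile(r"\w+@\w+\.\w+")

match = EMAIL_PATTERN.search("askamith@fmail.com")
email = match.group(0) if match else None
print(email)

# + requires at least one character, so an empty local part does not match.
no_local_part = EMAIL_PATTERN.search("@fmail.com")
print(no_local_part)
```

The second search returns None because \w+ needs at least one word character before the @.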


3.2 Matching Zero or More Characters
Use the * metacharacter. Placed right after a character or a set, it matches zero or more instances of that character or set.
T: Amith Ah Amh A3465654747ucrfjerghggh43566dcs
RegEx: A.*h
R: Amith Ah Amh A3465654747ucrfjerghggh43566dcs
(One match, running from the first A to the last h; the trailing 43566dcs is not matched.)

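The last result shows that * can match more than intended: the match runs from the first A all the way to the last h in the text. To stop at the earliest possible h instead, use the lazy form *?, described later in this lesson.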
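
The behaviour of A.*h can be reproduced with a short Python sketch (standard re module; variable names are illustrative). It assumes the sample text ends with the extra characters 43566dcs, as in the result line above.

```python
import re

text = "Amith Ah Amh A3465654747ucrfjerghggh43566dcs"

# .* is greedy: it runs to the end of the text and then backs up
# only as far as the last "h".
greedy = re.search(r"A.*h", text).group(0)
print(greedy)

# "Zero or more" includes zero: "Ah" has nothing between A and h.
zero_ok = re.fullmatch(r"A.*h", "Ah") is not None
print(zero_ok)
```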


3.2 Matching Zero or One Character - ?
Use the ? metacharacter, placed right after a character or set, to match zero or one instance of it.
T: The URL is http://www.askamith.wordpress.com/, to connect
securely use https://www.askamith.wordpress.com/ instead.
RegEx: https?://
R: The URL is http://www.askamith.wordpress.com/, to connect
securely use https://www.askamith.wordpress.com/ instead.
(Matched: http:// and https://, because the s is optional.)
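
A minimal Python sketch of the same example, assuming the standard re module; the variable names are illustrative.

```python
import re

text = ("The URL is http://www.askamith.wordpress.com/, to connect "
        "securely use https://www.askamith.wordpress.com/ instead.")

# s? makes the "s" optional, so both schemes are found.
schemes = re.findall(r"https?://", text)
print(schemes)
```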


3.3 Range Interval matching - {n1,n2}
Use {n1,n2} right after a character or set to match at least n1 and at most n2 instances of it.
T: 10-6-2004
RegEx: \d{1,2}[-\/]\d{1,2}[-\/]\d{2,4}
R: 10-6-2004
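
The interval pattern above can be tried out with the following Python sketch (standard re module; the names are illustrative). It uses fullmatch so that the whole string has to fit the pattern, which is an assumption about how the pattern would be used and not something stated in this lesson.

```python
import re

# 1-2 digits, "-" or "/", 1-2 digits, "-" or "/", then 2-4 digits.
DATE_PATTERN = re.compile(r"\d{1,2}[-\/]\d{1,2}[-\/]\d{2,4}")

ok_dash = DATE_PATTERN.fullmatch("10-6-2004") is not None
ok_slash = DATE_PATTERN.fullmatch("01/01/01") is not None
too_many_digits = DATE_PATTERN.fullmatch("100-6-2004") is not None
too_few_digits = DATE_PATTERN.fullmatch("1-1-1") is not None

print(ok_dash, ok_slash, too_many_digits, too_few_digits)
```

Note that the pattern only checks the number of digits, not whether the date itself is valid.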


3.4 Preventing Over Matching using Lazy Quantifiers - *?
T: living in <b>SL</b> and <b>LK</b>.
RegEx: <b>.*</b>
R: living in <b>SL</b> and <b>LK</b>.
(One match: <b>SL</b> and <b>LK</b>, from the first <b> to the last </b>.)
The reason for this is that metacharacters such as * and + are greedy; they look for the greatest possible match as opposed to the smallest. You can prevent this by using the lazy versions of these quantifiers, which match the fewest characters instead of the most.
T: living in <b>SL</b> and <b>LK</b>.
RegEx: <b>.*?</b>
R: living in <b>SL</b> and <b>LK</b>.
(Two matches: <b>SL</b> and <b>LK</b>.)
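
The difference between the greedy and lazy patterns is easiest to see side by side. A minimal Python sketch, assuming the standard re module, with illustrative variable names:

```python
import re

text = "living in <b>SL</b> and <b>LK</b>."

# Greedy: .* extends to the last </b>, giving one long match.
greedy = re.findall(r"<b>.*</b>", text)

# Lazy: .*? stops at the first </b> it can, giving two short matches.
lazy = re.findall(r"<b>.*?</b>", text)

print(greedy)
print(lazy)
```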


3.5 Match Word Boundaries
\b is used to match the start or end of a word.
T: The cat scattered his food all over the room.
RegEx: \bcat\b
R: The cat scattered his food all over the room.
(Only the standalone word cat matches; the cat inside scattered does not.)
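
To see what \b changes, the sketch below compares the pattern with and without word boundaries. It is a minimal Python example using the standard re module; the variable names are illustrative.

```python
import re

text = "The cat scattered his food all over the room."

# Start positions of matches with word boundaries on both sides.
bounded = [m.start() for m in re.finditer(r"\bcat\b", text)]

# Without \b, the "cat" inside "scattered" is found as well.
unbounded = [m.start() for m in re.finditer(r"cat", text)]

print(bounded)
print(unbounded)
```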
