Python Regular Expressions

Python's re module handles regular expressions.

search method: returns a match object for the first match anywhere in the string, or None if there is no match

>>> import re
>>> x = "Perschon Python Online Tutorial"
>>> m = re.search('Python',x)
>>> if m: print("found")
...
found
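The match object returned by search carries more than a truth value. A minimal sketch, reusing the same sample string (the variable names matched_text, start, end, and missing are illustrative, not part of the original session):

```python
import re

x = "Perschon Python Online Tutorial"

# search scans the whole string and returns the first match, or None.
m = re.search(r'Python', x)

if m:
    matched_text = m.group()   # the substring that matched
    start, end = m.span()      # start (inclusive) and end (exclusive) offsets
    print(matched_text, start, end)

# A pattern that does not occur anywhere yields None.
missing = re.search(r'Ruby', x)
print(missing)
```

Slicing x[start:end] gives back the same text as m.group().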
match method: returns a match object only if the pattern matches at the beginning of the string, otherwise None
>>> m = re.match('Python',x)
>>> if m: print("found")  # prints nothing: 'Python' is not at the start of x
...
>>> if m is None: print("no match")
...
no match
>>> m = re.match('Perschon',x)
>>> if m is None: print("no match")
...
>>> if m: print("found")
...
found
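The difference between the two methods can be checked side by side. The sketch below also uses re.fullmatch, which the text above does not cover; it is included only for contrast, and the variable names are illustrative:

```python
import re

x = "Perschon Python Online Tutorial"

# match anchors at position 0; search looks anywhere in the string.
at_start = re.match(r'Python', x)      # None: 'Python' is not at the start
anywhere = re.search(r'Python', x)     # match object

# fullmatch succeeds only if the whole string matches the pattern.
whole = re.fullmatch(r'Perschon', x)   # None: the string continues past 'Perschon'
prefix = re.match(r'Perschon', x)      # match object

print(at_start is None, anywhere is not None, whole is None, prefix is not None)
```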
split method: splits the string at each match of the pattern
>>> re.split(r'\s',x)
['Perschon', 'Python', 'Online', 'Tutorial']
>>> re.split('o',x)
['Persch', 'n Pyth', 'n Online Tut', 'rial']
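A few variations on split may be useful in practice. This is a small sketch with a made-up sample string; the variable names are hypothetical:

```python
import re

text = "one,  two;three   four"   # hypothetical sample input

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r'[,;\s]+', text)

# maxsplit limits the number of splits; the remainder stays in one piece.
first_two = re.split(r'[,;\s]+', text, maxsplit=1)

# A capturing group in the pattern keeps the separators in the result.
with_seps = re.split(r'(;)', "a;b")

print(parts)
print(first_two)
print(with_seps)
```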
findall method: returns a list of all non-overlapping matches
>>> re.findall('o[a-z]',x)
['on', 'on', 'or']
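What findall returns depends on whether the pattern contains capturing groups. A short sketch on the same sample string (finditer is a related function not covered above, shown here as an assumption about what a reader may want next):

```python
import re

x = "Perschon Python Online Tutorial"

# No groups: findall returns the full text of each match.
pairs = re.findall(r'o[a-z]', x)

# One capturing group: findall returns only the group's text.
after_o = re.findall(r'o([a-z])', x)

# finditer yields match objects, which also expose positions.
positions = [m.start() for m in re.finditer(r'o[a-z]', x)]

print(pairs, after_o, positions)
```

The capital "O" in "Online" is skipped because the pattern is case-sensitive by default.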
sub method: replaces each match of the pattern with a replacement string
>>> re.sub('o',"XXX", x)
'PerschXXXn PythXXXn Online TutXXXrial'
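Beyond a fixed replacement string, sub also accepts a count limit, group references, and a function. A minimal sketch with illustrative variable names:

```python
import re

x = "Perschon Python Online Tutorial"

# count limits how many matches are replaced (here: only the first).
first_only = re.sub(r'o', 'XXX', x, count=1)

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r'(\w+) (\w+)', r'\2 \1', "hello world")

# A function can compute the replacement from each match object.
upper_o = re.sub(r'o', lambda m: m.group().upper(), x)

print(first_only)
print(swapped)
print(upper_o)
```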
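Non-greedy regular expression: a quantifier is greedy by default and matches as many repetitions as possible; appending ? makes it match as few as possible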
>>> x = "EndMemXXXXXXXXXXXXX PythXXXXXXn"
>>> re.sub('X{6,13}','o',x)
'EndMemo Python'
>>> x = "EndMemXXXXXXXXXXXXX PythXXXXXXn"
>>> re.sub('X{6,13}?','o',x)
'EndMemooX Python'
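The same greedy versus non-greedy contrast shows up clearly with delimited text. A small sketch on a made-up input string (html, greedy, and lazy are hypothetical names):

```python
import re

html = "<b>bold</b> and <i>italic</i>"   # hypothetical sample input

# Greedy: .+ grabs as much as possible, so one match spans
# from the first '<' to the last '>'.
greedy = re.findall(r'<.+>', html)

# Non-greedy: .+? stops at the first '>' that lets the pattern succeed.
lazy = re.findall(r'<.+?>', html)

print(greedy)
print(lazy)
```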

Regular Expression Syntax:

Syntax      Description
\d          Digit, 0 to 9
\D          Not a digit
\s          Whitespace character (space, tab, newline, etc.)
\S          Not a whitespace character
\w          Word character (letter, digit, or underscore)
\W          Not a word character
\t          Tab
\n          New line
^           Beginning of the string
$           End of the string
\           Escape special characters, e.g. \\ is "\", \+ is "+"
|           Alternation, e.g. (e|d)n matches "en" and "dn"
.           Any character except \n
[ab]        a or b
[^ab]       Any character except a and b
[0-9]       Any digit
[A-Z]       Any uppercase letter A to Z
[a-z]       Any lowercase letter a to z
[A-Za-z]    Any uppercase or lowercase letter A to Z
i+          i one or more times
i*          i zero or more times
i?          i zero or one time
i{n}        i exactly n times in sequence
i{n1,n2}    i between n1 and n2 times in sequence
i{n,}       i at least n times
\0          NUL
\f          Form feed character
\r          Carriage return character
\v          Vertical tab
\uhhhh      Unicode character with 4-digit hexadecimal code hhhh
\xhh        Character with 2-digit hexadecimal code hh
(?=i)       Lookahead: matches only if followed by i
(?!i)       Negative lookahead: matches only if not followed by i
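
Several of the syntax elements listed here can be tried out directly. The sketch below uses made-up sample strings and illustrative variable names; it is meant as a quick check of the table, not an exhaustive reference:

```python
import re

# \d+ matches runs of digits; \w+ matches runs of word characters.
digits = re.findall(r'\d+', "order 66, room 101")
words = re.findall(r'\w+', "snake_case, CamelCase!")

# ^ and $ anchor the pattern to the start and end of the string.
is_five_digits = re.search(r'^\d{5}$', "90210") is not None

# a{2,3}: between 2 and 3 repetitions of a, as many as possible.
runs = re.findall(r'a{2,3}', "a aa aaa aaaa")

# Lookahead checks what follows without consuming it.
prices = "10 USD 20 EUR 30 USD"
usd = re.findall(r'\d+(?= USD)', prices)            # followed by ' USD'
not_usd = re.findall(r'\b\d+\b(?! USD)', prices)    # not followed by ' USD'

# Pitfall: the range A-z also covers the punctuation between 'Z' and 'a'.
underscore_in_A_z = re.fullmatch(r'[A-z]', '_') is not None

print(digits, words, is_five_digits, runs, usd, not_usd, underscore_in_A_z)
```

Writing the patterns as raw strings (r'...') keeps Python from interpreting the backslashes before the re module sees them.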