Using Regular Expressions in Python

import re
r1 = re.compile(r'(?im)(?P<name></html>)$')
content = """
        <HTML>
boxsuch as 'box' and 'boxes', but not 'inbox'. In other words
box
<html>dsafdsafdas </html> </ahtml>
</html>
</HTML>
"""

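reobj = re.compile(r"(?im)(?P<name></.*?html>)$")
for match in reobj.finditer(content):
    # match.start(): index where the match begins
    # match.end(): index just past the end of the match (exclusive)
    # match.group(): the matched text
    print("start>>", match.start())
    print("end>>", match.end())
    print("span>>", match.span())
    print("match.group()>>", match.group())

print("*" * 20)

# match() only tries the pattern at the very start of the string.
# content begins with a newline, so match() fails even with (?m) set.
if r1.match(content):
    print('match succeeds')
else:
    print('match fails')         # prints: match fails

# search() scans the whole string for the first position that matches.
if r1.search(content):
    print('search succeeds')     # prints: search succeeds
else:
    print('search fails')

# Attributes of the compiled regular expression object.
print(r1.flags)
print(r1.groupindex)
print(r1.pattern)

# split() also returns the text of capturing groups between the pieces.
parts = r1.split(content)
print("l>>", parts)

for item in r1.findall(content):
    print("item>>", item)

# sub() returns the new string; subn() also returns the replacement count.
s = r1.sub("aa", content)
print("s>>", s)

s_subn, s_sub_count = r1.subn("aaaaaaaaaaaa", content)
print("s_subn>>", s_subn)
print("s_sub_count>>", s_sub_count)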

9.7 Regular Expressions and the re Module
A regular expression is a string that represents a pattern. With regular expression functionality, you can compare that pattern to another string and see if any part of the string matches the pattern.
The re module supplies all of Python's regular expression functionality. The compile function builds a regular expression object from a pattern string and optional flags. The methods of a regular expression object look for matches of the regular expression in a string and/or perform substitutions. Module re also exposes functions equivalent to a regular expression's methods, but with the regular expression's pattern string as their first argument.
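As a minimal sketch of the two calling styles described above (the sample text and the name box_re are made up for illustration), the compiled-object methods and the module-level functions should give the same results:

```python
import re

text = "Box one, box two"

# Build a regular expression object once, then call its methods.
box_re = re.compile(r"box", re.IGNORECASE)
compiled_hit = box_re.search(text)

# Module-level equivalent: the pattern string is the first argument.
module_hit = re.search(r"box", text, re.IGNORECASE)

# Both forms find the same first occurrence.
print(compiled_hit.span(), module_hit.span())

# Substitution, again in both forms.
compiled_sub = box_re.sub("crate", text)
module_sub = re.sub(r"box", "crate", text, flags=re.IGNORECASE)
print(compiled_sub)
print(module_sub)
```

Compiling first is mainly a convenience when the same pattern is reused many times; for a one-off match the module-level function is usually enough.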
Regular expressions can be difficult to master, and this book does not purport to teach them; I cover only the ways in which you can use them in Python. For general coverage of regular expressions, I recommend the book Mastering Regular Expressions, by Jeffrey Friedl (O'Reilly). Friedl's book offers thorough coverage of regular expressions at both the tutorial and advanced levels.
9.7.1 Pattern-String Syntax
The pattern string representing a regular expression follows a specific syntax:
Alphabetic and numeric characters stand for themselves. A regular expression whose pattern is a string of letters and digits matches the same string.
Many alphanumeric characters acquire special meaning in a pattern when they are preceded by a backslash (\).
Punctuation works the other way around. A punctuation character is self-matching when escaped, and has a special meaning when unescaped.
The backslash character itself is matched by a repeated backslash (i.e., the pattern \\).
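The four rules above can be checked with a short sketch; the sample strings and variable names are arbitrary and chosen only for illustration:

```python
import re

# Letters and digits stand for themselves.
plain = re.search(r"abc123", "xx abc123 yy")

# A backslash gives some alphanumerics a special meaning: \d is a digit.
digit = re.search(r"\d", "room 7")

# Punctuation works the other way around: an unescaped dot is special
# (it matches any character), an escaped dot matches only a literal dot.
any_char = re.search(r"a.c", "abc")
literal_dot_miss = re.search(r"a\.c", "abc")
literal_dot_hit = re.search(r"a\.c", "a.c")

# A literal backslash is matched by the two-character pattern \\.
path = "C:\\temp"   # the seven-character string C:\temp
backslash = re.search(r"\\", path)

print(plain.group(), digit.group(), any_char.group())
print(literal_dot_miss, literal_dot_hit.group(), backslash.span())
```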
Since regular expression patterns often contain backslashes, you generally want to specify them using raw-string syntax (covered in Chapter 4). Pattern elements (e.g., r'\t', which is equivalent to the non-raw string literal '\\t') do match the corresponding special characters (e.g., the tab character '\t'). Therefore, you can use raw-string syntax even when you do need a literal match for some such special character.
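A small sketch of the raw-string point, using made-up variable names: the raw literal r'\t' and the non-raw literal '\\t' are the same two-character string, and as a pattern it still matches a real tab character.

```python
import re

raw_pattern = r"\t"      # two characters: a backslash and the letter t
plain_pattern = "\\t"    # the same two characters, written without r
tab_pattern = "\t"       # one character: an actual tab

print(raw_pattern == plain_pattern, len(raw_pattern), len(tab_pattern))

# The regular expression engine interprets the two-character pattern \t
# as "tab", so both spellings find the tab in the sample string.
sample = "a\tb"
raw_hit = re.search(raw_pattern, sample)
tab_hit = re.search(tab_pattern, sample)
print(raw_hit.span(), tab_hit.span())
```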
Table 9-2 lists the special elements in regular expression pattern syntax. The exact meanings of some pattern elements change when you use optional flags, together with the pattern string, to build the regular expression object. The optional flags are covered later in this chapter.
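As an illustration of how flags change the meaning of pattern elements, the following sketch (with an arbitrary two-line sample string) compares '.', '^', and '$' with and without DOTALL and MULTILINE:

```python
import re

text = "first\nsecond"

# Without DOTALL, '.' does not match the newline between the two words.
dot_default = re.search(r"first.second", text)
dot_dotall = re.search(r"first.second", text, re.DOTALL)

# Without MULTILINE, '^' matches only at the start of the whole string.
caret_default = re.findall(r"^\w+", text)
caret_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# Without MULTILINE, '$' matches only at the end of the whole string.
dollar_default = re.findall(r"\w+$", text)
dollar_multiline = re.findall(r"\w+$", text, re.MULTILINE)

# The same flag can be set inside the pattern with the (?m) element.
inline_multiline = re.findall(r"(?m)^\w+", text)

print(dot_default, dot_dotall.group() == text)
print(caret_default, caret_multiline)
print(dollar_default, dollar_multiline, inline_multiline)
```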
Table 9-2. Regular expression pattern syntax

Element        Meaning
.              Matches any character except \n (if DOTALL, also matches \n)
^              Matches start of string (if MULTILINE, also matches after \n)
$              Matches end of string (if MULTILINE, also matches before \n)
*              Matches zero or more cases of the previous regular expression; greedy (match as many as possible)
+              Matches one or more cases of the previous regular expression; greedy (match as many as possible)
?              Matches zero or one case of the previous regular expression; greedy (match one if possible)
*?, +?, ??     Non-greedy versions of *, +, and ? (match as few as possible)
{m,n}          Matches m to n cases of the previous regular expression (greedy)
{m,n}?         Matches m to n cases of the previous regular expression (non-greedy)
[...]          Matches any one of a set of characters contained within the brackets
|              Matches expression either preceding it or following it
(...)          Matches the regular expression within the parentheses and also indicates a group
(?iLmsux)      Alternate way to set optional flags; no effect on match
(?:...)        Like (...), but does not indicate a group
(?P<id>...)    Like (...), but the group also gets the name id
(?P=id)        Matches whatever was previously matched by group named id
(?#...)        Content of parentheses is just a comment; no effect on match
(?=...)        Lookahead assertion; matches if regular expression ... matches what comes next, but does not consume any part of the string
(?!...)        Negative lookahead assertion; matches if regular expression ... does not match what comes next, and does not consume any part of the string
(?<=...)       Lookbehind assertion; matches if there is a match for regular expression ... ending at the current position (... must match a fixed length)
(?<!...)       Negative lookbehind assertion; matches if there is no match for regular expression ... ending at the current position (... must match a fixed length)
\number        Matches whatever was previously matched by group numbered number (groups are automatically numbered from 1 up to 99)
\A             Matches an empty string, but only at the start of the whole string
\b             Matches an empty string, but only at the start or end of a word (a maximal sequence of alphanumeric characters; see also \w)
\B             Matches an empty string, but not at the start or end of a word
\d             Matches one digit, like the set [0-9]
\D             Matches one non-digit, like the set [^0-9]
\s             Matches a whitespace character, like the set [ \t\n\r\f\v]
\S             Matches a non-whitespace character, like the set [^ \t\n\r\f\v]
\w             Matches one alphanumeric character; unless LOCALE or UNICODE is set, \w is like [a-zA-Z0-9_]
\W             Matches one non-alphanumeric character, the reverse of \w
\Z             Matches an empty string, but only at the end of the whole string
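A few of the elements in Table 9-2 are easier to follow with a concrete case. The sketch below is only illustrative: the sample strings are invented, and it covers greedy versus non-greedy repetition, a named group with a backreference, lookahead, lookbehind, and the \b word boundary.

```python
import re

# Greedy * grabs as much as possible; non-greedy *? as little as possible.
html = "<b>one</b><b>two</b>"
greedy = re.findall(r"<b>.*</b>", html)
lazy = re.findall(r"<b>.*?</b>", html)

# (?P<id>...) names a group; (?P=id) matches the same text again.
doubled = re.search(r"\b(?P<word>\w+) (?P=word)\b", "it is is fine")

# (?=...) checks what follows without consuming it.
py_stems = re.findall(r"\w+(?=\.py)", "a.py b.txt c.py")

# (?<=...) checks what precedes the current position.
dollar_amounts = re.findall(r"(?<=\$)\d+", "cost $15 or 20 euros")

# \b anchors the match at the start of a word, so 'inbox' is skipped.
box_words = re.findall(r"\bbox\w*", "box boxes inbox")

print(greedy)
print(lazy)
print(doubled.group("word"), doubled.group())
print(py_stems, dollar_amounts, box_words)
```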