11.. _regex-howto:
22
33****************************
4- Regular Expression HOWTO
4+ Regular expression HOWTO
55****************************
66
77:Author: A.M. Kuchling <amk@amk.ca>
@@ -47,7 +47,7 @@ Python code to do the processing; while Python code will be slower than an
4747elaborate regular expression, it will also probably be more understandable.
4848
4949
50- Simple Patterns
50+ Simple patterns
5151===============
5252
5353We'll start by learning about the simplest possible regular expressions. Since
@@ -59,7 +59,7 @@ expressions (deterministic and non-deterministic finite automata), you can refer
5959to almost any textbook on writing compilers.
6060
6161
62- Matching Characters
62+ Matching characters
6363-------------------
6464
6565Most letters and characters will simply match themselves. For example, the
@@ -159,7 +159,7 @@ match even a newline. ``.`` is often used where you want to match "any
159159character".
160160
161161
162- Repeating Things
162+ Repeating things
163163----------------
164164
165165Being able to match varying sets of characters is the first thing regular
@@ -210,7 +210,7 @@ this RE against the string ``'abcbd'``.
210210|      |           | ``[bcd]*`` is only matching     |
211211|      |           | ``bc``.                         |
212212+------+-----------+---------------------------------+
213- | 6    | ``abcb``  | Try ``b`` again.  This time     |
213+ | 7    | ``abcb``  | Try ``b`` again.  This time     |
214214|      |           | the character at the            |
215215|      |           | current position is ``'b'``, so |
216216|      |           | it succeeds.                    |
@@ -255,7 +255,7 @@ is equivalent to ``+``, and ``{0,1}`` is the same as ``?``. It's better to use
255255to read.
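
As a quick check of the equivalences described above, the following sketch
compares each shorthand qualifier with its brace form. The sample strings and
the ``matches`` helper are made up for illustration and are not part of the
:mod:`re` module.

```python
import re

# Each pair holds a shorthand qualifier and the brace form it stands for.
EQUIVALENT = [("ca*t", "ca{0,}t"), ("ca+t", "ca{1,}t"), ("ca?t", "ca{0,1}t")]
SAMPLES = ["ct", "cat", "caat", "caaat", "dog"]


def matches(pattern, samples):
    """Return the samples that the pattern matches in full (hypothetical helper)."""
    return [s for s in samples if re.fullmatch(pattern, s)]


results = {short: matches(short, SAMPLES) for short, _ in EQUIVALENT}
for short, braces in EQUIVALENT:
    # Both spellings should accept exactly the same sample strings.
    assert results[short] == matches(braces, SAMPLES)
    print(short, braces, results[short])
```

Both spellings behave identically here, so the choice between them is purely
about readability.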
256256
257257
258- Using Regular Expressions
258+ Using regular expressions
259259=========================
260260
261261Now that we've looked at some simple regular expressions, how do we actually use
@@ -264,7 +264,7 @@ expression engine, allowing you to compile REs into objects and then perform
264264matches with them.
265265
266266
267- Compiling Regular Expressions
267+ Compiling regular expressions
268268-----------------------------
269269
270270Regular expressions are compiled into pattern objects, which have
@@ -295,7 +295,7 @@ disadvantage which is the topic of the next section.
295295
296296.. _the-backslash-plague:
297297
298- The Backslash Plague
298+ The backslash plague
299299--------------------
300300
301301As stated earlier, regular expressions use the backslash character (``'\'``) to
@@ -335,7 +335,7 @@ expressions will often be written in Python code using this raw string notation.
335335
336336In addition, special escape sequences that are valid in regular expressions,
337337but not valid as Python string literals, now result in a
338- :exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`,
338+ :exc:`SyntaxWarning` and will eventually become a :exc:`SyntaxError`,
339339which means the sequences will be invalid if raw string notation or escaping
340340the backslashes isn't used.
341341
@@ -351,7 +351,7 @@ the backslashes isn't used.
351351+-------------------+------------------+
352352
353353
354- Performing Matches
354+ Performing matches
355355------------------
356356
357357Once you have an object representing a compiled regular expression, what do you
@@ -369,10 +369,10 @@ for a complete listing.
369369|                  | location where this RE matches.               |
370370+------------------+-----------------------------------------------+
371371| ``findall()``    | Find all substrings where the RE matches, and |
372- |                  | returns them as a list.                       |
372+ |                  | return them as a list.                        |
373373+------------------+-----------------------------------------------+
374374| ``finditer()``   | Find all substrings where the RE matches, and |
375- |                  | returns them as an :term:`iterator`.          |
375+ |                  | return them as an :term:`iterator`.           |
376376+------------------+-----------------------------------------------+
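
The methods in the table can be tried side by side. This is only a sketch; the
pattern and the sample text are chosen for illustration.

```python
import re

p = re.compile(r"\d+")
text = "12 drummers drumming, 11 pipers piping, 10 lords a-leaping"

# match() only succeeds if the RE matches at the start of the string.
at_start = p.match(text)
not_at_start = p.match("drummers 12")

# search() scans forward until it finds a location where the RE matches.
found = p.search("drummers 12")

# findall() builds the whole list; finditer() yields match objects lazily.
all_numbers = p.findall(text)
spans = [m.span() for m in p.finditer(text)]

print(at_start.group(), not_at_start, found.span())
print(all_numbers)
print(spans)
```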
377377
378378:meth:`~re.Pattern.match` and :meth:`~re.Pattern.search` return ``None`` if no match can be found. If
@@ -473,7 +473,7 @@ Two pattern methods return all of the matches for a pattern.
473473The ``r`` prefix, making the literal a raw string literal, is needed in this
474474example because escape sequences in a normal "cooked" string literal that are
475475not recognized by Python, as opposed to regular expressions, now result in a
476- :exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`. See
476+ :exc:`SyntaxWarning` and will eventually become a :exc:`SyntaxError`. See
477477:ref:`the-backslash-plague`.
478478
479479:meth:`~re.Pattern.findall` has to create the entire list before it can be returned as the
@@ -491,7 +491,7 @@ result. The :meth:`~re.Pattern.finditer` method returns a sequence of
491491 (29, 31)
492492
493493
494- Module-Level Functions
494+ Module-level functions
495495----------------------
496496
497497You don't have to create a pattern object and call its methods; the
@@ -518,7 +518,7 @@ Outside of loops, there's not much difference thanks to the internal
518518cache.
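
As a rough illustration of the two styles, the sketch below runs the same
pattern through a compiled pattern object and through the module-level
function. The sample string is an assumption made for the example.

```python
import re

line = "From amk Thu May 14 19:12:10 1998"

# Style 1: compile once, then call methods on the pattern object.
pattern = re.compile(r"From\s+")
compiled_result = pattern.match(line)

# Style 2: pass the RE string as the first argument of the module-level function.
module_result = re.match(r"From\s+", line)

# Both return a match object covering the same text.
print(compiled_result.span(), module_result.span())
```

Inside a loop that runs many times, the first style saves the repeated function
call and cache lookup.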
519519
520520
521- Compilation Flags
521+ Compilation flags
522522-----------------
523523
524524.. currentmodule:: re
@@ -642,7 +642,7 @@ of each one.
642642 whitespace is in a character class or preceded by an unescaped backslash; this
643643 lets you organize and indent the RE more clearly. This flag also lets you put
644644 comments within a RE that will be ignored by the engine; comments are marked by
645- a ``'#'`` that's neither in a character class or preceded by an unescaped
645+ a ``'#'`` that's neither in a character class nor preceded by an unescaped
646646 backslash.
647647
648648 For example, here's a RE that uses :const:`re.VERBOSE`; see how much easier it
@@ -669,7 +669,7 @@ of each one.
669669 to understand than the version using :const:`re.VERBOSE`.
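
The rule about ``'#'`` and whitespace can be seen in a small sketch. The
pattern below is a made-up example, not one taken from this document: a ``'#'``
or a space counts as part of the RE only when it sits in a character class or
follows a backslash.

```python
import re

tag = re.compile(r"""
    [#]      # a '#' inside a character class is matched literally
    \#?      # so is a '#' preceded by a backslash (optional here)
    [ ]*     # a space must sit in a class (or be escaped) to count
    (\w+)    # capture the word that follows
""", re.VERBOSE)

one_hash = tag.match("# title")
two_hashes = tag.match("## release notes")
no_hash = tag.match("no hash here")

print(one_hash.group(1), two_hashes.group(1), no_hash)
```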
670670
671671
672- More Pattern Power
672+ More pattern power
673673==================
674674
675675So far we've only covered a part of the features of regular expressions. In
@@ -679,7 +679,7 @@ retrieve portions of the text that was matched.
679679
680680.. _more-metacharacters:
681681
682- More Metacharacters
682+ More metacharacters
683683-------------------
684684
685685There are some metacharacters that we haven't covered yet. Most of them will be
@@ -872,7 +872,7 @@ Backreferences like this aren't often useful for just searching through a string
872872find out that they're *very* useful when performing string substitutions.
873873
874874
875- Non-capturing and Named Groups
875+ Non-capturing and named groups
876876------------------------------
877877
878878Elaborate REs may use many groups, both to capture substrings of interest, and
@@ -976,7 +976,7 @@ current point. The regular expression for finding doubled words,
976976 'the the'
977977
978978
979- Lookahead Assertions
979+ Lookahead assertions
980980--------------------
981981
982982Another zero-width assertion is the lookahead assertion. Lookahead assertions
@@ -1058,7 +1058,7 @@ end in either ``bat`` or ``exe``:
10581058``.*[.](?!bat$|exe$)[^.]*$``
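
To see what this pattern accepts, here is a short sketch that runs it over a
few filenames. The filenames are invented for the example.

```python
import re

# Match filenames whose extension is neither "bat" nor "exe".
not_bat_or_exe = re.compile(r".*[.](?!bat$|exe$)[^.]*$")

names = ["autoexec.bat", "setup.exe", "sendmail.cf", "foo.bar", "archive.tar.exe"]
accepted = [name for name in names if not_bat_or_exe.match(name)]

print(accepted)
```

The negative lookahead consumes nothing, so when it succeeds the extension is
still matched afterwards by ``[^.]*$``.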
10591059
10601060
1061- Modifying Strings
1061+ Modifying strings
10621062=================
10631063
10641064Up to this point, we've simply performed searches against a static string.
@@ -1080,7 +1080,7 @@ using the following pattern methods:
10801080+------------------+-----------------------------------------------+
10811081
10821082
1083- Splitting Strings
1083+ Splitting strings
10841084-----------------
10851085
10861086The :meth:`~re.Pattern.split` method of a pattern splits a string apart
@@ -1134,7 +1134,7 @@ argument, but is otherwise the same. ::
11341134 ['Words', 'words, words.']
11351135
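A minimal sketch of the module-level form, with and without a split limit. The
limit is passed by keyword here, which is an editorial choice for clarity
rather than something the text above requires.

```python
import re

text = "Words, words, words."

# With no limit, every run of non-word characters is a split point; the
# trailing '.' leaves an empty string at the end of the list.
unlimited = re.split(r"\W+", text)

# With maxsplit=1, only the first split is made and the rest is kept whole.
limited = re.split(r"\W+", text, maxsplit=1)

print(unlimited)
print(limited)
```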
11361136
1137- Search and Replace
1137+ Search and replace
11381138------------------
11391139
11401140Another common task is to find all the matches for a pattern, and replace them
@@ -1233,15 +1233,15 @@ pattern object as the first parameter, or use embedded modifiers in the
12331233pattern string, e.g. ``sub("(?i)b+", "x", "bbbb BBBB")`` returns ``'x x'``.
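
The options mentioned above can be compared directly. This sketch uses the
example from the text, plus the ``flags`` keyword argument of
:func:`re.sub` as a third spelling.

```python
import re

text = "bbbb BBBB"

# Embedded modifier in the pattern string.
embedded = re.sub("(?i)b+", "x", text)

# A compiled pattern object carrying the flag.
compiled = re.compile("b+", re.IGNORECASE).sub("x", text)

# The flags keyword argument of the module-level function.
keyword = re.sub("b+", "x", text, flags=re.IGNORECASE)

# Without any flag, only the lowercase run is replaced.
case_sensitive = re.sub("b+", "x", text)

print(embedded, compiled, keyword, case_sensitive, sep=" | ")
```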
12341234
12351235
1236- Common Problems
1236+ Common problems
12371237===============
12381238
12391239Regular expressions are a powerful tool for some applications, but in some ways
12401240their behaviour isn't intuitive and at times they don't behave the way you may
12411241expect them to. This section will point out some of the most common pitfalls.
12421242
12431243
1244- Use String Methods
1244+ Use string methods
12451245------------------
12461246
12471247Sometimes using the :mod:`re` module is a mistake. If you're matching a fixed
@@ -1307,7 +1307,7 @@ string and then backtracking to find a match for the rest of the RE. Use
13071307:func:`re.search` instead.
13081308
13091309
1310- Greedy versus Non-Greedy
1310+ Greedy versus non-greedy
13111311------------------------
13121312
13131313When repeating a regular expression, as in ``a*``, the resulting action is to
@@ -1385,9 +1385,9 @@ Feedback
13851385========
13861386
13871387Regular expressions are a complicated topic. Did this document help you
1388- understand them? Were there parts that were unclear, or Problems you
1388+ understand them? Were there parts that were unclear, or problems you
13891389encountered that weren't covered here? If so, please send suggestions for
1390- improvements to the author .
1390+ improvements to the :ref:`issue tracker <using-the-tracker>`.
13911391
13921392The most complete book on regular expressions is almost certainly Jeffrey
13931393Friedl's Mastering Regular Expressions, published by O'Reilly. Unfortunately,