@@ -1064,34 +1064,19 @@ Functions
10641064
10651065 Return the string obtained by replacing the leftmost non-overlapping occurrences
10661066 of *pattern* in *string* by the replacement *repl*. If the pattern isn't found,
1067- *string* is returned unchanged. *repl* can be a string or a function; if it is
1068- a string, any backslash escapes in it are processed. That is, ``\n`` is
1069- converted to a single newline character, ``\r`` is converted to a carriage return, and
1070- so forth. Unknown escapes of ASCII letters are reserved for future use and
1071- treated as errors. Other unknown escapes such as ``\&`` are left alone.
1072- Backreferences, such
1073- as ``\6``, are replaced with the substring matched by group 6 in the pattern.
1074- For example::
1075-
1076- >>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
1077- ... r'static PyObject*\npy_\1(void)\n{',
1078- ... 'def myfunc():')
1079- 'static PyObject*\npy_myfunc(void)\n{'
1080-
1081- If *repl* is a function, it is called for every non-overlapping occurrence of
1082- *pattern*. The function takes a single :class:`~re.Match` argument, and returns
1083- the replacement string. For example::
1067+ *string* is returned unchanged.
1068+ The pattern may be a string or a :class:`~re.Pattern`.
1069+ A string pattern's behaviour may be modified by specifying a *flags* value,
1070+ which can be any of the `flags`_ variables, combined using bitwise OR
1071+ (the ``|`` operator).
10841072
1085- >>> def dashrepl(matchobj):
1086- ... if matchobj.group(0) == '-': return ' '
1087- ... else: return '-'
1088- ...
1089- >>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
1090- 'pro--gram files'
1091- >>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)
1092- 'Baked Beans & Spam'
1073+ >>> re.sub(r'(and)', r'*\1*', 'Contraband Andalusian Beans AND Spam',
1074+ ...        flags=re.IGNORECASE)
1075+ 'Contrab*and* *And*alusian Beans *AND* Spam'
10931076
1094- The pattern may be a string or a :class:`~re.Pattern`.
1077+ >>> pattern = re.compile(r'(and)', flags=re.IGNORECASE)
1078+ >>> re.sub(pattern, r'*\1*', 'Contraband Andalusian Beans AND Spam')
1079+ 'Contrab*and* *And*alusian Beans *AND* Spam'
10951080
10961081 The optional argument *count* is the maximum number of pattern occurrences to be
10971082 replaced; *count* must be a non-negative integer. If omitted or zero, all
@@ -1102,21 +1087,51 @@ Functions
11021087 As a result, ``sub('x*', '-', 'abxd')`` returns ``'-a-b--d-'``
11031088 instead of ``'-a-b-d-'``.
11041089
1105- .. index:: single: \g; in regular expressions
1106-
1107- In string-type *repl* arguments, in addition to the character escapes and
1108- backreferences described above,
1109- ``\g<name>`` will use the substring matched by the group named ``name``, as
1110- defined by the ``(?P<name>...)`` syntax. ``\g<number>`` uses the corresponding
1111- group number; ``\g<2>`` is therefore equivalent to ``\2``, but isn't ambiguous
1112- in a replacement such as ``\g<2>0``. ``\20`` would be interpreted as a
1113- reference to group 20, not a reference to group 2 followed by the literal
1114- character ``'0'``. The backreference ``\g<0>`` substitutes in the entire
1115- substring matched by the RE.
1116-
1117- The expression's behaviour can be modified by specifying a *flags* value.
1118- Values can be any of the `flags`_ variables, combined using bitwise OR
1119- (the ``|`` operator).
1090+ *repl* can be a string template or a function:
1091+
1092+ * If it is callable, it is called for every non-overlapping occurrence of
1093+ *pattern*. The function takes a single :class:`~re.Match` argument, and
1094+ returns the replacement string. For example::
1095+
1096+ >>> def dashrepl(matchobj):
1097+ ... if matchobj.group(0) == '-': return ' '
1098+ ... else: return '-'
1099+ ...
1100+ >>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
1101+ 'pro--gram files'
1102+
1103+ * If *repl* is a string, it's processed as a template based on backslash escapes:
1104+
1105+ .. index:: single: \g; in regular expressions
1106+
1107+ - ``\1`` .. ``\99`` are replaced by the substring matched by the corresponding
1108+ ``(...)`` groups in the pattern.
1109+ - Other numeric escapes (``\numbers``) are interpreted as *octal* character literals.
1110+ - ``\g<name>`` is replaced by the substring matched by the named ``(?P<name>...)``
1111+ group.
1112+ - ``\g<number>`` is another way to refer to numbered groups.
1113+ ``\g<2>0`` inserts group 2 followed by the literal character ``'0'``,
1114+ whereas ``\20`` can only express a reference to group 20. ``\g<100>`` etc.
1115+ can refer to groups higher than 99, and the backreference ``\g<0>``
1116+ substitutes in the entire substring matched by the RE.
1117+ - ``\\`` is converted to a single backslash.
1118+ - Basic escapes ``\n\r\t\v\f\a\b`` work like in Python string literals.
1119+ That is, ``\n`` is converted to a single newline character, and so forth.
1120+ - Unknown escapes of ASCII letters are reserved for future use and
1121+ treated as errors. This includes ``\x..``, ``\u...``, ``\U...`` and
1122+ ``\N{...}``, which are not presently supported.
1123+ - Other unknown escapes such as ``\&`` are left alone.
1124+
1125+ For example::
1126+
1127+ >>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
1128+ ... r'static PyObject*\npy_\1(void)\n{',
1129+ ... 'def myfunc():')
1130+ 'static PyObject*\npy_myfunc(void)\n{'
1131+
1132+ (Note the use of raw string notation for *repl* as well. Otherwise you'd have
1133+ to write ``'\\1'`` for Python to parse it into ``\1``, which is then replaced by
1134+ ``myfunc`` at substitution time.)
11201135
11211136 .. versionchanged:: 3.1
11221137 Added the optional flags argument.
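
The template rules listed in the new bullet list can be checked with a short script. This is only an illustrative sketch and not part of the patch: the variable names and sample strings are made up, and the error behaviour shown assumes a Python version in which unknown ASCII-letter escapes in *repl* are errors, as the text above describes.

```python
import re

# \g<name> refers to a named group.
swapped = re.sub(r'(?P<first>\w+) (?P<second>\w+)',
                 r'\g<second> \g<first>', 'hello world')
# → 'world hello'

# \g<1>0 is group 1 followed by a literal '0'.
padded = re.sub(r'(\d)', r'\g<1>0', 'a1b2')
# → 'a10b20'

# \g<0> substitutes the entire match.
bracketed = re.sub(r'\d+', r'[\g<0>]', 'room 101')
# → 'room [101]'

# \\ becomes one backslash; an unknown non-letter escape such as \& is kept.
single_backslash = re.sub('x', r'\\', 'axb')   # → 'a\\b' (one backslash)
kept_escape = re.sub('x', r'\&', 'axb')        # → 'a\\&b' (backslash + '&')

# Three octal digits are read as a character literal, not a group reference.
octal_char = re.sub('x', r'\101', 'axb')       # → 'aAb'


def template_fails(pattern, repl, text):
    """Return True if re.sub() rejects the replacement template."""
    try:
        re.sub(pattern, repl, text)
    except re.error:
        return True
    return False


# \10 means group 10, which this one-group pattern does not have.
bad_group = template_fails(r'(\d)', r'\10', 'a1')   # → True
# \q is an unknown ASCII-letter escape.
bad_letter = template_fails('x', r'\q', 'axb')      # → True
```

The hypothetical `template_fails` helper exists only to make the two error cases easy to observe; real code would normally let the `re.error` propagate.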