Bug
The \cX control-character escape in the s/// replacement is
mis-handled in two ways:
- \c\\ (the control character for \, which is \x1c) is mis-parsed
  as a substitute terminator and triggers an error instead of
  producing the control character.
- \c\d (a backslash escape immediately after \c) is silently
  accepted as d instead of being rejected with "recursive escaping
  after \c not allowed".
Simple cases like \cA (→ \x01) work correctly.
Reproduction
$ echo a | /usr/bin/sed 's/./\cA/' | od -c | head -1
0000000 001 \n # \cA -> \x01 (Ctrl-A)
$ echo a | ./target/release/sed 's/./\cA/' | od -c | head -1
0000000 001 \n # OK
$ echo a | /usr/bin/sed 's/./\c\\/' | od -c | head -1
0000000 034 \n # \c\ -> \x1c (Ctrl-\)
$ echo a | ./target/release/sed 's/./\c\\/'
sed: :0:10: error: ...
# parser bails before producing the char
$ echo a | /usr/bin/sed '1s/./\c\d/'
sed: -e expression #1, char 10: recursive escaping after \c not allowed
$ echo a | ./target/release/sed '1s/./\c\d/'
d # silently produced 'd', no error
What it should do
GNU semantics for \cX in a replacement string:
- The character following \c is taken raw (no further escape
  processing). A lower-case letter is converted to upper case, then
  bit 6 (0x40) is inverted, producing the control char. So
  \cA → \x01, \ca → \x01, \c\\ → \x1c, etc.
- A backslash escape (\X) is not allowed after \c; the only
  exception is \\, which stands for a literal \. GNU rejects
  anything else with the wording above. (See
  gnu.sed/testsuite/recursive-escape-c.sh for the exact error
  string.)
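As a minimal sketch of the mapping, assuming the rule described in the GNU sed manual for \cX (upper-case a lower-case letter, then invert bit 0x40) and assuming ASCII-only case folding: the function name control_char is hypothetical and is not taken from the project.

```rust
/// Map the raw byte that follows `\c` to its control character.
///
/// Assumption: ASCII case folding is sufficient here; GNU sed uses the
/// C library's toupper(), which may behave differently in some locales.
fn control_char(next: u8) -> u8 {
    // Lower-case ASCII letters are upper-cased first, so `\ca` and `\cA`
    // give the same result; then bit 6 (0x40) is flipped.
    next.to_ascii_uppercase() ^ 0x40
}
```

With this definition control_char(b'A') and control_char(b'a') both give 0x01, and control_char(b'\\') gives 0x1c, matching the GNU output in the reproduction.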
Suspected place to add it
The \c handling is part of the replacement parser in
src/sed/compiler.rs:649 (compile_replacement). The fix is to
look at the literal next char (don't recurse into escape parsing),
convert it to upper case if it is a lower-case letter, and emit
next ^ 0x40. If the next char is \, it must be followed by a second
\; the pair is accepted as the literal \ (giving \x1c). If the char
after \ is anything other than \ itself, raise the "recursive
escaping" error.
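The parsing step could look roughly like the following. This is a sketch only: it assumes the replacement text is available as a byte slice positioned just after the c of \c, and the names parse_control_escape and ControlEscapeError are hypothetical rather than the actual types in compile_replacement. How a bare \c at the end of input should be reported is also an assumption.

```rust
#[derive(Debug, PartialEq)]
enum ControlEscapeError {
    /// `\c` was the last thing in the input (handling is an assumption).
    MissingOperand,
    /// `\c` was followed by a backslash escape other than `\\`.
    RecursiveEscape,
}

/// Parse the operand of `\c`. `rest` starts right after the `c`.
/// Returns the control byte and how many input bytes were consumed.
fn parse_control_escape(rest: &[u8]) -> Result<(u8, usize), ControlEscapeError> {
    match rest {
        [] => Err(ControlEscapeError::MissingOperand),
        // `\c\\`: consume both backslashes so the second one is never
        // seen by the caller as the start of a new escape.
        [b'\\', b'\\', ..] => Ok((b'\\' ^ 0x40, 2)),
        // `\c\d` and friends: any other backslash escape is rejected.
        [b'\\', ..] => Err(ControlEscapeError::RecursiveEscape),
        // Ordinary case: upper-case ASCII letters, then flip bit 0x40.
        [next, ..] => Ok((next.to_ascii_uppercase() ^ 0x40, 1)),
    }
}
```

Returning the consumed length lets the caller advance past both backslashes of \c\\ in one step, which is what keeps the second backslash from being paired with the closing delimiter.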
Affected GNU testsuite tests
recursive-escape-c directly. Also relevant for normalize-text
which exercises \c in y strings.