
\c\\ and \c\d replacement escapes mis-handled #403

@sylvestre

Bug

The \cX control-character escape in the s/// replacement is
mis-handled in two ways:

  1. \c\\ (the control character for \, i.e. \x1c) is mis-parsed
    as a substitute terminator and triggers an error instead of
    producing the control character.
  2. \c\d (a backslash escape immediately after \c) is silently
    accepted as d instead of being rejected with "recursive escaping
    after \c not allowed".

Simple cases like \cA (→ \x01) work correctly.

Reproduction

$ echo a | /usr/bin/sed 's/./\cA/' | od -c | head -1
0000000 001  \n         # \cA -> \x01 (Ctrl-A)

$ echo a | ./target/release/sed 's/./\cA/' | od -c | head -1
0000000 001  \n         # OK

$ echo a | /usr/bin/sed 's/./\c\\/' | od -c | head -1
0000000 034  \n         # \c\\ -> \x1c (Ctrl-\)

$ echo a | ./target/release/sed 's/./\c\\/'
sed: :0:10: error: ...
                        # parser bails before producing the char
$ echo a | /usr/bin/sed '1s/./\c\d/'
sed: -e expression #1, char 10: recursive escaping after \c not allowed

$ echo a | ./target/release/sed '1s/./\c\d/'
d                       # silently produced 'd', no error

What it should do

GNU semantics for \cX in a replacement string:

  • The character following \c is taken raw (no further escape
    processing). If it is a lowercase letter it is first converted
    to upper case; then bit 6 (0x40) is inverted, producing the
    control char. So \cA → \x01, \c\\ → \x1c, etc.
  • Apart from \c\\ (which yields \x1c), a backslash escape (\X)
    is not allowed after \c. GNU rejects it with the error shown
    above. (See gnu.sed/testsuite/recursive-escape-c.sh for the
    exact error string.)
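The GNU rule (upper-case a lowercase letter, then invert bit 6, i.e. XOR with 0x40) can be sketched in Rust as below. This is a minimal sketch: control_char is a hypothetical name, not a function in the uutils code base, and it works on a single byte. \cA and \c\\ come from this issue; \cz, \c{ and \c; are the examples given in the GNU sed manual.

```rust
/// Control character GNU sed produces for `\cX`: a lowercase letter is
/// upper-cased, then bit 6 (0x40) is inverted.
/// Hypothetical helper name; operates on a single byte.
fn control_char(x: u8) -> u8 {
    x.to_ascii_uppercase() ^ 0x40
}

fn main() {
    // (X, expected byte): \cA and \c\\ from this issue, \ca by the
    // lowercase rule, \cz, \c{ and \c; from the GNU sed manual.
    let cases = [
        (b'A', 0x01u8),
        (b'a', 0x01),
        (b'\\', 0x1c),
        (b'z', 0x1a),
        (b'{', 0x3b),
        (b';', 0x7b),
    ];
    for (x, expected) in cases {
        let got = control_char(x);
        assert_eq!(got, expected);
        println!("\\c{} -> {:#04x}", x as char, got);
    }
}
```

Note that masking with 0x1f before the XOR would not work: for A it gives 0x01 ^ 0x40 = 0x41, i.e. 'A' again.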

Suspected place to add it

The \c handling is part of the replacement parser in
src/sed/compiler.rs:649 (compile_replacement). The fix is to
look at the literal next char (don't recurse into escape parsing) and
emit next.to_ascii_uppercase() ^ 0x40. If the next char is \, it must
be followed by a second \ (i.e. \c\\), which yields \x1c; if the char
after that \ is anything other than \ itself, raise the "recursive
escaping" error.
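One possible shape for the \c branch, as a minimal sketch only. parse_control_escape and RECURSIVE_ESCAPE_C are hypothetical names; the real compile_replacement has its own input type and error reporting, which this does not reproduce. The sketch assumes the s/// delimiters have already been split off, works on the raw bytes following \c, and returns the produced byte plus the number of input bytes consumed. Treating a bare \c at the very end as an error is an assumption, since the issue does not say what should happen there.

```rust
/// Message GNU sed prints for `\c` followed by an escape (quoted in this issue).
const RECURSIVE_ESCAPE_C: &str = "recursive escaping after \\c not allowed";

/// Hypothetical helper, not existing uutils code. `rest` holds the
/// replacement bytes immediately after `\c` (delimiters already handled).
/// Returns the control byte and how many bytes of `rest` were consumed.
fn parse_control_escape(rest: &[u8]) -> Result<(u8, usize), String> {
    match rest {
        // `\c\\`: the escaped backslash is a literal `\`, giving 0x1c.
        [b'\\', b'\\', ..] => Ok((b'\\' ^ 0x40, 2)),
        // `\c\X` for any other X: rejected as in GNU sed.
        // A lone trailing `\` is also rejected here (assumption).
        [b'\\', ..] => Err(RECURSIVE_ESCAPE_C.to_string()),
        // Ordinary case: take the byte raw, upper-case letters, flip bit 6.
        [x, ..] => Ok((x.to_ascii_uppercase() ^ 0x40, 1)),
        // `\c` at the very end: not covered by the issue; erroring is an assumption.
        [] => Err("missing character after \\c".to_string()),
    }
}

fn main() {
    assert_eq!(parse_control_escape(b"A"), Ok((0x01, 1)));
    assert_eq!(parse_control_escape(b"\\\\"), Ok((0x1c, 2)));
    assert_eq!(
        parse_control_escape(b"\\d"),
        Err(RECURSIVE_ESCAPE_C.to_string())
    );
    println!("all \\c cases behave as described");
}
```

Returning the consumed length lets the caller skip both bytes of \\ in one step, so the second backslash is never re-examined as the start of a new escape or as an escaped delimiter, which appears to be what goes wrong in the first bug.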

Affected GNU testsuite tests

recursive-escape-c is affected directly. normalize-text is also
relevant, since it exercises \c in y strings.
