Description
PCRE patterns that contain Unicode property escapes (\p{L}, \p{N}, \p{Lu}, \p{Ll}, and other \p{...} forms) and use the /u modifier do not match correctly. The PCRE-to-POSIX translator appears to ignore or strip these escapes, so preg_match reports no match and preg_replace leaves the subject unchanged.
Reproduction
echo preg_match('/\p{L}+/u', '日本語123'); // Expected: 1
echo preg_replace('/\p{N}+/u', 'X', 'abc123def456'); // Expected: abcXdefX
Expected behavior (PHP 8.4)
preg_match returns 1 (the Japanese characters match \p{L}+)
preg_replace returns abcXdefX
Actual behavior (elephc)
preg_match returns 0
preg_replace returns abc123def456 (no replacement performed)
Environment
Possible root cause
The PCRE compatibility layer (src/codegen/runtime/system/pcre_to_posix.rs and related files such as preg_match.rs and preg_replace.rs) does not appear to implement any translation for \p{...} Unicode property escapes.
When the translator encounters these sequences it likely either drops them or treats the backslash literally, breaking the intended regex semantics.
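To make the suspected gap concrete, here is a minimal sketch of what a translation step for these escapes could look like. It is not taken from the elephc source: the function name translate_unicode_properties, the property table, and the error type are all hypothetical, and mapping to POSIX bracket classes is only an approximation whose behavior on multibyte text depends on the locale and the regex engine in use.

```rust
/// Hypothetical helper (not from the elephc codebase): rewrite `\p{...}`
/// escapes into POSIX bracket classes, and fail loudly on anything unknown
/// instead of silently dropping the escape.
///
/// Limitations of this sketch: it does not handle `\P{...}` (negation),
/// the short form `\pL`, or escapes that appear inside an existing
/// `[...]` character class.
fn translate_unicode_properties(pattern: &str) -> Result<String, String> {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars().peekable();

    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // A backslash: look at the character it escapes.
        let next = match chars.next() {
            Some(n) => n,
            None => {
                // Trailing backslash: pass it through unchanged.
                out.push('\\');
                break;
            }
        };
        if next == 'p' && chars.peek() == Some(&'{') {
            chars.next(); // consume '{'
            let mut name = String::new();
            let mut closed = false;
            for n in chars.by_ref() {
                if n == '}' {
                    closed = true;
                    break;
                }
                name.push(n);
            }
            if !closed {
                return Err(format!("unterminated property escape: {}", name));
            }
            // Assumed, approximate mapping; POSIX classes are locale-dependent.
            let class = match name.as_str() {
                "L" => "[[:alpha:]]",
                "N" => "[[:digit:]]",
                "Lu" => "[[:upper:]]",
                "Ll" => "[[:lower:]]",
                other => return Err(format!("unsupported Unicode property: {}", other)),
            };
            out.push_str(class);
        } else {
            // Any other escape is passed through untouched.
            out.push('\\');
            out.push(next);
        }
    }
    Ok(out)
}
```

Under these assumptions, translate_unicode_properties(r"\p{N}+") yields [[:digit:]]+, and an unknown property such as \p{Greek} returns an error rather than producing a pattern that silently matches nothing. Whether POSIX classes are precise enough for full PHP compatibility is a separate question; a Unicode-aware table lookup may be needed instead.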
Additional context
Found during Round 2 stress testing focused on complex preg_* usage.
This is a PHP compatibility problem for anyone who relies on modern Unicode-aware regular expressions.
Suggested labels: bug, php-compatibility, regex, runtime