surrogate pair and lone surrogate support in stringLiteral #1701

@hardfist

Steps to reproduce

For the following code, tsgo and TypeScript generate different token text:

"🦀\ud7ff\ud800\ud801\uD83E\uDD80"

It seems tsgo uses a Go string to store the literal's text, which originally comes from a JS string:

func (f *NodeFactory) NewStringLiteral(text string) *Node {

However, a JS string is not required to be well-formed UTF-16 and may contain lone surrogates, whereas Go's UTF-8 encoding turns a lone surrogate into U+FFFD. The conversion is lossy, so the original code units cannot be recovered.
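The loss can be reproduced outside the compiler. The sketch below is only an illustration of the suspected mechanism, not tsgo's actual scanner code: `escapesToString` and `decodePairs` are hypothetical helpers, and the assumption is that each `\uXXXX` escape ends up encoded into a Go string as UTF-8.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
)

// escapesToString is a hypothetical helper (not tsgo code). It appends each
// \uXXXX code unit to a Go string on its own, the way a scanner might.
func escapesToString(units []uint16) string {
	var sb strings.Builder
	for _, u := range units {
		// WriteRune encodes as UTF-8; a surrogate half has no UTF-8 form,
		// so it is written as U+FFFD.
		sb.WriteRune(rune(u))
	}
	return sb.String()
}

// decodePairs is also hypothetical. It joins valid surrogate pairs first,
// but unpaired surrogates still become U+FFFD.
func decodePairs(units []uint16) string {
	return string(utf16.Decode(units))
}

func main() {
	// The escapes from the literal above, after the leading crab emoji.
	units := []uint16{0xD7FF, 0xD800, 0xD801, 0xD83E, 0xDD80}
	fmt.Printf("%+q\n", "🦀"+escapesToString(units))
	fmt.Printf("%+q\n", "🦀"+decodePairs(units))
}
```

Converting unit by unit replaces all four surrogate halves, including the valid `\uD83E\uDD80` pair. Decoding pairs first keeps that pair but still loses `\ud800` and `\ud801`. Preserving the text exactly would probably require storing the UTF-16 code units themselves, or an encoding that tolerates lone surrogates.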

Behavior with typescript@5.8

🦀\ud7ff\ud800\ud801\uD83E\uDD80

https://ts-ast-viewer.com/#code/ESPg3AG7A6CuAmDsAzB0YA4AM6UYIzQCKoDMAogYesEA

Behavior with tsgo

🦀퟿����

https://rslint.rs/playground/?tab=ast&code=%22%F0%9F%A6%80%5Cud7ff%5Cud800%5Cud801%5CuD83E%5CuDD80%22

Labels: help wanted