Treat `include_str!` like it produces a raw string. #143077
nnethercote wants to merge 1 commit into rust-lang:master
There are two reasons to do this:
- It just makes sense. The contents of an included file are just like the contents of a raw string literal, because string escapes like `\"` and `\x61` don't get special treatment.
- We can avoid escaping it when putting it into `token::Lit::StrRaw`, unlike `token::Lit::Str`. On a tiny test program that included an 80 MiB file, this reduced compile time from 2.2s to 1.0s.

The change is detectable from proc macros that use `to_string` on tokens, as the change to the `expand-expr.rs` test indicates. But this kind of change is allowable, and it seems very unlikely to cause problems in practice.
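The difference between the two literal kinds can be illustrated with a small sketch. It is not the compiler's code path: the helper name `escaped_for_plain_literal` is hypothetical, and the use of the standard library's `str::escape_default` as the escaping step is an assumption made for illustration.

```rust
/// Hypothetical helper: the text that would have to be written between plain
/// `"` delimiters so that the literal evaluates back to `contents`.
/// Every byte of the input is inspected, and the output can be longer.
fn escaped_for_plain_literal(contents: &str) -> String {
    contents.escape_default().to_string()
}

/// In a plain literal the escape `\x61` is processed and yields `a`.
fn plain_literal() -> &'static str {
    "\x61"
}

/// In a raw literal the same four characters are kept verbatim,
/// which is how the contents of an included file behave.
fn raw_literal() -> &'static str {
    r"\x61"
}
```

Here `plain_literal()` is the one-character string `a`, while `raw_literal()` is the four-character string `\x61`. Calling `escaped_for_plain_literal` on the four characters `a`, `\`, `b`, newline produces six characters, which hints at why skipping the escaping step matters for an 80 MiB input.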
If we introduce
Actually, that doesn't work. Because

    pub fn expr_str_raw(&self, span: Span, s: Symbol) -> P<ast::Expr> {
        let lit = token::Lit::new(token::StrRaw(0), s, None);
        self.expr(span, ast::ExprKind::Lit(lit))
    }

I can only assume `0` means "with 0 `#`s". But that would produce an invalid token, right? I.e. if the file being included contains `"`, this would break. Moreover, for any number of `#`s you can construct a file that would break it (`"####...`).
`fn desugar_doc_comments` has some logic that counts how many hashes need to be added to keep the literal in `#[doc = r"my arbitrary string from a sugared doc comment"]` well-formed.
Ah, forgot to mention, for doc comments the hash counter can overflow too.
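The hash-counting concern raised in these comments can be sketched as follows. This is a minimal illustration, not the code in `desugar_doc_comments` or in this PR: the names `hashes_needed` and `to_raw_literal` are hypothetical, and the `u8` return type assumes the hash count is stored in a `u8`, which is where an overflow would come from.

```rust
/// Hypothetical helper: the smallest number of `#`s that lets `contents` sit
/// inside a raw string literal, or `None` if that number does not fit in a
/// `u8` (assumed here to be the width of the stored hash count).
fn hashes_needed(contents: &str) -> Option<u8> {
    let mut needed: usize = 0;
    // Length of the current `"` + `#`* run, counting the quote itself.
    let mut run: usize = 0;
    for ch in contents.chars() {
        run = match ch {
            '"' => 1,
            '#' if run > 0 => run + 1,
            _ => 0,
        };
        needed = needed.max(run);
    }
    if needed <= u8::MAX as usize {
        Some(needed as u8)
    } else {
        None
    }
}

/// Hypothetical helper: wraps `contents` in raw-string delimiters with just
/// enough `#`s, or returns `None` when no representable count is enough.
fn to_raw_literal(contents: &str) -> Option<String> {
    let hashes = "#".repeat(usize::from(hashes_needed(contents)?));
    Some(format!("r{0}\"{1}\"{0}", hashes, contents))
}
```

With this sketch, contents without any `"` need zero hashes, `say "hi"` needs one, and a `"` followed by 255 `#`s has no valid count under the `u8` assumption, so `to_raw_literal` returns `None` for it.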
Technically, token kind changes are user-observable.

    macro_rules! expect_nonraw {
        ("a") => {}
    }
    macro_rules! expect_raw {
        (r"a") => {}
    }
    expect_nonraw!("a");
    expect_nonraw!(r"a"); // ERROR no rules expected `r"a"`
    expect_raw!(r"a");
    expect_raw!("a"); // ERROR no rules expected `"a"`
    fn main() {}

Apparently my memory is failing me.
Neither normal nor raw strings are a perfect fit for representing included strings and doc comments. I'd rather introduce a new literal kind, "undelimited raw string", for all this stuff, if not for the concerns about token matching and compatibility (#143077 (comment)). If we discern the undelimited literals from regular raw strings (with
We can perhaps crater a change like this.
This is clearly more complicated than I realised. The alternative suggestions might be worthwhile, but they don't have to happen in this PR.
r? @petrochenkov