Allow parsing source code directly #54

Merged: ilevkivskyi merged 8 commits into mypyc:main from bzoracler:parse-source on May 17, 2026

@bzoracler
Contributor

@bzoracler bzoracler commented Apr 17, 2026

Resolves #21

Tests are part of python/mypy#21260

@ilevkivskyi
Collaborator

Thanks for the PR! Jukka or I will try to take a look next week. In the meantime, could you please resolve the merge conflict?

Comment thread src/serialize_ast.rs Outdated
Comment on lines +280 to +287
match pyo3::exceptions::PyUnicodeDecodeError::new_utf8(
    obj.py(),
    e.as_bytes(),
    utf8_err,
) {
    Ok(err) => Self::Error::from_value(err.into_any()),
    Err(err) => err,
}
Contributor Author

@bzoracler bzoracler Apr 28, 2026

The implementation is taken from https://pyo3.rs/main/doc/src/pyo3/exceptions.rs#802-811 (PR https://github.com/PyO3/pyo3/pull/5668/changes), because new_err_from_utf8 is only available in pyo3 0.29, which has not been released yet.

The test is here: python/mypy@ac275e4
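
Editor's sketch (not the PR's code and not pyo3's implementation): the snippet above turns a Rust UTF-8 decoding failure into a Python UnicodeDecodeError. Using only the standard library, the following shows the position information that `std::str::Utf8Error` carries and that such a conversion can draw on. The helper names `decode_error_range` and `bad_range` are hypothetical, and the choice of end offset is an assumption made for illustration.

```rust
use std::str::Utf8Error;

/// Hypothetical helper: the (start, end) byte range of the offending
/// input, in the style of the `start`/`end` attributes of a Python
/// `UnicodeDecodeError`.
fn decode_error_range(bytes: &[u8], err: &Utf8Error) -> (usize, usize) {
    // Everything before `valid_up_to()` is valid UTF-8.
    let start = err.valid_up_to();
    // `error_len()` is `None` when the input ends in the middle of a
    // multi-byte sequence; in that case the bad range runs to the end.
    let end = match err.error_len() {
        Some(len) => start + len,
        None => bytes.len(),
    };
    (start, end)
}

/// Hypothetical helper: `None` for valid UTF-8, otherwise the bad range.
fn bad_range(bytes: &[u8]) -> Option<(usize, usize)> {
    match std::str::from_utf8(bytes) {
        Ok(_) => None,
        Err(err) => Some(decode_error_range(bytes, &err)),
    }
}
```

Calling `bad_range` on source bytes containing a stray `0xff` byte reports the offset of that byte, which is the kind of detail a caller would want surfaced in the Python exception.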

Comment thread src/serialize_ast.rs Outdated
)> {
    serialize_module(
        source,
        PySourceType::Python,
Contributor Author

I've hard-coded the Python source type here because mypy's parsing functions have never exposed an option to treat directly passed source code as a .pyi stub.

Collaborator

I think it is better to use the same logic as for serialize_module().

@bzoracler
Contributor Author

Hmm...there might be some unnecessary allocations that are preventable. I'm going to do some benchmarking with another implementation.

@bzoracler bzoracler marked this pull request as draft April 28, 2026 08:54
@bzoracler bzoracler marked this pull request as ready for review April 29, 2026 06:46
@bzoracler
Contributor Author

bzoracler commented Apr 29, 2026

If we're willing to use &str instead of String in downstream functions (which IMO is fine), 30bd655 removes unnecessary allocations when a Python builtins.bytes object is passed as source code. Microbenchmarks on a test machine (Intel Core i7-10750H CPU @ 2.60GHz, x86_64, 12 cores) with 100 MB of source code show a ~45% reduction in allocation during the type conversion from PyBytes to &str, and the conversion itself runs about 6x faster. Execution time is dominated by parsing, though, so this speedup doesn't matter as much.

The performance of builtins.str passed as source code does not seem to be improvable unless the stable ABI is raised to ["pyo3/abi3-py310"], after which we can use this extraction function. I ran the same microbenchmarks after raising the stable ABI and got a similar allocation reduction and a huge speedup in type conversion (thousands of times faster, but again, execution time is dominated by parsing, so it doesn't really matter here).
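
Editor's sketch (standard library only, not the PR's code): the allocation saving described above comes from validating a byte buffer as UTF-8 and borrowing it as `&str`, rather than copying it into a `String`. The function names below are hypothetical, and `count_lines` merely stands in for a downstream function that only needs to read the source.

```rust
use std::str::Utf8Error;

/// Borrowing path: validates UTF-8 and returns a view into `bytes`.
/// No new buffer is allocated.
fn as_source_borrowed(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Owning path: validates, then copies the data into a fresh `String`.
fn as_source_owned(bytes: &[u8]) -> Result<String, Utf8Error> {
    std::str::from_utf8(bytes).map(str::to_owned)
}

/// Hypothetical downstream consumer. Taking `&str` lets callers pass
/// either a borrowed view or `&String` without forcing a copy.
fn count_lines(source: &str) -> usize {
    source.lines().count()
}
```

The borrowed result points at the same memory as the input buffer, while the owned result lives in a separate heap allocation. For a 100 MB input, that copy is the kind of cost the comment above describes avoiding.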

Collaborator

@hauntsaninja hauntsaninja left a comment

This looks good to me, but will wait for Ivan or Jukka.

Mypy has dropped support for Python 3.9, so I would also take a PR dropping 3.9 support and increasing the stable ABI.

Collaborator

@ilevkivskyi ilevkivskyi left a comment

Generally looks good, but I have a couple of comments about respecting the file name when parsing source. It is better to be consistent in case mypy (or another user) decides to open and read the file manually for some reason and pass the source. They should get the same result as when passing the file name.
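
Editor's sketch of the consistency requirement above, assuming (this is not confirmed by the thread) that the source type is chosen from the file name's extension, with `.pyi` meaning a stub. `SourceKind` and `source_kind_for` are hypothetical names, not the crate's API.

```rust
use std::path::Path;

/// Hypothetical stand-in for the parser's source-type enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SourceKind {
    Python,
    Stub,
}

/// Picks the source kind from the file name alone, so the answer is the
/// same whether the parser reads the file itself or the caller passes the
/// already-read contents together with the file name.
fn source_kind_for(file_name: &str) -> SourceKind {
    match Path::new(file_name).extension().and_then(|ext| ext.to_str()) {
        Some("pyi") => SourceKind::Stub,
        // Assumption: anything else, including names without an
        // extension, is treated as regular Python source.
        _ => SourceKind::Python,
    }
}
```

With a single decision point like this, a path-based entry point and a source-based entry point can both call the same function, which is one way to guarantee they cannot disagree.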

Comment thread src/serialize_ast.rs Outdated
PySourceType::Python,
skip_function_bodies,
options,
false,
Collaborator

This should not be false; we should always respect the file name provided.

@bzoracler
Contributor Author

@ilevkivskyi Whoops, I thought providing source contents was only used via mypy --command TEXT, which is why I split the handling between path-based input and source contents. I've re-unified this now (and the diff is now much smaller).

@bzoracler bzoracler requested a review from ilevkivskyi May 17, 2026 06:27
Collaborator

@ilevkivskyi ilevkivskyi left a comment

OK, this looks like a good starting point. I think we will need some more things to make the native parser a full replacement, but it is probably better to move in small increments. I will merge this now and release a new version, then we can continue the discussion in the mypy PR.

@ilevkivskyi ilevkivskyi merged commit 87b7843 into mypyc:main May 17, 2026
20 checks passed
@bzoracler bzoracler deleted the parse-source branch May 19, 2026 05:56
ilevkivskyi added a commit to python/mypy that referenced this pull request May 21, 2026
This is the mypy counterpart of
mypyc/ast_serialize#54

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ivan Levkivskyi <levkivskyi@gmail.com>

Linked issue: Support parsing strings in addition to files