perf: skip URL round-trip when path is already safe by ichard26 · Pull Request #14045 · pypa/pip

ichard26 · 2026-06-10T00:07:43Z

_ensure_quoted_url() calls quote(unquote()) wrapped by an urlunsplit(urlsplit()) round-trip for remote URLs. This is quite slow.

The good news is that we can skip all of this safely for http(s):// URLs by quickly scanning the URL for any characters that need to be quoted and %-escapes. If there aren't any, then return the URL unmodified immediately.

In an (admittedly) unscientific test, this reduces the total wall time spent in collector.parse_links() by ~half, from 120ms to 50ms for

pip install black mypy --dry-run

as measured under Python's sampling profiler. The remaining time is spent on JSON parsing and Link instance initialization.

Flamegraphs

A new test_ensure_quoted_url_idempotent_for_clean_urls() asserts that the fast path is a true identity on the URL shapes Warehouse and common simple API responses serve.

Credit goes to Bernát Gábor who came up with this optimization.

This supersedes PR #13986. (closes #13986)

_ensure_quoted_url() calls quote(unquote()) wrapped by an urlunsplit(urlsplit()) round-trip for remote URLs. This is quite slow. The good news is that we can skip all of this safely for http(s):// URLs by quickly scanning the URL for any characters that need to be quoted and %-escapes. If there aren't any, then return the URL unmodified immediately. In an (admittedly) unscientific test, this reduces the total wall time spent in collector.parse_links() by ~half, from 120ms to 50ms for pip install black setuptools mypy --dry-run as measured under Python's sampling profiler. The remaining time is spent on JSON parsing and Link instance initialization. A new test_ensure_quoted_url_idempotent_for_clean_urls() asserts that the fast path is a true identity on the URL shapes Warehouse and common simple API responses serve. Credit goes to Bernát Gábor who came up with this optimization. Co-authored-by: Bernat Gabor <gaborjbernat@gmail.com>

sepehr-rs

Hi @ichard26, thank you for working on this!
I benchmarked this against main using real URLs captured from a live pip install black mypy --dry-run run (so the fast path coverage reflects actual PyPI traffic, not synthetic data).

Microbenchmark:
main:

------------------------------------------------- benchmark: 1 tests -------------------------------------------------
Name (time in ms)            Min       Max      Mean   StdDev    Median      IQR  Outliers     OPS  Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------
test_benchmark_urls     459.0823  486.0665  472.4067  11.9938  468.1179  21.1751       2;0  2.1168       5           1
----------------------------------------------------------------------------------------------------------------------

PR:

----------------------------------------------- benchmark: 1 tests ----------------------------------------------
Name (time in ms)           Min      Max     Mean  StdDev   Median     IQR  Outliers      OPS  Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------
test_benchmark_urls     41.4362  43.5073  42.3067  0.5526  42.4556  0.5768       7;1  23.6369      23           1
-----------------------------------------------------------------------------------------------------------------

~46x faster compared to main.

End-to-end: pip install black mypy --no-deps --dry-run with a local index serving real PyPI simple pages, 30 runs:

main:

  Time (mean ± σ):     610.0 ms ±   4.8 ms    [User: 547.1 ms, System: 63.9 ms]
  Range (min … max):   603.2 ms … 624.4 ms    30 runs

PR:

  Time (mean ± σ):     574.5 ms ±   8.4 ms    [User: 510.9 ms, System: 64.2 ms]
  Range (min … max):   565.1 ms … 599.5 ms    30 runs

−35.5 ms (6%). Ranges don't overlap so the improvement is statistically solid.

LGTM from me.

ichard26 · 2026-06-10T17:44:28Z

@sepehr-rs as a general note, it's good practice to include your benchmark commands and/or scripts when citing benchmark results. I don't know what test_benchmark_urls is measuring.

~46x faster compared to main.

This seems obviously wrong? It looks closer to 10x faster.

Also, where is 460ms coming from? Your end-to-end run on main takes only 610ms and I can promise you that URL parsing did not take 2/3 of the total runtime.

sepehr-rs · 2026-06-10T18:27:57Z

@ichard26, You're right, my apologies for the sloppy math. It's closer to 10x. I should have included the benchmark script from the start. Here's a link to it:
https://gist.github.com/sepehr-rs/17505445124a556dd8ec583e32014e2c

I used ~5,800 URLs captured from a live dry run to amplify the signal, which explains the 460ms baseline, that's not representative of a single pip install invocation.

notatallshaw · 2026-06-16T03:57:26Z

URLs are tricky and sensitive, I asked an AI to fuzz this and this is what it came back with (the code details could be tweaked but the tests look good):

The fast path can fire on empty-authority URLs. startswith("http://") doesn't guarantee a host. http:////host passes the char check and is returned unchanged, while the slow path normalizes it differently per Python version: http://host (host confusion) on 3.10/3.12, http:///host on 3.13, URLError on 3.14/3.15. Only malformed empty-authority URLs are affected; well-formed remote URLs are fine.

Fix: require a non-empty authority and an explicit path (//host/path) before the char check.

if url.startswith(("https://", "http://")):
    after_scheme = url.removeprefix("https:").removeprefix("http:")  # "//..."
    if after_scheme[2:].find("/") > 0 and _UNSAFE_URL_PATH_CHARS_RE.search(after_scheme) is None:
        return url

Diff-tested this against the slow path across 3.9-3.15 against ~29k fuzzed URLs per version, this has zero divergences and fast-paths 100% of PyPI file URLs.

Tests to add. Should take the fast path (returned unchanged):

https://files.pythonhosted.org/packages/12/34/pkg-1.2.3-py3-none-any.whl
https://ex.com/~user/dists/Foo-1.0.tar.gz
http://localhost/simple/foo/
https://a.b.c/x/y/z/deep_path-1.0~rc1/file.whl

Must not take the fast path (current #/:/? cases plus the empty-authority gap):

http:////evil.com          empty authority
https:////                 empty authority
http:///x                  empty authority
https://h                  no path
http://localhost:8181/simple/foo/   port
https://u@h/p              userinfo
https://h/a b              space
https://h/p%2Fq            percent-escape
https://h/p?q=1            query
https://h/p#sha256=x       fragment
https://héllo/p            non-ascii
https://h/\tx              control char
https://h\p                backslash

ichard26 added the type: performance Commands take too long to run label Jun 10, 2026

psf-chronographer Bot added the bot:chronographer:provided label Jun 10, 2026

ichard26 mentioned this pull request Jun 10, 2026

⚡ perf(link): skip URL roundtrip when path is already safe #13986

Closed

sepehr-rs approved these changes Jun 10, 2026

View reviewed changes

ichard26 mentioned this pull request Jun 13, 2026

⚡ perf(link): skip fragment regex when fragment marker absent #13987

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: skip URL round-trip when path is already safe#14045

perf: skip URL round-trip when path is already safe#14045
ichard26 wants to merge 1 commit into
pypa:mainfrom
ichard26:perf/ensure-quoted-url-fast-path

ichard26 commented Jun 10, 2026 •

edited

Loading

Uh oh!

sepehr-rs left a comment

Uh oh!

ichard26 commented Jun 10, 2026 •

edited

Loading

Uh oh!

sepehr-rs commented Jun 10, 2026

Uh oh!

notatallshaw commented Jun 16, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ichard26 commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sepehr-rs left a comment

Choose a reason for hiding this comment

Uh oh!

ichard26 commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sepehr-rs commented Jun 10, 2026

Uh oh!

notatallshaw commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ichard26 commented Jun 10, 2026 •

edited

Loading

ichard26 commented Jun 10, 2026 •

edited

Loading

notatallshaw commented Jun 16, 2026 •

edited

Loading