hugovk commented Jan 13, 2026

This is the same as python/cpython#143660.


We can apply @henryiii's improvement from pypa/packaging#1030 (see also https://iscinumpy.dev/post/packaging-faster/) to normalize, making it ~3.7 times faster.
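For readers who have not followed the linked packaging change, here is a minimal sketch of the general idea: replacing a regular-expression substitution with plain string methods. Both functions are illustrative assumptions, not code copied from this PR. `normalize_regex` assumes the existing behaviour is PEP 503 normalization with underscores as the separator, and `normalize_str` is a hypothetical regex-free equivalent.

```python
import re


def normalize_regex(name: str) -> str:
    # Assumed baseline: collapse runs of "-", "_" and "." into one
    # separator, lowercase, and use "_" as the separator.
    return re.sub(r"[-_.]+", "-", name).lower().replace("-", "_")


def normalize_str(name: str) -> str:
    # Hypothetical faster variant using only str methods.
    value = name.lower().replace("-", "_").replace(".", "_")
    # Collapse any remaining runs of separators into a single "_".
    while "__" in value:
        value = value.replace("__", "_")
    return value


# Both variants should agree on every input.
for raw in ["Foo.Bar", "foo--bar", "a_-_.b", "Django"]:
    assert normalize_regex(raw) == normalize_str(raw)
    print(raw, "->", normalize_str(raw))
```

Most project names contain no repeated separators, so the `while` loop in the string-method variant usually does not run at all, which is one plausible reason such an approach can beat a regex.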

Benchmark

Run Prepared.normalize(n) on every name in PyPI:

# benchmark_names.py
import sqlite3
import timeit
from importlib_metadata import Prepared

# Get data with:
# curl -L https://github.com/pypi-data/pypi-json-data/releases/download/latest/pypi-data.sqlite.gz | gzip -d > pypi-data.sqlite
# Or use pre-cached files from:
# https://gist.github.com/hugovk/efdbee0620cc64df7b405b52cf0b6e42

CACHE_FILE = "/tmp/bench/names.txt"
DB_FILE = "/tmp/bench/pypi-data.sqlite"

try:
    with open(CACHE_FILE) as f:
        TEST_ALL_NAMES = [line.rstrip("\n") for line in f]
except FileNotFoundError:
    TEST_ALL_NAMES = []
    with sqlite3.connect(DB_FILE) as conn:
        with open(CACHE_FILE, "w") as cache:
            for (name,) in conn.execute("SELECT name FROM projects"):
                if name:
                    TEST_ALL_NAMES.append(name)
                    cache.write(name + "\n")


def bench():
    for n in TEST_ALL_NAMES:
        Prepared.normalize(n)


if __name__ == "__main__":
    print(f"Loaded {len(TEST_ALL_NAMES):,} names")
    t = timeit.timeit("bench()", globals=globals(), number=1)
    print(f"Time: {t:.4f} seconds")
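The script above times a single pass with `number=1`, which can be noisy. As an optional variation (not part of the original benchmark), the sketch below takes the fastest of several passes using `timeit.repeat`. The `best_of` helper and `bench_stub` are hypothetical names; `bench_stub` stands in for the real `bench()` so the snippet runs without the PyPI data.

```python
import timeit


def best_of(func, repeat=5):
    """Return the fastest of `repeat` single-call timings of func, in seconds."""
    return min(timeit.repeat(func, repeat=repeat, number=1))


def bench_stub():
    # Stand-in for bench(); replace with the real function from
    # benchmark_names.py to time Prepared.normalize.
    for n in ("Foo.Bar", "foo--bar", "a_b"):
        n.lower()


print(f"Best of 5: {best_of(bench_stub):.6f} seconds")
```

Taking the minimum rather than the mean is a common choice for microbenchmarks, since slower runs are usually caused by interference from other processes.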

Benchmark data can be found at https://gist.github.com/hugovk/efdbee0620cc64df7b405b52cf0b6e42

Before

python3.14 --version
Python 3.14.2
python3.14 benchmark_names.py
Loaded 8,344,947 names
Time: 4.8933 seconds

After

python3.14 benchmark_names.py
Loaded 8,344,947 names
Time: 1.3266 seconds

3.7 times faster.
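The arithmetic behind that figure, plus the implied cost per name, can be checked with a few lines. The constants are the numbers reported above; the `per_call_ns` helper is a hypothetical name, and the per-name figures include the overhead of the `for` loop in `bench()`.

```python
NAMES = 8_344_947   # names loaded, from the benchmark output
BEFORE_S = 4.8933   # total seconds before the change
AFTER_S = 1.3266    # total seconds after the change


def per_call_ns(total_seconds: float, calls: int) -> float:
    """Average time per call in nanoseconds."""
    return total_seconds / calls * 1e9


speedup = BEFORE_S / AFTER_S
print(f"Speedup: {speedup:.2f}x")
print(f"Before:  {per_call_ns(BEFORE_S, NAMES):.0f} ns per name")
print(f"After:   {per_call_ns(AFTER_S, NAMES):.0f} ns per name")
```

This works out to roughly 586 ns per name before and 159 ns after, a ratio of about 3.69.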

hugovk commented Jan 19, 2026

Following on from pypa/packaging#1064: with this PR, Python 3.12 and 3.13 are slower than main (1.17x and 1.03x respectively), even though 3.10, 3.11 and 3.14 are much faster.

Measured with hyperfine on python.org versions on macOS:

| Python | main (s) | PR (s) | Result |
|--------|---------------|---------------|-------------------|
| 3.10 | 6.632 ± 0.515 | 3.456 ± 0.122 | PR 1.92x faster |
| 3.11 | 6.331 ± 0.541 | 3.302 ± 0.031 | PR 1.92x faster |
| 3.12 | 5.690 ± 0.183 | 6.640 ± 0.069 | main 1.17x faster |
| 3.13 | 5.784 ± 0.109 | 5.957 ± 0.074 | main 1.03x faster |
| 3.14 | 5.749 ± 0.060 | 2.163 ± 0.125 | PR 2.66x faster |

Marking as draft for now.

Details
hyperfine --warmup 1 -r 3 \
    -n main --prepare 'git checkout main' 'python3.10 benchmark_names.py' \
    -n PR --prepare 'git checkout speedup-normalize -q' 'python3.10 benchmark_names.py'
Benchmark 1: main
  Time (mean ± σ):      6.632 s ±  0.515 s    [User: 6.138 s, System: 0.359 s]
  Range (min … max):    6.237 s …  7.214 s    3 runs

Benchmark 2: PR
  Time (mean ± σ):      3.456 s ±  0.122 s    [User: 3.257 s, System: 0.169 s]
  Range (min … max):    3.373 s …  3.596 s    3 runs

Summary
  PR ran
    1.92 ± 0.16 times faster than main

hyperfine --warmup 1 -r 3 \
    -n main --prepare 'git checkout main' 'python3.11 benchmark_names.py' \
    -n PR --prepare 'git checkout speedup-normalize -q' 'python3.11 benchmark_names.py'
Benchmark 1: main
  Time (mean ± σ):      6.331 s ±  0.541 s    [User: 5.415 s, System: 0.420 s]
  Range (min … max):    5.859 s …  6.921 s    3 runs

Benchmark 2: PR
  Time (mean ± σ):      3.302 s ±  0.031 s    [User: 3.023 s, System: 0.209 s]
  Range (min … max):    3.273 s …  3.334 s    3 runs

Summary
  PR ran
    1.92 ± 0.16 times faster than main

hyperfine --warmup 1 -r 3 \
    -n main --prepare 'git checkout main' 'python3.12 benchmark_names.py' \
    -n PR --prepare 'git checkout speedup-normalize -q' 'python3.12 benchmark_names.py'
Benchmark 1: main
  Time (mean ± σ):      5.690 s ±  0.183 s    [User: 5.281 s, System: 0.274 s]
  Range (min … max):    5.492 s …  5.852 s    3 runs

Benchmark 2: PR
  Time (mean ± σ):      6.640 s ±  0.069 s    [User: 6.169 s, System: 0.296 s]
  Range (min … max):    6.560 s …  6.688 s    3 runs

Summary
  main ran
    1.17 ± 0.04 times faster than PR

hyperfine --warmup 1 -r 3 \
    -n main --prepare 'git checkout main' 'python3.13 benchmark_names.py' \
    -n PR --prepare 'git checkout speedup-normalize -q' 'python3.13 benchmark_names.py'
Benchmark 1: main
  Time (mean ± σ):      5.784 s ±  0.109 s    [User: 5.359 s, System: 0.253 s]
  Range (min … max):    5.674 s …  5.891 s    3 runs

Benchmark 2: PR
  Time (mean ± σ):      5.957 s ±  0.074 s    [User: 5.601 s, System: 0.241 s]
  Range (min … max):    5.876 s …  6.022 s    3 runs

Summary
  main ran
    1.03 ± 0.02 times faster than PR

hyperfine --warmup 1 -r 3 \
    -n main --prepare 'git checkout main' 'python3.14 benchmark_names.py' \
    -n PR --prepare 'git checkout speedup-normalize -q' 'python3.14 benchmark_names.py'
Benchmark 1: main
  Time (mean ± σ):      5.749 s ±  0.060 s    [User: 5.442 s, System: 0.223 s]
  Range (min … max):    5.684 s …  5.802 s    3 runs

Benchmark 2: PR
  Time (mean ± σ):      2.163 s ±  0.125 s    [User: 1.922 s, System: 0.169 s]
  Range (min … max):    2.020 s …  2.257 s    3 runs

Summary
  PR ran
    2.66 ± 0.16 times faster than main
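The "Summary" ratios in the hyperfine output above can be reproduced from the reported means and standard deviations. The sketch below assumes the uncertainty is obtained by first-order error propagation for a quotient. That is an assumption about the method, not something stated in the output, although it does match the printed figures. `relative_speed` is a hypothetical helper name.

```python
import math


def relative_speed(slow_mean, slow_sd, fast_mean, fast_sd):
    """Return (ratio, uncertainty) for slow_mean / fast_mean.

    Assumes independent measurements and first-order error propagation:
    the relative errors of the two means are combined in quadrature.
    """
    ratio = slow_mean / fast_mean
    rel_err = math.hypot(slow_sd / slow_mean, fast_sd / fast_mean)
    return ratio, ratio * rel_err


# Python 3.10: main 6.632 ± 0.515 s, PR 3.456 ± 0.122 s
ratio, err = relative_speed(6.632, 0.515, 3.456, 0.122)
print(f"3.10: PR {ratio:.2f} ± {err:.2f} times faster than main")

# Python 3.12: PR 6.640 ± 0.069 s, main 5.690 ± 0.183 s (main is faster)
ratio, err = relative_speed(6.640, 0.069, 5.690, 0.183)
print(f"3.12: main {ratio:.2f} ± {err:.2f} times faster than PR")
```

With only three runs per benchmark, these uncertainties are rough, so a difference as small as the 1.03x on 3.13 should be read with caution.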

hugovk commented Feb 6, 2026

Updated to match python/cpython#144083; the PR is now 2.9-3.6x faster across all tested Python versions:

| Python | main (s) | PR (s) | Result |
|--------|---------------|---------------|-----------------|
| 3.10 | 6.322 ± 0.229 | 2.046 ± 0.055 | PR 3.09x faster |
| 3.11 | 5.827 ± 0.083 | 1.612 ± 0.012 | PR 3.62x faster |
| 3.12 | 5.557 ± 0.041 | 1.889 ± 0.510 | PR 2.94x faster |
| 3.13 | 5.572 ± 0.069 | 1.640 ± 0.015 | PR 3.40x faster |
| 3.14 | 5.886 ± 0.266 | 1.667 ± 0.008 | PR 3.53x faster |

Details
hyperfine --warmup 1 -r 3 \
    -n main --prepare 'git checkout main' 'python3.10 benchmark_names.py' \
    -n PR --prepare 'git checkout speedup-normalize -q' 'python3.10 benchmark_names.py'
Benchmark 1: main
  Time (mean ± σ):      6.322 s ±  0.229 s    [User: 5.905 s, System: 0.292 s]
  Range (min … max):    6.057 s …  6.460 s    3 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs.

Benchmark 2: PR
  Time (mean ± σ):      2.046 s ±  0.055 s    [User: 1.892 s, System: 0.128 s]
  Range (min … max):    2.004 s …  2.108 s    3 runs

Summary
  PR ran
    3.09 ± 0.14 times faster than main

hyperfine --warmup 1 -r 3 \
    -n main --prepare 'git checkout main' 'python3.11 benchmark_names.py' \
    -n PR --prepare 'git checkout speedup-normalize -q' 'python3.11 benchmark_names.py'
Benchmark 1: main
  Time (mean ± σ):      5.827 s ±  0.083 s    [User: 5.293 s, System: 0.315 s]
  Range (min … max):    5.731 s …  5.877 s    3 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs.

Benchmark 2: PR
  Time (mean ± σ):      1.612 s ±  0.012 s    [User: 1.514 s, System: 0.095 s]
  Range (min … max):    1.602 s …  1.625 s    3 runs

Summary
  PR ran
    3.62 ± 0.06 times faster than main

hyperfine --warmup 1 -r 3 \
    -n main --prepare 'git checkout main' 'python3.12 benchmark_names.py' \
    -n PR --prepare 'git checkout speedup-normalize -q' 'python3.12 benchmark_names.py'
Benchmark 1: main
  Time (mean ± σ):      5.557 s ±  0.041 s    [User: 5.265 s, System: 0.216 s]
  Range (min … max):    5.516 s …  5.598 s    3 runs

Benchmark 2: PR
  Time (mean ± σ):      1.889 s ±  0.510 s    [User: 1.564 s, System: 0.179 s]
  Range (min … max):    1.578 s …  2.478 s    3 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs.

Summary
  PR ran
    2.94 ± 0.79 times faster than main

hyperfine --warmup 1 -r 3 \
    -n main --prepare 'git checkout main' 'python3.13 benchmark_names.py' \
    -n PR --prepare 'git checkout speedup-normalize -q' 'python3.13 benchmark_names.py'
Benchmark 1: main
  Time (mean ± σ):      5.572 s ±  0.069 s    [User: 5.322 s, System: 0.195 s]
  Range (min … max):    5.492 s …  5.614 s    3 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs.

Benchmark 2: PR
  Time (mean ± σ):      1.640 s ±  0.015 s    [User: 1.544 s, System: 0.092 s]
  Range (min … max):    1.626 s …  1.655 s    3 runs

Summary
  PR ran
    3.40 ± 0.05 times faster than main

hyperfine --warmup 1 -r 3 \
    -n main --prepare 'git checkout main' 'python3.14 benchmark_names.py' \
    -n PR --prepare 'git checkout speedup-normalize -q' 'python3.14 benchmark_names.py'
Benchmark 1: main
  Time (mean ± σ):      5.886 s ±  0.266 s    [User: 5.361 s, System: 0.283 s]
  Range (min … max):    5.579 s …  6.050 s    3 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs.

Benchmark 2: PR
  Time (mean ± σ):      1.667 s ±  0.008 s    [User: 1.573 s, System: 0.091 s]
  Range (min … max):    1.661 s …  1.676 s    3 runs

Summary
  PR ran
    3.53 ± 0.16 times faster than main
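The five hyperfine invocations above differ only in the interpreter version, so they can be generated from a single template. The sketch below builds the same argument lists using only the flags, branch names and script name that appear in the commands above. The `hyperfine_command` helper is a hypothetical name, and the `subprocess` call is left commented out because it needs hyperfine, the git checkout and each interpreter to be available.

```python
import shlex

VERSIONS = ["3.10", "3.11", "3.12", "3.13", "3.14"]


def hyperfine_command(version: str) -> list[str]:
    """Build the hyperfine argument list comparing main and the PR branch."""
    script = f"python{version} benchmark_names.py"
    return [
        "hyperfine", "--warmup", "1", "-r", "3",
        # Each --prepare runs before the timing runs of the command after it.
        "-n", "main", "--prepare", "git checkout main", script,
        "-n", "PR", "--prepare", "git checkout speedup-normalize -q", script,
    ]


for version in VERSIONS:
    print(shlex.join(hyperfine_command(version)))
    # To actually run the benchmark:
    # import subprocess
    # subprocess.run(hyperfine_command(version), check=True)
```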

hugovk marked this pull request as ready for review February 6, 2026 18:37