Merged
9 changes: 5 additions & 4 deletions README.md
@@ -93,10 +93,11 @@ enforces all of them; a stdlib-only port that goes through `urllib.parse` cannot

Every stable and tentative method in the WHATWG URLPattern IDL is implemented:
`URLPattern(input | string, baseURL?, options?)`, `test`, `exec`, all eight component
properties, `hasRegExpGroups`, `URLPattern.compareComponent`, and the tentative
`generate(component, groups)`. See [SPEC_DEVIATIONS.md](SPEC_DEVIATIONS.md) for the
intentional Python-flavour choices (camelCase method names, the additional `with_*`
derivers, escape-helper exposure).
properties, `has_regexp_groups`, `URLPattern.compare_component`, and the tentative
`generate(component, groups)`. The IDL camelCase spellings (`hasRegExpGroups`,
`compareComponent`) are kept as aliases so code ported verbatim from the spec or
browser JS reads identically. See [SPEC_DEVIATIONS.md](SPEC_DEVIATIONS.md) for the
intentional Python-flavour choices.
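
The alias arrangement can be illustrated with a small stand-alone sketch. `ToyPattern` is a hypothetical stand-in, not yarlpattern's `URLPattern`, and its comparison body is a placeholder; only the aliasing technique (rebinding the same property and static method under a second name in the class body) is the point.

```python
class ToyPattern:
    """Hypothetical stand-in for URLPattern; only the aliasing matters here."""

    def __init__(self, has_custom_regexp: bool) -> None:
        self._has_custom_regexp = has_custom_regexp

    @property
    def has_regexp_groups(self) -> bool:
        return self._has_custom_regexp

    # Rebind the same property object under the IDL spelling: both names
    # share one descriptor, so there is no second code path.
    hasRegExpGroups = has_regexp_groups  # noqa: N815

    @staticmethod
    def compare_component(component: str, left: "ToyPattern", right: "ToyPattern") -> int:
        # Placeholder three-way result; the real ordering logic is out of scope.
        return 0

    # Same idea for the static method: one function, two names.
    compareComponent = compare_component  # noqa: N815


pat = ToyPattern(has_custom_regexp=True)
same_callable = ToyPattern.compareComponent is ToyPattern.compare_component
same_property = vars(ToyPattern)["hasRegExpGroups"] is vars(ToyPattern)["has_regexp_groups"]
```

Because the alias is a plain rebinding, a fix to the snake_case method is picked up by the camelCase name with no further change.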

## How this differs from `aiohttp.web.UrlDispatcher`

20 changes: 13 additions & 7 deletions SPEC_DEVIATIONS.md
@@ -134,14 +134,16 @@ above what yarl itself does:
The WHATWG URLPattern Standard distinguishes between the *stable* API
surface (constructor, `test`, `exec`, `compareComponent`, component
properties, `hasRegExpGroups`) and the *tentative* surface (`generate`).
yarlpattern's posture:
yarlpattern exposes both PEP-8 snake_case names and the IDL camelCase
spellings (see the *Method-name capitalisation* note below); the
table reports against the canonical snake form:

| Surface | Status |
|---|---|
| Constructor + `test` + `exec` | Implemented; 100% WPT pass with `[regex]` |
| Per-component getter properties | Implemented |
| `compareComponent` | Implemented; 25 / 25 WPT cases pass |
| `hasRegExpGroups` | Implemented; 55 / 55 WPT cases pass |
| `compare_component` (alias `compareComponent`) | Implemented; 25 / 25 WPT cases pass |
| `has_regexp_groups` (alias `hasRegExpGroups`) | Implemented; 55 / 55 WPT cases pass |
| `generate()` (tentative spec) | Implemented; 19 / 19 WPT cases pass |

## What yarlpattern does *not* deviate on, despite Python's defaults
@@ -155,10 +157,14 @@ a spec deviation, and yarlpattern goes out of its way to match WHATWG:
`host`, `path`, `query`, `fragment`). Cross-runtime portability
with browser-side JS `URL` and `URLPattern` is preserved by
construction.
- **Method-name capitalisation**: `compareComponent` and
`hasRegExpGroups` keep their WHATWG IDL camelCase names. This
is intentional Python-PEP-8 deviation in favour of literal-text
compatibility with the spec and with cross-language patterns.
- **Method-name capitalisation**: the canonical names are PEP 8
snake_case (`compare_component`, `has_regexp_groups`), and the
WHATWG IDL camelCase forms (`compareComponent`, `hasRegExpGroups`)
are exposed as aliases bound to the same callable / property, with
no extra logic and no separate code path. New Python code should
use the snake_case names; the camelCase aliases exist so code
ported verbatim from the spec, browser JS, Deno, Bun, or
Cloudflare Workers reads identically.
- **Result shape**: `URLPatternResult` mirrors the JS-side shape
exactly: `result.<component>` is a dict with `'input'` and
`'groups'` keys; attribute access on a Pythonic `result.<component>.groups`
2 changes: 1 addition & 1 deletion docs/examples/index.md
@@ -39,4 +39,4 @@ Every example follows the same shape:
3. **With URLPattern** — one declarative pattern, structured match result.
4. **What you get for free** — which URLPattern feature carried the weight
(cross-component matching, optional segments, named groups with regex,
`compareComponent()`, custom-scheme support, …).
`compare_component()`, custom-scheme support, …).
8 changes: 4 additions & 4 deletions docs/examples/match-the-kserve-v2-inference-path.md
@@ -66,10 +66,10 @@ The `{/versions/:version}?` group is *optional* — when the segment is
absent, the named group is simply not in the result. Same pattern
handles both URL shapes.

## Multi-backend routing with `compareComponent()`
## Multi-backend routing with `compare_component()`

If you're fronting *several* inference servers — Triton at one URL
prefix, KServe at another, TorchServe at a third — `compareComponent`
prefix, KServe at another, TorchServe at a third — `compare_component`
gives you spec-defined specificity ordering rather than insertion-order
fragility:

@@ -82,7 +82,7 @@ ROUTES = [
]

# Sort by specificity per the spec — no manual "register specific first" discipline.
ROUTES.sort(key=cmp_to_key(lambda a, b: URLPattern.compareComponent("pathname", a, b)))
ROUTES.sort(key=cmp_to_key(lambda a, b: URLPattern.compare_component("pathname", a, b)))
```
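
How `cmp_to_key` turns a three-way comparator into a sort key can be tried without yarlpattern installed. The sketch below is a rough illustration only: `toy_compare` is a hypothetical comparator that ranks plain strings by wildcard count and length. It is not the WHATWG specificity algorithm and not what `compare_component` computes.

```python
from functools import cmp_to_key


def toy_compare(left: str, right: str) -> int:
    """Hypothetical three-way comparator; NOT the WHATWG algorithm.

    Ranks a path by (wildcard count, negative length), so paths with
    fewer wildcards, then longer paths, compare as smaller.
    """
    lk = (left.count("*"), -len(left))
    rk = (right.count("*"), -len(right))
    if lk == rk:
        return 0
    return -1 if lk < rk else 1


routes = ["/*", "/v2/models/*", "/v2/models/bert/infer"]

# cmp_to_key wraps the comparator so list.sort can use it as a key function.
routes.sort(key=cmp_to_key(toy_compare))
```

The registration order of `routes` no longer matters: any permutation of the list sorts to the same result, which is the property the section relies on.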

## What you get for free
@@ -92,7 +92,7 @@ ROUTES.sort(key=cmp_to_key(lambda a, b: URLPattern.compareComponent("pathname",
- **Regex-constrained action enum** — `:action(infer|ready|generate)`
rejects `/v2/models/bert/explain` at the pattern level, before any
handler dispatch.
- **`compareComponent()` for specificity** — replaces the
- **`compare_component()` for specificity** — replaces the
"register specific patterns first" discipline every Python router
documents. A spec-defined deterministic ordering means a sidecar can
*compute* the right dispatch order from a route list it didn't write.
6 changes: 3 additions & 3 deletions docs/examples/pick-an-llm-backend-by-model-name.md
@@ -34,7 +34,7 @@ def route(self, request: str):

`_pattern_to_regex` is essentially URLPattern's `*` wildcard handling
re-implemented; `calculate_pattern_specificity` is essentially
`compareComponent()` re-implemented. Both have known footguns — an
`compare_component()` re-implemented. Both have known footguns — an
asterisk inside a literal segment, complexity-char counting that ranks
two patterns the same when they shouldn't be.

@@ -55,7 +55,7 @@ ROUTES: list[tuple[URLPattern, str]] = [

# Spec-defined specificity: more specific patterns sort *before* more general
# ones. Replaces LiteLLM's manual "count complexity chars" heuristic.
ROUTES.sort(key=cmp_to_key(lambda a, b: URLPattern.compareComponent("pathname", a[0], b[0])))
ROUTES.sort(key=cmp_to_key(lambda a, b: URLPattern.compare_component("pathname", a[0], b[0])))

def pick_deployment(request_path: str) -> str | None:
for pat, deployment in ROUTES:
@@ -71,7 +71,7 @@ pick_deployment("/anthropic/claude-3-haiku") # 'anthropic-fast'

## What you get for free

- **`compareComponent()` is the spec-defined version of LiteLLM's
- **`compare_component()` is the spec-defined version of LiteLLM's
`calculate_pattern_specificity`.** Deterministic ordering, no manual
"count the wildcards" heuristic, identical results across
implementations (Chromium, Safari, Firefox, yarlpattern).
2 changes: 1 addition & 1 deletion docs/examples/translate-google-api-http-to-urlpattern.md
@@ -91,7 +91,7 @@ def route_predict(url):
doesn't enforce this directly, but URLPattern can.
- **`additional_bindings` ↔ pattern list.** Multiple URLPattern
entries pointing at the same handler are the natural representation;
`compareComponent()` gives you the same specificity ordering grpc-
`compare_component()` gives you the same specificity ordering grpc-
gateway computes internally.
- **Same patterns work everywhere.** A Python sidecar fronting a
gRPC-gateway-translated service, a Cloudflare Worker fronting the
2 changes: 1 addition & 1 deletion docs/wpt-compliance.md
@@ -1,6 +1,6 @@
# WHATWG URLPattern Conformance Report

Generated by `scripts/generate_compliance_report.py` on **2026-05-13 04:06:52 UTC**, running against [`web-platform-tests/wpt/urlpattern/`](https://github.com/web-platform-tests/wpt/tree/dd54691426c23a08c6f4a0972b2c40965307e5ce/urlpattern) pinned at [`dd54691`](https://github.com/web-platform-tests/wpt/commit/dd54691426c23a08c6f4a0972b2c40965307e5ce) with regex engine **`regex`** (set-operation support: yes). Suite names match the upstream WPT runner basenames.
Generated by `scripts/generate_compliance_report.py` on **2026-05-13 04:26:17 UTC**, running against [`web-platform-tests/wpt/urlpattern/`](https://github.com/web-platform-tests/wpt/tree/dd54691426c23a08c6f4a0972b2c40965307e5ce/urlpattern) pinned at [`dd54691`](https://github.com/web-platform-tests/wpt/commit/dd54691426c23a08c6f4a0972b2c40965307e5ce) with regex engine **`regex`** (set-operation support: yes). Suite names match the upstream WPT runner basenames.

> **Legend.** <kbd>✓</kbd> pass · <kbd>✗</kbd> fail · <kbd>◐</kbd> xfail (known engine gap) · <kbd>◑</kbd> skip · <kbd>⚠</kbd> error.

10 changes: 5 additions & 5 deletions scripts/generate_compliance_report.py
@@ -290,7 +290,7 @@ def _case_id_for(idx: int, entry: dict[str, Any]) -> str:
return f"{idx:03d}-{summary}"


# -------------------------- compareComponent / generate harness -------------
# -------------------------- compare_component / generate harness ------------


def _run_compare_case(idx: int, entry: dict[str, Any]) -> CaseResult:
@@ -301,13 +301,13 @@ def _run_compare_case(idx: int, entry: dict[str, Any]) -> CaseResult:
        right = URLPattern(entry["right"])
        component = entry["component"]
        expected = entry["expected"]
        if URLPattern.compareComponent(component, left, right) != expected:
        if URLPattern.compare_component(component, left, right) != expected:
            return CaseResult(idx, case_id, "fail", f"forward != {expected}")
        if URLPattern.compareComponent(component, right, left) != -expected:
        if URLPattern.compare_component(component, right, left) != -expected:
            return CaseResult(idx, case_id, "fail", f"reverse != {-expected}")
        if URLPattern.compareComponent(component, left, left) != 0:
        if URLPattern.compare_component(component, left, left) != 0:
            return CaseResult(idx, case_id, "fail", "self(left) != 0")
        if URLPattern.compareComponent(component, right, right) != 0:
        if URLPattern.compare_component(component, right, right) != 0:
            return CaseResult(idx, case_id, "fail", "self(right) != 0")
    except Exception as exc:  # noqa: BLE001
        return CaseResult(idx, case_id, "error", f"{type(exc).__name__}: {exc}")
22 changes: 16 additions & 6 deletions src/yarlpattern/_pattern.py
@@ -120,7 +120,7 @@ def _strip_component_prefix_suffix(component: str, value: str) -> str:
}


# ------------------------------------------------------- compareComponent
# ------------------------------------------------------- compare_component
#
# Specificity ordering tables — the orderings here are the WHATWG tentative
# spec's intended ranks (which the polyfill and Chromium also implement
@@ -149,7 +149,7 @@ def _strip_component_prefix_suffix(component: str, value: str) -> str:
PartModifier.NONE: 3,
}

# Length-mismatch sentinel — used by :meth:`URLPattern.compareComponent`
# Length-mismatch sentinel — used by :meth:`URLPattern.compare_component`
# to pad the shorter part list. An empty fixed-text part is what the spec
# substitutes so that ``/foo/`` outranks ``/foo/*``: a literal-ending
# pattern is more restrictive than one that wildcards after a common prefix.
@@ -228,7 +228,7 @@ class _ComponentMatcher:
# genuinely ``""`` in both engines.
apply_ecma_narrowing: list[bool]
# Pre-built tuple of comparison keys, one per part, used by
# :meth:`URLPattern.compareComponent`. Each key is
# :meth:`URLPattern.compare_component`. Each key is
# ``(type_rank, modifier_rank, prefix, value, suffix)``; assembled
# once at compile time so every compare-call is a C-level tuple
# comparison (no Python-level attribute access on ``Part``).
@@ -480,7 +480,7 @@ def _compile_component(self, component: str, pattern_string: str) -> None:
for p in parts
if p.type is not PartType.FIXED_TEXT
]
# Compare-key tuple for :meth:`compareComponent` — built once at
# Compare-key tuple for :meth:`compare_component` — built once at
# compile time so every comparison is a pure C-level tuple-compare.
compare_keys = tuple(_part_to_compare_key(p) for p in parts)
self._matchers[component] = _ComponentMatcher(
@@ -756,8 +756,13 @@ def has_regexp_groups(self) -> bool:
"""
return any(m.has_custom_regexp for m in self._matchers.values())

# WHATWG IDL camelCase alias. Snake is the canonical Python form;
# ``hasRegExpGroups`` is kept so code ported verbatim from the spec /
# browser JS / Deno / Bun / Cloudflare-Workers reads identically.
hasRegExpGroups = has_regexp_groups # noqa: N815

@staticmethod
def compareComponent( # noqa: N802 — matches the WHATWG IDL method name
def compare_component(
component: str,
left: URLPattern,
right: URLPattern,
@@ -782,7 +787,7 @@ def compareComponent( # noqa: N802 — matches the WHATWG IDL method name
spec-defined names.
"""
if component not in COMPONENTS:
msg = f"URLPattern.compareComponent: unknown component {component!r}; expected one of {COMPONENTS}"
msg = f"URLPattern.compare_component: unknown component {component!r}; expected one of {COMPONENTS}"
raise TypeError(msg)
# Empty part lists stand in for ``*`` — see ``_FULL_WILDCARD_ONLY_KEYS``.
# Calling ``.compare_keys or _FULL_WILDCARD_ONLY_KEYS`` is a free
@@ -801,6 +806,11 @@ def compareComponent( # noqa: N802 — matches the WHATWG IDL method name
return -1 if lk < rk else 1
return 0

# WHATWG IDL camelCase alias. Snake is the canonical Python form;
# ``compareComponent`` is kept so code ported verbatim from the spec /
# browser JS / Deno / Bun / Cloudflare-Workers reads identically.
compareComponent = compare_component # noqa: N815

# -------------------------------------------------------------- generate
def generate(self, component: str, groups: Mapping[str, str] | None = None) -> str:
"""Produce the URL-component string that *this* pattern would have matched.
22 changes: 17 additions & 5 deletions tests/test_pattern.py
@@ -222,29 +222,41 @@ def test_has_regexp_groups_true_for_custom_regex_body() -> None:
    assert pat.has_regexp_groups is True


# ------------------------------------------------------------ compareComponent
# ----------------------------------------------------------- compare_component


def test_compare_component_rejects_unknown_component_name() -> None:
    pat = URLPattern({"pathname": "/foo"})
    with pytest.raises(TypeError, match="unknown component"):
        URLPattern.compareComponent("not-a-component", pat, pat)
        URLPattern.compare_component("not-a-component", pat, pat)


def test_compare_component_self_equality_across_all_components() -> None:
    # Self-compare must be 0 on every component, regardless of pattern shape.
    pat = URLPattern({"pathname": "/foo/:id(\\d+)"})
    for component in COMPONENTS:
        assert URLPattern.compareComponent(component, pat, pat) == 0
        assert URLPattern.compare_component(component, pat, pat) == 0


def test_compare_component_empty_treated_as_full_wildcard() -> None:
    # Explicitly empty component pattern compares equal to ``*`` — the spec
    # substitutes the same single-FULL_WILDCARD part list for both.
    empty = URLPattern({"pathname": ""})
    star = URLPattern({"pathname": "*"})
    assert URLPattern.compareComponent("pathname", empty, star) == 0
    assert URLPattern.compareComponent("pathname", star, empty) == 0
    assert URLPattern.compare_component("pathname", empty, star) == 0
    assert URLPattern.compare_component("pathname", star, empty) == 0


def test_camelcase_aliases_resolve_to_same_callable_and_property() -> None:
    # ``compareComponent`` and ``hasRegExpGroups`` are kept as IDL-faithful
    # camelCase aliases so code ported verbatim from the spec / browser JS
    # reads identically. They must dispatch to the snake-case canonical
    # forms, not duplicate the logic.
    pat = URLPattern({"pathname": "/foo/:id(\\d+)"})
    other = URLPattern({"pathname": "/foo/:id(\\d+)"})
    assert URLPattern.compareComponent is URLPattern.compare_component
    assert URLPattern.compareComponent("pathname", pat, other) == 0
    assert pat.hasRegExpGroups is pat.has_regexp_groups


# ---------------------------------------------------------------------- with_
15 changes: 8 additions & 7 deletions tests/test_wpt_compare.py
@@ -1,8 +1,9 @@
"""Port of ``reference/wpt/urlpattern/resources/urlpattern-compare-tests.tentative.js``.

The compare suite tests :meth:`URLPattern.compareComponent` — a static
method that returns a three-way comparison between two patterns for a
single component. URL routing libraries use it to order patterns from
The compare suite tests :meth:`URLPattern.compare_component` (also exposed
under the IDL-faithful ``compareComponent`` alias) — a static method that
returns a three-way comparison between two patterns for a single
component. URL routing libraries use it to order patterns from
most-specific to least-specific.

The corresponding WPT file is marked ``.tentative`` because the spec
@@ -60,10 +61,10 @@ def test_wpt_compare(entry: dict[str, Any]) -> None:
    component: str = entry["component"]
    expected: int = entry["expected"]

    assert URLPattern.compareComponent(component, left, right) == expected
    assert URLPattern.compare_component(component, left, right) == expected
    # Reverse: JS uses ``~~(expected * -1)`` to coerce ``-0`` to ``0``;
    # Python's ints have no negative-zero, so a plain negation is enough.
    assert URLPattern.compareComponent(component, right, left) == -expected
    assert URLPattern.compare_component(component, right, left) == -expected
    # Self-equality.
    assert URLPattern.compareComponent(component, left, left) == 0
    assert URLPattern.compareComponent(component, right, right) == 0
    assert URLPattern.compare_component(component, left, left) == 0
    assert URLPattern.compare_component(component, right, right) == 0
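
The forward, reverse, and self-equality assertions above amount to a generic contract for any three-way comparator. A minimal sketch of that contract as a reusable helper follows; `check_three_way_contract` and `int_compare` are hypothetical names used for illustration and are not part of yarlpattern or its test suite.

```python
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def check_three_way_contract(
    compare: Callable[[T, T], int], left: T, right: T, expected: int
) -> list[str]:
    """Return a list of violated checks (empty when the contract holds)."""
    failures: list[str] = []
    if compare(left, right) != expected:
        failures.append(f"forward != {expected}")
    # Swapping the operands must flip the sign of the result.
    if compare(right, left) != -expected:
        failures.append(f"reverse != {-expected}")
    # Comparing a value with itself must always yield 0.
    if compare(left, left) != 0:
        failures.append("self(left) != 0")
    if compare(right, right) != 0:
        failures.append("self(right) != 0")
    return failures


def int_compare(a: int, b: int) -> int:
    """Well-behaved three-way comparator over ints, used as a stand-in."""
    return (a > b) - (a < b)


ok = check_three_way_contract(int_compare, 1, 2, -1)
broken = check_three_way_contract(lambda a, b: 1, 1, 2, 1)
```

Collecting failures rather than stopping at the first one makes it easier to see whether a comparator breaks antisymmetry, reflexivity, or both.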