ObjectStoreRegistry.resolve() returns the wrong trailing path for path-style object storage HTTPS URLs when the registered store is already mounted to a container or bucket.
In these setups, the resolved path still includes the container or bucket name, but the underlying store already treats that container or bucket as the storage root. Passing the resolved path into store.head() or store.get_range() therefore targets a non-existent object.
Azure Blob Storage HTTPS URLs are a clear real-world example of this bug. The same class of bug can also affect path-style S3 HTTPS URLs.
Current behavior
Given an Azure Blob asset URL like:
https://ukmoeuwest.blob.core.windows.net/deterministic/uk/near-surface/...nc
and a store created from the same container:
store = AzureStore(
credential_provider=PlanetaryComputerCredentialProvider(
"https://ukmoeuwest.blob.core.windows.net/deterministic/"
)
)
registry = ObjectStoreRegistry(
{"https://ukmoeuwest.blob.core.windows.net/deterministic/": store}
)
registry.resolve(asset_url) returns:
"deterministic/uk/near-surface/...nc"
But AzureStore expects a path within the deterministic container:
The extra deterministic/ causes store.head() to 404.
Expected behavior
For path-style object storage HTTPS URLs, ObjectStoreRegistry.resolve() should return a path that is compatible with the mounted store, i.e. a path relative to the configured container or bucket.
For the Azure example above, the resolved path should be:
Steps to reproduce
Minimal repro with current behavior:
from obstore.auth.planetary_computer import PlanetaryComputerCredentialProvider
from obstore.store import AzureStore
from obspec_utils.registry import ObjectStoreRegistry
prefix = "https://ukmoeuwest.blob.core.windows.net/deterministic/"
asset_url = (
"https://ukmoeuwest.blob.core.windows.net/"
"deterministic/uk/near-surface/20260501T0000Z/"
"20260501T0000Z-PT0000H00M-temperature_at_surface.nc"
)
store = AzureStore(
credential_provider=PlanetaryComputerCredentialProvider(prefix)
)
registry = ObjectStoreRegistry({prefix: store})
_, path = registry.resolve(asset_url)
print(path)
# current output:
# deterministic/uk/near-surface/20260501T0000Z/
# 20260501T0000Z-PT0000H00M-temperature_at_surface.nc
print(store.head("uk/near-surface/20260501T0000Z/20260501T0000Z-PT0000H00M-temperature_at_surface.nc")["size"])
# works
print(store.head(path)["size"])
# FileNotFoundError / 404
A smaller non-network repro for the path resolution itself:
from obstore.auth.planetary_computer import PlanetaryComputerCredentialProvider
from obstore.store import AzureStore
from obspec_utils.registry import ObjectStoreRegistry
prefix = "https://ukmoeuwest.blob.core.windows.net/deterministic/"
asset_url = (
"https://ukmoeuwest.blob.core.windows.net/"
"deterministic/uk/near-surface/20260501T0000Z/"
"20260501T0000Z-PT0000H00M-temperature_at_surface.nc"
)
store = AzureStore(
credential_provider=PlanetaryComputerCredentialProvider(prefix)
)
registry = ObjectStoreRegistry({prefix: store})
_, path = registry.resolve(asset_url)
assert path == "uk/near-surface/20260501T0000Z/20260501T0000Z-PT0000H00M-temperature_at_surface.nc"
# currently fails because path includes the container name
Impact
This breaks downstream code that relies on ObjectStoreRegistry.resolve() to feed resolved paths into mounted object-store methods.
One concrete case is VirtualiZarr, where the resolved path is passed into a buffered reader backed by AzureStore.head() and AzureStore.get_range(). The dataset open fails even though the original HTTPS asset href is valid and the store is correctly authenticated.
This does not appear to affect every HTTPS object URL shape. For example, virtual-hosted S3 URLs like https://my-bucket.s3.us-west-2.amazonaws.com/... appear to resolve correctly because the bucket name lives in the hostname rather than in the path. The problem shows up when the storage identity is encoded in the path, as with Azure Blob HTTPS URLs and path-style S3 URLs like https://s3.us-west-2.amazonaws.com/my-bucket/....
Notes / possible cause
The issue appears to come from this branch in ObjectStoreRegistry.resolve():
elif hasattr(store, "url"):
prefix = urlparse(store.url).path.lstrip("/")
path_after_prefix = path.lstrip("/").removeprefix(prefix).lstrip("/")
else:
path_after_prefix = path.lstrip("/")
AzureStore does not expose .url, and the registry falls back to returning the full object path from the HTTPS URL.
For Azure Blob HTTPS URLs, that full path includes the container name. But AzureStore already knows the container and expects paths relative to it.
The same mismatch can arise for other path-style HTTPS object URLs where the bucket or container lives in the URL path but the mounted store already knows that storage root.
This suggests the bug is not Azure-specific. It is a mismatch between path-style HTTPS URL resolution and stores that are already mounted to a bucket or container.
Acceptance criteria
Open questions
- Should this be handled by teaching
resolve() about mounted object stores with path-style HTTPS URLs, or by exposing enough metadata on store objects to make the existing prefix-stripping logic work generically?
- Should
https://<account>.blob.core.windows.net/<container>/... and abfs://<container>/... be normalized to equivalent resolved paths in the registry?
- Should the same normalization apply to path-style S3 HTTPS URLs so they behave like
s3://bucket/... and virtual-hosted S3 HTTPS URLs?
ObjectStoreRegistry.resolve()returns the wrong trailing path for path-style object storage HTTPS URLs when the registered store is already mounted to a container or bucket.In these setups, the resolved path still includes the container or bucket name, but the underlying store already treats that container or bucket as the storage root. Passing the resolved path into
store.head()orstore.get_range()therefore targets a non-existent object.Azure Blob Storage HTTPS URLs are a clear real-world example of this bug. The same class of bug can also affect path-style S3 HTTPS URLs.
Current behavior
Given an Azure Blob asset URL like:
https://ukmoeuwest.blob.core.windows.net/deterministic/uk/near-surface/...ncand a store created from the same container:
registry.resolve(asset_url)returns:"deterministic/uk/near-surface/...nc"But
AzureStoreexpects a path within thedeterministiccontainer:"uk/near-surface/...nc"The extra
deterministic/causesstore.head()to 404.Expected behavior
For path-style object storage HTTPS URLs,
ObjectStoreRegistry.resolve()should return a path that is compatible with the mounted store, i.e. a path relative to the configured container or bucket.For the Azure example above, the resolved path should be:
"uk/near-surface/...nc"Steps to reproduce
Minimal repro with current behavior:
A smaller non-network repro for the path resolution itself:
Impact
This breaks downstream code that relies on
ObjectStoreRegistry.resolve()to feed resolved paths into mounted object-store methods.One concrete case is VirtualiZarr, where the resolved path is passed into a buffered reader backed by
AzureStore.head()andAzureStore.get_range(). The dataset open fails even though the original HTTPS asset href is valid and the store is correctly authenticated.This does not appear to affect every HTTPS object URL shape. For example, virtual-hosted S3 URLs like
https://my-bucket.s3.us-west-2.amazonaws.com/...appear to resolve correctly because the bucket name lives in the hostname rather than in the path. The problem shows up when the storage identity is encoded in the path, as with Azure Blob HTTPS URLs and path-style S3 URLs likehttps://s3.us-west-2.amazonaws.com/my-bucket/....Notes / possible cause
The issue appears to come from this branch in
ObjectStoreRegistry.resolve():AzureStoredoes not expose.url, and the registry falls back to returning the full object path from the HTTPS URL.For Azure Blob HTTPS URLs, that full path includes the container name. But
AzureStorealready knows the container and expects paths relative to it.The same mismatch can arise for other path-style HTTPS object URLs where the bucket or container lives in the URL path but the mounted store already knows that storage root.
This suggests the bug is not Azure-specific. It is a mismatch between path-style HTTPS URL resolution and stores that are already mounted to a bucket or container.
Acceptance criteria
ObjectStoreRegistry.resolve()returns container-relative or bucket-relative paths for path-style object storage HTTPS URLs when resolving against mounted storesAzureStoreregistered under ahttps://<account>.blob.core.windows.net/<container>/prefixhttps://s3.<region>.amazonaws.com/<bucket>/...Open questions
resolve()about mounted object stores with path-style HTTPS URLs, or by exposing enough metadata on store objects to make the existing prefix-stripping logic work generically?https://<account>.blob.core.windows.net/<container>/...andabfs://<container>/...be normalized to equivalent resolved paths in the registry?s3://bucket/...and virtual-hosted S3 HTTPS URLs?