fix(perf): batch-fetch missing entities in _measurement_requests_cursor_to_pydantic#671
Open
michael-johnston wants to merge 1 commit into main from
Conversation
Previously, _measurement_requests_cursor_to_pydantic called
entityWithIdentifier in a loop for each entity not already in the local
cache, issuing one DB round-trip (~240ms) per unique entity. For an
operation touching N distinct entities this was an N+1 query pattern.
This path is reached by initializing the store and then calling measurement_requests_for_operation or a related method, which is what `ado show X operation` does.
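The N+1 pattern can be sketched as follows. This is an illustrative stand-in, not the real store: FakeStore, its round-trip counter, and the row shape are assumptions; only the names entityWithIdentifier and _entities mirror the PR description.

```python
# Hypothetical sketch of the per-entity fallback loop described above.
class FakeStore:
    def __init__(self):
        self._entities = {}   # starts empty; nothing pre-populates it
        self.round_trips = 0  # each entityWithIdentifier call = one ~240ms query

    def entityWithIdentifier(self, identifier):
        self.round_trips += 1  # one DB round-trip per call
        entity = {"identifier": identifier}
        self._entities[identifier] = entity
        return entity

    def rows_to_models(self, rows):
        # Old behaviour: per-row fallback, so N distinct entities = N queries
        models = []
        for row in rows:
            ident = row["entity_id"]
            entity = self._entities.get(ident) or self.entityWithIdentifier(ident)
            models.append((row, entity))
        return models

store = FakeStore()
rows = [{"entity_id": f"e{i % 4}"} for i in range(12)]  # 4 distinct entities
store.rows_to_models(rows)
print(store.round_trips)  # -> 4
```

With 12 rows over 4 distinct entities, the loop still issues 4 separate queries: one per distinct entity not already cached.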
It might be expected that self._entities is pre-populated before this method runs. There are two reasons it is not:
1. The bulk-load mechanism is bypassed. The public `entities` property triggers refresh(force_fetch_all_entities=True) when self._entities is empty, loading all entities in one query. But _measurement_requests_cursor_to_pydantic accesses the private self._entities dict directly, never touching the property, so the lazy-load never fires.
2. Nothing in the construction path populates the cache. SQLSampleStore
starts with self._entities = {}. By the time
_measurement_requests_cursor_to_pydantic runs, the cache is still
empty and every entity lookup falls through to entityWithIdentifier.
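Both reasons can be seen in a minimal sketch. This is not the real SQLSampleStore; the refresh counter and entity contents are illustrative, but the property-vs-private-attribute distinction is the point:

```python
# Sketch of why the bulk load never fires: the public `entities` property
# lazy-loads on first access, but reading the private `_entities` dict
# directly bypasses that hook entirely.
class Store:
    def __init__(self):
        self._entities = {}     # construction leaves the cache empty
        self.refresh_calls = 0

    def refresh(self, force_fetch_all_entities=False):
        self.refresh_calls += 1
        # In reality: one query loading every entity at once
        self._entities = {"e1": "entity-1", "e2": "entity-2"}

    @property
    def entities(self):
        if not self._entities:  # lazy bulk load
            self.refresh(force_fetch_all_entities=True)
        return self._entities

store = Store()
hit = store._entities.get("e1")      # private access: no refresh fires
print(store.refresh_calls, hit)      # -> 0 None
via_property = store.entities["e1"]  # property access: bulk load fires once
print(store.refresh_calls, via_property)  # -> 1 entity-1
```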
PyMySQL already buffers the full result set in memory before the Python
loop begins, so consuming the cursor up-front with fetchall() adds no
network round-trip. The fix exploits this: it collects all missing entity
IDs before resolving any entity, then issues a single _fetch_entities
query for all of them at once, regardless of N, so the cache is populated
with every required entity.
The entityWithIdentifier fallback is retained and still fires for entities
added by a concurrent distributed process in the window between the batch
fetch and the cursor scan — the intended use of that code path.
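The batched flow plus the retained fallback can be sketched as below. This is a hedged sketch, not the actual implementation: FakeCursor, the query counters, and the row shape are assumptions modeled on the PR text.

```python
# Batched approach: one _fetch_entities query for all missing IDs,
# with entityWithIdentifier kept as a per-entity fallback.
class FakeCursor:
    def __init__(self, rows):
        self._rows = rows

    def fetchall(self):
        # PyMySQL buffers the full result set client-side, so consuming
        # the cursor up-front adds no extra network round-trip
        return self._rows

class BatchedStore:
    def __init__(self):
        self._entities = {}
        self.queries = 0

    def _fetch_entities(self, identifiers):
        self.queries += 1  # one query regardless of len(identifiers)
        for ident in identifiers:
            self._entities[ident] = {"identifier": ident}

    def entityWithIdentifier(self, identifier):
        self.queries += 1  # fallback: one query per entity
        entity = {"identifier": identifier}
        self._entities[identifier] = entity
        return entity

def cursor_to_models(store, cursor):
    rows = cursor.fetchall()
    # Collect every missing entity ID before resolving anything ...
    missing = {r["entity_id"] for r in rows} - set(store._entities)
    if missing:
        store._fetch_entities(missing)  # ... then fetch them in one query
    models = []
    for row in rows:
        entity = store._entities.get(row["entity_id"])
        if entity is None:
            # Retained fallback: entity appeared concurrently, in the
            # window between the batch fetch and this scan
            entity = store.entityWithIdentifier(row["entity_id"])
        models.append((row, entity))
    return models

store = BatchedStore()
cursor = FakeCursor([{"entity_id": f"e{i % 4}"} for i in range(12)])
cursor_to_models(store, cursor)
print(store.queries)  # -> 1
```

The same 12-row, 4-entity workload that cost 4 round-trips in the old loop now costs exactly one, and the fallback path only fires when an ID genuinely appears after the batch fetch.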
Measured on an operation with 4 distinct entities (197ms RTT to DB):
_measurement_requests_cursor_to_pydantic: 3,378ms → 859ms (-2,519ms)
The saving scales linearly with the number of distinct entities touched
by the operation: N round-trips collapse to 1.