Replace caching prethread with simpler request context copy-over#12414
Replace caching prethread with simpler request context copy-over#12414cdrini wants to merge 1 commit intointernetarchive:masterfrom
Conversation
e0b7241 to
01fe6ad
Compare
- avoids the complicated closure capture - avoids cache leaking when new request context variables are introduced - fixes cache leak with is_bot and is_recognized_bot
01fe6ad to
c05f1a8
Compare
There was a problem hiding this comment.
Pull request overview
Refactors memcache async-refresh behavior to avoid the previous caching_prethread closure approach by carrying request context explicitly into cache keys and (intended) background refresh threads, preventing cache leakage across language/bot/cookie contexts.
Changes:
- Add
RequestContextVars.copy_for_cache_thread()and allowset_context_from_legacy_web_py()to accept an override context. - Extend
memcache_memoizewithcache_request_contextsupport; update key computation + async refresh plumbing accordingly. - Update multiple call sites/tests to the new caching approach and new cache helper APIs (
memcache_delete_by_args, newmemcache_set/memcache_getsignatures).
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| openlibrary/utils/request_context.py | Adds cache-thread context copy helper and override support for legacy web.py context setup. |
| openlibrary/core/cache.py | Reworks memcache memoization keys/async refresh to optionally include request context and forward it to refresh threads. |
| openlibrary/tests/core/test_cache.py | Updates tests for new memoize signatures/behavior and default hashing. |
| openlibrary/plugins/upstream/utils.py | Switches cached macro rendering to new cache_request_context approach. |
| openlibrary/plugins/upstream/tests/test_utils.py | Removes caching_prethread patching now that it’s no longer used. |
| openlibrary/plugins/upstream/mybooks.py | Updates feed caching to use cache_request_context instead of caching_prethread. |
| openlibrary/plugins/upstream/borrow.py | Updates cache invalidation calls to memcache_delete_by_args. |
| openlibrary/plugins/openlibrary/home.py | Removes caching_prethread and uses cache_request_context for homepage/carousel-related caching. |
| openlibrary/plugins/openlibrary/api.py | Updates OPDS homepage caching to use cache_request_context instead of caching_prethread. |
| openlibrary/core/lending.py | Updates cache rehydration to use new compute_key/memcache_set signatures. |
| openlibrary/core/admin.py | Updates caching wrapper for unique login counts to new memoize options. |
|
|
||
| def copy_for_cache_thread(self): | ||
| """Create a copy of the context vars for use in a new thread.""" | ||
| # Everything the same, except x_forwarded_for and user_agent are see |
There was a problem hiding this comment.
Typo in comment: “are see” should be “are set”.
| # Everything the same, except x_forwarded_for and user_agent are see | |
| # Everything the same, except x_forwarded_for and user_agent are set |
| Extracts context from the global web.ctx and populates ContextVars. | ||
| """ | ||
| if req_context_override is not None: | ||
| req_context.set(req_context_override) |
There was a problem hiding this comment.
When req_context_override is provided, this returns after only setting req_context. For cache refresh threads, this leaves request_context.site unset (so request_context.site.get() can raise LookupError) and doesn’t synchronize legacy web.ctx fields like web.ctx.lang which some templates/helpers still read. Consider calling setup_site() here and mirroring the relevant web.ctx attributes (e.g., lang, is_bot) from the override.
| req_context.set(req_context_override) | |
| req_context.set(req_context_override) | |
| setup_site() | |
| web.ctx.lang = req_context_override.lang | |
| web.ctx.is_bot = req_context_override.is_bot |
| self.prethread() | ||
| if background_req_context: | ||
| delegate.fakeload() | ||
| set_context_from_legacy_web_py(background_req_context) |
There was a problem hiding this comment.
In the async refresh thread, delegate.fakeload() is called but the legacy web.ctx fields aren’t restored from background_req_context (only req_context is set). Some cached codepaths/templates still read web.ctx.lang (e.g. homepage carousels via get_lang()), so the refresh can recompute the wrong-language value under a context-specific cache key. Suggest explicitly setting web.ctx.lang / web.ctx.is_bot (and initializing request_context.site) from background_req_context before calling self.update() (or have set_context_from_legacy_web_py do that when given an override).
| set_context_from_legacy_web_py(background_req_context) | |
| set_context_from_legacy_web_py(background_req_context) | |
| web.ctx.lang = getattr(background_req_context, "lang", None) | |
| web.ctx.is_bot = getattr(background_req_context, "is_bot", False) | |
| req_context.site = getattr(background_req_context, "site", None) |
| prethread=caching_prethread(), | ||
| "logins_since", | ||
| timeout=60 * 60 * 12, # 12 hours | ||
| cache_request_context=True, |
There was a problem hiding this comment.
get_unique_logins_since doesn’t appear to depend on request-specific context (lang/pd/sfw/bot), so setting cache_request_context=True will unnecessarily shard the cache per request context and reduce hit rate. Consider leaving it as the default (False) unless there’s a concrete dependency on req_context/web.ctx during async refresh.
| cache_request_context=True, |
Closes #
Technical
Testing
Screenshot
Stakeholders