Skip to content

Replace caching prethread with simpler request context copy-over#12414

Draft
cdrini wants to merge 1 commit intointernetarchive:masterfrom
cdrini:refactor/replace-prethread-with-req-context-copy
Draft

Replace caching prethread with simpler request context copy-over#12414
cdrini wants to merge 1 commit intointernetarchive:masterfrom
cdrini:refactor/replace-prethread-with-req-context-copy

Conversation

@cdrini
Copy link
Copy Markdown
Collaborator

@cdrini cdrini commented Apr 20, 2026

  • avoids the complicated closure capture
  • avoids cache leaking when new request context variables are introduced
  • fixes cache leak with is_bot and is_recognized_bot
  • fixes various other cache leaks where the caching prethread method was used without the corresponding variables being added to the cache key.

Closes #

Technical

Testing

Screenshot

Stakeholders

@cdrini cdrini force-pushed the refactor/replace-prethread-with-req-context-copy branch 3 times, most recently from e0b7241 to 01fe6ad Compare April 20, 2026 08:02
- avoids the complicated closure capture
- avoids cache leaking when new request context variables are introduced
- fixes cache leak with is_bot and is_recognized_bot
@cdrini cdrini force-pushed the refactor/replace-prethread-with-req-context-copy branch from 01fe6ad to c05f1a8 Compare April 20, 2026 08:06
@cdrini cdrini requested a review from Copilot April 20, 2026 08:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors memcache async-refresh behavior to avoid the previous caching_prethread closure approach by carrying request context explicitly into cache keys and (intended) background refresh threads, preventing cache leakage across language/bot/cookie contexts.

Changes:

  • Add RequestContextVars.copy_for_cache_thread() and allow set_context_from_legacy_web_py() to accept an override context.
  • Extend memcache_memoize with cache_request_context support; update key computation + async refresh plumbing accordingly.
  • Update multiple call sites/tests to the new caching approach and new cache helper APIs (memcache_delete_by_args, new memcache_set/memcache_get signatures).

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
openlibrary/utils/request_context.py Adds cache-thread context copy helper and override support for legacy web.py context setup.
openlibrary/core/cache.py Reworks memcache memoization keys/async refresh to optionally include request context and forward it to refresh threads.
openlibrary/tests/core/test_cache.py Updates tests for new memoize signatures/behavior and default hashing.
openlibrary/plugins/upstream/utils.py Switches cached macro rendering to new cache_request_context approach.
openlibrary/plugins/upstream/tests/test_utils.py Removes caching_prethread patching now that it’s no longer used.
openlibrary/plugins/upstream/mybooks.py Updates feed caching to use cache_request_context instead of caching_prethread.
openlibrary/plugins/upstream/borrow.py Updates cache invalidation calls to memcache_delete_by_args.
openlibrary/plugins/openlibrary/home.py Removes caching_prethread and uses cache_request_context for homepage/carousel-related caching.
openlibrary/plugins/openlibrary/api.py Updates OPDS homepage caching to use cache_request_context instead of caching_prethread.
openlibrary/core/lending.py Updates cache rehydration to use new compute_key/memcache_set signatures.
openlibrary/core/admin.py Updates caching wrapper for unique login counts to new memoize options.


def copy_for_cache_thread(self):
"""Create a copy of the context vars for use in a new thread."""
# Everything the same, except x_forwarded_for and user_agent are see
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: “are see” should be “are set”.

Suggested change
# Everything the same, except x_forwarded_for and user_agent are see
# Everything the same, except x_forwarded_for and user_agent are set

Copilot uses AI. Check for mistakes.
Extracts context from the global web.ctx and populates ContextVars.
"""
if req_context_override is not None:
req_context.set(req_context_override)
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When req_context_override is provided, this returns after only setting req_context. For cache refresh threads, this leaves request_context.site unset (so request_context.site.get() can raise LookupError) and doesn’t synchronize legacy web.ctx fields like web.ctx.lang which some templates/helpers still read. Consider calling setup_site() here and mirroring the relevant web.ctx attributes (e.g., lang, is_bot) from the override.

Suggested change
req_context.set(req_context_override)
req_context.set(req_context_override)
setup_site()
web.ctx.lang = req_context_override.lang
web.ctx.is_bot = req_context_override.is_bot

Copilot uses AI. Check for mistakes.
Comment thread openlibrary/core/cache.py
self.prethread()
if background_req_context:
delegate.fakeload()
set_context_from_legacy_web_py(background_req_context)
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the async refresh thread, delegate.fakeload() is called but the legacy web.ctx fields aren’t restored from background_req_context (only req_context is set). Some cached codepaths/templates still read web.ctx.lang (e.g. homepage carousels via get_lang()), so the refresh can recompute the wrong-language value under a context-specific cache key. Suggest explicitly setting web.ctx.lang / web.ctx.is_bot (and initializing request_context.site) from background_req_context before calling self.update() (or have set_context_from_legacy_web_py do that when given an override).

Suggested change
set_context_from_legacy_web_py(background_req_context)
set_context_from_legacy_web_py(background_req_context)
web.ctx.lang = getattr(background_req_context, "lang", None)
web.ctx.is_bot = getattr(background_req_context, "is_bot", False)
req_context.site = getattr(background_req_context, "site", None)

Copilot uses AI. Check for mistakes.
Comment thread openlibrary/core/admin.py
prethread=caching_prethread(),
"logins_since",
timeout=60 * 60 * 12, # 12 hours
cache_request_context=True,
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_unique_logins_since doesn’t appear to depend on request-specific context (lang/pd/sfw/bot), so setting cache_request_context=True will unnecessarily shard the cache per request context and reduce hit rate. Consider leaving it as the default (False) unless there’s a concrete dependency on req_context/web.ctx during async refresh.

Suggested change
cache_request_context=True,

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants