Fall back across blob sources when topic downloads fail by tonidero · Pull Request #3466 · RevenueCat/purchases-android

tonidero · 2026-05-12T13:57:45Z

Motivation

RemoteConfigManager previously picked a single blob source per refresh and gave up if any topic failed to download from it. With weighted multi-source distribution in place (#previous stack PR), we want to gracefully fall back to other sources when one is degraded so a refresh can still succeed.

Description

Classify topic-fetch outcomes (TopicFetcher.kt): fetchTopicIfNeeded now returns a TopicFetchResult sealed type — Success, TransientFailure (timeouts / no network — worth retrying against a different source), or InvalidatingFailure (malformed blob ref, checksum mismatch, other IO). Transient/invalidating is decided via ConnectionErrorReason.fromIOException.
Fallback loop in RemoteConfigManager.kt: introduces downloadTopicsWithFallback, which repeatedly picks the next weighted source, attempts the still-failing tasks against it, and stops as soon as everything succeeds or sources are exhausted. The most recent source's error is preserved for the completion callback so callers see the freshest failure reason rather than the first one encountered.
Sticky source selection: tracks the source that produced the last successful refresh in cachedSourceId. On the next refresh, that source is tried first (bypassing weighted selection) to keep clients on a known-good source. If the sticky source returns an invalidating failure, the cached id is cleared so a fresh weighted draw can take over next time.
Pin reset on response change: when the backend returns a RemoteConfigResponse different from the one currently cached (e.g. an updated source list, new priorities, or new weights), cachedSourceId is cleared before the fallback loop runs. This guarantees a new top-priority source can take over instead of the client staying pinned to a now-lower-priority cached source.
selectWeightedExcluding helper (WeightedSource.kt): weighted selection filtered to sources not already tried in the current refresh. Negative weights are clamped to zero so a single misconfigured entry can't poison the picker.
Refactored refresh result into a RefreshOutcome sealed class (Success / VacuousSuccess / Failure) so caching, cleanup, and cached-source clearing are driven by a single decision rather than nested null checks.

Tests

RemoteConfigManagerTest: new coverage for cross-source fallback on transient failures, exhaustion behavior, invalidating-failure short-circuit per task, sticky-source preference on subsequent refreshes, cachedSourceId invalidation when the sticky source fails with an invalidating error (rewritten with a staged QueuedRandom so the test actually distinguishes cleared/not-cleared paths), and cachedSourceId reset when the backend returns a new response.
TopicFetcherTest: updated for the new TopicFetchResult return type, including timeout/no-network → TransientFailure and checksum/malformed/other-IO → InvalidatingFailure classification.
WeightedSourceSelectionTest: new tests for selectWeightedExcluding and for negative-weight clamping behavior.

Checklist

Unit tests
If applicable, create follow-up issues for purchases-ios and hybrids

Note

Medium Risk
Changes the remote-config refresh path to retry topic downloads across multiple blob sources with sticky source selection, which can affect config update reliability and caching behavior under network failures.

Overview
Remote config refresh now falls back across blob sources when topic downloads fail, retrying only the still-failing topics until one source succeeds or all sources are exhausted; the completion error now reflects the latest failure.

TopicFetcher.fetchTopicIfNeeded now returns a TopicFetchResult (Success, TransientFailure, InvalidatingFailure) with error classification driven by ConnectionErrorReason, and RemoteConfigManager tracks a sticky cachedSourceId (prefer last-good source, clear it on invalidating failures or when the backend response changes). WeightedSource adds selectWeightedExcluding to avoid re-trying sources within a refresh, and tests are expanded/updated to cover fallback, stickiness/reset, and new result classifications.

^{Reviewed by Cursor Bugbot for commit 4650c71. Bugbot is set up for automated code reviews on this repo. Configure here.}

tonidero · 2026-05-12T13:58:13Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

When a chosen BlobSource fails, attempt the remaining topics on other sources (same-priority alternates first via weighted random, then lower priority tiers) before giving up. Cache the last successful source in memory and prefer it on subsequent refreshes, invalidating only when the cached source itself returned a non-network failure (HTTP error, checksum mismatch, malformed blob ref).

ajpallares

Wow, there's some mindbending stuff in here (or maybe I'm not having my best day 😅). I think I finally understood the logic here and I think it looks mostly ok. But I believe there are a couple of things to fix 🙏
Great job on this! 🙌

ajpallares · 2026-05-13T16:02:16Z

+
+internal fun <T : WeightedSource> List<T>.selectWeightedExcluding(
+    excludedIds: Set<String>,
+    idOf: (T) -> String,


Would it be cleaner to add id: String to the WeightedSource interface instead?

ajpallares · 2026-05-13T16:36:48Z

        val tasks = response.manifest.topics.mapNotNull { (topic, entries) ->
            val entry = entries[DEFAULT_ENTRY_ID] ?: return@mapNotNull null
            TopicTask(topic, DEFAULT_ENTRY_ID, entry)
        }
-        val firstError: PurchasesError? = if (source == null || tasks.isEmpty()) {
-            null
+        val outcome = if (response.blobSources.isEmpty() || tasks.isEmpty()) {
+            RefreshOutcome.VacuousSuccess
        } else {
-            coroutineScope {
-                tasks.map { task ->
-                    async {
-                        topicFetcher.fetchTopicIfNeeded(
-                            topic = task.topic,
-                            entryId = task.entryId,
-                            topicEntry = task.entry,
-                            source = source,
-                        )
-                    }
-                }.awaitAll().firstNotNullOfOrNull { it }
-            }
+            downloadTopicsWithFallback(response.blobSources, tasks)


This is a HUGE nit, but we could defer the calculation of tasks to only when response.blobSources is not empty (that would avoid some unnecessary work). Total nit as I said, so FFTI.

ajpallares · 2026-05-13T17:11:10Z

+            return TopicFetchResult.InvalidatingFailure(
+                PurchasesError(PurchasesErrorCode.UnexpectedBackendResponseError, message),
+            )


I think this is incorrect. Returning InvalidatingFailure here makes it so that the source is invalidated, whereas it's the blob in the topic what's invalid. In addition, since this blobRef would be the same for all sources, this would effectively mean that all sources would be invalidated for all topics only because one blobRef is invalid.

First, to avoid these confusions, I would rename InvalidatingFailure to SourceInvalidatingFailure, meaning the source is invalid. Then, for this kind of error (where the blob is invalid regardless of the source) I would introduce a new TopicFetchResult meaning "this task is bad regardless of the source", maybe calling it TaskInvalid.

ajpallares · 2026-05-13T17:14:00Z

+        val previouslyCachedSourceId = cachedSourceId
+
+        while (remaining.isNotEmpty()) {
+            val source = pickNextSource(blobSources, triedIds, previouslyCachedSourceId) ?: break


Should we add any logs when using a fallback? when we switch sources, when we fall back from the cached source, or when we exhaust all sources. TopicFetcher logs individual fetch failures but the manager's decisions are invisible.

ajpallares · 2026-05-13T17:25:56Z

+    @Volatile
+    private var cachedSourceId: String? = null
+
    fun updateRemoteConfigIfNeeded(


I think I raised this in a previous PR as well, so I'm sorry if I'm repeating myself. But I worry about race conditions here.

If caller A and caller B both invoke updateRemoteConfigIfNeeded:

Thread Main:

A: updateRemoteConfigIfNeeded() → cache is stale ✓ → scope.launch { refresh() } ← returns immediately, coroutine A queued on IO pool B: updateRemoteConfigIfNeeded() → cache is stale ✓ (A hasn't written cache yet) → scope.launch { refresh() } ← returns immediately, coroutine B queued on IO pool

Then in parallel:

IO Thread 1 (coroutine A) IO Thread 2 (coroutine B)

refresh() refresh()

getRemoteConfig() … getRemoteConfig() …

read cache?.response read cache?.response

write cachedSourceId write cachedSourceId ← race

downloadTopicsWithFallback() downloadTopicsWithFallback()

write cache write cache ← race

diskCache.write(response) diskCache.write(response) ← race

JayShortway · 2026-05-14T12:26:29Z


+    private sealed class RefreshOutcome {
+        data class Success(val source: BlobSource) : RefreshOutcome()
+        object VacuousSuccess : RefreshOutcome()


I don't think many people know what this means without looking it up (unless it's just me) 😅 Maybe just EmptySuccess?

first time I've seen Vacuous as well 😄

Haha so it's not just me 😛

That’s… true 🤣 will fix. Thanks!

Hello Toni from the plane! 👋 ✈️

It was the first time for me as well. And then, I just saw this
https://github.com/RevenueCat/purchases-ios/blob/fed4baea51cb961f6238f068aa39eb0404b33316/RulesEngine/Operators/LogicOperators.swift#L33
It seems like AI likes this term 😄

Haha it's the new "delve"

tonidero mentioned this pull request May 12, 2026

Select blob source by priority and weighted random #3458

Merged

4 tasks

tonidero added the pr:other label May 12, 2026

tonidero force-pushed the toniricodiez/fall-back-across-blob-sources branch from 47a4da1 to 91699e8 Compare May 13, 2026 08:49

tonidero force-pushed the toniricodiez/select-blob-source-by-priority-and-weight branch from a29029f to 2172324 Compare May 13, 2026 08:49

tonidero force-pushed the toniricodiez/fall-back-across-blob-sources branch from 91699e8 to d88dd12 Compare May 13, 2026 09:40

tonidero force-pushed the toniricodiez/select-blob-source-by-priority-and-weight branch from 337c191 to ff1124a Compare May 13, 2026 11:09

tonidero force-pushed the toniricodiez/fall-back-across-blob-sources branch 2 times, most recently from b32ecf1 to 1b90c96 Compare May 13, 2026 11:52

tonidero marked this pull request as ready for review May 13, 2026 12:40

tonidero requested a review from a team as a code owner May 13, 2026 12:40

tonidero force-pushed the toniricodiez/select-blob-source-by-priority-and-weight branch from ff1124a to 9d28ec1 Compare May 13, 2026 13:23

tonidero force-pushed the toniricodiez/fall-back-across-blob-sources branch from 1b90c96 to 3049129 Compare May 13, 2026 13:24

tonidero force-pushed the toniricodiez/select-blob-source-by-priority-and-weight branch from 9d28ec1 to 98ba254 Compare May 13, 2026 15:19

tonidero force-pushed the toniricodiez/fall-back-across-blob-sources branch from 3049129 to 6a85f55 Compare May 13, 2026 15:19

Base automatically changed from toniricodiez/select-blob-source-by-priority-and-weight to main May 13, 2026 16:02

tonidero added 2 commits May 13, 2026 18:12

Remove outdated tests

4650c71

tonidero force-pushed the toniricodiez/fall-back-across-blob-sources branch from 6a85f55 to 4650c71 Compare May 13, 2026 16:12

ajpallares reviewed May 13, 2026

View reviewed changes

JayShortway reviewed May 14, 2026

View reviewed changes

IO Thread 1 (coroutine A)	IO Thread 2 (coroutine B)
`refresh()`	`refresh()`
`getRemoteConfig()` …	`getRemoteConfig()` …
read `cache?.response`	read `cache?.response`
write `cachedSourceId`	write `cachedSourceId` ← race
`downloadTopicsWithFallback()`	`downloadTopicsWithFallback()`
write `cache`	write `cache` ← race
`diskCache.write(response)`	`diskCache.write(response)` ← race

Conversation

tonidero commented May 12, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Description

Tests

Checklist

Uh oh!

tonidero commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ajpallares left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ajpallares May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

tonidero commented May 12, 2026 •

edited by cursor Bot

Loading

tonidero commented May 12, 2026 •

edited

Loading

ajpallares May 17, 2026 •

edited

Loading