Skip to content

Add a back-off for Muzzle Version Range#11433

Open
bric3 wants to merge 1 commit into
masterfrom
bdu/muzzle-version-range-back-off
Open

Add a back-off for Muzzle Version Range#11433
bric3 wants to merge 1 commit into
masterfrom
bdu/muzzle-version-range-back-off

Conversation

@bric3
Copy link
Copy Markdown
Contributor

@bric3 bric3 commented May 21, 2026

What Does This Do

Adds a back-off for muzzle version range checks.

Motivation

Several muzzle jobs are failing and just need to be retried. This creates the burden to watch for transient network failures that could be automatically recovered.

Additional Notes

Contributor Checklist

Jira ticket: [PROJ-IDENT]

Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.

@bric3 bric3 requested a review from a team as a code owner May 21, 2026 10:09
@bric3 bric3 added tag: no release notes Changes to exclude from release notes comp: tooling Build & Tooling labels May 21, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 35b427b2ef

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread buildSrc/src/main/kotlin/datadog/gradle/plugin/muzzle/MuzzleMavenRepoUtils.kt Outdated
@datadog-datadog-prod-us1

This comment has been minimized.

@bric3 bric3 force-pushed the bdu/muzzle-version-range-back-off branch from 35b427b to 6585632 Compare May 21, 2026 10:46
@bric3 bric3 requested a review from AlexeyKuznetsov-DD May 22, 2026 15:18
Copy link
Copy Markdown
Contributor

@AlexeyKuznetsov-DD AlexeyKuznetsov-DD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but WDYT about adding detection of 429 errors and log them for more visibility?
Something like this:

  private fun Throwable.isRateLimited(): Boolean {
    var t: Throwable? = this
    while (t != null) {
      val msg = t.message ?: ""
      if ("429" in msg || "Too Many Requests" in msg) return true
      t = t.cause
    }
    return false
  }

And also WDYT about fallback to real Maven Central if empty result from proxy received, Claude generated something like this:

 /**
   * Resolves the version range for a given MuzzleDirective using the provided RepositorySystem and RepositorySystemSession.
   * Retries up to 4 times with a 5 s linear backoff on both empty results and HTTP 429 responses.
   * When a 429 carries a Retry-After header, that value overrides the calculated wait.
   * If all attempts against the configured repos return empty (not rate-limited) and
   * MAVEN_REPOSITORY_PROXY is set, one additional attempt is made directly against Maven Central
   * to handle cases where the proxy has a cache miss but Central has the artifact.
   */
  fun resolveVersionRange(
    muzzleDirective: MuzzleDirective,
    system: RepositorySystem,
    session: RepositorySystemSession,
    defaultRepos: List<RemoteRepository> = defaultMuzzleRepos()
  ): VersionRangeResult {
    val directiveArtifact: Artifact = DefaultArtifact(
      muzzleDirective.group,
      muzzleDirective.module,
      muzzleDirective.classifier ?: "",
      "jar",
      muzzleDirective.versions
    )
    val repos = muzzleDirective.getRepositories(defaultRepos)

    var rateLimited = false
    var lastRateLimitException: Exception? = null

    for (attempt in 0..3) {
      val rangeRequest = VersionRangeRequest().apply {
        repositories = repos
        artifact = directiveArtifact
      }
      try {
        val range = system.resolveVersionRange(session, rangeRequest)
        if (range.lowestVersion != null && range.highestVersion != null) return range
        // empty result from proxy — wait before retrying, unless this was the last attempt
        if (attempt < 3) Thread.sleep((attempt + 1) * 5_000L)
      } catch (e: VersionRangeResolutionException) {
        if (!e.isRateLimited()) throw e
        rateLimited = true
        lastRateLimitException = e
        if (attempt < 3) {
          // honour Retry-After from the 429 response if present, otherwise linear backoff
          val retryAfter = e.retryAfterSeconds()
          val waitMs = retryAfter?.times(1000L) ?: ((attempt + 1) * 5_000L)
          logger.warn(
            "Muzzle: HTTP 429 rate limit detected for " +
              "${muzzleDirective.group}:${muzzleDirective.module} ${muzzleDirective.versions} " +
              "(attempt ${attempt + 1}/4). " +
              (if (retryAfter != null) "Retry-After: ${retryAfter}s. " else "") +
              "Waiting ${waitMs / 1000}s before retrying."
          )
          Thread.sleep(waitMs)
        } else {
          logger.warn(
            "Muzzle: HTTP 429 rate limit detected for " +
              "${muzzleDirective.group}:${muzzleDirective.module} ${muzzleDirective.versions} " +
              "(attempt ${attempt + 1}/4). All retries exhausted."
          )
        }
      }
    }

    // Explicit Maven Central fallback: when the proxy returned empty results (not a rate-limit
    // error), try Central directly — the proxy may have a cache miss while Central has the data.
    if (!rateLimited && System.getenv("MAVEN_REPOSITORY_PROXY") != null) {
      logger.warn(
        "Muzzle: proxy returned empty version range for " +
          "${muzzleDirective.group}:${muzzleDirective.module} ${muzzleDirective.versions} " +
          "after 4 attempts. Falling back to Maven Central directly."
      )
      val central = RemoteRepository.Builder("central", "default", "https://repo1.maven.org/maven2/").build()
      val fallbackRequest = VersionRangeRequest().apply {
        repositories = muzzleDirective.getRepositories(listOf(central))
        artifact = directiveArtifact
      }
      val fallback = system.resolveVersionRange(session, fallbackRequest)
      if (fallback.lowestVersion != null && fallback.highestVersion != null) return fallback
    }

    if (rateLimited) {
      throw IllegalStateException(
        "The version range resolution failed due to rate limiting (HTTP 429). " +
          "Advised course of action: Restart the job later.",
        lastRateLimitException
      )
    }
    throw IllegalStateException(
      "The version range resolution failed during report, this is not expected. " +
        "Advised course of action: Restart the job later."
    )
  }

  private fun Throwable.isRateLimited(): Boolean {
    var t: Throwable? = this
    while (t != null) {
      val msg = t.message ?: ""
      if ("429" in msg || "Too Many Requests" in msg) return true
      t = t.cause
    }
    return false
  }

  /** Returns the Retry-After value in seconds if the 429 response included it. */
  private fun Throwable.retryAfterSeconds(): Long? {
    var t: Throwable? = this
    while (t != null) {
      Regex("""[Rr]etry-[Aa]fter:\s*(\d+)""").find(t.message ?: "")
        ?.groupValues?.get(1)?.toLongOrNull()
        ?.let { return it }
      t = t.cause
    }
    return null
  }

Copy link
Copy Markdown
Contributor

@AlexeyKuznetsov-DD AlexeyKuznetsov-DD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, but consider to apply any ideas of improvements I suggested in my comment.

@bric3
Copy link
Copy Markdown
Contributor Author

bric3 commented May 22, 2026

@AlexeyKuznetsov-DD I'm not sure, the suggest code assumes there's a VersionRangeResolutionException being thrown, but I don't remember seeing one (maybe there are). Usually the retries fail, and then end up throwing our own exception. Maybe this can happen but I need to look at the aether-api doc / code.

@AlexeyKuznetsov-DD
Copy link
Copy Markdown
Contributor

@AlexeyKuznetsov-DD I'm not sure, the suggest code assumes there's a VersionRangeResolutionException being thrown, but I don't remember seeing one (maybe there are). Usually the retries fail, and then end up throwing our own exception. Maybe this can happen but I need to look at the aether-api doc / code.

@bric3 Let's try simple approach first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp: tooling Build & Tooling tag: no release notes Changes to exclude from release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants