Skip to content

fix: handle dpkg lock contention during apt installs#110

Merged
dushimsam merged 4 commits intomainfrom
fix/dpkg-lock
Mar 16, 2026
Merged

fix: handle dpkg lock contention during apt installs#110
dushimsam merged 4 commits intomainfrom
fix/dpkg-lock

Conversation

@dushimsam
Copy link
Copy Markdown
Collaborator

@dushimsam dushimsam commented Mar 16, 2026

fix: handle dpkg lock contention during apt installs

Problem

During agent installation on freshly provisioned VMs, apt-get operations were failing immediately when unattended-upgrades was running in the background holding the dpkg frontend lock:

E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 1952 (unattended-upgr)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
Error: Failed to install prerequisites
Retrying Console Agent...

This was a consistent pattern on Ubuntu 24.04 (noble) VMs — During VM provisioning triggering our agent installation, but unattended-upgrades is still running in parallel, holding the dpkg lock. Since our apt-get calls had no lock-wait configured, they failed instantly and fell into a tight retry loop, exhausting all attempts before unattended-upgrades had even finished.

Root Cause

Three script paths were affected:

  • packaging/scripts/install.sh — the direct package installer (apt-get update had no lock timeout at all; only apt-get install had a partial DPkg::Lock::Timeout=300)
  • packaging/scripts/sc-console-agent-updater.sh — the auto-updater (clean, update, and install -f had no lock timeout)
  • setup_repo.sh — the bootstrap installer called during VM provisioning (all apt-get calls had no lock timeout, including the "Installing prerequisites" step visible in the error logs above)

Fix

Introduced a shared apt_get_with_retry() wrapper in each script that:

  1. Passes DPkg::Lock::Timeout=300 to every apt-get invocation, making apt wait for the lock to be released rather than failing immediately
  2. Retries up to 5 times with a 15s delay between attempts for additional resilience against transient failures

All three scripts now route every apt-get operation (update, install, clean, install -f, install --reinstall) through this wrapper.

Why 300 seconds?

unattended-upgrades on Ubuntu typically holds the dpkg frontend lock for at most ~75 seconds during a normal run (package download + configure phase on a freshly booted VM). Setting the timeout to 300 seconds gives a comfortable 4× safety margin over that maximum, ensuring apt will always wait out the lock rather than fail. Combined with up to 5 retries (25 minutes total max window), this makes the installation resilient even in worst-case scenarios such as slow mirrors or large upgrade batches.

All timeout/retry values are tunable via environment variables if needed:

Variable Default Purpose
SC_APT_LOCK_TIMEOUT 300 Seconds apt waits for dpkg lock
SC_APT_MAX_RETRIES 5 Max retry attempts per apt operation
SC_APT_RETRY_DELAY 15 Seconds between retries

Files Changed

  • packaging/scripts/install.sh
  • packaging/scripts/sc-console-agent-updater.sh
  • setup_repo.sh

Fixes #1264

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Mar 16, 2026

Greptile Summary

Adds dpkg lock contention handling across all three shell scripts that run apt-get operations, preventing immediate failures when unattended-upgrades holds the lock on freshly provisioned VMs.

  • packaging/scripts/install.sh: Adds a well-structured apt_get_with_retry() wrapper using "$@" argument passing with configurable lock timeout (300s), retry count (5), and retry delay (15s). All apt-get calls now go through this wrapper.
  • packaging/scripts/sc-console-agent-updater.sh: Adds a simpler apt_get_with_lock() wrapper (lock timeout only, no additional retry) since the script already has its own retry loops. All apt-get calls updated.
  • setup_repo.sh: Adds an eval-based apt_get_with_retry() to fit the existing log_and_run pattern. Functional but less robust than the install.sh approach.
  • All timeout/retry values are tunable via environment variables (SC_APT_LOCK_TIMEOUT, SC_APT_MAX_RETRIES, SC_APT_RETRY_DELAY).

Confidence Score: 4/5

  • This PR is safe to merge — it adds resilience to apt operations without changing any functional behavior.
  • The changes are straightforward defensive wrappers around existing apt-get calls. The install.sh and updater implementations are clean and correct. The setup_repo.sh implementation works but uses eval-based argument passing which is less robust. No breaking changes or regressions expected.
  • setup_repo.sh uses a different (eval-based) implementation pattern that is worth understanding but is functionally correct for the current call sites.

Important Files Changed

Filename Overview
packaging/scripts/install.sh Clean implementation of apt_get_with_retry using "$@" for safe argument passing. All apt-get calls now routed through the wrapper. No issues found.
packaging/scripts/sc-console-agent-updater.sh Simple apt_get_with_lock wrapper adds lock timeout to all apt-get calls. Correctly defers retry logic to the existing retry loops in the script. No issues found.
setup_repo.sh Uses eval-based apt_get_with_retry to fit within the existing log_and_run pattern. Functional but less robust than the "$@" approach in install.sh. Works correctly for current call sites.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[apt_get_with_retry called] --> B{attempt ≤ MAX_RETRIES?}
    B -->|Yes| C[Run apt-get with DPkg::Lock::Timeout]
    C --> D{Command succeeded?}
    D -->|Yes| E[Return 0 - success]
    D -->|No| F{More attempts left?}
    F -->|Yes| G[Log retry message]
    G --> H[Sleep RETRY_DELAY seconds]
    H --> I[Increment attempt]
    I --> B
    F -->|No| J[Return 1 - failure]
    B -->|No| J
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: setup_repo.sh
Line: 505-523

Comment:
**Inconsistent retry wrapper uses `eval`**

The `apt_get_with_retry` in this file takes a single string argument and uses `eval` to execute it (line 510), whereas `install.sh` uses the safer `"$@"` pattern for argument passing. This `eval`-based approach is fragile because shell metacharacters in `$apt_args` (like the `> /dev/null` redirects baked into the call sites at lines 533-534) are interpreted by `eval`, making the behavior harder to reason about and maintain.

I understand this is constrained by the existing `log_and_run` function which already uses `eval`. However, if this script is ever refactored, consider aligning all three implementations to use `"$@"` for safer argument handling.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 9578688

Comment thread setup_repo.sh
Copy link
Copy Markdown

@izabayo7 izabayo7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dushimsam dushimsam merged commit e368d81 into main Mar 16, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants