nixos-tests: disable SQLite WAL to prevent SIGBUS in CI #1615

Open
amaanq wants to merge 1 commit into NixOS:master from obsidiansystems:sigbus-fix

@amaanq amaanq commented Mar 30, 2026

Problem

SQLite in WAL mode mmaps a shared memory file that can fault under concurrent
access, which kills nix with SIGBUS.

Solution

Disabling WAL eliminates the shared memory file entirely, so the fault can no longer occur.
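To make the effect of the journal mode concrete, here is a minimal sketch using Python's standard `sqlite3` module rather than Nix itself. The helper name `shm_file_created` and the scratch database are hypothetical, and the sketch does not show how this PR turns WAL off; it only checks whether SQLite creates the `-shm` shared memory file next to the database in each mode.

```python
import os
import sqlite3
import tempfile


def shm_file_created(journal_mode: str) -> bool:
    """Open a fresh database in the given journal mode, write to it, and
    report whether SQLite created the `-shm` shared memory file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "db.sqlite")
        # isolation_level=None gives autocommit, so the PRAGMA does not run
        # inside an implicit transaction.
        conn = sqlite3.connect(path, isolation_level=None)
        try:
            conn.execute(f"PRAGMA journal_mode={journal_mode}")
            conn.execute("CREATE TABLE t (x INTEGER)")
            conn.execute("INSERT INTO t VALUES (1)")
            # Check while the connection is still open: SQLite cleans up the
            # -wal and -shm files when the last connection closes cleanly.
            return os.path.exists(path + "-shm")
        finally:
            conn.close()


if __name__ == "__main__":
    print("WAL    creates -shm:", shm_file_created("WAL"))
    print("DELETE creates -shm:", shm_file_created("DELETE"))
```

With a rollback journal (`DELETE`) there is no shared memory file to mmap, which is the property this PR relies on.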

Additional Context

This was the cause of the spurious CI failures we've been seeing recently (1 2)

I spammed CI in my fork with a SIGBUS handler installed to actually find the root cause of this. See here :) https://github.com/amaanq/hydra/actions/runs/23701271798/job/69045258619?pr=1#step:5:8309

Relevant snippet
( STDERR )  job 56    *** SIGBUS (Bus error) at address 0x0000fffff75f4000
( STDERR )  job 56    hydra-queue-runner(+0x39f434) [0xaaaaaae3f434]
( STDERR )  job 56    linux-vdso.so.1(__kernel_rt_sigreturn+0x0) [0xfffff7ffa850]
( STDERR )  job 56    /nix/store/2z8w3q6z3yskyj2ng3bga5h7x30sxdab-glibc-2.40-66/lib/libc.so.6(+0xaedb8) [0xfffff73dedb8]
( STDERR )  job 56    /nix/store/1b3almilx7nyrqacfrgnsmayja1dyrdc-sqlite-3.50.4/lib/libsqlite3.so(+0x8251c) [0xfffff707251c]
( STDERR )  job 56    /nix/store/1b3almilx7nyrqacfrgnsmayja1dyrdc-sqlite-3.50.4/lib/libsqlite3.so(+0x828fc) [0xfffff70728fc]
( STDERR )  job 56    /nix/store/1b3almilx7nyrqacfrgnsmayja1dyrdc-sqlite-3.50.4/lib/libsqlite3.so(+0xae6b4) [0xfffff709e6b4]
( STDERR )  job 56    /nix/store/1b3almilx7nyrqacfrgnsmayja1dyrdc-sqlite-3.50.4/lib/libsqlite3.so(+0xaf338) [0xfffff709f338]
( STDERR )  job 56    /nix/store/1b3almilx7nyrqacfrgnsmayja1dyrdc-sqlite-3.50.4/lib/libsqlite3.so(+0xeb5c4) [0xfffff70db5c4]
( STDERR )  job 56    /nix/store/1b3almilx7nyrqacfrgnsmayja1dyrdc-sqlite-3.50.4/lib/libsqlite3.so(sqlite3_step+0x29c) [0xfffff70e18bc]
( STDERR )  job 56    /nix/store/m016npyg40c9qin11s0qwhv1rzibvgv7-nix-store-2.34.1/lib/libnixstore.so.2.34.1(_ZN3nix10SQLiteStmt3Use4nextEv+0x34) [0xfffff7c934b4]
( STDERR )  job 56    /nix/store/m016npyg40c9qin11s0qwhv1rzibvgv7-nix-store-2.34.1/lib/libnixstore.so.2.34.1(_ZN3nix10LocalStore21queryPathInfoInternalERNS0_5StateERKNS_9StorePathE+0xb8) [0xfffff7c2c998]
( STDERR )  job 56    /nix/store/m016npyg40c9qin11s0qwhv1rzibvgv7-nix-store-2.34.1/lib/libnixstore.so.2.34.1(_ZN3nix10LocalStore21queryPathInfoUncachedERKNS_9StorePathENS_8CallbackISt10shared_ptrIKNS_13ValidPathInfoEEEE+0x90) [0xfffff7c2d110]
( STDERR )  job 56    /nix/store/m016npyg40c9qin11s0qwhv1rzibvgv7-nix-store-2.34.1/lib/libnixstore.so.2.34.1(_ZN3nix5Store13queryPathInfoERKNS_9StorePathENS_8CallbackINS_3refIKNS_13ValidPathInfoEEEEE+0x3a8) [0xfffff7ca6038]
( STDERR )  job 56    /nix/store/m016npyg40c9qin11s0qwhv1rzibvgv7-nix-store-2.34.1/lib/libnixstore.so.2.34.1(_ZN3nix5Store13queryPathInfoERKNS_9StorePathE+0x110) [0xfffff7ca6490]
( STDERR )  job 56    /nix/store/m016npyg40c9qin11s0qwhv1rzibvgv7-nix-store-2.34.1/lib/libnixstore.so.2.34.1(+0x225d00) [0xfffff7c45d00]
( STDERR )  job 56    /nix/store/m016npyg40c9qin11s0qwhv1rzibvgv7-nix-store-2.34.1/lib/libnixstore.so.2.34.1(_ZN3nix5Store13topoSortPathsERKSt3setINS_9StorePathESt4lessIS2_ESaIS2_EE+0x130) [0xfffff7c3b030]
( STDERR )  job 56    /nix/store/m016npyg40c9qin11s0qwhv1rzibvgv7-nix-store-2.34.1/lib/libnixstore.so.2.34.1(_ZN3nix9copyPathsERNS_5StoreES1_RKSt3setINS_9StorePathESt4lessIS3_ESaIS3_EENS_10RepairFlagENS_13CheckSigsFlagENS_14SubstituteFlagE+0x1b0) [0xfffff7cab970]
( STDERR )  job 56    hydra-queue-runner(+0xcec488) [0xaaaaab78c488]
( STDERR )  job 56    hydra-queue-runner(+0xce22cc) [0xaaaaab7822cc]
( STDERR )  job 56    hydra-queue-runner(+0xce16ac) [0xaaaaab7816ac]
( STDERR )  job 56    hydra-queue-runner(+0x4a7098) [0xaaaaaaf47098]
( STDERR )  job 56    hydra-queue-runner(+0x38ef38) [0xaaaaaae2ef38]
( STDERR )  job 56    hydra-queue-runner(+0x317864) [0xaaaaaadb7864]
( STDERR )  job 56    hydra-queue-runner(+0xd4add8) [0xaaaaab7eadd8]
( STDERR )  job 56    hydra-queue-runner(+0xd6b4cc) [0xaaaaab80b4cc]
( STDERR )  job 56    hydra-queue-runner(+0xd6bdb4) [0xaaaaab80bdb4]
( STDERR )  job 56    hydra-queue-runner(+0xe2289c) [0xaaaaab8c289c]
( STDERR )  job 56    /nix/store/2z8w3q6z3yskyj2ng3bga5h7x30sxdab-glibc-2.40-66/lib/libc.so.6(+0x901ec) [0xfffff73c01ec]
( STDERR )  job 56    /nix/store/2z8w3q6z3yskyj2ng3bga5h7x30sxdab-glibc-2.40-66/lib/libc.so.6(+0x10034c) [0xfffff743034c]
( STDERR )  job 56    2026-03-29T04:30:45.539314Z ERROR start_bidirectional_stream: hydra_builder::grpc: stream message delivery failed: code: 'Unknown error', message: "h2 protocol error: error reading a body from connection", source: hyper::Error(Body, Error { kind: Io(Custom { kind: BrokenPipe, error: "stream closed because of a broken pipe" }) })
( STDERR )  job 56    2026-03-29T04:30:45.539342Z ERROR start_bidirectional_stream: hydra_builder::grpc: stream message delivery failed: code: 'Unknown error', message: "h2 protocol error: error reading a body from connection", source: hyper::Error(Body, Error { kind: Io(Custom { kind: BrokenPipe, error: "stream closed because of a broken pipe" }) })
( STDERR )  job 56    2026-03-29T04:30:45.563882Z ERROR process_build: hydra_builder::state: error=Import failure: `code: 'The service is currently unavailable', message: "tcp connect error", source: tonic::transport::Error(Transport, ConnectError(ConnectError("tcp connect error", [::1]:7001, Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })))` timings=BuildTimings { import_elapsed: 0ns, build_elapsed: 0ns, upload_elapsed: 0ns } drv=zym9dr516lin3f31z2kmy6pkgj0a96ka-out-is-directory.drv
( STDERR )  job 56    2026-03-29T04:30:45.563936Z ERROR hydra_builder::state: Build of zym9dr516lin3f31z2kmy6pkgj0a96ka-out-is-directory.drv failed with Import failure: `code: 'The service is currently unavailable', message: "tcp connect error", source: tonic::transport::Error(Transport, ConnectError(ConnectError("tcp connect error", [::1]:7001, Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })))`
( STDERR )  job 56    2026-03-29T04:30:45.563967Z ERROR hydra_builder::state: Failed to submit build failure info: err=code: 'Unknown error', message: "Service was not ready: transport error", retrying in=4.850542545s

Note that it is impossible to debug these crashes without such a handler, but I don't think one should be installed by default here.

Alternatives Considered

I'd considered setting exclusive locking mode for SQLite in Nix upstream via PRAGMA locking_mode = EXCLUSIVE, since that keeps the WAL-index in heap memory rather than in an mmapped shared memory file. The huge downside is that each process then locks the database, because the index cannot be shared across multiple processes. In practice I doubt it would have much of an effect, as I imagine most users aren't running many concurrent Nix processes that need database access, but in Hydra we ran into this SIGBUS precisely because our concurrent VM tests spawn multiple Nix processes.
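The trade-off of that alternative can be sketched with Python's standard `sqlite3` module. This is an illustration under assumptions, not Nix code: the function name `exclusive_wal_demo` and the scratch database are hypothetical. It sets exclusive locking before the first WAL-mode access, checks that no `-shm` file appears, and then shows a second connection being locked out.

```python
import os
import sqlite3
import tempfile


def exclusive_wal_demo() -> dict:
    """Use WAL with locking_mode=EXCLUSIVE and report the consequences."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "db.sqlite")
        first = sqlite3.connect(path, isolation_level=None)
        second = None
        try:
            # Exclusive mode has to be requested before the first WAL-mode
            # access for the WAL-index to stay in heap memory.
            first.execute("PRAGMA locking_mode=EXCLUSIVE")
            first.execute("PRAGMA journal_mode=WAL")
            first.execute("CREATE TABLE t (x INTEGER)")
            first.execute("INSERT INTO t VALUES (1)")
            shm_exists = os.path.exists(path + "-shm")

            # A second connection stands in for another process wanting
            # database access; a short timeout makes it fail fast.
            second = sqlite3.connect(path, timeout=0.1)
            try:
                second.execute("SELECT count(*) FROM t").fetchone()
                second_locked_out = False
            except sqlite3.OperationalError:
                # Typically reported as "database is locked".
                second_locked_out = True

            return {
                "shm_exists": shm_exists,
                "second_locked_out": second_locked_out,
            }
        finally:
            if second is not None:
                second.close()
            first.close()


if __name__ == "__main__":
    print(exclusive_wal_demo())
```

The second connection failing is the downside described above: the shared memory file is gone, but so is concurrent access to the database.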

I'm not particularly happy with this solution. I'd also thought of just increasing the VM's memory to 2GB, but that isn't guaranteed to fix it: it depends on whether the faults were caused by the file being truncated or by the kernel evicting the backing page.
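For readers unfamiliar with the truncation case, the following is a minimal sketch of how a shared file mapping faults once its backing file shrinks. It is only an illustration of that one hypothesis on a Unix-like system, it does not involve SQLite or Nix, and it does not settle which of the two causes applied here. The names `CHILD` and `run_truncated_mmap_child` are hypothetical. Python's `faulthandler` plays the role of the fault handler: without it the child would die with no output at all.

```python
import signal
import subprocess
import sys
import textwrap

# The child maps one page of a temporary file, truncates the file to zero
# bytes, then touches the mapping. The page no longer has backing storage,
# so the access raises SIGBUS.
CHILD = textwrap.dedent(
    """
    import faulthandler, mmap, os, tempfile
    faulthandler.enable()  # dump a traceback on SIGBUS, then re-raise it
    fd, path = tempfile.mkstemp()
    os.ftruncate(fd, mmap.PAGESIZE)
    mapping = mmap.mmap(fd, mmap.PAGESIZE)  # shared mapping by default
    os.ftruncate(fd, 0)                     # backing file shrinks underneath
    os.unlink(path)
    mapping[0]                              # faults here
    """
)


def run_truncated_mmap_child() -> subprocess.CompletedProcess:
    """Run the faulting code in a subprocess so the caller survives."""
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = run_truncated_mmap_child()
    print("killed by SIGBUS:", result.returncode == -signal.SIGBUS)
    print(result.stderr)
```

Adding memory to the VM would not help in a case like this one, since the fault comes from the file shrinking rather than from memory pressure.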

SQLite in WAL mode mmaps a shared memory file that can fault under concurrent
access, which kills nix with SIGBUS. This has occasionally been causing
spurious CI failures with no useful logs. Disabling WAL eliminates the shared
memory file entirely.