
Add core dump support to Docker Compose services#35

Open
RiteshKtiet wants to merge 4 commits into osdldbt:main from RiteshKtiet:add-core-dump-support

Conversation

@RiteshKtiet

Hey @markwkm, with reference to issue #34, I have implemented a minimal solution for core dump support:

Changes:

  • Add ulimits: core: -1 to broker, market, and driver services
  • Add ./cores:/cores volume mount to persist core dumps on host
  • Add ulimit -c unlimited in startup scripts
  • Update .gitignore and .dockerignore to exclude cores/
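
As a sketch, the first two changes might look like the following in `compose.yaml` (the `broker` service is one of the PR's services; the same block would apply to `market` and `driver`):

```yaml
services:
  broker:
    ulimits:
      core: -1            # -1 = no limit on core file size
    volumes:
      - ./cores:/cores    # persist dumps from the container to the host
```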

If you are using WSL, you'll need to configure the host:

sudo sysctl -w kernel.core_pattern="$(pwd)/cores/core.%e.%p.%t"

Another option is using privileged: true, but I suspect that would be less secure.

Test:

docker compose exec broker bash -c 'cat > /tmp/crash.c << "EOF"
#include <stdio.h>
int main() { int *p = NULL; *p = 42; return 0; }
EOF
gcc -g /tmp/crash.c -o /tmp/crash && /tmp/crash'

ls -lh cores/  # Should show core dump file

I hope this serves the purpose. Should I add documentation if this approach is acceptable?

- Enable unlimited core dumps via ulimits in broker, market, and driver services
- Add volume mount for ./cores directory to persist core dumps on host
- Add ulimit -c unlimited in service startup scripts
- Update .gitignore to exclude cores/ directory
- Update .dockerignore to exclude cores/ directory from build context
- Add ulimit configuration to base image bashrc

This allows automatic core dump generation when services crash.

Host configuration may be required on systems with systemd-coredump
or other crash handlers. Will add it in the docs later on.
@markwkm
Contributor

markwkm commented Mar 14, 2026

I hope this serves the purpose. Should I add documentation if this approach is acceptable?

Two items to work on. You should add documentation; no need to ask if it's acceptable. Also, I don't think this takes into account my comment in #34 about the host and guest OS matching.

@Riteshk1314

Hey @markwkm, firstly sorry for the delay. I have made a few changes which I feel will address all the issues.

When a binary crashes inside a container, core dumps are automatically captured and persisted to ./cores/ on the host. Both container-side and local (host-side) debugging are supported, even when the host and container run different distros.

Here are the changes I have made in the Docker files:

Key Changes

| Change | Why |
| --- | --- |
| `gdb` + `libc6-dbg` added to base image | Provides a debugger and glibc debug symbols inside the container. |
| `-DCMAKE_BUILD_TYPE=RelWithDebInfo` on all services | Embeds debug symbols so GDB can show function names, source files, and line numbers instead of raw addresses. |
| `ulimits: core: -1` on broker, market, load, driver | Docker defaults to `ulimit -c 0` (discard core dumps). This removes that limit. |
| `./cores:/cores` volume mount | Persists core dumps to the host so they survive container restarts. |
| `cd /cores && ulimit -c unlimited` before each `exec` | Sets the working directory so `core_pattern=core.%e.%p` writes dumps into the mounted volume, and opts the shell into generating dumps. |
| Binary + shared library copy into `./cores/sysroot/` | Enables local debugging by providing the exact binary and container libraries to GDB on the host. |

The database service is intentionally excluded because PostgreSQL manages its own crash handling via docker-entrypoint.sh.
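
As a minimal sketch (not the PR's exact script), the per-service startup sequence described above looks like this; `SERVICE_BIN` is a placeholder I introduced so the sketch runs anywhere, whereas the PR's services exec binaries like `/opt/egen/bin/BrokerageHouseMain`:

```shell
#!/bin/sh
# Sketch of the startup sequence: change into the dump directory,
# lift the core-size limit for this shell, then exec the service.
SERVICE_BIN="${SERVICE_BIN:-/bin/true}"  # stand-in for the real service binary
mkdir -p /tmp/cores && cd /tmp/cores     # the real services cd into /cores
ulimit -c unlimited                      # remove the soft core-size limit
ulimit -c                                # shows the new limit ("unlimited" if the hard limit allows)
exec "$SERVICE_BIN"                      # hand the process over to the service
```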

How the Host/Guest OS Mismatch Is Solved

A core dump is just a memory snapshot. To read it, GDB needs the exact binary and the exact shared libraries that were loaded when the crash happened. If the host runs a different distro, say Ubuntu 24.04, while the container runs Debian Bookworm, the library versions differ (libc, libssl, libpq, etc.). Loading the wrong libraries produces garbage backtraces like:

#0  two_way_long_needle () at str-two-way.h:438
#1  init () at fmtmsg.c:266
Backtrace stopped: corrupt stack?

We solve this by:

  1. Copying the binary to ./cores/ at service startup.
  2. Copying every shared library (found via ldd) into ./cores/sysroot/, preserving the container's full directory layout:
    ./cores/sysroot/
    ├── lib/x86_64-linux-gnu/
    │   ├── libc.so.6          (container's version)
    │   ├── libstdc++.so.6
    │   └── ...
    ├── usr/lib/x86_64-linux-gnu/
    │   └── libpq.so.5
    └── opt/egen/bin/
        └── BrokerageHouseMain
    
  3. Using set sysroot when running GDB locally. This tells GDB to prefix all library paths with the sysroot directory, so it loads ./cores/sysroot/lib/x86_64-linux-gnu/libc.so.6 instead of the host's /lib/x86_64-linux-gnu/libc.so.6.
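
Steps 1 and 2 could be sketched as a small script like the one below; `/bin/ls` stands in for the real service binary so the sketch is self-contained, and it assumes a standard `ldd` whose output contains absolute library paths:

```shell
# Sketch of the sysroot copy (steps 1-2 above). BIN is a stand-in;
# the PR copies binaries like /opt/egen/bin/BrokerageHouseMain.
BIN=/bin/ls
DEST=./cores/sysroot

mkdir -p "$DEST"
cp "$BIN" ./cores/

# ldd lines look like: "libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x...)"
# Keep every absolute path, then copy it preserving the directory layout.
ldd "$BIN" | awk '{for (i = 1; i <= NF; i++) if ($i ~ /^\//) print $i}' |
while read -r lib; do
  mkdir -p "$DEST$(dirname "$lib")"
  cp "$lib" "$DEST$lib"
done
```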

Thus, a clean backtrace regardless of host distro:

#0  ?? () from ./sysroot/lib/x86_64-linux-gnu/libc.so.6
#1  raise () from ./sysroot/lib/x86_64-linux-gnu/libc.so.6
#2  abort () from ./sysroot/lib/x86_64-linux-gnu/libc.so.6
#3  main (argc=9, argv=0x7ffc9618c628) at BrokerageHouseMain.cpp:122

Usage

1. Host prerequisite (one-time)

sudo sysctl -w kernel.core_pattern=core.%e.%p
# Ubuntu only — disable Apport if it intercepts core dumps:
sudo systemctl disable apport && sudo systemctl stop apport
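
Note that sysctl -w only lasts until reboot. To persist the pattern (assuming a distro that reads /etc/sysctl.d/; the filename below is illustrative), the same setting can live in a config file:

```
# /etc/sysctl.d/60-core-pattern.conf (illustrative filename)
kernel.core_pattern = core.%e.%p
```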

2. Start services

docker compose up -d
docker compose run load           # first time only
docker compose run driver -d 120  # run workload

3. After a crash

ls ./cores/
# BrokerageHouseMain  core.BrokerageHouseM.483  sysroot/

4a. Debug inside the container (easier)

docker compose exec broker \
    gdb /opt/egen/bin/BrokerageHouseMain /cores/core.BrokerageHou.<pid>

4b. Debug locally on the host

sudo chmod 644 ./cores/core.BrokerageHou.<pid>
cd cores/
gdb -ex "set sysroot ./sysroot" \
    ./BrokerageHouseMain ./core.BrokerageHou.<pid>

5. Inside GDB

bt full                 # full backtrace with variable values
info threads            # list all threads
thread apply all bt     # backtrace for every thread

I hope this meets all your expectations (although I have a feeling this is a slightly complex solution). Let me know if I should make any changes, and I'll do them ASAP!

@markwkm
Contributor

markwkm commented Mar 22, 2026

Yeah, the complexity bothers me a little, because it doesn't look natural. i.e. Docker really isn't meant to be used like that. Yet I say that as someone not particularly Docker savvy.

I still wonder, though, if there is a way to fall through to coredumpctl to output or save any cores. While I like the sound of that idea, I don't know whether it's realistic to expect it to be doable.

@Riteshk1314

Thanks for the feedback! I have tried to simplify the approach a little.

On the coredumpctl question: I tested it, and here is how it works.
Since kernel.core_pattern is a kernel-level setting shared across all containers, crashes inside containers are captured automatically by the host's systemd-coredump. No special host setup needed.

Debugging workflow

When a container crashes, coredumpctl list on the host shows the crash with the correct executable name and the Docker control group confirming it came from the container.

coredumpctl info <PID> gives the signal and a partial backtrace, but only a partial one, because the binary lives inside the container, so the host cannot resolve debug symbols to get full function names and line numbers.

To get the full backtrace, extract the core with
sudo coredumpctl dump <PID> -o /tmp/core.file and debug inside the container where the binary and libs match the core exactly:

  1. List crashes: coredumpctl list
  2. Inspect signal and partial backtrace: coredumpctl info <PID>
  3. Extract the core: sudo coredumpctl dump <PID> -o /tmp/core.file
  4. Debug inside the container:
docker compose run --no-deps --rm \
  -v /tmp/core.file:/tmp/core.file \
  broker \
  gdb /opt/egen/bin/BrokerageHouseMain /tmp/core.file

Basically: “spin up a fresh broker container, bring the core file into it, and launch GDB where everything matches.”

This still uses Docker in a slightly “creative” way. Let me know if this bothers you, and I'll refactor the PR in that case.
Full workflow also documented as comments at the top of compose.yaml.

@markwkm
Contributor

markwkm commented Mar 24, 2026

It would be additionally helpful if you commented on your experiences about using the various methods you are proposing.

@Riteshk1314

Hey @markwkm, here's what I experienced testing the different approaches:

Sysroot copy (earlier iteration)

This worked when host and container distros didn't match: I copied the binary + shared libs (found via ldd) into ./cores/sysroot/, then used set sysroot in GDB on the host. I got clean backtraces, but honestly it felt hacky. You have to redo the copy every time you rebuild the image, and having a partial rootfs sitting next to your core files isn't great. Fine for a one-off debug session, but not something I'd want baked into a default compose setup.

coredumpctl + docker compose run (current approach)

This was a better experience. On my Ubuntu 22.04 setup, crashes inside containers just showed up in coredumpctl list automatically, since kernel.core_pattern is kernel-level. I did coredumpctl dump to extract the core, mounted it into the container, and GDB gave clean traces right away; no symbol-mismatch headaches, since the container has the exact binary and libs that produced the core.

One thing to note: on Ubuntu desktop, Apport likes to hijack core_pattern, so I had to disable that first. I noted it in the compose.yaml comments.

On WSL

This was the messiest of the three. WSL2 doesn't run systemd by default, so coredumpctl just isn't there. I had to manually set kernel.core_pattern to a writable path, and even then the behavior felt flaky compared to a native Linux host: cores sometimes didn't land where expected until I got the path and permissions just right. It works once you get past the setup, but it was definitely the least smooth experience.

Overall, method 2 was better; it piggybacks on what the host already does.
I'll be happy to work on any changes you suggest.
