fstrm causing the process to block on a "stuck" socket #69

@borjam

I discovered this by testing bind 9.16 and 9.18 with dnstap.

In particular I was wondering about the scenario in which the named daemon would be feeding dnstap data to a misbehaving dnstap server process via a Unix domain socket.

So I created a dnstap server process and, after named connected to the socket, I sent it a STOP signal so that it would stop consuming data from the socket.
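To illustrate what a stopped reader means at the socket level, here is a minimal Python sketch. It is not fstrm or named code: it uses a local `socketpair` in place of the real dnstap socket, and the helper name `fill_until_blocked` is made up for this example. Once the peer stops reading, the kernel buffers fill and any further blocking write would stall.

```python
import socket


def fill_until_blocked(chunk_size=4096):
    """Write to a Unix stream socket whose peer never reads.

    Returns the number of bytes the kernel accepted before a further
    write would have blocked. (Hypothetical helper, for illustration.)
    """
    writer, reader = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # Non-blocking mode lets the sketch report the stall instead of hanging.
    writer.setblocking(False)
    chunk = b"\0" * chunk_size
    sent = 0
    try:
        while True:
            sent += writer.send(chunk)
    except BlockingIOError:
        # Buffers are full: a blocking write would wait here until the
        # peer starts reading again or goes away.
        pass
    finally:
        writer.close()
        reader.close()
    return sent


if __name__ == "__main__":
    print("bytes accepted before the socket would block:", fill_until_blocked())
```

The exact byte count depends on the operating system's socket buffer sizes, so the sketch does not assume a particular value.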

After running a benchmark I saw it wasn't affecting named (it didn't stop serving queries, etc.), but when trying to do a clean shutdown of the named process (either via rndc stop or a kill -TERM), the named process hangs until I either kill the dnstap server process or resume it with a kill -CONT.

I filed a bug for bind, [https://gitlab.isc.org/isc-projects/bind9/-/issues/3382], and after verifying the issue they pointed to a bug in fstrm. Copying from their response: "Some quick analyses show that fstrm blocks on read(2), so fstrm_iothr_destroy() hangs while waiting for pthread_join() to complete."
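The pattern described in that response (a worker thread stuck in a blocking read, and a join that waits on it) can be reproduced in miniature. The Python sketch below is an analogy only, not fstrm's implementation: `join_blocked_reader` and its parameters are hypothetical names, and the read timeout is shown as one possible way to bound the wait, not as the fix fstrm uses.

```python
import socket
import threading


def join_blocked_reader(read_timeout=None, join_timeout=0.5):
    """Return True if the worker thread exited within join_timeout seconds.

    read_timeout=None models a plain blocking read(2) with no deadline.
    (Hypothetical helper, for illustration.)
    """
    worker_sock, silent_peer = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    worker_sock.settimeout(read_timeout)

    def worker():
        try:
            # Wait for a reply that the stopped peer never sends.
            worker_sock.recv(1)
        except OSError:
            # Raised when the read timeout expires.
            pass

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(join_timeout)
    finished = not thread.is_alive()

    # Cleanup: closing the peer makes a still-blocked recv() return EOF,
    # much like killing the dnstap server process unblocks named.
    silent_peer.close()
    thread.join()
    worker_sock.close()
    return finished


if __name__ == "__main__":
    print("no read timeout, join completed in time:", join_blocked_reader())
    print("0.1 s read timeout, join completed in time:",
          join_blocked_reader(read_timeout=0.1))
```

Without a deadline the join only completes once the peer goes away, which matches the behaviour observed with named; with a deadline on the read, the worker gives up on its own and the join returns.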

The bug report on ISC's GitLab contains a crude ktrace done on FreeBSD and a simple program they wrote to check the issue on fstrm.
