Nil pointer panic in sendRequest.done() during producer reconnection via FailTimeoutMessages #1497

@lin-sh

Description

Search before asking

  • I searched in the issues and found no similar issues.

Version

v0.17.0

Minimal reproduction steps

  1. Create an async producer with SendTimeout configured
  2. Trigger sustained broker unavailability (all brokers return timeout for sends)
  3. Wait for the producer to attempt reconnection (newPartitionProducer)
  4. During reconnection, FailTimeoutMessages is called to clean up pending messages
  5. Process panics with nil pointer dereference in sendRequest.done()

Expected behavior

FailTimeoutMessages should safely fail all pending messages without panicking, even when MessageID is nil.

Actual behavior

The process panics with:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x19df0ef]

goroutine 1079 [running]:
github.com/apache/pulsar-client-go/pulsar.(*sendRequest).done(0xc0a657e2c0, {0x0, 0x0}, {0x248fea0, 0xc000736750})
    /root/go/pkg/mod/github.com/apache/pulsar-client-go@v0.17.0/pulsar/producer_partition.go:1656 +0x1cf
github.com/apache/pulsar-client-go/pulsar.(*partitionProducer).FailTimeoutMessages(0xc005436000)
    /root/go/pkg/mod/github.com/apache/pulsar-client-go@v0.17.0/pulsar/producer_partition.go:1006 +0x4ea
github.com/apache/pulsar-client-go/pulsar.newPartitionProducer in goroutine 724
    /root/go/pkg/mod/github.com/apache/pulsar-client-go@v0.17.0/pulsar/producer_partition.go:239 +0xa3b

Analysis

  • The panic occurs inside sendRequest.done() at line 1656 of producer_partition.go
  • MessageID passed to done() is nil ({0x0, 0x0} in the stack trace)
  • The faulting address is 0x28 (40 bytes past nil), which suggests a field access or method call through the nil MessageID
  • This is triggered during producer reconnection when newPartitionProducer (line 239) calls FailTimeoutMessages (line 1006) to clean up timed-out pending messages
  • The user callback is never reached: the panic happens inside done() before the callback is invoked
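
To illustrate the failure mode described above, here is a minimal sketch of how calling a method on a nil interface value panics in Go. The messageID interface, concreteID type, and callOnID helper are hypothetical stand-ins written for this example; they are not the library's types, and the sketch does not claim to reproduce what line 1656 actually does.

```go
package main

import "fmt"

// messageID is a hypothetical stand-in for the client's MessageID interface.
type messageID interface {
	EntryID() int64
}

// concreteID is a hypothetical non-nil implementation used for comparison.
type concreteID struct{ entry int64 }

func (c concreteID) EntryID() int64 { return c.entry }

// callOnID invokes a method on id and reports whether that call panicked.
func callOnID(id messageID) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	_ = id.EntryID() // panics when id is a nil interface value
	return false
}

func main() {
	fmt.Println("nil interface panicked:", callOnID(nil))
	fmt.Println("concrete value panicked:", callOnID(concreteID{entry: 7}))
}
```

The recover here works only because it is deferred in the same goroutine that makes the call, which is not an option for code running inside a library-owned goroutine.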

Context

  • We are aware of PR #1121 ("fix: normalize all send request resource release into sr.done"), which normalized sendRequest resource release into sr.done(). That fix is included in v0.17.0, but the panic still occurs.
  • The issue happens under heavy timeout load (multiple brokers simultaneously returning send timeout errors)
  • This crash kills the entire Go process because the panic occurs in a goroutine owned by the library, where application code has no way to install a recover

Suggested fix

Add a nil check for MessageID inside sendRequest.done() before accessing any of its fields, or ensure FailTimeoutMessages passes a non-nil sentinel MessageID when failing messages.

Are you willing to submit a PR?

  • [] I'm willing to submit a PR!
