Add a metric for geo replication for tracking replicated subscriptions snapshot timeouts #21793

@lhotari

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

With geo-replication, snapshot creation for replicated subscriptions (PIP-33) can time out.
The code contains a debug log message when this happens:

log.debug("[{}] Snapshot creation timed out for {}", topic.getName(), entry.getKey());

When this happens, the subscription state isn't reflected on the remote side and a backlog builds up there.
There's no metric to detect this situation.

Solution

Add a new metric, pulsar_replicated_subscriptions_snapshot_timeouts, as a counter that resets only when the broker restarts.
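
To make the proposal concrete, here is a minimal sketch of how such a counter could behave. It is not Pulsar's actual implementation: the class `SnapshotTimeoutTracker`, its methods, and the map of pending snapshots are hypothetical names invented for illustration, and only the metric name comes from this issue. The sketch increments a monotonic counter at the point where a pending snapshot is found to have exceeded the timeout, and renders the value in Prometheus text format.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for the broker-side bookkeeping; not Pulsar's actual class.
class SnapshotTimeoutTracker {
    // Metric name proposed in this issue.
    static final String METRIC_NAME = "pulsar_replicated_subscriptions_snapshot_timeouts";

    private final long timeoutNanos;
    // Only ever incremented, so the value resets only when the process restarts.
    private final LongAdder timeouts = new LongAdder();
    // snapshotId -> start time in nanoseconds (assumed representation of pending snapshots).
    // Not thread-safe; the sketch assumes a single caller thread.
    private final Map<String, Long> pending = new LinkedHashMap<>();

    SnapshotTimeoutTracker(Duration timeout) {
        this.timeoutNanos = timeout.toNanos();
    }

    void snapshotStarted(String snapshotId, long nowNanos) {
        pending.put(snapshotId, nowNanos);
    }

    void snapshotCompleted(String snapshotId) {
        pending.remove(snapshotId);
    }

    // Drops every pending snapshot older than the timeout and counts each one.
    void cleanupTimedOut(long nowNanos) {
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowNanos - entry.getValue() > timeoutNanos) {
                it.remove();
                timeouts.increment(); // the spot where only a debug log exists today
            }
        }
    }

    long timeoutCount() {
        return timeouts.sum();
    }

    // Renders the counter in Prometheus text exposition format.
    String toPrometheusText() {
        return "# TYPE " + METRIC_NAME + " counter\n"
                + METRIC_NAME + " " + timeouts.sum() + "\n";
    }
}
```

A real implementation would probably register the counter with the broker's existing metrics machinery rather than format the text by hand; the sketch only shows where the increment would sit relative to the timeout check.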

Alternatives

No response

Anything else?

Increasing the timeout threshold from replicatedSubscriptionsSnapshotTimeoutSeconds=30 to replicatedSubscriptionsSnapshotTimeoutSeconds=60 could help resolve the situation. The metric would help detect when such a change is necessary.
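
As a rough illustration of how the counter could drive that decision, the sketch below compares two samples of the counter value. The class `TimeoutIncreaseCheck` and its methods are hypothetical and not part of Pulsar. Because the counter resets on broker restart, a sample lower than the previous one is treated as a reset rather than a negative change.

```java
// Hypothetical helper for interpreting two samples of the timeout counter.
class TimeoutIncreaseCheck {
    // Number of new timeouts between two samples. If the current value is
    // lower than the previous one, the counter was reset (broker restart),
    // so everything counted since the reset is new.
    static long increase(long previous, long current) {
        return current >= previous ? current - previous : current;
    }

    // Any new timeout suggests reviewing replicatedSubscriptionsSnapshotTimeoutSeconds.
    static boolean shouldReviewTimeout(long previous, long current) {
        return increase(previous, current) > 0;
    }
}
```

For example, samples of 3 and then 5 give an increase of 2, while samples of 5 and then 1 give an increase of 1 because the broker restarted in between.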

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Labels: type/enhancement
