# Discussion: multiple TCP connections per peer pair (#4)

## Context

The current transport (see `ARCHITECTURE.md` §4 "TCPTransport") uses **one TCP connection per peer pair**, with all Raft groups multiplexed by `group_id`. This keeps connection count O(N²) in cluster size rather than O(N² × groups), which is essential for multi-raft scaling.
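
To make the scaling argument concrete, the sketch below counts connections under both layouts. It assumes a full mesh and, as a worst case, that every Raft group has a replica on every node. `peer_pairs` and `connection_count` are illustrative names, not part of the codebase.

```python
def peer_pairs(nodes: int) -> int:
    """Number of unordered peer pairs in a full mesh of `nodes` nodes."""
    return nodes * (nodes - 1) // 2


def connection_count(nodes: int, groups: int, per_group: bool) -> int:
    """Total TCP connections in the cluster.

    per_group=False: one shared connection per peer pair (current design).
    per_group=True:  one connection per peer pair per Raft group, assuming
                     (worst case) every group is replicated on every node.
    """
    pairs = peer_pairs(nodes)
    return pairs * groups if per_group else pairs


if __name__ == "__main__":
    print(connection_count(5, 1000, per_group=False))  # 10
    print(connection_count(5, 1000, per_group=True))   # 10000
```

With 5 nodes and 1000 groups, the shared layout needs 10 connections where a per-group layout could need up to 10,000.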

The trade-off is **head-of-line blocking** on the shared connection: a slow large message on group A delays a small heartbeat on group B. The 1 MB `max_append_entries_size` cap mitigates the worst case but doesn't eliminate the underlying issue.
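
As a rough lower bound on the delay, a heartbeat queued behind one capped message has to wait at least for that message to be written to the wire. The sketch below computes that serialization time only. It assumes the 1 MB cap means 2^20 bytes, the link speeds are made-up examples, and it ignores TCP congestion control, retransmits, and receiver-side processing, all of which make the real delay worse.

```python
CAP_BYTES = 1024 * 1024  # assumed reading of the 1 MB max_append_entries_size cap


def transmit_time_ms(message_bytes: int, link_bits_per_s: float) -> float:
    """Time to serialize one message onto the link, in milliseconds."""
    return message_bytes * 8 / link_bits_per_s * 1000.0


def heartbeat_wait_ms(queued_ahead: int, message_bytes: int,
                      link_bits_per_s: float) -> float:
    """Minimum wait for a heartbeat stuck behind `queued_ahead` messages."""
    return queued_ahead * transmit_time_ms(message_bytes, link_bits_per_s)


if __name__ == "__main__":
    # Example link speeds are assumptions, not measurements.
    for label, rate in (("1 Gbit/s", 1e9), ("100 Mbit/s", 1e8)):
        wait = heartbeat_wait_ms(1, CAP_BYTES, rate)
        print(f"{label}: {wait:.2f} ms behind one capped message")
```

Under these assumptions a single capped message costs a heartbeat roughly 8 ms at 1 Gbit/s and roughly 84 ms at 100 Mbit/s, and the wait grows linearly with the number of large messages queued ahead of it.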

For workloads where individual user messages are large (e.g. AMQP brokers where messages can be many MB), this trade-off becomes a real concern.

## Mitigation paths

`ARCHITECTURE.md` §6 "What is intentionally not optimized (yet)" lays out four options for the HOL problem:

1. **Don't put bodies in the Raft log.** Keep metadata + reference in Raft; replicate bodies out-of-band. This is what RabbitMQ quorum queues do via the shared message store. Recommended for the AMQP integration. *Application-level decision; out of scope for the transport.*
2. **Multiple TCP connections per peer pair** — typically a "control" connection (heartbeats, votes, small AppendEntries) plus a "bulk" connection. Cheap to implement, retains most of the multi-raft connection-count benefit. **This issue.**
3. **gRPC / HTTP/2 streams** — partial benefit; TCP-layer HOL persists; significant complexity.
4. **QUIC** — eliminates TCP HOL but immature for this workload.
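
For comparison with option 2, here is a minimal sketch of what option 1 could look like at the application level: the Raft log entry carries only metadata and a content-addressed reference, while the body lives in a separate store. Everything here (`LogEntry`, `BodyStore`, `make_entry`, the use of SHA-256 as the reference) is hypothetical, and the sketch does not address how bodies are replicated out-of-band or what happens if a body is missing when its entry commits.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """What would go through Raft: small metadata plus a body reference."""
    body_ref: str      # SHA-256 hex digest of the body (hypothetical scheme)
    body_size: int     # size of the body in bytes
    routing_key: str   # example of application metadata


class BodyStore:
    """Stand-in for an out-of-band body store; an in-memory dict here."""

    def __init__(self) -> None:
        self._bodies: dict[str, bytes] = {}

    def put(self, body: bytes) -> str:
        ref = hashlib.sha256(body).hexdigest()
        self._bodies[ref] = body
        return ref

    def get(self, ref: str) -> bytes:
        return self._bodies[ref]


def make_entry(store: BodyStore, routing_key: str, body: bytes) -> LogEntry:
    """Store the body out-of-band and return the small entry for the log."""
    ref = store.put(body)
    return LogEntry(body_ref=ref, body_size=len(body), routing_key=routing_key)


if __name__ == "__main__":
    store = BodyStore()
    entry = make_entry(store, "orders", b"x" * 5_000_000)
    # The entry stays small no matter how large the body is.
    print(entry.body_size, len(entry.body_ref))
```

With entries of this shape, AppendEntries messages stay small regardless of body size, which is why this option sidesteps HOL blocking on the transport rather than fixing it.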

## Open questions for discussion

- **Is option 2 the right fix, or should we lean on option 1 instead** for AMQP integrations and accept the current single-connection design for everything else?
- **If we add a second connection, what's the routing rule?**
  - Size-based (entries > N bytes go on the bulk connection)?
  - Type-based (heartbeats + votes always on control)?
  - Priority-based (a `priority` field on `Message`)?
- **Per-group or per-peer split?** Per-peer is simpler; per-group affinity might give better isolation but multiplies connection count.
- **Connection lifecycle.** Both connections need symmetric lifecycle, error handling, and reconnect logic, so the existing per-peer fiber model is effectively duplicated for each peer.
- **Interaction with heartbeat aggregation** (see related issue). If heartbeats are batched into one small message per peer per interval, the HOL pressure on the control plane drops significantly, possibly making a second connection less urgent.
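
To ground the routing-rule question, here is one possible rule that combines the type-based and size-based options: heartbeats and votes always go on the control connection, and AppendEntries goes on the bulk connection only above a size threshold. This is a sketch for discussion, not a proposal. The `Message` fields, the `Kind` and `Channel` enums, and the 64 KiB threshold are all assumptions and may not match the real `Message` type.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    CONTROL = "control"
    BULK = "bulk"


class Kind(Enum):
    HEARTBEAT = "heartbeat"
    VOTE = "vote"
    APPEND_ENTRIES = "append_entries"


@dataclass(frozen=True)
class Message:
    """Hypothetical shape; the real Message type may differ."""
    group_id: int
    kind: Kind
    payload_size: int  # bytes


# Hypothetical threshold; the right value depends on the workload.
BULK_THRESHOLD_BYTES = 64 * 1024


def route(msg: Message, threshold: int = BULK_THRESHOLD_BYTES) -> Channel:
    """Type first, then size: latency-sensitive kinds never share the bulk path."""
    if msg.kind in (Kind.HEARTBEAT, Kind.VOTE):
        return Channel.CONTROL
    if msg.payload_size > threshold:
        return Channel.BULK
    return Channel.CONTROL


if __name__ == "__main__":
    print(route(Message(1, Kind.HEARTBEAT, 0)))
    print(route(Message(2, Kind.APPEND_ENTRIES, 1024 * 1024)))
```

One consequence worth discussing: with a size-based split, AppendEntries for the same group can travel on both connections, so ordering between them is no longer guaranteed by TCP. Raft's log consistency check tolerates reordered messages, but the cost in rejected appends and retries would need measuring.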

## Why this is a discussion, not a fix

The right answer depends on:

- The expected message-size distribution for the target workload.
- Whether option 1 (bodies outside the log) is being pursued in parallel for AMQP.
- The complexity budget for the transport layer.

This issue opens the conversation; it does not commit to a design.

