Languages: English | 简体中文 | 日本語 | Español | Français
Lock-free and wait-free FIFO queue implementations for Go.
Package lfq provides lock-free and wait-free bounded FIFO queues optimized for different producer/consumer patterns (SPSC, MPSC, SPMC, MPMC), ensuring predictable, zero-allocation scaling under high contention.
lfq builds on foundational concurrency research to overcome the limitations of traditional mutexes and CAS (Compare-And-Swap) loops:
- Lamport Ring Buffer (1977): Powers our Single-Producer/Single-Consumer (SPSC) paths, offering strictly $O(1)$ wait-free latency without expensive atomic read-modify-write instructions.
- Scalable Circular Queue (SCQ, 2019): Our multi-party (MPSC, SPMC, MPMC) paths implement Ruslan Nikolaev's SCQ algorithm. By using hardware Fetch-And-Add (FAA) instructions instead of CAS loops, lfq inherently avoids contention bottlenecks and ABA problems, operating as a standalone queue without external safe memory reclamation (e.g., hazard pointers).
lfq is built for iox-based non-blocking I/O stacks. It eschews implicit runtime blocking in favor of explicit backpressure. If a queue is full or empty, it immediately returns ErrWouldBlock. This concise error model integrates perfectly with event loops and caller-side backoff primitives (like iox.Backoff), allowing predictable traffic management and sub-microsecond latency boundaries.
```go
// Direct constructor (recommended for most cases)
q := lfq.NewSPSC[Event](1024)

// Builder API - auto-selects algorithm based on constraints
q := lfq.Build[Event](lfq.New(1024).SingleProducer().SingleConsumer()) // → SPSC
q := lfq.Build[Event](lfq.New(1024).SingleConsumer())                  // → MPSC
q := lfq.Build[Event](lfq.New(1024).SingleProducer())                  // → SPMC
q := lfq.Build[Event](lfq.New(1024))                                   // → MPMC
```

```sh
go get code.hybscloud.com/lfq
```

Requirements: Go 1.26+
For better performance, compile with the intrinsics-optimized Go compiler:
```sh
# Using Makefile (recommended)
make install-compiler         # Download pre-built release (~30 seconds)
make build                    # Build with intrinsics compiler
make test                     # Test with intrinsics compiler

# Or build compiler from source (bleeding-edge)
make install-compiler-source
```

Manual installation:

```sh
# Pre-built release (recommended)
OS=$(uname -s | tr '[:upper:]' '[:lower:]')
ARCH=$(uname -m | sed 's/x86_64/amd64/;s/aarch64/arm64/')
URL=$(curl -fsSL https://api.github.com/repos/hayabusa-cloud/go/releases/latest | grep "browser_download_url.*${OS}-${ARCH}\.tar\.gz\"" | cut -d'"' -f4)
curl -fsSL "$URL" | tar -xz -C ~/sdk
mv ~/sdk/go ~/sdk/go-atomix

# Use for building lfq-dependent code
GOROOT=~/sdk/go-atomix ~/sdk/go-atomix/bin/go build ./...
```

The intrinsics compiler inlines atomix operations with proper memory ordering. The standard Go compiler works for basic testing but may exhibit issues under high contention.
| Type | Pattern | Progress Guarantee | Use Case |
|---|---|---|---|
| SPSC | Single-Producer Single-Consumer | Wait-free | Pipeline stages, channels |
| MPSC | Multi-Producer Single-Consumer | Lock-free (wait-free dequeue) | Event aggregation, logging |
| SPMC | Single-Producer Multi-Consumer | Lock-free (wait-free enqueue) | Work distribution |
| MPMC | Multi-Producer Multi-Consumer | Lock-free | General purpose |
- Wait-free: Every operation completes in bounded steps
- Lock-free: System-wide progress guaranteed; at least one thread makes progress
Classic bounded buffer with cached index optimization.
```go
q := lfq.NewSPSC[int](1024)

// Producer
q.Enqueue(&value) // Wait-free O(1)

// Consumer
elem, err := q.Dequeue() // Wait-free O(1)
```

By default, multi-access queues implement the SCQ (Scalable Circular Queue) algorithm using FAA (Fetch-And-Add) instructions. FAA blindly increments position counters, requiring 2n physical slots for capacity n, but scales better under high contention than CAS-based alternatives.
- Trade-off: Requires `2n` physical slots for nominal capacity `n`.
- Transient capacity: With `P` concurrent producers, up to `P-1` additional items may be transiently enqueued beyond `Cap()` before backpressure applies.
```go
// Multiple producers, single consumer
q := lfq.NewMPSC[Event](1024) // FAA producers, wait-free dequeue

// Single producer, multiple consumers
q := lfq.NewSPMC[Task](1024) // Wait-free enqueue, FAA consumers

// Multiple producers and consumers
q := lfq.NewMPMC[*Request](4096) // FAA-based SCQ algorithm
```

Cycle-based slot validation provides ABA safety without epoch counters or hazard pointers.
Indirect and Ptr queue variants (all non-SPSC variants except Compact Indirect) pack sequence number and payload into a single 128-bit atomic. This reduces cache line contention and improves throughput under high concurrency.
For Ptr variants that encode pointers into 128-bit slots (FAA and PtrSeq variants), keep a typed Go reference to enqueued objects until they are dequeued (or otherwise guaranteed consumed). Do not rely on queued pointer bits alone as a GC reachability root.
```go
// Indirect - single 128-bit atomic per operation
q := lfq.NewMPMCIndirect(4096)

// Ptr - same optimization for unsafe.Pointer
q := lfq.NewMPMCPtr(4096)
```

Automatic algorithm selection based on constraints:
```go
// SPSC - both constraints → Lamport ring
q := lfq.Build[T](lfq.New(1024).SingleProducer().SingleConsumer())

// MPSC - single consumer only
q := lfq.Build[T](lfq.New(1024).SingleConsumer())

// SPMC - single producer only
q := lfq.Build[T](lfq.New(1024).SingleProducer())

// MPMC - no constraints (default)
q := lfq.Build[T](lfq.New(1024))
```

Each queue type has three variants:
| Variant | Element Type | Use Case |
|---|---|---|
| Generic | `[T any]` | Type-safe, general purpose |
| Indirect | `uintptr` | Index-based pools, handles |
| Ptr | `unsafe.Pointer` | Zero-copy pointer passing |
```go
// Generic
q := lfq.NewMPMC[MyStruct](1024)

// Indirect - for pool indices
q := lfq.NewMPMCIndirect(1024)
q.Enqueue(uintptr(poolIndex))

// Pointer - zero-copy
q := lfq.NewMPMCPtr(1024)
q.Enqueue(unsafe.Pointer(obj))
```

Compact() selects CAS-based algorithms that use n physical slots (vs 2n for the FAA-based default). Use when memory efficiency is more important than contention scalability:
```go
// Compact mode - CAS-based, n slots
q := lfq.New(4096).Compact().BuildIndirect()
```

| Mode | Algorithm | Physical Slots | Use When |
|---|---|---|---|
| Default | FAA-based | 2n | High contention, scalability |
| Compact | CAS-based | n | Memory constrained |
SPSC variants already use n slots (Lamport ring buffer) and ignore Compact(). For Indirect queues with Compact(), values are limited to 63 bits.
| Operation | Returns | Description |
|---|---|---|
| `Enqueue(elem)` | `error` | Add element; returns `ErrWouldBlock` if full |
| `Dequeue()` | `(T, error)` | Remove element; returns `ErrWouldBlock` if empty |
| `Cap()` | `int` | Queue capacity |
```go
err := q.Enqueue(&item)
if lfq.IsWouldBlock(err) {
	// Queue is full - backpressure or retry
}

elem, err := q.Dequeue()
if lfq.IsWouldBlock(err) {
	// Queue is empty - wait or poll
}
```

The following patterns demonstrate how lfq can be integrated into concurrent systems. By avoiding allocations in the hot path and using the appropriate queue variant, you can achieve substantial latency reductions.
A common pattern for zero-allocation I/O is pre-allocating a pool of buffers and tracking their available indices using a wait-free SPSC queue. This eliminates GC pressure entirely.
```go
const poolSize = 1024
const bufSize = 4096

// Pre-allocate buffers
pool := make([][]byte, poolSize)
for i := range pool {
	pool[i] = make([]byte, bufSize)
}

// Free list tracks available indices
freeList := lfq.NewSPSCIndirect(poolSize)
for i := range poolSize {
	freeList.Enqueue(uintptr(i))
}

// Allocate
func Alloc() ([]byte, uintptr, bool) {
	idx, err := freeList.Dequeue()
	if err != nil {
		return nil, 0, false
	}
	return pool[idx], idx, true
}

// Free
func Free(idx uintptr) {
	freeList.Enqueue(idx)
}
```

```go
type Event struct {
	Source    string
	Timestamp time.Time
	Data      any
}

// Multiple sources → Single processor
events := lfq.NewMPSC[Event](8192)

// Event sources (multiple producers)
for sensor := range slices.Values(sensors) {
	go func(s Sensor) {
		for reading := range s.Readings() {
			ev := Event{
				Source:    s.Name(),
				Timestamp: time.Now(),
				Data:      reading,
			}
			events.Enqueue(&ev)
		}
	}(sensor)
}

// Single aggregator (single consumer)
go func() {
	for {
		ev, err := events.Dequeue()
		if err == nil {
			aggregate(ev)
		}
	}
}()
```

```go
// With retry and yield
func EnqueueWithRetry(q lfq.Queue[Item], item Item, maxRetries int) bool {
	ba := iox.Backoff{}
	for i := range maxRetries {
		if q.Enqueue(&item) == nil {
			return true
		}
		ba.Wait() // Yield to let consumers drain
	}
	return false // Apply backpressure to caller
}
```

FAA-based multi-consumer queues (MPMC, SPMC) include a threshold mechanism to prevent livelock. MPSC also implements Drainer, but as a graceful-shutdown signal only (no dequeue threshold skip). For graceful shutdown where producers finish before consumers, use the Drainer interface:
```go
// Producer goroutines finish
prodWg.Wait()

// Signal no more enqueues will occur
if d, ok := q.(lfq.Drainer); ok {
	d.Drain()
}

// Consumers can now drain all remaining items.
// For MPMC/SPMC, Drain avoids threshold-based early exit.
for {
	item, err := q.Dequeue()
	if err != nil {
		break // Queue is empty
	}
	process(item)
}
```

Drain() is a hint: the caller must ensure no further Enqueue() calls will be made. For MPMC/SPMC, Drain() avoids threshold-based early exit in Dequeue. For MPSC, Drain() is a shutdown signal only. SPSC queues do not implement Drainer; the type assertion naturally handles this case.
```
┌─────────────────────────────────────────────────────────────────┐
│                     How many producers?                         │
│                                                                 │
│   ┌──────────────────┐          ┌────────────────────┐          │
│   │   One (SPSC/     │          │  Multiple (MPMC/   │          │
│   │      SPMC)       │          │      MPSC)         │          │
│   └────────┬─────────┘          └─────────┬──────────┘          │
│            │                              │                     │
│            ▼                              ▼                     │
│   ┌──────────────────┐          ┌──────────────────┐            │
│   │  One consumer?   │          │  One consumer?   │            │
│   └────────┬─────────┘          └────────┬─────────┘            │
│      Yes   │   No                  Yes   │   No                 │
│       │    │    │                   │    │    │                 │
│       ▼    │    ▼                   ▼    │    ▼                 │
│     SPSC   │  SPMC                MPSC   │  MPMC                │
│            │                             │                      │
└────────────┴─────────────────────────────┴─────────────────────┘

Variant Selection:
• Generic [T]  → Type-safe, copying semantics
• Indirect     → Pool indices, buffer offsets (uintptr)
• Ptr          → Zero-copy object passing (unsafe.Pointer)
```
Capacity rounds up to the next power of 2:
```go
q := lfq.NewMPMC[int](3)    // Actual capacity: 4
q := lfq.NewMPMC[int](4)    // Actual capacity: 4
q := lfq.NewMPMC[int](1000) // Actual capacity: 1024
q := lfq.NewMPMC[int](1024) // Actual capacity: 1024
```

Minimum capacity is 2. Constructors panic if capacity < 2.
All queues use cache-line padding (64 bytes) to prevent false sharing:
```go
type MPMC[T any] struct {
	_      [64]byte      // Padding
	tail   atomix.Uint64 // Producer index
	_      [64]byte      // Padding
	head   atomix.Uint64 // Consumer index
	_      [64]byte      // Padding
	buffer []slot[T]
	// ...
}
```

Go's race detector is not designed for lock-free algorithm verification. It tracks explicit sync primitives (mutexes, channels) but cannot observe happens-before relationships established by atomic memory orderings.
Tests use two protection mechanisms:
- Build tag `//go:build !race` excludes example files from race testing
- Runtime check `if lfq.RaceEnabled { t.Skip() }` skips concurrent tests in `lockfree_test.go`
Run `go test -race ./...` for race-safe tests, or `go test ./...` for all tests.
- code.hybscloud.com/iox — Semantic errors (`ErrWouldBlock`)
- code.hybscloud.com/atomix — Atomic primitives with explicit memory ordering
- code.hybscloud.com/spin — Spin primitives
| Platform | Status |
|---|---|
| linux/amd64 | Primary |
| linux/arm64 | Supported |
| linux/riscv64 | Supported |
| linux/loong64 | Supported |
| darwin/amd64, darwin/arm64 | Supported |
| freebsd/amd64, freebsd/arm64 | Supported |
- Nikolaev, R. (2019). A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue. DISC 2019 (LIPIcs). https://doi.org/10.4230/LIPIcs.DISC.2019.28. Preprint: https://arxiv.org/abs/1908.04511.
- Lamport, L. (1977). Proving the Correctness of Multiprocess Programs. IEEE Transactions on Software Engineering, 3(2), 125–143.
- Vyukov, D. (2010). Bounded MPMC Queue. 1024cores.net. https://1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue.
- Herlihy, M. (1991). Wait-Free Synchronization. ACM Transactions on Programming Languages and Systems, 13(1), 124–149.
- Herlihy, M., & Wing, J. M. (1990). Linearizability: A Correctness Condition for Concurrent Objects. ACM Transactions on Programming Languages and Systems, 12(3), 463–492.
- Michael, M. M., & Scott, M. L. (1996). Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms. In Proceedings of the 15th ACM Symposium on Principles of Distributed Computing (PODC '96), pp. 267–275.
- Adve, S. V., & Gharachorloo, K. (1996). Shared Memory Consistency Models: A Tutorial. IEEE Computer, 29(12), 66–76.
MIT — see LICENSE.
©2026 Hayabusa Cloud Co., Ltd.