
DNS Server: Critical concurrency bug causes query blocking under load #136

@jbarwick


Bug Description

The DNS server has a critical concurrency bug: a global mutex is held for the entire request lifecycle, including forwarding to the external DNS server, so all DNS queries are effectively serialized.

Impact

  • Under load, DNS queries time out waiting for the mutex
  • A single slow upstream DNS response (e.g., 200ms) blocks ALL other queries
  • Cascading failures occur as timeouts cause retry storms

Root Cause

In handleDNSRequest(), the mutex is locked at the start and only released when the function returns:

func handleDNSRequest(w dns.ResponseWriter, r *dns.Msg) {
    mutex.Lock()
    defer mutex.Unlock()  // Not released until function returns!

    // ... check blockedDomains, exceptionDomains ...

    // PROBLEM: the external DNS call takes 50-500ms while holding the lock
    resp, err := forwardDNSRequest(r)
    // ... handle err and write resp to w ...
}
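To show why one slow upstream response stalls everything, here is a minimal, self-contained model of the pattern above, assuming a 30ms upstream latency. It is not the server's code: `slowForward`, `handleSerialized`, `runConcurrent` and `upstreamLatency` are hypothetical stand-ins, and the upstream call is simulated with `time.Sleep`. Because the deferred `Unlock` keeps the lock held across the sleep, 10 concurrent queries run strictly one after another.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// mutex models the server's global lock (simplified; not the real code).
var mutex sync.Mutex

// blockedDomains is a stand-in for the server's filter map.
var blockedDomains = map[string]bool{"ads.example.": true}

// upstreamLatency is an assumed round-trip time to the upstream resolver.
var upstreamLatency = 30 * time.Millisecond

// slowForward is a hypothetical stand-in for forwardDNSRequest; it only sleeps.
func slowForward(name string) string {
	time.Sleep(upstreamLatency)
	return name + " resolved"
}

// handleSerialized mirrors the buggy pattern: the deferred Unlock keeps the
// lock held for the whole simulated upstream call.
func handleSerialized(name string) string {
	mutex.Lock()
	defer mutex.Unlock()
	if blockedDomains[name] {
		return name + " blocked"
	}
	return slowForward(name)
}

// runConcurrent starts n queries at once and returns the wall-clock time
// until all of them have finished.
func runConcurrent(n int, handle func(string) string) time.Duration {
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			handle(fmt.Sprintf("host%d.example.", i))
		}(i)
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	// 10 queries x 30ms each, executed one at a time: at least 300ms in total.
	fmt.Println("serialized:", runConcurrent(10, handleSerialized))
}
```

Scaling the same arithmetic to a 200ms upstream response means every query queued behind it waits at least that long, which is how the timeouts described under Impact arise.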

Additional Issues Found

  1. Race condition in filter initialization: map pointer reassignments in InitializeFilters() were done without mutex protection
  2. No TCP support: the server only listened on UDP, causing problems with responses larger than 512 bytes (the classic UDP DNS message size limit)

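A minimal sketch of race-free filter initialization follows. It assumes the lists arrive as string slices and that an exception overrides a block; both are assumptions, since the issue does not show the real InitializeFilters() signature or matching rules. `filterMu` and `isBlocked` are hypothetical names. The new maps are built without the lock, then both map references are published together under the write lock, while lookups take the read lock.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// filterMu guards the filter maps (hypothetical name for this sketch).
var filterMu sync.RWMutex

var (
	blockedDomains   = map[string]bool{}
	exceptionDomains = map[string]bool{}
)

// InitializeFilters rebuilds the filter maps. The parameters are assumed;
// the real function may load its lists from files or configuration.
func InitializeFilters(blocked, exceptions []string) {
	// Build the replacement maps without holding the lock.
	nb := make(map[string]bool, len(blocked))
	for _, d := range blocked {
		nb[strings.ToLower(d)] = true
	}
	ne := make(map[string]bool, len(exceptions))
	for _, d := range exceptions {
		ne[strings.ToLower(d)] = true
	}
	// Publish both references under the write lock so readers never see
	// a half-updated pair.
	filterMu.Lock()
	blockedDomains = nb
	exceptionDomains = ne
	filterMu.Unlock()
}

// isBlocked consults both maps under the read lock. Assumption: an entry in
// exceptionDomains overrides a matching entry in blockedDomains.
func isBlocked(name string) bool {
	filterMu.RLock()
	defer filterMu.RUnlock()
	name = strings.ToLower(name)
	return blockedDomains[name] && !exceptionDomains[name]
}

func main() {
	InitializeFilters([]string{"ads.example.", "cdn.example."}, []string{"cdn.example."})
	fmt.Println(isBlocked("ADS.example.")) // true
	fmt.Println(isBlocked("cdn.example.")) // false: exception wins
}
```

Running a concurrent reload-and-lookup loop under Go's race detector (`go test -race`) is one way to confirm that the unprotected version races while this one does not.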
Resolution

This has been fixed in PR #135.

Changes Made

  • Changed sync.Mutex to sync.RWMutex for concurrent read access
  • Released the lock BEFORE external DNS forwarding
  • Added proper locking in InitializeFilters()
  • Added TCP protocol support
  • Added environment variable configuration support
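The first two changes can be sketched as follows: filter lookups take a read lock (`sync.RWMutex.RLock`), and the lock is released before the upstream call. This is a simplified model, not the code from PR #135; `forward`, `handleConcurrent`, `runConcurrent` and `upstreamLatency` are hypothetical, upstream latency is simulated with a 30ms sleep, and exceptions are assumed to override blocks.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

var (
	// mutex is an RWMutex so many lookups can hold the read lock at once.
	mutex            sync.RWMutex
	blockedDomains   = map[string]bool{"ads.example.": true}
	exceptionDomains = map[string]bool{}
	// upstreamLatency is an assumed upstream round-trip time.
	upstreamLatency = 30 * time.Millisecond
)

// forward is a hypothetical stand-in for forwardDNSRequest; it only sleeps.
func forward(name string) string {
	time.Sleep(upstreamLatency)
	return name + " resolved"
}

// handleConcurrent holds the read lock only while consulting the filter maps
// and releases it before the slow upstream call.
func handleConcurrent(name string) string {
	mutex.RLock()
	// Assumption: an exception overrides a block.
	blocked := blockedDomains[name] && !exceptionDomains[name]
	mutex.RUnlock() // released BEFORE forwarding

	if blocked {
		return name + " blocked"
	}
	return forward(name)
}

// runConcurrent starts n queries at once and returns the wall-clock time
// until all of them have finished.
func runConcurrent(n int, handle func(string) string) time.Duration {
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			handle(fmt.Sprintf("host%d.example.", i))
		}(i)
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	// The upstream calls overlap, so this is close to one upstream latency.
	fmt.Println("concurrent:", runConcurrent(10, handleConcurrent))
}
```

Writers (such as a filter reload) still take the exclusive `Lock`, so they wait for in-progress lookups, but a lookup never waits on another query's network round trip.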

Test Results

  • 85/85 tests passed (100% pass rate)
  • Concurrent query test: 50 simultaneous queries complete in ~80ms (vs hanging before)
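A wall-clock check like the one above may be sensitive to machine load. One alternative, sketched here under the assumption that the upstream call can be replaced by a fake, is to count how many simulated upstream calls are in flight at the same time: with the lock held across forwarding the peak is always 1, and with the lock released first it should rise above 1. `fakeForward`, `handle` and `peakConcurrency` are hypothetical names; this is not the project's test suite.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

var mu sync.RWMutex

// inFlight and peak track how many simulated upstream calls overlap.
var inFlight, peak int64

// fakeForward records the current number of in-flight calls, updates the
// observed peak with a compare-and-swap loop, then simulates latency.
func fakeForward() {
	n := atomic.AddInt64(&inFlight, 1)
	for {
		p := atomic.LoadInt64(&peak)
		if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
			break
		}
	}
	time.Sleep(20 * time.Millisecond)
	atomic.AddInt64(&inFlight, -1)
}

// handle models either the buggy pattern (lock held across forwarding) or
// the fixed one (read lock released before forwarding).
func handle(lockAcrossForward bool) {
	if lockAcrossForward {
		mu.Lock()
		defer mu.Unlock()
		fakeForward()
		return
	}
	mu.RLock()
	// ... filter lookups would happen here ...
	mu.RUnlock()
	fakeForward()
}

// peakConcurrency runs n simultaneous queries and returns the highest number
// of upstream calls that were in flight at once.
func peakConcurrency(n int, lockAcrossForward bool) int64 {
	atomic.StoreInt64(&inFlight, 0)
	atomic.StoreInt64(&peak, 0)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			handle(lockAcrossForward)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println("lock held across forward:", peakConcurrency(50, true)) // always 1
	fmt.Println("lock released first:", peakConcurrency(50, false))
}
```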
