fix(cipher-enum): replace context.Background()/TODO() with timeout-aware contexts to fix indefinite hang (issue #819)#944
Conversation
…are contexts Fixes tlsx hanging indefinitely for large target lists (issue projectdiscovery#819). Root cause: EnumerateCiphers() in both pkg/tlsx/tls/tls.go and pkg/tlsx/ztls/ztls.go used context.Background() (and context.TODO() in ztls) for: 1. connpool.NewOneTimePool(context.Background(), ...) 2. pool.Acquire(context.Background()) 3. c.tlsHandshakeWithTimeout(conn, context.TODO()) [ztls only] When all pool connections are exhausted and a host stops responding, pool.Acquire blocks forever because the context passed in has no deadline. With -concurrency 300 and -cipher-enum against 30k hosts, a single stalled host can permanently occupy a goroutine slot. Once all 300 slots are occupied by stalled hosts, the entire scan hangs. The truncated JSONL output line (mid-write) observed in issue projectdiscovery#819 confirms the hang occurs during active cipher enumeration, consistent with a pool.Acquire deadlock. Fix: build a timeout-aware enumCtx for each host's cipher enumeration: - If -timeout N is set: deadline = N seconds * ceil(ciphers/threads) giving the whole per-host enumeration a bounded ceiling - Otherwise: cancellable context (cancelled by defer on function exit) - Propagate enumCtx to NewOneTimePool, pool.Acquire, and tlsHandshakeWithTimeout - On DeadlineExceeded/Canceled: return partial results instead of blocking; scan continues with next host Applied to both crypto/tls (tls/tls.go) and zcrypto/tls (ztls/ztls.go) implementations.
Neo - PR Security ReviewNo security issues found Highlights
Comment |
WalkthroughAdds per-host, deadline-aware contexts for cipher enumeration in TLS and zTLS paths; these contexts are used for pool creation, connection acquisition, and handshakes so enumeration can abort on deadline/cancellation instead of hanging. Changes
Sequence Diagram(s)sequenceDiagram
participant Scanner
participant EnumCtx
participant Pool
participant Conn
participant TLSHandshake
Scanner->>EnumCtx: create per-host enumCtx (deadline or cancel)
Scanner->>Pool: pool := NewPool(enumCtx)
loop per cipher
Scanner->>Pool: conn := Acquire(enumCtx)
alt Acquire deadline/canceled
Pool-->>Scanner: DeadlineExceeded / Canceled
Scanner-->>Scanner: return collected ciphers
else Acquire success
Pool-->>Conn: provide connection
Scanner->>TLSHandshake: HandshakeContext(enumCtx, conn)
alt Handshake success or tls.ErrCertsOnly
TLSHandshake-->>Scanner: cipher result
else Handshake error
TLSHandshake-->>Scanner: error (propagate or continue)
end
Conn->>Pool: Release
end
end
Scanner->>EnumCtx: enumCancel() (deferred cleanup)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/tlsx/tls/tls.go (1)
261-267:⚠️ Potential issue | 🟠 MajorHandshake should use
HandshakeContext(enumCtx)to propagate the timeout.While
pool.Acquirenow respects the enumeration context, the handshake itself still usesconn.Handshake()which can block indefinitely. Standardcrypto/tlsprovidesHandshakeContextfor exactly this purpose, and it's already used correctly inConnectWithOptionsat Line 126.🔧 Proposed fix
conn := tls.Client(baseConn, baseCfg) - if err := conn.Handshake(); err == nil { + if err := conn.HandshakeContext(enumCtx); err == nil { ciphersuite := conn.ConnectionState().CipherSuite enumeratedCiphers = append(enumeratedCiphers, tls.CipherSuiteName(ciphersuite)) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/tlsx/tls/tls.go` around lines 261 - 267, The handshake in the cipher enumeration path currently calls conn.Handshake() which can block; replace that call with conn.HandshakeContext(enumCtx) so the enumeration context timeout/cancellation is respected. Locate the block where tls.Client(baseConn, baseCfg) is assigned to conn (the same area that appends to enumeratedCiphers from conn.ConnectionState().CipherSuite) and change the handshake invocation to use the existing enumCtx; ensure you still Close() conn afterwards and preserve the logic that appends tls.CipherSuiteName when HandshakeContext succeeds, mirroring how ConnectWithOptions uses HandshakeContext.pkg/tlsx/ztls/ztls.go (1)
343-359:⚠️ Potential issue | 🟠 MajorBug:
tlsHandshakeWithTimeoutdoes not actually respect context timeout during handshake.The
selectstatement evaluatestlsConn.Handshake()synchronously before choosing a case. If the handshake blocks indefinitely (e.g., the server stops responding mid-handshake), thectx.Done()case will never be evaluated because the function call completes before theselectruns.The correct pattern is to run the handshake in a separate goroutine:
🔧 Proposed fix
func (c *Client) tlsHandshakeWithTimeout(tlsConn *tls.Conn, ctx context.Context) error { errChan := make(chan error, 1) - defer close(errChan) + go func() { + errChan <- tlsConn.Handshake() + }() + select { case <-ctx.Done(): - return errorutil.NewWithTag("ztls", "timeout while attempting handshake") //nolint - case errChan <- tlsConn.Handshake(): + // Connection will be closed by caller, which will unblock the goroutine + return ctx.Err() + case err := <-errChan: + if err == tls.ErrCertsOnly { + return nil + } + return err } - - err := <-errChan - if err == tls.ErrCertsOnly { - err = nil - } - return err }Note: With this fix, if the context expires mid-handshake, the goroutine will eventually complete when the caller closes the connection. Ensure the caller always closes the connection to prevent goroutine leaks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/tlsx/ztls/ztls.go` around lines 343 - 359, The tlsHandshakeWithTimeout function currently calls tlsConn.Handshake() inside a select which evaluates the call synchronously; change it to run tlsConn.Handshake() in a separate goroutine that sends its error result into errChan (e.g., go func() { errChan <- tlsConn.Handshake() }()), then use select to wait on either ctx.Done() or the err from errChan; after receiving the error, preserve the existing special-case handling for tls.ErrCertsOnly and return the error; ensure errChan is buffered and closed appropriately and note that callers must close the connection to avoid leaking the handshake goroutine.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@pkg/tlsx/tls/tls.go`:
- Around line 261-267: The handshake in the cipher enumeration path currently
calls conn.Handshake() which can block; replace that call with
conn.HandshakeContext(enumCtx) so the enumeration context timeout/cancellation
is respected. Locate the block where tls.Client(baseConn, baseCfg) is assigned
to conn (the same area that appends to enumeratedCiphers from
conn.ConnectionState().CipherSuite) and change the handshake invocation to use
the existing enumCtx; ensure you still Close() conn afterwards and preserve the
logic that appends tls.CipherSuiteName when HandshakeContext succeeds, mirroring
how ConnectWithOptions uses HandshakeContext.
In `@pkg/tlsx/ztls/ztls.go`:
- Around line 343-359: The tlsHandshakeWithTimeout function currently calls
tlsConn.Handshake() inside a select which evaluates the call synchronously;
change it to run tlsConn.Handshake() in a separate goroutine that sends its
error result into errChan (e.g., go func() { errChan <- tlsConn.Handshake()
}()), then use select to wait on either ctx.Done() or the err from errChan;
after receiving the error, preserve the existing special-case handling for
tls.ErrCertsOnly and return the error; ensure errChan is buffered and closed
appropriately and note that callers must close the connection to avoid leaking
the handshake goroutine.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 12e02942-0cca-4c21-ba47-cf9c8973e0f5
📒 Files selected for processing (2)
pkg/tlsx/tls/tls.gopkg/tlsx/ztls/ztls.go
…covery#944 tls/tls.go: - Replace conn.Handshake() with conn.HandshakeContext(enumCtx) in the cipher enumeration loop so the enumeration timeout is respected during the handshake itself, not only during pool.Acquire. ztls/ztls.go: - Fix tlsHandshakeWithTimeout: the previous select statement had 'case errChan <- tlsConn.Handshake()' which evaluates Handshake() synchronously before entering the select. If the server stops responding mid-handshake ctx.Done() was never evaluated and the function blocked forever. Fix: run Handshake in a goroutine and select on the result channel vs ctx.Done() concurrently.
|
Updated the branch to address CodeRabbit findings: tls/tls.go — ztls/ztls.go — |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/tlsx/ztls/ztls.go (1)
355-355: Consider:context.Contextas first parameter (Go convention).Go convention recommends
context.Contextas the first parameter. Since this is an internal method and the PR focuses on the hang fix, this is a minor style nit that could be addressed separately.-func (c *Client) tlsHandshakeWithTimeout(tlsConn *tls.Conn, ctx context.Context) error { +func (c *Client) tlsHandshakeWithTimeout(ctx context.Context, tlsConn *tls.Conn) error {This would require updating call sites at lines 143 and 280.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/tlsx/ztls/ztls.go` at line 355, The function signature tlsHandshakeWithTimeout should take context.Context as the first parameter per Go conventions; update the method from func (c *Client) tlsHandshakeWithTimeout(tlsConn *tls.Conn, ctx context.Context) error to place ctx first (func (c *Client) tlsHandshakeWithTimeout(ctx context.Context, tlsConn *tls.Conn) error) and then update all call sites that invoke tlsHandshakeWithTimeout (the callers located near the previous change points) to pass the context as the first argument accordingly; ensure imports remain correct and run go vet/go fmt to catch any leftover call-site issues.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/tlsx/ztls/ztls.go`:
- Line 355: The function signature tlsHandshakeWithTimeout should take
context.Context as the first parameter per Go conventions; update the method
from func (c *Client) tlsHandshakeWithTimeout(tlsConn *tls.Conn, ctx
context.Context) error to place ctx first (func (c *Client)
tlsHandshakeWithTimeout(ctx context.Context, tlsConn *tls.Conn) error) and then
update all call sites that invoke tlsHandshakeWithTimeout (the callers located
near the previous change points) to pass the context as the first argument
accordingly; ensure imports remain correct and run go vet/go fmt to catch any
leftover call-site issues.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6a52db4e-7689-4026-888e-c7f29161e47c
📒 Files selected for processing (2)
pkg/tlsx/tls/tls.gopkg/tlsx/ztls/ztls.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/tlsx/tls/tls.go
Root Cause
EnumerateCiphers()in bothpkg/tlsx/tls/tls.goandpkg/tlsx/ztls/ztls.gousedcontext.Background()(andcontext.TODO()in ztls) for:connpool.NewOneTimePool(context.Background(), ...)pool.Acquire(context.Background())c.tlsHandshakeWithTimeout(conn, context.TODO())(ztls only)When a host stops responding mid cipher-enumeration,
pool.Acquireblocks forever because the passed context has no deadline. With-concurrency 300and-cipher-enumagainst 30k hosts, a single stalled host permanently occupies a goroutine slot. Once all 300 slots fill up with stalled hosts, the scan hangs entirely.The truncated JSONL output line reported in #819 (
"subject_cn":with no closing value) confirms the hang occurs during an activeWrite()call, whose goroutine was waiting for a pool slot that never freed.Fix
Build a
enumCtxfor each per-host cipher enumeration:-timeout Nis configured:deadline = N sec × ⌈ciphers÷threads⌉— bounds the whole enumeration without cutting off legitimately slow hostsdefer enumCancel()on function exit)enumCtxtoNewOneTimePool,pool.Acquire, andtlsHandshakeWithTimeoutDeadlineExceeded/Canceled: return partial results gracefully instead of hangingFixes applied to both
crypto/tls(tls/tls.go) andzcrypto/tls(ztls/ztls.go).Testing
Builds clean (
go build ./...).To reproduce the hang before this fix:
Bounty
Solana wallet for payout:
8ZwtwvaosENNDyGB5dDzHGrMA8bkD1Cw6wcavds9fNyzCloses #819
Summary by CodeRabbit