Skip to content

fix(cipher-enum): replace context.Background()/TODO() with timeout-aware contexts to fix indefinite hang (issue #819)#944

Open
usernametooshort wants to merge 2 commits intoprojectdiscovery:mainfrom
usernametooshort:fix/cipher-enum-hang-context-timeout
Open

fix(cipher-enum): replace context.Background()/TODO() with timeout-aware contexts to fix indefinite hang (issue #819)#944
usernametooshort wants to merge 2 commits intoprojectdiscovery:mainfrom
usernametooshort:fix/cipher-enum-hang-context-timeout

Conversation

@usernametooshort
Copy link

@usernametooshort usernametooshort commented Mar 5, 2026

Root Cause

EnumerateCiphers() in both pkg/tlsx/tls/tls.go and pkg/tlsx/ztls/ztls.go used context.Background() (and context.TODO() in ztls) for:

  1. connpool.NewOneTimePool(context.Background(), ...)
  2. pool.Acquire(context.Background())
  3. c.tlsHandshakeWithTimeout(conn, context.TODO()) (ztls only)

When a host stops responding mid cipher-enumeration, pool.Acquire blocks forever because the passed context has no deadline. With -concurrency 300 and -cipher-enum against 30k hosts, a single stalled host permanently occupies a goroutine slot. Once all 300 slots fill up with stalled hosts, the scan hangs entirely.

The truncated JSONL output line reported in #819 ("subject_cn": with no closing value) confirms the hang occurs during an active Write() call, whose goroutine was waiting for a pool slot that never freed.

Fix

Build a enumCtx for each per-host cipher enumeration:

  • If -timeout N is configured: deadline = N sec × ⌈ciphers÷threads⌉ — bounds the whole enumeration without cutting off legitimately slow hosts
  • Otherwise: a cancellable context (auto-cancelled by defer enumCancel() on function exit)
  • Propagate enumCtx to NewOneTimePool, pool.Acquire, and tlsHandshakeWithTimeout
  • On DeadlineExceeded / Canceled: return partial results gracefully instead of hanging

Fixes applied to both crypto/tls (tls/tls.go) and zcrypto/tls (ztls/ztls.go).

Testing

Builds clean (go build ./...).

To reproduce the hang before this fix:

# Run against a large list with cipher enumeration
tlsx -list hosts.txt -cipher-enum -cipher-type all -cipher-concurrency 10 -concurrency 300 -timeout 5 -json
# Previously hangs after ~25k hosts; now completes

Bounty

Solana wallet for payout: 8ZwtwvaosENNDyGB5dDzHGrMA8bkD1Cw6wcavds9fNyz

Closes #819

Summary by CodeRabbit

  • Bug Fixes
    • Improved timeout handling during cipher enumeration to prevent hangs and improve responsiveness.
    • Enumeration now respects global timeouts and uses per-host deadlines to terminate early when needed.
    • Handshake operations now observe cancellation/deadlines so enumeration stops promptly on timeout.
    • Enumeration returns partial results when per-host deadlines or cancellations occur, avoiding indefinite waits.

…are contexts

Fixes tlsx hanging indefinitely for large target lists (issue projectdiscovery#819).

Root cause: EnumerateCiphers() in both pkg/tlsx/tls/tls.go and
pkg/tlsx/ztls/ztls.go used context.Background() (and context.TODO()
in ztls) for:
  1. connpool.NewOneTimePool(context.Background(), ...)
  2. pool.Acquire(context.Background())
  3. c.tlsHandshakeWithTimeout(conn, context.TODO())  [ztls only]

When all pool connections are exhausted and a host stops responding,
pool.Acquire blocks forever because the context passed in has no
deadline. With -concurrency 300 and -cipher-enum against 30k hosts,
a single stalled host can permanently occupy a goroutine slot. Once
all 300 slots are occupied by stalled hosts, the entire scan hangs.

The truncated JSONL output line (mid-write) observed in issue projectdiscovery#819
confirms the hang occurs during active cipher enumeration, consistent
with a pool.Acquire deadlock.

Fix: build a timeout-aware enumCtx for each host's cipher enumeration:
  - If -timeout N is set: deadline = N seconds * ceil(ciphers/threads)
    giving the whole per-host enumeration a bounded ceiling
  - Otherwise: cancellable context (cancelled by defer on function exit)
  - Propagate enumCtx to NewOneTimePool, pool.Acquire, and
    tlsHandshakeWithTimeout
  - On DeadlineExceeded/Canceled: return partial results instead of
    blocking; scan continues with next host

Applied to both crypto/tls (tls/tls.go) and zcrypto/tls (ztls/ztls.go)
implementations.
@neo-by-projectdiscovery-dev
Copy link

neo-by-projectdiscovery-dev bot commented Mar 5, 2026

Neo - PR Security Review

No security issues found

Highlights

  • Replaces context.Background()/TODO() with timeout-aware contexts in cipher enumeration to prevent indefinite hangs
  • Adds graceful timeout handling that returns partial results instead of blocking forever when hosts stop responding
  • Calculates per-host deadline as timeout × (ciphers÷threads + 1) to bound enumeration duration

Comment @neo help for available commands. · Open in Neo

@coderabbitai
Copy link

coderabbitai bot commented Mar 5, 2026

Walkthrough

Adds per-host, deadline-aware contexts for cipher enumeration in TLS and zTLS paths; these contexts are used for pool creation, connection acquisition, and handshakes so enumeration can abort on deadline/cancellation instead of hanging.

Changes

Cohort / File(s) Summary
Cipher enumeration context changes
pkg/tlsx/tls/tls.go, pkg/tlsx/ztls/ztls.go
Introduce enumCtx/enumCancel that applies a per-host deadline when a global timeout is set; use enumCtx for pool creation, pool.Acquire, and TLS handshakes (HandshakeContext). Acquire deadline/cancel errors cause early return of collected ciphers. Handshake is performed observing context cancellation. Cleanup added via deferred cancel.

Sequence Diagram(s)

sequenceDiagram
    participant Scanner
    participant EnumCtx
    participant Pool
    participant Conn
    participant TLSHandshake
    Scanner->>EnumCtx: create per-host enumCtx (deadline or cancel)
    Scanner->>Pool: pool := NewPool(enumCtx)
    loop per cipher
        Scanner->>Pool: conn := Acquire(enumCtx)
        alt Acquire deadline/canceled
            Pool-->>Scanner: DeadlineExceeded / Canceled
            Scanner-->>Scanner: return collected ciphers
        else Acquire success
            Pool-->>Conn: provide connection
            Scanner->>TLSHandshake: HandshakeContext(enumCtx, conn)
            alt Handshake success or tls.ErrCertsOnly
                TLSHandshake-->>Scanner: cipher result
            else Handshake error
                TLSHandshake-->>Scanner: error (propagate or continue)
            end
            Conn->>Pool: Release
        end
    end
    Scanner->>EnumCtx: enumCancel() (deferred cleanup)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 I hopped through timeouts, snug and spry,

I wrapped each handshake with a watchful eye.
No more lingering locks on the wire,
Deadlines now chirp and end the mire.
Hop! Cipher lists finished—no more quagmire.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly identifies the core fix: replacing context.Background()/TODO() with timeout-aware contexts to address the indefinite hang issue.
Linked Issues check ✅ Passed All code changes directly address issue #819 by implementing timeout-aware contexts for cipher enumeration to prevent indefinite blocking on unresponsive hosts.
Out of Scope Changes check ✅ Passed All changes are scoped to cipher enumeration context handling in both TLS implementations and directly address the indefinite hang issue described in #819.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/tlsx/tls/tls.go (1)

261-267: ⚠️ Potential issue | 🟠 Major

Handshake should use HandshakeContext(enumCtx) to propagate the timeout.

While pool.Acquire now respects the enumeration context, the handshake itself still uses conn.Handshake() which can block indefinitely. Standard crypto/tls provides HandshakeContext for exactly this purpose, and it's already used correctly in ConnectWithOptions at Line 126.

🔧 Proposed fix
 		conn := tls.Client(baseConn, baseCfg)
 
-		if err := conn.Handshake(); err == nil {
+		if err := conn.HandshakeContext(enumCtx); err == nil {
 			ciphersuite := conn.ConnectionState().CipherSuite
 			enumeratedCiphers = append(enumeratedCiphers, tls.CipherSuiteName(ciphersuite))
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tlsx/tls/tls.go` around lines 261 - 267, The handshake in the cipher
enumeration path currently calls conn.Handshake() which can block; replace that
call with conn.HandshakeContext(enumCtx) so the enumeration context
timeout/cancellation is respected. Locate the block where tls.Client(baseConn,
baseCfg) is assigned to conn (the same area that appends to enumeratedCiphers
from conn.ConnectionState().CipherSuite) and change the handshake invocation to
use the existing enumCtx; ensure you still Close() conn afterwards and preserve
the logic that appends tls.CipherSuiteName when HandshakeContext succeeds,
mirroring how ConnectWithOptions uses HandshakeContext.
pkg/tlsx/ztls/ztls.go (1)

343-359: ⚠️ Potential issue | 🟠 Major

Bug: tlsHandshakeWithTimeout does not actually respect context timeout during handshake.

The select statement evaluates tlsConn.Handshake() synchronously before choosing a case. If the handshake blocks indefinitely (e.g., the server stops responding mid-handshake), the ctx.Done() case will never be evaluated because the function call completes before the select runs.

The correct pattern is to run the handshake in a separate goroutine:

🔧 Proposed fix
 func (c *Client) tlsHandshakeWithTimeout(tlsConn *tls.Conn, ctx context.Context) error {
 	errChan := make(chan error, 1)
-	defer close(errChan)
 
+	go func() {
+		errChan <- tlsConn.Handshake()
+	}()
+
 	select {
 	case <-ctx.Done():
-		return errorutil.NewWithTag("ztls", "timeout while attempting handshake") //nolint
-	case errChan <- tlsConn.Handshake():
+		// Connection will be closed by caller, which will unblock the goroutine
+		return ctx.Err()
+	case err := <-errChan:
+		if err == tls.ErrCertsOnly {
+			return nil
+		}
+		return err
 	}
-
-	err := <-errChan
-	if err == tls.ErrCertsOnly {
-		err = nil
-	}
-	return err
 }

Note: With this fix, if the context expires mid-handshake, the goroutine will eventually complete when the caller closes the connection. Ensure the caller always closes the connection to prevent goroutine leaks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tlsx/ztls/ztls.go` around lines 343 - 359, The tlsHandshakeWithTimeout
function currently calls tlsConn.Handshake() inside a select which evaluates the
call synchronously; change it to run tlsConn.Handshake() in a separate goroutine
that sends its error result into errChan (e.g., go func() { errChan <-
tlsConn.Handshake() }()), then use select to wait on either ctx.Done() or the
err from errChan; after receiving the error, preserve the existing special-case
handling for tls.ErrCertsOnly and return the error; ensure errChan is buffered
and closed appropriately and note that callers must close the connection to
avoid leaking the handshake goroutine.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@pkg/tlsx/tls/tls.go`:
- Around line 261-267: The handshake in the cipher enumeration path currently
calls conn.Handshake() which can block; replace that call with
conn.HandshakeContext(enumCtx) so the enumeration context timeout/cancellation
is respected. Locate the block where tls.Client(baseConn, baseCfg) is assigned
to conn (the same area that appends to enumeratedCiphers from
conn.ConnectionState().CipherSuite) and change the handshake invocation to use
the existing enumCtx; ensure you still Close() conn afterwards and preserve the
logic that appends tls.CipherSuiteName when HandshakeContext succeeds, mirroring
how ConnectWithOptions uses HandshakeContext.

In `@pkg/tlsx/ztls/ztls.go`:
- Around line 343-359: The tlsHandshakeWithTimeout function currently calls
tlsConn.Handshake() inside a select which evaluates the call synchronously;
change it to run tlsConn.Handshake() in a separate goroutine that sends its
error result into errChan (e.g., go func() { errChan <- tlsConn.Handshake()
}()), then use select to wait on either ctx.Done() or the err from errChan;
after receiving the error, preserve the existing special-case handling for
tls.ErrCertsOnly and return the error; ensure errChan is buffered and closed
appropriately and note that callers must close the connection to avoid leaking
the handshake goroutine.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 12e02942-0cca-4c21-ba47-cf9c8973e0f5

📥 Commits

Reviewing files that changed from the base of the PR and between d13b67f and e729152.

📒 Files selected for processing (2)
  • pkg/tlsx/tls/tls.go
  • pkg/tlsx/ztls/ztls.go

…covery#944

tls/tls.go:
- Replace conn.Handshake() with conn.HandshakeContext(enumCtx) in the
  cipher enumeration loop so the enumeration timeout is respected during
  the handshake itself, not only during pool.Acquire.

ztls/ztls.go:
- Fix tlsHandshakeWithTimeout: the previous select statement had
  'case errChan <- tlsConn.Handshake()' which evaluates Handshake()
  synchronously before entering the select.  If the server stops
  responding mid-handshake ctx.Done() was never evaluated and the
  function blocked forever.  Fix: run Handshake in a goroutine and
  select on the result channel vs ctx.Done() concurrently.
@usernametooshort
Copy link
Author

Updated the branch to address CodeRabbit findings:

tls/tls.goconn.Handshake()conn.HandshakeContext(enumCtx). The enumeration timeout was propagated to pool.Acquire but not to the handshake itself, so a server that stalled during handshake would still hang indefinitely.

ztls/ztls.go — tlsHandshakeWithTimeout — The original select { case errChan <- tlsConn.Handshake(): } evaluates Handshake() synchronously before the select statement is entered. If the handshake blocks forever, ctx.Done() is never evaluated. Fixed by running the handshake in a goroutine and selecting on the result channel vs ctx.Done() concurrently — the standard Go pattern for context-aware blocking operations.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/tlsx/ztls/ztls.go (1)

355-355: Consider: context.Context as first parameter (Go convention).

Go convention recommends context.Context as the first parameter. Since this is an internal method and the PR focuses on the hang fix, this is a minor style nit that could be addressed separately.

-func (c *Client) tlsHandshakeWithTimeout(tlsConn *tls.Conn, ctx context.Context) error {
+func (c *Client) tlsHandshakeWithTimeout(ctx context.Context, tlsConn *tls.Conn) error {

This would require updating call sites at lines 143 and 280.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tlsx/ztls/ztls.go` at line 355, The function signature
tlsHandshakeWithTimeout should take context.Context as the first parameter per
Go conventions; update the method from func (c *Client)
tlsHandshakeWithTimeout(tlsConn *tls.Conn, ctx context.Context) error to place
ctx first (func (c *Client) tlsHandshakeWithTimeout(ctx context.Context, tlsConn
*tls.Conn) error) and then update all call sites that invoke
tlsHandshakeWithTimeout (the callers located near the previous change points) to
pass the context as the first argument accordingly; ensure imports remain
correct and run go vet/go fmt to catch any leftover call-site issues.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/tlsx/ztls/ztls.go`:
- Line 355: The function signature tlsHandshakeWithTimeout should take
context.Context as the first parameter per Go conventions; update the method
from func (c *Client) tlsHandshakeWithTimeout(tlsConn *tls.Conn, ctx
context.Context) error to place ctx first (func (c *Client)
tlsHandshakeWithTimeout(ctx context.Context, tlsConn *tls.Conn) error) and then
update all call sites that invoke tlsHandshakeWithTimeout (the callers located
near the previous change points) to pass the context as the first argument
accordingly; ensure imports remain correct and run go vet/go fmt to catch any
leftover call-site issues.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6a52db4e-7689-4026-888e-c7f29161e47c

📥 Commits

Reviewing files that changed from the base of the PR and between e729152 and 822da0d.

📒 Files selected for processing (2)
  • pkg/tlsx/tls/tls.go
  • pkg/tlsx/ztls/ztls.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/tlsx/tls/tls.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tlsx hangs indefinitely for some hosts

1 participant