Skip to content

fix: indefinite hangs during TLS handshake and cipher enumeration#960

Open
allornothingai wants to merge 1 commit intoprojectdiscovery:mainfrom
allornothingai:fix/indefinite-hangs
Open

fix: indefinite hangs during TLS handshake and cipher enumeration#960
allornothingai wants to merge 1 commit intoprojectdiscovery:mainfrom
allornothingai:fix/indefinite-hangs

Conversation

@allornothingai
Copy link

@allornothingai allornothingai commented Mar 15, 2026

This PR fixes multiple indefinite hangs in tlsx:

  1. Fixed a race condition/deadlock in ztls.tlsHandshakeWithTimeout where the select statement would block during function call evaluation, making the context timeout ineffective.
  2. Added missing context timeouts to EnumerateCiphers in both ztls and ctls (stdlib) clients.
  3. Increased the buffer size and implemented non-blocking sends for the PDCP UploadWriter to prevent worker goroutines from stalling on slow cloud uploads.

These fixes resolve issue #819 where tlsx hangs indefinitely after processing a large number of targets.

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Optimized data response handling with increased buffering capacity and non-blocking operations to reduce potential processing delays
    • Enhanced TLS connection operations with improved timeout implementation for more reliable cipher enumeration
    • Refined error handling in TLS handshake operations to better manage various failure scenarios

@neo-by-projectdiscovery-dev
Copy link

neo-by-projectdiscovery-dev bot commented Mar 15, 2026

Neo - PR Security Review

Caution

Review could not be completed

Review could not be completed. Please retry with @pdneo review.

Suggestion: Try again with @pdneo review.

Comment @pdneo help for available commands.

@coderabbitai
Copy link

coderabbitai bot commented Mar 15, 2026

Walkthrough

The changes modify TLS handshake mechanisms to use context-aware timeouts for improved timeout control and refactor handshake logic in ztls to use goroutines with context cancellation. Additionally, a PDCP data channel buffer size is increased and channel sends are made non-blocking with warning logging on overflow.

Changes

Cohort / File(s) Summary
PDCP Buffer Management
internal/pdcp/writer.go
Increases data channel buffer from 8 to 1000; modifies channel sends to non-blocking with warning log on overflow instead of blocking.
TLS Handshake Context Integration
pkg/tlsx/tls/tls.go, pkg/tlsx/ztls/ztls.go
Adds per-connection context-based timeouts to TLS handshakes; ztls refactors handshake into separate goroutine with channel-based result handling and context cancellation; adjusts error handling for tls.ErrCertsOnly cases.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HandshakeHandler as Handshake Handler<br/>(ztls)
    participant Goroutine as Handshake<br/>Goroutine
    participant Server as TLS Server
    participant ContextMgr as Context<br/>Manager

    Client->>HandshakeHandler: EnumerateCiphers()
    HandshakeHandler->>ContextMgr: Create context with timeout
    ContextMgr-->>HandshakeHandler: ctx (with cancel)
    
    HandshakeHandler->>Goroutine: Launch goroutine
    HandshakeHandler->>ContextMgr: Monitor context.Done()
    
    Goroutine->>Server: HandshakeContext(ctx)
    
    par Goroutine Path
        Server-->>Goroutine: Handshake result/error
        Goroutine-->>HandshakeHandler: Send result on errChan
    and Context Path
        ContextMgr-->>HandshakeHandler: Context cancellation signal
    end
    
    HandshakeHandler->>HandshakeHandler: Select on result or context
    alt Handshake completes
        HandshakeHandler->>HandshakeHandler: Check error type
        note over HandshakeHandler: If tls.ErrCertsOnly, treat as nil
    else Context timeout expires
        HandshakeHandler->>HandshakeHandler: Abort handshake
    end
    
    HandshakeHandler->>ContextMgr: Cancel context
    ContextMgr-->>HandshakeHandler: Cleanup
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 With timeout's embrace, the handshakes now fly,
Goroutines dance where contexts won't lie,
Buffers expanded to catch every call,
Non-blocking and graceful—we handle it all! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main changes: fixing indefinite hangs during TLS handshake and cipher enumeration, which aligns with the changeset modifications across three files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can customize the high-level summary generated by CodeRabbit.

Configure the reviews.high_level_summary_instructions setting to provide custom instructions for generating the high-level summary.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
pkg/tlsx/ztls/ztls.go (1)

326-341: Goroutine-based timeout pattern correctly fixes the race condition.

The buffered channel (size 1) ensures the goroutine won't leak when the context times out before the handshake completes. The ErrCertsOnly handling aligns with the client's CertsOnly mode.

Minor convention note: Go idiom places ctx context.Context as the first parameter. Consider reordering for consistency with standard library patterns.

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tlsx/ztls/ztls.go` around lines 326 - 341, The function
tlsHandshakeWithTimeout currently takes (tlsConn *tls.Conn, ctx
context.Context); reorder parameters to place ctx context.Context first to
follow Go conventions (i.e., change signature to tlsHandshakeWithTimeout(ctx
context.Context, tlsConn *tls.Conn)), and update all call sites that invoke
tlsHandshakeWithTimeout to pass the context as the first argument; keep the
existing logic inside the function (channel/goroutine handshake and ErrCertsOnly
handling) unchanged.
internal/pdcp/writer.go (1)

94-98: Non-blocking send prevents hangs but silently drops data.

The select/default pattern correctly prevents goroutine stalls. However, when the buffer is full, results are permanently lost with only a warning log. Consider tracking dropped results with a counter (similar to u.counter for uploads) so users can verify scan completeness at the end.

♻️ Suggested enhancement to track dropped results

Add a field to UploadWriter:

droppedCounter atomic.Int32

Then update the callback:

 func (u *UploadWriter) GetWriterCallback() func(*clients.Response) {
 	return func(resp *clients.Response) {
 		select {
 		case u.data <- resp:
 		default:
+			u.droppedCounter.Add(1)
 			gologger.Warning().Msgf("PDCP upload buffer full, skipping result")
 		}
 	}
 }

And report in autoCommit defer:

if dropped := u.droppedCounter.Load(); dropped > 0 {
    gologger.Warning().Msgf("Dropped %v results due to upload buffer overflow", dropped)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/pdcp/writer.go` around lines 94 - 98, Add a dropped results counter
to UploadWriter and increment it whenever the non-blocking send to u.data fails;
specifically add a droppedCounter (e.g., atomic.Int32) to the UploadWriter
struct, replace the select/default branch that currently logs "PDCP upload
buffer full, skipping result" with logic that increments u.droppedCounter and
still logs the warning, and then in autoCommit's defer read/report the counter
(u.droppedCounter.Load()) and emit a warning if >0 so callers can verify dropped
results (keep existing u.counter for successful uploads).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/tlsx/tls/tls.go`:
- Around line 239-244: The handshake loop uses context.WithTimeout with
c.options.Timeout which if zero yields an already-expired context and makes
conn.HandshakeContext fail; modify the code around
context.WithTimeout/conn.HandshakeContext so it first checks c.options.Timeout
!= 0 and only creates a timeout context in that case, otherwise use
context.Background() (or context.WithCancel) for the handshake; update the
cancel() usage accordingly to only call cancel when a cancellable timeout
context was created (refer to c.options.Timeout, context.WithTimeout,
conn.HandshakeContext, and how ConnectWithOptions guards Timeout).

In `@pkg/tlsx/ztls/ztls.go`:
- Around line 260-265: The cipher-enumeration loop uses context.WithTimeout with
c.options.Timeout which when zero makes the context expire immediately; update
the code around tlsHandshakeWithTimeout and the block that builds ctx, cancel to
guard for c.options.Timeout == 0 (same pattern as ConnectWithOptions) by
choosing a sensible default or skipping the timeout logic so handshakes can run,
then call tlsHandshakeWithTimeout(conn, ctx) and cancel() as before; reference
c.options.Timeout, tlsHandshakeWithTimeout, conn.GetHandshakeLog and
enumeratedCiphers to locate and apply the guard.

---

Nitpick comments:
In `@internal/pdcp/writer.go`:
- Around line 94-98: Add a dropped results counter to UploadWriter and increment
it whenever the non-blocking send to u.data fails; specifically add a
droppedCounter (e.g., atomic.Int32) to the UploadWriter struct, replace the
select/default branch that currently logs "PDCP upload buffer full, skipping
result" with logic that increments u.droppedCounter and still logs the warning,
and then in autoCommit's defer read/report the counter (u.droppedCounter.Load())
and emit a warning if >0 so callers can verify dropped results (keep existing
u.counter for successful uploads).

In `@pkg/tlsx/ztls/ztls.go`:
- Around line 326-341: The function tlsHandshakeWithTimeout currently takes
(tlsConn *tls.Conn, ctx context.Context); reorder parameters to place ctx
context.Context first to follow Go conventions (i.e., change signature to
tlsHandshakeWithTimeout(ctx context.Context, tlsConn *tls.Conn)), and update all
call sites that invoke tlsHandshakeWithTimeout to pass the context as the first
argument; keep the existing logic inside the function (channel/goroutine
handshake and ErrCertsOnly handling) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5846bef1-b0a3-42d1-9379-34888db32676

📥 Commits

Reviewing files that changed from the base of the PR and between d13b67f and 37436e1.

📒 Files selected for processing (3)
  • internal/pdcp/writer.go
  • pkg/tlsx/tls/tls.go
  • pkg/tlsx/ztls/ztls.go

Comment on lines +239 to +244
ctx, cancel := context.WithTimeout(context.Background(), time.Duration(c.options.Timeout)*time.Second)
if err := conn.HandshakeContext(ctx); err == nil {
ciphersuite := conn.ConnectionState().CipherSuite
enumeratedCiphers = append(enumeratedCiphers, tls.CipherSuiteName(ciphersuite))
}
cancel()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Missing timeout validation: Timeout == 0 causes immediate context expiration.

Unlike ConnectWithOptions (lines 110-113), this code does not check c.options.Timeout != 0 before creating the context. If Timeout is 0, context.WithTimeout(..., 0) creates an already-expired context, causing every handshake to fail immediately.

🐛 Proposed fix to guard against zero timeout
+		timeout := time.Duration(c.options.Timeout) * time.Second
+		if timeout == 0 {
+			timeout = 5 * time.Second // default timeout
+		}
+		ctx, cancel := context.WithTimeout(context.Background(), timeout)
-		ctx, cancel := context.WithTimeout(context.Background(), time.Duration(c.options.Timeout)*time.Second)
 		if err := conn.HandshakeContext(ctx); err == nil {

Alternatively, skip the timeout wrapper when Timeout == 0 to match ConnectWithOptions behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tlsx/tls/tls.go` around lines 239 - 244, The handshake loop uses
context.WithTimeout with c.options.Timeout which if zero yields an
already-expired context and makes conn.HandshakeContext fail; modify the code
around context.WithTimeout/conn.HandshakeContext so it first checks
c.options.Timeout != 0 and only creates a timeout context in that case,
otherwise use context.Background() (or context.WithCancel) for the handshake;
update the cancel() usage accordingly to only call cancel when a cancellable
timeout context was created (refer to c.options.Timeout, context.WithTimeout,
conn.HandshakeContext, and how ConnectWithOptions guards Timeout).

Comment on lines +260 to +265
ctx, cancel := context.WithTimeout(context.Background(), time.Duration(c.options.Timeout)*time.Second)
if err := c.tlsHandshakeWithTimeout(conn, ctx); err == nil {
h1 := conn.GetHandshakeLog()
enumeratedCiphers = append(enumeratedCiphers, h1.ServerHello.CipherSuite.String())
}
cancel()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Same timeout validation issue: Timeout == 0 causes immediate context expiration.

This matches the issue in pkg/tlsx/tls/tls.go. Unlike ConnectWithOptions (lines 118-122 in this file), there's no guard against Timeout == 0, which would cause all cipher enumeration handshakes to fail immediately.

🐛 Proposed fix to guard against zero timeout
+		timeout := time.Duration(c.options.Timeout) * time.Second
+		if timeout == 0 {
+			timeout = 5 * time.Second // default timeout
+		}
+		ctx, cancel := context.WithTimeout(context.Background(), timeout)
-		ctx, cancel := context.WithTimeout(context.Background(), time.Duration(c.options.Timeout)*time.Second)
 		if err := c.tlsHandshakeWithTimeout(conn, ctx); err == nil {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ctx, cancel := context.WithTimeout(context.Background(), time.Duration(c.options.Timeout)*time.Second)
if err := c.tlsHandshakeWithTimeout(conn, ctx); err == nil {
h1 := conn.GetHandshakeLog()
enumeratedCiphers = append(enumeratedCiphers, h1.ServerHello.CipherSuite.String())
}
cancel()
timeout := time.Duration(c.options.Timeout) * time.Second
if timeout == 0 {
timeout = 5 * time.Second // default timeout
}
ctx, cancel := context.WithTimeout(context.Background(), timeout)
if err := c.tlsHandshakeWithTimeout(conn, ctx); err == nil {
h1 := conn.GetHandshakeLog()
enumeratedCiphers = append(enumeratedCiphers, h1.ServerHello.CipherSuite.String())
}
cancel()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tlsx/ztls/ztls.go` around lines 260 - 265, The cipher-enumeration loop
uses context.WithTimeout with c.options.Timeout which when zero makes the
context expire immediately; update the code around tlsHandshakeWithTimeout and
the block that builds ctx, cancel to guard for c.options.Timeout == 0 (same
pattern as ConnectWithOptions) by choosing a sensible default or skipping the
timeout logic so handshakes can run, then call tlsHandshakeWithTimeout(conn,
ctx) and cancel() as before; reference c.options.Timeout,
tlsHandshakeWithTimeout, conn.GetHandshakeLog and enumeratedCiphers to locate
and apply the guard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant