fix: indefinite hangs during TLS handshake and cipher enumeration#960
fix: indefinite hangs during TLS handshake and cipher enumeration#960allornothingai wants to merge 1 commit intoprojectdiscovery:mainfrom
Conversation
Neo - PR Security ReviewCaution Review could not be completed Review could not be completed. Please retry with Suggestion: Try again with Comment |
WalkthroughThe changes modify TLS handshake mechanisms to use context-aware timeouts for improved timeout control and refactor handshake logic in ztls to use goroutines with context cancellation. Additionally, a PDCP data channel buffer size is increased and channel sends are made non-blocking with warning logging on overflow. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant HandshakeHandler as Handshake Handler<br/>(ztls)
participant Goroutine as Handshake<br/>Goroutine
participant Server as TLS Server
participant ContextMgr as Context<br/>Manager
Client->>HandshakeHandler: EnumerateCiphers()
HandshakeHandler->>ContextMgr: Create context with timeout
ContextMgr-->>HandshakeHandler: ctx (with cancel)
HandshakeHandler->>Goroutine: Launch goroutine
HandshakeHandler->>ContextMgr: Monitor context.Done()
Goroutine->>Server: HandshakeContext(ctx)
par Goroutine Path
Server-->>Goroutine: Handshake result/error
Goroutine-->>HandshakeHandler: Send result on errChan
and Context Path
ContextMgr-->>HandshakeHandler: Context cancellation signal
end
HandshakeHandler->>HandshakeHandler: Select on result or context
alt Handshake completes
HandshakeHandler->>HandshakeHandler: Check error type
note over HandshakeHandler: If tls.ErrCertsOnly, treat as nil
else Context timeout expires
HandshakeHandler->>HandshakeHandler: Abort handshake
end
HandshakeHandler->>ContextMgr: Cancel context
ContextMgr-->>HandshakeHandler: Cleanup
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip You can customize the high-level summary generated by CodeRabbit.Configure the |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
pkg/tlsx/ztls/ztls.go (1)
326-341: Goroutine-based timeout pattern correctly fixes the race condition.The buffered channel (size 1) ensures the goroutine won't leak when the context times out before the handshake completes. The
ErrCertsOnlyhandling aligns with the client'sCertsOnlymode.Minor convention note: Go idiom places
ctx context.Contextas the first parameter. Consider reordering for consistency with standard library patterns.,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/tlsx/ztls/ztls.go` around lines 326 - 341, The function tlsHandshakeWithTimeout currently takes (tlsConn *tls.Conn, ctx context.Context); reorder parameters to place ctx context.Context first to follow Go conventions (i.e., change signature to tlsHandshakeWithTimeout(ctx context.Context, tlsConn *tls.Conn)), and update all call sites that invoke tlsHandshakeWithTimeout to pass the context as the first argument; keep the existing logic inside the function (channel/goroutine handshake and ErrCertsOnly handling) unchanged.internal/pdcp/writer.go (1)
94-98: Non-blocking send prevents hangs but silently drops data.The select/default pattern correctly prevents goroutine stalls. However, when the buffer is full, results are permanently lost with only a warning log. Consider tracking dropped results with a counter (similar to
u.counterfor uploads) so users can verify scan completeness at the end.♻️ Suggested enhancement to track dropped results
Add a field to
UploadWriter:droppedCounter atomic.Int32Then update the callback:
func (u *UploadWriter) GetWriterCallback() func(*clients.Response) { return func(resp *clients.Response) { select { case u.data <- resp: default: + u.droppedCounter.Add(1) gologger.Warning().Msgf("PDCP upload buffer full, skipping result") } } }And report in
autoCommitdefer:if dropped := u.droppedCounter.Load(); dropped > 0 { gologger.Warning().Msgf("Dropped %v results due to upload buffer overflow", dropped) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pdcp/writer.go` around lines 94 - 98, Add a dropped results counter to UploadWriter and increment it whenever the non-blocking send to u.data fails; specifically add a droppedCounter (e.g., atomic.Int32) to the UploadWriter struct, replace the select/default branch that currently logs "PDCP upload buffer full, skipping result" with logic that increments u.droppedCounter and still logs the warning, and then in autoCommit's defer read/report the counter (u.droppedCounter.Load()) and emit a warning if >0 so callers can verify dropped results (keep existing u.counter for successful uploads).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/tlsx/tls/tls.go`:
- Around line 239-244: The handshake loop uses context.WithTimeout with
c.options.Timeout which if zero yields an already-expired context and makes
conn.HandshakeContext fail; modify the code around
context.WithTimeout/conn.HandshakeContext so it first checks c.options.Timeout
!= 0 and only creates a timeout context in that case, otherwise use
context.Background() (or context.WithCancel) for the handshake; update the
cancel() usage accordingly to only call cancel when a cancellable timeout
context was created (refer to c.options.Timeout, context.WithTimeout,
conn.HandshakeContext, and how ConnectWithOptions guards Timeout).
In `@pkg/tlsx/ztls/ztls.go`:
- Around line 260-265: The cipher-enumeration loop uses context.WithTimeout with
c.options.Timeout which when zero makes the context expire immediately; update
the code around tlsHandshakeWithTimeout and the block that builds ctx, cancel to
guard for c.options.Timeout == 0 (same pattern as ConnectWithOptions) by
choosing a sensible default or skipping the timeout logic so handshakes can run,
then call tlsHandshakeWithTimeout(conn, ctx) and cancel() as before; reference
c.options.Timeout, tlsHandshakeWithTimeout, conn.GetHandshakeLog and
enumeratedCiphers to locate and apply the guard.
---
Nitpick comments:
In `@internal/pdcp/writer.go`:
- Around line 94-98: Add a dropped results counter to UploadWriter and increment
it whenever the non-blocking send to u.data fails; specifically add a
droppedCounter (e.g., atomic.Int32) to the UploadWriter struct, replace the
select/default branch that currently logs "PDCP upload buffer full, skipping
result" with logic that increments u.droppedCounter and still logs the warning,
and then in autoCommit's defer read/report the counter (u.droppedCounter.Load())
and emit a warning if >0 so callers can verify dropped results (keep existing
u.counter for successful uploads).
In `@pkg/tlsx/ztls/ztls.go`:
- Around line 326-341: The function tlsHandshakeWithTimeout currently takes
(tlsConn *tls.Conn, ctx context.Context); reorder parameters to place ctx
context.Context first to follow Go conventions (i.e., change signature to
tlsHandshakeWithTimeout(ctx context.Context, tlsConn *tls.Conn)), and update all
call sites that invoke tlsHandshakeWithTimeout to pass the context as the first
argument; keep the existing logic inside the function (channel/goroutine
handshake and ErrCertsOnly handling) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5846bef1-b0a3-42d1-9379-34888db32676
📒 Files selected for processing (3)
internal/pdcp/writer.gopkg/tlsx/tls/tls.gopkg/tlsx/ztls/ztls.go
| ctx, cancel := context.WithTimeout(context.Background(), time.Duration(c.options.Timeout)*time.Second) | ||
| if err := conn.HandshakeContext(ctx); err == nil { | ||
| ciphersuite := conn.ConnectionState().CipherSuite | ||
| enumeratedCiphers = append(enumeratedCiphers, tls.CipherSuiteName(ciphersuite)) | ||
| } | ||
| cancel() |
There was a problem hiding this comment.
Missing timeout validation: Timeout == 0 causes immediate context expiration.
Unlike ConnectWithOptions (lines 110-113), this code does not check c.options.Timeout != 0 before creating the context. If Timeout is 0, context.WithTimeout(..., 0) creates an already-expired context, causing every handshake to fail immediately.
🐛 Proposed fix to guard against zero timeout
+ timeout := time.Duration(c.options.Timeout) * time.Second
+ if timeout == 0 {
+ timeout = 5 * time.Second // default timeout
+ }
+ ctx, cancel := context.WithTimeout(context.Background(), timeout)
- ctx, cancel := context.WithTimeout(context.Background(), time.Duration(c.options.Timeout)*time.Second)
if err := conn.HandshakeContext(ctx); err == nil {Alternatively, skip the timeout wrapper when Timeout == 0 to match ConnectWithOptions behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/tlsx/tls/tls.go` around lines 239 - 244, The handshake loop uses
context.WithTimeout with c.options.Timeout which if zero yields an
already-expired context and makes conn.HandshakeContext fail; modify the code
around context.WithTimeout/conn.HandshakeContext so it first checks
c.options.Timeout != 0 and only creates a timeout context in that case,
otherwise use context.Background() (or context.WithCancel) for the handshake;
update the cancel() usage accordingly to only call cancel when a cancellable
timeout context was created (refer to c.options.Timeout, context.WithTimeout,
conn.HandshakeContext, and how ConnectWithOptions guards Timeout).
| ctx, cancel := context.WithTimeout(context.Background(), time.Duration(c.options.Timeout)*time.Second) | ||
| if err := c.tlsHandshakeWithTimeout(conn, ctx); err == nil { | ||
| h1 := conn.GetHandshakeLog() | ||
| enumeratedCiphers = append(enumeratedCiphers, h1.ServerHello.CipherSuite.String()) | ||
| } | ||
| cancel() |
There was a problem hiding this comment.
Same timeout validation issue: Timeout == 0 causes immediate context expiration.
This matches the issue in pkg/tlsx/tls/tls.go. Unlike ConnectWithOptions (lines 118-122 in this file), there's no guard against Timeout == 0, which would cause all cipher enumeration handshakes to fail immediately.
🐛 Proposed fix to guard against zero timeout
+ timeout := time.Duration(c.options.Timeout) * time.Second
+ if timeout == 0 {
+ timeout = 5 * time.Second // default timeout
+ }
+ ctx, cancel := context.WithTimeout(context.Background(), timeout)
- ctx, cancel := context.WithTimeout(context.Background(), time.Duration(c.options.Timeout)*time.Second)
if err := c.tlsHandshakeWithTimeout(conn, ctx); err == nil {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ctx, cancel := context.WithTimeout(context.Background(), time.Duration(c.options.Timeout)*time.Second) | |
| if err := c.tlsHandshakeWithTimeout(conn, ctx); err == nil { | |
| h1 := conn.GetHandshakeLog() | |
| enumeratedCiphers = append(enumeratedCiphers, h1.ServerHello.CipherSuite.String()) | |
| } | |
| cancel() | |
| timeout := time.Duration(c.options.Timeout) * time.Second | |
| if timeout == 0 { | |
| timeout = 5 * time.Second // default timeout | |
| } | |
| ctx, cancel := context.WithTimeout(context.Background(), timeout) | |
| if err := c.tlsHandshakeWithTimeout(conn, ctx); err == nil { | |
| h1 := conn.GetHandshakeLog() | |
| enumeratedCiphers = append(enumeratedCiphers, h1.ServerHello.CipherSuite.String()) | |
| } | |
| cancel() |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/tlsx/ztls/ztls.go` around lines 260 - 265, The cipher-enumeration loop
uses context.WithTimeout with c.options.Timeout which when zero makes the
context expire immediately; update the code around tlsHandshakeWithTimeout and
the block that builds ctx, cancel to guard for c.options.Timeout == 0 (same
pattern as ConnectWithOptions) by choosing a sensible default or skipping the
timeout logic so handshakes can run, then call tlsHandshakeWithTimeout(conn,
ctx) and cancel() as before; reference c.options.Timeout,
tlsHandshakeWithTimeout, conn.GetHandshakeLog and enumeratedCiphers to locate
and apply the guard.
This PR fixes multiple indefinite hangs in tlsx:
These fixes resolve issue #819 where tlsx hangs indefinitely after processing a large number of targets.
Summary by CodeRabbit
Release Notes