Skip to content

fix: prevent indefinite hang during cipher enumeration (#819)#949

Open
cherry-bisht wants to merge 1 commit intoprojectdiscovery:mainfrom
cherry-bisht:fix-tlsx-hang-819
Open

fix: prevent indefinite hang during cipher enumeration (#819)#949
cherry-bisht wants to merge 1 commit intoprojectdiscovery:mainfrom
cherry-bisht:fix-tlsx-hang-819

Conversation

@cherry-bisht
Copy link

@cherry-bisht cherry-bisht commented Mar 7, 2026

Fixes an issue where tlsx could hang indefinitely during large scans with cipher enumeration enabled.

Root Cause

During large scans some targets accept a TCP connection but never complete the TLS handshake.
In this situation the TLS handshake call can block indefinitely.

Additionally, several parts of the scanning pipeline used context.Background() for
connection pool acquisition and network operations. Under high concurrency this allowed
workers to block without any timeout, eventually causing the scan to stall after processing
thousands of targets.

Cipher enumeration also performed TLS handshakes sequentially, which increased the
likelihood of worker starvation when multiple slow or unresponsive hosts were encountered.

Fix

This PR ensures that all potentially blocking operations respect timeouts and cannot stall workers:

  • Use timeout-aware contexts for pool.Acquire() and TLS operations.
  • Execute TLS handshakes with explicit deadlines so they cannot block indefinitely.
  • Close the underlying connection when a timeout occurs to guarantee the handshake returns.
  • Parallelize cipher enumeration using CipherConcurrency to avoid sequential blocking.
  • Ensure JSONL output is flushed reliably to prevent truncated output if the process stops.

Testing

Environment:
Arch Linux
Go 1.26

Test setup:

  • 30,000 targets
  • concurrency: 300
  • cipher enumeration enabled

Before fix:
tlsx consistently stalled around ~25k targets and the JSON output file ended with a
truncated line.

After fix:
The scan completes successfully with all results written and no deadlock observed.
JSONL output is valid and complete.

Proof:
30k targets processed successfully in 2min 31 sec

/claim #819

@neo-by-projectdiscovery-dev
Copy link

neo-by-projectdiscovery-dev bot commented Mar 7, 2026

Neo - PR Security Review

Caution

Review could not be completed

Task aborted before completion

Suggestion: Try again with @neo review.

Comment @neo help for available commands.

@coderabbitai
Copy link

coderabbitai bot commented Mar 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds concurrent worker-pool cipher enumeration, per-operation timeout contexts with explicit cancellation and handshake timeouts, expands handshake post-processing, introduces timeout/leak regression tests, adjusts file flush/newline error handling, tweaks defaults and client-cert error classification, and updates .gitignore.

Changes

Cohort / File(s) Summary
Cipher Enumeration Concurrency
pkg/tlsx/tls/tls.go, pkg/tlsx/ztls/ztls.go
Replaces sequential per-cipher processing with a worker-pool: cipher channel, workers, WaitGroup and mutex. Each worker acquires a timed context, performs timeout-aware handshake, records results, and ensures connection/context cleanup. Adds sync import.
TLS Connect & Timeout Handling
pkg/tlsx/tls/tls.go, pkg/tlsx/ztls/ztls.go, pkg/tlsx/jarm/jarm.go, pkg/tlsx/openssl/openssl.go
Wraps handshakes with deadlines/HandshakeContext and channels, returns explicit timeout errors and closes connections on timeout; moves to per-iteration context cancellation (openssl); acquires pool entries with context.WithTimeout and cancels on all early-exit paths (jarm).
Tests & Regression Suite
pkg/tlsx/tls/tls_test.go, pkg/tlsx/ztls/ztls_test.go, pkg/tlsx/ztls/regression_test.go
Adds new regression tests exercising handshake timeouts and goroutine leak detection; adds concurrency/timeouts tests; adjusts tests to early-return on ConnectWithOptions errors to avoid cascading assertions.
Output & File Handling
pkg/output/file_writer.go
Alters Write to ignore newline write error (always return nil after newline attempt); Close now attempts flush and returns flush error if present, otherwise returns file Close error—ensures flush doesn't short-circuit close.
Defaults & Validation
internal/runner/banner.go
Adds defaulting for options when non-positive: Concurrency=300, Timeout=5, CipherConcurrency=10.
Client Error Classification
pkg/tlsx/clients/utils.go
Broadens IsClientCertRequiredError to treat "handshake failure" as a client-cert-required condition (in addition to "bad certificate" and "certificate required").
Repo Ignore Rules
.gitignore
Replaces global vendor/ ignore with selective entries (e.g., vendor/hosts.txt) and adds ignores for out.jsonl and tlsx-dev.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Caller
    participant EnumerateCiphers
    participant CipherChannel
    participant Worker as Worker(Goroutine)
    participant ConnMgr as Connection
    participant Handshake
    participant Results

    Caller->>EnumerateCiphers: Start enumeration (cipher list)
    EnumerateCiphers->>CipherChannel: Create channel
    EnumerateCiphers->>Worker: Spawn N workers
    EnumerateCiphers->>CipherChannel: Enqueue ciphers
    EnumerateCiphers->>CipherChannel: Close channel

    loop for each cipher
        Worker->>CipherChannel: Receive cipher
        Worker->>ConnMgr: Acquire connection (context with timeout)
        Worker->>Handshake: HandshakeContext / timeout-aware
        Handshake-->>Worker: Success / Timeout / Error
        alt Success
            Worker->>Results: Append cipher (mutex)
        else Timeout or Error
            Worker->>ConnMgr: Close / cleanup
        end
        Worker->>ConnMgr: Ensure close & cancel context
    end

    Worker-->>EnumerateCiphers: Worker done (WaitGroup)
    EnumerateCiphers->>Caller: Return aggregated results
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 I bounded ciphers down the lane,
With workers hopping, one by one,
Timeouts watched, no leaking bane,
Flushes fixed, defaults begun—
Tests keep watch beneath the sun.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main fix: preventing indefinite hangs during cipher enumeration, which is the core issue addressed across multiple files.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/tlsx/ztls/ztls_test.go (1)

96-102: ⚠️ Potential issue | 🟡 Minor

Pre-existing bug: potential nil pointer dereference when both pointers are nil.

If both tc.expectedResult and actualResult are nil, the first two conditions (lines 96-99) are false, and line 100 will dereference nil pointers, causing a panic. This is a pre-existing issue not introduced by this PR, but worth fixing.

Compare with tls_test.go line 111 which has the correct guard: tc.expectedResult != nil && actualResult != nil && *tc.expectedResult != *actualResult.

🐛 Proposed fix to add nil guard
-		} else if *tc.expectedResult != *actualResult {
+		} else if tc.expectedResult != nil && actualResult != nil && *tc.expectedResult != *actualResult {
 			t.Errorf("expected isClientCertRequired = %t but received %t", *tc.expectedResult, *actualResult)
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tlsx/ztls/ztls_test.go` around lines 96 - 102, The test currently
dereferences tc.expectedResult and actualResult without ensuring both are
non-nil; update the final comparison to guard against nils by changing the
condition to check tc.expectedResult != nil && actualResult != nil &&
*tc.expectedResult != *actualResult before dereferencing, so the block that
calls t.Errorf only runs when both pointers are non-nil (referring to variables
tc.expectedResult and actualResult in ztls_test.go).
🧹 Nitpick comments (2)
pkg/tlsx/tls/tls.go (1)

249-251: Missing connection deadline before handshake.

In ztls.go (line 271), baseConn.SetDeadline is called before the TLS handshake to ensure operations don't block indefinitely. This is missing here. While HandshakeContext respects the context timeout, setting a deadline on the underlying connection provides defense-in-depth against blocking reads/writes.

♻️ Proposed fix to add connection deadline
 				cfg := baseCfg.Clone()
 				cfg.CipherSuites = []uint16{tlsCiphers[v]}
+				_ = baseConn.SetDeadline(time.Now().Add(time.Duration(c.options.Timeout) * time.Second))
 				conn := tls.Client(baseConn, cfg)

 				if err := conn.HandshakeContext(ctx); err == nil {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tlsx/tls/tls.go` around lines 249 - 251, Before calling
conn.HandshakeContext(ctx) on the tls.Client, set a connection deadline on the
underlying baseConn to prevent indefinite blocking: obtain the deadline from ctx
(if ctx has a deadline via ctx.Deadline) or compute one from the context
timeout, call baseConn.SetDeadline(deadline) immediately before
HandshakeContext, and after the handshake (success or failure) clear the
deadline with baseConn.SetDeadline(time.Time{}); do this around the tls.Client
created as conn (the variables/methods involved are baseConn, tls.Client(...,
cfg), conn.HandshakeContext(ctx), and baseConn.SetDeadline).
pkg/output/file_writer.go (1)

29-30: Consider adding a comment explaining the intentional error discard.

The explicit error discard (_, _ =) works correctly here because bufio.Writer propagates I/O errors through its internal state, which Flush() will return. However, this relies on an implementation detail that isn't immediately obvious to readers.

A brief comment would clarify the intent and prevent future maintainers from "fixing" the discarded error:

📝 Suggested documentation
-	_, _ = w.writer.WriteRune('\n')
+	// Error ignored: any I/O failure is captured by bufio's internal
+	// error state and will be returned by the subsequent Flush() call.
+	_, _ = w.writer.WriteRune('\n')
 	return w.writer.Flush()

Alternatively, for explicit error handling:

-	_, _ = w.writer.WriteRune('\n')
-	return w.writer.Flush()
+	if _, err = w.writer.WriteRune('\n'); err != nil {
+		return err
+	}
+	return w.writer.Flush()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/output/file_writer.go` around lines 29 - 30, Add a short inline comment
next to the intentional error discard on the WriteRune call to clarify why the
returned error is ignored (bufio.Writer accumulates errors and Flush() will
expose them); update the code near w.writer.WriteRune('\n') to explain that
WriteRune's error is intentionally discarded because w.writer.Flush() will
surface any prior write errors and thus is the single point of error handling
for this writer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/tlsx/ztls/ztls.go`:
- Around line 274-278: The code assumes conn.GetHandshakeLog() and its
ServerHello are non-nil after tlsHandshakeWithTimeout succeeds, which can panic;
update the block after tlsHandshakeWithTimeout to call h1 :=
conn.GetHandshakeLog() and defensively check that h1 != nil and h1.ServerHello
!= nil (and that CipherSuite is usable) before locking mu and appending
h1.ServerHello.CipherSuite.String() to enumeratedCiphers; if any are nil, skip
the append (or log a warning) so no nil pointer dereference occurs.

---

Outside diff comments:
In `@pkg/tlsx/ztls/ztls_test.go`:
- Around line 96-102: The test currently dereferences tc.expectedResult and
actualResult without ensuring both are non-nil; update the final comparison to
guard against nils by changing the condition to check tc.expectedResult != nil
&& actualResult != nil && *tc.expectedResult != *actualResult before
dereferencing, so the block that calls t.Errorf only runs when both pointers are
non-nil (referring to variables tc.expectedResult and actualResult in
ztls_test.go).

---

Nitpick comments:
In `@pkg/output/file_writer.go`:
- Around line 29-30: Add a short inline comment next to the intentional error
discard on the WriteRune call to clarify why the returned error is ignored
(bufio.Writer accumulates errors and Flush() will expose them); update the code
near w.writer.WriteRune('\n') to explain that WriteRune's error is intentionally
discarded because w.writer.Flush() will surface any prior write errors and thus
is the single point of error handling for this writer.

In `@pkg/tlsx/tls/tls.go`:
- Around line 249-251: Before calling conn.HandshakeContext(ctx) on the
tls.Client, set a connection deadline on the underlying baseConn to prevent
indefinite blocking: obtain the deadline from ctx (if ctx has a deadline via
ctx.Deadline) or compute one from the context timeout, call
baseConn.SetDeadline(deadline) immediately before HandshakeContext, and after
the handshake (success or failure) clear the deadline with
baseConn.SetDeadline(time.Time{}); do this around the tls.Client created as conn
(the variables/methods involved are baseConn, tls.Client(..., cfg),
conn.HandshakeContext(ctx), and baseConn.SetDeadline).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 82f24295-befc-41ec-bfbd-f5516e1fbe0a

📥 Commits

Reviewing files that changed from the base of the PR and between d13b67f and 67a66ab.

📒 Files selected for processing (6)
  • pkg/output/file_writer.go
  • pkg/tlsx/jarm/jarm.go
  • pkg/tlsx/tls/tls.go
  • pkg/tlsx/tls/tls_test.go
  • pkg/tlsx/ztls/ztls.go
  • pkg/tlsx/ztls/ztls_test.go

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/tlsx/ztls/regression_test.go`:
- Around line 104-118: Check and fail fast on setup errors by handling return
errors from fastdialer.NewDialer, New (client constructor), net.Dial, and
client.getConfig instead of ignoring them; if any of those return an error call
t.Fatalf with the error to stop the test immediately. Only defer dialer.Close()
after fastdialer.NewDialer succeeds, and ensure the created raw/tls connection
is closed on test exit (defer rawConn.Close() or defer tlsConn.Close() once
tlsConn is non-nil) so sockets aren’t leaked; locate symbols
fastdialer.NewDialer, New (client constructor), net.Dial, client.getConfig, and
tlsConn/tls.Client to apply these checks and closes.

In `@pkg/tlsx/ztls/ztls.go`:
- Around line 274-285: When GetHandshakeLog() returns nil or h1.ServerHello is
nil inside the tlsHandshakeWithTimeout handling, the function currently closes
conn and continues without calling cancel(), leaving the per-cipher context
timer running; fix by calling cancel() before continuing (and ensure
conn.Close() remains called), i.e., in the nil-ServerHello branch of the block
that follows tlsHandshakeWithTimeout(conn, ctx) invoke cancel() then close the
connection and continue so the WithTimeout context is always cancelled.
- Around line 256-291: The worker pool can deadlock or immediately fail when
c.options.CipherConcurrency is <=0 or c.options.Timeout is <=0; validate and
clamp these before spawning workers in the enumeration function (where
cipherChan, threads, wg, and pool are used). Ensure CipherConcurrency is set to
at least 1 (or a sane default) before creating goroutines and creating
cipherChan, and ensure Timeout is clamped to a positive duration (or set to the
same default used in ConnectWithOptions) before calling context.WithTimeout and
SetDeadline; update the code that calculates threads and uses c.options.Timeout
so the worker loop and timeouts always use safe, non-zero values.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2d715568-a634-4e82-be9b-5c1b27defc70

📥 Commits

Reviewing files that changed from the base of the PR and between 67a66ab and b5acefc.

📒 Files selected for processing (2)
  • pkg/tlsx/ztls/regression_test.go
  • pkg/tlsx/ztls/ztls.go

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/tlsx/tls/tls.go (1)

209-212: ⚠️ Potential issue | 🟠 Major

Guard invalid enum worker settings before starting the pool.

If CipherConcurrency <= 0, no worker starts and Line 264 blocks forever on the first send. If Timeout <= 0, Line 239 creates an already-expired context, so every acquire/handshake attempt is skipped immediately. This worker path is using both values unchecked.

🛠️ Proposed fix
+	if len(toEnumerate) == 0 {
+		return enumeratedCiphers, nil
+	}
+
 	threads := c.options.CipherConcurrency
+	if threads <= 0 {
+		threads = 1
+	}
+	timeout := time.Duration(c.options.Timeout) * time.Second
+	if timeout <= 0 {
+		return enumeratedCiphers, errorutil.NewWithTag("ctls", "cipher enumeration requires a positive timeout") //nolint
+	}
 	if len(toEnumerate) < threads {
 		threads = len(toEnumerate)
 	}
@@
-				ctx, cancel := context.WithTimeout(context.Background(), time.Duration(c.options.Timeout)*time.Second)
+				ctx, cancel := context.WithTimeout(context.Background(), timeout)
 				baseConn, err := pool.Acquire(ctx)

Also applies to: 239-240, 263-267

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tlsx/tls/tls.go` around lines 209 - 212, Validate and sanitize the
worker-related options before using them: ensure c.options.CipherConcurrency is
at least 1 (use min( len(toEnumerate), max(1, CipherConcurrency) )) before
creating the worker pool and ensure c.options.Timeout is positive (use a default
positive timeout if Timeout <= 0) before calling context.WithTimeout so you
don't create an already-expired context or zero workers; update the places that
compute threads (currently using CipherConcurrency and toEnumerate) and the
context timeout creation (where Timeout is used) to use these validated values
so the worker acquire/handshake loop cannot block forever or be skipped
immediately.
♻️ Duplicate comments (2)
pkg/tlsx/ztls/ztls.go (2)

226-229: ⚠️ Potential issue | 🟠 Major

Guard invalid CipherConcurrency and Timeout here as well.

If CipherConcurrency <= 0, no goroutine reads from cipherChan and Line 292 blocks forever. If Timeout <= 0, Line 262 creates an already-expired context and Line 272 sets the deadline to time.Now() or earlier, so every probe fails immediately.

🛠️ Proposed fix
+	if len(toEnumerate) == 0 {
+		return enumeratedCiphers, nil
+	}
+
 	threads := c.options.CipherConcurrency
+	if threads <= 0 {
+		threads = 1
+	}
+	timeout := time.Duration(c.options.Timeout) * time.Second
+	if timeout <= 0 {
+		return enumeratedCiphers, errorutil.NewWithTag("ztls", "cipher enumeration requires a positive timeout") //nolint
+	}
 	if len(toEnumerate) < threads {
 		threads = len(toEnumerate)
 	}
@@
-				ctx, cancel := context.WithTimeout(context.Background(), time.Duration(c.options.Timeout)*time.Second)
+				ctx, cancel := context.WithTimeout(context.Background(), timeout)
 				baseConn, err := pool.Acquire(ctx)
@@
-				_ = baseConn.SetDeadline(time.Now().Add(time.Duration(c.options.Timeout) * time.Second))
+				_ = baseConn.SetDeadline(time.Now().Add(timeout))

Also applies to: 262-272, 291-295

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tlsx/ztls/ztls.go` around lines 226 - 229, Guard against non-positive
CipherConcurrency and Timeout before using them: ensure
c.options.CipherConcurrency is clamped to a minimum of 1 when computing threads
(e.g., threads := c.options.CipherConcurrency; if threads <= 0 { threads = 1 })
and clamp threads to len(toEnumerate) as you already do; likewise ensure the
timeout value used to create the context and set deadlines (c.options.Timeout)
is positive (e.g., if c.options.Timeout <= 0 { c.options.Timeout =
defaultTimeout } or use a local timeout := c.options.Timeout; if timeout <= 0 {
timeout = time.Second }) so context.WithTimeout and deadline calculations never
get an already-expired time; update uses around cipherChan/toEnumerate, the
context creation (context.WithTimeout), and deadline setting to use the
validated/clamped values.

275-279: ⚠️ Potential issue | 🟡 Minor

Cancel the per-cipher context before continuing.

This branch skips Line 286, so the WithTimeout timer stays live until it expires whenever GetHandshakeLog() is incomplete.

🧹 Proposed fix
 				if err := c.tlsHandshakeWithTimeout(conn, ctx); err == nil {
 					h1 := conn.GetHandshakeLog()
 					if h1 == nil || h1.ServerHello == nil {
+						cancel()
 						_ = conn.Close()
 						continue
 					}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tlsx/ztls/ztls.go` around lines 275 - 279, The per-cipher timeout context
created for tlsHandshakeWithTimeout is not cancelled when you continue after an
incomplete handshake log; ensure you call the cancel function for the
WithTimeout context (the cancel returned alongside ctx used with
tlsHandshakeWithTimeout) before calling conn.Close() and continue so the timer
is released promptly; locate the per-cipher context creation and add cancel()
just prior to _ = conn.Close() / continue in the branch where
conn.GetHandshakeLog() is nil or h1.ServerHello is nil.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@pkg/tlsx/tls/tls.go`:
- Around line 209-212: Validate and sanitize the worker-related options before
using them: ensure c.options.CipherConcurrency is at least 1 (use min(
len(toEnumerate), max(1, CipherConcurrency) )) before creating the worker pool
and ensure c.options.Timeout is positive (use a default positive timeout if
Timeout <= 0) before calling context.WithTimeout so you don't create an
already-expired context or zero workers; update the places that compute threads
(currently using CipherConcurrency and toEnumerate) and the context timeout
creation (where Timeout is used) to use these validated values so the worker
acquire/handshake loop cannot block forever or be skipped immediately.

---

Duplicate comments:
In `@pkg/tlsx/ztls/ztls.go`:
- Around line 226-229: Guard against non-positive CipherConcurrency and Timeout
before using them: ensure c.options.CipherConcurrency is clamped to a minimum of
1 when computing threads (e.g., threads := c.options.CipherConcurrency; if
threads <= 0 { threads = 1 }) and clamp threads to len(toEnumerate) as you
already do; likewise ensure the timeout value used to create the context and set
deadlines (c.options.Timeout) is positive (e.g., if c.options.Timeout <= 0 {
c.options.Timeout = defaultTimeout } or use a local timeout :=
c.options.Timeout; if timeout <= 0 { timeout = time.Second }) so
context.WithTimeout and deadline calculations never get an already-expired time;
update uses around cipherChan/toEnumerate, the context creation
(context.WithTimeout), and deadline setting to use the validated/clamped values.
- Around line 275-279: The per-cipher timeout context created for
tlsHandshakeWithTimeout is not cancelled when you continue after an incomplete
handshake log; ensure you call the cancel function for the WithTimeout context
(the cancel returned alongside ctx used with tlsHandshakeWithTimeout) before
calling conn.Close() and continue so the timer is released promptly; locate the
per-cipher context creation and add cancel() just prior to _ = conn.Close() /
continue in the branch where conn.GetHandshakeLog() is nil or h1.ServerHello is
nil.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 45581d0d-f67f-4986-a32f-acf161500ac9

📥 Commits

Reviewing files that changed from the base of the PR and between b5acefc and a1b7902.

📒 Files selected for processing (7)
  • fix-tlsx-hang-819
  • hosts.txt
  • nuclei
  • out.jsonl
  • pkg/tlsx/tls/tls.go
  • pkg/tlsx/ztls/ztls.go
  • tlsx-dev
✅ Files skipped from review due to trivial changes (1)
  • nuclei

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (2)
pkg/tlsx/ztls/ztls.go (2)

253-295: ⚠️ Potential issue | 🟠 Major

Clamp CipherConcurrency and Timeout inside the ztls enumerator too.

This still relies on validateOptions() having run earlier. If CipherConcurrency is 0, no workers are started, so the first send on Line 291 has no receiver; if Timeout <= 0, the per-cipher context is already expired and Line 272 sets an immediate socket deadline.

🛠️ Proposed fix
 	threads := c.options.CipherConcurrency
+	if threads <= 0 {
+		threads = 10
+	}
 	if len(toEnumerate) < threads {
 		threads = len(toEnumerate)
 	}
+	if threads == 0 {
+		return enumeratedCiphers, nil
+	}
+
+	timeout := c.options.Timeout
+	if timeout <= 0 {
+		timeout = 5
+	}
@@
-				ctx, cancel := context.WithTimeout(context.Background(), time.Duration(c.options.Timeout)*time.Second)
+				ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeout)*time.Second)
@@
-				_ = baseConn.SetDeadline(time.Now().Add(time.Duration(c.options.Timeout) * time.Second))
+				_ = baseConn.SetDeadline(time.Now().Add(time.Duration(timeout) * time.Second))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tlsx/ztls/ztls.go` around lines 253 - 295, Clamp the concurrency and
timeout locally before spawning workers: compute threads =
c.options.CipherConcurrency; if threads < 1 set threads = 1 (so at least one
goroutine receives from cipherChan), and compute timeoutSecs =
c.options.Timeout; if timeoutSecs <= 0 set timeoutSecs to a sane minimum (e.g.
1) and use timeoutSecs in context.WithTimeout and when calling SetDeadline.
Update references to threads and c.options.Timeout in the goroutine
(context.WithTimeout and SetDeadline) to use these clamped local values so the
enumerator is safe even if validateOptions() wasn't run.

275-280: ⚠️ Potential issue | 🟡 Minor

Cancel before the nil-ServerHello continue.

This branch skips Line 286, so every incomplete handshake leaves its timeout timer live until it expires.

🛠️ Proposed fix
 				if err := c.tlsHandshakeWithTimeout(conn, ctx); err == nil {
 					h1 := conn.GetHandshakeLog()
 					if h1 == nil || h1.ServerHello == nil {
 						_ = conn.Close()
+						cancel()
 						continue
 					}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tlsx/ztls/ztls.go` around lines 275 - 280, When h1 == nil ||
h1.ServerHello == nil inside the post-handshake check (after calling
c.tlsHandshakeWithTimeout(conn, ctx)), stop the handshake timeout before closing
the connection and continuing; specifically, invoke the cancel function
associated with the context/timer (the cancel returned from
context.WithCancel/WithTimeout that is used with tlsHandshakeWithTimeout) just
before calling conn.Close() and continue, so the timeout goroutine/timer is
cleaned up.
🧹 Nitpick comments (1)
pkg/output/file_writer.go (1)

35-45: Improved Close logic ensures file is always closed, but consider preserving both errors.

The nil check for w.writer is good defensive coding. Always closing the file even if flush fails is an improvement over potentially leaving the file handle open.

However, if both Flush() and Close() fail, only flushErr is returned and the close error is lost. Consider wrapping both errors if both are non-nil:

Optional: preserve both errors
 	err := w.file.Close()
 	if flushErr != nil {
+		if err != nil {
+			return fmt.Errorf("flush error: %w; close error: %v", flushErr, err)
+		}
 		return flushErr
 	}
 	return err

This would require adding "fmt" to imports.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/output/file_writer.go` around lines 35 - 45, The Close implementation
currently flushes w.writer (flushErr) then always calls w.file.Sync() and
w.file.Close() but returns only flushErr, losing close error; update the logic
in Close (the block using w.writer.Flush(), w.file.Sync(), and w.file.Close())
to capture both errors, call w.file.Close() unconditionally, and if both Flush()
and Close() return non-nil errors, return a wrapped error that preserves both
(e.g., fmt.Errorf("flush error: %w; close error: %v", flushErr, closeErr)),
adding "fmt" to imports.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/output/file_writer.go`:
- Around line 29-30: The code in pkg/output/file_writer.go currently discards
the error from w.writer.WriteRune('\n'); change it to capture and return that
error (e.g., err := w.writer.WriteRune('\n'); if err != nil { return err }) so
callers receive failures, and if the intention is to flush after each JSONL
write also call and check a flush (e.g., if w.writer has Flush or use a known
flush method on the writer) and propagate any flush error; update the method
that contains w.writer.WriteRune to return the combined error instead of nil.

In `@pkg/tlsx/clients/utils.go`:
- Around line 152-153: The case matching the generic string "handshake failure"
inside the client-cert detection logic (in the function isClientCertRequired or
the switch handling handshake error strings in pkg/tlsx/clients/utils.go) is too
broad and causes non-cert handshake errors to be treated as ClientCertRequired;
remove that "handshake failure" branch or replace it with a certificate-specific
condition (e.g., explicit messages that mention client certificate or TLS alert
codes indicating certificate required) so only true client-cert-required errors
return true and other handshake failures are left to surface normally.

In `@pkg/tlsx/tls/tls_test.go`:
- Around line 134-143: The accepted connections from the listener (the goroutine
calling l.Accept()) are never closed; collect each net.Conn returned by
l.Accept() (e.g., append to a slice protected by a sync.Mutex or channel) and
register a t.Cleanup that iterates over that collection and calls Close() on
every conn (and clears the collection). Do the same fix for the other identical
accept loop mentioned (the block that also calls l.Accept()); ensure you add the
collection and t.Cleanup near the test setup so every accepted connection is
closed during test cleanup.
- Around line 205-208: The test currently ignores errors from
fastdialer.NewDialer and tls.New which can cause panics or misleading leak
reports; change the setup to check and handle both errors (capture returned errs
from fastdialer.NewDialer and tls.New), call t.Fatalf/t.Fatalf-like assertion on
error to fail the test immediately with the error, and only defer dialer.Close()
after verifying dialer is non-nil/successfully created; reference
fastdialer.NewDialer, dialer.Close, tls.New and the clients.Options{Fastdialer:
dialer, Timeout: 1} to locate where to add the error checks and test failures.

In `@pkg/tlsx/tls/tls.go`:
- Around line 243-281: EnumerateCiphers can block or immediately timeout when
options are zero-valued; clamp CipherConcurrency and Timeout at the start of
EnumerateCiphers (e.g. threads := c.options.CipherConcurrency; if threads <= 0 {
threads = 1 }) and normalize timeout used for context creation (e.g. if
c.options.Timeout <= 0 { c.options.Timeout = 1 } or use a local timeoutSec
variable set to 1 when <=0) before spawning goroutines, and return early when
there is nothing to enumerate (if len(toEnumerate) == 0 return). Update the code
paths that use threads, c.options.Timeout, and toEnumerate (refer to
EnumerateCiphers, threads, c.options.Timeout, toEnumerate, pool.Acquire, and the
worker goroutine) to use these normalized values so sends on cipherChan never
block and acquires/handshakes have a sensible timeout.

---

Duplicate comments:
In `@pkg/tlsx/ztls/ztls.go`:
- Around line 253-295: Clamp the concurrency and timeout locally before spawning
workers: compute threads = c.options.CipherConcurrency; if threads < 1 set
threads = 1 (so at least one goroutine receives from cipherChan), and compute
timeoutSecs = c.options.Timeout; if timeoutSecs <= 0 set timeoutSecs to a sane
minimum (e.g. 1) and use timeoutSecs in context.WithTimeout and when calling
SetDeadline. Update references to threads and c.options.Timeout in the goroutine
(context.WithTimeout and SetDeadline) to use these clamped local values so the
enumerator is safe even if validateOptions() wasn't run.
- Around line 275-280: When h1 == nil || h1.ServerHello == nil inside the
post-handshake check (after calling c.tlsHandshakeWithTimeout(conn, ctx)), stop
the handshake timeout before closing the connection and continuing;
specifically, invoke the cancel function associated with the context/timer (the
cancel returned from context.WithCancel/WithTimeout that is used with
tlsHandshakeWithTimeout) just before calling conn.Close() and continue, so the
timeout goroutine/timer is cleaned up.

---

Nitpick comments:
In `@pkg/output/file_writer.go`:
- Around line 35-45: The Close implementation currently flushes w.writer
(flushErr) then always calls w.file.Sync() and w.file.Close() but returns only
flushErr, losing close error; update the logic in Close (the block using
w.writer.Flush(), w.file.Sync(), and w.file.Close()) to capture both errors,
call w.file.Close() unconditionally, and if both Flush() and Close() return
non-nil errors, return a wrapped error that preserves both (e.g.,
fmt.Errorf("flush error: %w; close error: %v", flushErr, closeErr)), adding
"fmt" to imports.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9902043b-f43e-4b3b-ba3f-3dfe5f4b05dd

📥 Commits

Reviewing files that changed from the base of the PR and between 132b508 and 8ccd1e0.

📒 Files selected for processing (7)
  • internal/runner/banner.go
  • pkg/output/file_writer.go
  • pkg/tlsx/clients/utils.go
  • pkg/tlsx/openssl/openssl.go
  • pkg/tlsx/tls/tls.go
  • pkg/tlsx/tls/tls_test.go
  • pkg/tlsx/ztls/ztls.go

@cherry-bisht cherry-bisht force-pushed the fix-tlsx-hang-819 branch 3 times, most recently from 927b126 to 180ca0f Compare March 7, 2026 17:17
### Root Cause
The hang was caused by a violation of Go's select statement evaluation rules. Previously, tlsConn.Handshake() was executed as an expression within a select case. In Go, all case expressions are evaluated before the select block begins monitoring for the first available case. If the handshake blocked (e.g., a server accepting TCP but ignoring ClientHello), the timeout case (ctx.Done()) was never reached, and the worker goroutine remained stuck indefinitely.

Additionally, some connection pool acquisitions used context.Background(), allowing workers to block indefinitely when the pool was exhausted under high concurrency.

### Fix
- Handshake Isolation: Wrapped all TLS handshakes in dedicated goroutines, allowing the select block to immediately monitor the timeout.
- Forced Abortion: Upon timeout, the underlying raw TCP connection is closed to unblock the handshake.
- Leak Prevention: Implemented synchronous draining of the result channel on timeout to ensure goroutine termination.
- Timeout-Aware Acquisition: Updated pool.Acquire() to use context.WithTimeout.
- Reliable Output: Refactored file_writer to combine data and newline into a single atomic write, preventing JSONL corruption.

### Validation
- Successfully scanned 30,000 targets in 2m31s (previously hung at ~25k).
- Regression tests prove perfect goroutine stability after 1,000+ concurrent timeouts.
- Fixed FD leaks in test servers using t.Cleanup().
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants