feat: add automatic wildcard detection (--auto-wildcard)#962
flaggdavid-source wants to merge 2 commits into projectdiscovery:dev
Adds --auto-wildcard / -aw flag that automatically detects and filters wildcard DNS domains across all input, similar to PureDNS.

How it works:
1. Before resolution, extracts unique root domains from all inputs
2. Probes each domain with a random subdomain (xid-generated)
3. Compares response IPs against root domain IPs
4. Domains returning the same IPs for random subdomains are marked as wildcard and their results are filtered from output

This eliminates the need to manually specify -wd for each domain, making wildcard filtering practical for large multi-domain scans.

Changes:
- internal/runner/options.go: Add AutoWildcard bool field and -aw flag, block in stream mode (consistent with existing -wd behavior)
- internal/runner/wildcard.go: Add auto-detection logic with thread-safe domain tracking (RWMutex), root domain extraction, and per-domain wildcard probing
- internal/runner/runner.go: Integrate auto-detection before workers start, add post-processing filter for detected wildcard domains

Fixes projectdiscovery#924
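The four steps in the description can be sketched roughly as follows. This is an illustration only, not the PR's code: `lookupFunc`, `rootDomains`, and `isWildcard` are hypothetical names, the resolver is a fake injected function rather than the dnsx client, the random label is passed in instead of being generated with xid, and "same IPs" is interpreted here as "shares at least one IP with the root", which is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupFunc is a hypothetical stand-in for an A-record lookup.
// It is not the dnsx API.
type lookupFunc func(host string) ([]string, error)

// rootDomains returns the unique root domains of hosts, using a simple
// last-two-labels heuristic (assumed here for illustration).
func rootDomains(hosts []string) []string {
	seen := make(map[string]struct{})
	var roots []string
	for _, h := range hosts {
		labels := strings.Split(strings.TrimSuffix(h, "."), ".")
		if len(labels) < 2 {
			continue
		}
		root := strings.Join(labels[len(labels)-2:], ".")
		if _, ok := seen[root]; !ok {
			seen[root] = struct{}{}
			roots = append(roots, root)
		}
	}
	return roots
}

// isWildcard reports whether randomLabel.root resolves to at least one IP
// that the root domain itself also resolves to.
func isWildcard(lookup lookupFunc, root, randomLabel string) bool {
	rootIPs, err := lookup(root)
	if err != nil || len(rootIPs) == 0 {
		return false
	}
	known := make(map[string]struct{}, len(rootIPs))
	for _, ip := range rootIPs {
		known[ip] = struct{}{}
	}
	probeIPs, err := lookup(randomLabel + "." + root)
	if err != nil {
		return false
	}
	for _, ip := range probeIPs {
		if _, ok := known[ip]; ok {
			return true
		}
	}
	return false
}

func main() {
	// Fake resolver: every name under wild.test answers like the root does.
	lookup := func(host string) ([]string, error) {
		switch {
		case host == "wild.test" || strings.HasSuffix(host, ".wild.test"):
			return []string{"10.0.0.1"}, nil
		case host == "plain.test":
			return []string{"10.0.0.2"}, nil
		}
		return nil, nil
	}
	inputs := []string{"a.wild.test", "b.wild.test", "www.plain.test"}
	for _, root := range rootDomains(inputs) {
		fmt.Printf("%s wildcard=%v\n", root, isWildcard(lookup, root, "k3x9q"))
	}
}
```

Injecting the lookup function keeps the sketch testable without network access; a real implementation would call the resolver client instead.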
Neo - PR Security Review: No security issues found
Walkthrough: Adds an optional auto-wildcard detection and filtering mode.
Sequence Diagram
sequenceDiagram
participant Client
participant Runner
participant Workers
participant WildcardDetector as Wildcard Detector
participant Registry as Wildcard Registry
participant Output
Client->>Runner: Start with --auto-wildcard
activate Runner
Runner->>WildcardDetector: AutoDetectWildcards()
activate WildcardDetector
WildcardDetector->>Runner: Read stored hostnames (r.hm)
WildcardDetector->>WildcardDetector: For each root domain: probe subdomain vs root
WildcardDetector->>Registry: Register detected wildcard domains
deactivate WildcardDetector
Runner->>Workers: Start DNS resolution workers
activate Workers
Workers->>Runner: Store DNS results (persist for later filtering)
deactivate Workers
Runner->>Output: Restart/ensure output worker
activate Output
Runner->>Registry: Query isAutoWildcardDomain(root)
Registry-->>Runner: wildcard status
Runner->>Output: Emit non-wildcard hosts via lookupAndOutput
Runner->>Output: Close output channel
Output-->>Runner: Output worker finished
deactivate Output
Runner-->>Client: Return filtered results
deactivate Runner
Estimated code review effort: 4 (Complex), ~45 minutes
Pre-merge checks: 4 passed, 1 failed (1 warning)
Actionable comments posted: 3
🧹 Nitpick comments (1)
internal/runner/wildcard.go (1)
11-13: Scope detected wildcard domains to `Runner` state.

This registry is package-global and survives for the lifetime of the process, so a second `Runner` in the same process inherits detections from the previous run. Keeping it on `Runner` (like `wildcards`) or clearing it at the start of `AutoDetectWildcards()` would avoid cross-run leakage and make tests more predictable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/runner/wildcard.go` around lines 11 - 13, The global registry autoWildcardDomains and its mutex autoWildcardDomainsMutex leak state across Runner instances; move this state into the Runner struct (e.g., add a wildcards/autoWildcardDomains field and its mutex) and update AutoDetectWildcards(), any callers, and checks to use r.autoWildcardDomains / r.autoWildcardDomainsMutex (or clear autoWildcardDomains at the start of AutoDetectWildcards() if moving is impractical) so each Runner has its own scoped wildcard registry and tests/runs no longer inherit prior detections.
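A minimal sketch of what scoping the registry to the runner could look like. The `Runner` type here is a hypothetical, trimmed-down struct and the method names are assumptions for illustration; the real dnsx `Runner` has many more fields and may name these differently.

```go
package main

import (
	"fmt"
	"sync"
)

// Runner is a hypothetical stand-in that owns its wildcard registry,
// so nothing is shared through package-level variables.
type Runner struct {
	autoWildcardMu      sync.RWMutex
	autoWildcardDomains map[string]struct{}
}

// NewRunner returns a Runner with an empty registry.
func NewRunner() *Runner {
	return &Runner{autoWildcardDomains: make(map[string]struct{})}
}

// markAutoWildcard records root as a detected wildcard domain.
func (r *Runner) markAutoWildcard(root string) {
	r.autoWildcardMu.Lock()
	defer r.autoWildcardMu.Unlock()
	r.autoWildcardDomains[root] = struct{}{}
}

// isAutoWildcardDomain reports whether root was detected by this Runner.
func (r *Runner) isAutoWildcardDomain(root string) bool {
	r.autoWildcardMu.RLock()
	defer r.autoWildcardMu.RUnlock()
	_, ok := r.autoWildcardDomains[root]
	return ok
}

func main() {
	first := NewRunner()
	first.markAutoWildcard("example.com")

	// A second Runner in the same process starts with a clean registry.
	second := NewRunner()
	fmt.Println(first.isAutoWildcardDomain("example.com"))  // true
	fmt.Println(second.isAutoWildcardDomain("example.com")) // false
}
```

Because each instance carries its own map and mutex, tests can create as many runners as they need without resetting shared state.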
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/runner/options.go`:
- Around line 312-314: Add a validation to reject using --auto-wildcard together
with --wildcard-domain: in the options validation block where
options.AutoWildcard is checked, also check options.WildcardDomain and call
gologger.Fatal().Msgf(...) if both are true, with a clear message like
"auto-wildcard and wildcard-domain are mutually exclusive"; reference the
options.AutoWildcard and options.WildcardDomain flags so the runner never allows
both (Runner.run executes manual wildcard post-processing and auto-wildcard
independently and will otherwise emit duplicate/inconsistent host output).
In `@internal/runner/runner.go`:
- Around line 559-594: The auto-wildcard filtering runs too late and only
post-processes already-emitted results; to fix, change the worker flow to buffer
outputs when r.options.AutoWildcard is true (same approach used for
options.WildcardDomain) so wildcard detection happens before any results are
written. Specifically, in worker() ensure startOutputWorker()/outputchan
buffering is enabled earlier when r.options.AutoWildcard is set, make the
auto-wildcard scan occur before closing outputchan, and have lookupAndOutput()
write into the buffered store (or reuse the same buffering mechanism) so
JSON/response-mode entries can be reconstructed and suppressed correctly; update
any related wait/close logic (wgoutputworker, close(outputchan)) to match the
wildcard-domain path.
In `@internal/runner/wildcard.go`:
- Around line 98-123: The wildcard detection currently only inspects DNSData.A
(in detectWildcardForDomain via r.dnsx.QueryOne and DNSData.A), causing failures
for non-A query types; update detectWildcardForDomain so that when auto-wildcard
is enabled it either (a) explicitly performs A-record lookups for the random
test subdomain and the root domain regardless of the user query type, or (b)
dynamically inspects the response field matching the requested record type
(e.g., DNSData.AAAA, DNSData.CNAME, DNSData.MX, etc.) instead of only DNSData.A;
change the comparisons and map construction (rootIPs and in.A iteration) to use
the selected record slice based on the active query type to correctly detect
wildcards for non-A queries.
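The mutual-exclusion check requested for `internal/runner/options.go` could look roughly like the sketch below. The `Options` struct is a hypothetical reduction, the `Stream` field name is an assumption, and the function returns an error instead of calling `gologger.Fatal()` so the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// Options is a hypothetical subset of the runner options.
type Options struct {
	AutoWildcard   bool   // set by --auto-wildcard / -aw
	WildcardDomain string // set by --wildcard-domain / -wd
	Stream         bool   // assumed name for the stream-mode switch
}

// validateWildcardOptions rejects flag combinations that the review
// describes as producing duplicate or inconsistent output.
func validateWildcardOptions(o Options) error {
	if o.AutoWildcard && o.WildcardDomain != "" {
		return errors.New("auto-wildcard and wildcard-domain are mutually exclusive")
	}
	if o.AutoWildcard && o.Stream {
		return errors.New("auto-wildcard is not supported in stream mode")
	}
	return nil
}

func main() {
	err := validateWildcardOptions(Options{AutoWildcard: true, WildcardDomain: "example.com"})
	fmt.Println(err)
}
```

Returning an error keeps the rule easy to unit test; the caller can still turn it into a fatal log message.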
📒 Files selected for processing (3)
- internal/runner/options.go
- internal/runner/runner.go
- internal/runner/wildcard.go
internal/runner/runner.go (Outdated)

    // Auto wildcard filtering - filter results from detected wildcard domains
    if r.options.AutoWildcard && len(autoWildcardDomains) > 0 {
        gologger.Print().Msgf("Starting to filter auto-detected wildcard domains\n")

        // we need to restart output
        r.startOutputWorker()

        seen := make(map[string]struct{})
        numRemovedSubdomains := 0

        r.hm.Scan(func(k, v []byte) error {
            host := string(k)
            rootDomain := getRootDomain(host)

            // Skip if this domain was detected as wildcard
            if IsAutoWildcardDomain(rootDomain) {
                if _, ok := seen[host]; !ok {
                    numRemovedSubdomains++
                    seen[host] = struct{}{}
                }
                return nil
            }

            // Output non-wildcard results
            if _, ok := seen[host]; !ok {
                seen[host] = struct{}{}
                _ = r.lookupAndOutput(host)
            }
            return nil
        })

        close(r.outputchan)
        // waiting output worker
        r.wgoutputworker.Wait()
        gologger.Print().Msgf("%d wildcard subdomains removed\n", numRemovedSubdomains)
    }
--auto-wildcard filters too late to affect the real output.
This block runs after Lines 475-476 have already closed the original output worker, so wildcard matches have already been printed/written once. Unlike the --wildcard-domain flow, worker() only buffers results when options.WildcardDomain != "", so --auto-wildcard never suppresses the first pass. The current behavior is an extra filtered pass appended to the unfiltered output, and lookupAndOutput() cannot reconstruct JSON/response-mode results because nothing was stored for this path.
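One way to picture the fix the reviewer is asking for is "buffer first, emit once". The sketch below is an assumption-laden illustration: `resultBuffer`, `result`, and `twoLabelRoot` are hypothetical names, and an in-memory map stands in for whatever store the runner actually uses.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// result is a hypothetical record for one resolved host.
type result struct {
	Host string
	IPs  []string
}

// resultBuffer holds worker results until wildcard filtering can run.
type resultBuffer struct {
	mu      sync.Mutex
	results map[string]result
}

func newResultBuffer() *resultBuffer {
	return &resultBuffer{results: make(map[string]result)}
}

// store is what a worker would call instead of writing to the output.
func (b *resultBuffer) store(res result) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.results[res.Host] = res
}

// flush emits each buffered result whose root is not a detected wildcard
// and returns the number of suppressed results.
func (b *resultBuffer) flush(wildcardRoots map[string]bool, emit func(result)) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	hosts := make([]string, 0, len(b.results))
	for h := range b.results {
		hosts = append(hosts, h)
	}
	sort.Strings(hosts) // deterministic order for the example
	removed := 0
	for _, h := range hosts {
		if wildcardRoots[twoLabelRoot(h)] {
			removed++
			continue
		}
		emit(b.results[h])
	}
	return removed
}

// twoLabelRoot keeps the last two labels of host (illustrative heuristic).
func twoLabelRoot(host string) string {
	labels := strings.Split(strings.TrimSuffix(host, "."), ".")
	if len(labels) <= 2 {
		return strings.Join(labels, ".")
	}
	return strings.Join(labels[len(labels)-2:], ".")
}

func main() {
	buf := newResultBuffer()
	buf.store(result{Host: "a.wild.test", IPs: []string{"10.0.0.1"}})
	buf.store(result{Host: "www.plain.test", IPs: []string{"10.0.0.2"}})

	removed := buf.flush(map[string]bool{"wild.test": true}, func(r result) {
		fmt.Println(r.Host, r.IPs)
	})
	fmt.Printf("%d wildcard subdomains removed\n", removed)
}
```

Since nothing is written before `flush`, wildcard hosts never reach the output, and the stored record is available to rebuild richer output formats.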
Implemented randomized wildcard probing to improve the accuracy of DNS wildcard detection. By using a unique prefix for each probe, we can bypass potential DNS caching issues and more reliably identify wildcard records.

Thanks for the suggestion! We're using
- Add mutual exclusion validation for --auto-wildcard and --wildcard-domain
- Buffer results when auto-wildcard is active (store DNS data during worker phase, output only in post-processing) to prevent double output
- Use Lookup() for wildcard detection to always check A records regardless of configured query types (fixes detection when using -aaaa, -cname, etc.)
- Scope wildcard registry to Runner struct instead of package-global state to prevent cross-run leakage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🧹 Nitpick comments (3)
internal/runner/wildcard.go (3)
144-152: Single-threaded detection may be slow for large, diverse input sets.

The sequential loop through domains (lines 145-152) could become a bottleneck if the input contains thousands of unique root domains. Each domain requires two DNS lookups.

Consider parallelizing this in a future iteration, similar to how `wildcardWorker()` goroutines handle the existing `--wildcard-domain` flow, if performance becomes an issue.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/runner/wildcard.go` around lines 144 - 152, The sequential loop that runs detectWildcardForDomain for each domain can be parallelized to avoid being a bottleneck; replace the single-threaded for-range in wildcard.go with a worker pool: create a buffered channel of domains, spawn N goroutines (or use a semaphore to limit concurrency) that read domains, call r.detectWildcardForDomain(domain), and on true lock r.autoWildcardDomainsMutex to write to r.autoWildcardDomains and log the detection; use a sync.WaitGroup to wait for workers to finish and ensure any logs and map writes remain protected exactly as in the current code; mirror the concurrency pattern used by wildcardWorker() to choose a sensible default for N or make it configurable.
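The worker-pool shape described in this comment might look like the sketch below. `detectAll` is a hypothetical helper, the detection function is injected rather than being `r.detectWildcardForDomain`, and the worker count is an arbitrary parameter.

```go
package main

import (
	"fmt"
	"sync"
)

// detectAll runs detect for every domain using a fixed number of worker
// goroutines and returns the set of domains for which detect was true.
func detectAll(domains []string, workers int, detect func(string) bool) map[string]struct{} {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	found := make(map[string]struct{})
	var mu sync.Mutex // guards found
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for d := range jobs {
				if detect(d) {
					mu.Lock()
					found[d] = struct{}{}
					mu.Unlock()
				}
			}
		}()
	}

	for _, d := range domains {
		jobs <- d
	}
	close(jobs) // lets the workers exit their range loops
	wg.Wait()
	return found
}

func main() {
	domains := []string{"wild.test", "plain.test", "other.test"}
	// Fake detector standing in for the two DNS lookups per domain.
	detect := func(d string) bool { return d == "wild.test" }

	found := detectAll(domains, 4, detect)
	fmt.Println(len(found), "wildcard domain(s) detected")
}
```

The map write stays behind a mutex, as the comment requires, while the DNS-bound `detect` calls run concurrently.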
73-85: Known limitation acknowledged; consider documenting workaround.

The two-label extraction is a reasonable simplification for common TLDs. The inline comment correctly documents the limitation for multi-part TLDs (`.co.uk`, `.com.au`).

For users needing accurate extraction, consider adding a note in the CLI help or documentation that a public suffix list library (e.g., `publicsuffix`) could be integrated in the future.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/runner/wildcard.go` around lines 73 - 85, Update the documentation/help text to explicitly note the limitation of getRootDomain's two-label heuristic and recommend using a public suffix list for accurate extraction; mention the getRootDomain function in internal/runner/wildcard.go and add a short CLI/help message (or README entry) advising that multi-part TLDs like .co.uk and .com.au are not handled correctly and that integrating a library such as publicsuffix would be the recommended workaround.
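To make the limitation concrete, here is a guess at what a two-label heuristic does. `twoLabelRoot` is a hypothetical reimplementation for illustration, not the PR's `getRootDomain()`.

```go
package main

import (
	"fmt"
	"strings"
)

// twoLabelRoot returns the last two dot-separated labels of host.
// It knows nothing about public suffixes such as co.uk.
func twoLabelRoot(host string) string {
	host = strings.TrimSuffix(strings.ToLower(host), ".")
	labels := strings.Split(host, ".")
	if len(labels) <= 2 {
		return host
	}
	return strings.Join(labels[len(labels)-2:], ".")
}

func main() {
	fmt.Println(twoLabelRoot("api.example.com"))    // example.com
	fmt.Println(twoLabelRoot("shop.example.co.uk")) // co.uk, not example.co.uk
}
```

With this heuristic every host under `.co.uk` collapses to the same "root", so a wildcard verdict for one registrant would spill over to unrelated ones. That is the case a public suffix list would address.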
95-101: Rate limiter not applied to wildcard detection lookups.

The `Lookup()` calls here (and in `runner.go:757` for ASN checks) bypass `r.limiter.Take()`. This is consistent with existing code, but be aware that with `--auto-wildcard` enabled, `2 * number_of_unique_root_domains` DNS queries occur before the rate-limited main resolution phase. For users with strict resolver rate limits, this could cause transient issues.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/runner/wildcard.go` around lines 95 - 101, The wildcard detection and ASN pre-checks are bypassing the global rate limiter by calling r.dnsx.Lookup(...) directly; add r.limiter.Take() immediately before each such Lookup (e.g., the testHost and domain lookups in wildcard detection and the ASN-related Lookup in the ASN check code path) so those pre-check queries are subject to the same rate limiting as the main resolution phase; ensure you call r.limiter.Take() before each Lookup and preserve existing error handling around the Lookup calls.
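The suggestion amounts to taking a token before every pre-check lookup. The sketch below uses a hypothetical `limiter` interface and a ticker-based implementation; it does not reproduce the limiter type dnsx actually uses, and `limitedLookup` is an illustrative helper name.

```go
package main

import (
	"fmt"
	"time"
)

// limiter is a hypothetical interface with a blocking Take method.
type limiter interface {
	Take()
}

// tickLimiter releases one token per ticker interval.
type tickLimiter struct {
	ticks <-chan time.Time
}

// newTickLimiter allows roughly perSecond calls per second and returns
// a stop function that releases the underlying ticker.
func newTickLimiter(perSecond int) (*tickLimiter, func()) {
	if perSecond < 1 {
		perSecond = 1
	}
	t := time.NewTicker(time.Second / time.Duration(perSecond))
	return &tickLimiter{ticks: t.C}, t.Stop
}

// Take blocks until the next token is available.
func (l *tickLimiter) Take() { <-l.ticks }

// limitedLookup waits for the limiter before delegating to lookup,
// leaving the lookup's own error handling untouched.
func limitedLookup(l limiter, lookup func(string) ([]string, error), host string) ([]string, error) {
	l.Take()
	return lookup(host)
}

func main() {
	lim, stop := newTickLimiter(100)
	defer stop()

	// Fake lookup standing in for the resolver call.
	lookup := func(host string) ([]string, error) { return []string{"10.0.0.1"}, nil }

	for _, host := range []string{"k3x9q.example.com", "example.com"} {
		ips, err := limitedLookup(lim, lookup, host)
		fmt.Println(host, ips, err)
	}
}
```

Wrapping the call in one helper makes it harder to add a new pre-check lookup that forgets the limiter.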
📒 Files selected for processing (3)
- internal/runner/options.go
- internal/runner/runner.go
- internal/runner/wildcard.go

🚧 Files skipped from review as they are similar to previous changes (1)
- internal/runner/options.go
Thank you for the contribution! This PR has a clean scope with only 3 files, but it lacks test coverage for the new wildcard detection functionality. The issue has been resolved via #966, which includes a comprehensive test suite (364 lines covering multi-root filtering, AAAA, CNAME, threshold independence, and normalization) plus shared library extraction into
Summary

Fixes #924

Adds `--auto-wildcard` / `-aw` flag that automatically detects and filters wildcard DNS domains across all input, similar to how PureDNS handles wildcard detection.

How It Works

1. Before resolution, extracts unique root domains from all inputs
2. Probes each domain with a random subdomain (`xid`-generated, consistent with existing wildcard code)
3. Compares response IPs against root domain IPs
4. Domains returning the same IPs for random subdomains are marked as wildcard and their results are filtered from output

This eliminates the need to manually specify `-wd` for each domain, making wildcard filtering practical for large multi-domain scans.

Changes

- `internal/runner/options.go`: `AutoWildcard` field, `-aw` flag, stream mode validation
- `internal/runner/wildcard.go`: `AutoDetectWildcards()`, `detectWildcardForDomain()`, `getRootDomain()`, `IsAutoWildcardDomain()` with thread-safe `RWMutex`
- `internal/runner/runner.go`

Testing

- `go build ./...` passes clean
- Flag appears in `-h` output under the Configurations group

Known Limitations

- `getRootDomain()` uses a simple two-label heuristic: it works for `.com`, `.org`, etc. but not multi-part TLDs like `.co.uk`. Documented in code comments. A public suffix list library could be integrated in a follow-up if needed.