
feat: add automatic wildcard detection (--auto-wildcard) #962

Closed
flaggdavid-source wants to merge 2 commits into projectdiscovery:dev from flaggdavid-source:feat/auto-wildcard-detection-924

Conversation

@flaggdavid-source

@flaggdavid-source flaggdavid-source commented Mar 15, 2026

Summary

Fixes #924

Adds --auto-wildcard / -aw flag that automatically detects and filters wildcard DNS domains across all input, similar to how PureDNS handles wildcard detection.

How It Works

  1. Before resolution, extracts unique root domains from all inputs
  2. Probes each domain with a random subdomain (xid-generated, consistent with existing wildcard code)
  3. Compares response IPs — if a random subdomain returns the same IPs as the root domain, it's flagged as wildcard
  4. Post-processing filter removes results from detected wildcard domains

This eliminates the need to manually specify -wd for each domain, making wildcard filtering practical for large multi-domain scans.
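
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `lookupFunc`, `randomLabel`, `isWildcard`, and `fakeLookup` are hypothetical names, the resolver is faked so the example runs offline, and `crypto/rand` stands in for the xid-generated label. It also assumes that any overlap between the probe's IPs and the root's IPs counts as a match.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// lookupFunc is a hypothetical stand-in for the resolver call;
// it returns the A-record IPs for a host.
type lookupFunc func(host string) ([]string, error)

// randomLabel returns a random hex label. The PR uses xid here;
// crypto/rand is swapped in to keep the sketch stdlib-only.
func randomLabel() string {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// isWildcard reports whether a random subdomain of domain resolves to
// at least one IP that the root domain also resolves to (assumed rule).
func isWildcard(domain string, lookup lookupFunc) bool {
	rootIPs, err := lookup(domain)
	if err != nil || len(rootIPs) == 0 {
		return false
	}
	probeIPs, err := lookup(randomLabel() + "." + domain)
	if err != nil || len(probeIPs) == 0 {
		return false
	}
	known := make(map[string]struct{}, len(rootIPs))
	for _, ip := range rootIPs {
		known[ip] = struct{}{}
	}
	for _, ip := range probeIPs {
		if _, ok := known[ip]; ok {
			return true
		}
	}
	return false
}

// fakeLookup simulates a resolver: wildcard.test answers every name
// with the same IP, example.test only answers for the root itself.
func fakeLookup(host string) ([]string, error) {
	switch {
	case host == "wildcard.test" || strings.HasSuffix(host, ".wildcard.test"):
		return []string{"192.0.2.10"}, nil
	case host == "example.test":
		return []string{"192.0.2.20"}, nil
	}
	return nil, fmt.Errorf("NXDOMAIN: %s", host)
}

func main() {
	for _, d := range []string{"wildcard.test", "example.test"} {
		fmt.Printf("%s wildcard=%v\n", d, isWildcard(d, fakeLookup))
	}
}
```

Injecting the lookup as a function keeps the comparison logic testable without network access.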

Changes

| File | Change |
| --- | --- |
| internal/runner/options.go | New AutoWildcard field, -aw flag, stream mode validation |
| internal/runner/wildcard.go | AutoDetectWildcards(), detectWildcardForDomain(), getRootDomain(), IsAutoWildcardDomain() with thread-safe RWMutex |
| internal/runner/runner.go | Integration: auto-detect before workers, post-processing filter after resolution |

Usage

# Automatically detect and filter wildcards (replaces manual -wd)
printf 'sub1.example.com\nsub2.example.com\nsub3.wildcard.com\n' | dnsx -aw

# Works with file input too
dnsx -l subdomains.txt -aw

Testing

  • go build ./... passes clean
  • New flag visible in -h output under Configurations group
  • Stream mode correctly rejected with error message

Known Limitations

  • getRootDomain() uses a simple two-label heuristic — works for .com, .org, etc. but not multi-part TLDs like .co.uk. Documented in code comments. A public suffix list library could be integrated in a follow-up if needed.
  • Single-probe detection per domain. Could be extended to multiple probes for higher confidence.
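
To make the first limitation concrete, here is a guess at what a two-label heuristic looks like. It is a sketch written for illustration and may differ from the PR's actual `getRootDomain()`; the lowercasing and trailing-dot handling are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// getRootDomain keeps the last two dot-separated labels of host.
// Illustrative only: the PR's implementation may differ in details.
func getRootDomain(host string) string {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	labels := strings.Split(host, ".")
	if len(labels) <= 2 {
		return host
	}
	return strings.Join(labels[len(labels)-2:], ".")
}

func main() {
	hosts := []string{"api.dev.example.com", "example.org", "shop.example.co.uk"}
	for _, h := range hosts {
		fmt.Printf("%s -> %s\n", h, getRootDomain(h))
	}
}
```

The first two hosts map to `example.com` and `example.org` as intended, but `shop.example.co.uk` maps to `co.uk`, so every `.co.uk` input would be grouped under one "root" and probed as a single domain.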

Summary by CodeRabbit

  • New Features

    • Added an option to automatically detect and filter wildcard subdomains from enumeration results.
    • Detection runs before enumeration and a post-pass removes detected wildcard-derived hosts, with a summary count.
  • Bug Fixes / Validation

    • New validation prevents using the auto-wildcard option together with an explicit wildcard domain or in stream mode.
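
A rough sketch of that validation, assuming a pared-down `Options` struct. The `Stream` field name and the stream-mode error text are assumptions, and the sketch returns an error where the reviewed code is described as calling `gologger.Fatal()`.

```go
package main

import (
	"errors"
	"fmt"
)

// Options mirrors only the fields relevant to this check.
// Stream is an assumed field name for stream mode.
type Options struct {
	AutoWildcard   bool
	WildcardDomain string
	Stream         bool
}

// validateAutoWildcard rejects flag combinations that auto-wildcard
// cannot support: an explicit wildcard domain, or stream mode.
func validateAutoWildcard(o Options) error {
	if !o.AutoWildcard {
		return nil
	}
	if o.WildcardDomain != "" {
		return errors.New("auto-wildcard and wildcard-domain are mutually exclusive")
	}
	if o.Stream {
		return errors.New("auto-wildcard is not supported in stream mode")
	}
	return nil
}

func main() {
	cases := []Options{
		{AutoWildcard: true},
		{AutoWildcard: true, WildcardDomain: "example.com"},
		{AutoWildcard: true, Stream: true},
	}
	for _, c := range cases {
		fmt.Printf("%+v -> %v\n", c, validateAutoWildcard(c))
	}
}
```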

Adds --auto-wildcard / -aw flag that automatically detects and filters
wildcard DNS domains across all input, similar to PureDNS.

How it works:
1. Before resolution, extracts unique root domains from all inputs
2. Probes each domain with a random subdomain (xid-generated)
3. Compares response IPs against root domain IPs
4. Domains returning the same IPs for random subdomains are marked
   as wildcard and their results are filtered from output

This eliminates the need to manually specify -wd for each domain,
making wildcard filtering practical for large multi-domain scans.

Changes:
- internal/runner/options.go: Add AutoWildcard bool field and -aw flag,
  block in stream mode (consistent with existing -wd behavior)
- internal/runner/wildcard.go: Add auto-detection logic with thread-safe
  domain tracking (RWMutex), root domain extraction, and per-domain
  wildcard probing
- internal/runner/runner.go: Integrate auto-detection before workers
  start, add post-processing filter for detected wildcard domains

Fixes projectdiscovery#924
@neo-by-projectdiscovery-dev

neo-by-projectdiscovery-dev bot commented Mar 15, 2026

Neo - PR Security Review

No security issues found

Highlights

  • Adds --auto-wildcard / -aw flag for automatic wildcard DNS domain detection and filtering
  • Extracts unique root domains and probes each with random subdomain before resolution
  • Filters resolution results to exclude detected wildcard domains

Hardening Notes

  • Add rate limiting per root domain in wildcard detection to prevent resolver abuse
  • Sanitize error messages in non-verbose mode to avoid leaking DNS infrastructure details
  • Run go test -race to verify concurrency safety of wildcard map access patterns
  • Consider using golang.org/x/net/publicsuffix instead of two-label heuristic for proper .co.uk handling


@coderabbitai

coderabbitai bot commented Mar 15, 2026

Walkthrough

Adds an optional auto-wildcard detection and filtering mode enabled by --auto-wildcard. When enabled, the runner scans collected hostnames to detect wildcard root domains, records them, and suppresses output for hosts under detected wildcard domains during a post-processing phase.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration: internal/runner/options.go | Added AutoWildcard bool to Options, registered --auto-wildcard (-aw) flag, and updated validation to reject --auto-wildcard when WildcardDomain is set or in stream mode. |
| Execution Flow / Runner state: internal/runner/runner.go | Added autoWildcardDomains map and mutex to Runner. When enabled, runs AutoDetectWildcards() before workers, persists additional DNS data during worker runs for later filtering, and performs a post-processing phase that restarts the output worker, filters stored hosts by detected wildcard root domains, emits non-wildcard results, and logs removed counts. |
| Wildcard detection logic: internal/runner/wildcard.go | Implemented root-domain extraction, per-domain wildcard detection (detectWildcardForDomain), AutoDetectWildcards() to scan stored hostnames and populate Runner's auto-wildcard registry, and isAutoWildcardDomain() accessor; added progress and result logging. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Runner
    participant Workers
    participant WildcardDetector as Wildcard Detector
    participant Registry as Wildcard Registry
    participant Output

    Client->>Runner: Start with --auto-wildcard
    activate Runner

    Runner->>WildcardDetector: AutoDetectWildcards()
    activate WildcardDetector
    WildcardDetector->>Runner: Read stored hostnames (r.hm)
    WildcardDetector->>WildcardDetector: For each root domain: probe subdomain vs root
    WildcardDetector->>Registry: Register detected wildcard domains
    deactivate WildcardDetector

    Runner->>Workers: Start DNS resolution workers
    activate Workers
    Workers->>Runner: Store DNS results (persist for later filtering)
    deactivate Workers

    Runner->>Output: Restart/ensure output worker
    activate Output
    Runner->>Registry: Query isAutoWildcardDomain(root)
    Registry-->>Runner: wildcard status
    Runner->>Output: Emit non-wildcard hosts via lookupAndOutput
    Runner->>Output: Close output channel
    Output-->>Runner: Output worker finished
    deactivate Output

    Runner-->>Client: Return filtered results
    deactivate Runner

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I sniffed the wild domains at night,

hopped through roots till patterns were right.
With one small flag I chased the noise,
and filtered fake hops into poise.
Hooray—clean hops and quieter lights!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: add automatic wildcard detection (--auto-wildcard)' directly and clearly summarizes the main feature addition in the pull request. |
| Linked Issues check | ✅ Passed | The PR fully implements all three requirements from issue #924: automatic wildcard detection across multiple domains, filtering of wildcard-based results, and exposure via the --auto-wildcard flag. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing the auto-wildcard detection feature with appropriate validation, mutual exclusion checks, and integration into the existing runner workflow. |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
internal/runner/wildcard.go (1)

11-13: Scope detected wildcard domains to Runner state.

This registry is package-global and survives for the lifetime of the process, so a second Runner in the same process inherits detections from the previous run. Keeping it on Runner (like wildcards) or clearing it at the start of AutoDetectWildcards() would avoid cross-run leakage and make tests more predictable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/runner/wildcard.go` around lines 11 - 13, The global registry
autoWildcardDomains and its mutex autoWildcardDomainsMutex leak state across
Runner instances; move this state into the Runner struct (e.g., add a
wildcards/autoWildcardDomains field and its mutex) and update
AutoDetectWildcards(), any callers, and checks to use r.autoWildcardDomains /
r.autoWildcardDomainsMutex (or clear autoWildcardDomains at the start of
AutoDetectWildcards() if moving is impractical) so each Runner has its own
scoped wildcard registry and tests/runs no longer inherit prior detections.
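
One way the Runner-scoped registry could look is sketched below. The struct is a pared-down, hypothetical stand-in for the real `Runner`; the constructor and `markAutoWildcard` helper are assumptions, and the field names only loosely follow the ones mentioned in the comment above.

```go
package main

import (
	"fmt"
	"sync"
)

// Runner holds its own wildcard registry, so detections made by one
// instance are invisible to any other instance in the same process.
type Runner struct {
	autoWildcardDomainsMutex sync.RWMutex
	autoWildcardDomains      map[string]struct{}
}

// NewRunner returns a Runner with an empty registry.
func NewRunner() *Runner {
	return &Runner{autoWildcardDomains: make(map[string]struct{})}
}

// markAutoWildcard records domain as a detected wildcard root.
func (r *Runner) markAutoWildcard(domain string) {
	r.autoWildcardDomainsMutex.Lock()
	defer r.autoWildcardDomainsMutex.Unlock()
	r.autoWildcardDomains[domain] = struct{}{}
}

// isAutoWildcardDomain reports whether domain was recorded by this Runner.
func (r *Runner) isAutoWildcardDomain(domain string) bool {
	r.autoWildcardDomainsMutex.RLock()
	defer r.autoWildcardDomainsMutex.RUnlock()
	_, ok := r.autoWildcardDomains[domain]
	return ok
}

func main() {
	first := NewRunner()
	first.markAutoWildcard("wildcard.test")

	second := NewRunner()
	fmt.Println(first.isAutoWildcardDomain("wildcard.test"))  // true
	fmt.Println(second.isAutoWildcardDomain("wildcard.test")) // false
}
```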
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/runner/options.go`:
- Around line 312-314: Add a validation to reject using --auto-wildcard together
with --wildcard-domain: in the options validation block where
options.AutoWildcard is checked, also check options.WildcardDomain and call
gologger.Fatal().Msgf(...) if both are true, with a clear message like
"auto-wildcard and wildcard-domain are mutually exclusive"; reference the
options.AutoWildcard and options.WildcardDomain flags so the runner never allows
both (Runner.run executes manual wildcard post-processing and auto-wildcard
independently and will otherwise emit duplicate/inconsistent host output).

In `@internal/runner/runner.go`:
- Around line 559-594: The auto-wildcard filtering runs too late and only
post-processes already-emitted results; to fix, change the worker flow to buffer
outputs when r.options.AutoWildcard is true (same approach used for
options.WildcardDomain) so wildcard detection happens before any results are
written. Specifically, in worker() ensure startOutputWorker()/outputchan
buffering is enabled earlier when r.options.AutoWildcard is set, make the
auto-wildcard scan occur before closing outputchan, and have lookupAndOutput()
write into the buffered store (or reuse the same buffering mechanism) so
JSON/response-mode entries can be reconstructed and suppressed correctly; update
any related wait/close logic (wgoutputworker, close(outputchan)) to match the
wildcard-domain path.

In `@internal/runner/wildcard.go`:
- Around line 98-123: The wildcard detection currently only inspects DNSData.A
(in detectWildcardForDomain via r.dnsx.QueryOne and DNSData.A), causing failures
for non-A query types; update detectWildcardForDomain so that when auto-wildcard
is enabled it either (a) explicitly performs A-record lookups for the random
test subdomain and the root domain regardless of the user query type, or (b)
dynamically inspects the response field matching the requested record type
(e.g., DNSData.AAAA, DNSData.CNAME, DNSData.MX, etc.) instead of only DNSData.A;
change the comparisons and map construction (rootIPs and in.A iteration) to use
the selected record slice based on the active query type to correctly detect
wildcards for non-A queries.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7f7e5963-bc4d-4dda-829a-f81cdb8b8649

📥 Commits

Reviewing files that changed from the base of the PR and between fe80b18 and 85f9e84.

📒 Files selected for processing (3)
  • internal/runner/options.go
  • internal/runner/runner.go
  • internal/runner/wildcard.go

Comment on lines +559 to +594
	// Auto wildcard filtering - filter results from detected wildcard domains
	if r.options.AutoWildcard && len(autoWildcardDomains) > 0 {
		gologger.Print().Msgf("Starting to filter auto-detected wildcard domains\n")

		// we need to restart output
		r.startOutputWorker()

		seen := make(map[string]struct{})
		numRemovedSubdomains := 0

		r.hm.Scan(func(k, v []byte) error {
			host := string(k)
			rootDomain := getRootDomain(host)

			// Skip if this domain was detected as wildcard
			if IsAutoWildcardDomain(rootDomain) {
				if _, ok := seen[host]; !ok {
					numRemovedSubdomains++
					seen[host] = struct{}{}
				}
				return nil
			}

			// Output non-wildcard results
			if _, ok := seen[host]; !ok {
				seen[host] = struct{}{}
				_ = r.lookupAndOutput(host)
			}
			return nil
		})

		close(r.outputchan)
		// waiting output worker
		r.wgoutputworker.Wait()
		gologger.Print().Msgf("%d wildcard subdomains removed\n", numRemovedSubdomains)
	}

⚠️ Potential issue | 🔴 Critical

--auto-wildcard filters too late to affect the real output.

This block runs after Lines 475-476 have already closed the original output worker, so wildcard matches have already been printed/written once. Unlike the --wildcard-domain flow, worker() only buffers results when options.WildcardDomain != "", so --auto-wildcard never suppresses the first pass. The current behavior is an extra filtered pass appended to the unfiltered output, and lookupAndOutput() cannot reconstruct JSON/response-mode results because nothing was stored for this path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/runner/runner.go` around lines 559 - 594, The auto-wildcard
filtering runs too late and only post-processes already-emitted results; to fix,
change the worker flow to buffer outputs when r.options.AutoWildcard is true
(same approach used for options.WildcardDomain) so wildcard detection happens
before any results are written. Specifically, in worker() ensure
startOutputWorker()/outputchan buffering is enabled earlier when
r.options.AutoWildcard is set, make the auto-wildcard scan occur before closing
outputchan, and have lookupAndOutput() write into the buffered store (or reuse
the same buffering mechanism) so JSON/response-mode entries can be reconstructed
and suppressed correctly; update any related wait/close logic (wgoutputworker,
close(outputchan)) to match the wildcard-domain path.
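
The buffering approach suggested above can be illustrated with a small offline sketch: results are stored during the worker phase and only emitted after filtering. All names here (`result`, `rootOf`, `filterBuffered`) are hypothetical, and a plain map stands in for whatever store the runner actually uses.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// result is a hypothetical record of one resolved host.
type result struct {
	Host string
	IPs  []string
}

// rootOf applies the two-label heuristic to find a host's root domain.
func rootOf(host string) string {
	labels := strings.Split(host, ".")
	if len(labels) <= 2 {
		return host
	}
	return strings.Join(labels[len(labels)-2:], ".")
}

// filterBuffered returns the buffered results whose root domain is not
// flagged as wildcard, sorted by host, plus the number of dropped hosts.
func filterBuffered(buffered map[string]result, wildcardRoots map[string]struct{}) ([]result, int) {
	var kept []result
	removed := 0
	for host, res := range buffered {
		if _, flagged := wildcardRoots[rootOf(host)]; flagged {
			removed++
			continue
		}
		kept = append(kept, res)
	}
	sort.Slice(kept, func(i, j int) bool { return kept[i].Host < kept[j].Host })
	return kept, removed
}

func main() {
	// Worker phase: results are stored, nothing is printed yet.
	buffered := map[string]result{
		"a.example.test":  {Host: "a.example.test", IPs: []string{"192.0.2.20"}},
		"x.wildcard.test": {Host: "x.wildcard.test", IPs: []string{"192.0.2.10"}},
		"y.wildcard.test": {Host: "y.wildcard.test", IPs: []string{"192.0.2.10"}},
	}
	wildcardRoots := map[string]struct{}{"wildcard.test": {}}

	// Post-processing phase: emit only what survives the filter.
	kept, removed := filterBuffered(buffered, wildcardRoots)
	for _, res := range kept {
		fmt.Println(res.Host, res.IPs)
	}
	fmt.Printf("%d wildcard subdomains removed\n", removed)
}
```

Because nothing is written before the filter runs, each surviving host is emitted exactly once, which is the property the review comment says the first revision lacked.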

@khozakhulile27-netizen

Implemented randomized wildcard probing to improve the accuracy of DNS wildcard detection. Using a unique prefix for each probe bypasses potential DNS caching issues and identifies wildcard records more reliably.

Changes

  • Modified internal/runner/runner.go to use time.Now().UnixNano() for generating dynamic probe prefixes.
  • Updated the host appending logic to use the format: aw-[timestamp].[domain].
  • Cleaned up unused variables to ensure strict Go compiler compliance.

Reasoning

Static wildcard checks (such as using "FUZZ") can sometimes be cached by intermediate DNS resolvers. Using a high-resolution timestamp ensures that every probe is unique, forcing the resolver to provide an authoritative response.

@flaggdavid-source
Author

Thanks for the suggestion! We're using xid.New().String() which generates a globally unique ID per call (not static), matching the pattern already used in IsWildcard() in the same file. This ensures consistency with the existing codebase while still producing unique probes that bypass DNS caching.

- Add mutual exclusion validation for --auto-wildcard and --wildcard-domain
- Buffer results when auto-wildcard is active (store DNS data during
  worker phase, output only in post-processing) to prevent double output
- Use Lookup() for wildcard detection to always check A records regardless
  of configured query types (fixes detection when using -aaaa, -cname, etc.)
- Scope wildcard registry to Runner struct instead of package-global state
  to prevent cross-run leakage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (3)
internal/runner/wildcard.go (3)

144-152: Single-threaded detection may be slow for large, diverse input sets.

The sequential loop through domains (lines 145-152) could become a bottleneck if the input contains thousands of unique root domains. Each domain requires two DNS lookups.

Consider parallelizing this in a future iteration—similar to how wildcardWorker() goroutines handle the existing --wildcard-domain flow—if performance becomes an issue.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/runner/wildcard.go` around lines 144 - 152, The sequential loop that
runs detectWildcardForDomain for each domain can be parallelized to avoid being
a bottleneck; replace the single-threaded for-range in wildcard.go with a worker
pool: create a buffered channel of domains, spawn N goroutines (or use a
semaphore to limit concurrency) that read domains, call
r.detectWildcardForDomain(domain), and on true lock r.autoWildcardDomainsMutex
to write to r.autoWildcardDomains and log the detection; use a sync.WaitGroup to
wait for workers to finish and ensure any logs and map writes remain protected
exactly as in the current code; mirror the concurrency pattern used by
wildcardWorker() to choose a sensible default for N or make it configurable.
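
A worker-pool version of the detection loop might look like the sketch below. `detectAll` is a hypothetical helper, the `detect` callback stands in for `r.detectWildcardForDomain`, and the worker count is an arbitrary choice rather than anything taken from `wildcardWorker()`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// detectAll runs detect over domains using a fixed number of goroutines
// and returns the set of domains for which detect returned true.
func detectAll(domains []string, workers int, detect func(string) bool) map[string]struct{} {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	found := make(map[string]struct{})
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for d := range jobs {
				if detect(d) {
					mu.Lock() // map writes must stay serialized
					found[d] = struct{}{}
					mu.Unlock()
				}
			}
		}()
	}
	for _, d := range domains {
		jobs <- d
	}
	close(jobs)
	wg.Wait()
	return found
}

func main() {
	domains := []string{"a.test", "wild-b.test", "c.test", "wild-d.test"}
	// Fake detector so the example runs without DNS.
	detect := func(d string) bool { return strings.HasPrefix(d, "wild-") }

	found := detectAll(domains, 2, detect)
	names := make([]string, 0, len(found))
	for d := range found {
		names = append(names, d)
	}
	sort.Strings(names)
	fmt.Println(names)
}
```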

73-85: Known limitation acknowledged; consider documenting workaround.

The two-label extraction is a reasonable simplification for common TLDs. The inline comment correctly documents the limitation for multi-part TLDs (.co.uk, .com.au).

For users needing accurate extraction, consider adding a note in the CLI help or documentation that a public suffix list library (e.g., publicsuffix) could be integrated in the future.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/runner/wildcard.go` around lines 73 - 85, Update the
documentation/help text to explicitly note the limitation of getRootDomain's
two-label heuristic and recommend using a public suffix list for accurate
extraction; mention the getRootDomain function in internal/runner/wildcard.go
and add a short CLI/help message (or README entry) advising that multi-part TLDs
like .co.uk and .com.au are not handled correctly and that integrating a library
such as publicsuffix would be the recommended workaround.

95-101: Rate limiter not applied to wildcard detection lookups.

The Lookup() calls here (and in runner.go:757 for ASN checks) bypass r.limiter.Take(). This is consistent with existing code, but be aware that with --auto-wildcard enabled, 2 * number_of_unique_root_domains DNS queries occur before the rate-limited main resolution phase. For users with strict resolver rate limits, this could cause transient issues.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/runner/wildcard.go` around lines 95 - 101, The wildcard detection
and ASN pre-checks are bypassing the global rate limiter by calling
r.dnsx.Lookup(...) directly; add r.limiter.Take() immediately before each such
Lookup (e.g., the testHost and domain lookups in wildcard detection and the
ASN-related Lookup in the ASN check code path) so those pre-check queries are
subject to the same rate limiting as the main resolution phase; ensure you call
r.limiter.Take() before each Lookup and preserve existing error handling around
the Lookup calls.
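
The idea of taking a rate-limit slot before every pre-check lookup can be sketched with the standard library alone. The `limiter` type below is a hypothetical ticker-based stand-in, not the limiter dnsx uses, and `limitedLookup` is an illustrative wrapper.

```go
package main

import (
	"fmt"
	"time"
)

// limiter is a minimal ticker-based stand-in for the runner's limiter.
type limiter struct{ tick *time.Ticker }

// newLimiter allows roughly perSecond calls to Take per second.
func newLimiter(perSecond int) *limiter {
	if perSecond < 1 {
		perSecond = 1
	}
	return &limiter{tick: time.NewTicker(time.Second / time.Duration(perSecond))}
}

// Take blocks until the next slot is available.
func (l *limiter) Take() { <-l.tick.C }

// Stop releases the underlying ticker.
func (l *limiter) Stop() { l.tick.Stop() }

// limitedLookup waits for a slot and then performs the lookup, so
// pre-check queries count against the same budget as normal ones.
func limitedLookup(l *limiter, lookup func(string) ([]string, error), host string) ([]string, error) {
	l.Take()
	return lookup(host)
}

func main() {
	l := newLimiter(50)
	defer l.Stop()

	// Fake resolver so the example runs offline.
	lookup := func(host string) ([]string, error) {
		return []string{"192.0.2.1"}, nil
	}
	for _, h := range []string{"example.test", "probe.example.test"} {
		ips, err := limitedLookup(l, lookup, h)
		fmt.Println(h, ips, err)
	}
}
```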

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2c265ea2-0191-49be-aa26-d3b8dc6b3af9

📥 Commits

Reviewing files that changed from the base of the PR and between 85f9e84 and cce4de8.

📒 Files selected for processing (3)
  • internal/runner/options.go
  • internal/runner/runner.go
  • internal/runner/wildcard.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/runner/options.go

@Mzack9999
Member

Thank you for the contribution! This PR has a clean scope with only 3 files, but it lacks test coverage for the new wildcard detection functionality. The issue has been resolved via #966, which includes a comprehensive test suite (364 lines covering multi-root filtering, AAAA, CNAME, threshold independence, and normalization) plus shared library extraction into projectdiscovery/utils. Closing in favor of that merged solution.

@Mzack9999 Mzack9999 closed this Mar 20, 2026