noptrix/httpgrep

Description

A fast, asynchronous Python tool that scans HTTP(S) servers and greps for strings or regex patterns in HTTP response bodies and headers.

It takes single hosts, URLs, CIDR ranges, IP ranges or files; scans multiple ports per target (single, comma-lists or ranges, auto-detecting TLS vs plain per port); can pull and scan name-based (v)hosts straight from TLS certificates; streams matches live to the terminal; and can write results to text, CSV or JSONL log files.
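To make the target and port syntax concrete, here is a minimal sketch of how such specs could be expanded. It is not httpgrep's own parser: the function names `expand_ports` and `expand_hosts` are hypothetical, and skipping the network and broadcast addresses of a CIDR block is an assumption the tool may not share.

```python
import ipaddress


def expand_ports(spec: str) -> list[int]:
    """Expand '80', '80,443' or '8000-8100' into a sorted list of ports."""
    ports: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            ports.update(range(lo, hi + 1))
        elif part:
            ports.add(int(part))
    return sorted(ports)


def expand_hosts(spec: str) -> list[str]:
    """Expand a CIDR block or an 'a.b.c.d-e.f.g.h' range.

    Anything that is neither (hostnames, URLs) is passed through unchanged.
    """
    if "/" in spec:
        try:
            net = ipaddress.ip_network(spec, strict=False)
        except ValueError:
            return [spec]  # e.g. a URL, which also contains '/'
        # Assumption: usable host addresses only (no network/broadcast).
        return [str(ip) for ip in net.hosts()]
    if "-" in spec:
        first, last = spec.split("-", 1)
        try:
            lo = ipaddress.IPv4Address(first)
            hi = ipaddress.IPv4Address(last)
        except ValueError:
            return [spec]  # a hostname that merely contains '-'
        return [str(ipaddress.IPv4Address(n)) for n in range(int(lo), int(hi) + 1)]
    return [spec]
```

With these helpers, `expand_hosts("192.168.0.0/24")` and `expand_ports("80,443,8000-8100")` give the host and port lists whose product forms the probe set.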

It is built for large scans: an async core drives thousands of concurrent connections, a TCP preflight skips dead ports cheaply, per-host and global timeouts keep it from hanging on slow/dead targets, and an interrupted run can be resumed.
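The combination of bounded concurrency and a cheap TCP preflight can be sketched with plain asyncio. This illustrates the general technique only, not httpgrep's actual code: `tcp_preflight` and `scan` are hypothetical names, and the values 300 and 2.0 mirror the documented `-x` default and the preflight cap.

```python
import asyncio


async def tcp_preflight(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connect to host:port succeeds within `timeout`."""
    try:
        _, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout
        )
    except (OSError, asyncio.TimeoutError):
        return False  # refused, unreachable, or filtered (timed out)
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass  # the peer may reset while we close; the port was open anyway
    return True


async def scan(targets, max_conn: int = 300):
    """Preflight every (host, port) pair with at most `max_conn` in flight."""
    sem = asyncio.Semaphore(max_conn)

    async def probe(host: str, port: int):
        async with sem:  # a dead port frees its slot once the preflight fails
            return host, port, await tcp_preflight(host, port)

    return await asyncio.gather(*(probe(h, p) for h, p in targets))


# Usage (hypothetical targets):
#   results = asyncio.run(scan([("192.168.0.10", 80), ("192.168.0.10", 443)]))
```

Only ports that pass the preflight would then receive a full HTTP request, which is what keeps dead or filtered hosts cheap at scale.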

Requirements

  • Python 3.11+ on a POSIX system (Linux, *BSD, macOS - uses termios and asyncio's Unix signal handling)
  • httpx - pip install -r requirements.txt (or pip install httpx)
  • optional, auto-used if present: uvloop (faster event loop), aiodns (non-blocking DNS for -r), httpx[socks] / socksio (SOCKS proxies)

httpgrep is a single self-contained script - just run ./httpgrep.py.

Usage

$ httpgrep -H
    __    __  __
   / /_  / /_/ /_____  ____ _________  ____
  / __ \/ __/ __/ __ \/ __ `/ ___/ _ \/ __ \
 / / / / /_/ /_/ /_/ / /_/ / /  /  __/ /_/ /
/_/ /_/\__/\__/ .___/\__, /_/   \___/ .___/
             /_/    /____/         /_/

     --== [ by nullsecurity.net ] ==--

usage

  httpgrep -h <args> -s <arg> [opts] | <misc>

target options

  -h <hosts|file>   - single host/url or host-/cidr-range or file containing
                      hosts or file containing URLs, e.g.: foobar.net,
                      192.168.0.1-192.168.0.254, 192.168.0.0/24, /tmp/hosts.txt
                      NOTE: hosts can also contain ':<ports>' on cmdline or in
                      file, where <ports> is a single port, comma-list or
                      range, e.g.: foo.net:8080, foo.net:80,443, 10.0.0.1:1-1024
  -p <ports|file>   - port(s) to connect to: single port, comma-separated list,
                      range, or a file with one spec per line, e.g.: 80,
                      80,443,8080, 8000-8100, /tmp/ports.txt
                      (default: 80, or 443 when -t is given)
  -t                - force TLS/SSL on all ports. by default the scheme is
                      auto-detected per port (plain http, switching to TLS if
                      the port speaks it)
  -u <URI|file>     - URI or comma-separated URIs or file with URIs (one per
                      line) to search given strings in, e.g.: /foobar/,
                      /foo.html, /admin,/login, /tmp/paths.txt (default: /)
  -r                - show the reverse-dns (PTR) name of scanned IPv4s as a
                      label; the ip stays the scan target (no scope drift)

http options

  -X <method>       - HTTP request method to use, any case (default: get).
                      use '?' to list available methods.
  -a <user:pass>    - http auth credentials (format: 'user:pass')
  -U <UA>           - set custom User-Agent (default: latest ms edge, windows)
  -A                - use random user-agent per request
  -R <headers>      - set custom headers (format: 'foo=bar;lol=lulz;...')
  -C <cookies>      - set cookies (format: 'foo=bar;lol=lulz;...')
  -F                - don't follow HTTP redirects
  -L <num>          - max redirects to follow (default: 10; ignored with -F)
  -E                - verify TLS/SSL certificates (default: no verification)
  -P <proxy>        - use proxy (format: '[http|https|socks4|socks5]://host:port')
                      (socks needs the 'httpx[socks]' / socksio package)
  -f <codes>        - only report responses with given HTTP status codes,
                      e.g.: '200', '200,301,302'
  -e <codes>        - exclude responses with given HTTP status codes,
                      e.g.: '404', '403,404,500'
  -2                - try HTTP/2 (ALPN-negotiated on TLS, falls back to 1.1;
                      plain http stays 1.1). needs the 'h2' package

search options

  -s <str|file>     - a single string/regex or multiple strings/regex in a file
                      to find in given URIs and HTTP response headers,
                      e.g.: 'tomcat 8', '/tmp/igot0daysforthese.txt'
  -S <str|file>     - invert (grep -v): drop ALL matches of a response if this
                      string/regex (or file) appears anywhere in its body or
                      headers, e.g. to filter out dynamic error / 404 pages
  -w <where>        - search strings in given places (default: headers,body)
  -b <bytes>        - num bytes of context to show from a body match
                      (default: 64)
  -m <size>         - max body to read + search; suffix b/kb/mb, no suffix = kb,
                      e.g.: 512, 1mb, 262144b (default: 256kb)
  -i                - use case-insensitive search
  -I                - use case-insensitive invert (for -S)

scan options

  -x <num>          - max concurrent connections (async; default: 300). raise
                      ulimit -n accordingly for very high values
  -c <seconds>      - per-host read timeout in seconds, also caps body read
                      time. the tcp preflight is capped at 2s regardless, so
                      filtered/dead hosts free their slot fast (default: 3.0)
  -G <seconds>      - global timeout: hard-stop the whole scan after N seconds
                      (safety net against any hang; default: none)
  -y <num>          - retry a failed probe up to <num> times (default: 0).
                      helps with flaky hosts at scale; keep it small
  -1                - once a host has a match, skip its not-yet-started probes
                      (best-effort; in-flight requests still finish, so under
                      high -x you may still see a few matches per host)
  -z <size>         - scan targets in random order within a memory-bounded
                      window of <size> ram (suffix b/kb/mb/gb), e.g.: -z 1gb.
                      keeps huge ranges/files from exhausting memory
  -Z <num>          - cap the -z window at <num> targets (default 2000000,
                      ~267mb at ~140 bytes each). more = wider mixing on huge
                      ranges, at the cost of ram and start-up buffering
  -W                - save/resume: on ctrl+c write progress to httpgrep.session;
                      rerun with -W to resume from it (else start fresh)
  -T <0|1>          - pull (v)hosts from the TLS cert (CN + SAN) and scan them.
                      0 = in-scope only (via host header on the scanned IP);
                      1 = also scan each vhost by name (dns-resolved, MAY LEAVE
                      the scanned scope). needs TLS (-t or a *443 port).

output options

  -l <file>         - log found matches to <file>.<fmt> per chosen -O format
                      (e.g. -l out -O csv,jsonl => out.csv, out.jsonl)
  -O <formats>      - log file format(s), comma-list of: txt, csv, jsonl
                      (default: txt; use '?' to list). terminal output always
                      stays human-readable.
  -v                - verbose: print each url as it gets scanned
  -7                - escape non-ASCII in terminal output to \xNN, so a hostile
                      response body can't corrupt your terminal (logs stay raw)

misc options

  -H                - print help
  -V                - print version information

examples

  # grep for 'apache' in headers and body of a single host
  $ httpgrep -h foobar.net -s apache

  # scan a CIDR range on port 8080, search for 'tomcat' in body only
  $ httpgrep -h 192.168.0.0/24 -p 8080 -s tomcat -w body

  # scan a host across multiple ports and a port range for 'jenkins'
  $ httpgrep -h 192.168.0.10 -p 80,443,8080,8000-8100 -s jenkins -i

  # scan host list, search string file, log matches (-> /tmp/out.txt)
  $ httpgrep -h /tmp/hosts.txt -s /tmp/strings.txt -x 200 -l /tmp/out

  # grep for 'admin' case-insensitively across multiple URIs via TLS
  $ httpgrep -h foobar.net -t -u /admin,/login,/dashboard -s admin -i

  # scan IP range, reverse DNS, only report 200 responses
  $ httpgrep -h 10.0.0.1-10.0.0.254 -s 'powered by' -r -f 200

  # search headers only, don't follow redirects, verbose output
  $ httpgrep -h foobar.net -s 'X-Powered-By' -w headers -F -v

  # grep for 'admin', but drop dynamic error pages (invert, case-insensitive)
  $ httpgrep -h 192.168.0.0/24 -s admin -i -S 'error|not found' -I

  # route through proxy, custom UA, search for version strings
  $ httpgrep -h /tmp/hosts.txt -s 'nginx/1\.' -P http://127.0.0.1:8080 -U 'curl/8.0'

  # big resumable scan: ctrl+c saves state, rerun with -W to continue; also
  # cap the whole run at 1 hour as a hang safety net
  $ httpgrep -h 10.0.0.0/16 -p 80,443 -s admin -W -G 3600
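
The size syntax accepted by -m in the help above (suffix b/kb/mb, no suffix = kb) can be illustrated with a small parser. This is a sketch under assumptions, not the tool's implementation: `parse_size` is a hypothetical name, and binary multiples (1kb = 1024 bytes) are inferred from the help text listing 262144b next to the 256kb default.

```python
import re

# Assumption: binary multiples, consistent with 262144b == 256kb in the help.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2}


def parse_size(spec: str) -> int:
    """Convert '512', '1mb' or '262144b' into a byte count (no suffix = kb)."""
    m = re.fullmatch(r"(\d+)\s*(b|kb|mb)?", spec.strip().lower())
    if m is None:
        raise ValueError(f"invalid size: {spec!r}")
    number, unit = m.groups()
    return int(number) * _UNITS[unit or "kb"]
```

Under these assumptions `parse_size("256kb")` and `parse_size("262144b")` yield the same byte count, and a bare `512` is read as 512 kb.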

Output

Matches are printed live, one per line:

[*] <url> | [vhost] | <status> | <type> | <match>
  • <url> - the scanned URL (scheme://host:port/uri).
  • <vhost> - present with -T (the cert (v)host tried via the Host header) or -r (the PTR name of the scanned ip); empty for direct scans.
  • <status> - the HTTP response status code (after redirects).
  • <type> - body or header.
  • <match> - body hit: a short repr'd window from the match (-b bytes); header hit: name: value.

The terminal always shows this human-readable form. With -l <base> the same matches are mirrored to <base>.<fmt> for each -O format - txt (these lines), csv (url,vhost,status,type,match rows with a header), jsonl (one JSON object per match).
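
For post-processing, the CSV log is probably the easiest format to consume. The sketch below groups matches by URL with the standard csv module. It assumes the header row uses exactly the column names listed above (url, vhost, status, type, match); `group_by_url` is a hypothetical helper, not part of httpgrep.

```python
import csv
from collections import defaultdict


def group_by_url(fh) -> dict[str, list[tuple[int, str, str]]]:
    """Read an httpgrep-style CSV log and group (status, type, match) by URL.

    `fh` is any text file object, e.g. open("out.csv", newline="").
    """
    grouped: dict[str, list[tuple[int, str, str]]] = defaultdict(list)
    for row in csv.DictReader(fh):
        grouped[row["url"]].append((int(row["status"]), row["type"], row["match"]))
    return dict(grouped)


# Usage (assuming a previous run with: -l out -O csv):
#   with open("out.csv", newline="") as fh:
#       for url, hits in group_by_url(fh).items():
#           print(url, len(hits))
```

Using `csv.DictReader` rather than splitting on commas matters here, because a match value may itself contain commas or quotes.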

Author

noptrix

Notes

  • quick'n'dirty code
  • httpgrep is already packaged and available for BlackArch Linux
  • My master-branches are always stable; dev-branches are created for current work.
  • Everything I release publicly is officially announced and published via nullsecurity.net.

License

Check docs/LICENSE.

Disclaimer

We hereby emphasize that the hacking-related material found on nullsecurity.net is for educational purposes only. We are not responsible for any damages. You are responsible for your own actions.
