deployer-repo - distributed internet scanner (scanner + provisioning)

This is the deployer: the scanner code and provisioning that get installed on your Debian VMs. Results are pushed to a separate intel-repo (see its README).

Three VMs each scan one third of routable IPv4 with masscan, fingerprint services with zgrab2 / JARM / zdns, and push results to the intel repo. Load is split with masscan's native --shards i/3 (disjoint, no overlap, no gaps).

Install the scanner on a VM (one command)

read -rp "Intel repo URL: " REPO; \
read -rsp "Token for that repo: " GH; echo; \
read -rp "Shard index (0/1/2): " IDX; \
[ -n "$GH" ] && [ -n "$REPO" ] && \
export NS_REPO="$REPO" NS_GIT_TOKEN="$GH" NS_SHARD_INDEX="$IDX" \
       NS_SHARD_TOTAL=3 NS_NODE_NAME="$(hostname)" NS_NONINTERACTIVE=1 \
       NS_INSTALL_REPO="https://github.com/ScryerNet/Scryer-deployer.git" && \
curl -fsSL https://raw.githubusercontent.com/ScryerNet/Scryer-deployer/main/install.sh \
  | sudo -E bash; \
unset GH REPO IDX NS_REPO NS_GIT_TOKEN NS_SHARD_INDEX NS_SHARD_TOTAL NS_NODE_NAME NS_NONINTERACTIVE NS_INSTALL_REPO

Run this on each Debian VM (replace <you> / repo names):

curl -fsSL https://raw.githubusercontent.com/<you>/deployer-repo/main/bootstrap.sh | bash

It clones this repo to ~/scanner-repo and installs masscan, zgrab2, jarm, zdns. Then launch that VM's shard (0, 1, or 2), pointing it at your intel repo:

cd ~/scanner-repo \
  && SHARD_INDEX=0 SHARD_TOTAL=3 \
     GIT_REPO_URL=https://github.com/<you>/intel-repo.git \
     GITHUB_TOKEN=github_pat_xxx \
     nohup bash scanner/scan.sh > ~/scan.log 2>&1 &

Or drive all three VMs from your laptop with deployer/deploy.sh (see below).

Production install (systemd service + timer)

bootstrap.sh is the quick path. For a hardened, self-managing node use install.sh, which sets up the scanner as a sandboxed, least-privilege systemd service plus a timer that re-scans on a schedule and pushes results automatically:

sudo NS_NONINTERACTIVE=1 \
     NS_REPO=https://github.com/<you>/intel-repo.git \
     NS_GIT_TOKEN=github_pat_xxx \
     NS_SHARD_INDEX=0 NS_SHARD_TOTAL=3 \
     bash install.sh

It runs pre-flight checks, installs the toolchain system-wide, isolates the token in a 0640 root:netscan env file, and runs the scanner as the unprivileged netscan user with only CAP_NET_RAW (no root) under ProtectSystem=strict sandboxing. The service and units are named plainly (netscan) on purpose — an authorized scanner should be attributable, not hidden. Remove everything with sudo bash uninstall.sh. See install.sh -h for all NS_* options.
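For orientation, here is a minimal sketch of what a unit with those properties could look like. It is not the unit that install.sh writes: the env file path, the ExecStart path, and the oneshot type are assumptions made for illustration. Only the directive names (User, EnvironmentFile, AmbientCapabilities, CapabilityBoundingSet, NoNewPrivileges, ProtectSystem, StateDirectory) are standard systemd options.

```shell
# Print an illustrative hardened unit to stdout.
# Paths and the ExecStart target are hypothetical, not taken from install.sh.
render_netscan_unit() {
  cat <<'EOF'
[Unit]
Description=netscan shard scanner (illustrative sketch)
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
User=netscan
Group=netscan
# Token lives in a root-owned file that only the netscan group can read (0640).
EnvironmentFile=/etc/netscan/netscan.env
ExecStart=/opt/netscan/scanner/scan.sh
# Raw sockets only; no other capability is available to the process.
AmbientCapabilities=CAP_NET_RAW
CapabilityBoundingSet=CAP_NET_RAW
NoNewPrivileges=yes
# Mount the file system read-only except for the scanner's own state directory.
ProtectSystem=strict
StateDirectory=netscan
EOF
}

render_netscan_unit
```

On an installed node, systemd-analyze security netscan.service (assuming that unit name) reports how much of the sandboxing is actually in effect.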

How the load is split

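You do not need a custom partitioner. masscan has built-in sharding: every VM runs the identical command with a different --shards value. masscan walks the target space in a pseudorandom order produced by an invertible permutation of the index range (its "blackrock" cipher), and each of the three shards takes every third index of that sequence, so each VM deterministically gets a disjoint, evenly interleaved third. No overlap, no gaps. masscan numbers its shards from 1, so the 0-based shard index i used elsewhere in this README corresponds to --shards (i+1)/3:

VM0:  masscan ... --shards 1/3
VM1:  masscan ... --shards 2/3
VM2:  masscan ... --shards 3/3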
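masscan's documentation numbers shards from 1 (1/3, 2/3, 3/3), while this README uses shard indexes 0, 1 and 2, so a wrapper has to translate between the two. A minimal sketch of that translation, assuming a hypothetical helper named shard_arg (it is not part of this repo, and scan.sh may handle this differently):

```shell
# Hypothetical helper: turn a 0-based shard index into masscan's 1-based
# "<shard>/<total>" argument, rejecting values that are out of range.
shard_arg() {
  local index=$1 total=$2
  if ! [[ $index =~ ^[0-9]+$ && $total =~ ^[0-9]+$ ]]; then
    echo "shard index and total must be non-negative integers" >&2
    return 1
  fi
  # 10# forces base 10 so a value like "08" is not parsed as octal.
  index=$((10#$index)) total=$((10#$total))
  if (( total < 1 || index >= total )); then
    echo "shard index $index is out of range for $total shards" >&2
    return 1
  fi
  printf '%d/%d\n' "$((index + 1))" "$total"
}

shard_arg 0 3   # VM0
shard_arg 2 3   # VM2
```

A wrapper could then call masscan ... --shards "$(shard_arg "$NS_SHARD_INDEX" "$NS_SHARD_TOTAL")" and fail early on a mistyped index instead of silently scanning the wrong slice.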

Architecture

        GitHub (this repo)  ── deployer/ orchestrates ──┐
                                                         ▼
   ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
   │     VM0      │     │     VM1      │     │     VM2      │
   │ shard 0/3    │     │ shard 1/3    │     │ shard 2/3    │
   │ masscan→zgrab│     │ masscan→zgrab│     │ masscan→zgrab│
   │   →jarm→zdns │     │   →jarm→zdns │     │   →jarm→zdns │
   └──────┬───────┘     └──────┬───────┘     └──────┬───────┘
          └────────────────────┼────────────────────┘
                               │  compressed shard results (NDJSON.gz)
                               │  each VM git-pushes its own results/vm<i>/
                               ▼
        Central GitHub repo (single branch: main)
          results/vm0/  results/vm1/  results/vm2/
          ├─ shardN_<stamp>.json      (small summary)
          └─ raw/<stamp>/*.gz.partNNN (compressed raw, chunked <90MB)
                  │
                  ▼  on push, GitHub Action runs
          aggregator/aggregate.py → reports/latest.{json,md}

Everything lives on GitHub. Each VM commits its summary plus its compressed raw output (split into <90 MB chunks to stay under GitHub's 100 MB per-file limit) into its own results/vm<i>/ directory and pushes to main. Pushes are batched to stay under the 2 GB per-push limit. Because each VM owns a separate directory, concurrent pushes never conflict on content, only on the ref (main moves while a VM is preparing its push), which a fetch/rebase/retry loop handles. Reassemble raw output locally with ./deploy.sh reassemble.
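The fetch/rebase/retry loop can be sketched as follows. This is an illustration of the pattern, not the code in scanner/scan.sh: the function name push_with_retry, the attempt count, and the linear back-off are assumptions.

```shell
# Hypothetical helper: push HEAD to a shared branch, rebasing onto the remote
# and retrying when another VM has moved the ref first.
push_with_retry() {
  local branch=${1:-main} max_attempts=${2:-5} attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if git push origin "HEAD:$branch"; then
      return 0
    fi
    # Push was rejected: fetch the new tip and replay our commits on top of it.
    git fetch origin "$branch" || return 1
    # Each VM writes only to its own directory, so a conflict here is unexpected;
    # abort the rebase and give up rather than guess at a resolution.
    git rebase "origin/$branch" || { git rebase --abort; return 1; }
    # Back off a little longer each round so the VMs do not retry in lockstep.
    sleep "$(( attempt * 2 ))"
  done
  return 1
}
```

Run it from inside the clone after committing, for example push_with_retry main 5. It returns non-zero if every attempt is rejected, so the caller can log the failure and try again on the next scan.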

Note: git keeps every pushed blob in history, so the repo grows with each scan and will eventually pass GitHub's ~5 GB soft cap. When that day comes your only in-GitHub options are squashing history or Git LFS; until then, plain git is fine.
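To see how close a clone is to that cap, git count-objects -v reports the size of loose and packed objects in KiB. A small sketch, assuming a hypothetical helper named repo_size_mib; the local object store only approximates what GitHub counts, so treat the number as a rough guide:

```shell
# Hypothetical helper: print the size of a clone's object store in MiB.
# Sums the "size" (loose objects) and "size-pack" (packed objects) fields,
# both of which git reports in KiB.
repo_size_mib() {
  git -C "${1:-.}" count-objects -v |
    awk '/^size(-pack)?:/ { kib += $2 } END { printf "%d\n", kib / 1024 }'
}
```

For example, repo_size_mib ~/intel-repo prints a single integer, which a cron job could compare against a threshold of your choosing (the path here is only an example).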

Quick start

1. Fork this repo and list your three VMs in deployer/inventory.yaml (copy the example).
2. From your laptop, install the toolchain on the VMs: cd deployer && ./deploy.sh provision
3. Configure the scan in scanner/config.env (ports, rate, what to fingerprint).
4. Create a fine-grained PAT (Contents: Read and write on the repo) and export it:

export GITHUB_TOKEN=github_pat_xxx
export GIT_REPO_URL=https://github.com/<you>/<repo>.git

5. Launch: ./deploy.sh scan. Each VM gets its shard index and pushes its own results/vm<i>/ (summary + chunked raw) straight to main.
6. The GitHub Action in .github/workflows/aggregate.yml runs aggregate.py on each push → reports/latest.{json,md}.
7. To read raw output later, pull the repo and run ./deploy.sh reassemble.
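The chunking and reassembly of raw output can be illustrated with GNU split and cat. This is a sketch under assumptions, not the implementation behind ./deploy.sh reassemble: the function names are hypothetical and the 89MB default (89,000,000 bytes) is simply one value below the 90 MB target.

```shell
# Hypothetical helper: split a compressed result into numbered parts that stay
# below the per-file limit. Requires GNU split (standard on Debian).
chunk_file() {
  local file=$1 size=${2:-89MB}
  # -d: numeric suffixes, -a 3: three digits, giving file.part000, file.part001, ...
  split -b "$size" -d -a 3 -- "$file" "$file.part"
}

# Hypothetical helper: concatenate the parts back into the original file.
# The glob expands in sorted order, so part000 comes before part001 and so on.
reassemble_file() {
  local file=$1
  cat -- "$file".part[0-9][0-9][0-9] > "$file"
}
```

Because split cuts on byte boundaries and cat restores them byte for byte, the reassembled .gz is identical to the original and can be decompressed as usual.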

Speed vs. survival (read this)

masscan rate is set in scanner/config.env as MASSCAN_RATE (packets/sec).
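The durations in the table below follow from simple arithmetic: addresses × ports ÷ shards ÷ rate. A quick estimator, assuming the full 2^32 address space (which is what the table's figures are consistent with; exclusions in the blocklist make real runs somewhat shorter) and a hypothetical helper name scan_hours:

```shell
# Hypothetical helper: hours for one VM to send one probe per address and port
# across its shard. Arguments: rate in packets/sec, shard count, port count.
scan_hours() {
  local rate=$1 shards=${2:-3} ports=${3:-1}
  awk -v r="$rate" -v s="$shards" -v p="$ports" \
    'BEGIN { printf "%.1f\n", (2^32 * p) / (s * r) / 3600 }'
}

scan_hours 10000    # 39.8 hours, about 1.7 days
scan_hours 50000    # 8.0 hours
scan_hours 150000   # 2.7 hours
```

Scanning more ports scales the time linearly: scan_hours 25000 3 10 estimates ten ports at the default rate.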

Rate (pps)	Time per VM (full IPv4, one port, split 3 ways)	Risk
10,000	~1.7 days	low
50,000	~8 hours	moderate: expect complaints
150,000+	~3 hours	high: providers may suspend

"Really fast" and "doesn't get your account terminated" are in tension. Start at 25k–50k, watch your provider's reaction, and scale up only if you own the upstream network or have its operator's authorization. The default is 25,000.

Responsible scanning (non-negotiable defaults — keep them)

These are not red tape; they are why ZMap/Censys can scan continuously without being shut down, and in several jurisdictions they're what keeps high-volume scanning on the right side of CFAA-style "unauthorized access" lines.

scanner/blocklist.conf excludes reserved/private ranges and an opt-out list.
Set PTR records for your three scanning IPs to a hostname that resolves to a page explaining the research and offering opt-out (coordinator/scan-info-page.html).
Put a real abuse@ / research contact in scanner/config.env (ABUSE_CONTACT).
Honor every opt-out request by adding the network to blocklist.conf immediately.
Only scan what you're authorized to. "Authorized" for the whole internet is a real legal question — consult the rules for your provider and jurisdiction before a full sweep. Targeted scans of infrastructure you own/control need none of this debate.

See coordinator/RESPONSIBLE_SCANNING.md for the full checklist.
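Honoring an opt-out quickly is easier with a small helper. The sketch below assumes blocklist.conf holds one IPv4 address or CIDR range per line (the format masscan's exclude files use); the function name add_optout is hypothetical and not part of this repo.

```shell
# Hypothetical helper: append an opt-out network to the blocklist exactly once.
add_optout() {
  local file=$1 cidr=$2
  # Rough shape check only; it does not validate octet or prefix ranges.
  if ! [[ $cidr =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}(/[0-9]{1,2})?$ ]]; then
    echo "not an IPv4 address or CIDR: $cidr" >&2
    return 1
  fi
  # -x matches the whole line, -F treats the CIDR as a literal string.
  if ! grep -qxF -- "$cidr" "$file" 2>/dev/null; then
    printf '%s\n' "$cidr" >> "$file"
  fi
}
```

For example, add_optout scanner/blocklist.conf 192.0.2.0/24 adds the range on the first call and is a no-op afterwards. The updated file still has to reach every VM before the next scan starts.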
