Merged
58 commits
93626eb
feat: add category dictionary and curated agencies for alerts
novatechflow Mar 16, 2026
fa4fc48
feat(ui): tighten alert navigation and overview
novatechflow Mar 16, 2026
c211055
feat(collector): harden browser-backed source ingestion
novatechflow Mar 16, 2026
f136f60
chore(ci): remove alerts refresh workflow
novatechflow Mar 16, 2026
2097b5f
feat(discovery): improve feed search and wikidata hygiene
novatechflow Mar 16, 2026
4b452d4
chore(ui): refresh Scalytics OSINT branding
novatechflow Mar 16, 2026
f8f6b70
feat(ui): add health_emergency, intelligence_report, emergency_manage…
novatechflow Mar 16, 2026
c755264
feat(collector): add local crime downranking and Interpol pagination
novatechflow Mar 16, 2026
d04739d
feat(config): bump default max-per-source to 40
novatechflow Mar 16, 2026
bfdcf60
feat(registry): expand to 257 sources across 92 countries
novatechflow Mar 16, 2026
06c1ba4
feat(sourcedb): add FTS5 search index for alerts
novatechflow Mar 16, 2026
d8660fc
feat(api): add search API server with FTS5-backed /api/search
novatechflow Mar 16, 2026
2a393ba
feat(ui): wire search bar to FTS5 API with client-side fallback
novatechflow Mar 16, 2026
b76c46b
fix(interpol): add XHR headers to bypass Akamai WAF
novatechflow Mar 16, 2026
9150d26
feat(collector): add FBI Wanted API source type and browser mode for …
novatechflow Mar 16, 2026
8029b46
feat(api): add per-IP token bucket rate limiter on search endpoint
novatechflow Mar 16, 2026
b236b53
fix: follow redirects for feeds, strip HTML before translate, UI polish
novatechflow Mar 16, 2026
b1fd3ef
chore: add test-interpol CLI utility for probing HTML sources
novatechflow Mar 16, 2026
6f354bc
Revert "chore: add test-interpol CLI utility for probing HTML sources"
novatechflow Mar 16, 2026
2ca1167
chore: gitignore throwaway cmd/test-* utilities
novatechflow Mar 16, 2026
5d027d6
reject BSI NESAS source and filter rejected entries in JSON loader
novatechflow Mar 16, 2026
7b9569c
reject dead FBI News RSS feed, fix global feed count
novatechflow Mar 16, 2026
d58bc2a
lift severity filter to App, replace Active with Conflict
novatechflow Mar 16, 2026
dba2adc
geocode international alerts to crisis location, fix conflict feeds
novatechflow Mar 16, 2026
ea4b909
fix build: align SeverityFilter types, remove stale Activity reference
novatechflow Mar 16, 2026
05bafc5
add humanitarian and peacekeeping feeds, reject broken sources
novatechflow Mar 16, 2026
2861b95
fix OIJ nav scraping and NCMEC broken titles
novatechflow Mar 16, 2026
45e44bb
reset navigator selection on region or category change
novatechflow Mar 16, 2026
4f927bb
fix Interpol notice links and map placement
novatechflow Mar 16, 2026
3fb9b22
feat: incremental Interpol notice accumulation with cursor-based pagi…
novatechflow Mar 16, 2026
1078a48
fix: constrain map bounds to prevent duplicate earth views
novatechflow Mar 16, 2026
39abf97
fix: bump map min zoom to fill container, add German severity keywords
novatechflow Mar 16, 2026
2cf400c
docs: add user guide, cap Interpol notices at 160 per type
novatechflow Mar 16, 2026
3ca2ba5
docs: comprehensive user guide with all alert categories and regions
novatechflow Mar 16, 2026
67b3b01
feat: add environmental disaster and disease outbreak categories
novatechflow Mar 16, 2026
4fe5866
feat: live registry hygiene — auto-merge, auto-reject dead sources, L…
novatechflow Mar 16, 2026
0590286
fix: dev-stop and dev-restart now remove volumes for clean DB rebuild
novatechflow Mar 16, 2026
0952795
chore: update search topic labels, add DLQ sync script, clean up run.go
novatechflow Mar 16, 2026
99fe58f
chore: reject European Schoolnet — educational, not intelligence
novatechflow Mar 16, 2026
e223283
feat: pre-seeded DB distribution for new installs
novatechflow Mar 16, 2026
0e7a624
feat: 3-tier geocoding pipeline — city DB, Nominatim, capital coords
novatechflow Mar 17, 2026
c42dad2
feat: auto-discover missing country feeds via gap analysis
novatechflow Mar 17, 2026
ebbd5be
feat: DDG headless browser search as primary discovery, LLM as fallback
novatechflow Mar 17, 2026
e86fede
fix: Wikidata SPARQL timeouts — single type ID queries, no labels
novatechflow Mar 17, 2026
10dc4ed
feat: add Scalytics product and contact links to footer
novatechflow Mar 17, 2026
f487cc6
chore: add pre-seeded sources.db for fresh installs
novatechflow Mar 17, 2026
d8829f2
fix: HTML scraper false positives from URL keyword matching and junk …
novatechflow Mar 17, 2026
bc173ca
feat: add maritime security, legislative, and conflict monitoring sou…
novatechflow Mar 17, 2026
f723bdc
chore: reject European Schoolnet — educational, not intelligence
novatechflow Mar 17, 2026
eaff169
fix: update terminology from "feeds" to "streams" for consistency
novatechflow Mar 17, 2026
ba90bef
Reject tasking noise
novatechflow Mar 17, 2026
a7bd4e5
Fix feed transport and filters
novatechflow Mar 17, 2026
1835910
Harden vetting and fetch fallback
novatechflow Mar 17, 2026
7739512
Tighten geocoding precision
novatechflow Mar 17, 2026
178a215
Improve install flow and clean bootstrap
novatechflow Mar 17, 2026
33528f3
Harden Docker builds and Chromium install
novatechflow Mar 17, 2026
bdfb64b
Fix Header lint errors
novatechflow Mar 17, 2026
aa953a9
Run gofmt on collector and sourcedb files
novatechflow Mar 17, 2026
26 changes: 26 additions & 0 deletions .env.example
@@ -4,3 +4,29 @@
EUOSINT_SITE_ADDRESS=:80
EUOSINT_HTTP_PORT=8080
EUOSINT_HTTPS_PORT=8443
EUOSINT_WEB_IMAGE=ghcr.io/scalytics/euosint-web:latest
EUOSINT_COLLECTOR_IMAGE=ghcr.io/scalytics/euosint-collector:latest
HTTP_TIMEOUT_MS=60000
BROWSER_ENABLED=true
BROWSER_TIMEOUT_MS=60000

# Candidate crawler intake and dead-letter queue.
CANDIDATE_QUEUE_PATH=registry/source_candidates.json
REPLACEMENT_QUEUE_PATH=registry/source_dead_letter.json
SEARCH_DISCOVERY_ENABLED=false
SEARCH_DISCOVERY_MAX_TARGETS=4
SEARCH_DISCOVERY_MAX_URLS_PER_TARGET=3

# Source vetting agent.
SOURCE_VETTING_ENABLED=false
SOURCE_VETTING_PROVIDER=xai
SOURCE_VETTING_BASE_URL=https://api.x.ai/v1
SOURCE_VETTING_API_KEY=
SOURCE_VETTING_MODEL=grok-4-1-fast
SOURCE_VETTING_TEMPERATURE=0
SOURCE_VETTING_MAX_SAMPLE_ITEMS=6

# Alert-level LLM gate: yes/no + English translation + category id.
ALERT_LLM_ENABLED=false
ALERT_LLM_MODEL=grok-4-1-fast
ALERT_LLM_MAX_ITEMS_PER_SOURCE=4
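
The source-vetting variables above are plain strings, so the collector has to parse them into typed settings somewhere. The diff does not show that code; the following is a minimal sketch of one way to do it, assuming the values in `.env.example` double as defaults. `VettingConfig`, `loadVettingConfig`, and the `env*` helpers are hypothetical names, not taken from the repository.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// VettingConfig mirrors a few SOURCE_VETTING_* variables from .env.example.
// Hypothetical type: the repository's real config struct is not shown in this diff.
type VettingConfig struct {
	Enabled        bool
	BaseURL        string
	Model          string
	MaxSampleItems int
}

// envString returns the value for key, or def when it is unset or empty.
func envString(getenv func(string) string, key, def string) string {
	if v := getenv(key); v != "" {
		return v
	}
	return def
}

// envBool parses the value for key as a boolean, falling back to def
// when the variable is missing or not a valid boolean.
func envBool(getenv func(string) string, key string, def bool) bool {
	v, err := strconv.ParseBool(getenv(key))
	if err != nil {
		return def
	}
	return v
}

// envInt parses the value for key as an integer, falling back to def
// when the variable is missing or not a valid integer.
func envInt(getenv func(string) string, key string, def int) int {
	v, err := strconv.Atoi(getenv(key))
	if err != nil {
		return def
	}
	return v
}

// loadVettingConfig reads settings through getenv so callers can pass
// os.Getenv in production and a map-backed lookup in tests.
// Defaults are assumed to match the values listed in .env.example.
func loadVettingConfig(getenv func(string) string) VettingConfig {
	return VettingConfig{
		Enabled:        envBool(getenv, "SOURCE_VETTING_ENABLED", false),
		BaseURL:        envString(getenv, "SOURCE_VETTING_BASE_URL", "https://api.x.ai/v1"),
		Model:          envString(getenv, "SOURCE_VETTING_MODEL", "grok-4-1-fast"),
		MaxSampleItems: envInt(getenv, "SOURCE_VETTING_MAX_SAMPLE_ITEMS", 6),
	}
}

func main() {
	cfg := loadVettingConfig(os.Getenv)
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, keeps the parsing logic free of process state and easy to exercise with fixed inputs.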
53 changes: 0 additions & 53 deletions .github/workflows/alerts-feed.yml

This file was deleted.

20 changes: 17 additions & 3 deletions .github/workflows/docker.yml
@@ -56,13 +56,27 @@ jobs:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3

- name: Build image
- name: Build image (attempt 1)
id: build_image_1
continue-on-error: true
uses: docker/build-push-action@v6
with:
context: .
file: ./${{ matrix.image.dockerfile }}
push: false
load: false
provenance: false
cache-from: type=gha
cache-to: type=gha,mode=max
cache-from: type=gha,scope=docker-${{ matrix.image.name }}
cache-to: type=gha,mode=max,scope=docker-${{ matrix.image.name }}

- name: Build image (attempt 2 on transient failure)
if: steps.build_image_1.outcome == 'failure'
uses: docker/build-push-action@v6
with:
context: .
file: ./${{ matrix.image.dockerfile }}
push: false
load: false
provenance: false
cache-from: type=gha,scope=docker-${{ matrix.image.name }}
cache-to: type=gha,mode=max,scope=docker-${{ matrix.image.name }}
21 changes: 18 additions & 3 deletions .github/workflows/release.yml
@@ -71,7 +71,9 @@ jobs:
type=sha
type=raw,value=latest

- name: Build and push image
- name: Build and push image (attempt 1)
id: build_push_1
continue-on-error: true
uses: docker/build-push-action@v6
with:
context: .
@@ -80,8 +82,21 @@ jobs:
provenance: false
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
cache-from: type=gha,scope=release-${{ matrix.image.name }}
cache-to: type=gha,mode=max,scope=release-${{ matrix.image.name }}

- name: Build and push image (attempt 2 on transient failure)
if: steps.build_push_1.outcome == 'failure'
uses: docker/build-push-action@v6
with:
context: .
file: ./${{ matrix.image.dockerfile }}
push: true
provenance: false
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha,scope=release-${{ matrix.image.name }}
cache-to: type=gha,mode=max,scope=release-${{ matrix.image.name }}

- name: Publish GitHub release
if: matrix.image.name == 'web'
67 changes: 67 additions & 0 deletions .github/workflows/source-discovery.yml
@@ -0,0 +1,67 @@
# Copyright 2026 ff, Scalytics, Inc. - https://www.scalytics.io
# SPDX-License-Identifier: Apache-2.0

name: Source Discovery

on:
schedule:
- cron: "0 6 * * 1" # Every Monday at 06:00 UTC
workflow_dispatch:

permissions:
contents: write
pull-requests: write

jobs:
discover:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

- uses: actions/setup-go@v5
with:
go-version-file: go.mod

- name: Run source discovery
run: go run ./cmd/euosint-collector --discover --discover-output discover-results.json

- name: Check for new candidates
id: check
run: |
count=$(jq '.new_candidate_count' discover-results.json)
echo "count=$count" >> "$GITHUB_OUTPUT"
if [ "$count" -eq 0 ]; then
echo "No new source candidates found."
else
echo "Found $count new source candidates."
fi

- name: Create PR with results
if: steps.check.outputs.count != '0'
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
branch="discover/new-sources-$(date +%Y%m%d)"
git checkout -b "$branch"
git add discover-results.json
git commit -m "chore: add ${{ steps.check.outputs.count }} discovered source candidates

Automated discovery via FIRST.org CSIRT team directory.

Co-Authored-By: github-actions[bot] <github-actions[bot]@users.noreply.github.com>"
git push origin "$branch"
gh pr create \
--title "Add ${{ steps.check.outputs.count }} discovered OSINT source candidates" \
--body "## Source Discovery Results

Found **${{ steps.check.outputs.count }}** new feed candidates via automated discovery.

Review \`discover-results.json\` and promote worthy candidates to \`registry/source_registry.json\`.

## How to review
- Check each feed URL is reachable and returns valid RSS/Atom
- Verify the organization is a legitimate CSIRT or security authority
- Add geographic coordinates and reporting metadata before merging

🤖 Generated by weekly source discovery workflow" \
--label "discovery"
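
The workflow gates the PR step on a single field, `new_candidate_count`, read from `discover-results.json` with `jq`. The same check can be sketched in Go as below. Only that one field is known from this diff; `discoverResult` and `shouldOpenPR` are hypothetical names, and any other fields in the file are ignored here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// discoverResult models only the field the workflow reads with jq.
// Hypothetical type: the full schema of discover-results.json is not shown in this diff.
type discoverResult struct {
	NewCandidateCount int `json:"new_candidate_count"`
}

// shouldOpenPR decodes the payload and reports whether it announces
// at least one new candidate, along with the count itself.
func shouldOpenPR(payload string) (bool, int, error) {
	var res discoverResult
	if err := json.Unmarshal([]byte(payload), &res); err != nil {
		return false, 0, fmt.Errorf("decode discover results: %w", err)
	}
	return res.NewCandidateCount > 0, res.NewCandidateCount, nil
}

func main() {
	open, count, err := shouldOpenPR(`{"new_candidate_count": 3}`)
	fmt.Println(open, count, err)
}
```

In this sketch a missing field decodes to zero, so no PR would be opened, and malformed JSON surfaces as an error instead of a count.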
4 changes: 4 additions & 0 deletions .gitignore
@@ -7,6 +7,7 @@ dist-ssr/
coverage/
.tmp/
/euosint-collector
cmd/test-*/

# Runtime logs
logs/
@@ -34,6 +35,9 @@ docker-compose.override.yml
*.sln
*.sw?

# GeoNames dataset (downloaded at build time, ~30MB)
registry/cities500.txt

# Tool caches
.eslintcache
.npm/
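
`registry/cities500.txt` is the GeoNames city dump, a tab-separated file with 19 columns per row. How the collector reads it is not shown in this diff; the following is a minimal sketch of parsing one row, assuming the column order documented in the GeoNames export readme. `City` and `parseCityLine` are hypothetical names, and the sample row in `main` is illustrative, not real GeoNames data.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// City holds the subset of GeoNames columns a geocoder would typically need.
// Hypothetical type: the repository's own types are not shown in this diff.
type City struct {
	Name        string
	CountryCode string
	Lat, Lon    float64
	Population  int64
}

// parseCityLine parses one tab-separated row of the GeoNames "geoname" table.
// Zero-based column positions used: 1=name, 4=latitude, 5=longitude,
// 8=country code, 14=population.
func parseCityLine(line string) (City, error) {
	f := strings.Split(line, "\t")
	if len(f) < 19 {
		return City{}, fmt.Errorf("expected 19 columns, got %d", len(f))
	}
	lat, err := strconv.ParseFloat(f[4], 64)
	if err != nil {
		return City{}, fmt.Errorf("latitude: %w", err)
	}
	lon, err := strconv.ParseFloat(f[5], 64)
	if err != nil {
		return City{}, fmt.Errorf("longitude: %w", err)
	}
	pop, err := strconv.ParseInt(f[14], 10, 64)
	if err != nil {
		return City{}, fmt.Errorf("population: %w", err)
	}
	return City{Name: f[1], CountryCode: f[8], Lat: lat, Lon: lon, Population: pop}, nil
}

func main() {
	// Illustrative row only; values are made up for the example.
	row := strings.Join([]string{
		"1", "Sampletown", "Sampletown", "", "48.2", "16.37", "P", "PPL", "AT",
		"", "09", "", "", "", "1000", "", "170", "Europe/Vienna", "2024-01-01",
	}, "\t")
	city, err := parseCityLine(row)
	fmt.Println(city, err)
}
```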
28 changes: 25 additions & 3 deletions Dockerfile.collector
@@ -5,23 +5,45 @@ FROM golang:1.25-alpine AS build

WORKDIR /app

COPY go.mod ./
COPY go.mod go.sum ./
RUN go mod download
COPY cmd ./cmd
COPY internal ./internal
COPY registry ./registry
COPY public ./public

RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o /out/euosint-collector ./cmd/euosint-collector

FROM alpine:3.20 AS geonames
RUN apk add --no-cache curl unzip
RUN curl -sL https://download.geonames.org/export/dump/cities500.zip -o /tmp/cities500.zip \
&& unzip /tmp/cities500.zip -d /tmp \
&& rm /tmp/cities500.zip

FROM alpine:3.20

RUN apk add --no-cache ca-certificates
RUN set -eux; \
apk add --no-cache ca-certificates; \
i=0; \
until [ "$i" -ge 3 ]; do \
if apk add --no-cache chromium; then \
break; \
fi; \
i=$((i + 1)); \
if [ "$i" -ge 3 ]; then \
echo "ERROR: failed to install chromium after 3 attempts" >&2; \
exit 1; \
fi; \
echo "WARN: chromium install failed, retrying in 5s..." >&2; \
sleep 5; \
done
ENV CHROME_PATH=/usr/bin/chromium-browser

WORKDIR /app

COPY --from=build /out/euosint-collector /usr/local/bin/euosint-collector
COPY --from=geonames /tmp/cities500.txt ./registry/cities500.txt
COPY registry ./registry
COPY public ./public-defaults
COPY docker/collector-entrypoint.sh /usr/local/bin/collector-entrypoint.sh

RUN chmod +x /usr/local/bin/collector-entrypoint.sh
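
The Chromium install step above is a bounded retry: up to three attempts, a fixed 5-second pause between failures, and a hard error once the budget is spent. The two-attempt build steps in the workflows follow the same idea. As a sketch of that pattern in Go, not code from this PR, with `retry` and `errTransient` as hypothetical names:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient stands in for a recoverable failure such as a flaky package mirror.
var errTransient = errors.New("transient failure")

// retry runs op up to attempts times and sleeps for delay after each
// failed attempt except the last. It returns nil on the first success,
// or the last error wrapped with the attempt count.
func retry(attempts int, delay time.Duration, op func() error) error {
	if attempts < 1 {
		return errors.New("attempts must be at least 1")
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("failed after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Fails twice, then succeeds on the third call.
	err := retry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

A fixed delay matches the shell loop; exponential backoff would be a different design choice and is not what the Dockerfile does.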
29 changes: 24 additions & 5 deletions Makefile
@@ -127,15 +127,34 @@ dev-start: ## Start the local HTTP dev stack on localhost
@echo "EUOSINT available at http://localhost:$${EUOSINT_HTTP_PORT:-8080}"
@open "http://localhost:$${EUOSINT_HTTP_PORT:-8080}"

dev-stop: ## Stop the local dev stack
$(DOCKER_COMPOSE) down --remove-orphans

dev-restart: ## Restart the local dev stack
$(DOCKER_COMPOSE) down --remove-orphans
dev-stop: ## Stop the local dev stack, remove feed-data volume and prune images
$(DOCKER_COMPOSE) down --remove-orphans -v
@docker image prune -f --filter "label=com.docker.compose.project" >/dev/null 2>&1 || true
@docker builder prune -f >/dev/null 2>&1 || true

dev-restart: ## Restart the local dev stack (removes volumes, rebuilds from scratch)
$(DOCKER_COMPOSE) down --remove-orphans -v
@docker image prune -f --filter "label=com.docker.compose.project" >/dev/null 2>&1 || true
@docker builder prune -f >/dev/null 2>&1 || true
$(DOCKER_COMPOSE) up --build -d
@echo "EUOSINT available at http://localhost:$${EUOSINT_HTTP_PORT:-8080}"
@open "http://localhost:$${EUOSINT_HTTP_PORT:-8080}"

dev-sync-registry: ## Merge source_registry.json into the running DB (adds new feeds)
$(DOCKER_COMPOSE) exec collector euosint-collector --source-db /data/sources.db --curated-seed /app/registry/source_registry.json --source-db-merge-registry

dev-export-db: ## Export seeded sources.db from running container for distribution
@mkdir -p registry
@docker cp euosint-collector-1:/data/sources.db registry/sources.seed.db 2>/dev/null && \
echo "Exported registry/sources.seed.db ($$(wc -c < registry/sources.seed.db | tr -d ' ') bytes)" || \
echo "Container not running or no DB found"

dev-sync-dlq: ## Copy the dead-letter queue from the running container to update the local JSON registry
@docker cp euosint-collector-1:/data/source_dead_letter.json .tmp/dlq.json 2>/dev/null && \
python3 scripts/apply-dlq.py registry/source_registry.json .tmp/dlq.json && \
echo "DLQ applied — review changes with: git diff registry/source_registry.json" || \
echo "No DLQ data or container not running"

dev-logs: ## Tail local dev stack logs
$(DOCKER_COMPOSE) logs -f --tail=200

18 changes: 17 additions & 1 deletion README.md
@@ -30,10 +30,26 @@ make dev-restart
make dev-logs
```

## Remote Install (wget bootstrap)

```bash
wget -qO- https://raw.githubusercontent.com/scalytics/EUOSINT/main/deploy/install.sh | bash
```

The installer will:
- verify Docker + Compose availability
- clone or update the repo on the host
- set GHCR runtime images (`ghcr.io/scalytics/euosint-web` + `ghcr.io/scalytics/euosint-collector`)
- prompt for install mode (`preserve` or `fresh` volume reset)
- prompt for domain (`EUOSINT_SITE_ADDRESS`)
- when domain mode is enabled, optionally check `ufw`/`firewalld` and validate local 80/443 availability
- prompt for key runtime flags (browser + LLM vetting settings)
- optionally run `docker compose pull` and start with `--no-build`

- The release pipeline now builds two images: a web image and a Go collector image.
- The scheduled feed refresh workflow now runs the Go collector.
- The web image now uses Caddy instead of nginx, with the collector output mounted into the web container at runtime.
- In Docker dev mode, the collector seeds the shared feed volume with the repository snapshots first, then replaces them with live output on the first successful run.
- In Docker dev mode, the collector initializes empty JSON outputs on a fresh volume and then writes live output on the first successful run.

## Run Locally Without Docker
