Merged
- Introduced a new category dictionary in JSON format to manage alert categories such as missing persons, wanted suspects, travel warnings, fraud alerts, cyber advisories, and public appeals.
- Added curated agencies with relevant RSS feeds for missing persons and wanted suspects, including INTERPOL, FBI, and Europol.
- Implemented a new source candidates structure for future source management.
- Developed a FeedDirectory component to display alerts, source health, and statistics.
- Created a custom hook to fetch and manage source health data.
- Established a theme utility for consistent color management across the application.
- Defined TypeScript interfaces for source health data structures to ensure type safety.
…ment categories New category types with theme colors and icons for expanded source coverage.
Routine domestic police operations (raids, drug busts, sentencing) are penalised -0.20 unless cross-border signals are present. Interpol fetcher uses polite paginated API (20/page, 2s delay).
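The penalty rule above could look roughly like the sketch below. The keyword lists and the names `adjustScore` and `containsAny` are illustrative assumptions; the PR does not show the collector's actual signal lists or scoring code.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical keyword lists; the collector's real lists are not shown in this PR.
var routineTerms = []string{"raid", "drug bust", "sentenced", "sentencing"}
var crossBorderTerms = []string{"cross-border", "international", "interpol", "europol", "extradition"}

// containsAny reports whether text contains at least one of the terms.
func containsAny(text string, terms []string) bool {
	for _, t := range terms {
		if strings.Contains(text, t) {
			return true
		}
	}
	return false
}

// adjustScore subtracts 0.20 from routine domestic items unless a
// cross-border signal is also present in the title.
func adjustScore(base float64, title string) float64 {
	t := strings.ToLower(title)
	if containsAny(t, routineTerms) && !containsAny(t, crossBorderTerms) {
		return base - 0.20
	}
	return base
}

func main() {
	fmt.Printf("%.2f\n", adjustScore(0.60, "Police raid nets three arrests"))
	fmt.Printf("%.2f\n", adjustScore(0.60, "Europol-backed raid dismantles smuggling ring"))
}
```

Plain substring matching is used here for brevity; a word-boundary match would avoid false hits such as "raid" inside "afraid".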
Prevents flat top-authorities ranking where every source caps at 20. Cyber advisory sources are individually capped at 15 in the registry.
- Interpol Red/Yellow Notices (paginated API)
- Humanitarian: ICRC, UNHCR, WHO, ReliefWeb, WFP, MSF
- Conflict: ICG, SIPRI, NATO, UN SC, OSCE, African Union
- Intelligence: CIA, MI5, GCHQ, BfV, BND, DGSI, AIVD, SÄPO, ASIO, CSIS
- Health: ECDC, CDC, ProMED, WHO emergencies
- Emergency: GDACS, FEMA, EU ERCC, USGS
- Ukraine: CERT-UA, NSDC, SBU, National Police
- Nordics: full coverage NO/DK/FI/IS/SE (police, intel, CERT, travel)
- Eastern Europe: PL/CZ/HU/RO/SK/BG/RS/GE/MD police and intel
- Middle East: Israel, UAE, Saudi, Jordan, Qatar CERTs
- Africa: AU, Egypt, Morocco, Rwanda, Ethiopia, Senegal, Algeria
- Asia: Taiwan, China, Vietnam, Pakistan, Bangladesh, Sri Lanka, Mongolia
- Sanctions: OFAC, EU, UN, FATF, OpenSanctions
- Financial fraud: FCA, BaFin, ESMA, SEC, FinCEN, FINMA + 6 EU regulators
- Organized crime: OCCRP, UNODC, ENFAST, OLAF, EPPO, DIA, SFO, DEA, ATF
- Europol updated to CMS API RSS, duplicate entry removed
- Cyber advisory sources capped at max_items=15
Alerts table now has a companion alerts_fts virtual table with BM25 ranking. SearchAlerts method supports text queries, category/region/status filters, and automatic FTS index rebuild on SaveAlerts.
Embedded HTTP server in collector process, enabled via --api flag. Supports ranked text search (BM25), category/region/status filters, auto prefix matching, and FTS5 syntax (quoted phrases, AND/OR/NOT). Caddy proxies /api/* to collector:3001 in Docker.
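One way the "auto prefix matching" could coexist with explicit FTS5 syntax is sketched below. The function name `buildFTSQuery` and the pass-through rule are assumptions, not the collector's actual code. Plain terms are quoted and given a trailing `*` (FTS5 prefix syntax), while input that already contains quotes or AND/OR/NOT is left untouched.

```go
package main

import (
	"fmt"
	"strings"
)

// buildFTSQuery turns free text into an FTS5 MATCH expression.
// Input that already uses FTS5 syntax is returned as-is; otherwise each
// term becomes a quoted prefix token such as "cyber"*.
func buildFTSQuery(input string) string {
	q := strings.TrimSpace(input)
	if q == "" {
		return ""
	}
	if strings.Contains(q, `"`) {
		return q // quoted phrase: assume the caller wrote FTS5 syntax
	}
	fields := strings.Fields(q)
	for _, f := range fields {
		switch f {
		case "AND", "OR", "NOT":
			return q // boolean operators: pass through unchanged
		}
	}
	parts := make([]string, 0, len(fields))
	for _, f := range fields {
		parts = append(parts, `"`+f+`"*`)
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(buildFTSQuery("cyber attack"))
	fmt.Println(buildFTSQuery(`"red notice"`))
	fmt.Println(buildFTSQuery("fraud AND bank"))
}
```

Quoting each term keeps punctuation in user input from being parsed as FTS5 operators, which is the main reason for not appending `*` to the raw text.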
useSearch hook probes /api/health on mount, then sends debounced queries to /api/search. When API is unavailable, falls back to the existing in-memory string.includes() filter.
Interpol API requires Referer, Origin, and Sec-Fetch-* headers mimicking an XHR from www.interpol.int. Without these, Akamai returns 403. Verified: 6,455 Red + 11,271 Yellow notices accessible.
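A minimal sketch of building such a request with the standard library follows. The helper name `newInterpolRequest` and the specific `Sec-Fetch-*` values are assumptions (they are the values a browser typically sends for a cross-origin XHR); the commit only states which headers are required.

```go
package main

import (
	"fmt"
	"net/http"
)

// newInterpolRequest builds a GET request carrying the headers the commit
// message says Akamai expects. No network call is made here.
func newInterpolRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Referer", "https://www.interpol.int/")
	req.Header.Set("Origin", "https://www.interpol.int")
	// Assumed values: what a browser sends for a cross-origin fetch/XHR.
	req.Header.Set("Sec-Fetch-Dest", "empty")
	req.Header.Set("Sec-Fetch-Mode", "cors")
	req.Header.Set("Sec-Fetch-Site", "same-site")
	return req, nil
}

func main() {
	req, err := newInterpolRequest("https://ws-public.interpol.int/notices/v1/red")
	if err != nil {
		panic(err)
	}
	for _, h := range []string{"Referer", "Origin", "Sec-Fetch-Dest", "Sec-Fetch-Mode", "Sec-Fetch-Site"} {
		fmt.Printf("%s: %s\n", h, req.Header.Get(h))
	}
}
```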
…US agencies
- New fbi-wanted-json source type hitting api.fbi.gov public API
- Parser extracts name, charges, nationality, aliases, reward, armed-dangerous
- 4 FBI subcategories: wanted, ten-most-wanted, seeking-info, parental-kidnappings
- Add fetch_mode: browser to DEA and ATF (blocked by stealth HTTP)
- New dea-fugitives and usms-mostwanted browser-backed sources
- Remove duplicate fbi-seeking and fbi-mostwanted registry entries
- 30-request burst, 5/sec refill, stale eviction after 10min
- Returns 429 with Retry-After header when exceeded
- Health endpoint exempt from rate limiting
- Extracts client IP from X-Forwarded-For/X-Real-Ip
- Makefile dev-stop/dev-restart now prune dangling images and build cache
- Always follow HTTP redirects for RSS/Atom feed fetches (302/307 are normal for feeds) instead of treating them as dead sources
- Strip HTML and truncate to 2k chars before sending to Google Translate to prevent 413 on feeds with full-page markup in descriptions
- Export StripHTML for cross-package use
- Left panel: smaller severity numbers, remove duplicate Alerts row, larger Countries/Feeds display, clickable severity filter, zone stats, ">" prefix on capped authority counts, Middle East region
- GlobeView: fix map disappearing on force-reload via ResizeObserver
- Docker: merge registry into DB on every startup, bump MAX_PER_SOURCE to 40, add dev-sync-registry target, German AA followRedirects
This reverts commit b1fd3ef.
BSI NESAS is a product certification feed (NESAS audit/evaluation docs), not security advisories. Mark it as rejected in the registry and add promotion_status filtering to normalizeAll() so the JSON loader respects rejection status the same way the SQLite loader does.
FBI removed their RSS feeds; news/press releases are not available via the Wanted API either. Mark fbi-news as rejected. Fix FeedDirectory global overview to use source-health total instead of counting only sources that produced alerts.
Critical/High buttons now filter the map and alert feed globally instead of only affecting left-panel stats. Third box shows conflict monitoring count (ACLED etc.) and toggles that category filter. Remove the Clear button — click again to deselect.
Add country/region extraction from alert titles so international sources (Crisis Group, SIPRI, UN Press, AU Peace) pin to the actual conflict location instead of the org HQ. Uses rightmost-match heuristic with ~150 country centroids plus conflict sub-regions (Tigray, Donbas, Rakhine, etc.). Fix broken conflict feed URLs: SIPRI → /rss/combined.xml, UN SC → press.un.org/en/rss.xml. Reject NATO, OSCE, ACLED (no working feeds).
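The rightmost-match heuristic can be sketched as follows, on the assumption that the place mentioned last in a title is usually the affected location rather than the issuing organisation. The function name and the tie-break for nested names (so "South Sudan" beats "Sudan") are illustrative choices; the real gazetteer and its coordinates are not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// rightmostPlace returns the known place name whose match ends furthest to
// the right in the title. When two matches end at the same position, the
// longer name wins, so "South Sudan" is preferred over "Sudan".
func rightmostPlace(title string, names []string) (string, bool) {
	lower := strings.ToLower(title)
	best, bestEnd, bestLen := "", -1, 0
	for _, n := range names {
		ln := strings.ToLower(n)
		if ln == "" {
			continue
		}
		i := strings.LastIndex(lower, ln)
		if i < 0 {
			continue
		}
		end := i + len(ln)
		if end > bestEnd || (end == bestEnd && len(ln) > bestLen) {
			best, bestEnd, bestLen = n, end, len(ln)
		}
	}
	return best, bestEnd >= 0
}

func main() {
	names := []string{"Geneva", "Sudan", "South Sudan", "Ethiopia", "Tigray"}
	place, ok := rightmostPlace("UN in Geneva condemns attacks in South Sudan", names)
	fmt.Println(place, ok)
}
```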
New sources:
- UN Peacekeeping (Blue Helmets): mission/deployment updates
- UN OCHA: humanitarian crisis coordination
- UN News Peace & Security: conflict/peacekeeping coverage
- UN News Refugees & Migrants: displacement/migration intel
- UN News Humanitarian Aid: aid operations and tasking
- ICRC Humanitarian Law & Policy: IHL and conflict law
- ICRC Field Operations: Israel/Gaza/West Bank ops reporting
Rejected (feeds dead, no alternative):
- ICRC News (404), UNHCR (403), WFP (403), ICRC Family Links (403)
OIJ Costa Rica: remove "oij" from include_keywords (matched every URL on the domain), drop root URL from feed_urls to avoid scraping navigation. Tighten keywords to actual missing person terms. NCMEC: their RSS emits titles like ": Name (State)" with a leading colon. Prefix with "Missing" so it reads "Missing: Name (State)".
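The NCMEC title fix amounts to a small normalisation step. A sketch, with the hypothetical helper name `fixNCMECTitle`:

```go
package main

import (
	"fmt"
	"strings"
)

// fixNCMECTitle prefixes "Missing" when a title starts with a bare colon,
// turning ": Name (State)" into "Missing: Name (State)".
func fixNCMECTitle(title string) string {
	t := strings.TrimSpace(title)
	if strings.HasPrefix(t, ":") {
		return "Missing" + t
	}
	return t
}

func main() {
	fmt.Println(fixNCMECTitle(": Name (State)"))
}
```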
When switching regions via header shortcuts or the scope dropdown, the active navigator group now resets so the first group in the new region is auto-selected instead of sticking to the old selection.
Replace API JSON URLs (ws-public.interpol.int/notices/v1/red/...) with human-readable web URLs (interpol.int/.../View-Red-Notices#ID). Override lat/lng from Interpol HQ (Lyon) to the person's nationality country centroid so notices pin to the correct location on the map.
…nation Interpol has ~6.4k red and ~4k yellow notices. Instead of fetching all at once, each run fetches a 320-notice window (2 pages × 160) and advances a persistent cursor. State reconciliation carries forward previously accumulated alerts for sources marked accumulate:true, building the full corpus over successive runs.
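The windowed fetch with a persistent cursor could be sketched like this. The function name, the 0-based cursor and the wrap-around at the end of the corpus are assumptions; the commit only states the window size (2 pages × 160) and that the cursor advances between runs.

```go
package main

import "fmt"

const (
	pageSize    = 160 // notices per page
	pagesPerRun = 2   // pages fetched per collector run
)

// nextWindow returns the 1-based page numbers to fetch in this run and the
// cursor value to persist for the next run. cursor is a 0-based page offset.
// Wrapping to the first page after the last one is an assumption.
func nextWindow(cursor, total int) (pages []int, next int) {
	totalPages := (total + pageSize - 1) / pageSize
	if totalPages == 0 {
		return nil, 0
	}
	for i := 0; i < pagesPerRun && i < totalPages; i++ {
		pages = append(pages, (cursor+i)%totalPages+1)
	}
	next = (cursor + len(pages)) % totalPages
	return pages, next
}

func main() {
	cursor := 0
	for run := 1; run <= 3; run++ {
		var pages []int
		pages, cursor = nextWindow(cursor, 6400)
		fmt.Printf("run %d fetches pages %v, next cursor %d\n", run, pages, cursor)
	}
}
```

With 6,400 notices and 160 per page there are 40 pages, so under these assumptions a full pass takes 20 runs before the cursor wraps.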
Add maxBounds, maxBoundsViscosity, minZoom and noWrap to stop vertical scrolling past the world edge and tile repetition.
Increase global/international zoom and minZoom from 2 to 3 to eliminate white gaps at map edges. Add German-language keywords to severity inference so CERT.AT/BSI advisories get correct severity levels.
Document that EUOSINT intentionally pulls only the newest 160 red and 160 yellow Interpol notices per run to avoid data overflow. Also covers severity classification, map tiles, and collector cycle behavior.
Cover all 17 alert categories with descriptions and example sources, severity classification rules, Interpol notice limits, map tiles, collector cycle, and region scoping.
Add 12 new sources: GDACS disasters, USGS earthquakes, NOAA oil/chem incidents, Smithsonian volcanoes, EMSA maritime, IAEA nuclear, WHO, ECDC epidemiological updates + risk assessments, CDC, WOAH zoonotic. Reject ECB press releases (general news, not fraud intel). Add severity keywords for outbreaks, natural disasters, and hazmat. Fraud sources will appear after next dev-restart (DB merge on startup).
…LM category vetting
Source lifecycle is now fully automated without requiring restarts:
- Every collector cycle merges the JSON registry into SQLite (new sources picked up, rejected status synced)
- Dead sources (404, 403, DNS, TLS errors) are rejected in both SQLite AND the JSON registry, then written to the DLQ
- Merge respects runtime rejections: if a source was killed at runtime, re-merging from JSON won't resurrect it
- LLM vetting prompt now includes all 18 categories with descriptions so the model can assign the correct category for discovered sources
- Verdict struct includes category field validated against the taxonomy
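The "merge respects runtime rejections" rule can be illustrated with an in-memory stand-in for the SQLite table. The `source` struct, its `RejectedAtRuntime` flag and `mergeRegistry` are hypothetical names; the actual schema is not shown in the PR.

```go
package main

import "fmt"

// source is a simplified registry row. RejectedAtRuntime marks sources the
// collector killed itself (for example after a 404 or TLS error).
type source struct {
	ID                string
	Status            string
	RejectedAtRuntime bool
}

// mergeRegistry upserts JSON registry entries into db, but never overwrites
// a row that was rejected at runtime, so a dead source is not resurrected.
func mergeRegistry(db map[string]source, registry []source) {
	for _, s := range registry {
		if cur, exists := db[s.ID]; exists && cur.RejectedAtRuntime {
			continue
		}
		db[s.ID] = s
	}
}

func main() {
	db := map[string]source{
		"dead-feed": {ID: "dead-feed", Status: "rejected", RejectedAtRuntime: true},
	}
	mergeRegistry(db, []source{
		{ID: "dead-feed", Status: "active"},
		{ID: "new-feed", Status: "active"},
	})
	fmt.Println(db["dead-feed"].Status, db["new-feed"].Status)
}
```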
Add -v flag to docker-compose down so feed-data volume is removed, ensuring the entrypoint re-seeds the DB from the current JSON registry on next start.
Add all 18 category labels to LLM search discovery for replacement feed lookups. Remove dead rejectInJSONRegistry (wrote to Docker ephemeral FS). Add scripts/apply-dlq.py and Makefile dev-sync-dlq target for developer-side DLQ processing.
Add make dev-export-db to snapshot sources.db from a running collector. Entrypoint prefers sources.seed.db over cold init+import when available. The merge step still runs on every start to pick up JSON registry updates.
Tier 1: GeoNames cities500.txt (200k+ cities) imported into SQLite for fast city-name lookups. Text is scanned for place names and matched against the DB with population-based disambiguation.
Tier 2: OSM Nominatim fallback for place names not in the local DB. Rate-limited (1 req/sec), in-memory cached, configurable base URL for self-hosted instances.
Tier 3: Country-level text scanning now returns capital city coordinates instead of geographic centroids. Fixes island nations (Malta, Cyprus, Singapore, etc.) placing alerts in the sea.
The Dockerfile downloads cities500.txt at build time (~30MB). The collector auto-imports it into SQLite on first run. All tiers are optional; the system degrades gracefully.
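The tiered fallback with optional tiers can be sketched as a chain of resolvers. Everything below is illustrative: the type names, the toy city data and the rule "pick the most populous candidate" are assumptions about how population-based disambiguation might work, and the Nominatim call is replaced by a stub.

```go
package main

import "fmt"

type coords struct{ Lat, Lng float64 }

type city struct {
	Name       string
	Population int
	At         coords
}

// resolver is one geocoding tier; it reports false when it has no answer.
type resolver func(place string) (coords, bool)

// pickLargest disambiguates same-named cities by population.
func pickLargest(candidates []city) (city, bool) {
	if len(candidates) == 0 {
		return city{}, false
	}
	best := candidates[0]
	for _, c := range candidates[1:] {
		if c.Population > best.Population {
			best = c
		}
	}
	return best, true
}

// localTier looks a name up in an in-memory stand-in for the cities table.
func localTier(db map[string][]city) resolver {
	return func(place string) (coords, bool) {
		c, ok := pickLargest(db[place])
		return c.At, ok
	}
}

// geocode tries each tier in order. A nil tier means "not configured" and
// is skipped, which is how the chain degrades gracefully.
func geocode(place string, tiers ...resolver) (coords, bool) {
	for _, t := range tiers {
		if t == nil {
			continue
		}
		if c, ok := t(place); ok {
			return c, true
		}
	}
	return coords{}, false
}

func main() {
	// Toy data, not real populations or coordinates.
	db := map[string][]city{
		"Springfield": {
			{Name: "Springfield", Population: 5, At: coords{1, 1}},
			{Name: "Springfield", Population: 100, At: coords{2, 2}},
		},
	}
	remoteStub := func(place string) (coords, bool) { return coords{9, 9}, place == "Elsewhere" }
	c, ok := geocode("Springfield", localTier(db), nil, remoteStub)
	fmt.Println(c, ok)
}
```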
Gap analysis scans the active registry against 120+ target countries worldwide. Missing country+category combinations generate synthetic search candidates that feed into the existing LLM search + RSS probe pipeline. Covers Europe, Americas, Asia-Pacific, Middle East, Africa, Central Asia, and Caucasus. Also fixes Interpol notice URLs: 2026/5314 → 2026-5314 in fragment.
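At its core, the gap analysis described above is a set difference between the wanted country+category grid and what the registry already covers. A sketch with hypothetical names (`key`, `findGaps`) and toy inputs:

```go
package main

import "fmt"

// key identifies one coverage cell: a country paired with a category.
type key struct{ Country, Category string }

// findGaps returns every country+category combination that has no active
// source, in a stable order (countries outer, categories inner).
func findGaps(active []key, countries, categories []string) []key {
	have := make(map[key]bool, len(active))
	for _, k := range active {
		have[k] = true
	}
	var gaps []key
	for _, c := range countries {
		for _, cat := range categories {
			if k := (key{c, cat}); !have[k] {
				gaps = append(gaps, k)
			}
		}
	}
	return gaps
}

func main() {
	active := []key{{"NO", "cyber_advisory"}, {"SE", "wanted_suspect"}}
	gaps := findGaps(active, []string{"NO", "SE"}, []string{"cyber_advisory", "wanted_suspect"})
	fmt.Println(gaps)
}
```

Each returned gap would then become a synthetic search candidate; how the search query is phrased is not shown in the PR, so it is left out here.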
DuckDuckGo search via headless Chrome is now the first-class feed discovery method: zero API keys, zero tokens. The system searches for RSS/Atom feeds using gap-analysis targets, extracts result URLs from DDG HTML, and feeds them into the existing probe pipeline. LLM search only runs for targets that DDG didn't cover, saving tokens. Also expanded feed probe paths with government/ministry patterns (DOJ-style feeds, multi-language /de/feed /fr/feed etc.) and enabled browser + DDG in docker-compose.
Root cause: SERVICE wikibase:label and P279* subclass traversal cause persistent timeouts on the public Wikidata SPARQL endpoint. Fix both police.go and humanitarian.go to query one type ID at a time with P31 only, LIMIT 50, deriving names from hostnames and a static country map. Added 8 explicit subclass type IDs to compensate for removed P279*.
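A per-type query of the kind described could be built as below. The exact query text in police.go and humanitarian.go is not shown, so the selected variables and the use of P856 (official website, from which a hostname can be derived) are assumptions. The point illustrated is the shape: one type ID per request, P31 only, `LIMIT 50`, with no label service and no P279* traversal.

```go
package main

import (
	"fmt"
	"regexp"
)

// qidRe accepts only well-formed Wikidata item IDs, so the ID can be
// interpolated into the query without further escaping.
var qidRe = regexp.MustCompile(`^Q[0-9]+$`)

// buildTypeQuery returns a SPARQL query for direct instances (P31) of a
// single type, capped at 50 rows.
func buildTypeQuery(typeID string) (string, error) {
	if !qidRe.MatchString(typeID) {
		return "", fmt.Errorf("invalid Wikidata ID %q", typeID)
	}
	return fmt.Sprintf(`SELECT ?item ?website WHERE {
  ?item wdt:P31 wd:%s .
  ?item wdt:P856 ?website .
} LIMIT 50`, typeID), nil
}

func main() {
	// Placeholder IDs, not the type IDs used in this PR.
	for _, id := range []string{"Q1", "Q2"} {
		q, err := buildTypeQuery(id)
		if err != nil {
			panic(err)
		}
		fmt.Println(q)
		fmt.Println("---")
	}
}
```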
Links to streamingintelligence product page and contact page with UTM tracking parameters. Referrer enabled for analytics attribution.
3.2MB SQLite snapshot with curated feeds so new deployments start with full coverage immediately. Entrypoint copies it to /data/sources.db on first run; subsequent starts merge the JSON registry on top.
…titles Include keywords now match title only, not the URL — the feed URL path (e.g. /desaparecidos) was letting every link on the page pass the filter. Added junk title blocklist to reject navigation boilerplate (load more, cookie config, browser names) at parse time.
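The title-only keyword match plus junk blocklist might be sketched as follows. The `item` struct, the `keep` function and the blocklist entries are illustrative; only "load more" and cookie-related boilerplate are named in the commit, and the real list is not shown.

```go
package main

import (
	"fmt"
	"strings"
)

type item struct {
	Title string
	URL   string
}

// Hypothetical blocklist of navigation boilerplate seen in scraped pages.
var junkTitles = []string{"load more", "cookie"}

// keep reports whether an item passes the filter. Only the title is
// inspected; the URL is deliberately ignored so that a path such as
// /desaparecidos cannot let every link on the page through.
func keep(it item, include []string) bool {
	t := strings.ToLower(strings.TrimSpace(it.Title))
	if t == "" {
		return false
	}
	for _, j := range junkTitles {
		if strings.Contains(t, j) {
			return false
		}
	}
	if len(include) == 0 {
		return true
	}
	for _, k := range include {
		if strings.Contains(t, strings.ToLower(k)) {
			return true
		}
	}
	return false
}

func main() {
	include := []string{"desaparecid"}
	fmt.Println(keep(item{"Contacto", "https://example.org/desaparecidos/contacto"}, include))
	fmt.Println(keep(item{"Persona desaparecida: ejemplo", "https://example.org/desaparecidos/1"}, include))
}
```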
…rces New categories: maritime_security (7 sources incl. US Navy, CIMSEC, EU NAVFOR, MARAD) and legislative (7 sources incl. EU Parliament, EU Council, EEAS, US Congress, State Dept). Additional conflict monitoring sources: US DoD, NATO, OSCE, UN Security Council, ICG, German Foreign Office, France Diplomatie. All filtered with include_keywords to reduce noise. Gap analysis now discovers these categories automatically for all target countries.
Add non-OSINT term and host blocklists to discovery hygiene (education, world bank, social media, entertainment, etc). Purge orphan alerts from rejected/removed sources on each collection cycle. Explicitly reject worldbank-education-digital in registry.
Summary
This PR is a large end-to-end upgrade of EUOSINT across collector, discovery, source registry, API, UI, Docker runtime, and deployment operations.
Core platform changes
… /api/search endpoint.
Discovery and hygiene
New intelligence domains
- maritime_security
- legislative
- conflict_monitoring
- environmental_disaster
- disease_outbreak
Geocoding and map quality
UI and UX
Docker, install, and operations
… deploy/install.sh) with: … preserve vs fresh volume reset), …
Why
Validation focus