Pr/slop #2117

Open

moshloop wants to merge 22 commits into main from pr/slop

Conversation


moshloop commented Apr 17, 2026

Summary by CodeRabbit

  • New Features

    • Added a real-time scrape results dashboard accessible via the new ui command for browsing configurations, relationships, and diagnostics.
    • Added snapshot diffing to visualize state changes between scrape runs.
    • Enhanced Azure DevOps scraper to support Pipeline and Release permissions in addition to Git.
    • Extended Playwright scraper with login configuration, headless mode, and query export options.
  • Documentation

    • Added comprehensive runtime properties reference in PROPERTIES.md.

moshloop added 22 commits April 17, 2026 10:18
Introduces support for Playwright-based browser automation to enable web scraping, screenshot capture, and federation login workflows. Includes browser instance management via chromedp, AWS federated login support, and HAR recording integration.
…nd tracing options

Refactors Playwright scraper config into separate types for login providers, audit log, and tracing. Adds support for OutputMode, headless mode, timeout, and browser login. Fixes artifact linking to use resolved change IDs from DB and improves deepcopy generation for nested types.
Adds dedicated CI job for end-to-end Playwright tests with bun and Chromium. Migrates browser tests from unit test framework (ginkgo) to e2e test suite, adding parseOutput validation tests and interactive browser scenarios.
The bun package manager may not be available in the PATH by default on CI runners. Explicitly add $HOME/.bun/bin to PATH before running playwright installation and tests to ensure the bundled bun binary can be found.
Use GITHUB_PATH to set bun binary location instead of exporting PATH in shell, which is more reliable across workflow steps. Removes redundant PATH exports and simplifies test command.
Adds a new `--ui` flag to the run command that launches an interactive browser dashboard. The dashboard provides real-time progress tracking with config browsing, logs, HTTP traces, and access/relationship visualization. Includes a complete Preact-based frontend with filtering, split-pane layout, and detailed config inspection. Also adds a new `ui` command to view saved JSON results in the dashboard.
… permission resolution

Adds descriptor format conversion functions (DescriptorToSID, SIDToVssgp, NormalizeDescriptor) to handle multiple Azure DevOps identity formats. Introduces BuildIdentityMap for multi-form descriptor lookups and ResolvedIdentityName for better display names. Filters system descriptors (svc., s2s.) from resolution API calls and handles 404 responses gracefully. Improves Release audit log mapping to handle ReleaseDefinitionId in addition to PipelineId.
Implements frontend-friendly relationship representation using external IDs and resolved names. Adds global search across all tabs with hit counts, config metadata (parents, location), change status indicators (new/updated/deleted), and displays relationships in detail panel.
Add ForeignKeyErrors field to EntitySummary and implement row-by-row fallback for FK violations during bulk config_access inserts. This allows graceful handling of missing referenced entities while preserving valid rows and tracking constraint violations for diagnostics.
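The row-by-row fallback idea can be sketched generically — a simplified model, not the actual `config_access` insert path; `errFKViolation` stands in for the real check against the Postgres FK error code:

```go
package main

import (
	"errors"
	"fmt"
)

// errFKViolation stands in for a database foreign-key error (the real code
// would inspect the Postgres error code 23503 instead).
var errFKViolation = errors.New("foreign key violation")

// insertWithFallback tries a bulk insert first; on an FK violation it
// retries row by row, keeping valid rows and collecting the failures for
// diagnostics — mirroring the ForeignKeyErrors idea, not the real API.
func insertWithFallback[T any](rows []T, bulkInsert func([]T) error) (inserted []T, fkErrors []error) {
	if err := bulkInsert(rows); err == nil {
		return rows, nil
	} else if !errors.Is(err, errFKViolation) {
		return nil, []error{err}
	}
	for _, row := range rows {
		if err := bulkInsert([]T{row}); err != nil {
			fkErrors = append(fkErrors, fmt.Errorf("row %v: %w", row, err))
			continue
		}
		inserted = append(inserted, row)
	}
	return inserted, fkErrors
}

func main() {
	// Pretend any batch containing 2 references a missing entity.
	bulk := func(rows []int) error {
		for _, r := range rows {
			if r == 2 {
				return errFKViolation
			}
		}
		return nil
	}
	inserted, fkErrs := insertWithFallback([]int{1, 2, 3}, bulk)
	fmt.Println(inserted, len(fkErrs)) // prints "[1 3] 1"
}
```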
…ole resolution

Refactors permission handling to support Build and Release namespace permissions alongside Git. Introduces ResolveRoles as a generalized permission-to-role mapper with DefaultRoles, replacing ResolveGitRoles's fallback behavior. Adds ParseBuildPermissions and ParseReleasePermissions to extract permission bits, GetReleasePermissions client method, and permission fetching for both pipelines and releases with role-based access tracking.
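A generalized permission-to-role mapping with defaults might look like this sketch — the masks and role names are invented for illustration, and the real `ResolveRoles` signature may differ:

```go
package main

import (
	"fmt"
	"sort"
)

// resolveRoles maps a permission bit mask to every role whose full mask is
// covered, falling back to defaultRoles when nothing matches.
func resolveRoles(bits int, roleBits map[string]int, defaultRoles []string) []string {
	var roles []string
	for role, mask := range roleBits {
		if mask != 0 && bits&mask == mask {
			roles = append(roles, role)
		}
	}
	if len(roles) == 0 {
		return defaultRoles
	}
	sort.Strings(roles) // map iteration order is random; make output stable
	return roles
}

func main() {
	fmt.Println(resolveRoles(0b011, map[string]int{
		"reader": 0b001, "writer": 0b011, "admin": 0b111,
	}, []string{"stakeholder"})) // prints "[reader writer]"
}
```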
…tching

Adds in-memory resolution of RelationshipSelectors in BuildUIRelationships and implements fallback matching logic for access control entities when aliases are unavailable. This enables correct relationship discovery from selector configurations and graceful handling of ID-only entity references.
Adds URL hash-based state persistence to enable bookmarkable and shareable links for specific configs, tabs, and search queries. Also enhances config relationship deduplication and extends Azure permissions documentation for resource-type-prefixed permissions.
…downloads

Adds an Issues tab to display orphaned changes and FK errors. Replaces hash-based routing with history API for cleaner URLs and deep-linking support (e.g., /configs/{id}). Includes per-config JSON downloads, membership resolution for user/group relationships, and SPA fallback routing. Consolidates external entity sources to prefer post-merge canonical IDs from SaveSummary.

BREAKING CHANGE: add scrape issues tab and client-side routing with config downloads
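The SPA fallback rule above can be reduced to a small path decision — a sketch of the routing idea only; the real server wires embedded assets and the JSON API differently:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// spaFallbackPath decides what the scrape-UI server serves for a request
// path: API routes pass through, asset-looking paths serve themselves, and
// everything else falls back to the SPA shell so deep links like
// /configs/{id} resolve client-side.
func spaFallbackPath(p string) string {
	switch {
	case strings.HasPrefix(p, "/api/"):
		return p // handled by the JSON API
	case path.Ext(p) != "":
		return p // static asset, e.g. /assets/app.js
	default:
		return "/index.html"
	}
}

func main() {
	fmt.Println(spaFallbackPath("/configs/some-id")) // prints "/index.html"
}
```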
…ed deduplication

Remove DescriptorID function and switch to alias-based identity reconciliation. Users and groups are now deduplicated locally by descriptor and reconciled via alias overlap in SQL merge, allowing the AAD scraper's authoritative records to take precedence.
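Alias-overlap reconciliation is essentially a union-find over shared aliases. The PR performs the merge in SQL, so this in-memory sketch only illustrates the grouping idea with invented types:

```go
package main

import "fmt"

// entity is a minimal stand-in for an external user/group record.
type entity struct {
	ID      string
	Aliases []string
}

// dedupeByAliasOverlap unions entities that share any alias (union-find)
// and returns one group per connected component.
func dedupeByAliasOverlap(entities []entity) [][]entity {
	parent := make([]int, len(entities))
	for i := range parent {
		parent[i] = i
	}
	var find func(int) int
	find = func(x int) int {
		if parent[x] != x {
			parent[x] = find(parent[x]) // path compression
		}
		return parent[x]
	}
	union := func(a, b int) { parent[find(a)] = find(b) }

	// Map each alias to the first entity that claimed it; later entities
	// with the same alias are unioned into that component.
	seen := map[string]int{}
	for i, e := range entities {
		for _, a := range e.Aliases {
			if j, ok := seen[a]; ok {
				union(i, j)
			} else {
				seen[a] = i
			}
		}
	}
	groups := map[int][]entity{}
	for i, e := range entities {
		groups[find(i)] = append(groups[find(i)], e)
	}
	out := make([][]entity, 0, len(groups))
	for _, g := range groups {
		out = append(out, g)
	}
	return out
}

func main() {
	groups := dedupeByAliasOverlap([]entity{
		{ID: "aad-user", Aliases: []string{"alice@example.com"}},
		{ID: "devops-user", Aliases: []string{"alice@example.com", "vssgp.abc"}},
		{ID: "other", Aliases: []string{"bob@example.com"}},
	})
	fmt.Println(len(groups)) // prints "2"
}
```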
Extends ChangeResult with a Resolved field to track post-transformation changes through the mapping pipeline, and makes EntitySummary generic to carry both counts and canonical post-merge entities. This enables UI and downstream consumers to read merged state from a single authoritative location instead of deduplicating across individual results.
Add before/after/diff snapshots of DB state to track what changed during a scrape. Snapshots capture per-scraper and per-type config item counts, external entities, and access records across multiple time windows (Last/Hour/Day/Week). Captured before and after scrape runs to show impact and persist even on partial failures. Wired into UI with new Snapshot tab and pretty-print summaries in CLI logs.
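The per-type diff behind the Snapshot tab boils down to subtracting count maps; a sketch assuming string-keyed config-type counts (the real snapshot also carries external entities, access records, and time windows):

```go
package main

import "fmt"

// diffCounts returns after-minus-before per config type; types missing
// from one side are treated as zero.
func diffCounts(before, after map[string]int) map[string]int {
	diff := map[string]int{}
	for t, n := range after {
		diff[t] = n - before[t] // missing key reads as 0
	}
	for t, n := range before {
		if _, ok := after[t]; !ok {
			diff[t] = -n // type disappeared entirely
		}
	}
	return diff
}

func main() {
	d := diffCounts(
		map[string]int{"Kubernetes::Pod": 3},
		map[string]int{"Kubernetes::Pod": 5, "Kubernetes::Service": 1},
	)
	fmt.Println(d["Kubernetes::Pod"], d["Kubernetes::Service"]) // prints "2 1"
}
```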
Add last_created_at and last_updated_at fields to EntityWindowCounts to surface when entities were most recently created or modified. Update IsZero() and Sub() methods to handle new timestamp fields. Extend database queries to calculate MAX timestamps from underlying tables.
…e summaries

Extends EntitySummary with LastCreatedAt field to track the most recent entity creation timestamp across merges. Adds Warning struct to capture full diagnostic context (input, output, expression, error) of failed transformations. Introduces ScrapeSummary.AsMap() to expose json-tagged snake_case keys for template/CEL environments. Updates processors to pass transform context through the extraction pipeline.

BREAKING CHANGE: add warning context and last_created_at tracking to scrape summaries
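An `AsMap()` that exposes json-tagged snake_case keys is typically a JSON round-trip; a sketch with an invented stand-in struct, not the actual ScrapeSummary fields:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// scrapeSummary is a stand-in; AsMap round-trips through JSON so that
// template/CEL environments see the snake_case json tag names as map keys.
type scrapeSummary struct {
	TotalConfigs int    `json:"total_configs"`
	LastError    string `json:"last_error,omitempty"`
}

func (s scrapeSummary) AsMap() (map[string]any, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return nil, err
	}
	var m map[string]any
	if err := json.Unmarshal(b, &m); err != nil {
		return nil, err
	}
	return m, nil
}

func main() {
	m, _ := scrapeSummary{TotalConfigs: 7}.AsMap()
	fmt.Println(m["total_configs"]) // prints "7"
}
```

Note that numeric fields come back as float64 after the round-trip, which is usually fine for template engines but worth knowing for Go-side consumers.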
…agnostic tooling

Restructure external entity resolution to handle merged entities via loser-id aliases. Add id-keyed caches for users/roles/groups, new findExternalEntityByID helper for resolving both canonical and historical ids via alias fallback. Implement savepoint-guarded merge functions with temp-table dump diagnostics on failure. Improve user-group resolution to validate direct UUIDs against current scrape entities, preventing dangling FKs. Add comprehensive test coverage for edge cases. Session properties now applied to merge functions.
… summary

Add ScraperTemplateEnv() method to provide request-shaping config (URLs, query payloads) with scraper-level template inputs. Restrict last_scrape_summary access to scraper-level config only, not per-item transforms. Implement deduplication of warnings by error key across ExtractedConfig, ScrapeSummary, and ConfigTypeScrapeSummary to prevent duplicate entries and consolidate counts.
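Deduplicating warnings by error key while consolidating counts might look like this — field names are assumed from the Warning description above, not taken from the actual type:

```go
package main

import "fmt"

// warning mirrors the idea of the structured Warning (fields assumed).
type warning struct {
	Error string
	Count int
}

// dedupeWarnings consolidates warnings sharing the same error key, summing
// their counts and preserving first-seen order.
func dedupeWarnings(ws []warning) []warning {
	idx := map[string]int{}
	var out []warning
	for _, w := range ws {
		if i, ok := idx[w.Error]; ok {
			out[i].Count += w.Count
			continue
		}
		idx[w.Error] = len(out)
		out = append(out, w)
	}
	return out
}

func main() {
	out := dedupeWarnings([]warning{
		{Error: "bad expression", Count: 1},
		{Error: "bad expression", Count: 2},
		{Error: "timeout", Count: 1},
	})
	fmt.Println(len(out), out[0].Count) // prints "2 3"
}
```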
… and config queries

Add support for AWS federation and browser login auto-authentication, trace recording configuration, config item export via queries, and improved script parameter flexibility including headless mode and execution timeout controls.
Update Go dependencies to latest versions including:
- flanksource packages (artifacts, clicky, commons, deps, duty, gomplate, sandbox-runtime)
- AWS SDK v2 services and related packages
- Google Cloud libraries
- OpenTelemetry packages
- Kubernetes client libraries
- Various other transitive dependencies

Add PROPERTIES.md documentation covering runtime configuration properties for retention, deduplication, scraper concurrency, scheduling, timeouts, logging, and other system controls.

Fix database query in external_cache.go to properly handle UUID slice results from Pluck operation.

Update Makefile to simplify go run command and add bench target for benchmarking.

Refs: dependency maintenance and documentation improvements

coderabbitai bot commented Apr 17, 2026

Walkthrough

This PR introduces a real-time scrape UI with streaming snapshots and diff visualization, enhances the scraping pipeline with structured warning tracking and transformation diagnostics, implements external entity merge deduplication via alias overlap, expands Azure DevOps permission handling for pipelines and releases, and adds property/change-result exposure in template contexts.

Changes

Cohort / File(s) Summary
Build and Configuration
.gitignore, Makefile, go.mod, PROPERTIES.md
Updated build flags to include git commit and build date, added .PHONY bench target, expanded benchmark scope; documented runtime properties; removed unused dependencies.
Core API Types and Snapshot Infrastructure
api/context.go, api/context_test.go, api/v1/interface.go, api/v1/interface_test.go, api/v1/scrape_snapshot.go, api/v1/scrape_snapshot_pretty.go, api/v1/scrape_snapshot_test.go
Introduced ScraperTemplateEnv() to expose scrape context in templates; refactored entity summaries to generic EntitySummary[T]; added Warning type with structured error tracking; introduced snapshot capture/diff infrastructure with windowed entity counts; added JSON serialization via AsMap() for templates.
Summary Formatting
api/v1/summary_pretty.go, api/v1/summary_pretty_test.go
Adapted pretty-printing to work with generic EntitySummary[T] via interface-based polymorphism.
Azure DevOps Schema Updates
api/v1/azure.go, config/schemas/config_azuredevops.schema.json, config/schemas/config_playwright.schema.json, config/schemas/scrape_config.schema.json
Updated permission documentation to support typed Type:Permission format for Git/Pipeline/Release; expanded Playwright schema with login, trace, query, and output-mode configurations.
Scrape UI Backend
cmd/scrapeui/types.go, cmd/scrapeui/server.go, cmd/scrapeui/convert.go, cmd/scrapeui/assets.go
Defined UI snapshot and progress models; implemented HTTP server with SSE streaming, SPA shell serving, and JSON config endpoints; added result transformation and relationship building utilities.
Scrape UI Frontend
cmd/scrapeui/frontend/...
Complete Preact-based UI for real-time scrape monitoring with config tree, detail panel, filter/search, relationships, access tables, HAR capture, logs with ANSI rendering, external entities, snapshots, and issues visualization; includes Vite build configuration with git metadata injection and file-based dev mode.
CLI Commands
cmd/root.go, cmd/ui.go, cmd/run.go
Added BuildInfo struct and GetBuildInfo() function; implemented standalone ui command for static UI serving; extended run command with optional scrape UI mode including snapshots, HAR capture, and real-time progress tracking.
Database Snapshot Capture and Change Resolution
db/scrape_snapshot.go, db/scrape_snapshot_test.go, db/change_traversal.go
Added CaptureScrapeSnapshot() to record windowed entity counts per scraper/config-type; populated ChangeResult.Resolved with full change metadata for move/copy/fan-out traversals.
External Entity Management
db/external_cache.go, db/external_entities.go, db/external_entities_test.go, db/external_loser_alias_test.go
Introduced ID-keyed entity resolution caches; implemented alias-overlap-based deduplication via union-find; added savepoint-based rollback recovery for merge failures; updated user-group resolution to handle merged entity remapping; expanded test coverage for loser-alias merge scenarios.
Permission and Access Logging
db/permission_changes.go
Added FK violation recovery with stub-row pre-creation; implemented row-by-row fallback insertion for failed bulk inserts; added FK diagnostic logging; tracked foreignKeyErrors in results.
Change Extraction and Results
db/update.go
Refactored change extraction to return extractChangesResult struct with orphaned/FK-error tracking; integrated warnings into ScrapeSummary; updated access-log persistence to return insertion results and track FK errors; added transform-context propagation and stricter config-access validation.
Change Rules and Templating
scrapers/changes/rules.go, scrapers/changes/rules_test.go
Migrated rule evaluation from standalone gomplate calls to ctx.RunTemplate() methods; changed logging key from "scraper.log.transforms" to "log.transforms"; added test coverage for template context availability in rules.
Extraction and Warning Management
scrapers/extract/extract.go, scrapers/extract/extract_test.go, scrapers/extract/fullmode.go
Changed Warnings from []string to structured []v1.Warning with count aggregation; added SetTransformContext() for diagnostic metadata; added TransformContext parameter to extraction entry point; updated summary to use generic EntitySummary[T]; added pointer-field conversion for external user groups.
Extraction Test Data
scrapers/extract/testdata/unit/*.yaml
Updated warning assertions to access .error field from structured warning objects instead of comparing raw strings.
JSON Processing and Scripting
scrapers/processors/json.go, scrapers/processors/script.go, scrapers/processors/script_test.go
Migrated filter/template evaluation from gomplate to ctx.RunTemplate(); introduced ScriptResult struct to capture script metadata (input/output/expression); implemented full-map preservation for ScrapeResult.Config; added test coverage for struct/map unmarshaling and environment variable exclusions.
HTTP and Slack Scrapers
scrapers/http/http.go, scrapers/slack/slack.go
Updated URL and filter templating to use ctx.ScraperTemplateEnv() and ctx.RunTemplate() for context-aware evaluation.
Scraper Job and Cache Management
scrapers/cron.go, scrapers/run.go
Exported ScraperSummaryCache and GetLastScrapeSummary() for public access; updated ExtractFullMode call site to pass api.ScrapeContext.
Azure DevOps Scrapers
scrapers/azure/active_directory.go, scrapers/azure/azure.go, scrapers/azure/devops/client.go, scrapers/azure/devops/groups.go, scrapers/azure/devops/pipelines.go, scrapers/azure/devops/releases.go, scrapers/azure/devops/repositories.go
Refactored external user groups to use v1.ExternalUserGroup with pointer fields; added deterministic role ID generation via RoleID(); migrated credential handling to generic azcore.TokenCredential; added build and release permission parsing; implemented permissions fetching for pipelines/releases with role resolution and deduplication; replaced deterministic UUID generation with descriptor-based deduplication; added identity resolution helpers (DescriptorAliases, BuildIdentityMap, ResolvedIdentityName).
Azure DevOps Tests
scrapers/azure/devops/groups_test.go, scrapers/azure/devops/audit_log_test.go, scrapers/azure/devops/repositories_test.go
Added coverage for role resolution with permission prefixes, build/release permission parsing, identity descriptor handling, and release definition ID extraction; updated function calls to use renamed exported helpers.
Config Access Tests
scrapers/config_access_test.go
Updated test assertions to unpack return value from SaveConfigAccessLogs now returning a result struct in addition to error.

Sequence Diagram(s)

sequenceDiagram
    participant Browser as Browser/Client
    participant ScrapeUI as Scrape UI Server
    participant DB as Database
    participant Scrapers as Scraper Workers

    Note over Browser,Scrapers: Real-Time Scrape Monitoring
    Browser->>ScrapeUI: GET /api/scrape (initial snapshot)
    activate ScrapeUI
    ScrapeUI->>DB: CaptureScrapeSnapshot(runStart)
    activate DB
    DB-->>ScrapeUI: ScrapeSnapshot (before state)
    deactivate DB
    ScrapeUI-->>Browser: JSON Snapshot
    deactivate ScrapeUI

    Browser->>ScrapeUI: EventSource /api/scrape/stream
    activate ScrapeUI
    loop During Scrape Execution
        Scrapers->>ScrapeUI: UpdateScraper(status, results, summary)
        activate ScrapeUI
        ScrapeUI->>DB: Process Results & Store
        ScrapeUI->>DB: CaptureScrapeSnapshot(runStart)
        activate DB
        DB-->>ScrapeUI: ScrapeSnapshot (after state)
        deactivate DB
        ScrapeUI->>ScrapeUI: DiffSnapshots(before, after)
        ScrapeUI-->>Browser: SSE Event with Updated Snapshot
        deactivate ScrapeUI
    end
    Scrapers->>ScrapeUI: SetDone()
    ScrapeUI-->>Browser: SSE done event
    deactivate ScrapeUI
    Browser-->>Browser: Render Final Results
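The SSE events in the diagram above follow the standard `data: …` framing with a blank-line terminator; a sketch of the frame encoding (the real handler also sets `Content-Type: text/event-stream` and flushes after each write):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sseFrame encodes one server-sent event: an optional event name, a JSON
// data line, and the blank-line terminator required by the SSE format.
func sseFrame(event string, payload any) (string, error) {
	b, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	if event != "" {
		return fmt.Sprintf("event: %s\ndata: %s\n\n", event, b), nil
	}
	return fmt.Sprintf("data: %s\n\n", b), nil
}

func main() {
	frame, _ := sseFrame("", map[string]int{"configs": 42})
	fmt.Print(frame) // prints "data: {\"configs\":42}" followed by a blank line
}
```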
sequenceDiagram
    participant Scraper as Scraper Worker
    participant ExtEntity as External Entity Sync
    participant DB as Database (Temp Table)
    participant Merge as Merge Function
    participant Cache as Entity ID Cache
    participant LiveDB as Live Database

    Note over Scraper,LiveDB: External Entity Merge & Deduplication
    Scraper->>ExtEntity: syncExternalEntities(users, groups, roles)
    activate ExtEntity
    ExtEntity->>ExtEntity: dedupeByIDWithIndex (alias overlap union-find)
    ExtEntity->>DB: Create Temp Tables (external_users, etc.)
    activate DB
    DB-->>ExtEntity: Temp tables ready
    deactivate DB
    
    ExtEntity->>Merge: Call merge_and_upsert_external_users
    activate Merge
    Merge->>Merge: Find connected components by alias overlap
    Merge->>Merge: Select winner (prefer non-nil ID)
    Merge-->>ExtEntity: Merge result with winner IDs
    deactivate Merge
    
    ExtEntity->>ExtEntity: Rewrite IDs based on idMap
    ExtEntity->>Cache: Populate ExternalUserIDCache (loser→winner)
    Cache-->>ExtEntity: Cache updated
    
    ExtEntity->>LiveDB: Fetch final canonical entities
    activate LiveDB
    LiveDB-->>ExtEntity: External Users/Groups/Roles
    deactivate LiveDB
    
    ExtEntity-->>Scraper: syncedExternalEntities (canonical + maps)
    deactivate ExtEntity
    
    Note over Scraper: Later: FindExternalEntityByID uses cache<br/>to resolve loser UUIDs to winners

Possibly related PRs

  • Pr/azure #2060: Overlaps with Azure DevOps scraper modifications including permission parsing (Git/Build/Release), identity resolution helpers, role ID generation, and external entity handling for pipelines/releases.


github-actions bot commented Apr 17, 2026

Benchstat

Base: ccc955a70f2bd9ff62adf84199fd0b03ca1373c1
Head: 6a3ba3f3344a41f7c7fd9e8a2240c20b1d4f4e13

✅ No significant performance changes detected

Full benchstat output
goos: linux
goarch: amd64
pkg: github.com/flanksource/config-db/bench
cpu: AMD EPYC 7763 64-Core Processor                
                                         │ bench-base.txt │
                                         │     sec/op     │
BenchSaveResultsSeed/N=1000-4                 632.3m ± 8%
BenchSaveResultsUpdateUnchanged/N=1000-4      737.3m ± 1%
BenchSaveResultsUpdateChanged/N=1000-4         1.208 ± 1%
geomean                                       825.8m

                                         │ bench-base.txt │
                                         │      MB/s      │
BenchSaveResultsSeed/N=1000-4                0.000 ± 0%
BenchSaveResultsUpdateUnchanged/N=1000-4     0.000 ± 0%
BenchSaveResultsUpdateChanged/N=1000-4       0.000 ± 0%
geomean                                                 ¹
¹ summaries must be >0 to compute geomean

                                         │ bench-base.txt │
                                         │      B/op      │
BenchSaveResultsSeed/N=1000-4                36.07Mi ± 0%
BenchSaveResultsUpdateUnchanged/N=1000-4     25.58Mi ± 0%
BenchSaveResultsUpdateChanged/N=1000-4       74.84Mi ± 0%
geomean                                      41.03Mi

                                         │ bench-base.txt │
                                         │   allocs/op    │
BenchSaveResultsSeed/N=1000-4                 442.0k ± 0%
BenchSaveResultsUpdateUnchanged/N=1000-4      314.0k ± 0%
BenchSaveResultsUpdateChanged/N=1000-4        904.6k ± 0%
geomean                                       500.8k

@socket-security

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff   Package                          Supply Chain Security  Vulnerability  Quality  Maintenance  License
Added  npm/vite@6.4.2                   86                     100            83       98           100
Added  npm/typescript@5.9.3             100                    100            90       100          90
Added  npm/@preact/preset-vite@2.10.5   99                     100            100      92           100
Added  npm/preact@10.29.1               100                    100            100      96           100

View full report

@socket-security

Warning

Review the following alerts detected in dependencies.

According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.

Action  Severity  Alert
Warn High
Obfuscated code: npm entities is 91.0% likely obfuscated

Confidence: 0.91

Location: Package overview

From: cmd/scrapeui/frontend/package-lock.json → npm/@preact/preset-vite@2.10.5 → npm/entities@4.5.0


Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at support@socket.dev.

Suggestion: Packages should not obfuscate their code. Consider not using packages with obfuscated code.

Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment @SocketSecurity ignore npm/entities@4.5.0. You can also ignore all packages with @SocketSecurity ignore-all. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.

Warn High
Obfuscated code: npm vite is 91.0% likely obfuscated

Confidence: 0.91

Location: Package overview

From: cmd/scrapeui/frontend/package-lock.json → npm/vite@6.4.2


Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at support@socket.dev.

Suggestion: Packages should not obfuscate their code. Consider not using packages with obfuscated code.

Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment @SocketSecurity ignore npm/vite@6.4.2. You can also ignore all packages with @SocketSecurity ignore-all. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.

View full report


coderabbitai bot left a comment


Actionable comments posted: 10

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
scrapers/cron.go (1)

417-436: ⚠️ Potential issue | 🟠 Major

ScraperSummaryCache entries are not evicted on scraper deletion.

DeleteScrapeJob clears scrapeJobs and unschedules related jobs, but leaves entries in the now-exported ScraperSummaryCache. For long-running instances with scraper churn (CRD churn, renames, tenant teardown) this grows unbounded. Since the cache is now also reachable externally via the exported symbol, the leak is more likely to manifest.

🛠️ Suggested fix
 func DeleteScrapeJob(id string) {
 	logger.Debugf("deleting scraper job for %s", id)
 
+	ScraperSummaryCache.Delete(id)
+
 	if j, ok := scrapeJobs.Load(id); ok {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/cron.go` around lines 417 - 436, DeleteScrapeJob currently
unschedules and removes entries from scrapeJobs but never evicts entries from
ScraperSummaryCache, causing an unbounded leak; update DeleteScrapeJob to call
ScraperSummaryCache.Delete for the primary id and any derived keys you use
(e.g., the same keys produced by consumeKubernetesWatchJobKey(id) and
consumePubSubJobKey(id)) after you Unschedule/Delete each job (or once at the
end) so cached summaries are removed when a scraper is deleted.
scrapers/http/http.go (1)

82-89: ⚠️ Potential issue | 🟡 Minor

Potential env-name collision with last_scrape_summary.

ScraperTemplateEnv() seeds templateEnv with the reserved key last_scrape_summary. If a user defines a spec.Env entry named last_scrape_summary, the loop at Line 87 will silently overwrite it (or vice-versa depending on ordering). Low likelihood but worth either documenting the reserved key or rejecting/warning on collision.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/http/http.go` around lines 82 - 89, The loop that populates
templateEnv (returned by ScraperTemplateEnv) can overwrite the reserved key
"last_scrape_summary"; update the loop in the function that iterates spec.Env to
detect collisions before assigning into templateEnv (use env.Name and the map
key lookup on templateEnv), and either return a clear error or emit a warning
when an incoming env.Name equals "last_scrape_summary" or any existing
templateEnv key; reference the existing GetEnvValueFromCache(ctx, env,
ctx.Namespace()) usage and keep the same value-fetch logic but skip assignment
on collision and handle it via error/warn as per project convention.
scrapers/slack/slack.go (1)

138-147: ⚠️ Potential issue | 🟡 Minor

Consider surfacing template errors instead of silently rejecting.

When ctx.RunTemplate fails (e.g., a malformed expression or missing variable), the filter now silently drops the message (return false, nil). For a misconfigured filter.Expr this will quietly skip every message in a channel with zero signal to the operator. Consider either returning the error (treated like a bad-filter exit in processRule) or at minimum logging it via ctx.Logger so that expression regressions are observable.

Also a minor typo on line 143: expresion → expression.

🛠️ Suggested change
-		output, err := ctx.RunTemplate(gomplate.Template{Expression: string(filter.Expr)}, message.AsMap())
-		if err != nil {
-			return false, nil
-		} else if parsed, err := strconv.ParseBool(output); err != nil {
-			return false, fmt.Errorf("expected expresion to return a boolean value: %w", err)
-		} else if !parsed {
-			return false, nil
-		}
+		output, err := ctx.RunTemplate(gomplate.Template{Expression: string(filter.Expr)}, message.AsMap())
+		if err != nil {
+			ctx.Logger.V(3).Infof("slack filter expression failed: %v", err)
+			return false, nil
+		}
+		parsed, err := strconv.ParseBool(output)
+		if err != nil {
+			return false, fmt.Errorf("expected expression to return a boolean value: %w", err)
+		}
+		if !parsed {
+			return false, nil
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/slack/slack.go` around lines 138 - 147, The current filter block
swallows template evaluation errors from ctx.RunTemplate (using filter.Expr) by
returning false, nil; change this to surface failures by returning the error (so
processRule can treat it as a bad-filter exit) or at minimum call
ctx.Logger.Errorf(...) with the error and context (including filter.Expr and
message.AsMap()) before returning false, err; also fix the typo in the error
message in the strconv.ParseBool branch from "expresion" to "expression". Ensure
you update the branch handling of ctx.RunTemplate, keeping references to
ctx.RunTemplate, filter.Expr, message.AsMap(), and processRule.
cmd/run.go (1)

266-275: ⚠️ Potential issue | 🟡 Minor

UI mode suppresses the non-zero exit on errors.

When uiEnabled is true, the handler returns after receiving SIGINT/SIGTERM, bypassing the if hasErrors { os.Exit(1) } block below. If that's intentional (UI users expect graceful Ctrl-C even after failures), consider preserving the exit code for CI/scripted use, e.g. os.Exit(1) on hasErrors before returning. At minimum worth a comment clarifying the intent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/run.go` around lines 266 - 275, The UI branch returns immediately after
receiving SIGINT/SIGTERM which skips the subsequent hasErrors check and non-zero
exit; update the uiServer signal handling (the sig channel and its receive block
where uiServer != nil) to check hasErrors and call os.Exit(1) before returning
(or add a clarifying comment if the behavior is intentional) so that failures
still produce a non-zero exit code for CI/scripted runs; reference the uiServer
variable, the sig channel/signal.Notify block, and the hasErrors/os.Exit(1)
logic when making the change.
db/external_entities.go (1)

81-134: ⚠️ Potential issue | 🟠 Major

groupIDMap is built but not returned; resolvedGroups and resolvedRoles are never rewritten to winner IDs.

upsertExternalEntities builds and populates groupIDMap during the merge (loser→winner mappings from the groups merge block), but the function signature only returns a single map:

func upsertExternalEntities(
	...
) (upsertCounts, map[uuid.UUID]uuid.UUID, error) {
	// ...
	userIDMap := make(map[uuid.UUID]uuid.UUID)
	groupIDMap := make(map[uuid.UUID]uuid.UUID)
	// ...populate both maps...
	return counts, userIDMap, nil  // only userIDMap returned
}

As a result:

  1. Only resolvedUsers is rewritten in the ID-rewrite loop at lines 112–116. resolvedGroups and resolvedRoles still carry their pre-merge (loser) IDs when published via result.Groups.Entities / result.Roles.Entities (lines 130, 133) and via synced.Groups / synced.Roles (lines 120–121). Downstream consumers (e.g. FK remapping in SaveResults that uses configAccess.ExternalGroupID / ExternalRoleID) receive IDs that may already be obsolete in the DB.

  2. The synced.GroupIDMap field — documented in the struct at lines 41–44 — is declared but never assigned anywhere, leaving it nil for callers that need it.

  3. The ID-cache writes at lines 97–103 insert loser→loser mappings for groups/roles that got merged, instead of loser→winner. Lookups via findExternalEntityByID return stale loser IDs from the cache until TTL eviction.

(Note: groupIDMap is used internally by upsertExternalUserGroupsBlock → remapExternalUserGroups at line 633, confirming the data exists and just isn't being propagated to the caller.)

Suggested fix

Expose both userIDMap and groupIDMap from upsertExternalEntities:

 func upsertExternalEntities(
 	...
-) (upsertCounts, map[uuid.UUID]uuid.UUID, error) {
+) (upsertCounts, map[uuid.UUID]uuid.UUID, map[uuid.UUID]uuid.UUID, error) {
 	...
-	return counts, userIDMap, nil
+	return counts, userIDMap, groupIDMap, nil
 }

Update the callsite in syncExternalEntities to receive and use both maps:

-	counts, idMap, err := upsertExternalEntities(ctx, resolvedUsers, resolvedGroups, resolvedRoles, resolvedUserGroups, scraperID)
+	counts, userIDMap, groupIDMap, err := upsertExternalEntities(ctx, resolvedUsers, resolvedGroups, resolvedRoles, resolvedUserGroups, scraperID)
 	
-	for i := range resolvedUsers {
-		if winner, ok := idMap[resolvedUsers[i].ID]; ok {
-			resolvedUsers[i].ID = winner
+	for i := range resolvedUsers {
+		if winner, ok := userIDMap[resolvedUsers[i].ID]; ok {
+			resolvedUsers[i].ID = winner
+		}
+	}
+	for i := range resolvedGroups {
+		if winner, ok := groupIDMap[resolvedGroups[i].ID]; ok {
+			resolvedGroups[i].ID = winner
+		}
 	}
 
 	synced = syncedExternalEntities{
 		Users:      resolvedUsers,
 		Groups:     resolvedGroups,
 		Roles:      resolvedRoles,
-		UserIDMap:  idMap,
+		UserIDMap:  userIDMap,
+		GroupIDMap: groupIDMap,
 	}

Fix the cache eviction and population for groups (mirroring the user block):

 	if scraperID != nil {
-		for loserID := range idMap {
+		for loserID := range userIDMap {
 			ExternalUserIDCache.Delete(loserID.String())
 		}
+		for loserID, winnerID := range groupIDMap {
+			ExternalGroupIDCache.Delete(loserID.String())
+			ExternalGroupIDCache.Set(loserID.String(), winnerID, cache.DefaultExpiration)
+		}
 		...
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/external_entities.go` around lines 81 - 134, upsertExternalEntities builds
a groupIDMap (and possibly roleIDMap) but only returns the user ID map, so
syncExternalEntities only rewrites resolvedUsers and writes loser→loser entries
into caches; fix by changing upsertExternalEntities to return the additional ID
maps (e.g., groupIDMap, roleIDMap) alongside the user map, update the call in
syncExternalEntities to capture them from upsertExternalEntities, apply the same
winner-ID rewrite loop to resolvedGroups and resolvedRoles (mirror the users
loop), assign synced.GroupIDMap (and synced.RoleIDMap if present) from the
returned maps, and when updating ExternalGroupIDCache/ExternalGroupCache and
ExternalRoleIDCache/ExternalRoleCache use the winner ID (map lookup) instead of
the original loser ID so caches store loser→winner mappings used by
remapExternalUserGroups/upsertExternalUserGroupsBlock.
🟡 Minor comments (21)
config/schemas/config_playwright.schema.json-722-729 (1)

722-729: ⚠️ Potential issue | 🟡 Minor

Remove or clarify the unused top-level har field in the Playwright config.

The Playwright.har field (line 727-729) is defined in both the Go struct and JSON schema but is never used in the codebase. HAR recording is controlled exclusively by ctx.IsTrace() or Trace.har (inside PlaywrightTrace). The field should either be removed or marked as deprecated to avoid user confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/schemas/config_playwright.schema.json` around lines 722 - 729, The
top-level Playwright.har field is unused (HAR is controlled by ctx.IsTrace() and
Trace.har inside PlaywrightTrace), so remove or deprecate it: update the JSON
schema entry for "har" (and the corresponding Go struct field Playwright.har) to
either delete that property entirely or replace it with a deprecated boolean
(add a "deprecated": true and update the description to point users to Trace.har
and ctx.IsTrace()), and ensure any validation or unmarshalling code that
references Playwright.har is removed or adjusted to use PlaywrightTrace
(PlaywrightTrace.har) instead so no dead/duplicated config remains; also update
any docs/messages to refer to Trace.har.
cmd/scrapeui/frontend/src/components/AliasList.tsx-12-21 (1)

12-21: ⚠️ Potential issue | 🟡 Minor

Silent clipboard failure & minor a11y gap.

navigator.clipboard.writeText(alias) returns a Promise that can reject (no permission, insecure context, or API unavailable), and the rejection is currently swallowed. Consider catching the rejection and surfacing a fallback/toast. Also, for screen readers, the icon-only button benefits from aria-label in addition to title.

♻️ Suggested change
           <button
             class="text-gray-300 hover:text-gray-600 opacity-0 group-hover:opacity-100 transition-opacity shrink-0"
             title="Copy"
+            aria-label={`Copy ${alias}`}
             onClick={(e) => {
               e.stopPropagation();
-              navigator.clipboard.writeText(alias);
+              navigator.clipboard?.writeText(alias).catch((err) => {
+                console.warn('Failed to copy alias', err);
+              });
             }}
           >
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/scrapeui/frontend/src/components/AliasList.tsx` around lines 12 - 21, In
AliasList.tsx update the copy button handler to handle clipboard failures and
improve accessibility: wrap navigator.clipboard.writeText(alias) in a promise
catch (or use async/await try/catch) and on rejection show a fallback (e.g.,
programmatic selection+execCommand or dispatch a toast/error notification) so
errors aren’t swallowed, and add an aria-label (e.g., aria-label="Copy alias")
to the icon-only <button> (keep the existing title) to support screen readers.
cmd/scrapeui/frontend/src/components/AccessLogTable.tsx-99-99 (1)

99-99: ⚠️ Potential issue | 🟡 Minor

Use a stable row key so expansion state follows the entry, not the position.

Keying AccessLogRow by array index means that when the user changes the sort column, the local open state of each row (useState in AccessLogRow) stays bound to the row position, not the underlying entry — so the wrong row appears expanded after sorting. Prefer a stable key derived from the entry (e.g., a composite of external_config_id, external_user_id, and created_at, or a backend-supplied id if available).

🛠️ Suggested change
-          {sorted.map((e, i) => <AccessLogRow key={i} entry={e} lookups={lookups} />)}
+          {sorted.map((e, i) => (
+            <AccessLogRow
+              key={`${e.external_config_id ?? ''}-${e.external_user_id ?? ''}-${e.created_at ?? i}`}
+              entry={e}
+              lookups={lookups}
+            />
+          ))}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/scrapeui/frontend/src/components/AccessLogTable.tsx` at line 99, The list
is keyed by array index so AccessLogRow's internal open state follows row
position instead of the actual entry; change the key in the sorted.map call that
renders AccessLogRow to a stable identifier derived from the entry (for example
use a composite of entry.external_config_id + '|' + entry.external_user_id + '|'
+ entry.created_at, or use a backend-supplied id if present) so expansion state
follows the entry rather than its sorted position.
cmd/scrapeui/frontend/src/components/HARPanel.tsx-133-135 (1)

133-135: ⚠️ Potential issue | 🟡 Minor

Index-as-key on sorted+filtered rows.

key={i} on sorted.map((e, i) => <HARRow …/>) means row identity changes whenever the list is re-sorted or filtered. Since HARRow holds local open state, the wrong row can appear expanded after a sort/filter change. Prefer a stable key derived from the entry (e.g., startedDateTime + url) or an id if available.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/scrapeui/frontend/src/components/HARPanel.tsx` around lines 133 - 135,
The rows use an unstable index key (key={i}) in sorted.map((e, i) => <HARRow …
/>), which causes HARRow's local open state to mix up after sorting/filtering;
change the key to a stable identifier derived from the entry (for example
combine entry.startedDateTime and entry.request.url or use an existing entry.id)
when rendering <HARRow>, i.e., replace key={i} with a stable expression like
`${entry.startedDateTime}-${entry.request.url}` (or the id field) so each row
keeps its identity across reorders/filters.
cmd/scrapeui/frontend/src/components/ConfigNode.tsx-31-31 (1)

31-31: ⚠️ Potential issue | 🟡 Minor

isNew heuristic risks falsely flagging every persisted item.

!item.Action && !!item.created_at will classify essentially every historical config item (all of which have a created_at) as "new" whenever the scrape result didn't set Action. If the intent is "created in this scrape window", gate on the snapshot's window (e.g., compare created_at against a window boundary) or rely strictly on item.Action === 'inserted'.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/scrapeui/frontend/src/components/ConfigNode.tsx` at line 31, The isNew
heuristic in ConfigNode.tsx (the isNew variable using item.Action and
item.created_at) wrongly marks all persisted items as new; change it to either
rely strictly on item.Action === 'inserted' or, if you really need to treat
items created during the current scrape as new, compare item.created_at against
the scrape snapshot window boundaries (e.g.,
snapshotWindowStart/snapshotWindowEnd) instead of merely checking for presence
of created_at; update the isNew assignment to use that comparison or drop the
created_at check so only item.Action === 'inserted' signals new.
cmd/scrapeui/frontend/src/components/ConfigNode.tsx-29-29 (1)

29-29: ⚠️ Potential issue | 🟡 Minor

isSelected is incorrectly true when selected is null and the item has no id/config_type.

selected?.id === item.id && selected?.config_type === item.config_type evaluates to undefined === undefined → true for any item whose id and config_type are undefined while nothing is selected. Aggregated/metadata-only ScrapeResults (e.g., the synthetic entries from unmarshalConfigsFromString's metadata-only path covered in scrapers/processors/script_test.go) can have empty id/config_type, which would light them all up as "selected" in the tree.

🐛 Proposed fix
-  const isSelected = selected?.id === item.id && selected?.config_type === item.config_type;
+  const isSelected =
+    selected != null &&
+    selected.id === item.id &&
+    selected.config_type === item.config_type;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/scrapeui/frontend/src/components/ConfigNode.tsx` at line 29, The
isSelected expression incorrectly returns true when both selected and item have
undefined id/config_type; change it to explicitly require a non-null selected
before comparing fields (e.g., check selected != null or !!selected) and then
compare selected.id === item.id && selected.config_type === item.config_type so
that items with undefined id/config_type are not treated as selected when
nothing is selected; update the isSelected variable in ConfigNode (the
declaration using selected and item) accordingly.
cmd/ui.go-24-38 (1)

24-38: ⚠️ Potential issue | 🟡 Minor

Artifacts is deserialized but never propagated.

jsonResults.Artifacts is parsed from the input JSON but not copied into scrapeui.Snapshot (lines 55-70). Either wire it through (if the UI is meant to show artifacts from saved runs) or drop the field to avoid confusing silent drops.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/ui.go` around lines 24 - 38, jsonResults.Artifacts is parsed from input
JSON but never copied into the scrapeui.Snapshot, so artifacts are silently
dropped; update the code that maps jsonResults -> scrapeui.Snapshot (where
scrapeui.Snapshot is constructed/filled) to include the Artifacts field (copy
jsonResults.Artifacts into the corresponding scrapeui.Snapshot.Artifacts or
equivalent field) or remove the Artifacts field from jsonResults if artifacts
are not intended to be propagated; locate the mapping logic that builds
scrapeui.Snapshot from jsonResults and add the Artifacts assignment to ensure
artifacts are passed through.
db/scrape_snapshot_test.go-295-295 (1)

295-295: ⚠️ Potential issue | 🟡 Minor

Vacuous assertion — int >= 0 is always true.

Based on the seed (the single in-run group's timestamp is lastHour, outside the run-start window), the intent looks like asserting Equal(0) or at least a tighter bound. As written, it's a no-op that passes for any value.

-		Expect(snap.ExternalGroups.UpdatedLast).To(BeNumerically(">=", 0))
+		// seeded group's updated_at is before runStart, so it must not land in Last
+		Expect(snap.ExternalGroups.UpdatedLast).To(Equal(0))

(Or remove the line entirely if no meaningful upper bound is asserted elsewhere.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/scrape_snapshot_test.go` at line 295, The assertion
Expect(snap.ExternalGroups.UpdatedLast).To(BeNumerically(">=", 0)) is vacuous;
change it to assert the intended value based on the test seed (e.g.,
Expect(snap.ExternalGroups.UpdatedLast).To(Equal(0)) if the single seeded group
is outside the run-start window) or tighten the bound to the correct expected
range for ExternalGroups.UpdatedLast so the test actually validates behavior.
cmd/scrapeui/frontend/src/components/AccessTable.tsx-80-96 (1)

80-96: ⚠️ Potential issue | 🟡 Minor

Inconsistent nullability on entries — possible NPE if undefined + search is set.

The Props type declares entries: ExternalConfigAccess[] (required), yet line 94 guards against !entries. Meanwhile the useMemo on line 81 runs before the guard and calls entries.filter(...) when search is truthy — if a caller ever passes undefined, this throws before the empty-state renders. Either tighten the type (entries?: ExternalConfigAccess[]) and guard in the memo, or drop the defensive check on line 94.

🔧 Proposed fix
-  const filtered = useMemo(() => {
-    if (!search) return entries;
-    return entries.filter(e =>
+  const filtered = useMemo(() => {
+    if (!entries) return [];
+    if (!search) return entries;
+    return entries.filter(e =>
       matchesSearch(search,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/scrapeui/frontend/src/components/AccessTable.tsx` around lines 80 - 96,
Props currently claims entries is required but the component defensively checks
!entries and useMemo assumes entries exists; make the contract consistent.
Either update the Props declaration (entries?: ExternalConfigAccess[]) and
change the useMemo implementation in AccessTable to treat entries as empty array
when undefined (e.g., use (entries || []) before filter) so the memo and
empty-state guard are safe, or keep entries required and remove the defensive
!entries check and any branches that handle undefined; reference AccessTable,
Props, useMemo, and the filtered variable to locate and update the code
accordingly.
cmd/scrapeui/frontend/vite.config.ts-103-108 (1)

103-108: ⚠️ Potential issue | 🟡 Minor

snap.config_meta is never populated.

The snap object built at lines 38–68 has no config_meta field, so (snap.config_meta || {})[id] always resolves to undefined and _meta is never useful. Either wire raw.ConfigMeta || raw.config_meta into the snapshot like the other fields, or drop _meta from the detail payload.

🔧 Proposed fix
         relationships: raw.Relationships || raw.relationships || [],
+        config_meta: raw.ConfigMeta || raw.config_meta || {},
         counts: {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/scrapeui/frontend/vite.config.ts` around lines 103 - 108, The snapshot
object snap (constructed earlier) never includes config_meta, so detail's _meta
= (snap.config_meta || {})[id] is always undefined; fix by wiring raw.ConfigMeta
|| raw.config_meta into the snapshot when building snap (mirror how other fields
from raw are assigned) so snap.config_meta exists and _meta is populated, or
alternatively remove the _meta property from the detail payload; update the code
that constructs snap (referencing raw and snap) to include config_meta if you
choose the first option.
scrapers/azure/devops/pipelines.go-615-626 (1)

615-626: ⚠️ Potential issue | 🟡 Minor

access.ExternalUserAliases = []string{email} can store a single empty alias.

Same root cause as the user record above — when email == "", the emitted ExternalConfigAccess row has ExternalUserAliases = [""]. If the matcher treats empty aliases as "match anything," this creates cross-linking risk for access rows. Derive the alias list from the same (filtered) slice used for the user record.

🔧 Suggested fix
-			if identity.IsContainer {
-				access.ExternalGroupAliases = DescriptorAliases(identity.Descriptor)
-			} else {
-				access.ExternalUserAliases = []string{email}
-			}
+			if identity.IsContainer {
+				access.ExternalGroupAliases = DescriptorAliases(identity.Descriptor)
+			} else if email != "" {
+				access.ExternalUserAliases = []string{email}
+			} else {
+				access.ExternalUserAliases = []string{identity.Descriptor}
+			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/azure/devops/pipelines.go` around lines 615 - 626, The code creates
ExternalConfigAccess with ExternalUserAliases set to []string{email} which can
insert an empty alias when email == ""; change this to derive the user-alias
list the same way the user record does (or compute a filtered slice) and only
set access.ExternalUserAliases when the resulting slice is non-empty—i.e.,
compute userAliases by filtering out empty strings (reusing the same slice used
for the user record), then if len(userAliases) > 0 assign
access.ExternalUserAliases = userAliases; keep the existing branch for
identity.IsContainer that uses DescriptorAliases(identity.Descriptor), and
append access to configAccess as before.
scrapers/azure/devops/groups.go-141-159 (1)

141-159: ⚠️ Potential issue | 🟡 Minor

email = name fallback stores the display name in the Email field.

When identity.Properties["Mail"] is absent, email is set to name (e.g. "Jane Doe") and stored as ExternalUser.Email. That is not an email address, and because addUser only backfills when the existing value is empty (if u.Email == nil || *u.Email == ""), a later observation of the same descriptor with a real Mail value will still lose if the descriptor was first seen without one — the existing non-empty "name-as-email" wins over subsequent merges' fallback path. Prefer leaving Email nil when no actual mail property exists.

🔧 Suggested fix
-			name := ResolvedIdentityName(identity, "")
-			email := ""
-			if mail, ok := identity.Properties["Mail"]; ok {
-				email = mail.Value
-			}
-			if email == "" && name == "" {
-				continue
-			}
-			if email == "" {
-				email = name
-			}
-
-			userAliases := uniqueAliases(
-				email,
-				identity.Descriptor,
-				identity.SubjectDescriptor,
-			)
-			addUser(identity.Descriptor, dutyModels.ExternalUser{
-				Name:     name,
-				Email:    &email,
+			name := ResolvedIdentityName(identity, "")
+			email := ""
+			if mail, ok := identity.Properties["Mail"]; ok {
+				email = mail.Value
+			}
+			if email == "" && name == "" {
+				continue
+			}
+
+			var emailPtr *string
+			if email != "" {
+				emailPtr = &email
+			}
+			userAliases := uniqueAliases(
+				email,
+				identity.Descriptor,
+				identity.SubjectDescriptor,
+			)
+			addUser(identity.Descriptor, dutyModels.ExternalUser{
+				Name:     name,
+				Email:    emailPtr,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/azure/devops/groups.go` around lines 141 - 159, The code currently
falls back to email = name when Mail is missing, causing display names to be
stored as ExternalUser.Email and preventing later real emails from replacing it;
update the logic in the block around the email variable and the call to addUser
so that if email == "" you do NOT assign name to email and instead pass a nil
Email pointer into dutyModels.ExternalUser (e.g. compute an emailPtr that is nil
when email is empty, or only set &email when non-empty) while keeping the same
aliases and descriptors (identity.Descriptor, identity.SubjectDescriptor) and
still using uniqueAliases and addUser.
scrapers/azure/devops/pipelines.go-575-600 (1)

575-600: ⚠️ Potential issue | 🟡 Minor

Empty email leaks into user aliases.

emailFromIdentity(identity) can return "" (the continue below only fires when name is also empty, so when name != "" we fall through with email == ""). The non-container branch then writes pq.StringArray{email, identity.Descriptor, identity.SubjectDescriptor}, producing an empty-string alias. An empty alias participates in overlap-based merges and will create spurious matches.

Filter empties (e.g. reuse uniqueAliases from groups.go) and pass a nil Email when absent, matching the adjustment suggested in scrapers/azure/devops/groups.go.

🔧 Suggested fix
-		} else {
-			ctx.AddUser(dutyModels.ExternalUser{
-				Name:     name,
-				Email:    &email,
-				Aliases:  pq.StringArray{email, identity.Descriptor, identity.SubjectDescriptor},
-				Tenant:   config.Organization,
-				UserType: "AzureDevOps",
-			})
-		}
+		} else {
+			var emailPtr *string
+			if email != "" {
+				emailPtr = &email
+			}
+			ctx.AddUser(dutyModels.ExternalUser{
+				Name:     name,
+				Email:    emailPtr,
+				Aliases:  pq.StringArray(uniqueAliases(email, identity.Descriptor, identity.SubjectDescriptor)),
+				Tenant:   config.Organization,
+				UserType: "AzureDevOps",
+			})
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/azure/devops/pipelines.go` around lines 575 - 600, The user branch
can insert an empty-string alias because emailFromIdentity(identity) may return
""; update the non-container path around ResolvedIdentityName/ emailFromIdentity
so that Email is set to nil when email == "" and the Aliases passed to
ctx.AddUser (call site: ctx.AddUser in this file) are deduplicated/filtered to
remove empty strings (reuse the uniqueAliases helper pattern used in groups.go
or implement equivalent filtering for pq.StringArray{...} to exclude ""). Ensure
aliases still include identity.Descriptor and identity.SubjectDescriptor but
never include "".
scrapers/processors/json.go-464-469 (1)

464-469: ⚠️ Potential issue | 🟡 Minor

TransformOutput aliases the same map that downstream mutations will edit in place.

scriptResult.Results[i].Config is assigned to TransformOutput at line 466, storing a reference to the same map object. However, applyMask (called at line 529) mutates results[i].Config in place via m.JSONPath.Set(results[i].Config, ...) (lines 859, 863), which will also mutate TransformOutput. If TransformOutput is intended as a diagnostic snapshot of what the script emitted before masking, it should be deep-copied rather than aliased.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/processors/json.go` around lines 464 - 469,
scriptResult.Results[i].TransformOutput is currently set to the same map as
scriptResult.Results[i].Config, so later in-place mutations by applyMask (via
m.JSONPath.Set in applyMask) also modify the stored TransformOutput; fix this by
making TransformOutput a deep copy/snapshot of Config when setting it (e.g.,
clone the map/struct contents or marshal-unmarshal to create a new map) before
any masking occurs so applyMask’s modifications to results[i].Config do not
alter TransformOutput; update the assignment in the loop that sets
TransformOutput for scriptResult.Results to use that deep-copy method.
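A JSON round-trip is one straightforward way to take that snapshot. A minimal sketch; the deepCopyConfig helper and the map[string]any config shape are assumptions for illustration, not the actual types in json.go:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deepCopyConfig returns an independent copy of a decoded JSON config,
// so later in-place mutations (e.g. masking) don't affect the snapshot.
func deepCopyConfig(in map[string]any) (map[string]any, error) {
	b, err := json.Marshal(in)
	if err != nil {
		return nil, err
	}
	var out map[string]any
	if err := json.Unmarshal(b, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	config := map[string]any{"secret": "hunter2"}
	snapshot, _ := deepCopyConfig(config)

	// Simulate applyMask mutating the live config in place.
	config["secret"] = "***"

	fmt.Println(config["secret"], snapshot["secret"]) // *** hunter2
}
```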
cmd/scrapeui/frontend/src/App.tsx-178-199 (1)

178-199: ⚠️ Potential issue | 🟡 Minor

Object fallback coerces a struct into extId and never matches.

const extId = (a.external_config_id as any)?.external_id || a.external_config_id; — if external_config_id is an object without an external_id field (e.g., only config_id populated), this assigns the entire object to extId, after which cfg.id === extId is always false. Same pattern used twice (access + logs). matchesConfig in DetailPanel.tsx handles this by probing ext.config_id as well; this helper should mirror that.

♻️ Proposed fix
-    for (const a of access) {
-      const extId = (a.external_config_id as any)?.external_id || a.external_config_id;
-      if (!extId) continue;
+    const extractExtId = (a: any): string | undefined => {
+      const ext = a.external_config_id;
+      if (typeof ext === 'string') return ext;
+      if (ext && typeof ext === 'object') return ext.external_id || ext.config_id;
+      return a.config_id;
+    };
+    for (const a of access) {
+      const extId = extractExtId(a);
+      if (!extId) continue;

Apply the same helper for the logs loop below.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/scrapeui/frontend/src/App.tsx` around lines 178 - 199, The code in the
access and logs loops computes extId using (a.external_config_id as
any)?.external_id || a.external_config_id which wrongly assigns the whole object
when external_config_id lacks external_id, so cfg.id === extId never matches;
update both loops to mirror the probing used by matchesConfig in DetailPanel.tsx
(i.e., extract external_config_id.external_id, then
external_config_id.config_id, then fall back to a primitive id) so extId is
always a string/number id before comparing to cfg.id and then incrementing the
counters (references: the access loop variable a, logs loop variable l,
external_config_id, extId, configs, zero(), m.get, c.access and c.accessLogs).
api/v1/scrape_snapshot.go-40-55 (1)

40-55: ⚠️ Potential issue | 🟡 Minor

Sub copies LastCreatedAt/LastUpdatedAt from the "after" side, which defeats the IsZero filter in diffMaps.

diffMaps only keeps entries where delta.IsZero() is false (line 143). But Sub unconditionally copies the after-side timestamps into the result, and IsZero() returns false whenever either timestamp pointer is non-nil (line 37). Consequence: a PerScraper/PerConfigType key whose counter values are identical in before and after still produces a "diff" entry (with all counter deltas = 0) as long as the after snapshot recorded a last-created/updated timestamp — which is the common case. The compact-diff guarantee in the comment above diffMaps effectively never applies.

♻️ Suggested fixes (pick one)
 func (e EntityWindowCounts) Sub(other EntityWindowCounts) EntityWindowCounts {
 	return EntityWindowCounts{
 		Total:         e.Total - other.Total,
 		...
-		LastCreatedAt: e.LastCreatedAt,
-		LastUpdatedAt: e.LastUpdatedAt,
+		// Only carry timestamps when the counter state actually changed,
+		// otherwise an unchanged entry shows up in the diff.
+		LastCreatedAt: timestampIfChanged(e, other, e.LastCreatedAt),
+		LastUpdatedAt: timestampIfChanged(e, other, e.LastUpdatedAt),
 	}
 }

Or, alternatively, exclude timestamps from IsZero:

 func (e EntityWindowCounts) IsZero() bool {
 	return e.Total == 0 &&
 		e.UpdatedLast == 0 && e.UpdatedHour == 0 && e.UpdatedDay == 0 && e.UpdatedWeek == 0 &&
-		e.DeletedLast == 0 && e.DeletedHour == 0 && e.DeletedDay == 0 && e.DeletedWeek == 0 &&
-		e.LastCreatedAt == nil && e.LastUpdatedAt == nil
+		e.DeletedLast == 0 && e.DeletedHour == 0 && e.DeletedDay == 0 && e.DeletedWeek == 0
 }

The second option keeps snapshot IsZero semantics intact for diffs while preserving timestamps for Pretty rendering.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1/scrape_snapshot.go` around lines 40 - 55, Sub currently always copies
LastCreatedAt/LastUpdatedAt from the "after" side which makes IsZero() return
false even when all count deltas are zero; change EntityWindowCounts.Sub so it
still computes numeric deltas as it does but only sets LastCreatedAt and
LastUpdatedAt in the result when the timestamps actually differ between e and
other (otherwise leave them nil), so diffMaps plus IsZero() will correctly
suppress entries whose counters and timestamps didn't change; reference:
EntityWindowCounts.Sub, EntityWindowCounts.IsZero, and diffMaps.
cmd/scrapeui/server.go-248-253 (1)

248-253: ⚠️ Potential issue | 🟡 Minor

SSE fan-out: only one subscriber wakes up per notify().

updated is a single chan struct{} with buffer 1, shared across all SSE connections. When notify() fires, whichever handleSSE goroutine happens to be parked in the select drains it; any other connected clients won't see that update until the 2-second ticker tick. With a single viewer this is fine, but anything multi-tab or multi-client becomes noticeably laggy.

Consider maintaining a per-connection subscriber channel (subscribe on enter, unsubscribe on exit) and fan-out in notify().

Also applies to: 428-452

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/scrapeui/server.go` around lines 248 - 253, The notify() function
currently writes to a single buffered channel s.updated which only wakes one
handleSSE goroutine, causing missed updates for other SSE clients; change the
design to maintain per-connection subscriber channels (e.g., have Server hold a
map or slice of subscriber chans and methods subscribe/unsubscribe used by
handleSSE when a client connects/disconnects) and update notify() to iterate
over all subscriber channels and non-blockingly send the update (dropping to
avoid blocking slow clients). Update handleSSE to create and register its own
chan struct{} on entry and deregister it on exit so every connected client
receives immediate fan-out notifications.
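The per-connection fan-out described above can be sketched as a small broker; the broker/subscribe/notify names are illustrative, not the actual server.go API:

```go
package main

import (
	"fmt"
	"sync"
)

// broker fans out change notifications to every connected SSE client.
// Each handler registers its own channel; notify never blocks on slow clients.
type broker struct {
	mu   sync.Mutex
	subs map[chan struct{}]struct{}
}

func newBroker() *broker {
	return &broker{subs: make(map[chan struct{}]struct{})}
}

func (b *broker) subscribe() chan struct{} {
	ch := make(chan struct{}, 1) // buffer 1: coalesce bursts per client
	b.mu.Lock()
	b.subs[ch] = struct{}{}
	b.mu.Unlock()
	return ch
}

func (b *broker) unsubscribe(ch chan struct{}) {
	b.mu.Lock()
	delete(b.subs, ch)
	b.mu.Unlock()
}

func (b *broker) notify() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for ch := range b.subs {
		select {
		case ch <- struct{}{}:
		default: // client already has a pending wakeup; drop
		}
	}
}

func main() {
	b := newBroker()
	a, c := b.subscribe(), b.subscribe()
	b.notify()
	fmt.Println(len(a), len(c)) // both clients woken: 1 1
}
```

Each handleSSE goroutine would call subscribe() on entry and defer unsubscribe(ch), selecting on its own channel instead of the shared one.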
scrapers/extract/extract.go-60-79 (1)

60-79: ⚠️ Potential issue | 🟡 Minor

AddWarning keys by w.Error — empty error strings collapse unrelated warnings.

Every call path in validateConfigRefs supplies a non-empty Error, so this is not hit today, but AddWarning is exported and there is no guard. A caller that accidentally passes v1.Warning{Result: r} (Error omitted) will silently have all such entries deduped into a single aggregated record keyed by "". Consider rejecting or at minimum labelling empty-error warnings (e.g. fall back to "unknown") to avoid misleading counts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/extract/extract.go` around lines 60 - 79, The AddWarning method keys
warnings by w.Error which lets empty strings collapse unrelated warnings; update
AddWarning (and its use of warningIndex, warningCount, Warnings, and v1.Warning)
to guard against empty Error values by normalizing them to a meaningful sentinel
(e.g. "unknown_error" or "unnamed_warning") before indexing or aggregating —
alternatively return/early-reject when w.Error == "" if you prefer strictness;
ensure the normalized key is used for warningIndex lookups and when assigning
e.Warnings entries so counts are not silently merged.
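A minimal sketch of the normalization, assuming an illustrative warning struct and an "unknown_error" sentinel (neither is the codebase's actual naming):

```go
package main

import "fmt"

type warning struct {
	Error string
	Count int
}

// addWarning aggregates warnings by error message, normalizing empty
// messages to a sentinel so empty-error warnings stay visible instead
// of silently merging under the "" key.
func addWarning(index map[string]int, warnings []warning, w warning) []warning {
	key := w.Error
	if key == "" {
		key = "unknown_error"
	}
	if i, ok := index[key]; ok {
		warnings[i].Count++
		return warnings
	}
	index[key] = len(warnings)
	w.Error = key
	w.Count = 1
	return append(warnings, w)
}

func main() {
	index := map[string]int{}
	var ws []warning
	ws = addWarning(index, ws, warning{Error: ""})
	ws = addWarning(index, ws, warning{Error: "bad config ref"})
	ws = addWarning(index, ws, warning{Error: ""})
	fmt.Println(len(ws), ws[0].Error, ws[0].Count) // 2 unknown_error 2
}
```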
db/update.go-688-693 (1)

688-693: ⚠️ Potential issue | 🟡 Minor

Warning transform attribution is taken from an unrelated result.

extractResult.transformInput / transformExpr are populated once, by the first scrape result in the batch that carries a non-nil transform. The warnings created here for config_access / access_log rows typically originate from arbitrary other results in the same batch, so a consumer reading the warning's Input/Expr will see fields from a result that has nothing to do with the skipped row.

If the intent is diagnostic, consider either (a) dropping Input/Expr on these warnings (leave Result only), or (b) tracking transform context per-result and looking up the correct one for the row being warned about.

Also applies to: 735-740

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/update.go` around lines 688 - 693, The warning currently uses
extractResult.transformInput/transformExpr (set once from the first
transform-bearing scrape) when calling summary.AddScrapeWarning for
configAccess/access_log rows, which misattributes Input/Expr to unrelated
results; either remove populating the Warning.Input and Warning.Expr fields in
these AddScrapeWarning calls (leave Error and Result only), or modify the code
to look up and attach the transform context tied to the specific row (e.g.,
maintain per-result transformContext and use that instead of
extractResult.transformInput/transformExpr when constructing the Warning for
configAccess/access_log); update both places where AddScrapeWarning is invoked
for configAccess and access_log rows accordingly.
scrapers/extract/extract.go-120-134 (1)

120-134: ⚠️ Potential issue | 🟡 Minor

initWarningIndex leaves stale duplicates when Warnings is pre-populated.

If e.Warnings already contains two entries with the same Error (e.g. restored from a marshaled ExtractedConfig), only the first is added to warningIndex. Later AddWarning calls will bump only that first entry's Count, and the duplicate rows remain with their original counts, producing inconsistent totals.

♻️ Fold duplicates on init
-	e.warningIndex = make(map[string]int, len(e.Warnings))
-	for i := range e.Warnings {
-		if _, ok := e.warningIndex[e.Warnings[i].Error]; ok {
-			continue
-		}
-		if e.Warnings[i].Count == 0 {
-			e.Warnings[i].Count = 1
-		}
-		e.warningIndex[e.Warnings[i].Error] = i
-	}
+	e.warningIndex = make(map[string]int, len(e.Warnings))
+	deduped := e.Warnings[:0]
+	for i := range e.Warnings {
+		w := e.Warnings[i]
+		if w.Count == 0 {
+			w.Count = 1
+		}
+		if idx, ok := e.warningIndex[w.Error]; ok {
+			deduped[idx].Count += w.Count
+			continue
+		}
+		e.warningIndex[w.Error] = len(deduped)
+		deduped = append(deduped, w)
+	}
+	e.Warnings = deduped
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/extract/extract.go` around lines 120 - 134, initWarningIndex
currently builds e.warningIndex but leaves duplicate entries in e.Warnings (when
e.Warnings was pre-populated), so future AddWarning increments only the first
duplicate and totals are inconsistent; fix initWarningIndex to fold duplicates
by scanning e.Warnings, aggregating Counts per unique Warning.Error (treat
Count==0 as 1), produce a new deduplicated slice of warnings with summed counts,
assign that back to e.Warnings, then rebuild e.warningIndex mapping each unique
Error to its index; update references to ExtractedConfig.warningIndex and
methods like AddWarning to rely on the deduplicated list.
db/update.go-792-814 (1)

792-814: ⚠️ Potential issue | 🟡 Minor

AccessLogs.LastCreatedAt is advanced for rows that never got saved, but appears unused.

This loop walks extractResult.configAccessLogs — the input set — and picks the max CreatedAt per config type before SaveConfigAccessLogs runs at line 835. Any row that later hits an unresolved config, an unresolved user, or the FK-violation fallback path inside SaveConfigAccessLogs (lines 1510-1523) still contributes to this watermark. Combined with the retention in lines 1047-1049 (summary.AccessLogs.LastCreatedAt is propagated to the next scrape), the watermark becomes semantically incorrect.

However, extensive code search found no scraper definitions, templates, or code paths that actually use last_scrape_summary.AccessLogs.LastCreatedAt as a cursor for incremental fetching. The watermark appears to be unused in practice, so while the logic is wrong, data loss is not observed. Nevertheless, compute LastCreatedAt from rows that actually made it into SaveConfigAccessLogs for semantic correctness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/update.go` around lines 792 - 814, The loop computing accessLogMaxTime
wrongly uses extractResult.configAccessLogs (input rows) so rows that never get
persisted still advance summary.ConfigTypes[...].AccessLogs.LastCreatedAt and
summary.AccessLogs.LastCreatedAt; move this watermark computation to use the
actual persisted rows (the output of SaveConfigAccessLogs) instead of
extractResult.configAccessLogs — i.e., after calling SaveConfigAccessLogs,
iterate the slice/rows that were saved (or the function return value) and
compute per-config max CreatedAt, then update
summary.ConfigTypes[configType].AccessLogs.LastCreatedAt and
summary.AccessLogs.LastCreatedAt accordingly (referencing
extractResult.configAccessLogs, SaveConfigAccessLogs, summary.ConfigTypes, and
summary.AccessLogs.LastCreatedAt to locate the affected code).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4e8c685a-4e9c-4f6f-93aa-19a5c61416cc

📥 Commits

Reviewing files that changed from the base of the PR and between ccc955a and 6a3ba3f.

⛔ Files ignored due to path filters (2)
  • cmd/scrapeui/frontend/package-lock.json is excluded by !**/package-lock.json
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (92)
  • .gitignore
  • Makefile
  • PROPERTIES.md
  • api/context.go
  • api/context_test.go
  • api/v1/azure.go
  • api/v1/interface.go
  • api/v1/interface_test.go
  • api/v1/scrape_snapshot.go
  • api/v1/scrape_snapshot_pretty.go
  • api/v1/scrape_snapshot_test.go
  • api/v1/summary_pretty.go
  • api/v1/summary_pretty_test.go
  • cmd/root.go
  • cmd/run.go
  • cmd/scrapeui/assets.go
  • cmd/scrapeui/convert.go
  • cmd/scrapeui/frontend/.gitignore
  • cmd/scrapeui/frontend/package.json
  • cmd/scrapeui/frontend/src/App.tsx
  • cmd/scrapeui/frontend/src/components/AccessLogTable.tsx
  • cmd/scrapeui/frontend/src/components/AccessTable.tsx
  • cmd/scrapeui/frontend/src/components/AliasList.tsx
  • cmd/scrapeui/frontend/src/components/AnsiHtml.tsx
  • cmd/scrapeui/frontend/src/components/ConfigNode.tsx
  • cmd/scrapeui/frontend/src/components/ConfigTree.tsx
  • cmd/scrapeui/frontend/src/components/DetailPanel.tsx
  • cmd/scrapeui/frontend/src/components/EntityTable.tsx
  • cmd/scrapeui/frontend/src/components/FilterBar.tsx
  • cmd/scrapeui/frontend/src/components/HARPanel.tsx
  • cmd/scrapeui/frontend/src/components/JsonView.tsx
  • cmd/scrapeui/frontend/src/components/ScrapeConfigPanel.tsx
  • cmd/scrapeui/frontend/src/components/ScraperList.tsx
  • cmd/scrapeui/frontend/src/components/SnapshotPanel.tsx
  • cmd/scrapeui/frontend/src/components/SplitPane.tsx
  • cmd/scrapeui/frontend/src/components/Summary.tsx
  • cmd/scrapeui/frontend/src/globals.d.ts
  • cmd/scrapeui/frontend/src/hooks/useRoute.ts
  • cmd/scrapeui/frontend/src/hooks/useSort.tsx
  • cmd/scrapeui/frontend/src/index.tsx
  • cmd/scrapeui/frontend/src/types.ts
  • cmd/scrapeui/frontend/src/utils.ts
  • cmd/scrapeui/frontend/tsconfig.json
  • cmd/scrapeui/frontend/vite.config.ts
  • cmd/scrapeui/server.go
  • cmd/scrapeui/types.go
  • cmd/ui.go
  • config/schemas/config_azuredevops.schema.json
  • config/schemas/config_playwright.schema.json
  • config/schemas/scrape_config.schema.json
  • db/change_traversal.go
  • db/external_cache.go
  • db/external_entities.go
  • db/external_entities_test.go
  • db/external_loser_alias_test.go
  • db/permission_changes.go
  • db/scrape_snapshot.go
  • db/scrape_snapshot_test.go
  • db/update.go
  • go.mod
  • scrapers/azure/active_directory.go
  • scrapers/azure/azure.go
  • scrapers/azure/devops/audit_log.go
  • scrapers/azure/devops/audit_log_test.go
  • scrapers/azure/devops/client.go
  • scrapers/azure/devops/groups.go
  • scrapers/azure/devops/groups_test.go
  • scrapers/azure/devops/pipelines.go
  • scrapers/azure/devops/releases.go
  • scrapers/azure/devops/repositories.go
  • scrapers/azure/devops/repositories_test.go
  • scrapers/changes/rules.go
  • scrapers/changes/rules_test.go
  • scrapers/config_access_test.go
  • scrapers/cron.go
  • scrapers/extract/extract.go
  • scrapers/extract/extract_test.go
  • scrapers/extract/fullmode.go
  • scrapers/extract/testdata/unit/access_invalid_ref.yaml
  • scrapers/extract/testdata/unit/access_logs_basic.yaml
  • scrapers/extract/testdata/unit/analysis_invalid_ref.yaml
  • scrapers/extract/testdata/unit/analysis_no_parent.yaml
  • scrapers/extract/testdata/unit/changes_invalid_ref.yaml
  • scrapers/extract/testdata/unit/changes_missing_change_id.yaml
  • scrapers/extract/testdata/unit/changes_missing_change_type.yaml
  • scrapers/extract/testdata/unit/changes_no_parent.yaml
  • scrapers/http/http.go
  • scrapers/processors/json.go
  • scrapers/processors/script.go
  • scrapers/processors/script_test.go
  • scrapers/run.go
  • scrapers/slack/slack.go

Comment thread .gitignore
*.har
*.png
.playwright-mcp/
*.json


⚠️ Potential issue | 🟠 Major

*.json is too broad for a repo-wide ignore

Ignoring all JSON files can silently block committing legitimate source/config artifacts and future tooling manifests. Please scope this to generated paths/files only.

Proposed scoped change
-*.json
+# Generated JSON artifacts only (adjust paths as needed)
+gh-api-data/*.json
+.playwright-mcp/**/*.json
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
*.json
# Generated JSON artifacts only (adjust paths as needed)
gh-api-data/*.json
.playwright-mcp/**/*.json
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore at line 61, Remove the repo-wide "*.json" ignore entry and
replace it with scoped ignore patterns that target only generated or tooling
artifacts (e.g., CI/build output, generated schemas) rather than all JSON files;
update the .gitignore by deleting the "*.json" line and adding specific paths or
filenames such as the build/output directories or known generated files (for
example: generated/, dist/, build/, .cache/, or specific file names like
swagger.json or package-lock.json) so source/config JSON files remain tracked.

Comment thread cmd/run.go
Comment on lines +199 to +212
if save && dutyapi.DefaultConfig.ConnectionString != "" && summary != nil {
	history := models.NewJobHistory(logger.StandardLogger(), "scraper", job.ResourceTypeScraper, scraperUID)
	history.Start()
	history.SuccessCount = len(results)
	history.AddDetails("scrape_summary", *summary)
	if hasErrors {
		history.AddError(err.Error())
	}
	history.End()
	if persistErr := history.Persist(dutyCtx.DB()); persistErr != nil {
		logger.Warnf("failed to persist job history: %v", persistErr)
	}
	scrapers.ScraperSummaryCache.Store(scraperUID, *summary)
}

⚠️ Potential issue | 🔴 Critical

Nil-pointer panic: hasErrors is cross-iteration but err is per-iteration.

hasErrors is set once to true and never reset, so after any earlier scraper fails, every subsequent iteration that succeeds (err == nil, summary != nil) still enters the if hasErrors branch and dereferences err.Error() → panic.

Gate on the current iteration's error instead.

🐛 Proposed fix
 			if save && dutyapi.DefaultConfig.ConnectionString != "" && summary != nil {
 				history := models.NewJobHistory(logger.StandardLogger(), "scraper", job.ResourceTypeScraper, scraperUID)
 				history.Start()
 				history.SuccessCount = len(results)
 				history.AddDetails("scrape_summary", *summary)
-				if hasErrors {
+				if err != nil {
 					history.AddError(err.Error())
 				}
 				history.End()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if save && dutyapi.DefaultConfig.ConnectionString != "" && summary != nil {
	history := models.NewJobHistory(logger.StandardLogger(), "scraper", job.ResourceTypeScraper, scraperUID)
	history.Start()
	history.SuccessCount = len(results)
	history.AddDetails("scrape_summary", *summary)
	if hasErrors {
		history.AddError(err.Error())
	}
	history.End()
	if persistErr := history.Persist(dutyCtx.DB()); persistErr != nil {
		logger.Warnf("failed to persist job history: %v", persistErr)
	}
	scrapers.ScraperSummaryCache.Store(scraperUID, *summary)
}
if save && dutyapi.DefaultConfig.ConnectionString != "" && summary != nil {
	history := models.NewJobHistory(logger.StandardLogger(), "scraper", job.ResourceTypeScraper, scraperUID)
	history.Start()
	history.SuccessCount = len(results)
	history.AddDetails("scrape_summary", *summary)
	if err != nil {
		history.AddError(err.Error())
	}
	history.End()
	if persistErr := history.Persist(dutyCtx.DB()); persistErr != nil {
		logger.Warnf("failed to persist job history: %v", persistErr)
	}
	scrapers.ScraperSummaryCache.Store(scraperUID, *summary)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/run.go` around lines 199 - 212, The code uses a cross-iteration boolean
hasErrors but calls err.Error() when persisting history, causing a nil-pointer
panic when hasErrors is true but err is nil; update the block that builds
JobHistory (models.NewJobHistory / history.AddError) to check the
iteration-local error variable (err) instead of hasErrors (i.e., only call
history.AddError(err.Error()) when err != nil), ensuring summary and err are
validated before dereferencing and preserving existing history.End()/Persist
logic and ScraperSummaryCache.Store(scraperUID, *summary).

Comment thread cmd/scrapeui/assets.go
Comment on lines +5 to +6
//go:embed frontend/dist/scrapeui.js
var bundleJS string

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
fd -HI scrapeui.js cmd/scrapeui/frontend/dist 2>/dev/null
cat .gitignore 2>/dev/null | rg -n 'dist|scrapeui' || true
fd -HI Makefile | head -5 | xargs -I{} rg -n 'scrapeui|vite' {} 2>/dev/null

Repository: flanksource/config-db

Length of output: 49


🏁 Script executed:

git ls-files cmd/scrapeui/ 2>/dev/null | head -20

Repository: flanksource/config-db

Length of output: 991


🏁 Script executed:

cat .gitignore 2>/dev/null

Repository: flanksource/config-db

Length of output: 584


🏁 Script executed:

find . -name "assets.go" -type f 2>/dev/null

Repository: flanksource/config-db

Length of output: 89


🏁 Script executed:

cat -n cmd/scrapeui/assets.go

Repository: flanksource/config-db

Length of output: 199


🏁 Script executed:

cat -n cmd/scrapeui/frontend/.gitignore

Repository: flanksource/config-db

Length of output: 98


🏁 Script executed:

find . -name "Makefile" -o -name "*.sh" -o -name "package.json" | grep -E "(Makefile|build|package.json)" | head -10

Repository: flanksource/config-db

Length of output: 137


🏁 Script executed:

cat -n Makefile | head -50

Repository: flanksource/config-db

Length of output: 2066


🏁 Script executed:

cat -n cmd/scrapeui/frontend/package.json

Repository: flanksource/config-db

Length of output: 500


🏁 Script executed:

ls -la cmd/scrapeui/frontend/dist/ 2>/dev/null || echo "dist directory does not exist"

Repository: flanksource/config-db

Length of output: 94


🏁 Script executed:

grep -r "scrapeui.js" . --include="*.go" --include="*.ts" --include="*.tsx" 2>/dev/null | head -10

Repository: flanksource/config-db

Length of output: 202


🏁 Script executed:

grep -r "vite\|scrapeui" Makefile --include="*.md" --include="*.txt" 2>/dev/null | head -20

Repository: flanksource/config-db

Length of output: 47


🏁 Script executed:

find . -path ./vendor -prune -o -name "*.yml" -o -name "*.yaml" | grep -i -E "(ci|action|workflow)" | head -10

Repository: flanksource/config-db

Length of output: 366


🏁 Script executed:

cat -n Makefile | tail -100 | head -50

Repository: flanksource/config-db

Length of output: 2103


🏁 Script executed:

cat -n .github/workflows/build.yml

Repository: flanksource/config-db

Length of output: 1344


🏁 Script executed:

cat -n cmd/scrapeui/frontend/vite.config.ts

Repository: flanksource/config-db

Length of output: 7144


🏁 Script executed:

cat -n build/Dockerfile

Repository: flanksource/config-db

Length of output: 1632


🏁 Script executed:

grep -A 10 -B 5 "frontend\|vite\|npm\|yarn" build/Dockerfile

Repository: flanksource/config-db

Length of output: 47


🏁 Script executed:

grep -n "build-prod" Makefile -A 20

Repository: flanksource/config-db

Length of output: 724


🏁 Script executed:

git ls-files cmd/scrapeui/frontend/dist/ 2>/dev/null || echo "dist/ not in git"

Repository: flanksource/config-db

Length of output: 47


Frontend build missing: go build will fail due to missing frontend/dist/scrapeui.js.

The //go:embed frontend/dist/scrapeui.js directive requires the embedded file to exist at compile time. However:

  • cmd/scrapeui/frontend/dist/ is gitignored and does not exist in the repository
  • The build process (Makefile's build-prod target and Dockerfile) does not run npm build before go build

This causes build failures in CI and for developers. Add a frontend build step to the Makefile before the Go build (e.g., cd cmd/scrapeui/frontend && npm install && npm run build), or include it in the Dockerfile before line 26.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/scrapeui/assets.go` around lines 5 - 6, The embed directive //go:embed
frontend/dist/scrapeui.js (variable bundleJS) fails the Go build because
frontend/dist/scrapeui.js is not present; add a frontend build step that runs in
cmd/scrapeui/frontend (npm install && npm run build) before any go build that
compiles the package referencing bundleJS—either add that step to the Makefile's
build-prod target before invoking go build or add the npm install && npm run
build step in the Dockerfile prior to the Go build stage so the embedded file
exists at compile time.

Comment thread cmd/scrapeui/server.go
Comment on lines +255 to +279
func (s *Server) snapshot() Snapshot {
	logs := ""
	if s.logBuf != nil {
		logs = s.logBuf.String()
	}
	return Snapshot{
		Scrapers:          s.scrapers,
		Results:           s.results,
		Relationships:     s.relationships,
		ConfigMeta:        s.configMeta,
		Issues:            s.issues,
		Counts:            BuildCounts(s.results, s.relationships),
		SaveSummary:       s.summary,
		Snapshots:         s.snapshots,
		ScrapeSpec:        s.scrapeSpec,
		Properties:        s.properties,
		LogLevel:          s.logLevel,
		HAR:               s.har,
		Logs:              logs,
		Done:              s.done,
		StartedAt:         s.startedAt,
		BuildInfo:         s.buildInfo,
		LastScrapeSummary: s.lastScrapeSummary,
	}
}

⚠️ Potential issue | 🔴 Critical

Data race: snapshot() returns shared map/slice refs; encoding happens after releasing the lock.

snapshot() copies the struct but the map/slice fields (s.configMeta, s.snapshots, s.results, s.relationships, s.issues, s.scrapers) still point at the server's live containers. In handleJSON the RLock is released (line 403) before json.NewEncoder(...).Encode(data) runs, and handleSSE does the same at lines 423/443. Meanwhile UpdateScraper, SetSnapshots, SetProperties, SetBuildInfo, SetLastScrapeSummary, and SetHAR take the write lock and mutate exactly those maps/slices. The encoder's map iteration concurrent with those writes will crash the process with fatal error: concurrent map iteration and map write (this is not a benign torn read — Go panics).

Either hold the lock across Encode, or deep-copy the snapshot before unlocking.

🛡️ Minimal fix: hold the lock for the full encode
 func (s *Server) handleJSON(w http.ResponseWriter, _ *http.Request) {
-	s.mu.RLock()
-	data := s.snapshot()
-	s.mu.RUnlock()
-
-	w.Header().Set("Content-Type", "application/json")
-	json.NewEncoder(w).Encode(data) //nolint:errcheck
+	w.Header().Set("Content-Type", "application/json")
+	s.mu.RLock()
+	defer s.mu.RUnlock()
+	data := s.snapshot()
+	json.NewEncoder(w).Encode(data) //nolint:errcheck
 }

Same treatment needed in handleSSE (both the initial marshal around lines 420-426 and the per-tick marshal around lines 439-445). Note that encoding while holding the write-blocking lock does penalize writers; if that matters, instead deep-copy the maps/slices inside the RLock and encode the copy.

Also applies to: 400-407, 420-452

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/scrapeui/server.go` around lines 255 - 279, snapshot() returns a Snapshot
struct that still references server-owned maps/slices (s.configMeta,
s.snapshots, s.results, s.relationships, s.issues, s.scrapers) and
handleJSON/handleSSE release the RLock before running json.Encode, allowing
concurrent writers (UpdateScraper, SetSnapshots, SetProperties, SetBuildInfo,
SetLastScrapeSummary, SetHAR) to mutate those maps and trigger panics; fix by
either holding the server mutex (RLock/RUnlock) across the entire json.Encode
call in handleJSON and both encoding sites in handleSSE so encoding happens
while the snapshot's underlying containers are protected, or instead deep-copy
all map/slice fields inside snapshot() while still holding the RLock and return
those copies so encoding can safely occur after unlocking.

Comment thread db/permission_changes.go
Comment on lines +163 to 211
tx.Exec("SAVEPOINT bulk_insert")

if err := tx.Raw(newSQL).Scan(&newRows).Error; err != nil {
tx.Rollback()
return result, fmt.Errorf("failed to upsert config access: %w", err)
if !dutydb.IsForeignKeyError(err) {
tx.Rollback()
return result, fmt.Errorf("failed to upsert config access: %w", err)
}

// Restore transaction state; temp table is preserved.
tx.Exec("ROLLBACK TO SAVEPOINT bulk_insert")

// Row-by-row with exception handling: successfully inserted rows
// are deleted from the temp table; FK violations stay.
fallbackSQL := fmt.Sprintf(`
DO $$
DECLARE
v_rec RECORD;
BEGIN
FOR v_rec IN SELECT * FROM %s LOOP
BEGIN
INSERT INTO config_access (id, config_id, external_user_id, external_role_id,
external_group_id, scraper_id, application_id, source, created_at)
VALUES (v_rec.id, v_rec.config_id, v_rec.external_user_id, v_rec.external_role_id,
v_rec.external_group_id, v_rec.scraper_id, v_rec.application_id, v_rec.source, v_rec.created_at)
ON CONFLICT (id) DO UPDATE SET deleted_at = NULL
WHERE config_access.deleted_at IS NOT NULL;
DELETE FROM %s WHERE id = v_rec.id;
EXCEPTION WHEN foreign_key_violation THEN
NULL;
END;
END LOOP;
END $$;
`, tempTable, tempTable)

if err := tx.Exec(fallbackSQL).Error; err != nil {
tx.Rollback()
return result, fmt.Errorf("failed to fallback upsert config access: %w", err)
}

var fkErrorCount int64
tx.Raw(fmt.Sprintf("SELECT count(*) FROM %s", tempTable)).Scan(&fkErrorCount)
result.foreignKeyErrors = int(fkErrorCount)
result.saved -= result.foreignKeyErrors

if fkErrorCount > 0 {
ctx.Logger.Warnf("config_access: %d rows with FK violations (scraper=%s)", fkErrorCount, scraperIDStr)
logFKDiagnostics(ctx, tx, tempTable)
}
}

⚠️ Potential issue | 🟠 Major

FK fallback path silently drops PermissionAdded change records.

When the bulk INSERT fails with an FK error, we roll back to the savepoint and re-insert row-by-row inside a PL/pgSQL DO block. A DO block cannot use RETURNING, so newRows stays empty on this path and the loop at line 213 produces no result.added entries, yet these rows are in fact being inserted and counted in result.saved. Downstream, this means permission-added change history is lost for any scrape that triggers the fallback (which is exactly the case the change-tracking was designed for).

A few options, in order of simplicity:

  1. Capture inserted IDs into a side temp table inside the DO block and SELECT ... FROM ... LEFT JOIN config_access after, so you can both detect new-vs-restored and fetch the FK columns needed for buildPermissionSummary.
  2. After the DO block, query config_access filtered by the IDs still present in tempTable subtracted from the original set — any id that's no longer in the temp table (and wasn't already in config_access pre-run) is a new insert.
  3. Drop the DO block and do row-by-row in Go with savepoints; slower but retains RETURNING.

Leaving it as-is without at least a comment noting the semantic gap is likely to cause confusion later.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/permission_changes.go` around lines 163 - 211, The FK-fallback PL/pgSQL DO
block (used after tx.Exec("ROLLBACK TO SAVEPOINT bulk_insert") that loops over
tempTable) performs row-by-row INSERTs but cannot RETURNING, so newRows remains
empty and result.added (and subsequent buildPermissionSummary) loses actual
PermissionAdded change records; fix by capturing inserted IDs into a side temp
table inside the DO block (or alternatively query config_access for IDs removed
from tempTable after the DO block) and then populate newRows/result.added from
that captured set before continuing; update references around tempTable,
newRows, result.saved/result.added, and the fallbackSQL/DO block handling so
change history is reconstructed when falling back.

Comment thread Makefile
Comment on lines +105 to +108
.PHONY: bench
bench:
go test -bench=. -benchmem -run=^$$ ./bench $(BENCH_ARGS)


⚠️ Potential issue | 🟠 Major

Duplicate bench target — the new recipe is silently overridden.

bench is defined twice (lines 105–108 and 299–304). make emits "overriding recipe for target 'bench'" (see pipeline logs) and the earlier, $(BENCH_ARGS)-aware recipe is discarded in favor of the later one. Either remove one, or merge them into a single target (e.g., parameterize the old one to honor BENCH_ARGS and -bench selection via variables).

🛠️ Suggested consolidation
-.PHONY: bench
-bench:
-	go test -bench=. -benchmem -run=^$$ ./bench $(BENCH_ARGS)
-
-
-
 .PHONY: gotest-prod
 gotest-prod:
 	$(validate-envtest-assets) \
 	KUBEBUILDER_ASSETS="$$ASSETS" go test -tags rustdiffgen -skip ^TestE2E$$ ./... -coverprofile cover.out
@@
-.PHONY: bench
-bench:
-	go test ./... -bench='^Benchmark(EventProcessing|CacheMemory|Deserialization)' \
-		-benchmem -run='^$$' \
-		-count=3 \
-		-benchtime=2s -v
+.PHONY: bench
+bench:
+	go test -bench=. -benchmem -run='^$$' ./bench $(BENCH_ARGS)
+
+.PHONY: bench-all
+bench-all:
+	go test ./... -bench='^Benchmark(EventProcessing|CacheMemory|Deserialization)' \
+		-benchmem -run='^$$' \
+		-count=3 \
+		-benchtime=2s -v

Also applies to: 299-304

🧰 Tools
🪛 GitHub Actions: Lint

[warning] 301-107: Makefile warning: overriding recipe for target 'bench' / ignoring old recipe for target 'bench'

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 105 - 108, The Makefile currently defines the bench
target twice so the earlier recipe that uses BENCH_ARGS is being overridden;
consolidate into a single bench target by removing the duplicate recipe (or
merging them) and ensure the surviving bench target preserves BENCH_ARGS and the
-bench selection (e.g., use a variable like BENCH_ARGS and pass
-bench=$(BENCH_SELECTION) or similar) and keep .PHONY declaration for bench;
update any callers/README if they relied on the discarded behavior.

Comment thread scrapers/azure/azure.go
Comment on lines 133 to 141
if t.ConnectionName != "" {
connection, err := ctx.HydrateConnectionByURL(t.ConnectionName)
if err != nil {
return t, fmt.Errorf("could not hydrate connection: %w", err)
} else if connection == nil {
return t, fmt.Errorf("connection %s not found", t.ConnectionName)
if err := azureConn.HydrateConnection(ctx); err != nil {
return t, azureConn, fmt.Errorf("could not hydrate connection: %w", err)
}

t.ClientID.ValueStatic = connection.Username
t.ClientSecret.ValueStatic = connection.Password
t.TenantID = connection.Properties["tenant"]
return t, nil
t.ClientID = *azureConn.ClientID
t.ClientSecret = *azureConn.ClientSecret
t.TenantID = azureConn.TenantID
return t, azureConn, nil
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check whether duty/connection.AzureConnection.HydrateConnection can leave ClientID/ClientSecret nil (bearer-token, workload identity, etc.)
fd -t f -e go . | xargs rg -nP '(?s)func\s+\(\w*\s*\*?AzureConnection\)\s+HydrateConnection' -A 80
rg -nP 'TokenCredential\s*\(' --type=go -C2

Repository: flanksource/config-db

Length of output: 410


🏁 Script executed:

#!/bin/bash
# Find AzureConnection definition and HydrateConnection method implementation
fd -t f -e go . | xargs rg -l 'HydrateConnection' | head -20

Repository: flanksource/config-db

Length of output: 228


🏁 Script executed:

#!/bin/bash
# Search for HydrateConnection method definition with more context
rg -B5 -A50 'func.*HydrateConnection' --type go | head -150

Repository: flanksource/config-db

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Look for AzureConnection struct definition
rg -B2 -A20 'type AzureConnection struct' --type go

Repository: flanksource/config-db

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Read the azure.go file to understand context around lines 133-141
cat -n scrapers/azure/azure.go | sed -n '100,170p'

Repository: flanksource/config-db

Length of output: 2888


🏁 Script executed:

#!/bin/bash
# Look for where azureConn is created/initialized
rg -B10 'azureConn.*:=' scrapers/azure/azure.go | head -80

Repository: flanksource/config-db

Length of output: 954


🏁 Script executed:

#!/bin/bash
# Check imports and AzureConnection type definition in azure.go
head -50 scrapers/azure/azure.go

Repository: flanksource/config-db

Length of output: 1976


🏁 Script executed:

#!/bin/bash
# Search for any usage or error handling patterns around HydrateConnection
rg -B5 -A10 'HydrateConnection' scrapers/ --type go | head -80

Repository: flanksource/config-db

Length of output: 4421


🏁 Script executed:

#!/bin/bash
# Check if there are any null checks or guards around ClientID/ClientSecret after HydrateConnection elsewhere
rg 'ClientID.*!=' scrapers/azure/ --type go

Repository: flanksource/config-db

Length of output: 47


Unconditional nil dereference after HydrateConnection for bearer-token authentication paths.

The hydrateConnection doc comment states that ConnectionName "handles service-principal and bearer-token fallbacks," but lines 137–138 unconditionally dereference *azureConn.ClientID and *azureConn.ClientSecret. Bearer-token, workload-identity, or federated-credential connections will not populate these fields, and HydrateConnection may leave them nil, causing a panic.

Add defensive nil checks before dereferencing, matching the pattern used in the GCP scraper:

Defensive dereference
 	if t.ConnectionName != "" {
 		if err := azureConn.HydrateConnection(ctx); err != nil {
 			return t, azureConn, fmt.Errorf("could not hydrate connection: %w", err)
 		}
-		t.ClientID = *azureConn.ClientID
-		t.ClientSecret = *azureConn.ClientSecret
+		if azureConn.ClientID != nil {
+			t.ClientID = *azureConn.ClientID
+		}
+		if azureConn.ClientSecret != nil {
+			t.ClientSecret = *azureConn.ClientSecret
+		}
 		t.TenantID = azureConn.TenantID
 		return t, azureConn, nil
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
	if t.ConnectionName != "" {
		if err := azureConn.HydrateConnection(ctx); err != nil {
			return t, azureConn, fmt.Errorf("could not hydrate connection: %w", err)
		}
		if azureConn.ClientID != nil {
			t.ClientID = *azureConn.ClientID
		}
		if azureConn.ClientSecret != nil {
			t.ClientSecret = *azureConn.ClientSecret
		}
		t.TenantID = azureConn.TenantID
		return t, azureConn, nil
	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/azure/azure.go` around lines 133 - 141, The code unconditionally
dereferences azureConn.ClientID and azureConn.ClientSecret after calling
HydrateConnection (in the ConnectionName branch), which can be nil for
bearer-token/workload-identity flows; update the block that follows
HydrateConnection to defensively check azureConn.ClientID and
azureConn.ClientSecret for nil before assigning to t.ClientID and t.ClientSecret
(only set t.ClientID/t.ClientSecret when the pointers are non-nil), and preserve
the existing assignment for azureConn.TenantID (or similarly check if needed) so
HydrateConnection and the ConnectionName path cannot panic when
ClientID/ClientSecret are absent.
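The guard the review proposes boils down to a standard Go pattern: never dereference an optional pointer field without a nil check. A minimal, self-contained sketch of that pattern (the `conn` type and `applyConn` function are illustrative stand-ins, not the actual `AzureConnection` API):

```go
package main

import "fmt"

// conn mimics a connection whose credential fields are optional pointers,
// as with bearer-token or workload-identity connections that leave them unset.
type conn struct {
	ClientID     *string
	ClientSecret *string
	TenantID     string
}

// applyConn copies credentials only when the pointers are non-nil,
// so a credential-less connection cannot cause a panic.
func applyConn(c conn) (clientID, clientSecret, tenantID string) {
	if c.ClientID != nil {
		clientID = *c.ClientID
	}
	if c.ClientSecret != nil {
		clientSecret = *c.ClientSecret
	}
	tenantID = c.TenantID
	return
}

func main() {
	id := "app-123"
	// Service-principal style connection: both pointers may be set.
	fmt.Println(applyConn(conn{ClientID: &id, TenantID: "t1"}))
	// Bearer-token style connection: pointers nil, fields simply stay empty.
	fmt.Println(applyConn(conn{TenantID: "t2"}))
}
```

Without the nil checks, the second call would panic with a nil-pointer dereference instead of returning empty credentials.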

Comment on lines +1011 to +1025
const (
ReleasePermissionViewReleaseDefinition = 1
ReleasePermissionEditReleaseDefinition = 2
ReleasePermissionDeleteReleaseDefinition = 4
ReleasePermissionManageDeployments = 8
ReleasePermissionManageReleaseApprovers = 16
ReleasePermissionManageReleases = 32
ReleasePermissionViewReleases = 64
ReleasePermissionCreateReleases = 128
ReleasePermissionEditReleaseEnvironment = 256
ReleasePermissionDeleteReleaseEnvironment = 512
ReleasePermissionAdministerReleasePermissions = 1024
ReleasePermissionDeleteReleases = 2048
ReleasePermissionManageDefinitionReleaseApprovers = 4096
)
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetches the authoritative bit->name mapping for the ReleaseManagement namespace
# from any public Azure DevOps org. Replace ORG if needed; any valid org works as
# namespaces are organization-agnostic.
ORG="${ADO_ORG:-microsoft}"
curl -s "https://dev.azure.com/${ORG}/_apis/securitynamespaces/c788c23e-1b46-4162-8f5e-d7585343b5de/actionids?api-version=7.1-preview.1" \
  | jq '.value | sort_by(.bit) | .[] | {bit, name}'

Repository: flanksource/config-db

Length of output: 122


🏁 Script executed:

#!/bin/bash
# First, let's check the curl response directly to see what went wrong
ORG="microsoft"
curl -s -w "\nHTTP Status: %{http_code}\n" "https://dev.azure.com/${ORG}/_apis/securitynamespaces/c788c23e-1b46-4162-8f5e-d7585343b5de/actionids?api-version=7.1-preview.1" | head -100

Repository: flanksource/config-db

Length of output: 2036


🏁 Script executed:

#!/bin/bash
# Search the codebase for comments or tests related to these permissions
rg "ReleasePermission|ManageDeployments" -A 2 -B 2 --type go

Repository: flanksource/config-db

Length of output: 8226


🌐 Web query:

Microsoft Azure DevOps ReleaseManagement namespace permissions actionids bit values

💡 Result:

Microsoft Azure DevOps has two primary ReleaseManagement security namespaces with associated permissions using bitmask values (powers of 2).

Namespace ID: c788c23e-1b46-4162-8f5e-d7585343b5de (Release definition permissions)
- bit 1: ViewReleaseDefinition
- bit 2: EditReleaseDefinition
- bit 4: DeleteReleaseDefinition
- bit 8: ManageReleaseApprovers
- Higher bits include ManageReleases (likely 16), ViewReleases, CreateReleases, EditReleaseEnvironment, DeleteReleaseEnvironment, AdministerReleasePermissions, DeleteReleases, ManageDeployments, ManageReleaseSettings, ManageTaskHubExtension (exact bits follow the standard sequential power-of-2 assignment: 16, 32, 64, 128, 256, 512, 1024, 2048, 4096)

Namespace ID: 7c7d32f7-0e86-4cd6-892e-b35dbba870bd (Release Management UI elements)
- bit 1: ViewTaskEditor
- bit 2: ViewCDWorkflowEditor (or similar, partial in sources)
- bit 4: ExportReleaseDefinition (inferred sequential)
- bit 8: ViewLegacyUI (inferred)
- bit 32: DeploymentSummaryAcrossProjects (example higher bit)

Permissions are bitmasks combined for allow/deny in ACLs. Exact complete mappings are not published in static docs; they are product-defined constants retrieved dynamically via:
- REST: GET https://dev.azure.com/{org}/_apis/securitynamespaces/{namespaceId}?api-version=7.1-preview.1 (returns actions[] with bit/name)
- CLI: az devops security permission namespace show --id {namespaceId}

Bits are stable across orgs but new actions may be added over time. Use a runtime query for scripting/completeness. Sources confirm the standard bitmask pattern (1=2^0, 2=2^1, 4=2^2, etc.).


Fix release permission bit values to match Azure DevOps ReleaseManagement namespace ordering.

The code's bit→name mapping is misaligned with the ReleaseManagement namespace. Per Microsoft's documented permission order, the bits should be: ManageReleaseApprovers=8, ManageReleases=16, ViewReleases=32, CreateReleases=64, EditReleaseEnvironment=128, DeleteReleaseEnvironment=256, AdministerReleasePermissions=512, DeleteReleases=1024, ManageDeployments=2048.

Currently, the code assigns ManageDeployments=8, ManageReleaseApprovers=16, ManageReleases=32, ViewReleases=64, ..., shifting every permission from bit 8 onward. This silently mis-labels release permissions throughout ParseReleasePermissions and ResolveRoles—for example, a user with effective CreateReleases (bit 64) is reported as ViewReleases, mapping to Viewer instead of Developer.

Current definitions (incorrect)
const (
	ReleasePermissionViewReleaseDefinition              = 1
	ReleasePermissionEditReleaseDefinition              = 2
	ReleasePermissionDeleteReleaseDefinition            = 4
	ReleasePermissionManageDeployments                  = 8
	ReleasePermissionManageReleaseApprovers             = 16
	ReleasePermissionManageReleases                     = 32
	ReleasePermissionViewReleases                       = 64
	ReleasePermissionCreateReleases                     = 128
	ReleasePermissionEditReleaseEnvironment             = 256
	ReleasePermissionDeleteReleaseEnvironment           = 512
	ReleasePermissionAdministerReleasePermissions       = 1024
	ReleasePermissionDeleteReleases                     = 2048
	ReleasePermissionManageDefinitionReleaseApprovers   = 4096
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/azure/devops/client.go` around lines 1011 - 1025, The release
permission constants are shifted and mis-map bits to names, causing
ParseReleasePermissions and ResolveRoles to report wrong roles; update the
constants (ReleasePermissionManageReleaseApprovers,
ReleasePermissionManageReleases, ReleasePermissionViewReleases,
ReleasePermissionCreateReleases, ReleasePermissionEditReleaseEnvironment,
ReleasePermissionDeleteReleaseEnvironment,
ReleasePermissionAdministerReleasePermissions, ReleasePermissionDeleteReleases,
ReleasePermissionManageDeployments) so their bit values follow the Azure
ReleaseManagement ordering (ManageReleaseApprovers=8, ManageReleases=16,
ViewReleases=32, CreateReleases=64, EditReleaseEnvironment=128,
DeleteReleaseEnvironment=256, AdministerReleasePermissions=512,
DeleteReleases=1024, ManageDeployments=2048) ensuring ParseReleasePermissions
and ResolveRoles read the correct bits.
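The corrected ordering can be exercised with plain bitmask arithmetic. A minimal sketch using the bit values the review proposes (constant names shortened for brevity; verify the authoritative mapping against the securitynamespaces REST endpoint before relying on it):

```go
package main

import "fmt"

// Proposed ReleaseManagement bit values per the review's suggested ordering.
const (
	ViewReleaseDefinition        = 1
	EditReleaseDefinition        = 2
	DeleteReleaseDefinition      = 4
	ManageReleaseApprovers       = 8
	ManageReleases               = 16
	ViewReleases                 = 32
	CreateReleases               = 64
	EditReleaseEnvironment       = 128
	DeleteReleaseEnvironment     = 256
	AdministerReleasePermissions = 512
	DeleteReleases               = 1024
	ManageDeployments            = 2048
)

// has reports whether the effective permission mask contains the given bit.
func has(mask, bit int) bool { return mask&bit != 0 }

func main() {
	// A user granted View + Create releases.
	mask := ViewReleases | CreateReleases
	fmt.Println(has(mask, CreateReleases))    // true
	fmt.Println(has(mask, ManageDeployments)) // false
}
```

With the shifted constants currently in the code, the same mask (bit 64 set) would be read as ViewReleases, which is exactly the role-downgrade the review describes.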

Comment on lines +170 to +218
		email := emailFromIdentity(identity)
		if name == "" && email == "" {
			continue
		}

		if identity.IsContainer {
			aliases := append(DescriptorAliases(identity.Descriptor), identity.SubjectDescriptor)
			aliases = append(aliases, DescriptorAliases(identity.SubjectDescriptor)...)
			// No ID — the SQL merge resolves this group against the AAD scraper's
			// authoritative record by alias overlap. AAD takes precedence.
			ctx.AddGroup(dutyModels.ExternalGroup{
				Name:      name,
				Aliases:   pq.StringArray(aliases),
				Tenant:    config.Organization,
				GroupType: "AzureDevOps",
			})
		} else {
			ctx.AddUser(dutyModels.ExternalUser{
				Name:     name,
				Email:    &email,
				Aliases:  pq.StringArray{email, identity.Descriptor, identity.SubjectDescriptor},
				Tenant:   config.Organization,
				UserType: "AzureDevOps",
			})
		}

		resolvedRoles := ResolveRoles("Release", perm.Permissions, config.Permissions.Roles)
		for _, roleName := range resolvedRoles {
			if _, exists := roleIDs[roleName]; !exists {
				roleID := azure.RoleID(ctx.ScraperID(), roleName)
				roleIDs[roleName] = roleID
				roles = append(roles, dutyModels.ExternalRole{
					ID:       roleID,
					Name:     roleName,
					RoleType: "AzureDevOps",
					Tenant:   config.Organization,
				})
			}

			roleID := roleIDs[roleName]
			access := v1.ExternalConfigAccess{
				ConfigExternalID: v1.ExternalID{ConfigType: ReleaseType, ExternalID: releaseExternalID},
				ExternalRoleID:   &roleID,
			}
			if identity.IsContainer {
				access.ExternalGroupAliases = DescriptorAliases(identity.Descriptor)
			} else {
				access.ExternalUserAliases = []string{email}
			}
⚠️ Potential issue | 🟠 Major

Empty-string aliases when an identity has a name but no email.

The guard at line 171 is name == "" && email == "", so users with a display name but no resolvable email proceed. Then:

  • Line 190: Aliases: pq.StringArray{email, identity.Descriptor, identity.SubjectDescriptor} includes email, which is "", so "" is stored as an alias.
  • Line 217: access.ExternalUserAliases = []string{email} leaves the access record's only alias as "".

On the merge side, an access row whose sole alias is "" can overlap with any user whose aliases also happened to pick up an empty string, producing incorrect access linkage. Filter empties out before constructing both arrays.

🛡️ Proposed fix
 		} else {
-			ctx.AddUser(dutyModels.ExternalUser{
-				Name:     name,
-				Email:    &email,
-				Aliases:  pq.StringArray{email, identity.Descriptor, identity.SubjectDescriptor},
-				Tenant:   config.Organization,
-				UserType: "AzureDevOps",
-			})
+			userAliases := []string{}
+			for _, a := range []string{email, identity.Descriptor, identity.SubjectDescriptor} {
+				if a != "" {
+					userAliases = append(userAliases, a)
+				}
+			}
+			ctx.AddUser(dutyModels.ExternalUser{
+				Name:     name,
+				Email:    &email,
+				Aliases:  pq.StringArray(userAliases),
+				Tenant:   config.Organization,
+				UserType: "AzureDevOps",
+			})
 		}
@@
-			if identity.IsContainer {
-				access.ExternalGroupAliases = DescriptorAliases(identity.Descriptor)
-			} else {
-				access.ExternalUserAliases = []string{email}
-			}
+			if identity.IsContainer {
+				access.ExternalGroupAliases = DescriptorAliases(identity.Descriptor)
+			} else if email != "" {
+				access.ExternalUserAliases = []string{email}
+			} else {
+				access.ExternalUserAliases = []string{identity.Descriptor}
+			}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
		email := emailFromIdentity(identity)
		if name == "" && email == "" {
			continue
		}

		if identity.IsContainer {
			aliases := append(DescriptorAliases(identity.Descriptor), identity.SubjectDescriptor)
			aliases = append(aliases, DescriptorAliases(identity.SubjectDescriptor)...)
			// No ID — the SQL merge resolves this group against the AAD scraper's
			// authoritative record by alias overlap. AAD takes precedence.
			ctx.AddGroup(dutyModels.ExternalGroup{
				Name:      name,
				Aliases:   pq.StringArray(aliases),
				Tenant:    config.Organization,
				GroupType: "AzureDevOps",
			})
		} else {
			userAliases := []string{}
			for _, a := range []string{email, identity.Descriptor, identity.SubjectDescriptor} {
				if a != "" {
					userAliases = append(userAliases, a)
				}
			}
			ctx.AddUser(dutyModels.ExternalUser{
				Name:     name,
				Email:    &email,
				Aliases:  pq.StringArray(userAliases),
				Tenant:   config.Organization,
				UserType: "AzureDevOps",
			})
		}

		resolvedRoles := ResolveRoles("Release", perm.Permissions, config.Permissions.Roles)
		for _, roleName := range resolvedRoles {
			if _, exists := roleIDs[roleName]; !exists {
				roleID := azure.RoleID(ctx.ScraperID(), roleName)
				roleIDs[roleName] = roleID
				roles = append(roles, dutyModels.ExternalRole{
					ID:       roleID,
					Name:     roleName,
					RoleType: "AzureDevOps",
					Tenant:   config.Organization,
				})
			}

			roleID := roleIDs[roleName]
			access := v1.ExternalConfigAccess{
				ConfigExternalID: v1.ExternalID{ConfigType: ReleaseType, ExternalID: releaseExternalID},
				ExternalRoleID:   &roleID,
			}
			if identity.IsContainer {
				access.ExternalGroupAliases = DescriptorAliases(identity.Descriptor)
			} else if email != "" {
				access.ExternalUserAliases = []string{email}
			} else {
				access.ExternalUserAliases = []string{identity.Descriptor}
			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/azure/devops/releases.go` around lines 170 - 218, The code is adding
empty-string aliases when identity has a name but no email; update the user
alias construction and the access alias assignment to filter out empty strings:
when building the pq.StringArray for dutyModels.ExternalUser in the branch that
calls ctx.AddUser (refer to emailFromIdentity, identity.Descriptor and
identity.SubjectDescriptor), omit email if it's empty so the Aliases array never
contains "". Likewise, before setting access.ExternalUserAliases =
[]string{email} (and generally before assigning any ExternalUserAliases),
construct the slice by including only non-empty alias strings; keep the same
ordering and keep DescriptorAliases handling unchanged for container cases.
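The filtering this finding calls for reduces to a small helper that drops blanks before they ever reach an alias array. A minimal sketch (`nonEmpty` and the descriptor value are illustrative, not part of the scraper's API):

```go
package main

import "fmt"

// nonEmpty returns only the non-empty strings, so alias arrays
// can never contain "" and trivially overlap with other identities.
func nonEmpty(ss ...string) []string {
	out := make([]string, 0, len(ss))
	for _, s := range ss {
		if s != "" {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	email := ""                // identity has a display name but no resolvable email
	descriptor := "aad.abc123" // hypothetical descriptor value
	fmt.Println(nonEmpty(email, descriptor, "")) // prints [aad.abc123]
}
```

Routing every alias list through such a helper would also cover the repositories.go sites flagged in the next comment, instead of guarding each call site individually.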

Comment on lines +140 to 165
 		name := ResolvedIdentityName(identity, project.Name)
 		email := emailFromIdentity(identity)
-		if identity.ProviderDisplayName == "" && email == "" {
+		if name == "" && email == "" {
 			continue
 		}

 		if identity.IsContainer {
+			aliases := append(DescriptorAliases(identity.Descriptor), identity.SubjectDescriptor)
+			aliases = append(aliases, DescriptorAliases(identity.SubjectDescriptor)...)
+			// No ID — the SQL merge resolves this group against the AAD scraper's
+			// authoritative record by alias overlap. AAD takes precedence.
 			ctx.AddGroup(dutyModels.ExternalGroup{
-				Name:      identity.ProviderDisplayName,
-				Aliases:   pq.StringArray{identity.Descriptor, identity.SubjectDescriptor},
+				Name:      name,
+				Aliases:   pq.StringArray(aliases),
 				Tenant:    config.Organization,
 				GroupType: "AzureDevOps",
 			})
 		} else {
 			ctx.AddUser(dutyModels.ExternalUser{
-				Name:     identity.ProviderDisplayName,
+				Name:     name,
 				Email:    &email,
 				Aliases:  pq.StringArray{email, identity.Descriptor, identity.SubjectDescriptor},
 				Tenant:   config.Organization,
 				UserType: "AzureDevOps",
 			})
 		}
⚠️ Potential issue | 🟠 Major

Empty SubjectDescriptor / empty email leak as empty-string aliases.

Several alias-building sites here accept blanks unconditionally, which will store "" entries in the resulting pq.StringArray — risky given the comment on lines 149‑150 says the group is merged against AAD "by alias overlap". Two empty-alias arrays overlap trivially.

  • Line 147 appends identity.SubjectDescriptor without checking it's non-empty.
  • Line 148 calls DescriptorAliases(identity.SubjectDescriptor); when the input is empty the helper falls through to return []string{descriptor}, i.e. [""].
  • Line 161 puts email and identity.SubjectDescriptor into the user's Aliases array even when either is empty (the line 142 guard only requires one of name/email to be non-empty).
  • Line 189 similarly sets ExternalUserAliases = []string{email} which becomes [""] when email == "".

BuildIdentityMap in client.go guards this correctly (if id.SubjectDescriptor != ""); the same guard should be applied here.

🛡️ Proposed fix
 		if identity.IsContainer {
-			aliases := append(DescriptorAliases(identity.Descriptor), identity.SubjectDescriptor)
-			aliases = append(aliases, DescriptorAliases(identity.SubjectDescriptor)...)
+			aliases := DescriptorAliases(identity.Descriptor)
+			if identity.SubjectDescriptor != "" {
+				aliases = append(aliases, identity.SubjectDescriptor)
+				aliases = append(aliases, DescriptorAliases(identity.SubjectDescriptor)...)
+			}
 			// No ID — the SQL merge resolves this group against the AAD scraper's
 			// authoritative record by alias overlap. AAD takes precedence.
 			ctx.AddGroup(dutyModels.ExternalGroup{
 				Name:      name,
 				Aliases:   pq.StringArray(aliases),
 				Tenant:    config.Organization,
 				GroupType: "AzureDevOps",
 			})
 		} else {
+			userAliases := []string{identity.Descriptor}
+			if email != "" {
+				userAliases = append(userAliases, email)
+			}
+			if identity.SubjectDescriptor != "" {
+				userAliases = append(userAliases, identity.SubjectDescriptor)
+			}
 			ctx.AddUser(dutyModels.ExternalUser{
 				Name:     name,
 				Email:    &email,
-				Aliases:  pq.StringArray{email, identity.Descriptor, identity.SubjectDescriptor},
+				Aliases:  pq.StringArray(userAliases),
 				Tenant:   config.Organization,
 				UserType: "AzureDevOps",
 			})
 		}

Apply similar non-empty guard at line 189 (ExternalUserAliases).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/azure/devops/repositories.go` around lines 140 - 165, The current
alias construction in the DevOps identity handling (around ResolvedIdentityName,
emailFromIdentity, DescriptorAliases and the ctx.AddGroup / ctx.AddUser calls)
can push empty strings into pq.StringArray (e.g., empty SubjectDescriptor or
empty email), causing spurious alias overlap; update the logic to only append
non-empty alias values: (1) before appending identity.SubjectDescriptor or
calling DescriptorAliases(identity.SubjectDescriptor) ensure
identity.SubjectDescriptor != "" (or filter the returned slice to remove ""
entries), (2) only include email in the Aliases/ExternalUserAliases when email
!= "", and (3) when building aliases from DescriptorAliases(filter result) drop
any empty elements—mirror the guard used in BuildIdentityMap so ctx.AddGroup and
ctx.AddUser receive pq.StringArray with no empty strings.
