Conversation
Introduces support for Playwright-based browser automation to enable web scraping, screenshot capture, and federation login workflows. Includes browser instance management via chromedp, AWS federated login support, and HAR recording integration.
…nd tracing options

Refactors Playwright scraper config into separate types for login providers, audit log, and tracing. Adds support for OutputMode, headless mode, timeout, and browser login. Fixes artifact linking to use resolved change IDs from the DB and improves deepcopy generation for nested types.
Adds dedicated CI job for end-to-end Playwright tests with bun and Chromium. Migrates browser tests from unit test framework (ginkgo) to e2e test suite, adding parseOutput validation tests and interactive browser scenarios.
The bun package manager may not be available in the PATH by default on CI runners. Explicitly add $HOME/.bun/bin to PATH before running Playwright installation and tests to ensure the bundled bun binary can be found.
Use GITHUB_PATH to set bun binary location instead of exporting PATH in shell, which is more reliable across workflow steps. Removes redundant PATH exports and simplifies test command.
Adds a new `--ui` flag to the run command that launches an interactive browser dashboard. The dashboard provides real-time progress tracking with config browsing, logs, HTTP traces, and access/relationship visualization. Includes a complete Preact-based frontend with filtering, split-pane layout, and detailed config inspection. Also adds a new `ui` command to view saved JSON results in the dashboard.
… permission resolution

Adds descriptor format conversion functions (DescriptorToSID, SIDToVssgp, NormalizeDescriptor) to handle multiple Azure DevOps identity formats. Introduces BuildIdentityMap for multi-form descriptor lookups and ResolvedIdentityName for better display names. Filters system descriptors (svc., s2s.) from resolution API calls and handles 404 responses gracefully. Improves Release audit log mapping to handle ReleaseDefinitionId in addition to PipelineId.
Implements frontend-friendly relationship representation using external IDs and resolved names. Adds global search across all tabs with hit counts, config metadata (parents, location), change status indicators (new/updated/deleted), and displays relationships in detail panel.
Add ForeignKeyErrors field to EntitySummary and implement row-by-row fallback for FK violations during bulk config_access inserts. This allows graceful handling of missing referenced entities while preserving valid rows and tracking constraint violations for diagnostics.
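A minimal sketch of the row-by-row fallback described above. The names (`insertAccessRows`, `accessRow`) and the simulated FK check are illustrative assumptions; the real code runs a bulk `config_access` insert and retries per row only when the database reports a foreign-key violation.

```go
// Hypothetical sketch: a set of known config IDs stands in for the FK constraint.
package main

import "fmt"

type accessRow struct {
	ConfigID string
	UserID   string
}

type entitySummary struct {
	Inserted         int
	ForeignKeyErrors []string // diagnostics for rows dropped on FK violations
}

// insertAccessRows tries a single bulk insert first; if any row references a
// missing entity, it falls back to inserting rows one at a time, keeping the
// valid rows and recording each constraint violation.
func insertAccessRows(rows []accessRow, knownConfigs map[string]bool) entitySummary {
	var summary entitySummary
	bulkOK := true
	for _, r := range rows {
		if !knownConfigs[r.ConfigID] {
			bulkOK = false
			break
		}
	}
	if bulkOK {
		summary.Inserted = len(rows) // single round-trip in the happy path
		return summary
	}
	for _, r := range rows { // fallback: row-by-row, tolerating FK violations
		if knownConfigs[r.ConfigID] {
			summary.Inserted++
		} else {
			summary.ForeignKeyErrors = append(summary.ForeignKeyErrors,
				fmt.Sprintf("config %s not found for user %s", r.ConfigID, r.UserID))
		}
	}
	return summary
}

func main() {
	known := map[string]bool{"cfg-1": true}
	s := insertAccessRows([]accessRow{{"cfg-1", "u1"}, {"cfg-missing", "u2"}}, known)
	fmt.Println(s.Inserted, len(s.ForeignKeyErrors)) // 1 1
}
```

The design trade-off: the bulk path stays fast for clean batches, while the fallback preserves partial progress and keeps the violations visible instead of failing the whole insert.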
…ole resolution

Refactors permission handling to support Build and Release namespace permissions alongside Git. Introduces ResolveRoles as a generalized permission-to-role mapper with DefaultRoles, replacing ResolveGitRoles's fallback behavior. Adds ParseBuildPermissions and ParseReleasePermissions to extract permission bits, a GetReleasePermissions client method, and permission fetching for both pipelines and releases with role-based access tracking.
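The permission-bit → role mapping with a default fallback can be sketched as below. The masks and role names here are invented for illustration; they are not Azure DevOps' actual Build/Release namespace bits or the real ResolveRoles signature.

```go
// Illustrative sketch of ResolveRoles-style mapping with DefaultRoles fallback.
package main

import (
	"fmt"
	"sort"
)

func resolveRoles(allowBits uint32, roleMasks map[string]uint32, defaults []string) []string {
	var roles []string
	for name, mask := range roleMasks {
		if allowBits&mask == mask { // every bit of the role's mask is granted
			roles = append(roles, name)
		}
	}
	if len(roles) == 0 {
		return defaults // fall back instead of returning an empty set
	}
	sort.Strings(roles) // map iteration order is random; keep output stable
	return roles
}

func main() {
	masks := map[string]uint32{"reader": 0b001, "editor": 0b011, "admin": 0b111}
	fmt.Println(resolveRoles(0b011, masks, []string{"none"})) // [editor reader]
	fmt.Println(resolveRoles(0b000, masks, []string{"none"})) // [none]
}
```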
…tching

Adds in-memory resolution of RelationshipSelectors in BuildUIRelationships and implements fallback matching logic for access control entities when aliases are unavailable. This enables correct relationship discovery from selector configurations and graceful handling of ID-only entity references.
Adds URL hash-based state persistence to enable bookmarkable and shareable links for specific configs, tabs, and search queries. Also enhances config relationship deduplication and extends Azure permissions documentation for resource-type-prefixed permissions.
…downloads
Adds an Issues tab to display orphaned changes and FK errors. Replaces hash-based routing with history API for cleaner URLs and deep-linking support (e.g., /configs/{id}). Includes per-config JSON downloads, membership resolution for user/group relationships, and SPA fallback routing. Consolidates external entity sources to prefer post-merge canonical IDs from SaveSummary.
BREAKING CHANGE: add scrape issues tab and client-side routing with config downloads
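The history-API fallback routing mentioned above can be sketched as a small predicate: API routes and bundled assets are served normally, and any other GET falls back to index.html so deep links like /configs/{id} load the SPA. The `/api/` prefix and asset set are assumptions, not the server's actual route table.

```go
// Hypothetical SPA-fallback decision, not the actual scrapeui server code.
package main

import (
	"fmt"
	"strings"
)

func spaFallback(method, path string, assets map[string]bool) bool {
	if method != "GET" {
		return false
	}
	if strings.HasPrefix(path, "/api/") || assets[path] {
		return false // real endpoint or bundled asset: serve as-is
	}
	return true // unknown route: serve index.html and let the client router handle it
}

func main() {
	assets := map[string]bool{"/assets/app.js": true}
	fmt.Println(spaFallback("GET", "/configs/abc123", assets)) // true
	fmt.Println(spaFallback("GET", "/api/scrape", assets))     // false
}
```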
…ed deduplication

Remove the DescriptorID function and switch to alias-based identity reconciliation. Users and groups are now deduplicated locally by descriptor and reconciled via alias overlap in the SQL merge, allowing the AAD scraper's authoritative records to take precedence.
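Reconciliation by alias overlap is effectively a connected-components problem: records that share any alias belong to the same identity. A minimal union-find sketch (function name and representation are illustrative, not the actual implementation) is:

```go
// Sketch: group records whose alias sets overlap, transitively.
package main

import "fmt"

// dedupeByAliasOverlap returns, for each record, the index of its component's
// representative ("winner"); records sharing any alias end up with the same one.
func dedupeByAliasOverlap(aliases [][]string) []int {
	parent := make([]int, len(aliases))
	for i := range parent {
		parent[i] = i
	}
	var find func(int) int
	find = func(x int) int {
		if parent[x] != x {
			parent[x] = find(parent[x]) // path compression
		}
		return parent[x]
	}
	owner := map[string]int{} // alias -> first record seen with it
	for i, as := range aliases {
		for _, a := range as {
			if j, ok := owner[a]; ok {
				parent[find(i)] = find(j) // shared alias: merge components
			} else {
				owner[a] = i
			}
		}
	}
	out := make([]int, len(aliases))
	for i := range aliases {
		out[i] = find(i)
	}
	return out
}

func main() {
	// records 0 and 2 share "aad:alice"; record 1 stands alone
	fmt.Println(dedupeByAliasOverlap([][]string{
		{"ado:alice", "aad:alice"}, {"ado:bob"}, {"aad:alice", "mail:alice@x"},
	})) // [0 1 0]
}
```

Winner selection (e.g. preferring the AAD scraper's record) then happens per component; the sketch only shows the grouping step.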
Extends ChangeResult with a Resolved field to track post-transformation changes through the mapping pipeline, and makes EntitySummary generic to carry both counts and canonical post-merge entities. This enables UI and downstream consumers to read merged state from a single authoritative location instead of deduplicating across individual results.
Add before/after/diff snapshots of DB state to track what changed during a scrape. Snapshots capture per-scraper and per-type config item counts, external entities, and access records across multiple time windows (Last/Hour/Day/Week). They are captured before and after scrape runs to show impact, and persist even on partial failures. Wired into the UI with a new Snapshot tab and pretty-print summaries in CLI logs.
Add last_created_at and last_updated_at fields to EntityWindowCounts to surface when entities were most recently created or modified. Update IsZero() and Sub() methods to handle new timestamp fields. Extend database queries to calculate MAX timestamps from underlying tables.
…e summaries

Extends EntitySummary with a LastCreatedAt field to track the most recent entity creation timestamp across merges. Adds a Warning struct to capture the full diagnostic context (input, output, expression, error) of failed transformations. Introduces ScrapeSummary.AsMap() to expose JSON-tagged snake_case keys for template/CEL environments. Updates processors to pass transform context through the extraction pipeline.

BREAKING CHANGE: add warning context and last_created_at tracking to scrape summaries
…agnostic tooling

Restructure external entity resolution to handle merged entities via loser-id aliases. Add id-keyed caches for users/roles/groups and a new findExternalEntityByID helper for resolving both canonical and historical ids via alias fallback. Implement savepoint-guarded merge functions with temp-table dump diagnostics on failure. Improve user-group resolution to validate direct UUIDs against current scrape entities, preventing dangling FKs. Add comprehensive test coverage for edge cases. Session properties are now applied to merge functions.
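The savepoint-guard pattern can be sketched as follows, assuming Postgres-style SQL. The statement strings and the diagnostic dump are illustrative of the pattern only: a failed merge rolls back to the savepoint so the enclosing transaction stays usable, and temp-table contents can then be captured for debugging.

```go
// Hypothetical sketch; a fake transaction stands in for a real *sql.Tx.
package main

import (
	"errors"
	"fmt"
)

type execer interface {
	Exec(query string) error
}

func mergeWithSavepoint(tx execer, mergeSQL, dumpSQL string) error {
	if err := tx.Exec("SAVEPOINT merge_guard"); err != nil {
		return err
	}
	if err := tx.Exec(mergeSQL); err != nil {
		// Keep the transaction usable, then capture temp-table state for diagnostics.
		_ = tx.Exec("ROLLBACK TO SAVEPOINT merge_guard")
		_ = tx.Exec(dumpSQL)
		return fmt.Errorf("merge failed: %w", err)
	}
	return tx.Exec("RELEASE SAVEPOINT merge_guard")
}

// fakeTx records statements and fails on a chosen query, for the demo below.
type fakeTx struct {
	log  []string
	fail string
}

func (f *fakeTx) Exec(q string) error {
	f.log = append(f.log, q)
	if q == f.fail {
		return errors.New("constraint violation")
	}
	return nil
}

func main() {
	tx := &fakeTx{fail: "SELECT merge_users()"}
	err := mergeWithSavepoint(tx, "SELECT merge_users()", "TABLE pg_temp.external_users")
	fmt.Println(err != nil, tx.log[len(tx.log)-1]) // true TABLE pg_temp.external_users
}
```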
… summary

Add a ScraperTemplateEnv() method to provide request-shaping config (URLs, query payloads) with scraper-level template inputs. Restrict last_scrape_summary access to scraper-level config only, not per-item transforms. Implement deduplication of warnings by error key across ExtractedConfig, ScrapeSummary, and ConfigTypeScrapeSummary to prevent duplicate entries and consolidate counts.
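Deduplication by error key with count consolidation can be sketched as below. The real Warning carries full context (input, output, expression); it is reduced here to the key and a count, and the function name is illustrative.

```go
// Sketch of warning deduplication by error key.
package main

import "fmt"

type Warning struct {
	Error string
	Count int
}

// dedupeWarnings keeps the first warning per error key and consolidates counts,
// so a transform failing the same way on many items shows up once with a total.
func dedupeWarnings(ws []Warning) []Warning {
	idx := map[string]int{}
	out := []Warning{}
	for _, w := range ws {
		if i, ok := idx[w.Error]; ok {
			out[i].Count += w.Count
			continue
		}
		idx[w.Error] = len(out)
		out = append(out, w)
	}
	return out
}

func main() {
	ws := dedupeWarnings([]Warning{
		{"cel: no such key", 1}, {"bad expr", 1}, {"cel: no such key", 2},
	})
	fmt.Println(len(ws), ws[0].Count) // 2 3
}
```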
… and config queries

Add support for AWS federation and browser-login auto-authentication, trace recording configuration, config item export via queries, and improved script parameter flexibility, including headless mode and execution timeout controls.
Update Go dependencies to latest versions, including:

- flanksource packages (artifacts, clicky, commons, deps, duty, gomplate, sandbox-runtime)
- AWS SDK v2 services and related packages
- Google Cloud libraries
- OpenTelemetry packages
- Kubernetes client libraries
- Various other transitive dependencies

Add PROPERTIES.md documentation covering runtime configuration properties for retention, deduplication, scraper concurrency, scheduling, timeouts, logging, and other system controls.

Fix database query in external_cache.go to properly handle UUID slice results from the Pluck operation.

Update Makefile to simplify the go run command and add a bench target for benchmarking.

Refs: dependency maintenance and documentation improvements
Walkthrough

This PR introduces a real-time scrape UI with streaming snapshots and diff visualization, enhances the scraping pipeline with structured warning tracking and transformation diagnostics, implements external entity merge deduplication via alias overlap, expands Azure DevOps permission handling for pipelines and releases, and adds property/change-result exposure in template contexts.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Browser as Browser/Client
    participant ScrapeUI as Scrape UI Server
    participant DB as Database
    participant Scrapers as Scraper Workers
    Note over Browser,Scrapers: Real-Time Scrape Monitoring
    Browser->>ScrapeUI: GET /api/scrape (initial snapshot)
    activate ScrapeUI
    ScrapeUI->>DB: CaptureScrapeSnapshot(runStart)
    activate DB
    DB-->>ScrapeUI: ScrapeSnapshot (before state)
    deactivate DB
    ScrapeUI-->>Browser: JSON Snapshot
    deactivate ScrapeUI
    Browser->>ScrapeUI: EventSource /api/scrape/stream
    activate ScrapeUI
    loop During Scrape Execution
        Scrapers->>ScrapeUI: UpdateScraper(status, results, summary)
        ScrapeUI->>DB: Process Results & Store
        ScrapeUI->>DB: CaptureScrapeSnapshot(runStart)
        activate DB
        DB-->>ScrapeUI: ScrapeSnapshot (after state)
        deactivate DB
        ScrapeUI->>ScrapeUI: DiffSnapshots(before, after)
        ScrapeUI-->>Browser: SSE Event with Updated Snapshot
    end
    Scrapers->>ScrapeUI: SetDone()
    ScrapeUI-->>Browser: SSE done event
    deactivate ScrapeUI
    Browser-->>Browser: Render Final Results
```
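The SSE leg of this flow can be sketched with net/http alone. The handler wiring below is illustrative, not the actual scrapeui server: each diffed snapshot goes out as a `data:` frame, and SetDone() is modeled by closing the channel, which emits a terminal `done` event.

```go
// Minimal SSE sketch; endpoint and channel plumbing are assumptions.
package main

import (
	"fmt"
	"net/http"
)

// sseEvent formats one Server-Sent Events frame.
func sseEvent(event, data string) string {
	if event != "" {
		return fmt.Sprintf("event: %s\ndata: %s\n\n", event, data)
	}
	return fmt.Sprintf("data: %s\n\n", data)
}

func streamSnapshots(w http.ResponseWriter, r *http.Request, updates <-chan string) {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	for {
		select {
		case <-r.Context().Done(): // browser closed the EventSource
			return
		case snap, open := <-updates:
			if !open { // scrape finished: terminal event, then stop
				fmt.Fprint(w, sseEvent("done", "{}"))
				flusher.Flush()
				return
			}
			fmt.Fprint(w, sseEvent("", snap)) // updated snapshot as JSON
			flusher.Flush()
		}
	}
}

func main() {
	_ = streamSnapshots // would be mounted on /api/scrape/stream in a real server
}
```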
```mermaid
sequenceDiagram
    participant Scraper as Scraper Worker
    participant ExtEntity as External Entity Sync
    participant DB as Database (Temp Table)
    participant Merge as Merge Function
    participant Cache as Entity ID Cache
    participant LiveDB as Live Database
    Note over Scraper,LiveDB: External Entity Merge & Deduplication
    Scraper->>ExtEntity: syncExternalEntities(users, groups, roles)
    activate ExtEntity
    ExtEntity->>ExtEntity: dedupeByIDWithIndex (alias overlap union-find)
    ExtEntity->>DB: Create Temp Tables (external_users, etc.)
    activate DB
    DB-->>ExtEntity: Temp tables ready
    deactivate DB
    ExtEntity->>Merge: Call merge_and_upsert_external_users
    activate Merge
    Merge->>Merge: Find connected components by alias overlap
    Merge->>Merge: Select winner (prefer non-nil ID)
    Merge-->>ExtEntity: Merge result with winner IDs
    deactivate Merge
    ExtEntity->>ExtEntity: Rewrite IDs based on idMap
    ExtEntity->>Cache: Populate ExternalUserIDCache (loser→winner)
    Cache-->>ExtEntity: Cache updated
    ExtEntity->>LiveDB: Fetch final canonical entities
    activate LiveDB
    LiveDB-->>ExtEntity: External Users/Groups/Roles
    deactivate LiveDB
    ExtEntity-->>Scraper: syncedExternalEntities (canonical + maps)
    deactivate ExtEntity
    Note over Scraper: Later: FindExternalEntityByID uses cache<br/>to resolve loser UUIDs to winners
```
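The final note in the diagram — resolving a historical (loser) UUID through the cache — reduces to a two-step lookup. Maps stand in for the live DB and the ID cache here; the names follow the diagram, not the real signatures.

```go
// Sketch of loser→winner resolution: canonical hit first, cache fallback second.
package main

import "fmt"

func findExternalEntityByID(
	id string,
	canonical map[string]string, // id -> entity name (post-merge, current scrape)
	loserToWinner map[string]string, // populated from the merge's idMap
) (string, bool) {
	if name, ok := canonical[id]; ok {
		return name, true // canonical id: direct hit
	}
	if winner, ok := loserToWinner[id]; ok {
		name, ok := canonical[winner] // historical id merged away: follow the alias
		return name, ok
	}
	return "", false
}

func main() {
	canonical := map[string]string{"winner-1": "alice"}
	cache := map[string]string{"loser-9": "winner-1"}
	name, _ := findExternalEntityByID("loser-9", canonical, cache)
	fmt.Println(name) // alice
}
```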
Benchstat: ✅ No significant performance changes detected.
Warning: alerts were detected in dependencies. According to your organization's Security Policy, it is recommended to resolve "Warn" alerts.
Actionable comments posted: 10
Note: due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution: some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
scrapers/cron.go (1)
417-436: ⚠️ Potential issue | 🟠 Major

`ScraperSummaryCache` entries are not evicted on scraper deletion.

`DeleteScrapeJob` clears `scrapeJobs` and unschedules related jobs, but leaves entries in the now-exported `ScraperSummaryCache`. For long-running instances with scraper churn (CRD churn, renames, tenant teardown) this grows unbounded. Since the cache is now also reachable externally via the exported symbol, the leak is more likely to manifest.

🛠️ Suggested fix

```diff
 func DeleteScrapeJob(id string) {
 	logger.Debugf("deleting scraper job for %s", id)
+	ScraperSummaryCache.Delete(id)
+
 	if j, ok := scrapeJobs.Load(id); ok {
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scrapers/cron.go` around lines 417 - 436, DeleteScrapeJob currently unschedules and removes entries from scrapeJobs but never evicts entries from ScraperSummaryCache, causing an unbounded leak; update DeleteScrapeJob to call ScraperSummaryCache.Delete for the primary id and any derived keys you use (e.g., the same keys produced by consumeKubernetesWatchJobKey(id) and consumePubSubJobKey(id)) after you Unschedule/Delete each job (or once at the end) so cached summaries are removed when a scraper is deleted.

scrapers/http/http.go (1)
82-89: ⚠️ Potential issue | 🟡 Minor

Potential env-name collision with `last_scrape_summary`.

`ScraperTemplateEnv()` seeds `templateEnv` with the reserved key `last_scrape_summary`. If a user defines a `spec.Env` entry named `last_scrape_summary`, the loop at line 87 will silently overwrite it (or vice versa, depending on ordering). Low likelihood, but worth either documenting the reserved key or rejecting/warning on collision.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scrapers/http/http.go` around lines 82 - 89, the loop that populates templateEnv (returned by ScraperTemplateEnv) can overwrite the reserved key "last_scrape_summary"; update the loop in the function that iterates spec.Env to detect collisions before assigning into templateEnv (use env.Name and the map key lookup on templateEnv), and either return a clear error or emit a warning when an incoming env.Name equals "last_scrape_summary" or any existing templateEnv key; reference the existing GetEnvValueFromCache(ctx, env, ctx.Namespace()) usage and keep the same value-fetch logic but skip assignment on collision and handle it via error/warn as per project convention.

scrapers/slack/slack.go (1)
138-147: ⚠️ Potential issue | 🟡 Minor

Consider surfacing template errors instead of silently rejecting.

When `ctx.RunTemplate` fails (e.g., a malformed expression or missing variable), the filter now silently drops the message (`return false, nil`). For a misconfigured `filter.Expr` this will quietly skip every message in a channel with zero signal to the operator. Consider either returning the error (treated like a bad-filter exit in `processRule`) or at minimum logging it via `ctx.Logger` so that expression regressions are observable.

Also a minor typo on line 143: expresion → expression.

🛠️ Suggested change

```diff
-	output, err := ctx.RunTemplate(gomplate.Template{Expression: string(filter.Expr)}, message.AsMap())
-	if err != nil {
-		return false, nil
-	} else if parsed, err := strconv.ParseBool(output); err != nil {
-		return false, fmt.Errorf("expected expresion to return a boolean value: %w", err)
-	} else if !parsed {
-		return false, nil
-	}
+	output, err := ctx.RunTemplate(gomplate.Template{Expression: string(filter.Expr)}, message.AsMap())
+	if err != nil {
+		ctx.Logger.V(3).Infof("slack filter expression failed: %v", err)
+		return false, nil
+	}
+	parsed, err := strconv.ParseBool(output)
+	if err != nil {
+		return false, fmt.Errorf("expected expression to return a boolean value: %w", err)
+	}
+	if !parsed {
+		return false, nil
+	}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scrapers/slack/slack.go` around lines 138 - 147, the current filter block swallows template evaluation errors from ctx.RunTemplate (using filter.Expr) by returning false, nil; change this to surface failures by returning the error (so processRule can treat it as a bad-filter exit) or at minimum call ctx.Logger.Errorf(...) with the error and context (including filter.Expr and message.AsMap()) before returning false, err; also fix the typo in the error message in the strconv.ParseBool branch from "expresion" to "expression". Ensure you update the branch handling of ctx.RunTemplate, keeping references to ctx.RunTemplate, filter.Expr, message.AsMap(), and processRule.

cmd/run.go (1)
266-275: ⚠️ Potential issue | 🟡 Minor

UI mode suppresses the non-zero exit on errors.

When `uiEnabled` is true, the handler `return`s after receiving SIGINT/SIGTERM, bypassing the `if hasErrors { os.Exit(1) }` block below. If that's intentional (UI users expect graceful Ctrl-C even after failures), consider preserving the exit code for CI/scripted use, e.g. `os.Exit(1)` on `hasErrors` before returning. At minimum worth a comment clarifying the intent.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@cmd/run.go` around lines 266 - 275, the UI branch returns immediately after receiving SIGINT/SIGTERM which skips the subsequent hasErrors check and non-zero exit; update the uiServer signal handling (the sig channel and its receive block where uiServer != nil) to check hasErrors and call os.Exit(1) before returning (or add a clarifying comment if the behavior is intentional) so that failures still produce a non-zero exit code for CI/scripted runs; reference the uiServer variable, the sig channel/signal.Notify block, and the hasErrors/os.Exit(1) logic when making the change.

db/external_entities.go (1)
81-134: ⚠️ Potential issue | 🟠 Major

`groupIDMap` is built but not returned; `resolvedGroups` and `resolvedRoles` are never rewritten to winner IDs.

`upsertExternalEntities` builds and populates `groupIDMap` during the merge (loser→winner mappings from the groups merge block), but the function signature only returns a single map:

```go
func upsertExternalEntities( ... ) (upsertCounts, map[uuid.UUID]uuid.UUID, error) {
	// ...
	userIDMap := make(map[uuid.UUID]uuid.UUID)
	groupIDMap := make(map[uuid.UUID]uuid.UUID)
	// ...populate both maps...
	return counts, userIDMap, nil // only userIDMap returned
}
```

As a result:

- Only `resolvedUsers` is rewritten in the ID-rewrite loop at lines 112–116. `resolvedGroups` and `resolvedRoles` still carry their pre-merge (loser) IDs when published via `result.Groups.Entities`/`result.Roles.Entities` (lines 130, 133) and via `synced.Groups`/`synced.Roles` (lines 120–121). Downstream consumers (e.g. FK remapping in `SaveResults` that uses `configAccess.ExternalGroupID`/`ExternalRoleID`) receive IDs that may already be obsolete in the DB.
- The `synced.GroupIDMap` field — documented in the struct at lines 41–44 — is declared but never assigned anywhere, leaving it nil for callers that need it.
- The ID-cache writes at lines 97–103 insert loser→loser mappings for groups/roles that got merged, instead of loser→winner. Lookups via `findExternalEntityByID` return stale loser IDs from the cache until TTL eviction.

(Note: `groupIDMap` is used internally by `upsertExternalUserGroupsBlock` → `remapExternalUserGroups` at line 633, confirming the data exists and just isn't being propagated to the caller.)

Suggested fix

Expose both `userIDMap` and `groupIDMap` from `upsertExternalEntities`:

```diff
 func upsertExternalEntities( ...
-) (upsertCounts, map[uuid.UUID]uuid.UUID, error) {
+) (upsertCounts, map[uuid.UUID]uuid.UUID, map[uuid.UUID]uuid.UUID, error) {
 	...
-	return counts, userIDMap, nil
+	return counts, userIDMap, groupIDMap, nil
 }
```

Update the callsite in `syncExternalEntities` to receive and use both maps:

```diff
-	counts, idMap, err := upsertExternalEntities(ctx, resolvedUsers, resolvedGroups, resolvedRoles, resolvedUserGroups, scraperID)
+	counts, userIDMap, groupIDMap, err := upsertExternalEntities(ctx, resolvedUsers, resolvedGroups, resolvedRoles, resolvedUserGroups, scraperID)

-	for i := range resolvedUsers {
-		if winner, ok := idMap[resolvedUsers[i].ID]; ok {
-			resolvedUsers[i].ID = winner
+	for i := range resolvedUsers {
+		if winner, ok := userIDMap[resolvedUsers[i].ID]; ok {
+			resolvedUsers[i].ID = winner
+		}
+	}
+	for i := range resolvedGroups {
+		if winner, ok := groupIDMap[resolvedGroups[i].ID]; ok {
+			resolvedGroups[i].ID = winner
+		}
 	}
 	synced = syncedExternalEntities{
 		Users:  resolvedUsers,
 		Groups: resolvedGroups,
 		Roles:  resolvedRoles,
-		UserIDMap: idMap,
+		UserIDMap:  userIDMap,
+		GroupIDMap: groupIDMap,
 	}
```

Fix the cache eviction and population for groups (mirroring the user block):

```diff
 	if scraperID != nil {
-		for loserID := range idMap {
+		for loserID := range userIDMap {
 			ExternalUserIDCache.Delete(loserID.String())
 		}
+		for loserID, winnerID := range groupIDMap {
+			ExternalGroupIDCache.Delete(loserID.String())
+			ExternalGroupIDCache.Set(loserID.String(), winnerID, cache.DefaultExpiration)
+		}
 		...
 	}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@db/external_entities.go` around lines 81 - 134, upsertExternalEntities builds a groupIDMap (and possibly roleIDMap) but only returns the user ID map, so syncExternalEntities only rewrites resolvedUsers and writes loser→loser entries into caches; fix by changing upsertExternalEntities to return the additional ID maps (e.g., groupIDMap, roleIDMap) alongside the user map, update the call in syncExternalEntities to capture them, apply the same winner-ID rewrite loop to resolvedGroups and resolvedRoles (mirror the users loop), assign synced.GroupIDMap (and synced.RoleIDMap if present) from the returned maps, and when updating ExternalGroupIDCache/ExternalRoleIDCache use the winner ID (map lookup) instead of the original loser ID so caches store loser→winner mappings used by remapExternalUserGroups/upsertExternalUserGroupsBlock.
🟡 Minor comments (21)
config/schemas/config_playwright.schema.json (1)

722-729: ⚠️ Potential issue | 🟡 Minor

Remove or clarify the unused top-level `har` field in the Playwright config.

The `Playwright.har` field (lines 727-729) is defined in both the Go struct and JSON schema but is never used in the codebase. HAR recording is controlled exclusively by `ctx.IsTrace()` or `Trace.har` (inside `PlaywrightTrace`). The field should either be removed or marked as deprecated to avoid user confusion.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@config/schemas/config_playwright.schema.json` around lines 722 - 729, the top-level Playwright.har field is unused (HAR is controlled by ctx.IsTrace() and Trace.har inside PlaywrightTrace), so remove or deprecate it: update the JSON schema entry for "har" (and the corresponding Go struct field Playwright.har) to either delete that property entirely or replace it with a deprecated boolean (add "deprecated": true and update the description to point users to Trace.har and ctx.IsTrace()), and ensure any validation or unmarshalling code that references Playwright.har is removed or adjusted to use PlaywrightTrace.har instead so no dead/duplicated config remains; also update any docs/messages to refer to Trace.har.

cmd/scrapeui/frontend/src/components/AliasList.tsx (1)
12-21: ⚠️ Potential issue | 🟡 Minor

Silent clipboard failure & minor a11y gap.

`navigator.clipboard.writeText(alias)` returns a Promise that can reject (no permission, insecure context, or API unavailable), and the rejection is currently swallowed. Consider catching the rejection and surfacing a fallback/toast. Also, for screen readers, the icon-only button benefits from `aria-label` in addition to `title`.

♻️ Suggested change

```diff
 <button
   class="text-gray-300 hover:text-gray-600 opacity-0 group-hover:opacity-100 transition-opacity shrink-0"
   title="Copy"
+  aria-label={`Copy ${alias}`}
   onClick={(e) => {
     e.stopPropagation();
-    navigator.clipboard.writeText(alias);
+    navigator.clipboard?.writeText(alias).catch((err) => {
+      console.warn('Failed to copy alias', err);
+    });
   }}
 >
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@cmd/scrapeui/frontend/src/components/AliasList.tsx` around lines 12 - 21, update the copy button handler to handle clipboard failures and improve accessibility: wrap navigator.clipboard.writeText(alias) in a promise catch (or use async/await try/catch) and on rejection show a fallback (e.g., programmatic selection + execCommand or dispatch a toast/error notification) so errors aren't swallowed, and add an aria-label (e.g., aria-label="Copy alias") to the icon-only <button> (keep the existing title) to support screen readers.

cmd/scrapeui/frontend/src/components/AccessLogTable.tsx (1)
99-99: ⚠️ Potential issue | 🟡 Minor

Use a stable row key so expansion state follows the entry, not the position.

Keying `AccessLogRow` by array index means that when the user changes the sort column, the local `open` state of each row (`useState` in `AccessLogRow`) stays bound to the row position, not the underlying entry — so the wrong row appears expanded after sorting. Prefer a stable key derived from the entry (e.g., a composite of `external_config_id`, `external_user_id`, and `created_at`, or a backend-supplied id if available).

🛠️ Suggested change

```diff
-{sorted.map((e, i) => <AccessLogRow key={i} entry={e} lookups={lookups} />)}
+{sorted.map((e, i) => (
+  <AccessLogRow
+    key={`${e.external_config_id ?? ''}-${e.external_user_id ?? ''}-${e.created_at ?? i}`}
+    entry={e}
+    lookups={lookups}
+  />
+))}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@cmd/scrapeui/frontend/src/components/AccessLogTable.tsx` at line 99, the list is keyed by array index so AccessLogRow's internal open state follows row position instead of the actual entry; change the key in the sorted.map call that renders AccessLogRow to a stable identifier derived from the entry (for example use a composite of entry.external_config_id + '|' + entry.external_user_id + '|' + entry.created_at, or use a backend-supplied id if present) so expansion state follows the entry rather than its sorted position.

cmd/scrapeui/frontend/src/components/HARPanel.tsx (1)
133-135: ⚠️ Potential issue | 🟡 Minor

Index-as-key on sorted+filtered rows.

`key={i}` on `sorted.map((e, i) => <HARRow …/>)` means row identity changes whenever the list is re-sorted or filtered. Since `HARRow` holds local `open` state, the wrong row can appear expanded after a sort/filter change. Prefer a stable key derived from the entry (e.g., `startedDateTime + url`) or an id if available.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@cmd/scrapeui/frontend/src/components/HARPanel.tsx` around lines 133 - 135, the rows use an unstable index key (key={i}) in sorted.map((e, i) => <HARRow … />), which causes HARRow's local open state to mix up after sorting/filtering; change the key to a stable identifier derived from the entry (for example combine entry.startedDateTime and entry.request.url or use an existing entry.id) when rendering <HARRow>, i.e., replace key={i} with a stable expression like `${entry.startedDateTime}-${entry.request.url}` (or the id field) so each row keeps its identity across reorders/filters.

cmd/scrapeui/frontend/src/components/ConfigNode.tsx (1)
31-31: ⚠️ Potential issue | 🟡 Minor

`isNew` heuristic risks falsely flagging every persisted item.

`!item.Action && !!item.created_at` will classify essentially every historical config item (all of which have a `created_at`) as "new" whenever the scrape result didn't set `Action`. If the intent is "created in this scrape window", gate on the snapshot's window (e.g., compare `created_at` against a window boundary) or rely strictly on `item.Action === 'inserted'`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@cmd/scrapeui/frontend/src/components/ConfigNode.tsx` at line 31, the isNew heuristic in ConfigNode.tsx (the isNew variable using item.Action and item.created_at) wrongly marks all persisted items as new; change it to either rely strictly on item.Action === 'inserted' or, if you really need to treat items created during the current scrape as new, compare item.created_at against the scrape snapshot window boundaries (e.g., snapshotWindowStart/snapshotWindowEnd) instead of merely checking for presence of created_at; update the isNew assignment to use that comparison or drop the created_at check so only item.Action === 'inserted' signals new.

cmd/scrapeui/frontend/src/components/ConfigNode.tsx (1)
29-29: ⚠️ Potential issue | 🟡 Minor

`isSelected` is incorrectly true when `selected` is `null` and the item has no id/config_type.

`selected?.id === item.id && selected?.config_type === item.config_type` evaluates to `undefined === undefined` → `true` for any `item` whose `id` and `config_type` are undefined while nothing is selected. Aggregated/metadata-only ScrapeResults (e.g., the synthetic entries from the `unmarshalConfigsFromString` metadata-only path covered in `scrapers/processors/script_test.go`) can have empty `id`/`config_type`, which would light them all up as "selected" in the tree.

🐛 Proposed fix

```diff
-const isSelected = selected?.id === item.id && selected?.config_type === item.config_type;
+const isSelected =
+  selected != null &&
+  selected.id === item.id &&
+  selected.config_type === item.config_type;
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@cmd/scrapeui/frontend/src/components/ConfigNode.tsx` at line 29, the isSelected expression incorrectly returns true when both selected and item have undefined id/config_type; change it to explicitly require a non-null selected before comparing fields (e.g., check selected != null or !!selected) and then compare selected.id === item.id && selected.config_type === item.config_type so that items with undefined id/config_type are not treated as selected when nothing is selected; update the isSelected variable in ConfigNode (the declaration using selected and item) accordingly.

cmd/ui.go (1)
24-38: ⚠️ Potential issue | 🟡 Minor

`Artifacts` is deserialized but never propagated.

`jsonResults.Artifacts` is parsed from the input JSON but not copied into `scrapeui.Snapshot` (lines 55-70). Either wire it through (if the UI is meant to show artifacts from saved runs) or drop the field to avoid confusing silent drops.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@cmd/ui.go` around lines 24 - 38, jsonResults.Artifacts is parsed from input JSON but never copied into the scrapeui.Snapshot, so artifacts are silently dropped; update the code that maps jsonResults -> scrapeui.Snapshot (where scrapeui.Snapshot is constructed/filled) to include the Artifacts field (copy jsonResults.Artifacts into the corresponding scrapeui.Snapshot.Artifacts or equivalent field) or remove the Artifacts field from jsonResults if artifacts are not intended to be propagated.

db/scrape_snapshot_test.go (1)
295-295: ⚠️ Potential issue | 🟡 Minor

Vacuous assertion — `int >= 0` is always true.

Based on the seed (the single group in-run is `lastHour`, outside the run-start window), the intent looks like asserting `Equal(0)` or at least a tighter bound. As written it's a no-op and will pass on any value.

```diff
-Expect(snap.ExternalGroups.UpdatedLast).To(BeNumerically(">=", 0))
+// seeded group's updated_at is before runStart, so it must not land in Last
+Expect(snap.ExternalGroups.UpdatedLast).To(BeNumerically("<", 1_000_000))
```

(Or remove the line entirely if no meaningful upper bound is asserted elsewhere.)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@db/scrape_snapshot_test.go` at line 295, the assertion Expect(snap.ExternalGroups.UpdatedLast).To(BeNumerically(">=", 0)) is vacuous; change it to assert the intended value based on the test seed (e.g., Expect(snap.ExternalGroups.UpdatedLast).To(Equal(0)) if the single seeded group is outside the run-start window) or tighten the bound to the correct expected range for ExternalGroups.UpdatedLast so the test actually validates behavior.

cmd/scrapeui/frontend/src/components/AccessTable.tsx (1)
80-96: ⚠️ Potential issue | 🟡 Minor

Inconsistent nullability on `entries` — possible NPE if undefined + search is set.

The `Props` type declares `entries: ExternalConfigAccess[]` (required), yet line 94 guards against `!entries`. Meanwhile the `useMemo` on line 81 runs before the guard and calls `entries.filter(...)` when `search` is truthy — if a caller ever passes `undefined`, this throws before the empty-state renders. Either tighten the type (`entries?: ExternalConfigAccess[]`) and guard in the memo, or drop the defensive check on line 94.

🔧 Proposed fix

```diff
-  const filtered = useMemo(() => {
-    if (!search) return entries;
-    return entries.filter(e =>
+  const filtered = useMemo(() => {
+    if (!entries) return [];
+    if (!search) return entries;
+    return entries.filter(e =>
       matchesSearch(search,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/scrapeui/frontend/src/components/AccessTable.tsx` around lines 80 - 96, Props currently claims entries is required but the component defensively checks !entries and useMemo assumes entries exists; make the contract consistent. Either update the Props declaration (entries?: ExternalConfigAccess[]) and change the useMemo implementation in AccessTable to treat entries as empty array when undefined (e.g., use (entries || []) before filter) so the memo and empty-state guard are safe, or keep entries required and remove the defensive !entries check and any branches that handle undefined; reference AccessTable, Props, useMemo, and the filtered variable to locate and update the code accordingly.

cmd/scrapeui/frontend/vite.config.ts-103-108 (1)
103-108: ⚠️ Potential issue | 🟡 Minor

`snap.config_meta` is never populated.

The `snap` object built at lines 38–68 has no `config_meta` field, so `(snap.config_meta || {})[id]` always resolves to `undefined` and `_meta` is never useful. Either wire `raw.ConfigMeta || raw.config_meta` into the snapshot like the other fields, or drop `_meta` from the detail payload.

🔧 Proposed fix

```diff
   relationships: raw.Relationships || raw.relationships || [],
+  config_meta: raw.ConfigMeta || raw.config_meta || {},
   counts: {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/scrapeui/frontend/vite.config.ts` around lines 103 - 108, The snapshot object snap (constructed earlier) never includes config_meta, so detail's _meta = (snap.config_meta || {})[id] is always undefined; fix by wiring raw.ConfigMeta || raw.config_meta into the snapshot when building snap (mirror how other fields from raw are assigned) so snap.config_meta exists and _meta is populated, or alternatively remove the _meta property from the detail payload; update the code that constructs snap (referencing raw and snap) to include config_meta if you choose the first option.

scrapers/azure/devops/pipelines.go-615-626 (1)
615-626: ⚠️ Potential issue | 🟡 Minor

`access.ExternalUserAliases = []string{email}` can store a single empty alias.

Same root cause as the user record above — when `email == ""`, the emitted `ExternalConfigAccess` row has `ExternalUserAliases = [""]`. If the matcher treats empty aliases as "match anything," this creates cross-linking risk for access rows. Derive the alias list from the same (filtered) slice used for the user record.

🔧 Suggested fix

```diff
- if identity.IsContainer {
- 	access.ExternalGroupAliases = DescriptorAliases(identity.Descriptor)
- } else {
- 	access.ExternalUserAliases = []string{email}
- }
+ if identity.IsContainer {
+ 	access.ExternalGroupAliases = DescriptorAliases(identity.Descriptor)
+ } else if email != "" {
+ 	access.ExternalUserAliases = []string{email}
+ } else {
+ 	access.ExternalUserAliases = []string{identity.Descriptor}
+ }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scrapers/azure/devops/pipelines.go` around lines 615 - 626, The code creates ExternalConfigAccess with ExternalUserAliases set to []string{email} which can insert an empty alias when email == ""; change this to derive the user-alias list the same way the user record does (or compute a filtered slice) and only set access.ExternalUserAliases when the resulting slice is non-empty—i.e., compute userAliases by filtering out empty strings (reusing the same slice used for the user record), then if len(userAliases) > 0 assign access.ExternalUserAliases = userAliases; keep the existing branch for identity.IsContainer that uses DescriptorAliases(identity.Descriptor), and append access to configAccess as before.

scrapers/azure/devops/groups.go-141-159 (1)
141-159: ⚠️ Potential issue | 🟡 Minor

`email = name` fallback stores the display name in the Email field.

When `identity.Properties["Mail"]` is absent, `name` (e.g. "Jane Doe") is used as the fallback and stored as `ExternalUser.Email`. That is not an email address, and because `addUser` only backfills when the new value is empty (`if u.Email == nil || *u.Email == ""`), a later observation of the same descriptor with a real email will never replace it.

🔧 Suggested fix

```diff
- name := ResolvedIdentityName(identity, "")
- email := ""
- if mail, ok := identity.Properties["Mail"]; ok {
- 	email = mail.Value
- }
- if email == "" && name == "" {
- 	continue
- }
- if email == "" {
- 	email = name
- }
-
- userAliases := uniqueAliases(
- 	email,
- 	identity.Descriptor,
- 	identity.SubjectDescriptor,
- )
- addUser(identity.Descriptor, dutyModels.ExternalUser{
- 	Name:  name,
- 	Email: &email,
+ name := ResolvedIdentityName(identity, "")
+ email := ""
+ if mail, ok := identity.Properties["Mail"]; ok {
+ 	email = mail.Value
+ }
+ if email == "" && name == "" {
+ 	continue
+ }
+
+ var emailPtr *string
+ if email != "" {
+ 	emailPtr = &email
+ }
+ userAliases := uniqueAliases(
+ 	email,
+ 	identity.Descriptor,
+ 	identity.SubjectDescriptor,
+ )
+ addUser(identity.Descriptor, dutyModels.ExternalUser{
+ 	Name:  name,
+ 	Email: emailPtr,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scrapers/azure/devops/groups.go` around lines 141 - 159, The code currently falls back to email = name when Mail is missing, causing display names to be stored as ExternalUser.Email and preventing later real emails from replacing it; update the logic in the block around the email variable and the call to addUser so that if email == "" you do NOT assign name to email and instead pass a nil Email pointer into dutyModels.ExternalUser (e.g. compute an emailPtr that is nil when email is empty, or only set &email when non-empty) while keeping the same aliases and descriptors (identity.Descriptor, identity.SubjectDescriptor) and still using uniqueAliases and addUser.

scrapers/azure/devops/pipelines.go-575-600 (1)
575-600: ⚠️ Potential issue | 🟡 Minor

Empty email leaks into user aliases.

`emailFromIdentity(identity)` can return `""` (only the combination where `name` is also empty is gated by the `continue` below; when `name != ""` we fall through with `email == ""`). The non-container branch then writes `pq.StringArray{email, identity.Descriptor, identity.SubjectDescriptor}`, producing an empty-string alias. An empty alias participates in overlap-based merges and will create spurious matches.

Filter empties (e.g. reuse `uniqueAliases` from `groups.go`) and pass a nil `Email`, mirroring `scrapers/azure/devops/groups.go`.

🔧 Suggested fix

```diff
- } else {
- 	ctx.AddUser(dutyModels.ExternalUser{
- 		Name:     name,
- 		Email:    &email,
- 		Aliases:  pq.StringArray{email, identity.Descriptor, identity.SubjectDescriptor},
- 		Tenant:   config.Organization,
- 		UserType: "AzureDevOps",
- 	})
- }
+ } else {
+ 	var emailPtr *string
+ 	if email != "" {
+ 		emailPtr = &email
+ 	}
+ 	ctx.AddUser(dutyModels.ExternalUser{
+ 		Name:     name,
+ 		Email:    emailPtr,
+ 		Aliases:  pq.StringArray(uniqueAliases(email, identity.Descriptor, identity.SubjectDescriptor)),
+ 		Tenant:   config.Organization,
+ 		UserType: "AzureDevOps",
+ 	})
+ }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scrapers/azure/devops/pipelines.go` around lines 575 - 600, The user branch can insert an empty-string alias because emailFromIdentity(identity) may return ""; update the non-container path around ResolvedIdentityName/ emailFromIdentity so that Email is set to nil when email == "" and the Aliases passed to ctx.AddUser (call site: ctx.AddUser in this file) are deduplicated/filtered to remove empty strings (reuse the uniqueAliases helper pattern used in groups.go or implement equivalent filtering for pq.StringArray{...} to exclude ""). Ensure aliases still include identity.Descriptor and identity.SubjectDescriptor but never include "".

scrapers/processors/json.go-464-469 (1)
464-469: ⚠️ Potential issue | 🟡 Minor

`TransformOutput` aliases the same map that downstream mutations will edit in place.

`scriptResult.Results[i].Config` is assigned to `TransformOutput` at line 466, storing a reference to the same map object. However, `applyMask` (called at line 529) mutates `results[i].Config` in place via `m.JSONPath.Set(results[i].Config, ...)` (lines 859, 863), which will also mutate `TransformOutput`. If `TransformOutput` is intended as a diagnostic snapshot of what the script emitted before masking, it should be deep-copied rather than aliased.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scrapers/processors/json.go` around lines 464 - 469, scriptResult.Results[i].TransformOutput is currently set to the same map as scriptResult.Results[i].Config, so later in-place mutations by applyMask (via m.JSONPath.Set in applyMask) also modify the stored TransformOutput; fix this by making TransformOutput a deep copy/snapshot of Config when setting it (e.g., clone the map/struct contents or marshal-unmarshal to create a new map) before any masking occurs so applyMask's modifications to results[i].Config do not alter TransformOutput; update the assignment in the loop that sets TransformOutput for scriptResult.Results to use that deep-copy method.

cmd/scrapeui/frontend/src/App.tsx-178-199 (1)
178-199: ⚠️ Potential issue | 🟡 Minor

Object fallback coerces a struct into `extId` and never matches.

`const extId = (a.external_config_id as any)?.external_id || a.external_config_id;` — if `external_config_id` is an object without an `external_id` field (e.g., only `config_id` populated), this assigns the entire object to `extId`, after which `cfg.id === extId` is always false. Same pattern used twice (access + logs). `matchesConfig` in `DetailPanel.tsx` handles this by probing `ext.config_id` as well; this helper should mirror that.

♻️ Proposed fix

```diff
-  for (const a of access) {
-    const extId = (a.external_config_id as any)?.external_id || a.external_config_id;
-    if (!extId) continue;
+  const extractExtId = (a: any): string | undefined => {
+    const ext = a.external_config_id;
+    if (typeof ext === 'string') return ext;
+    if (ext && typeof ext === 'object') return ext.external_id || ext.config_id;
+    return a.config_id;
+  };
+  for (const a of access) {
+    const extId = extractExtId(a);
+    if (!extId) continue;
```

Apply the same helper for the `logs` loop below.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/scrapeui/frontend/src/App.tsx` around lines 178 - 199, The code in the access and logs loops computes extId using (a.external_config_id as any)?.external_id || a.external_config_id which wrongly assigns the whole object when external_config_id lacks external_id, so cfg.id === extId never matches; update both loops to mirror the probing used by matchesConfig in DetailPanel.tsx (i.e., extract external_config_id.external_id, then external_config_id.config_id, then fall back to a primitive id) so extId is always a string/number id before comparing to cfg.id and then incrementing the counters (references: the access loop variable a, logs loop variable l, external_config_id, extId, configs, zero(), m.get, c.access and c.accessLogs).

api/v1/scrape_snapshot.go-40-55 (1)
40-55: ⚠️ Potential issue | 🟡 Minor

`Sub` copies `LastCreatedAt`/`LastUpdatedAt` from the "after" side, which defeats the `IsZero` filter in `diffMaps`.

`diffMaps` only keeps entries where `delta.IsZero()` is false (line 143). But `Sub` unconditionally copies the after-side timestamps into the result, and `IsZero()` returns false whenever either timestamp pointer is non-nil (line 37). Consequence: a `PerScraper`/`PerConfigType` key whose counter values are identical in before and after still produces a "diff" entry (with all counter deltas = 0) as long as the after snapshot recorded a last-created/updated timestamp — which is the common case. The compact-diff guarantee in the comment above `diffMaps` effectively never applies.

♻️ Suggested fixes (pick one)

```diff
 func (e EntityWindowCounts) Sub(other EntityWindowCounts) EntityWindowCounts {
 	return EntityWindowCounts{
 		Total: e.Total - other.Total,
 		...
-		LastCreatedAt: e.LastCreatedAt,
-		LastUpdatedAt: e.LastUpdatedAt,
+		// Only carry timestamps when the counter state actually changed,
+		// otherwise an unchanged entry shows up in the diff.
+		LastCreatedAt: timestampIfChanged(e, other, e.LastCreatedAt),
+		LastUpdatedAt: timestampIfChanged(e, other, e.LastUpdatedAt),
 	}
 }
```

Or, alternatively, exclude timestamps from `IsZero`:

```diff
 func (e EntityWindowCounts) IsZero() bool {
 	return e.Total == 0 && e.UpdatedLast == 0 && e.UpdatedHour == 0 && e.UpdatedDay == 0 && e.UpdatedWeek == 0 &&
-		e.DeletedLast == 0 && e.DeletedHour == 0 && e.DeletedDay == 0 && e.DeletedWeek == 0 &&
-		e.LastCreatedAt == nil && e.LastUpdatedAt == nil
+		e.DeletedLast == 0 && e.DeletedHour == 0 && e.DeletedDay == 0 && e.DeletedWeek == 0
 }
```

The second option keeps snapshot IsZero semantics intact for diffs while preserving timestamps for Pretty rendering.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/v1/scrape_snapshot.go` around lines 40 - 55, Sub currently always copies LastCreatedAt/LastUpdatedAt from the "after" side which makes IsZero() return false even when all count deltas are zero; change EntityWindowCounts.Sub so it still computes numeric deltas as it does but only sets LastCreatedAt and LastUpdatedAt in the result when the timestamps actually differ between e and other (otherwise leave them nil), so diffMaps plus IsZero() will correctly suppress entries whose counters and timestamps didn't change; reference: EntityWindowCounts.Sub, EntityWindowCounts.IsZero, and diffMaps.

cmd/scrapeui/server.go-248-253 (1)
248-253: ⚠️ Potential issue | 🟡 Minor

SSE fan-out: only one subscriber wakes up per `notify()`.

`updated` is a single `chan struct{}` with buffer 1, shared across all SSE connections. When `notify()` fires, whichever `handleSSE` goroutine happens to be parked in the `select` drains it; any other connected clients won't see that update until the 2-second ticker tick. With a single viewer this is fine, but anything multi-tab or multi-client becomes noticeably laggy.

Consider maintaining a per-connection subscriber channel (subscribe on enter, unsubscribe on exit) and fan-out in `notify()`.

Also applies to: 428-452
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/scrapeui/server.go` around lines 248 - 253, The notify() function currently writes to a single buffered channel s.updated which only wakes one handleSSE goroutine, causing missed updates for other SSE clients; change the design to maintain per-connection subscriber channels (e.g., have Server hold a map or slice of subscriber chans and methods subscribe/unsubscribe used by handleSSE when a client connects/disconnects) and update notify() to iterate over all subscriber channels and non-blockingly send the update (dropping to avoid blocking slow clients). Update handleSSE to create and register its own chan struct{} on entry and deregister it on exit so every connected client receives immediate fan-out notifications.

scrapers/extract/extract.go-60-79 (1)
60-79: ⚠️ Potential issue | 🟡 Minor

`AddWarning` keys by `w.Error` — empty error strings collapse unrelated warnings.

Every call path in `validateConfigRefs` supplies a non-empty `Error`, so this is not hit today, but `AddWarning` is exported and there is no guard. A caller that accidentally passes `v1.Warning{Result: r}` (`Error` omitted) will silently have all such entries deduped into a single aggregated record keyed by `""`. Consider rejecting or at minimum labelling empty-error warnings (e.g. fall back to `"unknown"`) to avoid misleading counts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scrapers/extract/extract.go` around lines 60 - 79, The AddWarning method keys warnings by w.Error which lets empty strings collapse unrelated warnings; update AddWarning (and its use of warningIndex, warningCount, Warnings, and v1.Warning) to guard against empty Error values by normalizing them to a meaningful sentinel (e.g. "unknown_error" or "unnamed_warning") before indexing or aggregating — alternatively return/early-reject when w.Error == "" if you prefer strictness; ensure the normalized key is used for warningIndex lookups and when assigning e.Warnings entries so counts are not silently merged.

db/update.go-688-693 (1)
688-693: ⚠️ Potential issue | 🟡 Minor

Warning transform attribution is taken from an unrelated result.

`extractResult.transformInput`/`transformExpr` are populated by the first scrape result that carries a non-nil transform (see extraction path assigning once). The warnings created here for `config_access`/`access_log` rows typically originate from arbitrary other results in the same batch, so a consumer reading the warning's `Input`/`Expr` will see fields from a result that has nothing to do with the skipped row.

If the intent is diagnostic, consider either (a) dropping `Input`/`Expr` on these warnings (leave `Result` only), or (b) tracking transform context per-result and looking up the correct one for the row being warned about.

Also applies to: 735-740
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@db/update.go` around lines 688 - 693, The warning currently uses extractResult.transformInput/transformExpr (set once from the first transform-bearing scrape) when calling summary.AddScrapeWarning for configAccess/access_log rows, which misattributes Input/Expr to unrelated results; either remove populating the Warning.Input and Warning.Expr fields in these AddScrapeWarning calls (leave Error and Result only), or modify the code to look up and attach the transform context tied to the specific row (e.g., maintain per-result transformContext and use that instead of extractResult.transformInput/transformExpr when constructing the Warning for configAccess/access_log); update both places where AddScrapeWarning is invoked for configAccess and access_log rows accordingly.

scrapers/extract/extract.go-120-134 (1)
120-134: ⚠️ Potential issue | 🟡 Minor

`initWarningIndex` leaves stale duplicates when `Warnings` is pre-populated.

If `e.Warnings` already contains two entries with the same `Error` (e.g. restored from a marshaled `ExtractedConfig`), only the first is added to `warningIndex`. Later `AddWarning` calls will bump only that first entry's `Count`, and the duplicate rows remain with their original counts, producing inconsistent totals.

♻️ Fold duplicates on init

```diff
- e.warningIndex = make(map[string]int, len(e.Warnings))
- for i := range e.Warnings {
- 	if _, ok := e.warningIndex[e.Warnings[i].Error]; ok {
- 		continue
- 	}
- 	if e.Warnings[i].Count == 0 {
- 		e.Warnings[i].Count = 1
- 	}
- 	e.warningIndex[e.Warnings[i].Error] = i
- }
+ e.warningIndex = make(map[string]int, len(e.Warnings))
+ deduped := e.Warnings[:0]
+ for i := range e.Warnings {
+ 	w := e.Warnings[i]
+ 	if w.Count == 0 {
+ 		w.Count = 1
+ 	}
+ 	if idx, ok := e.warningIndex[w.Error]; ok {
+ 		deduped[idx].Count += w.Count
+ 		continue
+ 	}
+ 	e.warningIndex[w.Error] = len(deduped)
+ 	deduped = append(deduped, w)
+ }
+ e.Warnings = deduped
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scrapers/extract/extract.go` around lines 120 - 134, initWarningIndex currently builds e.warningIndex but leaves duplicate entries in e.Warnings (when e.Warnings was pre-populated), so future AddWarning increments only the first duplicate and totals are inconsistent; fix initWarningIndex to fold duplicates by scanning e.Warnings, aggregating Counts per unique Warning.Error (treat Count==0 as 1), produce a new deduplicated slice of warnings with summed counts, assign that back to e.Warnings, then rebuild e.warningIndex mapping each unique Error to its index; update references to ExtractedConfig.warningIndex and methods like AddWarning to rely on the deduplicated list.

db/update.go-792-814 (1)
792-814: ⚠️ Potential issue | 🟡 Minor

`AccessLogs.LastCreatedAt` is advanced for rows that never got saved, but appears unused.

This loop walks `extractResult.configAccessLogs` — the input set — and picks the max `CreatedAt` per config type before `SaveConfigAccessLogs` runs at line 835. Any row that later hits an unresolved config, an unresolved user, or the FK-violation fallback path inside `SaveConfigAccessLogs` (lines 1510-1523) still contributes to this watermark. Combined with the retention in lines 1047-1049 (`summary.AccessLogs.LastCreatedAt` is propagated to the next scrape), the watermark becomes semantically incorrect.

However, extensive code search found no scraper definitions, templates, or code paths that actually use `last_scrape_summary.AccessLogs.LastCreatedAt` as a cursor for incremental fetching. The watermark appears to be unused in practice, so while the logic is wrong, data loss is not observed. Nevertheless, compute `LastCreatedAt` from rows that actually made it into `SaveConfigAccessLogs` for semantic correctness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@db/update.go` around lines 792 - 814, The loop computing accessLogMaxTime wrongly uses extractResult.configAccessLogs (input rows) so rows that never get persisted still advance summary.ConfigTypes[...].AccessLogs.LastCreatedAt and summary.AccessLogs.LastCreatedAt; move this watermark computation to use the actual persisted rows (the output of SaveConfigAccessLogs) instead of extractResult.configAccessLogs — i.e., after calling SaveConfigAccessLogs, iterate the slice/rows that were saved (or the function return value) and compute per-config max CreatedAt, then update summary.ConfigTypes[configType].AccessLogs.LastCreatedAt and summary.AccessLogs.LastCreatedAt accordingly (referencing extractResult.configAccessLogs, SaveConfigAccessLogs, summary.ConfigTypes, and summary.AccessLogs.LastCreatedAt to locate the affected code).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4e8c685a-4e9c-4f6f-93aa-19a5c61416cc
⛔ Files ignored due to path filters (2)
- `cmd/scrapeui/frontend/package-lock.json` is excluded by `!**/package-lock.json`
- `go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (92)
`.gitignore`, `Makefile`, `PROPERTIES.md`, `api/context.go`, `api/context_test.go`, `api/v1/azure.go`, `api/v1/interface.go`, `api/v1/interface_test.go`, `api/v1/scrape_snapshot.go`, `api/v1/scrape_snapshot_pretty.go`, `api/v1/scrape_snapshot_test.go`, `api/v1/summary_pretty.go`, `api/v1/summary_pretty_test.go`, `cmd/root.go`, `cmd/run.go`, `cmd/scrapeui/assets.go`, `cmd/scrapeui/convert.go`, `cmd/scrapeui/frontend/.gitignore`, `cmd/scrapeui/frontend/package.json`, `cmd/scrapeui/frontend/src/App.tsx`, `cmd/scrapeui/frontend/src/components/AccessLogTable.tsx`, `cmd/scrapeui/frontend/src/components/AccessTable.tsx`, `cmd/scrapeui/frontend/src/components/AliasList.tsx`, `cmd/scrapeui/frontend/src/components/AnsiHtml.tsx`, `cmd/scrapeui/frontend/src/components/ConfigNode.tsx`, `cmd/scrapeui/frontend/src/components/ConfigTree.tsx`, `cmd/scrapeui/frontend/src/components/DetailPanel.tsx`, `cmd/scrapeui/frontend/src/components/EntityTable.tsx`, `cmd/scrapeui/frontend/src/components/FilterBar.tsx`, `cmd/scrapeui/frontend/src/components/HARPanel.tsx`, `cmd/scrapeui/frontend/src/components/JsonView.tsx`, `cmd/scrapeui/frontend/src/components/ScrapeConfigPanel.tsx`, `cmd/scrapeui/frontend/src/components/ScraperList.tsx`, `cmd/scrapeui/frontend/src/components/SnapshotPanel.tsx`, `cmd/scrapeui/frontend/src/components/SplitPane.tsx`, `cmd/scrapeui/frontend/src/components/Summary.tsx`, `cmd/scrapeui/frontend/src/globals.d.ts`, `cmd/scrapeui/frontend/src/hooks/useRoute.ts`, `cmd/scrapeui/frontend/src/hooks/useSort.tsx`, `cmd/scrapeui/frontend/src/index.tsx`, `cmd/scrapeui/frontend/src/types.ts`, `cmd/scrapeui/frontend/src/utils.ts`, `cmd/scrapeui/frontend/tsconfig.json`, `cmd/scrapeui/frontend/vite.config.ts`, `cmd/scrapeui/server.go`, `cmd/scrapeui/types.go`, `cmd/ui.go`, `config/schemas/config_azuredevops.schema.json`, `config/schemas/config_playwright.schema.json`, `config/schemas/scrape_config.schema.json`, `db/change_traversal.go`, `db/external_cache.go`, `db/external_entities.go`, `db/external_entities_test.go`, `db/external_loser_alias_test.go`, `db/permission_changes.go`, `db/scrape_snapshot.go`, `db/scrape_snapshot_test.go`, `db/update.go`, `go.mod`, `scrapers/azure/active_directory.go`, `scrapers/azure/azure.go`, `scrapers/azure/devops/audit_log.go`, `scrapers/azure/devops/audit_log_test.go`, `scrapers/azure/devops/client.go`, `scrapers/azure/devops/groups.go`, `scrapers/azure/devops/groups_test.go`, `scrapers/azure/devops/pipelines.go`, `scrapers/azure/devops/releases.go`, `scrapers/azure/devops/repositories.go`, `scrapers/azure/devops/repositories_test.go`, `scrapers/changes/rules.go`, `scrapers/changes/rules_test.go`, `scrapers/config_access_test.go`, `scrapers/cron.go`, `scrapers/extract/extract.go`, `scrapers/extract/extract_test.go`, `scrapers/extract/fullmode.go`, `scrapers/extract/testdata/unit/access_invalid_ref.yaml`, `scrapers/extract/testdata/unit/access_logs_basic.yaml`, `scrapers/extract/testdata/unit/analysis_invalid_ref.yaml`, `scrapers/extract/testdata/unit/analysis_no_parent.yaml`, `scrapers/extract/testdata/unit/changes_invalid_ref.yaml`, `scrapers/extract/testdata/unit/changes_missing_change_id.yaml`, `scrapers/extract/testdata/unit/changes_missing_change_type.yaml`, `scrapers/extract/testdata/unit/changes_no_parent.yaml`, `scrapers/http/http.go`, `scrapers/processors/json.go`, `scrapers/processors/script.go`, `scrapers/processors/script_test.go`, `scrapers/run.go`, `scrapers/slack/slack.go`
```
*.har
*.png
.playwright-mcp/
*.json
```
*.json is too broad for a repo-wide ignore
Ignoring all JSON files can silently block committing legitimate source/config artifacts and future tooling manifests. Please scope this to generated paths/files only.
Proposed scoped change
```diff
-*.json
+# Generated JSON artifacts only (adjust paths as needed)
+gh-api-data/*.json
+.playwright-mcp/**/*.json
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```
# Generated JSON artifacts only (adjust paths as needed)
gh-api-data/*.json
.playwright-mcp/**/*.json
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.gitignore at line 61, Remove the repo-wide "*.json" ignore entry and
replace it with scoped ignore patterns that target only generated or tooling
artifacts (e.g., CI/build output, generated schemas) rather than all JSON files;
update the .gitignore by deleting the "*.json" line and adding specific paths or
filenames such as the build/output directories or known generated files (for
example: generated/, dist/, build/, .cache/, or specific file names like
swagger.json or package-lock.json) so source/config JSON files remain tracked.
```go
if save && dutyapi.DefaultConfig.ConnectionString != "" && summary != nil {
	history := models.NewJobHistory(logger.StandardLogger(), "scraper", job.ResourceTypeScraper, scraperUID)
	history.Start()
	history.SuccessCount = len(results)
	history.AddDetails("scrape_summary", *summary)
	if hasErrors {
		history.AddError(err.Error())
	}
	history.End()
	if persistErr := history.Persist(dutyCtx.DB()); persistErr != nil {
		logger.Warnf("failed to persist job history: %v", persistErr)
	}
	scrapers.ScraperSummaryCache.Store(scraperUID, *summary)
}
```
Nil-pointer panic: `hasErrors` is cross-iteration but `err` is per-iteration.

`hasErrors` is set once to true and never reset, so after any earlier scraper fails, every subsequent iteration that succeeds (`err == nil`, `summary != nil`) still enters the `if hasErrors` branch and dereferences `err.Error()` → panic.
Gate on the current iteration's error instead.
🐛 Proposed fix
```diff
 if save && dutyapi.DefaultConfig.ConnectionString != "" && summary != nil {
 	history := models.NewJobHistory(logger.StandardLogger(), "scraper", job.ResourceTypeScraper, scraperUID)
 	history.Start()
 	history.SuccessCount = len(results)
 	history.AddDetails("scrape_summary", *summary)
-	if hasErrors {
+	if err != nil {
 		history.AddError(err.Error())
 	}
 	history.End()
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 if save && dutyapi.DefaultConfig.ConnectionString != "" && summary != nil {
 	history := models.NewJobHistory(logger.StandardLogger(), "scraper", job.ResourceTypeScraper, scraperUID)
 	history.Start()
 	history.SuccessCount = len(results)
 	history.AddDetails("scrape_summary", *summary)
-	if hasErrors {
+	if err != nil {
 		history.AddError(err.Error())
 	}
 	history.End()
 	if persistErr := history.Persist(dutyCtx.DB()); persistErr != nil {
 		logger.Warnf("failed to persist job history: %v", persistErr)
 	}
 	scrapers.ScraperSummaryCache.Store(scraperUID, *summary)
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/run.go` around lines 199 - 212, The code uses a cross-iteration boolean
hasErrors but calls err.Error() when persisting history, causing a nil-pointer
panic when hasErrors is true but err is nil; update the block that builds
JobHistory (models.NewJobHistory / history.AddError) to check the
iteration-local error variable (err) instead of hasErrors (i.e., only call
history.AddError(err.Error()) when err != nil), ensuring summary and err are
validated before dereferencing and preserving existing history.End()/Persist
logic and ScraperSummaryCache.Store(scraperUID, *summary).
```go
//go:embed frontend/dist/scrapeui.js
var bundleJS string
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
fd -HI scrapeui.js cmd/scrapeui/frontend/dist 2>/dev/null
cat .gitignore 2>/dev/null | rg -n 'dist|scrapeui' || true
fd -HI Makefile | head -5 | xargs -I{} rg -n 'scrapeui|vite' {} 2>/dev/null
```

Repository: flanksource/config-db
Length of output: 49

🏁 Script executed:

```shell
git ls-files cmd/scrapeui/ 2>/dev/null | head -20
```

Repository: flanksource/config-db
Length of output: 991

🏁 Script executed:

```shell
cat .gitignore 2>/dev/null
```

Repository: flanksource/config-db
Length of output: 584

🏁 Script executed:

```shell
find . -name "assets.go" -type f 2>/dev/null
```

Repository: flanksource/config-db
Length of output: 89

🏁 Script executed:

```shell
cat -n cmd/scrapeui/assets.go
```

Repository: flanksource/config-db
Length of output: 199

🏁 Script executed:

```shell
cat -n cmd/scrapeui/frontend/.gitignore
```

Repository: flanksource/config-db
Length of output: 98

🏁 Script executed:

```shell
find . -name "Makefile" -o -name "*.sh" -o -name "package.json" | grep -E "(Makefile|build|package.json)" | head -10
```

Repository: flanksource/config-db
Length of output: 137

🏁 Script executed:

```shell
cat -n Makefile | head -50
```

Repository: flanksource/config-db
Length of output: 2066

🏁 Script executed:

```shell
cat -n cmd/scrapeui/frontend/package.json
```

Repository: flanksource/config-db
Length of output: 500

🏁 Script executed:

```shell
ls -la cmd/scrapeui/frontend/dist/ 2>/dev/null || echo "dist directory does not exist"
```

Repository: flanksource/config-db
Length of output: 94

🏁 Script executed:

```shell
grep -r "scrapeui.js" . --include="*.go" --include="*.ts" --include="*.tsx" 2>/dev/null | head -10
```

Repository: flanksource/config-db
Length of output: 202

🏁 Script executed:

```shell
grep -r "vite\|scrapeui" Makefile --include="*.md" --include="*.txt" 2>/dev/null | head -20
```

Repository: flanksource/config-db
Length of output: 47

🏁 Script executed:

```shell
find . -path ./vendor -prune -o -name "*.yml" -o -name "*.yaml" | grep -i -E "(ci|action|workflow)" | head -10
```

Repository: flanksource/config-db
Length of output: 366

🏁 Script executed:

```shell
cat -n Makefile | tail -100 | head -50
```

Repository: flanksource/config-db
Length of output: 2103

🏁 Script executed:

```shell
cat -n .github/workflows/build.yml
```

Repository: flanksource/config-db
Length of output: 1344

🏁 Script executed:

```shell
cat -n cmd/scrapeui/frontend/vite.config.ts
```

Repository: flanksource/config-db
Length of output: 7144

🏁 Script executed:

```shell
cat -n build/Dockerfile
```

Repository: flanksource/config-db
Length of output: 1632

🏁 Script executed:

```shell
grep -A 10 -B 5 "frontend\|vite\|npm\|yarn" build/Dockerfile
```

Repository: flanksource/config-db
Length of output: 47

🏁 Script executed:

```shell
grep -n "build-prod" Makefile -A 20
```

Repository: flanksource/config-db
Length of output: 724

🏁 Script executed:

```shell
git ls-files cmd/scrapeui/frontend/dist/ 2>/dev/null || echo "dist/ not in git"
```

Repository: flanksource/config-db
Length of output: 47
Frontend build missing: go build will fail due to missing frontend/dist/scrapeui.js.
The //go:embed frontend/dist/scrapeui.js directive requires the embedded file to exist at compile time. However:
- cmd/scrapeui/frontend/dist/ is gitignored and does not exist in the repository
- The build process (Makefile's build-prod target and Dockerfile) does not run npm build before go build
This causes build failures in CI and for developers. Add a frontend build step to the Makefile before the Go build (e.g., cd cmd/scrapeui/frontend && npm install && npm run build), or include it in the Dockerfile before line 26.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/scrapeui/assets.go` around lines 5 - 6, The embed directive //go:embed
frontend/dist/scrapeui.js (variable bundleJS) fails the Go build because
frontend/dist/scrapeui.js is not present; add a frontend build step that runs in
cmd/scrapeui/frontend (npm install && npm run build) before any go build that
compiles the package referencing bundleJS—either add that step to the Makefile's
build-prod target before invoking go build or add the npm install && npm run
build step in the Dockerfile prior to the Go build stage so the embedded file
exists at compile time.
| func (s *Server) snapshot() Snapshot { | ||
| logs := "" | ||
| if s.logBuf != nil { | ||
| logs = s.logBuf.String() | ||
| } | ||
| return Snapshot{ | ||
| Scrapers: s.scrapers, | ||
| Results: s.results, | ||
| Relationships: s.relationships, | ||
| ConfigMeta: s.configMeta, | ||
| Issues: s.issues, | ||
| Counts: BuildCounts(s.results, s.relationships), | ||
| SaveSummary: s.summary, | ||
| Snapshots: s.snapshots, | ||
| ScrapeSpec: s.scrapeSpec, | ||
| Properties: s.properties, | ||
| LogLevel: s.logLevel, | ||
| HAR: s.har, | ||
| Logs: logs, | ||
| Done: s.done, | ||
| StartedAt: s.startedAt, | ||
| BuildInfo: s.buildInfo, | ||
| LastScrapeSummary: s.lastScrapeSummary, | ||
| } | ||
| } |
Data race: snapshot() returns shared map/slice refs; encoding happens after releasing the lock.
snapshot() copies the struct but the map/slice fields (s.configMeta, s.snapshots, s.results, s.relationships, s.issues, s.scrapers) still point at the server's live containers. In handleJSON the RLock is released (line 403) before json.NewEncoder(...).Encode(data) runs, and handleSSE does the same at lines 423/443. Meanwhile UpdateScraper, SetSnapshots, SetProperties, SetBuildInfo, SetLastScrapeSummary, and SetHAR take the write lock and mutate exactly those maps/slices. The encoder's map iteration concurrent with those writes will crash the process with fatal error: concurrent map iteration and map write (this is not a benign torn read — Go panics).
Either hold the lock across Encode, or deep-copy the snapshot before unlocking.
🛡️ Minimal fix: hold the lock for the full encode
func (s *Server) handleJSON(w http.ResponseWriter, _ *http.Request) {
- s.mu.RLock()
- data := s.snapshot()
- s.mu.RUnlock()
-
- w.Header().Set("Content-Type", "application/json")
- json.NewEncoder(w).Encode(data) //nolint:errcheck
+ w.Header().Set("Content-Type", "application/json")
+ s.mu.RLock()
+ defer s.mu.RUnlock()
+ data := s.snapshot()
+ json.NewEncoder(w).Encode(data) //nolint:errcheck
}

Same treatment needed in handleSSE (both the initial marshal around lines 420-426 and the per-tick marshal around lines 439-445). Note that encoding while holding the write-blocking lock does penalize writers; if that matters, instead deep-copy the maps/slices inside the RLock and encode the copy.
Also applies to: 400-407, 420-452
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/scrapeui/server.go` around lines 255 - 279, snapshot() returns a Snapshot
struct that still references server-owned maps/slices (s.configMeta,
s.snapshots, s.results, s.relationships, s.issues, s.scrapers) and
handleJSON/handleSSE release the RLock before running json.Encode, allowing
concurrent writers (UpdateScraper, SetSnapshots, SetProperties, SetBuildInfo,
SetLastScrapeSummary, SetHAR) to mutate those maps and trigger panics; fix by
either holding the server mutex (RLock/RUnlock) across the entire json.Encode
call in handleJSON and both encoding sites in handleSSE so encoding happens
while the snapshot's underlying containers are protected, or instead deep-copy
all map/slice fields inside snapshot() while still holding the RLock and return
those copies so encoding can safely occur after unlocking.
| tx.Exec("SAVEPOINT bulk_insert") | ||
|
|
||
| if err := tx.Raw(newSQL).Scan(&newRows).Error; err != nil { | ||
| tx.Rollback() | ||
| return result, fmt.Errorf("failed to upsert config access: %w", err) | ||
| if !dutydb.IsForeignKeyError(err) { | ||
| tx.Rollback() | ||
| return result, fmt.Errorf("failed to upsert config access: %w", err) | ||
| } | ||
|
|
||
| // Restore transaction state; temp table is preserved. | ||
| tx.Exec("ROLLBACK TO SAVEPOINT bulk_insert") | ||
|
|
||
| // Row-by-row with exception handling: successfully inserted rows | ||
| // are deleted from the temp table; FK violations stay. | ||
| fallbackSQL := fmt.Sprintf(` | ||
| DO $$ | ||
| DECLARE | ||
| v_rec RECORD; | ||
| BEGIN | ||
| FOR v_rec IN SELECT * FROM %s LOOP | ||
| BEGIN | ||
| INSERT INTO config_access (id, config_id, external_user_id, external_role_id, | ||
| external_group_id, scraper_id, application_id, source, created_at) | ||
| VALUES (v_rec.id, v_rec.config_id, v_rec.external_user_id, v_rec.external_role_id, | ||
| v_rec.external_group_id, v_rec.scraper_id, v_rec.application_id, v_rec.source, v_rec.created_at) | ||
| ON CONFLICT (id) DO UPDATE SET deleted_at = NULL | ||
| WHERE config_access.deleted_at IS NOT NULL; | ||
| DELETE FROM %s WHERE id = v_rec.id; | ||
| EXCEPTION WHEN foreign_key_violation THEN | ||
| NULL; | ||
| END; | ||
| END LOOP; | ||
| END $$; | ||
| `, tempTable, tempTable) | ||
|
|
||
| if err := tx.Exec(fallbackSQL).Error; err != nil { | ||
| tx.Rollback() | ||
| return result, fmt.Errorf("failed to fallback upsert config access: %w", err) | ||
| } | ||
|
|
||
| var fkErrorCount int64 | ||
| tx.Raw(fmt.Sprintf("SELECT count(*) FROM %s", tempTable)).Scan(&fkErrorCount) | ||
| result.foreignKeyErrors = int(fkErrorCount) | ||
| result.saved -= result.foreignKeyErrors | ||
|
|
||
| if fkErrorCount > 0 { | ||
| ctx.Logger.Warnf("config_access: %d rows with FK violations (scraper=%s)", fkErrorCount, scraperIDStr) | ||
| logFKDiagnostics(ctx, tx, tempTable) | ||
| } | ||
| } |
FK fallback path silently drops PermissionAdded change records.
When the bulk INSERT fails with an FK error, we roll back to the savepoint and re-insert row-by-row inside a PL/pgSQL DO block. The DO block cannot use RETURNING, so newRows stays empty for this path, and the loop at line 213 produces no result.added entries — yet these rows are in fact being inserted and counted in result.saved. Downstream, this means permission-added change history is lost for any scrape that triggers the fallback (which is exactly the case the change-tracking was designed for).
A few options, in order of simplicity:
- Capture inserted IDs into a side temp table inside the DO block and SELECT ... FROM ... LEFT JOIN config_access after, so you can both detect new-vs-restored and fetch the FK columns needed for buildPermissionSummary.
- After the DO block, query config_access filtered by the IDs still present in tempTable subtracted from the original set; any id that's no longer in the temp table (and wasn't already in config_access pre-run) is a new insert.
- Drop the DO block and do row-by-row in Go with savepoints; slower but retains RETURNING.
Leaving it as-is without at least a comment noting the semantic gap is likely to cause confusion later.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@db/permission_changes.go` around lines 163 - 211, The FK-fallback PL/pgSQL DO
block (used after tx.Exec("ROLLBACK TO SAVEPOINT bulk_insert") that loops over
tempTable) performs row-by-row INSERTs but cannot RETURNING, so newRows remains
empty and result.added (and subsequent buildPermissionSummary) loses actual
PermissionAdded change records; fix by capturing inserted IDs into a side temp
table inside the DO block (or alternatively query config_access for IDs removed
from tempTable after the DO block) and then populate newRows/result.added from
that captured set before continuing; update references around tempTable,
newRows, result.saved/result.added, and the fallbackSQL/DO block handling so
change history is reconstructed when falling back.
| .PHONY: bench | ||
| bench: | ||
| go test -bench=. -benchmem -run=^$$ ./bench $(BENCH_ARGS) | ||
|
|
Duplicate bench target — the new recipe is silently overridden.
bench is defined twice (lines 105–108 and 299–304). make emits "overriding recipe for target 'bench'" (see pipeline logs) and the earlier, $(BENCH_ARGS)-aware recipe is discarded in favor of the later one. Either remove one, or merge them into a single target (e.g., parameterize the old one to honor BENCH_ARGS and -bench selection via variables).
🛠️ Suggested consolidation
-.PHONY: bench
-bench:
- go test -bench=. -benchmem -run=^$$ ./bench $(BENCH_ARGS)
-
-
-
.PHONY: gotest-prod
gotest-prod:
$(validate-envtest-assets) \
KUBEBUILDER_ASSETS="$$ASSETS" go test -tags rustdiffgen -skip ^TestE2E$$ ./... -coverprofile cover.out
@@
-.PHONY: bench
-bench:
- go test ./... -bench='^Benchmark(EventProcessing|CacheMemory|Deserialization)' \
- -benchmem -run='^$$' \
- -count=3 \
- -benchtime=2s -v
+.PHONY: bench
+bench:
+ go test -bench=. -benchmem -run='^$$' ./bench $(BENCH_ARGS)
+
+.PHONY: bench-all
+bench-all:
+ go test ./... -bench='^Benchmark(EventProcessing|CacheMemory|Deserialization)' \
+ -benchmem -run='^$$' \
+ -count=3 \
+	-benchtime=2s -v

Also applies to: 299-304
🧰 Tools
🪛 GitHub Actions: Lint
[warning] 301-107: Makefile warning: overriding recipe for target 'bench' / ignoring old recipe for target 'bench'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Makefile` around lines 105 - 108, The Makefile currently defines the bench
target twice so the earlier recipe that uses BENCH_ARGS is being overridden;
consolidate into a single bench target by removing the duplicate recipe (or
merging them) and ensure the surviving bench target preserves BENCH_ARGS and the
-bench selection (e.g., use a variable like BENCH_ARGS and pass
-bench=$(BENCH_SELECTION) or similar) and keep .PHONY declaration for bench;
update any callers/README if they relied on the discarded behavior.
| if t.ConnectionName != "" { | ||
| connection, err := ctx.HydrateConnectionByURL(t.ConnectionName) | ||
| if err != nil { | ||
| return t, fmt.Errorf("could not hydrate connection: %w", err) | ||
| } else if connection == nil { | ||
| return t, fmt.Errorf("connection %s not found", t.ConnectionName) | ||
| if err := azureConn.HydrateConnection(ctx); err != nil { | ||
| return t, azureConn, fmt.Errorf("could not hydrate connection: %w", err) | ||
| } | ||
|
|
||
| t.ClientID.ValueStatic = connection.Username | ||
| t.ClientSecret.ValueStatic = connection.Password | ||
| t.TenantID = connection.Properties["tenant"] | ||
| return t, nil | ||
| t.ClientID = *azureConn.ClientID | ||
| t.ClientSecret = *azureConn.ClientSecret | ||
| t.TenantID = azureConn.TenantID | ||
| return t, azureConn, nil | ||
| } |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check whether duty/connection.AzureConnection.HydrateConnection can leave ClientID/ClientSecret nil (bearer-token, workload identity, etc.)
fd -t f -e go . | xargs rg -nP '(?s)func\s+\(\w*\s*\*?AzureConnection\)\s+HydrateConnection' -A 80
rg -nP 'TokenCredential\s*\(' --type=go -C2
Length of output: 410
🏁 Script executed:
#!/bin/bash
# Find AzureConnection definition and HydrateConnection method implementation
fd -t f -e go . | xargs rg -l 'HydrateConnection' | head -20
Length of output: 228
🏁 Script executed:
#!/bin/bash
# Search for HydrateConnection method definition with more context
rg -B5 -A50 'func.*HydrateConnection' --type go | head -150
Length of output: 47
🏁 Script executed:
#!/bin/bash
# Look for AzureConnection struct definition
rg -B2 -A20 'type AzureConnection struct' --type go
Length of output: 47
🏁 Script executed:
#!/bin/bash
# Read the azure.go file to understand context around lines 133-141
cat -n scrapers/azure/azure.go | sed -n '100,170p'
Length of output: 2888
🏁 Script executed:
#!/bin/bash
# Look for where azureConn is created/initialized
rg -B10 'azureConn.*:=' scrapers/azure/azure.go | head -80
Length of output: 954
🏁 Script executed:
#!/bin/bash
# Check imports and AzureConnection type definition in azure.go
head -50 scrapers/azure/azure.go
Length of output: 1976
🏁 Script executed:
#!/bin/bash
# Search for any usage or error handling patterns around HydrateConnection
rg -B5 -A10 'HydrateConnection' scrapers/ --type go | head -80
Length of output: 4421
🏁 Script executed:
#!/bin/bash
# Check if there are any null checks or guards around ClientID/ClientSecret after HydrateConnection elsewhere
rg 'ClientID.*!=' scrapers/azure/ --type go
Length of output: 47
Unconditional nil dereference after HydrateConnection for bearer-token authentication paths.
The hydrateConnection doc comment states that ConnectionName "handles service-principal and bearer-token fallbacks," but lines 137–138 unconditionally dereference *azureConn.ClientID and *azureConn.ClientSecret. Bearer-token, workload-identity, or federated-credential connections will not populate these fields, and HydrateConnection may leave them nil, causing a panic.
Add defensive nil checks before dereferencing, matching the pattern used in the GCP scraper:
Defensive dereference
if t.ConnectionName != "" {
if err := azureConn.HydrateConnection(ctx); err != nil {
return t, azureConn, fmt.Errorf("could not hydrate connection: %w", err)
}
- t.ClientID = *azureConn.ClientID
- t.ClientSecret = *azureConn.ClientSecret
+ if azureConn.ClientID != nil {
+ t.ClientID = *azureConn.ClientID
+ }
+ if azureConn.ClientSecret != nil {
+ t.ClientSecret = *azureConn.ClientSecret
+ }
t.TenantID = azureConn.TenantID
return t, azureConn, nil
}

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if t.ConnectionName != "" { | |
| connection, err := ctx.HydrateConnectionByURL(t.ConnectionName) | |
| if err != nil { | |
| return t, fmt.Errorf("could not hydrate connection: %w", err) | |
| } else if connection == nil { | |
| return t, fmt.Errorf("connection %s not found", t.ConnectionName) | |
| if err := azureConn.HydrateConnection(ctx); err != nil { | |
| return t, azureConn, fmt.Errorf("could not hydrate connection: %w", err) | |
| } | |
| t.ClientID.ValueStatic = connection.Username | |
| t.ClientSecret.ValueStatic = connection.Password | |
| t.TenantID = connection.Properties["tenant"] | |
| return t, nil | |
| t.ClientID = *azureConn.ClientID | |
| t.ClientSecret = *azureConn.ClientSecret | |
| t.TenantID = azureConn.TenantID | |
| return t, azureConn, nil | |
| } | |
| if t.ConnectionName != "" { | |
| if err := azureConn.HydrateConnection(ctx); err != nil { | |
| return t, azureConn, fmt.Errorf("could not hydrate connection: %w", err) | |
| } | |
| if azureConn.ClientID != nil { | |
| t.ClientID = *azureConn.ClientID | |
| } | |
| if azureConn.ClientSecret != nil { | |
| t.ClientSecret = *azureConn.ClientSecret | |
| } | |
| t.TenantID = azureConn.TenantID | |
| return t, azureConn, nil | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scrapers/azure/azure.go` around lines 133 - 141, The code unconditionally
dereferences azureConn.ClientID and azureConn.ClientSecret after calling
HydrateConnection (in the ConnectionName branch), which can be nil for
bearer-token/workload-identity flows; update the block that follows
HydrateConnection to defensively check azureConn.ClientID and
azureConn.ClientSecret for nil before assigning to t.ClientID and t.ClientSecret
(only set t.ClientID/t.ClientSecret when the pointers are non-nil), and preserve
the existing assignment for azureConn.TenantID (or similarly check if needed) so
HydrateConnection and the ConnectionName path cannot panic when
ClientID/ClientSecret are absent.
| const ( | ||
| ReleasePermissionViewReleaseDefinition = 1 | ||
| ReleasePermissionEditReleaseDefinition = 2 | ||
| ReleasePermissionDeleteReleaseDefinition = 4 | ||
| ReleasePermissionManageDeployments = 8 | ||
| ReleasePermissionManageReleaseApprovers = 16 | ||
| ReleasePermissionManageReleases = 32 | ||
| ReleasePermissionViewReleases = 64 | ||
| ReleasePermissionCreateReleases = 128 | ||
| ReleasePermissionEditReleaseEnvironment = 256 | ||
| ReleasePermissionDeleteReleaseEnvironment = 512 | ||
| ReleasePermissionAdministerReleasePermissions = 1024 | ||
| ReleasePermissionDeleteReleases = 2048 | ||
| ReleasePermissionManageDefinitionReleaseApprovers = 4096 | ||
| ) |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Fetches the authoritative bit->name mapping for the ReleaseManagement namespace
# from any public Azure DevOps org. Replace ORG if needed; any valid org works as
# namespaces are organization-agnostic.
ORG="${ADO_ORG:-microsoft}"
curl -s "https://dev.azure.com/${ORG}/_apis/securitynamespaces/c788c23e-1b46-4162-8f5e-d7585343b5de/actionids?api-version=7.1-preview.1" \
  | jq '.value | sort_by(.bit) | .[] | {bit, name}'
Length of output: 122
🏁 Script executed:
#!/bin/bash
# First, let's check the curl response directly to see what went wrong
ORG="microsoft"
curl -s -w "\nHTTP Status: %{http_code}\n" "https://dev.azure.com/${ORG}/_apis/securitynamespaces/c788c23e-1b46-4162-8f5e-d7585343b5de/actionids?api-version=7.1-preview.1" | head -100
Length of output: 2036
🏁 Script executed:
#!/bin/bash
# Search the codebase for comments or tests related to these permissions
rg "ReleasePermission|ManageDeployments" -A 2 -B 2 --type go
Length of output: 8226
🌐 Web query:
Microsoft Azure DevOps ReleaseManagement namespace permissions actionids bit values
💡 Result:
Microsoft Azure DevOps has two primary ReleaseManagement security namespaces with associated permissions using bitmask values (powers of 2).

Namespace ID: c788c23e-1b46-4162-8f5e-d7585343b5de (Release definition permissions)
- bit 1 (1): ViewReleaseDefinition
- bit 2 (2): EditReleaseDefinition
- bit 4 (4): DeleteReleaseDefinition
- bit 8 (8): ManageReleaseApprovers
- Higher bits include: ManageReleases (likely 16), ViewReleases, CreateReleases, EditReleaseEnvironment, DeleteReleaseEnvironment, AdministerReleasePermissions, DeleteReleases, ManageDeployments, ManageReleaseSettings, ManageTaskHubExtension (exact bits follow standard sequential power-of-2 assignment: 16, 32, 64, 128, 256, 512, 1024, 2048, 4096)

Namespace ID: 7c7d32f7-0e86-4cd6-892e-b35dbba870bd (Release Management UI elements)
- bit 1 (1): ViewTaskEditor
- bit 2 (2): ViewCDWorkflowEditor (or similar, partial in sources)
- bit 4 (4): ExportReleaseDefinition (inferred sequential)
- bit 8 (8): ViewLegacyUI (inferred)
- bit 32 (32): DeploymentSummaryAcrossProjects (example higher bit)

Permissions are bitmasks combined for allow/deny in ACLs. Exact complete mappings are not published in static docs; they are product-defined constants retrieved dynamically via:
- REST: GET https://dev.azure.com/{org}/_apis/securitynamespaces/{namespaceId}?api-version=7.1-preview.1 (returns actions[] with bit/name)
- CLI: az devops security permission namespace show --id {namespaceId}

Bits are stable across orgs but new actions may be added over time. Use runtime query for scripting/completeness. Sources confirm standard bitmask pattern (1=2^0, 2=2^1, 4=2^2, etc.).
Citations:
- 1: https://learn.microsoft.com/en-us/azure/devops/organizations/security/namespace-reference?view=azure-devops
- 2: https://stackoverflow.com/questions/49887951/vsts-api-returning-security-namespaces-with-duplicate-name
- 3: https://learn.microsoft.com/en-us/rest/api/azure/devops/security/security-namespaces/query?view=azure-devops-rest-7.1
- 4: https://learn.microsoft.com/en-us/answers/questions/5559248/are-the-bit-values-present-in-security-namespace-a
- 5: https://learn.microsoft.com/en-us/azure/devops/organizations/security/manage-tokens-namespaces?view=azure-devops
Fix release permission bit values to match Azure DevOps ReleaseManagement namespace ordering.
The code's bit→name mapping is misaligned with the ReleaseManagement namespace. Per Microsoft's documented permission order, the bits should be: ManageReleaseApprovers=8, ManageReleases=16, ViewReleases=32, CreateReleases=64, EditReleaseEnvironment=128, DeleteReleaseEnvironment=256, AdministerReleasePermissions=512, DeleteReleases=1024, ManageDeployments=2048.
Currently, the code assigns ManageDeployments=8, ManageReleaseApprovers=16, ManageReleases=32, ViewReleases=64, ..., shifting every permission from bit 8 onward. This silently mis-labels release permissions throughout ParseReleasePermissions and ResolveRoles—for example, a user with effective CreateReleases (bit 64) is reported as ViewReleases, mapping to Viewer instead of Developer.
Current definitions (incorrect)
const (
ReleasePermissionViewReleaseDefinition = 1
ReleasePermissionEditReleaseDefinition = 2
ReleasePermissionDeleteReleaseDefinition = 4
ReleasePermissionManageDeployments = 8
ReleasePermissionManageReleaseApprovers = 16
ReleasePermissionManageReleases = 32
ReleasePermissionViewReleases = 64
ReleasePermissionCreateReleases = 128
ReleasePermissionEditReleaseEnvironment = 256
ReleasePermissionDeleteReleaseEnvironment = 512
ReleasePermissionAdministerReleasePermissions = 1024
ReleasePermissionDeleteReleases = 2048
ReleasePermissionManageDefinitionReleaseApprovers = 4096
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scrapers/azure/devops/client.go` around lines 1011 - 1025, The release
permission constants are shifted and mis-map bits to names, causing
ParseReleasePermissions and ResolveRoles to report wrong roles; update the
constants (ReleasePermissionManageReleaseApprovers,
ReleasePermissionManageReleases, ReleasePermissionViewReleases,
ReleasePermissionCreateReleases, ReleasePermissionEditReleaseEnvironment,
ReleasePermissionDeleteReleaseEnvironment,
ReleasePermissionAdministerReleasePermissions, ReleasePermissionDeleteReleases,
ReleasePermissionManageDeployments) so their bit values follow the Azure
ReleaseManagement ordering (ManageReleaseApprovers=8, ManageReleases=16,
ViewReleases=32, CreateReleases=64, EditReleaseEnvironment=128,
DeleteReleaseEnvironment=256, AdministerReleasePermissions=512,
DeleteReleases=1024, ManageDeployments=2048) ensuring ParseReleasePermissions
and ResolveRoles read the correct bits.
| email := emailFromIdentity(identity) | ||
| if name == "" && email == "" { | ||
| continue | ||
| } | ||
|
|
||
| if identity.IsContainer { | ||
| aliases := append(DescriptorAliases(identity.Descriptor), identity.SubjectDescriptor) | ||
| aliases = append(aliases, DescriptorAliases(identity.SubjectDescriptor)...) | ||
| // No ID — the SQL merge resolves this group against the AAD scraper's | ||
| // authoritative record by alias overlap. AAD takes precedence. | ||
| ctx.AddGroup(dutyModels.ExternalGroup{ | ||
| Name: name, | ||
| Aliases: pq.StringArray(aliases), | ||
| Tenant: config.Organization, | ||
| GroupType: "AzureDevOps", | ||
| }) | ||
| } else { | ||
| ctx.AddUser(dutyModels.ExternalUser{ | ||
| Name: name, | ||
| Email: &email, | ||
| Aliases: pq.StringArray{email, identity.Descriptor, identity.SubjectDescriptor}, | ||
| Tenant: config.Organization, | ||
| UserType: "AzureDevOps", | ||
| }) | ||
| } | ||
|
|
||
| resolvedRoles := ResolveRoles("Release", perm.Permissions, config.Permissions.Roles) | ||
| for _, roleName := range resolvedRoles { | ||
| if _, exists := roleIDs[roleName]; !exists { | ||
| roleID := azure.RoleID(ctx.ScraperID(), roleName) | ||
| roleIDs[roleName] = roleID | ||
| roles = append(roles, dutyModels.ExternalRole{ | ||
| ID: roleID, | ||
| Name: roleName, | ||
| RoleType: "AzureDevOps", | ||
| Tenant: config.Organization, | ||
| }) | ||
| } | ||
|
|
||
| roleID := roleIDs[roleName] | ||
| access := v1.ExternalConfigAccess{ | ||
| ConfigExternalID: v1.ExternalID{ConfigType: ReleaseType, ExternalID: releaseExternalID}, | ||
| ExternalRoleID: &roleID, | ||
| } | ||
| if identity.IsContainer { | ||
| access.ExternalGroupAliases = DescriptorAliases(identity.Descriptor) | ||
| } else { | ||
| access.ExternalUserAliases = []string{email} | ||
| } |
Empty-string aliases when an identity has a name but no email.
The guard at line 171 is name == "" && email == "", so users with a display name but no resolvable email proceed. Then:
- Line 190: Aliases: pq.StringArray{email, identity.Descriptor, identity.SubjectDescriptor} — email is "", so "" is stored as an alias.
- Line 217: access.ExternalUserAliases = []string{email} — the access record's only alias is "".
On the merge side, an access row whose sole alias is "" can overlap with any user whose aliases also happened to pick up an empty string, producing incorrect access linkage. Filter empties out before constructing both arrays.
🛡️ Proposed fix
} else {
- ctx.AddUser(dutyModels.ExternalUser{
- Name: name,
- Email: &email,
- Aliases: pq.StringArray{email, identity.Descriptor, identity.SubjectDescriptor},
- Tenant: config.Organization,
- UserType: "AzureDevOps",
- })
+ userAliases := []string{}
+ for _, a := range []string{email, identity.Descriptor, identity.SubjectDescriptor} {
+ if a != "" {
+ userAliases = append(userAliases, a)
+ }
+ }
+ ctx.AddUser(dutyModels.ExternalUser{
+ Name: name,
+ Email: &email,
+ Aliases: pq.StringArray(userAliases),
+ Tenant: config.Organization,
+ UserType: "AzureDevOps",
+ })
}
@@
- if identity.IsContainer {
- access.ExternalGroupAliases = DescriptorAliases(identity.Descriptor)
- } else {
- access.ExternalUserAliases = []string{email}
- }
+ if identity.IsContainer {
+ access.ExternalGroupAliases = DescriptorAliases(identity.Descriptor)
+ } else if email != "" {
+ access.ExternalUserAliases = []string{email}
+ } else {
+ access.ExternalUserAliases = []string{identity.Descriptor}
+ }

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| email := emailFromIdentity(identity) | |
| if name == "" && email == "" { | |
| continue | |
| } | |
| if identity.IsContainer { | |
| aliases := append(DescriptorAliases(identity.Descriptor), identity.SubjectDescriptor) | |
| aliases = append(aliases, DescriptorAliases(identity.SubjectDescriptor)...) | |
| // No ID — the SQL merge resolves this group against the AAD scraper's | |
| // authoritative record by alias overlap. AAD takes precedence. | |
| ctx.AddGroup(dutyModels.ExternalGroup{ | |
| Name: name, | |
| Aliases: pq.StringArray(aliases), | |
| Tenant: config.Organization, | |
| GroupType: "AzureDevOps", | |
| }) | |
| } else { | |
| ctx.AddUser(dutyModels.ExternalUser{ | |
| Name: name, | |
| Email: &email, | |
| Aliases: pq.StringArray{email, identity.Descriptor, identity.SubjectDescriptor}, | |
| Tenant: config.Organization, | |
| UserType: "AzureDevOps", | |
| }) | |
| } | |
| resolvedRoles := ResolveRoles("Release", perm.Permissions, config.Permissions.Roles) | |
| for _, roleName := range resolvedRoles { | |
| if _, exists := roleIDs[roleName]; !exists { | |
| roleID := azure.RoleID(ctx.ScraperID(), roleName) | |
| roleIDs[roleName] = roleID | |
| roles = append(roles, dutyModels.ExternalRole{ | |
| ID: roleID, | |
| Name: roleName, | |
| RoleType: "AzureDevOps", | |
| Tenant: config.Organization, | |
| }) | |
| } | |
| roleID := roleIDs[roleName] | |
```go
access := v1.ExternalConfigAccess{
	ConfigExternalID: v1.ExternalID{ConfigType: ReleaseType, ExternalID: releaseExternalID},
	ExternalRoleID:   &roleID,
}
if identity.IsContainer {
	access.ExternalGroupAliases = DescriptorAliases(identity.Descriptor)
} else {
	access.ExternalUserAliases = []string{email}
}
```

```go
email := emailFromIdentity(identity)
if name == "" && email == "" {
	continue
}
if identity.IsContainer {
	aliases := append(DescriptorAliases(identity.Descriptor), identity.SubjectDescriptor)
	aliases = append(aliases, DescriptorAliases(identity.SubjectDescriptor)...)
	// No ID — the SQL merge resolves this group against the AAD scraper's
	// authoritative record by alias overlap. AAD takes precedence.
	ctx.AddGroup(dutyModels.ExternalGroup{
		Name:      name,
		Aliases:   pq.StringArray(aliases),
		Tenant:    config.Organization,
		GroupType: "AzureDevOps",
	})
} else {
	userAliases := []string{}
	for _, a := range []string{email, identity.Descriptor, identity.SubjectDescriptor} {
		if a != "" {
			userAliases = append(userAliases, a)
		}
	}
	ctx.AddUser(dutyModels.ExternalUser{
		Name:     name,
		Email:    &email,
		Aliases:  pq.StringArray(userAliases),
		Tenant:   config.Organization,
		UserType: "AzureDevOps",
	})
}
resolvedRoles := ResolveRoles("Release", perm.Permissions, config.Permissions.Roles)
for _, roleName := range resolvedRoles {
	if _, exists := roleIDs[roleName]; !exists {
		roleID := azure.RoleID(ctx.ScraperID(), roleName)
		roleIDs[roleName] = roleID
		roles = append(roles, dutyModels.ExternalRole{
			ID:       roleID,
			Name:     roleName,
			RoleType: "AzureDevOps",
			Tenant:   config.Organization,
		})
	}
	roleID := roleIDs[roleName]
	access := v1.ExternalConfigAccess{
		ConfigExternalID: v1.ExternalID{ConfigType: ReleaseType, ExternalID: releaseExternalID},
		ExternalRoleID:   &roleID,
	}
	if identity.IsContainer {
		access.ExternalGroupAliases = DescriptorAliases(identity.Descriptor)
	} else if email != "" {
		access.ExternalUserAliases = []string{email}
	} else {
		access.ExternalUserAliases = []string{identity.Descriptor}
	}
```
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@scrapers/azure/devops/releases.go` around lines 170 - 218: the code adds empty-string aliases when the identity has a name but no email. Update the user-alias construction and the access-alias assignment to filter out empty strings: when building the `pq.StringArray` for `dutyModels.ExternalUser` in the branch that calls `ctx.AddUser` (refer to `emailFromIdentity`, `identity.Descriptor`, and `identity.SubjectDescriptor`), omit `email` if it is empty so the `Aliases` array never contains `""`. Likewise, before setting `access.ExternalUserAliases = []string{email}` (and generally before assigning any `ExternalUserAliases`), construct the slice from only non-empty alias strings; keep the same ordering and keep `DescriptorAliases` handling unchanged for container cases.
```diff
+name := ResolvedIdentityName(identity, project.Name)
 email := emailFromIdentity(identity)
-if identity.ProviderDisplayName == "" && email == "" {
+if name == "" && email == "" {
 	continue
 }

 if identity.IsContainer {
+	aliases := append(DescriptorAliases(identity.Descriptor), identity.SubjectDescriptor)
+	aliases = append(aliases, DescriptorAliases(identity.SubjectDescriptor)...)
+	// No ID — the SQL merge resolves this group against the AAD scraper's
+	// authoritative record by alias overlap. AAD takes precedence.
 	ctx.AddGroup(dutyModels.ExternalGroup{
-		Name:      identity.ProviderDisplayName,
-		Aliases:   pq.StringArray{identity.Descriptor, identity.SubjectDescriptor},
+		Name:      name,
+		Aliases:   pq.StringArray(aliases),
 		Tenant:    config.Organization,
 		GroupType: "AzureDevOps",
 	})
 } else {
 	ctx.AddUser(dutyModels.ExternalUser{
-		Name:     identity.ProviderDisplayName,
+		Name:     name,
 		Email:    &email,
 		Aliases:  pq.StringArray{email, identity.Descriptor, identity.SubjectDescriptor},
 		Tenant:   config.Organization,
 		UserType: "AzureDevOps",
 	})
 }
```
**Empty SubjectDescriptor / empty email leak as empty-string aliases.**

Several alias-building sites here accept blanks unconditionally, which will store `""` entries in the resulting `pq.StringArray`. That is risky given the comment on lines 149-150 says the group is merged against AAD "by alias overlap": two empty-alias arrays overlap trivially.

- Line 147 appends `identity.SubjectDescriptor` without checking that it is non-empty.
- Line 148 calls `DescriptorAliases(identity.SubjectDescriptor)`; when the input is empty the helper falls through to `return []string{descriptor}`, i.e. `[""]`.
- Line 161 puts `email` and `identity.SubjectDescriptor` into the user's `Aliases` array even when either is empty (the line 142 guard only requires one of `name`/`email` to be non-empty).
- Line 189 similarly sets `ExternalUserAliases = []string{email}`, which becomes `[""]` when `email == ""`.

`BuildIdentityMap` in client.go guards this correctly (`if id.SubjectDescriptor != ""`); the same guard should be applied here.
🛡️ Proposed fix

```diff
 if identity.IsContainer {
-	aliases := append(DescriptorAliases(identity.Descriptor), identity.SubjectDescriptor)
-	aliases = append(aliases, DescriptorAliases(identity.SubjectDescriptor)...)
+	aliases := DescriptorAliases(identity.Descriptor)
+	if identity.SubjectDescriptor != "" {
+		aliases = append(aliases, identity.SubjectDescriptor)
+		aliases = append(aliases, DescriptorAliases(identity.SubjectDescriptor)...)
+	}
 	// No ID — the SQL merge resolves this group against the AAD scraper's
 	// authoritative record by alias overlap. AAD takes precedence.
 	ctx.AddGroup(dutyModels.ExternalGroup{
 		Name:      name,
 		Aliases:   pq.StringArray(aliases),
 		Tenant:    config.Organization,
 		GroupType: "AzureDevOps",
 	})
 } else {
+	userAliases := []string{identity.Descriptor}
+	if email != "" {
+		userAliases = append(userAliases, email)
+	}
+	if identity.SubjectDescriptor != "" {
+		userAliases = append(userAliases, identity.SubjectDescriptor)
+	}
 	ctx.AddUser(dutyModels.ExternalUser{
 		Name:     name,
 		Email:    &email,
-		Aliases:  pq.StringArray{email, identity.Descriptor, identity.SubjectDescriptor},
+		Aliases:  pq.StringArray(userAliases),
 		Tenant:   config.Organization,
 		UserType: "AzureDevOps",
 	})
 }
```

Apply a similar non-empty guard at line 189 (`ExternalUserAliases`).
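That line-189 guard could look like the sketch below. The stub types and the `setUserAliases` helper are illustrative stand-ins for the real `v1.ExternalConfigAccess` assignment, not the actual implementation:

```go
package main

import "fmt"

// Minimal stand-ins for the real types; field names follow the diff above.
type identityT struct {
	IsContainer       bool
	Descriptor        string
	SubjectDescriptor string
}

type accessT struct {
	ExternalUserAliases  []string
	ExternalGroupAliases []string
}

// setUserAliases applies the suggested non-empty guard: prefer email,
// fall back to the descriptor, and never store a "" alias.
func setUserAliases(a *accessT, id identityT, email string) {
	switch {
	case email != "":
		a.ExternalUserAliases = []string{email}
	case id.Descriptor != "":
		a.ExternalUserAliases = []string{id.Descriptor}
	}
}

func main() {
	var a accessT
	setUserAliases(&a, identityT{Descriptor: "vssgp.example"}, "")
	fmt.Println(a.ExternalUserAliases) // descriptor fallback, no "" entry

	var b accessT
	setUserAliases(&b, identityT{}, "")
	fmt.Println(len(b.ExternalUserAliases)) // 0 — nothing stored at all
}
```

Leaving the slice nil when every candidate is blank is what keeps two unrelated identities from "overlapping" on an empty alias.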
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@scrapers/azure/devops/repositories.go` around lines 140 - 165: the current alias construction in the DevOps identity handling (around `ResolvedIdentityName`, `emailFromIdentity`, `DescriptorAliases`, and the `ctx.AddGroup` / `ctx.AddUser` calls) can push empty strings into `pq.StringArray` (e.g. an empty `SubjectDescriptor` or empty email), causing spurious alias overlap. Update the logic to append only non-empty alias values: (1) before appending `identity.SubjectDescriptor` or calling `DescriptorAliases(identity.SubjectDescriptor)`, ensure `identity.SubjectDescriptor != ""` (or filter the returned slice to remove `""` entries); (2) only include `email` in `Aliases`/`ExternalUserAliases` when `email != ""`; (3) when building aliases from `DescriptorAliases`, filter the result to drop any empty elements. Mirror the guard used in `BuildIdentityMap` so `ctx.AddGroup` and `ctx.AddUser` receive a `pq.StringArray` with no empty strings.
Summary by CodeRabbit

New Features
- `ui` command for browsing configurations, relationships, and diagnostics.

Documentation