[BDP-102028] feat(optimizer): [3/N] Analyzer#533
Open
mkuchenbecker wants to merge 39 commits into
Open
Conversation
…ling Introduces apps/optimizer-analyzer, a Spring Boot CommandLineRunner that evaluates every table in table_stats against pluggable OperationAnalyzer strategies. The first strategy, OrphanFilesDeletionAnalyzer, schedules OFD operations with 24h success / 1h failure retry cadence, a 6h SCHEDULED timeout, and a 5-strike circuit breaker. Key design choices: - Bulk-loads operations and history into maps (one query per type), then iterates the stats list — O(types) queries, not O(tables). - Uses the existing generic find() repository methods with null params. - Pure unit tests with Mockito — no Spring context needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This was referenced Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
- Rename TableSummary to Table, TableOperationRecord to TableOperation - Add Table.from(TableStatsRow) and TableOperation.from(TableOperationRow) - Add TableOperation.pending(Table, type) factory and toRow() for JPA - Move circuit breaker check into OperationAnalyzer as overridable default - Parameterize analyze() with optional filters (optype, db, table, uuid) - Inline loadOpsMap, loadHistoryMap, remove standalone converter methods - Expand CadencePolicy field javadoc with plain-english examples - Add TODOs: per-db iteration, benchmarking, querybuilder, CB reset Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mkuchenbecker
commented
Apr 7, 2026
…nalyzer - apps/optimizer shared module: rename databaseId -> databaseName in TableStatsRow + TableStatsRepository; submittedAt -> completedAt in TableOperationHistoryRow + TableOperationHistoryRepository; tighten database_name and table_name from VARCHAR(255) to VARCHAR(128). - Analyzer Table model: rename databaseId -> databaseName so the domain object matches the underlying entity. Update Table.from factory and downstream usages in AnalyzerRunner, TableOperation, and CadencePolicy (which now reads completedAt off history rows). - Analyzer tests: update Table builder and TableOperationHistoryRow builder usages to the renamed fields. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The empty @configuration class did nothing. @SpringBootApplication on AnalyzerApplication already triggers @componentscan, which discovers all @Component-annotated beans without help. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The circuit breaker was hardcoded (threshold=5, no reset, no operator visibility) and forced the AnalyzerRunner to materialize the full history of every (table, operation_type) just to check the last N rows. Cadence policy only needs the single latest history entry; pulling everything was wasted I/O. Changes: - Remove getCircuitBreakerThreshold and isCircuitBroken from OperationAnalyzer. - Add a TODO documenting requirements for the eventual replacement (configurable threshold, exponential-backoff reset, operator-visible signal). - In AnalyzerRunner, fold history loading into a per-(uuid, type) map holding only the most-recent entry; drop the per-table history list and the isCircuitBroken call. - Add a TODO to switch the history scan to a windowed query that returns at most one row per (uuid, type). - Drop the two circuit-breaker tests from AnalyzerRunnerTest. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mkuchenbecker
commented
May 13, 2026
mkuchenbecker
commented
May 13, 2026
mkuchenbecker
commented
May 13, 2026
mkuchenbecker
commented
May 13, 2026
Drop the blank lines inside analyze() between statements (post-signature, post-dbs, post-forEach). Also remove the explanatory comment above latestHistory — the variable name is self-describing and the tied-rows edge case it warned about doesn't affect correctness here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move imports from com.linkedin.openhouse.analyzer.model to the shared com.linkedin.openhouse.optimizer.model package. The five local copies (Table, TableOperation, OperationType, OperationStatus, HistoryStatus) are removed; tests and runtime classes import from the shared location. Adds a /* ... */ block comment above the per-table loop in AnalyzerRunner.analyzeDatabase walking through what each step does: opt-in check, current-op and latest-history lookup, delegation to shouldSchedule, and PENDING persistence on true. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `:apps:optimizer-data` module was removed in the prior optimizer-2 forward commit. The analyzer's JPA + model classes now live in `:services:optimizer`; package FQNs are unchanged, so no Java imports need updating — only the gradle dep.
…lders TableStatsRow and TableOperationsRow had their constructors locked down (protected) and @Setter removed when the entity shape was realigned with optimizer-0. The analyzer tests still used `new Entity()` + setters, which no longer compiles. Switch to the builder pattern; behaviour-equivalent.
Adapt the analyzer to the entity→db rename and the removal of factory methods on model/ data types. The analyzer's strategy interface and policy now work in pure model/ types (TableOperationsHistory instead of TableOperationsHistoryRow); AnalyzerRunner uses ModelDbMapper to bridge db rows to those types. - OperationAnalyzer.shouldSchedule: parameter type changes from db.TableOperationsHistoryRow to model.TableOperationsHistory. - CadencePolicy: same parameter change; remove String→HistoryStatus parsing — the model carries it as a typed enum. - CadenceBasedOrphanFilesDeletionAnalyzer: signature update. - AnalyzerRunner: inject ModelDbMapper. Imports switch from entity/ to db/. Repository find / findLatestPerTable now take db.OperationType (translated via dbMapper.toDbOperationType). Row → model translation uses dbMapper.toTable / .toOperation / .toHistory. Persisting a new PENDING operation goes through dbMapper.toRow(op) instead of the removed TableOperation.toRow(). - Tests rewritten with the new types and pass a real ModelDbMapper into the runner-under-test.
…apper AnalyzerRunner no longer injects ModelDbMapper. Conversion goes through the model types' static factories / instance methods: - Row → model: Table.fromRow, TableOperation.fromRow, TableOperationsHistory.fromRow. - Model → row: op.toRow(). - model.OperationType → db.OperationType via operationType.toDb(). Test: drop the ModelDbMapper field; use static factories directly.
mkuchenbecker
commented
May 15, 2026
Comment on lines
+60
to
+63
| if (analyzerOpt.isEmpty()) { | ||
| log.warn("No analyzer registered for operation type {}; skipping", operationType); | ||
| return; | ||
| } |
Collaborator
Author
There was a problem hiding this comment.
impossible this is false
mkuchenbecker
commented
May 15, 2026
| OperationAnalyzer analyzer = analyzerOpt.get(); | ||
| List<String> dbs = databaseName.map(List::of).orElseGet(statsRepo::findDistinctDatabaseNames); | ||
| log.info("Analyzing {} across {} database(s)", operationType, dbs.size()); | ||
| dbs.forEach(db -> analyzeDatabase(analyzer, db, tableName, tableUuid)); |
Collaborator
Author
There was a problem hiding this comment.
.get is free. OperationAnalyzer analyzer = analyzerOpt.get();
mkuchenbecker
commented
May 15, 2026
| Optional<String> tableName, | ||
| Optional<String> tableUuid) { | ||
|
|
||
| com.linkedin.openhouse.optimizer.db.OperationType dbOperationType = |
Collaborator
Author
There was a problem hiding this comment.
Import vs using fully qualified names. apply the rule across all prs in the stack. make a plan.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Optimizer Stack
Summary
PR 3 of N in the optimizer stack.
Overall Project
Service Design doc.
Introduces
apps/optimizer-analyzer, a Spring Boot CommandLineRunner that evaluates every table intable_statsagainst pluggableOperationAnalyzerstrategies. The first strategy,OrphanFilesDeletionAnalyzer, schedules OFD operations with 24h success / 1h failure retry cadence, a 6h SCHEDULED timeout, and a 5-strike circuit breaker.Key design choices:
find()repository methods with null params.Changes
Core:
AnalyzerRunner— loads table_stats, pre-loads operations and history into maps, evaluates each table against all analyzers, circuit breaker logic.Strategy interface:
OperationAnalyzer—isEnabled(table),shouldSchedule(table, currentOp, latestHistory),getCircuitBreakerThreshold().Cadence policy:
CadencePolicy— encapsulates time-based retry logic shared across operation types.OFD analyzer:
OrphanFilesDeletionAnalyzer— enabled viamaintenance.optimizer.ofd.enabledtable property.Testing Done
25 unit tests:
AnalyzerRunnerTest(7 tests) — eligible table insertion, cadence skip, disabled table, shouldSchedule=false, null UUID, circuit breaker trip, below-threshold passOrphanFilesDeletionAnalyzerTest(18 tests) — isEnabled variants, shouldSchedule for no-op/PENDING/SCHEDULING/SCHEDULED with history combinationsAdditional Information