Skip to content

[BDP-102028] feat(optimizer): [2/N] Optimizer REST Service and Controller#531

Open
mkuchenbecker wants to merge 39 commits into
mkuchenb/optimizer-1from
mkuchenb/optimizer-2
Open

[BDP-102028] feat(optimizer): [2/N] Optimizer REST Service and Controller#531
mkuchenbecker wants to merge 39 commits into
mkuchenb/optimizer-1from
mkuchenb/optimizer-2

Conversation

@mkuchenbecker
Copy link
Copy Markdown
Collaborator

@mkuchenbecker mkuchenbecker commented Apr 6, 2026

Optimizer Stack

PR Content
#527 Data Model
#530 Database Repos
#531 (this) REST service
#533 Analyzer app
#534 Scheduler app
#tbd Spark BatchedOFD app
#tbd Infra, docker-compose, smoke test

Summary

PR 2 of N in the optimizer stack.
Overall Project
Service Design doc.

Service layer and REST controllers for the optimizer service, plus the apps/optimizer shared module providing lightweight entity/repo copies for the analyzer and scheduler apps.

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

Service layer: OptimizerDataService interface and OptimizerDataServiceImpl — CRUD operations, complete-operation lifecycle, stats upsert with history double-write, filtered queries.

Controllers: TableOperationsController, TableOperationsHistoryController, TableStatsController — REST endpoints per the design doc API spec.

Shared module (apps/optimizer): Lightweight entity and repository copies used by the analyzer and scheduler apps to read optimizer state directly from MySQL.

Testing Done

  • Manually Tested on local docker setup. Please include commands ran, and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

H2 integration tests in OptimizerDataServiceImplTest (5 tests):

  • completeOperation_writesHistoryFromOperationRow — saves SCHEDULED row, completes it, asserts history DTO fields
  • completeOperation_notFound_returnsEmpty — completes nonexistent ID, asserts empty
  • upsertTableStats_createsNewRow — upserts new table, asserts DTO and repo row
  • upsertTableStats_updatesExistingRow — upserts twice, asserts overwrite with single row
  • upsertTableStats_appendsHistoryOnEveryCall — upserts twice, asserts 2 history rows
./gradlew :services:optimizer:test
# BUILD SUCCESSFUL — all 25 tests pass (repo tests from PR 1 + 5 new service tests)

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

mkuchenbecker and others added 2 commits April 6, 2026 11:36
Service interface and implementation for all optimizer CRUD operations
including complete-operation lifecycle, stats upsert with history
double-write, and filtered queries. Three REST controllers expose the
endpoints. The apps/optimizer shared module provides lightweight
entity/repo copies for the analyzer and scheduler apps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Align OptimizerDataServiceImpl with renamed repository methods from
optimizer-1 review feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mkuchenbecker mkuchenbecker force-pushed the mkuchenb/optimizer-2 branch from e628426 to ef3260f Compare April 6, 2026 18:37
Resolve repo conflicts by taking optimizer-1's clean find-only versions.
Scheduler-specific methods and streamAll removed per review feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator Author

@mkuchenbecker mkuchenbecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs tests

Propagate CompleteOperationRequest orphan field removal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mkuchenbecker mkuchenbecker changed the title feat(optimizer): add REST service layer, controllers, and shared module feat(optimizer): add REST service layer Apr 6, 2026
@mkuchenbecker mkuchenbecker changed the title feat(optimizer): add REST service layer feat(optimizer): [3/N] Optimizer REST service layer Apr 6, 2026
H2 integration tests for OptimizerDataServiceImpl covering
completeOperation (write history, not-found) and upsertTableStats
(create, update, history append).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strengthen upsertTableStats test to verify history rows contain the raw
delta stats from each call, not just the row count.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mkuchenbecker mkuchenbecker marked this pull request as ready for review April 6, 2026 19:46
@mkuchenbecker mkuchenbecker changed the title feat(optimizer): [3/N] Optimizer REST service layer feat(optimizer): [2/N] Optimizer REST service layer Apr 6, 2026
* with the history row, or 404 if the operation does not exist.
*/
@PostMapping("/{id}/complete")
public ResponseEntity<TableOperationsHistoryDto> completeOperation(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need table name and database name as input. We can keep the url format same as how tables sevice urls are specified like v1/databases/DB/tables/TABLE

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or can be passed as parameters.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These APIs are intentionally keyed by table UUID because of drop-and-recreate semantics: a recreated table is a brand-new entity for the optimizer (new stats, new storage, new operation history), and a name-based key would conflate two distinct identities. The Spark caller of /{id}/complete already has the operation id. We'll add a name-based variant when a concrete use case lands; today the only such use case is operation-history browsing, which is covered separately.


/** Fetch a single operation row by its ID, regardless of status. Returns 404 if not found. */
@GetMapping("/{id}")
public ResponseEntity<TableOperationsDto> getTableOperation(@PathVariable String id) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment database name and table name needed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same answer — fetch-by-id is intentional for the same drop-and-recreate reason. The list endpoint at the controller root already accepts databaseName / tableName as optional query-param filters when a multi-criteria browse is needed.


/** REST controller for {@code table_operations}. */
@RestController
@RequestMapping("/v1/table-operations")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have common format for all urls like common prefix /v1/optimizer/ and operations can be be suffix. So the url can be something like /v1/optimizer/operations.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: Renamed to /v1/optimizer/operations. Applied the same /v1/optimizer/... namespacing across all three controllers.

@RestController
@RequestMapping("/v1/optimizer/databases/{databaseName}/tables/{tableName}")
@RequiredArgsConstructor
public class TableByNameController {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can merge this controller with TableOperationsController or TableOperationsHistoryController. Basically in these controllers additional endpoint can be added.

@mkuchenbecker mkuchenbecker changed the title feat(optimizer): [2/N] Optimizer REST service layer [BDP-102028] feat(optimizer): [2/N] Optimizer REST service layer May 13, 2026
*/
@PutMapping("/{tableUuid}")
public ResponseEntity<TableStatsDto> upsertTableStats(
@PathVariable String tableUuid, @RequestBody UpsertTableStatsRequest request) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the request body UpsertTableStatsRequest not part of this PR?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its in the first PR.

#527

mkuchenbecker and others added 11 commits May 13, 2026 11:47
…e layer

OptimizerDataService's filter-style methods (listTableOperations,
listTableStats, getStatsHistory, listHistory) accepted nullable strings/
enums to mean "no filter". Switch to Optional<T> at the service boundary;
controllers wrap their nullable @RequestParam values via Optional.ofNullable.
The implementation unwraps via .orElse(null) at the JPA repo call site —
the @query "IS NULL OR ..." pattern is idiomatic with nullable parameters
and stays unchanged.

No behavior change. No tests required updating.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rename wire HistoryStatus

services/optimizer no longer maintains its own JPA entities or Spring Data
repositories for the four optimizer DB tables. Apps/optimizer is the single
source of truth; services/optimizer depends on apps/optimizer (renamed via
project(':apps:optimizer').name = 'optimizer-data' to dodge the project.name
collision that previously caused a self-referential dependency error).

Removed from services/optimizer:
- entity/{TableOperationsRow, TableOperationsHistoryRow, TableStatsRow,
  TableStatsHistoryRow}.java
- repository/{TableOperationsRepository, TableOperationsHistoryRepository,
  TableStatsRepository, TableStatsHistoryRepository}.java
- api/model/TableStats.java (duplicate)
- api/model/OperationHistoryStatus.java (renamed → HistoryStatus to match
  the internal enum naming)
- config/JobResultConverter.java (no longer needed — entity stores result as
  raw String JSON and the mapper converts at the wire boundary)

Added on apps-side:
- TableStatsHistoryRow + TableStatsHistoryRepository (previously only on
  services-side)
- jobId + result fields on TableOperationsHistoryRow so it covers all
  services-side use cases
- find(...) on TableOperationsHistoryRepository extended to the 8-filter
  service-layer shape (databaseName, tableName, tableUuid, operationType,
  status, since, until, pageable)
- toBuilder = true on TableStatsRow so OptimizerDataServiceImpl.upsertTableStats
  can use the existing.toBuilder() pattern

Mapper updates:
- OptimizerMapper gains String ↔ wire-enum helpers and a JSON ↔ JobResult
  pair (replaces the old JPA AttributeConverter approach).
- OptimizerDataServiceImpl unwraps Optional<Enum> filters via .name() before
  calling the now-shared apps-side repos.

Tests updated to match: entity-builder calls pass enum.name() Strings;
repo.find(...) args reordered to apps-side (operationType, status,
tableUuid, databaseName, tableName); JobResult.builder() in test fixtures
replaced with literal JSON strings to match the String-typed result column.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to the optimizer-1 module move:

- services/optimizer/build.gradle: drop the now-dead
  `implementation project(':apps:optimizer-data')` (target module
  removed in the prior merge).
- Restore services-side TableStatsHistoryRepository (lost in the merge
  because optimizer-1 did not touch its services-side copy, but
  optimizer-2's HEAD had removed it during the R7-5 consolidation).
- Drop the multi-filter `listHistory` service method, its
  controller endpoint, and the standalone TableByNameController.
  Callers use `getHistory(tableUuid, limit)` which now uses the
  simplified `findByTableUuidOrderByCompletedAtDesc` derived query.
- TableStatsRow: enable `@Builder(toBuilder = true)` so
  `upsertTableStats` can build from the existing row.
@mkuchenbecker mkuchenbecker changed the title [BDP-102028] feat(optimizer): [2/N] Optimizer REST service layer [BDP-102028] feat(optimizer): [2/N] Optimizer REST Service and Controller May 14, 2026
mkuchenbecker and others added 13 commits May 14, 2026 14:42
Adapt the REST service layer to the new architecture introduced on
optimizer-0 and optimizer-1:
- api/ and db/ data types are self-contained per layer.
- model/mapper/ApiModelMapper and model/mapper/ModelDbMapper own all
  cross-layer translation.
- The old api/mapper/OptimizerMapper is gone.
- JobResult is removed from the wire entirely.
- TableStats is split on the DB side: TableStatsRow holds only the
  snapshot; TableStatsHistoryRow holds snapshot + delta per commit.

Changes:

OptimizerDataServiceImpl rewrite:
- Inject ApiModelMapper + ModelDbMapper instead of OptimizerMapper.
- Operations: list/get/complete/append go db row → ModelDbMapper →
  model object → ApiModelMapper → wire DTO. Enum filters on list()
  translate api → model → db.
- completeOperation: signature is now (CompleteOperationRequest)
  only; operationId lives in the body. No jobId / result on the
  written history row.
- Stats: split api.TableStats into snapshot (current-state row) and
  snapshot+delta (history row) at write time. Join back to the wire
  TableStats at read time (current-state has snapshot only; history
  has both).

OptimizerDataService interface:
- completeOperation(CompleteOperationRequest) — drop the String id
  path-style parameter.

TableOperationsController:
- POST endpoint moves from /{id}/complete to /complete. operationId
  is read from the request body.

application.properties:
- Re-introduced with production runtime config (server.port,
  application name, actuator) and JPA/MySQL datasource + schema-init
  pointing at the schema added on optimizer-1.

OptimizerDataServiceImplTest: rewritten to use api/ + db/ types,
new completeOperation signature, and the split snapshot/delta on
stats; drop JobResult-dependent assertions.
Push the api/model boundary out of the service entirely. After this
commit, calling into OptimizerDataService never returns or accepts a
wire DTO; controllers (or any future CLI / in-process consumer) own
the marshalling at their own edge.

Service interface:
- All return types and parameters are model/ types or primitives.
- completeOperation(String operationId, model.HistoryStatus status).
- upsertTableStats(model.Table table) — caller supplies a Table; the
  service stamps Table.updatedAt and returns the updated Table.
- listTableOperations / getStatsHistory / etc. return Lists of model
  types.

Service impl:
- Drop ApiModelMapper injection. Only depends on ModelDbMapper.
- All conversions are db row → ModelDbMapper → model. The new
  toStatsHistory mapper method (landed on optimizer-1) handles the
  history-row case. The updated toTable now stamps Table.updatedAt
  from the row.

Controllers (api/controller/*):
- TableOperationsController, TableOperationsHistoryController,
  TableStatsController now inject ApiModelMapper and do api↔model
  conversion at the boundary. Each controller method takes api
  request types, converts to model, calls the service, converts the
  returned model back to api DTOs.
- TableOperationsController.complete continues to take the
  operationId from the request body.

Test:
- OptimizerDataServiceImplTest now exercises the service in model
  types: builders create model.Table, assertions read model.HistoryStatus
  / model.OperationType / model.TableStats etc.

Verification: `git grep "import com.linkedin.openhouse.optimizer.api"
-- services/optimizer/src/main/java/com/linkedin/openhouse/optimizer/service/`
returns empty.
The api↔model and model↔db boundaries no longer go through injected
mapper beans. Switch every call site to the to/from methods that now
live on the types themselves.

OptimizerDataServiceImpl:
- Drop the ModelDbMapper field. No DI at all (only repositories).
- Row → model via TableOperation::fromRow, TableOperationsHistory::fromRow,
  Table::fromRow, TableStatsHistory::fromRow.
- Model → row via instance methods: history.toRow(), table.toBuilder()
  ...build().toRow(), and TableStats stats.toSnapshotRow() / .toDeltaRow().
- Enum filters on list() use OperationType::toDb / OperationStatus::toDb
  method references.

Controllers (TableOperationsController, TableOperationsHistoryController,
TableStatsController):
- Drop the ApiModelMapper field.
- api → model on the way in: dto.toModel(), request.toModel(uuid),
  request.getStatus().toModel(), apiEnum.toModel().
- model → api on the way out: Dto.fromModel(modelObj).
…t Table

OptimizerDataService.upsertTableStats / getTableStats / listTableStats
now operate on model.TableStats. The service stays decoupled from
Table — stats are stats, not tables. Conversion to TableStatsRow goes
through TableStats.toRow / fromRow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants