
feat: Add Incremental Entity Provider and Global API Pagination Support#140

Open
InduwaraSMPN wants to merge 61 commits into openchoreo:main from InduwaraSMPN:main

Conversation


@InduwaraSMPN InduwaraSMPN commented Dec 17, 2025

Purpose

This PR addresses scalability and performance limitations when syncing large numbers of entities from OpenChoreo. It introduces a new Incremental Entity Provider to handle large datasets efficiently via cursor-based pagination and updates the entire plugin ecosystem to support the new paginated API endpoints.

Goals

  • Scalability: Enable the ingestion of thousands of components without hitting API timeouts or memory limits.
  • Resiliency: Implement a stateful ingestion engine that can resume from the last successful cursor in case of interruption or token expiration.
  • Performance: Optimize API usage by implementing burst/rest cycles and batched processing for component details.
  • UI Experience: Improve the Scaffolder UI so it can handle large lists of traits via pagination.

Approach

1. New Incremental Backend Module (catalog-backend-module-openchoreo-incremental)

  • Created a new backend module dedicated to incremental ingestion.
  • Database: Added Knex migrations to create ingestions, ingestion_marks, and ingestion_mark_entities tables for persisting cursor state and tracking processed entities.
  • Ingestion Engine: Implemented OpenChoreoIncrementalIngestionEngine which manages the ingestion lifecycle (Burst -> Interstitial -> Rest) and handles backoff strategies for errors.
  • Cursor Traversal: Implemented OpenChoreoIncrementalEntityProvider to traverse resources in a specific order (Organizations -> Projects -> Components) using the new API continue tokens.
  • Management API: Added router endpoints (/incremental/...) to monitor health, trigger runs, and reset provider state.
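
The Organizations -> Projects -> Components traversal can be pictured as a small cursor state machine. The types and field names below are illustrative only, not the module's actual API; they sketch how a phase plus a continue token is enough to resume an interrupted burst:

```typescript
// Illustrative sketch of phase-based cursor traversal (not the module's real types).
// The cursor records which resource level is being paged and the API continue token.
type Phase = 'organizations' | 'projects' | 'components';

interface Cursor {
  phase: Phase;
  continueToken?: string; // opaque token returned by the paginated API
}

// Advance the cursor: stay on the current phase while the API reports more
// pages, otherwise move to the next level in the Org -> Project -> Component order.
function nextCursor(
  current: Cursor,
  page: { hasMore: boolean; continue?: string },
): Cursor | undefined {
  if (page.hasMore) {
    return { phase: current.phase, continueToken: page.continue };
  }
  const order: Phase[] = ['organizations', 'projects', 'components'];
  const next = order[order.indexOf(current.phase) + 1];
  return next ? { phase: next } : undefined; // undefined => ingestion burst is done
}
```

Because the cursor is persisted to the database after each page, a crash or token expiry only loses the page in flight, not the whole sync.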

2. API Client & Common Utilities

  • Updated openapi/openchoreo-api.yaml and generated types to support limit and continue query parameters on all list endpoints.
  • Added support for ResponseMetadata (resourceVersion, hasMore).
  • Implemented fetchAllResources utility in openchoreo-common to standardize pagination logic across legacy and new services.
  • Added handling for 410 Gone errors to automatically reset cursors if tokens expire.
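
Taken together, the limit/continue parameters, the hasMore metadata, and the 410 handling suggest a loop of roughly the following shape. This is a hedged sketch, not the actual implementation in openchoreo-common: the fetchPage callback stands in for the generated API client, and the field names mirror those mentioned above:

```typescript
// Sketch of cursor-based pagination with 410 (expired token) recovery.
// Names (fetchPage, hasMore, continue) are assumptions, not the real API surface.
interface Page<T> {
  items: T[];
  metadata: { hasMore: boolean; continue?: string };
  status: number; // HTTP status of the underlying response
}

async function fetchAllResources<T>(
  fetchPage: (limit: number, continueToken?: string) => Promise<Page<T>>,
  limit = 100,
): Promise<T[]> {
  const all: T[] = [];
  let continueToken: string | undefined;
  for (;;) {
    let page = await fetchPage(limit, continueToken);
    if (page.status === 410) {
      // 410 Gone: the continue token expired; reset the cursor and restart
      // from the first page instead of failing the whole sync.
      continueToken = undefined;
      all.length = 0;
      page = await fetchPage(limit, undefined);
    }
    all.push(...page.items);
    if (!page.metadata.hasMore || !page.metadata.continue) break;
    if (page.metadata.continue === continueToken) break; // token did not advance: avoid infinite loop
    continueToken = page.metadata.continue;
  }
  return all;
}
```

The token-advancement check is the safeguard against a server that keeps returning the same cursor, one of the infinite-loop cases this PR calls out.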

3. Existing Backend Refactoring

  • Refactored OpenChoreoEntityProvider and various info services (EnvironmentInfoService, TraitInfoService, etc.) to use the new fetchAllResources utility, ensuring they work with the updated paginated API.
  • Updated CtdToTemplateConverter to improve tag inference, UI widget selection (e.g., using radio for booleans), and conditional CI/CD setup sections.

4. Frontend Updates

  • Updated TraitsFieldExtension.tsx to support "Load More" functionality for retrieving traits incrementally in the Scaffolder.
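
At its core, "Load More" appends each fetched page to component state. A minimal sketch, assuming hypothetical state fields modeled on the names this PR mentions (hasMoreTraits, continueToken, loadingMoreTraits); the real component's shape may differ:

```typescript
// Illustrative state-merge logic for the "Load More" flow (not the actual component code).
interface Trait { name: string }

interface TraitListState {
  traits: Trait[];
  hasMoreTraits: boolean;
  continueToken?: string;
  loadingMoreTraits: boolean;
}

// Merge a newly fetched page into the existing list, de-duplicating by name in
// case a page boundary shifted between requests.
function appendTraitPage(
  state: TraitListState,
  page: { items: Trait[]; hasMore: boolean; continueToken?: string },
): TraitListState {
  const seen = new Set(state.traits.map(t => t.name));
  const fresh = page.items.filter(t => !seen.has(t.name));
  return {
    traits: [...state.traits, ...fresh],
    hasMoreTraits: page.hasMore,
    continueToken: page.continueToken,
    loadingMoreTraits: false,
  };
}
```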

5. Configuration

  • Added new configuration schema openchoreo.incremental in app-config.yaml to control burst length, intervals, and chunk sizes.
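
As an illustration, the block might look roughly like this; the key names are assumptions based on the description above (burst length, intervals, chunk sizes), so the module README and config schema are the authoritative reference:

```yaml
# Illustrative shape only; consult the module's config schema for actual keys.
openchoreo:
  incremental:
    burstLength: { minutes: 3 }      # how long each ingestion burst may run
    burstInterval: { seconds: 30 }   # pause between bursts within a cycle
    restLength: { hours: 1 }         # rest period after a full sync completes
    chunkSize: 100                   # page size requested per API call
```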

User stories

  • As a Platform Engineer, I can configure Backstage to ingest entities from OpenChoreo incrementally, preventing "Out of Memory" errors during large syncs.
  • As an Administrator, I can monitor the status of incremental ingestion and trigger manual syncs via API.
  • As a User, I can browse a large list of Component Traits in the Scaffolder without waiting for the entire list to load upfront.

Release note

  • New Feature: Added @openchoreo/plugin-catalog-backend-module-openchoreo-incremental for scalable, cursor-based entity ingestion.
  • Enhancement: Updated all OpenChoreo backend services to support API pagination.
  • Enhancement: Improved Component Template generation logic (better tag inference and UI widgets).
  • Fix: Added handling for expired API continuation tokens (HTTP 410).

Documentation

  • Added README.md in plugins/catalog-backend-module-openchoreo-incremental detailing configuration and architecture.
  • Updated app-config.yaml examples with comments explaining how to enable the incremental provider.

Training

N/A

Certification

N/A

Marketing

N/A

Automation tests

  • Unit tests: Added comprehensive tests for the new incremental provider module:
    • OpenChoreoIncrementalEntityProvider.test.ts: Verifies cursor traversal, phase transitions (Org -> Project -> Component), and 410 error recovery.
    • OpenChoreoIncrementalIngestionDatabaseManager.test.ts: Verifies database persistence for cursors and marks.
    • CtdToTemplateConverter.test.ts: Updated tests to reflect changes in template generation (tags, CI setup).
  • Dev setup: Added dev/index.ts in the new module for local testing with a dummy provider.

Security checks

Samples

N/A

Related PRs

N/A

Migrations (if applicable)

  • Database Migrations: This PR includes Knex migrations for the catalog-backend-module-openchoreo-incremental plugin.
    • 20221116073152_init.js: Creates initial ingestion tables.
    • 20240110000001_add_performance_indexes.ts: Adds indexes for performance.
    • 20240110000003_expand_last_error_field.ts: Expands error logging column size.
  • Configuration: Users opting into incremental ingestion must update app-config.yaml to configure openchoreo.incremental and register the new module in packages/backend/src/index.ts.
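
Module registration follows the standard Backstage new-backend-system pattern; a sketch of the packages/backend/src/index.ts change, placed alongside the existing backend.add calls:

```typescript
// packages/backend/src/index.ts (fragment): register the incremental module
backend.add(
  import('@openchoreo/plugin-catalog-backend-module-openchoreo-incremental'),
);
```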

Test environment

  • OS: Linux (Ubuntu)
  • Database: SQLite (Local)
  • Node: v20.19.5
  • Backstage: v1.43.3

Learning

  • Utilized the incremental-ingestion backend pattern (inspired by the official Backstage incremental provider) to handle large datasets efficiently.
  • Implemented "Cursor-based pagination" to ensure data consistency during long-running sync processes.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added incremental catalog ingestion with cursor-based pagination support for handling large-scale data efficiently
    • Implemented "Load more" functionality in traits selection dropdown for improved usability with large trait sets
  • Improvements

    • Enhanced API pagination across catalog entities, environments, and components
    • Improved trait loading with server-side pagination and dedicated loading states
    • Refined template generation with enhanced UI field controls and tag management
  • Configuration

    • Added optional incremental ingestion settings for burst processing and pagination tuning

Introduces generic helpers and refactors API calls to use cursor-based pagination for all list endpoints, improving scalability and reliability for large data sets. Updates OpenAPI schema and client types to support metadata-driven pagination, replaces page/size parameters with limit/continue, and implements a default max page size for efficiency.

Enhances error handling for paginated requests and prevents infinite loops. Refactors several backend services and entity providers to leverage the new pagination utilities, ensuring consistent resource synchronization across organizations, projects, and components.

Motivated by the need for reliable handling of large deployments and to align with upstream OpenChoreo API changes.

Introduces a new incremental ingestion backend module supporting burst-based, cursor-driven processing for large-scale OpenChoreo deployments. Enables efficient, fault-tolerant, and memory-conscious catalog updates by fetching entities in resumable batches with database-persisted state and management APIs. Improves scalability, observability, and operational control over catalog ingestion.

Relates to large dataset handling and platform scalability needs.

Prevents migration failures by ensuring PostgreSQL CREATE INDEX CONCURRENTLY commands run outside transaction blocks, as required by the database. Improves migration reliability for performance optimizations.
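
Knex lets a single migration opt out of the wrapping transaction, which is what makes CREATE INDEX CONCURRENTLY possible on PostgreSQL. A minimal sketch with a placeholder index name; the actual migration's contents may differ:

```typescript
// Sketch of a non-transactional Knex migration. PostgreSQL rejects
// CREATE INDEX CONCURRENTLY inside a transaction block, so the migration
// disables Knex's per-migration transaction wrapper. Index name is a placeholder.
export const config = { transaction: false };

export async function up(knex: { raw: (sql: string) => Promise<unknown> }) {
  await knex.raw(
    'CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_ingestion_marks_ingestion_id ' +
      'ON ingestion_marks (ingestion_id)',
  );
}

export async function down(knex: { raw: (sql: string) => Promise<unknown> }) {
  await knex.raw(
    'DROP INDEX CONCURRENTLY IF EXISTS idx_ingestion_marks_ingestion_id',
  );
}
```
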

Updates test cases to reflect revised tag logic, ensuring tags now include 'openchoreo', the component name, and workload type, rather than inferring tags from component name parts. Cleans up unused imports for clarity.

Aligns tests with changes to CI/CD configuration, including renaming UI fields, updating workflow selection logic, and switching boolean widget expectations. Improves cursor expiration handling assertions and mocks for increased clarity and accuracy.

Reflects recent logic changes to section generation and parameter spreading, ensuring tests accurately validate intended behaviors.

Refactors test and source files for improved readability, including consistent formatting in function arguments and object initializations. Enhances error handling by making log messages for expired pagination tokens more descriptive, aiding in debugging and operational clarity.

Introduces a new backend module and service factory to support immediate delta mutations in the catalog, enabling real-time entity ingestion for OpenChoreo without relying on legacy scheduled providers. Updates configuration example and dependencies to facilitate large-scale local testing and integration with scaffolder actions.

Replaces hardcoded page limit with a shared constant to ensure consistency and easier maintenance. Reflects updated OpenAPI schema cap and avoids potential request errors from exceeding limits.

Aligns the default pagination limit with standard system values, potentially improving consistency and compatibility with related systems.

Increases the maximum allowed value for the limit parameter from 500 to 512 in both the API specification and documentation. Aligns configuration with backend capabilities and clarifies usage for clients.

Adds mocked headers with content-length values to test HTTP responses to better simulate real-world scenarios and enable more accurate testing of code that depends on response headers.

Cleans up duplicate and unreachable error handling logic to streamline control flow and improve maintainability. Reduces noise by eliminating repeated or obsolete try/catch and entity transformation code. Ensures only necessary error checks remain, making the codebase easier to comprehend and maintain.

Improves trait selection performance and scalability by introducing cursor-based pagination to the trait-fetching endpoint and UI, allowing users to load large trait lists efficiently without overloading the frontend. Updates the backend to support pagination parameters and metadata.

Enhances incremental ingestion configuration with options for concurrent requests and batch delay, and adds logic to avoid duplicate processing of organizations when cursors expire. Strengthens config validation and documentation for large-scale operations.

Warns about potential memory issues in generic pagination utilities, encouraging chunked processing for large datasets.

Prevents runaway memory usage and unresponsive behavior by adding a configurable timeout and AbortSignal support to the pagination utility. Replaces the fixed page count safeguard with time-based limits, improving flexibility and reliability for large or slow data sources.

This commit updates the test suite for catalog backend modules with:
- Increased jest timeout from 60s to 120s for database/provider tests
- Added conditional CI skipping to prevent test timeouts
- Updated cursor expectations to match implementation (reordered properties, new sets)
- Modified tag splitting logic in template converter tests
- Changed component name picker to EntityNamePicker in UI tests
- Updated boolean widget expectations for enableBackup property

These changes ensure tests run reliably in CI environments while accurately reflecting current component behavior.

Replaces config.d.ts with config.ts, moving detailed zod schema definitions and exports out of the type declaration file for better separation of concerns and clearer runtime imports.
@InduwaraSMPN InduwaraSMPN force-pushed the main branch 2 times, most recently from 4e7a898 to 3449126 on December 18, 2025 at 08:36
Update multiple Babel packages including core, generator, parser, and related dependencies. This includes upgrades to @babel/core@7.28.5, @babel/generator@7.28.5, @babel/parser@7.28.5, @babel/compat-data@7.28.5, and various helper packages. Also updates the resolve package from 1.22.8 to 1.22.11. These updates provide the latest bug fixes and improvements to the JavaScript compilation toolchain.

coderabbitai Bot commented May 2, 2026

📝 Walkthrough

This PR introduces cursor-based incremental catalog ingestion for OpenChoreo with a new backend plugin, refactors existing services to use centralized pagination utilities, updates the OpenAPI contract to support pagination, and enhances the frontend trait-loading UI with pagination support.

Changes

Incremental Catalog Ingestion Plugin

| Layer / File(s) | Summary |
| --- | --- |
| Data & Configuration<br>plugins/catalog-backend-module-openchoreo-incremental/src/config.*, plugins/catalog-backend-module-openchoreo-incremental/src/database/tables.ts, plugins/catalog-backend-module-openchoreo-incremental/migrations/, plugins/openchoreo-common/src/utils/pagination.ts | Zod schemas for incremental config (burst timing, backoff, removal thresholds); database tables for ingestion state, marks, and entity tracking; pagination utilities with cursor support and timeout handling. |
| Database & Transactions<br>plugins/catalog-backend-module-openchoreo-incremental/src/database/OpenChoreoIncrementalIngestionDatabaseManager.ts, plugins/catalog-backend-module-openchoreo-incremental/src/database/errors.ts, plugins/catalog-backend-module-openchoreo-incremental/src/database/migrations.ts | Database manager for ingestion lifecycle, mark/entity persistence, and removal computation; error classes for transaction/API failures; migration runner. |
| Incremental Entity Ingestion<br>plugins/catalog-backend-module-openchoreo-incremental/src/providers/OpenChoreoIncrementalEntityProvider.ts, plugins/catalog-backend-module-openchoreo-incremental/src/providers/entityTranslator.ts, plugins/catalog-backend-module-openchoreo-incremental/src/providers/componentBatchProcessor.ts | Cursor-based traversal provider (orgs → projects → components); entity translator (Domain/System/Component/API); batch processor for component details with concurrency control. |
| Ingestion Engine<br>plugins/catalog-backend-module-openchoreo-incremental/src/engine/OpenChoreoIncrementalIngestionEngine.ts | Orchestrates burst-rest-backoff state machine, entity mark persistence, removal computation, delta event handling. |
| Module Wiring & Extension Points<br>plugins/catalog-backend-module-openchoreo-incremental/src/module/catalogModuleIncrementalIngestionEntityProvider.ts, plugins/catalog-backend-module-openchoreo-incremental/src/module/openchoreoIncrementalProviderModule.ts, plugins/catalog-backend-module-openchoreo-incremental/src/module/WrapperProviders.ts, plugins/catalog-backend-module-openchoreo-incremental/src/openchoreoImmediateCatalogIncremental.ts | Extension point registration, provider wrapping with migration/scheduling/event handling, immediate catalog service for delta mutations. |
| Admin Router & Utilities<br>plugins/catalog-backend-module-openchoreo-incremental/src/router/routes.ts, plugins/catalog-backend-module-openchoreo-incremental/src/utils/ApiErrorHandler.ts, plugins/catalog-backend-module-openchoreo-incremental/src/utils/ConfigValidator.ts | REST endpoints for health/status/refresh/cancel/reset; error handling with retry/backoff logic; config validation with business-rule enforcement. |
| Package & Exports<br>plugins/catalog-backend-module-openchoreo-incremental/package.json, plugins/catalog-backend-module-openchoreo-incremental/src/index.ts, plugins/catalog-backend-module-openchoreo-incremental/src/module/index.ts, plugins/catalog-backend-module-openchoreo-incremental/.eslintrc.js, plugins/catalog-backend-module-openchoreo-incremental/README.md, plugins/catalog-backend-module-openchoreo-incremental/dev/index.ts | Package manifest, public module exports, ESLint config, comprehensive README, development test harness. |
| Tests<br>plugins/catalog-backend-module-openchoreo-incremental/src/database/OpenChoreoIncrementalIngestionDatabaseManager.test.ts, plugins/catalog-backend-module-openchoreo-incremental/src/module/WrapperProviders.test.ts, plugins/catalog-backend-module-openchoreo-incremental/src/module/catalogModuleIncrementalIngestionEntityProvider.test.ts, plugins/catalog-backend-module-openchoreo-incremental/src/providers/OpenChoreoIncrementalEntityProvider.test.ts | Database manager mark lifecycle, provider readiness sequencing, module registration, cursor traversal and HTTP 410 handling. |

Pagination Infrastructure & Service Refactoring

| Layer / File(s) | Summary |
| --- | --- |
| OpenAPI Contract Update<br>packages/openchoreo-client-node/openapi/openchoreo-api.yaml | ListResponse migrated from offset-based (page, pageSize, totalCount) to cursor-based (metadata with hasMore, continue); new ComponentTypeResponse.tags field; limitParam and continueParam query parameters added to all list endpoints with 410 Gone response for expired tokens. |
| Pagination Utilities<br>plugins/openchoreo-common/src/utils/pagination.ts, plugins/openchoreo-common/src/constants.ts, plugins/openchoreo-common/src/index.ts, plugins/openchoreo-common/package.json | fetchAllResources utility for cursor-based pagination with timeout/abort support; DEFAULT_PAGE_LIMIT constant; PaginationResult type; sideEffects: false flag. |
| Existing Service Refactoring<br>plugins/openchoreo-backend/src/router.ts, plugins/openchoreo-backend/src/services/TraitService/TraitInfoService.ts, plugins/openchoreo-backend/src/services/BuildService/BuildInfoService.ts, plugins/openchoreo-backend/src/services/CellDiagramService/CellDiagramInfoService.ts, plugins/openchoreo-backend/src/services/DashboardService/DashboardInfoService.ts, plugins/openchoreo-backend/src/services/EnvironmentService/EnvironmentInfoService.ts, plugins/platform-engineer-core-backend/src/services/PlatformEnvironmentService.ts, plugins/platform-engineer-core-backend/package.json | Services updated to use fetchAllResources with DEFAULT_PAGE_LIMIT; trait/build/component/environment/binding endpoints now use cursor-based iteration instead of single requests; error handling and empty-result normalization; new dependency on @openchoreo/backstage-plugin-common. |

Frontend Integration & Configuration

| Layer / File(s) | Summary |
| --- | --- |
| OpenAPI Model & Template Updates<br>plugins/catalog-backend-module-openchoreo/src/converters/CtdToTemplateConverter.ts, plugins/catalog-backend-module-openchoreo/src/provider/OpenChoreoEntityProvider.ts | ComponentType interface now derives from OpenChoreo schema; template generation exports ComponentType for type safety; UI widget changes (EntityNamePicker, SwitchField for boolean, "Addons" section rename); CI Setup conditional logic refactored; scaffolder step inputs built dynamically; component provider refactored to use pagination helpers. |
| Frontend Trait Field Pagination<br>packages/app/src/scaffolder/TraitsField/TraitsFieldExtension.tsx | Separate loading states for initial vs "load more" (loadingMoreTraits); pagination metadata tracking (hasMoreTraits, continueToken); /traits endpoint called with limit=100 and continue cursor; dropdown includes "Load more traits..." button when available. |
| Configuration & Wiring<br>app-config.yaml, app-config.production.yaml, app-config.local.yaml.example, packages/backend/package.json, packages/backend/src/index.ts, package.json | Optional openchoreo.incremental config block (burst/rest/chunk settings) added to config examples; backend dependency on @openchoreo/plugin-catalog-backend-module-openchoreo-incremental; commented activation instructions; lint-staged dependency and configuration for yarn backstage-cli package lint integration. |
| Tests & Type Updates<br>plugins/catalog-backend-module-openchoreo/src/converters/CtdToTemplateConverter.test.ts | Template assertions updated for new tag inference (stable component-type tag, name-part splitting, preserved user tags, inferred/user tag ordering); UI field expectations for boolean radio widget and CI Setup conditionals; scaffolder step input assertions for spread parameters. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant TraitField as Trait Field (UI)
    participant Backend as OpenChoreo Backend
    participant APIClient as OpenChoreo API
    
    Note over Client,APIClient: Initial Load (limit=100, no cursor)
    TraitField->>Backend: GET /traits?limit=100
    Backend->>APIClient: GET /orgs/{org}/traits?limit=100
    APIClient-->>Backend: { items: [...], metadata: { hasMore: true, continue: "token123" } }
    Backend-->>TraitField: { items: [...], hasMore: true, continueToken: "token123" }
    TraitField->>TraitField: Render dropdown + "Load more" button
    
    Note over Client,APIClient: User Clicks "Load More"
    Client->>TraitField: handleLoadMoreTraits()
    TraitField->>Backend: GET /traits?limit=100&continue=token123
    Backend->>APIClient: GET /orgs/{org}/traits?limit=100&continue=token123
    APIClient-->>Backend: { items: [...], metadata: { hasMore: false, continue: null } }
    Backend-->>TraitField: { items: [...], hasMore: false }
    TraitField->>TraitField: Append items, hide "Load more"
sequenceDiagram
    participant Scheduler
    participant Engine as Ingestion Engine
    participant DB as Database Manager
    participant Provider as Entity Provider
    participant API as OpenChoreo API
    participant Catalog as Catalog
    
    Note over Scheduler,Catalog: Burst Cycle Start
    Scheduler->>Engine: taskFn()
    Engine->>DB: getCurrentIngestionRecord()
    DB-->>Engine: { nextAction: 'ingest', ... }
    Engine->>Engine: handleNextAction() → ingestOneBurst()
    
    Note over Provider,API: Cursor-Based Traversal (orgs → projects → components)
    loop Until done or timeout
        Engine->>Provider: next(context, cursor?)
        Provider->>API: GET /orgs?limit=100&continue=cursor
        API-->>Provider: { items: [org1, org2], metadata: { hasMore, continue } }
        Provider->>Provider: Translate to Domain entities
        Provider-->>Engine: { entities: [...], cursor: { phase, ... }, done: false }
        Engine->>Engine: mark(entities, cursor)
        Engine->>DB: createMark(markRecord)
        DB->>Catalog: applyMutation(added: entities)
    end
    
    Note over Engine,DB: Final Burst Page (done:true)
    Engine->>DB: computeRemoved()
    DB-->>Engine: { removed: [...] }
    Engine->>Catalog: applyMutation(added, removed)
    Engine->>DB: setProviderResting()

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

The PR introduces a large, intricate incremental ingestion subsystem with 25+ new files spanning database transactions, cursor pagination, state machines, entity translation, and API routing. Multiple interdependent layers require separate reasoning (database schema design, transaction error handling, cursor state tracking, phase-based traversal logic, removal computation, event integration). Simultaneously, refactoring across 7 existing services to use a new pagination utility demands consistency checks. Complex control flows (burst/rest/backoff cycling, HTTP 410 retry logic, batch processing with concurrency caps, deduplication across reset boundaries) and heterogeneous changes (schemas, migrations, OpenAPI contract, component translation logic) compound the effort.

Poem

A rabbit hops through cursors bright,
From org to project, bursting light,
With pagination's steady flow,
The incremental gardens grow. 🌱
No tokens lost when trails turn old,
Just rested pauses, stories told!



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 19

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/openchoreo-client-node/openapi/openchoreo-api.yaml (1)

2829-2900: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Trait pagination still advertises the old response contract.

These two operations now accept limit / continue, but their responses still expose top-level totalCount, page, and pageSize instead of data.metadata, and they never narrow items to ComponentTraitResponse. That leaves the trait endpoints out of sync with the new ListResponse shape used everywhere else.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/openchoreo-client-node/openapi/openchoreo-api.yaml` around lines
2829 - 2900, The response schemas for the trait list and updateComponentTraits
operations still expose top-level totalCount/page/pageSize and don't narrow
items to ComponentTraitResponse; update both responses so "data" follows the
centralized ListResponse shape (i.e., move pagination into data.metadata and
remove top-level totalCount/page/pageSize) and ensure data.items is narrowed to
components/schemas/ComponentTraitResponse — do this by replacing the current
allOf block under each 200 response (which currently merges APIResponse and an
object with top-level totalCount/page/pageSize) with an allOf that merges
APIResponse and an object where "data" is either $ref:
'#/components/schemas/ListResponse' with an inline override to set
data.properties.items.items.$ref to
'#/components/schemas/ComponentTraitResponse' (or an equivalent allOf that
composes ListResponse and then constrains items), and remove the stray top-level
pagination properties; apply the same change to the list operation and the
updateComponentTraits operation.
plugins/catalog-backend-module-openchoreo/src/provider/OpenChoreoEntityProvider.ts (1)

227-251: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Variable compData used outside its scope.

The compData variable is declared inside the try block (line 229-240) but is referenced after the catch block (lines 248-251). This will cause a ReferenceError at runtime because compData is not accessible outside its declaring block.

The try-catch structure appears to be incomplete - the component processing logic should be inside the try block, not after it.

Proposed fix
           // Get components for the project and create Component entities
           try {
             const {
               data: compData,
               error: compError,
               response: compResponse,
             } = await client.GET(
               '/orgs/{orgName}/projects/{projectName}/components',
               {
                 params: {
                   path: { orgName: org.name!, projectName: project.name! },
                 },
               },
             );
-          } catch (error) {
-            this.logger.warn(
-              `Failed to fetch components for project ${project.name}: ${error}`,
-            );
-            continue;
-          }
 
-          const components =
-            compData?.success && compData?.data?.items
-              ? compData.data.items
-              : [];
+            if (compError || !compResponse.ok) {
+              this.logger.warn(
+                `Failed to fetch components for project ${project.name}: ${compResponse.status}`,
+              );
+              continue;
+            }
+
+            const components =
+              compData?.success && compData?.data?.items
+                ? compData.data.items
+                : [];
+            // ... rest of component processing logic ...
+          } catch (error) {
+            this.logger.warn(
+              `Failed to fetch components for project ${project.name}: ${error}`,
+            );
+            continue;
+          }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo/src/provider/OpenChoreoEntityProvider.ts`
around lines 227 - 251, The code uses compData outside the try block causing a
ReferenceError; to fix, either declare compData (and compError/compResponse if
needed) before the try or move the component-processing logic (the calculation
of components and any usage of compData) inside the try that wraps the
client.GET call; update the block around client.GET in OpenChoreoEntityProvider
(where client.GET is called and this.logger.warn is used) so that compData is in
scope when computing components and handling results, and ensure the catch still
logs the error via this.logger.warn and continues.
🧹 Nitpick comments (23)
plugins/catalog-backend-module-openchoreo-incremental/migrations/20221116073152_init.js (1)

63-65: ⚡ Quick win

last_error is defined as a string (VARCHAR 255) and only expanded to TEXT in a subsequent migration; define it as TEXT directly to avoid the two-step migration.

Error messages and stack traces routinely exceed 255 characters. While a follow-up migration (20240110000003_expand_last_error_field.js) widens the column to TEXT, defining it correctly upfront removes the dependency and eliminates the risk of silent truncation on MySQL (or a constraint error on PostgreSQL) if the expand migration ever fails to run or is applied out of order.

♻️ Proposed fix
-    table
-      .string('last_error')
-      .comment('records any error that occurred in the previous burst attempt');
+    table
+      .text('last_error')
+      .comment('records any error that occurred in the previous burst attempt');
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo-incremental/migrations/20221116073152_init.js`
around lines 63 - 65, The migration defines the column last_error using
table.string('last_error') which creates a VARCHAR(255); change it to
table.text('last_error') in the migration (migration 20221116073152_init.js
where last_error is declared) so the column is created as TEXT initially and
avoids a later expansion migration; update the table.comment(...) call to remain
unchanged and ensure any rollback/alter logic (if present) matches the TEXT
type.
plugins/openchoreo-common/src/utils/pagination.ts (1)

71-86: 💤 Low value

Consider consolidating the duplicate token-advancement checks.

The check at lines 72-79 validates that the API's continue token differs from the token we just sent. The check at lines 81-86 compares continueToken (current request) against previousToken (previous request), but given the code flow where previousToken is assigned after these checks (line 89), this second check would only trigger in edge cases not covered by the first. Both produce identical error messages, making debugging harder if either fires.

If you intend to detect cyclic tokens (e.g., A → B → A), a Set<string> of seen tokens would be more robust. Otherwise, consider removing the second check or differentiating the error messages.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/openchoreo-common/src/utils/pagination.ts` around lines 71 - 86,
Duplicate token-advancement checks should be consolidated and made robust:
introduce a Set<string> (e.g., seenTokens) in the pagination routine and before
sending/accepting tokens, check if the incoming response.metadata?.continue or
continueToken is already in seenTokens and throw a clear cyclic-token error
(e.g., "Pagination token repeated - cyclic pagination detected"); also keep a
simpler check that response.metadata?.continue equals the token we just sent
(continueToken) but use a distinct message (e.g., "API returned same continue
token we supplied"); remove the redundant previousToken === continueToken check
or replace it by adding the previousToken into seenTokens when advancing so
repeated tokens are reliably detected via the Set; update references to
response.metadata?.continue, continueToken, previousToken, and ensure seenTokens
is updated each iteration.
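The Set-based cycle detection suggested above can be sketched as follows. `collectAllPages`, `PageResponse`, and `fetchPage` are hypothetical names for illustration, not the plugin's actual pagination API:

```typescript
interface PageResponse<T> {
  items: T[];
  metadata?: { continue?: string };
}

// Drains a paginated endpoint, distinguishing "API echoed our token back"
// from "token cycle (A -> B -> A)" with separate error messages.
async function collectAllPages<T>(
  fetchPage: (continueToken?: string) => Promise<PageResponse<T>>,
): Promise<T[]> {
  const items: T[] = [];
  const seenTokens = new Set<string>();
  let token: string | undefined;

  for (;;) {
    const page = await fetchPage(token);
    items.push(...page.items);
    const next = page.metadata?.continue;
    if (!next) break;
    if (next === token) {
      throw new Error('API returned the same continue token we supplied');
    }
    if (seenTokens.has(next)) {
      throw new Error('Pagination token repeated: cyclic pagination detected');
    }
    seenTokens.add(next);
    token = next;
  }
  return items;
}
```

Because every token ever seen stays in the Set, an A → B → A cycle is caught on the third page rather than looping forever.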
plugins/openchoreo-backend/src/services/EnvironmentService/EnvironmentInfoService.ts (1)

216-224: 💤 Low value

Redundant type cast at line 224.

environmentsList is already cast to ModelsEnvironment[] at line 216. The second cast at line 224 is unnecessary.

-      const environments = environmentsList as ModelsEnvironment[];
+      const environments = environmentsList;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/openchoreo-backend/src/services/EnvironmentService/EnvironmentInfoService.ts`
around lines 216 - 224, The variable environmentsList is already typed as
ModelsEnvironment[] when assigned, so remove the redundant cast when assigning
to environments; update the code in EnvironmentInfoService.ts to either assign
environments = environmentsList (no cast) or simply use environmentsList
directly where environments is used, ensuring references to environmentsList
remain typed as ModelsEnvironment[] and no duplicate cast remains.
plugins/catalog-backend-module-openchoreo-incremental/src/config.d.ts (1)

21-21: 💤 Low value

Unused import in declaration file.

The import { z } from 'zod' is not used in this declaration file since only import('zod').ZodTypeAny is referenced.

🧹 Remove unused import
-import { z } from 'zod';
+// Zod types are referenced via import('zod').ZodTypeAny
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/catalog-backend-module-openchoreo-incremental/src/config.d.ts` at
line 21, The declaration file contains an unused import "import { z } from
'zod'" which should be removed; edit the declaration file to delete the unused
import statement and keep using the inline type reference
import('zod').ZodTypeAny where needed (no other code changes), ensuring only
active references remain and there are no lingering unused symbols like z in the
file.
plugins/catalog-backend-module-openchoreo/src/converters/CtdToTemplateConverter.ts (1)

325-353: 💤 Low value

Redundant property constraint inside conditional then block.

Lines 330 and 345 include useBuiltInCI: { const: true/false } inside the then.properties, which duplicates the condition in the if block. This is unnecessary since the if condition already constrains the value.

🧹 Remove redundant constraints
               then: {
                 properties: {
-                  useBuiltInCI: { const: true },
                   workflow_name: workflowNameField,
                   workflow_parameters: {
                     title: 'Workflow Parameters',
               then: {
                 properties: {
-                  useBuiltInCI: { const: false },
                   external_ci_note: {
                     type: 'null',
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo/src/converters/CtdToTemplateConverter.ts`
around lines 325 - 353, The schema's conditional branches under the allOf array
redundantly repeat the useBuiltInCI const constraint inside the then.properties
(currently alongside workflow_name/workflow_parameters and external_ci_note);
remove the duplicate useBuiltInCI: { const: true } and useBuiltInCI: { const:
false } entries from the then.blocks so the if conditions remain the single
source of that constraint and only include the branch-specific properties
(workflow_name, workflow_parameters, external_ci_note) in the then.properties.
packages/app/src/scaffolder/TraitsField/TraitsFieldExtension.tsx (2)

177-222: ⚡ Quick win

Consider extracting shared fetch logic to reduce duplication.

The handleLoadMoreTraits function duplicates most of the logic from fetchTraits inside useEffect. Both define the same extractOrgName helper and perform identical API calls with similar error handling.

♻️ Proposed refactor to extract shared logic
+  // Shared helper - extract outside useEffect
+  const extractOrgName = (fullOrgName: string): string => {
+    const parts = fullOrgName.split('/');
+    return parts[parts.length - 1];
+  };
+
+  // Shared fetch function
+  const fetchTraitsPage = async (cursor?: string): Promise<{
+    items: TraitListItem[];
+    hasMore: boolean;
+    nextCursor?: string;
+  } | null> => {
+    if (!organizationName) return null;
+    
+    const baseUrl = await discoveryApi.getBaseUrl('openchoreo');
+    const orgName = extractOrgName(organizationName);
+    const url = new URL(`${baseUrl}/traits`);
+    url.searchParams.set('organizationName', orgName);
+    url.searchParams.set('limit', '100');
+    if (cursor) {
+      url.searchParams.set('continue', cursor);
+    }
+    
+    const response = await fetchApi.fetch(url.toString());
+    if (!response.ok) {
+      throw new Error(`HTTP ${response.status}: ${response.statusText}`);
+    }
+    
+    const result = await response.json();
+    if (result.success) {
+      return {
+        items: result.data.items || [],
+        hasMore: result.data.metadata?.hasMore === true,
+        nextCursor: result.data.metadata?.continue,
+      };
+    }
+    return null;
+  };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/app/src/scaffolder/TraitsField/TraitsFieldExtension.tsx` around
lines 177 - 222, handleLoadMoreTraits duplicates the fetch logic from the
fetchTraits defined in the useEffect (including extractOrgName, URL
construction, fetch and error handling); extract that shared behavior into a
single helper (e.g., fetchTraitsPage or fetchTraitsFromApi) that accepts
parameters like organizationName, limit, and continueToken, reuse extractOrgName
inside that helper, and then call it from both fetchTraits (useEffect) and
handleLoadMoreTraits to set state (setAvailableTraits, setHasMoreTraits,
setContinueToken, setError, setLoadingMoreTraits) based on the returned result
or thrown error.

355-372: 💤 Low value

MenuItem onClick inside Select may have unexpected UX.

Using onClick on a MenuItem inside a Select is non-standard. When clicked, the Select may also try to treat it as a selection (triggering onChange), though the empty value prevents actual selection. Consider placing the "Load more" button outside the dropdown or using a different pattern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/app/src/scaffolder/TraitsField/TraitsFieldExtension.tsx` around
lines 355 - 372, The MenuItem inside the Select should not trigger the Select's
selection behavior; update the "Load more traits" control so clicks don't bubble
into the Select - replace the current MenuItem onClick usage by preventing the
selection event and invoking handleLoadMoreTraits directly (for example change
the MenuItem to a non-selectable container/component and call
handleLoadMoreTraits from an onMouseDown handler that does
event.stopPropagation() and event.preventDefault(), or render a
ListSubheader/div with an explicit button that calls handleLoadMoreTraits), keep
the loadingMoreTraits/CircularProgress UI and the hasMoreTraits/loadingTraits
guards, and ensure the symbol names handleLoadMoreTraits and loadingMoreTraits
remain used so behavior is unchanged.
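The onMouseDown guard described above reduces to a small handler. `onLoadMoreMouseDown` is a hypothetical name and the event shape is simplified for illustration; in the real component it would receive a React MouseEvent:

```typescript
// Stops the event before the Select interprets the click as a selection,
// then triggers the load; call order matters (guard first, load last).
function onLoadMoreMouseDown(
  event: { stopPropagation(): void; preventDefault(): void },
  loadMore: () => void,
): void {
  event.stopPropagation(); // keep the Select from firing onChange
  event.preventDefault();  // avoid focus/selection side effects
  loadMore();
}
```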
plugins/catalog-backend-module-openchoreo-incremental/src/utils/ConfigValidator.ts (2)

160-166: 💤 Low value

Magic number in chunk size validation.

The estimate of burstLength * 10 (10 entities/second) is arbitrary. Consider documenting this assumption or making it configurable.

📝 Add explanatory comment
     // Validate chunk size vs burst length
-    const maxEntitiesPerBurst = incremental.burstLength * 10; // Rough estimate
+    // Estimate ~10 entities processed per second based on typical API latency
+    const ESTIMATED_ENTITIES_PER_SECOND = 10;
+    const maxEntitiesPerBurst = incremental.burstLength * ESTIMATED_ENTITIES_PER_SECOND;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo-incremental/src/utils/ConfigValidator.ts`
around lines 160 - 166, The validation uses a hardcoded rate estimate
(maxEntitiesPerBurst = incremental.burstLength * 10) which is a magic number;
update ConfigValidator (the chunkSize vs burstLength check referencing
incremental.chunkSize, incremental.burstLength, and maxEntitiesPerBurst) to
either (a) extract the “10 entities/sec” value into a named constant with a
descriptive comment or (b) make it a configurable parameter (e.g., rateEstimate
or entitiesPerSecond) that’s read from config and validated, and update the
logger.warn message to mention the configurable rate assumption so the rationale
is documented for future readers/operators.

118-118: ⚡ Quick win

TODO: Backoff array parsing not implemented.

The backoff configuration is hardcoded to undefined with a TODO comment. This means users cannot configure custom backoff intervals via the config file.

Do you want me to generate the implementation for parsing the backoff array from configuration, or would you prefer to track this as a separate issue?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo-incremental/src/utils/ConfigValidator.ts`
at line 118, The backoff field is stubbed to undefined in ConfigValidator.ts;
update the validator (e.g., in the ConfigValidator class and its validateConfig
/ validateRetryPolicy function) to read the backoff value from the incoming
config, ensure it's an array, validate each element is a finite positive number
(or numeric string coercible to number), convert/normalize items to numbers, and
assign that array to the backoff property (or push a validation error if
invalid). Make sure to preserve existing error reporting flow (use the same
errors collection/throwing behavior in ConfigValidator) and keep the previous
default (undefined) when backoff is absent.
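A minimal sketch of the suggested backoff parsing, assuming the config value arrives as an untyped array; `parseBackoff` is a hypothetical helper, not part of ConfigValidator today:

```typescript
// Validates a configured backoff value: must be an array of finite
// positive numbers; numeric strings are coerced. Absent values keep
// the previous default (undefined).
function parseBackoff(raw: unknown): number[] | undefined {
  if (raw === undefined || raw === null) {
    return undefined;
  }
  if (!Array.isArray(raw)) {
    throw new Error('backoff must be an array of positive numbers');
  }
  return raw.map((item, i) => {
    const n = typeof item === 'string' ? Number(item) : item;
    if (typeof n !== 'number' || !Number.isFinite(n) || n <= 0) {
      throw new Error(`backoff[${i}] must be a finite positive number`);
    }
    return n;
  });
}
```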
plugins/catalog-backend-module-openchoreo-incremental/src/utils/ApiErrorHandler.ts (1)

133-141: ⚖️ Poor tradeoff

HTTP status detection via string matching is fragile.

The retry logic detects HTTP status codes by checking if message.includes('http 429'). This relies on the error message format being consistent (e.g., "HTTP 429" vs "http 429" vs "429"). Consider accepting an HTTP status code parameter or checking for a status property on the error object.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo-incremental/src/utils/ApiErrorHandler.ts`
around lines 133 - 141, The current retry detection in ApiErrorHandler relies on
case-sensitive substring matches of the error message (e.g.,
message.includes('http 429')), which is fragile; update the logic in the
function handling retries (referenced by ApiErrorHandler and its retry decision
block) to first check for a numeric status property on the error object (e.g.,
error.status or error.response?.status) and use that for 429/502/503/504 checks,
then fall back to a robust parsing of the message (case-insensitive and matching
whole status codes via regex like /\b429\b/) only if no status property exists;
ensure you update any callers to pass a status when available.
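The status-first detection could look roughly like this; `isRetryable` and the error shape are assumptions for illustration, not the plugin's actual handler:

```typescript
const RETRYABLE_STATUSES = new Set([429, 502, 503, 504]);

// Prefers a numeric status property on the error; falls back to a
// word-boundary regex on the message only when no status is present.
function isRetryable(error: unknown): boolean {
  const err = error as {
    status?: number;
    response?: { status?: number };
    message?: string;
  };
  const status = err?.status ?? err?.response?.status;
  if (typeof status === 'number') {
    return RETRYABLE_STATUSES.has(status);
  }
  return /\b(429|502|503|504)\b/.test(String(err?.message ?? ''));
}
```

The `\b` boundaries keep unrelated numbers like `4290` from matching, which a plain `includes('429')` would not.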
plugins/catalog-backend-module-openchoreo/src/provider/OpenChoreoEntityProvider.ts (2)

530-563: 💤 Low value

Inconsistent 410 handling across fetch methods.

fetchAllOrganizations includes explicit 410 Gone handling with a warning log (lines 506-510), but fetchAllEnvironments and other fetch methods don't have this handling. For consistency, either all methods should handle 410 similarly, or none should if fetchAllResources handles it internally.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo/src/provider/OpenChoreoEntityProvider.ts`
around lines 530 - 563, fetchAllEnvironments currently throws on any non-OK
response but lacks the 410 Gone special-case handling used in
fetchAllOrganizations; update fetchAllEnvironments to detect response.status ===
410 and call the same warning behavior as fetchAllOrganizations (log a warning
and return an empty list/stop iteration) instead of throwing, mirroring the
logic used there so all fetch* methods behave consistently with
fetchAllResources expectations.
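A sketch of the shared 410 handling, under the assumption that an expired continue token should reset pagination rather than fail the run; `itemsOrReset` and `PageResult` are illustrative names:

```typescript
interface PageResult<T> {
  status: number;
  items?: T[];
}

// 410 Gone (expired continue token) is a soft failure: warn and return
// an empty page so the caller restarts pagination. Other non-2xx
// statuses still throw.
function itemsOrReset<T>(
  result: PageResult<T>,
  warn: (msg: string) => void,
): T[] {
  if (result.status === 410) {
    warn('Continue token expired (410 Gone); restarting pagination from the beginning');
    return [];
  }
  if (result.status < 200 || result.status >= 300) {
    throw new Error(`Unexpected status ${result.status}`);
  }
  return result.items ?? [];
}
```

Routing every fetch* method through one helper like this keeps the 410 behavior identical across organizations, projects, environments, and components.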

644-678: ⚡ Quick win

fetchAllComponents method is defined but not used.

The fetchAllComponents private method is implemented but never called. The run() method still fetches components inline (lines 228-240) without using this paginated helper. This appears to be an incomplete refactoring: either use this method for consistency with the other resources, or remove it to avoid dead code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo/src/provider/OpenChoreoEntityProvider.ts`
around lines 644 - 678, The private helper fetchAllComponents is implemented but
unused; update run() to use it (or remove it). Locate the run method and replace
the inline paginated component fetch logic with a call to await
this.fetchAllComponents(client, orgName, projectName), then map or flatten the
returned ModelsComponent[] items as the code expects (ensure any existing
variables that used the inline result now use the helper's return value), or if
you prefer to remove dead code, delete the fetchAllComponents method entirely;
reference symbols: fetchAllComponents and run.
plugins/catalog-backend-module-openchoreo-incremental/src/providers/OpenChoreoIncrementalEntityProvider.ts (1)

489-497: ⚡ Quick win

Direct mutation of cursor parameter.

Lines 490-493 directly mutate the cursor parameter (cursor.projectQueue = ..., cursor.projectApiCursor = undefined). This is a side effect that can cause issues if the caller retains a reference to the original cursor.

Consider creating a new cursor object with the updated values instead.

Proposed fix
-        // Clear the existing project queue for this org and rebuild it
-        cursor.projectQueue = cursor.projectQueue.filter(
-          p => p.org !== currentOrg,
-        );
-        cursor.projectApiCursor = undefined;
+        // Create updated cursor without mutating the original
+        const updatedProjectQueue = cursor.projectQueue.filter(
+          p => p.org !== currentOrg,
+        );
+        // Use the updated queue in the return statement instead
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo-incremental/src/providers/OpenChoreoIncrementalEntityProvider.ts`
around lines 489 - 497, The code currently mutates the incoming cursor
(cursor.projectQueue = ..., cursor.projectApiCursor = undefined); instead,
create a new cursor object based on the existing one (e.g., newCursor = {
...cursor, projectQueue: cursor.projectQueue.filter(p => p.org !== currentOrg),
projectApiCursor: undefined }) and use that newCursor for subsequent logic (and
where you previously used cursor); update references in this block (including
any use of restartResult.data/response/error that follow) to use the new cursor
variable so the original cursor parameter is not mutated.
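The non-mutating variant can be isolated into a small pure helper; `withOrgProjectsCleared` and the `Cursor` shape here are simplified stand-ins for the provider's real cursor type:

```typescript
interface ProjectRef {
  org: string;
  name: string;
}

interface Cursor {
  projectQueue: ProjectRef[];
  projectApiCursor?: string;
}

// Returns a new cursor with the given org's projects cleared and the
// API cursor reset; the input cursor is left untouched.
function withOrgProjectsCleared(cursor: Cursor, currentOrg: string): Cursor {
  return {
    ...cursor,
    projectQueue: cursor.projectQueue.filter(p => p.org !== currentOrg),
    projectApiCursor: undefined,
  };
}
```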
plugins/catalog-backend-module-openchoreo-incremental/dev/index.ts (1)

43-49: 💤 Low value

Replace logger: any with the concrete LoggerService type.

LoggerService is already available from the @backstage/backend-plugin-api import on line 23-26, so no new import is needed.

♻️ Proposed fix
+import {
   coreServices,
   createBackendModule,
+  LoggerService,
 } from '@backstage/backend-plugin-api';

 // ...

       async init({
         logger,
         providers,
       }: {
-        logger: any;
+        logger: LoggerService;
         providers: OpenChoreoIncrementalProviderExtensionPoint;
       }) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/catalog-backend-module-openchoreo-incremental/dev/index.ts` around
lines 43 - 49, The init method's parameter currently types logger as any; change
the parameter type to LoggerService (from `@backstage/backend-plugin-api` already
imported) so the signature becomes init({ logger, providers }: { logger:
LoggerService; providers: OpenChoreoIncrementalProviderExtensionPoint }) —
update the parameter type only, preserving the function name init and the
providers type.
plugins/catalog-backend-module-openchoreo-incremental/src/module/WrapperProviders.test.ts (2)

91-94: ⚡ Quick win

Awaiting waitForReady() directly is more reliable than inspecting a side-effect flag.

The current pattern sets resolved via a .then() callback on waitForReady() and then immediately checks the flag after await wrapped2.connect(). Whether the callback has already executed depends on the depth of the internal promise chain inside connect() — a race across microtask boundaries that can make this assertion flaky.

♻️ Proposed fix
-      let resolved = false;
-      providers.waitForReady().then(() => {
-        resolved = true;
-      });
+      const readyPromise = providers.waitForReady();

       // ... connect providers ...

       await wrapped2.connect({} as any);

-      expect(resolved).toBe(true);
+      await expect(readyPromise).resolves.toBeUndefined();

Also applies to: 110-112

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo-incremental/src/module/WrapperProviders.test.ts`
around lines 91 - 94, Replace the flaky side-effect check by awaiting the
promise returned by providers.waitForReady() directly: instead of setting a
resolved flag in providers.waitForReady().then(...) and asserting the flag after
await wrapped2.connect(), call await providers.waitForReady() after await
wrapped2.connect() to ensure the readiness promise has settled; apply the same
change for the other occurrence that mirrors lines 110-112 so both tests await
providers.waitForReady() rather than relying on a boolean set in a .then()
handler.

30-30: ⚡ Quick win

SQLite tests are also silently skipped in CI.

describe.skip is used for the whole suite when process.env.CI is truthy. SQLite tests don't require external infrastructure and could safely run in CI, so the blanket skip leaves a coverage gap for the SQLite code path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo-incremental/src/module/WrapperProviders.test.ts`
at line 30, The test suite is being globally skipped in CI via describeOrSkip
which hides SQLite-only tests; change the logic around the describeOrSkip symbol
in WrapperProviders.test.ts so CI does not skip suites that exercise the SQLite
code path (e.g., only apply describe.skip when CI AND the suite requires
external infra, or add a separate flag to allow SQLite in CI). Update the suite
selection so SQLite tests run unskipped in CI while still skipping suites that
need external services.
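The skip decision can be factored into a pure function so only infra-dependent suites are skipped in CI; `shouldSkipSuite` is a hypothetical helper, not the test file's current `describeOrSkip`:

```typescript
// Skip only when running in CI AND the suite needs infrastructure CI
// cannot provide; SQLite suites (requiresExternalInfra = false) run
// everywhere.
function shouldSkipSuite(requiresExternalInfra: boolean, inCi: boolean): boolean {
  return inCi && requiresExternalInfra;
}

// Usage with jest (illustrative):
//   const d = shouldSkipSuite(needsPostgres, Boolean(process.env.CI))
//     ? describe.skip
//     : describe;
```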
plugins/catalog-backend-module-openchoreo-incremental/src/database/migrations/20240110000003_expand_last_error_field.ts (1)

24-37: ⚡ Quick win

Explicitly chain .nullable() before .alter() for cross-engine safety.

Knex's .alter() behavior differs per dialect: PostgreSQL only changes the type, but MySQL rewrites the entire column definition with whatever constraints Knex generates. Without an explicit .nullable(), the generated DDL depends on Knex's type defaults, which can vary between versions. Both up and down should be hardened:

🛡️ Proposed fix
 export async function up(knex: Knex): Promise<void> {
   await knex.schema.alterTable('ingestions', table => {
-    table.text('last_error').alter();
+    table.text('last_error').nullable().alter();
   });
 }

 export async function down(knex: Knex): Promise<void> {
   await knex.schema.alterTable('ingestions', table => {
-    table.string('last_error', 255).alter();
+    table.string('last_error', 255).nullable().alter();
   });
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo-incremental/src/database/migrations/20240110000003_expand_last_error_field.ts`
around lines 24 - 37, The migration should explicitly set nullable before
calling alter to avoid dialect-specific defaults: in the up function change the
alter call on the 'last_error' column within the alterTable callback in export
async function up(knex: Knex) to mark the text column as nullable before calling
.alter(), and in the down function do the same on the string('last_error', 255)
alteration; locate the table.text('last_error') and table.string('last_error',
255) calls and chain .nullable() immediately before .alter() so both export
async function up and export async function down explicitly specify nullability.
plugins/catalog-backend-module-openchoreo-incremental/src/module/catalogModuleIncrementalIngestionEntityProvider.ts (1)

54-80: ⚡ Quick win

JSDoc example contains incomplete syntax.

The example code block has incomplete property values (burstInterval:,, burstLength:,, restLength: ,) and empty method bodies. This will confuse developers trying to use the extension point.

📝 Suggested fix for the example
  * `@example`
  *
  * ```ts
 backend.add(createBackendModule({
   pluginId: 'catalog',
   moduleId: 'my-openchoreo-incremental-provider',
   register(env) {
     env.registerInit({
       deps: {
         extension: openchoreoIncrementalProvidersExtensionPoint,
       },
       async init({ extension }) {
         extension.addProvider({
           options: {
-            burstInterval:,
-            burstLength:,
-            restLength: ,
+            burstInterval: { seconds: 30 },
+            burstLength: { seconds: 10 },
+            restLength: { minutes: 30 },
           },
           provider: {
-            next(context, cursor) {
-            },
+            getProviderName() { return 'my-provider'; },
+            async next(context, cursor) {
+              // Return entities and cursor
+              return { done: false, entities: [], cursor: newCursor };
+            },
           },
         });
       },
     });
 }))
  * ```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo-incremental/src/module/catalogModuleIncrementalIngestionEntityProvider.ts`
around lines 54 - 80, The JSDoc example has incomplete option values and empty
provider methods; update the example inside the comment where
extension.addProvider is used to supply realistic option objects for
burstInterval, burstLength, and restLength (e.g., duration objects), and
implement the provider shape by adding getProviderName() and a working async
next(context, cursor) that returns a result object like { done, entities, cursor
} instead of empty bodies so consumers can see the expected return contract for
openchoreoIncrementalProvidersExtensionPoint and extension.addProvider.
plugins/catalog-backend-module-openchoreo-incremental/src/database/OpenChoreoIncrementalIngestionDatabaseManager.ts (2)

1232-1240: 💤 Low value

Verbose logging inside batch insert loop.

Logging at info level for each batch during createMarkEntities could generate significant log volume during large ingestions. Consider using debug level or logging only a summary after all batches complete.

♻️ Suggested change
-              this.logger.info(
+              this.logger.debug(
                 `Batch ${
                   Math.floor(i / MARK_ENTITY_INSERT_BATCH_SIZE) + 1
                 }/${Math.ceil(
                   newRefs.length / MARK_ENTITY_INSERT_BATCH_SIZE,
                 )} completed: inserted ${
                   chunk.length
                 } entities for mark ${markId}`,
               );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo-incremental/src/database/OpenChoreoIncrementalIngestionDatabaseManager.ts`
around lines 1232 - 1240, The current per-batch info logging in
createMarkEntities (using this.logger.info with MARK_ENTITY_INSERT_BATCH_SIZE
and markId) is too verbose for large ingestions; change those per-batch logs to
this.logger.debug (or remove them) and instead emit a single summary info log
after all batches complete that reports total inserted entities for the markId
and number of batches processed; update any tests/consumers that expect the
per-batch info accordingly.

768-777: ⚡ Quick win

O(n²) complexity in orphan detection loop.

The removed.find() call inside the loop creates O(n²) complexity. For large catalogs with many orphaned entities, this could become slow. Consider using a Set for deduplication.

♻️ Suggested optimization
+          const removedSet = new Set<string>();
           for (const entity of filteredCatalogEntities) {
             if (!currentEntityRefs.has(entity.entity_ref)) {
-              if (!removed.find(e => e.entityRef === entity.entity_ref)) {
+              if (!removedSet.has(entity.entity_ref)) {
                 this.logger.info(
                   `computeRemoved: Found orphaned catalog entity ${entity.entity_ref} not in current or previous ingestion, marking for removal`,
                 );
                 removed.push({ entityRef: entity.entity_ref });
+                removedSet.add(entity.entity_ref);
               }
             }
           }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo-incremental/src/database/OpenChoreoIncrementalIngestionDatabaseManager.ts`
around lines 768 - 777, Loop uses removed.find(...) causing O(n²); replace that
dedupe with a Set: create a Set (e.g., removedRefs) that tracks already-marked
entity refs, check currentEntityRefs.has(entity.entity_ref) and if not and
removedRefs.has(entity.entity_ref) is false then log and push to removed and add
to removedRefs; update references to the removed entries (entityRef) to match
the Set contents and keep the rest of computeRemoved logic unchanged (look for
filteredCatalogEntities, currentEntityRefs, removed).
plugins/catalog-backend-module-openchoreo-incremental/src/router/routes.ts (1)

219-223: 💤 Low value

DELETE endpoint succeeds silently for non-existent providers.

Unlike other endpoints that return 404 for unknown providers, this endpoint always returns success. Consider validating the provider exists before purging, or at minimum returning metadata indicating whether any records were actually deleted.

♻️ Suggested improvement
     router.delete('/incremental/providers/:provider', async (req, res) => {
       const { provider } = req.params;
+      const providers = await this.manager.listProviders();
+      if (!providers.includes(provider)) {
+        res.status(404).json({
+          success: false,
+          error: `Provider '${provider}' not found`,
+        });
+        return;
+      }
       const result = await this.manager.purgeAndResetProvider(provider);
       res.json({ success: true, data: result });
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/catalog-backend-module-openchoreo-incremental/src/router/routes.ts`
around lines 219 - 223, The DELETE handler at
router.delete('/incremental/providers/:provider') always returns success even
for unknown providers; update the handler to first validate the provider exists
(e.g., call a lookup method on this.manager such as getProvider or hasProvider)
and if not found return a 404, otherwise call
this.manager.purgeAndResetProvider(provider) and include metadata in the
response (e.g., deletedCount/affectedRows or a boolean indicating anything was
deleted) so callers can distinguish no-op vs successful purge; if
purgeAndResetProvider already returns deletion info, surface that instead of
unconditionally { success: true }.
plugins/catalog-backend-module-openchoreo-incremental/src/engine/OpenChoreoIncrementalIngestionEngine.ts (1)

146-158: ⚡ Quick win

String-based error type detection is fragile.

Comparing (error as Error).message === 'CANCEL' relies on exact string matching. If the error message changes or includes additional context, this check will fail silently and treat cancellations as regular errors (triggering backoff instead of cancel flow).

Consider using a custom error class or error code property instead.

♻️ Suggested approach
// In a separate errors.ts or types.ts file:
export class IngestionCancelledError extends Error {
  readonly code = 'INGESTION_CANCELLED';
  constructor(message = 'CANCEL') {
    super(message);
    this.name = 'IngestionCancelledError';
  }
}

// Then in the catch block:
if (error instanceof IngestionCancelledError) {
  // handle cancellation
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo-incremental/src/engine/OpenChoreoIncrementalIngestionEngine.ts`
around lines 146 - 158, Replace fragile string matching of cancellation in the
catch block by introducing a custom error class (e.g., IngestionCancelledError)
with a distinct code (e.g., 'INGESTION_CANCELLED'), update any places that
currently signal cancellation to throw that class instead of throwing an Error
with message 'CANCEL', and change the catch logic in
OpenChoreoIncrementalIngestionEngine (the try/catch around ingestionId handling)
to use `error instanceof IngestionCancelledError` to call
this.options.logger.info and await
this.manager.setProviderCanceling(ingestionId, error.message); leaving all other
errors to the existing backoff/error path.
plugins/catalog-backend-module-openchoreo-incremental/src/module/WrapperProviders.ts (1)

90-98: 💤 Low value

adminRouter() creates a new database manager on each call.

Each invocation of adminRouter() instantiates a new OpenChoreoIncrementalIngestionDatabaseManager. While this works correctly, if adminRouter() is called multiple times (even accidentally), it creates redundant manager instances. Consider caching the router or manager.

♻️ Suggested improvement
 export class WrapperProviders {
   private migrate: Promise<void> | undefined;
   private numberOfProvidersToConnect = 0;
   private readonly readySignal = createDeferred();
+  private cachedRouter: express.Router | undefined;

   // ...

   adminRouter(): express.Router {
+    if (this.cachedRouter) {
+      return this.cachedRouter;
+    }
-    return new IncrementalProviderRouter(
+    this.cachedRouter = new IncrementalProviderRouter(
       new OpenChoreoIncrementalIngestionDatabaseManager({
         client: this.options.client,
         logger: this.options.logger,
       }),
       this.options.logger,
     ).createRouter();
+    return this.cachedRouter;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo-incremental/src/module/WrapperProviders.ts`
around lines 90 - 98, The adminRouter() method currently constructs a new
OpenChoreoIncrementalIngestionDatabaseManager and IncrementalProviderRouter on
every call (via IncrementalProviderRouter(...).createRouter()), leading to
redundant manager instances; change the class to cache and reuse a single
instance (either store a private field like cachedManager or cachedRouter) and
return that cached router on subsequent adminRouter() calls, creating the
manager/router lazily on first call using the same this.options.client and
this.options.logger so existing callers of adminRouter(),
OpenChoreoIncrementalIngestionDatabaseManager, IncrementalProviderRouter, and
createRouter continue to work but without repeated instantiation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b3ea51a9-a80e-45b6-bdf1-9aff78d064bc

📥 Commits

Reviewing files that changed from the base of the PR and between 005fa2c and cd6e37d.

⛔ Files ignored due to path filters (4)
  • packages/openchoreo-client-node/src/generated/openchoreo/component-type.ts is excluded by !**/generated/**
  • packages/openchoreo-client-node/src/generated/openchoreo/index.ts is excluded by !**/generated/**
  • packages/openchoreo-client-node/src/generated/openchoreo/types.ts is excluded by !**/generated/**
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (57)
  • app-config.local.yaml.example
  • app-config.production.yaml
  • app-config.yaml
  • backstage-diff.md
  • package.json
  • packages/app/src/scaffolder/TraitsField/TraitsFieldExtension.tsx
  • packages/backend/package.json
  • packages/backend/src/index.ts
  • packages/openchoreo-client-node/openapi/openchoreo-api.yaml
  • plugins/catalog-backend-module-openchoreo-incremental/.eslintrc.js
  • plugins/catalog-backend-module-openchoreo-incremental/README.md
  • plugins/catalog-backend-module-openchoreo-incremental/dev/index.ts
  • plugins/catalog-backend-module-openchoreo-incremental/migrations/20221116073152_init.js
  • plugins/catalog-backend-module-openchoreo-incremental/migrations/20240110000003_expand_last_error_field.js
  • plugins/catalog-backend-module-openchoreo-incremental/package.json
  • plugins/catalog-backend-module-openchoreo-incremental/src/config.d.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/config.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/database/OpenChoreoIncrementalIngestionDatabaseManager.test.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/database/OpenChoreoIncrementalIngestionDatabaseManager.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/database/errors.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/database/migrations.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/database/migrations/20240110000001_add_performance_indexes.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/database/migrations/20240110000003_expand_last_error_field.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/database/tables.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/engine/OpenChoreoIncrementalIngestionEngine.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/index.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/module.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/module/WrapperProviders.test.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/module/WrapperProviders.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/module/catalogModuleIncrementalIngestionEntityProvider.test.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/module/catalogModuleIncrementalIngestionEntityProvider.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/module/index.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/module/openchoreoIncrementalProviderModule.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/openchoreoImmediateCatalogIncremental.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/providers/OpenChoreoIncrementalEntityProvider.test.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/providers/OpenChoreoIncrementalEntityProvider.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/providers/componentBatchProcessor.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/providers/entityTranslator.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/router/routes.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/types.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/utils/ApiErrorHandler.ts
  • plugins/catalog-backend-module-openchoreo-incremental/src/utils/ConfigValidator.ts
  • plugins/catalog-backend-module-openchoreo/src/converters/CtdToTemplateConverter.test.ts
  • plugins/catalog-backend-module-openchoreo/src/converters/CtdToTemplateConverter.ts
  • plugins/catalog-backend-module-openchoreo/src/provider/OpenChoreoEntityProvider.ts
  • plugins/openchoreo-backend/src/router.ts
  • plugins/openchoreo-backend/src/services/BuildService/BuildInfoService.ts
  • plugins/openchoreo-backend/src/services/CellDiagramService/CellDiagramInfoService.ts
  • plugins/openchoreo-backend/src/services/DashboardService/DashboardInfoService.ts
  • plugins/openchoreo-backend/src/services/EnvironmentService/EnvironmentInfoService.ts
  • plugins/openchoreo-backend/src/services/TraitService/TraitInfoService.ts
  • plugins/openchoreo-common/package.json
  • plugins/openchoreo-common/src/constants.ts
  • plugins/openchoreo-common/src/index.ts
  • plugins/openchoreo-common/src/utils/pagination.ts
  • plugins/platform-engineer-core-backend/package.json
  • plugins/platform-engineer-core-backend/src/services/PlatformEnvironmentService.ts

Comment thread package.json
Comment on lines 53 to 60
  "lint-staged": {
    "*.{js,jsx,ts,tsx,mjs,cjs}": [
-     "eslint --fix",
-     "prettier --write"
+     "yarn backstage-cli package lint --fix",
+     "yarn prettier --write"
    ],
    "*.{json,md}": [
-     "prettier --write"
+     "yarn prettier --write"
    ]

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

backstage-cli package lint is designed for directories and runs in the package CWD — calling it from the monorepo root with file paths forwarded by lint-staged may silently apply the wrong ESLint config.

The backstage-cli package lint signature is [directories...], not individual file paths. lint-staged always passes a list of all staged files as arguments to the task, so the actual invocation becomes yarn backstage-cli package lint --fix plugins/foo/src/bar.ts from the monorepo root. When called from the root, process.cwd() is the monorepo root, and the CLI may resolve the ESLint config from the root rather than from each staged file's own plugin directory. This means the newly added plugins/catalog-backend-module-openchoreo-incremental/.eslintrc.js (and any other per-plugin ESLint configs) would be bypassed during pre-commit hooks.

The lint-staged recommended pattern for monorepos is to install lint-staged at the root but add separate configuration files in each package, so that tasks run in the correct working directory for each package.

Consider either:

  1. Using backstage-cli repo lint --fix at the root (which is monorepo-aware), or
  2. Adding per-package lint-staged config files to each plugin.
🔧 Option 1: Use the repo-scoped lint command (simplest fix)
  "lint-staged": {
    "*.{js,jsx,ts,tsx,mjs,cjs}": [
-     "yarn backstage-cli package lint --fix",
+     "yarn backstage-cli repo lint --fix",
      "yarn prettier --write"
    ],
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  "lint-staged": {
    "*.{js,jsx,ts,tsx,mjs,cjs}": [
-     "yarn backstage-cli package lint --fix",
+     "yarn backstage-cli repo lint --fix",
      "yarn prettier --write"
    ],
    "*.{json,md}": [
      "yarn prettier --write"
    ]
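For option 2, each plugin would carry its own lint-staged config so tasks run with that package as the working directory. A minimal sketch (the filename and glob are assumptions, not taken from this repo):

```json
{
  "*.{js,jsx,ts,tsx}": [
    "backstage-cli package lint --fix",
    "prettier --write"
  ]
}
```

Saved as `.lintstagedrc.json` inside each plugin directory, lint-staged resolves the nearest config file to every staged file and runs these commands from that package's root, so the per-plugin `.eslintrc.js` is picked up.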
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@package.json` around lines 53 - 60, The lint-staged config currently invokes
"backstage-cli package lint --fix" from the monorepo root which passes file
paths and causes ESLint to resolve from the root instead of each package; to
fix, update the lint-staged task(s) (the entries under "lint-staged" that call
"backstage-cli package lint --fix") to either call the repo-aware command "yarn
backstage-cli repo lint --fix" so linting runs with correct per-package
resolution, or implement per-package lint-staged configs (add a lint-staged
section or .lintstagedrc to each plugin/package) so "backstage-cli package lint
--fix" runs with the package CWD. Ensure you update the command strings that
currently reference "backstage-cli package lint --fix" accordingly.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Does the `@backstage/cli` 0.34.3 `package lint` subcommand accept and forward extra file path arguments to ESLint, making it suitable for use with lint-staged from a monorepo root?

💡 Result:

Yes, the @backstage/cli version 0.34.3 package lint subcommand accepts extra file path arguments and forwards them to ESLint. This makes it suitable for use with lint-staged from a monorepo root. The official documentation shows the usage as backstage-cli package lint [options] [directories...] Lint a package, explicitly indicating that directory/file paths can be provided as positional arguments after options. It also states: "default to linting the entire directory if no specific files are listed," confirming that when paths are provided, they are used instead, consistent with standard ESLint behavior where extra arguments are file paths passed to ESLint. In a monorepo setup, lint-staged at the root can invoke backstage-cli package lint --cwd packages/my-package (using lint-staged's cwd and path-relative features), where staged files are relative to the package directory. The CLI changes cwd to the package and runs ESLint on those specific files. The Backstage monorepo itself uses yarn lint-staged in its husky pre-commit hook, supporting this workflow. No breaking changes affecting this behavior appear in changelogs up to recent versions (e.g., 0.36.0 exists, confirming 0.34.3 is stable).


Add the --cwd option to specify the package directory when running lint-staged from the monorepo root.

The backstage-cli package lint command accepts file paths from lint-staged; however, without the --cwd option pointing to the specific package directory, the command runs from the monorepo root and may lint against the wrong (root) ESLint config. Update the configuration to:

"lint-staged": {
  "*.{js,jsx,ts,tsx,mjs,cjs}": [
    "yarn backstage-cli package lint --cwd . --fix",
    "yarn prettier --write"
  ],

For workspace packages referenced elsewhere, use the appropriate --cwd path to their package directory.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@package.json` around lines 53 - 60, Update the lint-staged commands so the
backstage-cli package lint runs from the package directory by adding the --cwd
option; specifically modify the "*.{js,jsx,ts,tsx,mjs,cjs}" array entry that
currently runs "yarn backstage-cli package lint --fix" to include "--cwd .
--fix" (i.e. "yarn backstage-cli package lint --cwd . --fix") so ESLint resolves
the package’s config, and for any workspace packages referenced elsewhere use
the appropriate "--cwd <package-dir>" value for those package-specific
lint-staged entries.

Comment on lines +1419 to 1438
parameters:
  limitParam:
    name: limit
    in: query
    required: false
    description: Maximum number of items to return per page
    schema:
      type: integer
      minimum: 0
      maximum: 512
      default: 100
  continueParam:
    name: continue
    in: query
    required: false
    schema:
      type: string
    description: Opaque continuation token from previous response

# Authorization types

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

components.parameters is opened before the schema section is finished.

From SubjectType onward, these definitions are now nested under components.parameters instead of components.schemas. That makes later refs like #/components/schemas/SubjectType invalid and will break spec validation / client generation.
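A well-formed components block keeps parameters and schemas as sibling keys. A sketch of the intended layout (the SubjectType body is abbreviated, not copied from the spec):

```yaml
components:
  schemas:
    SubjectType:
      type: string   # full definition belongs here, under schemas
  parameters:
    limitParam:
      name: limit
      in: query
      required: false
      description: Maximum number of items to return per page
      schema:
        type: integer
        minimum: 0
        maximum: 512
        default: 100
    continueParam:
      name: continue
      in: query
      required: false
      schema:
        type: string
      description: Opaque continuation token from previous response
```

With this layout, `#/components/schemas/SubjectType` resolves again, and paths can share `$ref: '#/components/parameters/limitParam'`.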

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/openchoreo-client-node/openapi/openchoreo-api.yaml` around lines
1419 - 1438, The OpenAPI fragment accidentally closes into components.parameters
before the schemas block, causing definitions like SubjectType to be nested
under components.parameters instead of components.schemas; move the parameters
block (limitParam and continueParam) so that components.parameters is a sibling
of components.schemas (not opened before it), then ensure the SubjectType and
subsequent type definitions are placed under components.schemas so refs like
`#/components/schemas/SubjectType` resolve correctly.

Comment on lines +38 to +42
exports.down = async function down(knex) {
  await knex.schema.alterTable('ingestions', table => {
    // Revert back to VARCHAR(255)
    // Note: This may truncate existing error messages longer than 255 characters
    table.string('last_error', 255).alter();

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make the rollback explicit for oversized last_error values.

Changing the column back to VARCHAR(255) here will fail once any row contains more than 255 characters; PostgreSQL will not truncate those values for you. The current down path becomes non-runnable precisely after this migration starts being useful.

Possible fix
 exports.down = async function down(knex) {
-  await knex.schema.alterTable('ingestions', table => {
-    // Revert back to VARCHAR(255)
-    // Note: This may truncate existing error messages longer than 255 characters
-    table.string('last_error', 255).alter();
-  });
+  await knex.raw(`
+    ALTER TABLE ingestions
+    ALTER COLUMN last_error TYPE VARCHAR(255)
+    USING LEFT(last_error, 255)
+  `);
 };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 exports.down = async function down(knex) {
-  await knex.schema.alterTable('ingestions', table => {
-    // Revert back to VARCHAR(255)
-    // Note: This may truncate existing error messages longer than 255 characters
-    table.string('last_error', 255).alter();
-  });
+  await knex.raw(`
+    ALTER TABLE ingestions
+    ALTER COLUMN last_error TYPE VARCHAR(255)
+    USING SUBSTRING(last_error, 1, 255)
+  `);
 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo-incremental/migrations/20240110000003_expand_last_error_field.js`
around lines 38 - 42, The down migration must truncate any values >255 before
altering the column to avoid PostgreSQL errors; in exports.down (the down
function) run a pre-alter UPDATE to shorten oversized values (e.g. via knex.raw
with LEFT/substring or UPDATE ingestions SET last_error = LEFT(last_error, 255)
WHERE char_length(last_error) > 255), then call
knex.schema.alterTable('ingestions', table => table.string('last_error',
255).alter()); ensure the truncate UPDATE runs in the same migration before the
alter.

"postpack": "backstage-cli package postpack"
},
"dependencies": {
"@backstage/backend-defaults": "^0.12.1",

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Publishing dev/ breaks consumers: it imports a devDependency and inflates runtime deps.

dev/index.ts imports mockServices from @backstage/backend-test-utils, which is only listed under devDependencies. Any consumer who installs the published package and attempts to run or import these dev files will get a runtime "Cannot find module" error.

Additionally, @backstage/backend-defaults appears in dependencies solely to support dev/index.ts. If dev/ is removed from the published artifact, this can move to devDependencies, removing unnecessary weight from every consumer's dependency graph.

🛠️ Proposed fix
   "dependencies": {
-    "@backstage/backend-defaults": "^0.12.1",
     "@backstage/backend-plugin-api": "^1.3.0",
     ...
   },
   "devDependencies": {
+    "@backstage/backend-defaults": "^0.12.1",
     "@backstage/backend-test-utils": "^1.3.1",
     "@backstage/cli": "^0.32.0",
     ...
   },
   "files": [
     "dist",
-    "migrations/**/*.{js,d.ts}",
-    "dev/**/*.{ts,js}"
+    "migrations/**/*.{js,d.ts}"
   ]

Also applies to: 67-71

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/catalog-backend-module-openchoreo-incremental/package.json` at line
40, Package is publishing dev/ which imports dev-only libs; move
"@backstage/backend-defaults" out of "dependencies" into "devDependencies" and
prevent dev/ from being included in the published artifact (so consumers don't
get runtime imports like dev/index.ts -> `@backstage/backend-test-utils`). Update
package.json: remove dev/ from the "files" array (or add dev/ to .npmignore) so
dev/index.ts and any dev-only modules are not packaged, and relocate
"@backstage/backend-defaults" to devDependencies; verify dev/index.ts’s import
of mockServices (from `@backstage/backend-test-utils`) remains only in
devDependencies.

Comment on lines +139 to +145
## Database Migrations

The module includes automatic database migrations to create the necessary tables for state persistence:

- `openchoreo_incremental_ingestion_state` - Stores cursor state and ingestion metadata
- `openchoreo_incremental_entity_refs` - Tracks entity references for staleness detection


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Database Migrations section documents incorrect table names.

The README states the module creates openchoreo_incremental_ingestion_state and openchoreo_incremental_entity_refs, but the actual migration schema (confirmed by the OpenChoreoIncrementalIngestionDatabaseManager.test.ts fixture fields such as ingestion_id) creates ingestions, ingestion_marks, and ingestion_mark_entities. Operators using these names for debugging or manual queries will hit "table not found" errors.

📝 Corrected section
 ## Database Migrations

 The module includes automatic database migrations to create the necessary tables for state persistence:

-- `openchoreo_incremental_ingestion_state` - Stores cursor state and ingestion metadata
-- `openchoreo_incremental_entity_refs` - Tracks entity references for staleness detection
+- `ingestions` - Stores cursor state and ingestion metadata
+- `ingestion_marks` - Tracks pagination checkpoints per ingestion cycle
+- `ingestion_mark_entities` - Tracks entity references for staleness detection
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/catalog-backend-module-openchoreo-incremental/README.md` around lines
139 - 145, The README's "Database Migrations" section lists incorrect table
names; update it to document the actual migration tables used by the code and
tests—replace `openchoreo_incremental_ingestion_state` and
`openchoreo_incremental_entity_refs` with the real tables `ingestions`,
`ingestion_marks`, and `ingestion_mark_entities` (as reflected by fixtures and
fields like `ingestion_id` in
OpenChoreoIncrementalIngestionDatabaseManager.test.ts); ensure descriptions
match the schema purpose (e.g., ingestions = cursor/metadata, ingestion_marks =
marks per ingestion, ingestion_mark_entities = tracked entities).

Comment on lines +47 to +50
  processedOrgs?: Set<string>;
  processedProjects?: Set<string>;
  processedComponents?: Set<string>;
}

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Set<string> in cursor state may not serialize correctly.

The cursor interface includes processedOrgs, processedProjects, and processedComponents as Set<string> types. If this cursor is serialized to JSON (e.g., for database persistence between ingestion runs), Sets will serialize as empty objects {}, losing all tracked state.

Consider using arrays instead, or implement custom serialization/deserialization for the cursor.
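The serialization hazard is easy to demonstrate, and an array-backed cursor round-trips cleanly (standalone sketch, not the provider's actual code):

```typescript
// A Set<string> silently serializes as an empty object under JSON.stringify,
// so persisting the cursor would drop every tracked id.
const cursor = { processedOrgs: new Set(["org-a", "org-b"]) };
const persisted = JSON.parse(JSON.stringify(cursor));
console.log(JSON.stringify(cursor)); // {"processedOrgs":{}}

// Array-backed state survives the round-trip; rebuild a Set at load time
// when O(1) membership checks are needed.
const safeCursor = { processedOrgs: ["org-a", "org-b"] };
const restored = JSON.parse(JSON.stringify(safeCursor));
const processedOrgsSet = new Set<string>(restored.processedOrgs);
console.log(processedOrgsSet.has("org-a")); // true
```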

Proposed fix using arrays
 interface CursorTraversalCursor {
   // ...
-  processedOrgs?: Set<string>;
-  processedProjects?: Set<string>;
-  processedComponents?: Set<string>;
+  processedOrgs?: string[];
+  processedProjects?: string[];
+  processedComponents?: string[];
 }

// Then in usage, convert to Set for O(1) lookups:
// const processedOrgsSet = new Set(cursor.processedOrgs || []);
// ...
// cursor.processedOrgs = Array.from(updatedProcessedOrgs);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-  processedOrgs?: Set<string>;
-  processedProjects?: Set<string>;
-  processedComponents?: Set<string>;
+  processedOrgs?: string[];
+  processedProjects?: string[];
+  processedComponents?: string[];
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo-incremental/src/providers/OpenChoreoIncrementalEntityProvider.ts`
around lines 47 - 50, The cursor fields processedOrgs, processedProjects, and
processedComponents are typed as Set<string> which will not JSON-serialize
correctly; change their types to string[] in the cursor interface and update any
code that reads/writes those fields (e.g., places that iterate, add, or check
membership) to use array operations (includes/push/filter) or convert between
Set and array at boundaries (new Set(array) when you need fast membership checks
and use Array.from(set) before persisting). Ensure any cursor load/save paths
reconstruct arrays so serialized JSON preserves state.

Comment on lines +159 to +161
// Restart from the beginning without cursor
return await this.nextCursorMode(context, undefined);
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Top-level 410 handler discards all progress, contradicting phase-level handlers.

The top-level error handler (lines 140-161) catches 410 errors and restarts with undefined cursor, discarding all progress including the processedOrgs, processedProjects, and processedComponents sets.

However, the phase-level 410 handlers (e.g., lines 320-353) carefully preserve these sets to avoid duplicate entity emission. This inconsistency means some 410 scenarios will cause duplicate entities while others won't.

Consider preserving the tracking sets in the top-level handler as well.
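One way to align the top-level handler with the phase-level ones is to reset only the continuation token on a 410 while carrying the tracking sets forward. The names below (CursorState, restartCursorAfter410) are hypothetical and sketch the idea, not the provider's real API:

```typescript
// Hypothetical cursor shape; mirrors the fields described in this review.
interface CursorState {
  continueToken?: string;
  processedOrgs?: string[];
  processedProjects?: string[];
  processedComponents?: string[];
}

// On a 410 (expired token), discard only the token. Keeping the processed-*
// sets lets the restarted traversal skip entities it already emitted,
// matching the behavior of the phase-level handlers.
function restartCursorAfter410(cursor: CursorState): CursorState {
  return {
    continueToken: undefined,
    processedOrgs: cursor.processedOrgs ?? [],
    processedProjects: cursor.processedProjects ?? [],
    processedComponents: cursor.processedComponents ?? [],
  };
}

const restarted = restartCursorAfter410({
  continueToken: "expired-token",
  processedOrgs: ["org-a"],
});
console.log(restarted.continueToken); // undefined
console.log(restarted.processedOrgs); // [ 'org-a' ]
```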

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo-incremental/src/providers/OpenChoreoIncrementalEntityProvider.ts`
around lines 159 - 161, The top-level 410 handler in nextCursorMode currently
restarts by calling nextCursorMode(context, undefined) and thus drops the
processedOrgs, processedProjects, and processedComponents tracking sets; update
that handler to preserve and pass the existing tracking sets into the restart so
progress isn't lost—locate the 410 handling block in the nextCursorMode method
and ensure it reuses the current
processedOrgs/processedProjects/processedComponents (or forwards them via
context/state) when calling nextCursorMode again instead of restarting with only
an undefined cursor.

Comment on lines +261 to 262
// Booleans now use default checkbox/switch widgets; no explicit radio widget
expect(props.enableBackup['ui:widget']).toBe('radio');

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Test assertion contradicts the comment.

The comment states "Booleans now use default checkbox/switch widgets; no explicit radio widget" but the assertion on line 262 still expects 'radio'. Either the comment is incorrect, or the assertion needs to be updated to reflect the actual implementation change (e.g., removing the assertion or checking for undefined).

Proposed fix if booleans no longer use radio widget
-      // Booleans now use default checkbox/switch widgets; no explicit radio widget
-      expect(props.enableBackup['ui:widget']).toBe('radio');
+      // Booleans now use default checkbox/switch widgets; no explicit radio widget
+      expect(props.enableBackup['ui:widget']).toBeUndefined();

Or if radio is still expected, fix the comment:

-      // Booleans now use default checkbox/switch widgets; no explicit radio widget
+      // Booleans use radio widget for explicit true/false selection
       expect(props.enableBackup['ui:widget']).toBe('radio');
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-      // Booleans now use default checkbox/switch widgets; no explicit radio widget
-      expect(props.enableBackup['ui:widget']).toBe('radio');
+      // Booleans now use default checkbox/switch widgets; no explicit radio widget
+      expect(props.enableBackup['ui:widget']).toBeUndefined();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/catalog-backend-module-openchoreo/src/converters/CtdToTemplateConverter.test.ts`
around lines 261 - 262, The test's comment says booleans use default
checkbox/switch widgets but the assertion on props.enableBackup['ui:widget']
still expects 'radio'; update the assertion in CtdToTemplateConverter.test.ts to
reflect the implementation change by replacing
expect(props.enableBackup['ui:widget']).toBe('radio') with an assertion that the
widget is not set (e.g.,
expect(props.enableBackup['ui:widget']).toBeUndefined()) so the test matches the
comment and current behavior of booleans.

Comment on lines 12 to 16
type TraitListResponse = OpenChoreoComponents['schemas']['APIResponse'] & {
  data?: OpenChoreoComponents['schemas']['ListResponse'] & {
    data?: {
      items?: OpenChoreoComponents['schemas']['TraitResponse'][];
    };
  };
};

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Type definition missing metadata field.

TraitListResponse (lines 12-16) doesn't include metadata in its data type, but the return statement at lines 94-95 assigns metadata: data.data.metadata. The as TraitListResponse cast masks this inconsistency.

Consider updating the type to include metadata, similar to ComponentTraitListResponse:

Proposed fix
 type TraitListResponse = OpenChoreoComponents['schemas']['APIResponse'] & {
-  data?: {
+  data?: OpenChoreoComponents['schemas']['ListResponse'] & {
     items?: OpenChoreoComponents['schemas']['TraitResponse'][];
   };
 };

Also applies to: 91-97

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/openchoreo-backend/src/services/TraitService/TraitInfoService.ts`
around lines 12 - 16, The TraitListResponse type is missing the metadata field
under its data property which conflicts with the return in TraitInfoService
(where you set metadata: data.data.metadata); update TraitListResponse to mirror
ComponentTraitListResponse by adding a metadata?: { total?: number; page?:
number; size?: number } (or the appropriate metadata shape used across the
service) under data, then ensure the service return still casts to
TraitListResponse; adjust any uses of TraitListResponse accordingly so the type
matches the actual returned object (refer to TraitListResponse type and the
return that references data.data.metadata).

Comment on lines +34 to +39
/**
 * Default page limit for API pagination requests.
 * Set to 500 to align with the API's MaxPageLimit for optimal performance,
 * reducing the number of HTTP requests needed for large deployments.
 */
export const DEFAULT_PAGE_LIMIT = 512;

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

JSDoc says 500 but the constant value is 512.

The comment reads "Set to 500 to align with the API's MaxPageLimit" while the actual value is 512. Either the value or the comment needs to be corrected — if 512 is intentional, update the doc; if the API's MaxPageLimit is actually 500, the constant should be 500.

📝 Proposed fix (if 512 is intentional)
-/**
- * Default page limit for API pagination requests.
- * Set to 500 to align with the API's MaxPageLimit for optimal performance,
- * reducing the number of HTTP requests needed for large deployments.
- */
-export const DEFAULT_PAGE_LIMIT = 512;
+/**
+ * Default page limit for API pagination requests.
+ * Set to 512 for optimal performance, reducing the number of HTTP requests
+ * needed for large deployments.
+ */
+export const DEFAULT_PAGE_LIMIT = 512;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/**
 * Default page limit for API pagination requests.
 * Set to 500 to align with the API's MaxPageLimit for optimal performance,
 * reducing the number of HTTP requests needed for large deployments.
 */
export const DEFAULT_PAGE_LIMIT = 512;
/**
 * Default page limit for API pagination requests.
 * Set to 512 for optimal performance, reducing the number of HTTP requests
 * needed for large deployments.
 */
export const DEFAULT_PAGE_LIMIT = 512;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/openchoreo-common/src/constants.ts` around lines 34 - 39, The JSDoc
for DEFAULT_PAGE_LIMIT is inconsistent with its value (doc says 500, constant is
512); confirm the API's actual MaxPageLimit and either update the constant or
the comment: if the API limit is 512, change the comment to reference 512 and
why (aligns with API MaxPageLimit), otherwise set export const
DEFAULT_PAGE_LIMIT = 500 (and update the JSDoc to 500) so the code and
documentation match; reference the symbol DEFAULT_PAGE_LIMIT when making the
change.
