From 47020c62c62b3dde9132363640dab0a00f314324 Mon Sep 17 00:00:00 2001
From: shangeyao
Date: Tue, 23 Jun 2026 22:14:06 +0800
Subject: [PATCH 1/2] [Build] Add AGENTS.md for AI agent coding conventions

Add AGENTS.md file documenting project architecture, module boundaries, high-sensitivity areas, design patterns, coding conventions (Java/Scala/TypeScript), build commands, PR conventions, AI-generated PR disclosure template, and boundaries for contributors and AI agents.
---
 AGENTS.md | 239 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 239 insertions(+)
 create mode 100644 AGENTS.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000000..cbf7ed3f36
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,239 @@
+
+
+# Apache StreamPark — Agent Instructions
+
+Project conventions, architecture, and coding patterns for the StreamPark codebase.
+
+## Architecture
+
+### Module Boundaries
+
+StreamPark is a Maven multi-module project with four top-level modules. Each has a clear responsibility boundary.
+
+- **`streampark-common`** (`streampark-common/`): Shared foundation layer. Contains configuration management (`ConfigKeys`, `ConfigOption`), utility classes (`Utils`, `HadoopUtils`, `JsonUtils`, `YarnUtils`), file system abstraction (`FsOperator`, `HdfsOperator`, `LfsOperator`), and shared enums (`FlinkDeployMode`, `ApplicationType`, etc.). Must be engine-agnostic — no Flink or Spark core API dependencies. All other modules depend on this module.
+
+- **`streampark-flink`** (`streampark-flink/`): Flink development framework and runtime integration. Contains the core development API (`FlinkStreaming`, `FlinkTable`, `FlinkSQL`), version-specific shims layers (`streampark-flink-shims_flink-1.xx`), job submission clients (`streampark-flink-client`), Kubernetes integration (`streampark-flink-kubernetes`), application packer (`streampark-flink-packer`), and connectors. The shims proxy (`FlinkShimsProxy`) is the central mechanism for multi-version Flink support.
+
+- **`streampark-spark`** (`streampark-spark/`): Spark development framework. Optional module, activated via `-Pspark` Maven profile. Contains `SparkStreaming`, `SparkBatch` core traits, Spark SQL client, and connectors. Follows the same lifecycle pattern as Flink modules.
+
+- **`streampark-console`** (`streampark-console/`): Web management platform. Contains two submodules:
+  - **`streampark-console-service`**: Spring Boot 2.7 backend, with `base/` (infrastructure), `core/` (business logic, controllers, services), and `system/` (authentication, user/role/team management) packages.
+  - **`streampark-console-webapp`**: Vue 3 + Vite frontend, with `api/` (API layer), `views/` (pages), `components/` (reusable UI), `store/` (Pinia state).
+
+The `streampark-common` module has the strongest stability guarantees — changes here affect all other modules. The `streampark-flink/streampark-flink-core` public API (the `FlinkStreaming`, `FlinkTable` traits) should also be treated as stable — breaking changes require careful migration planning.
+
+### High-Sensitivity Areas
+
+- **`FlinkShimsProxy`**: The multi-version classloader isolation mechanism. Uses `ChildFirstClassLoader` to dynamically load version-specific shims JARs. Cached per Flink version. Changes here affect all Flink jobs across all versions. Never introduce static state that could leak across classloader boundaries.
+
+- **`FlinkStreaming` / `FlinkTable` lifecycle**: The `main` -> `init` -> `ready` -> `handle` -> `start` -> `destroy` lifecycle is the contract all user applications depend on. Changes to the execution order or initialization behavior can break existing applications in production.
+
+- **`ConfigKeys` / `CommonConfig`**: Central configuration key definitions. Adding, removing, or renaming keys affects application configuration files, the console UI, and deployment scripts. Key names must remain backward-compatible.
+
+- **`FlinkApplicationController` / `FlinkApplicationManageService` / `FlinkApplicationActionService`**: The core application management flow. Operations (start, stop, cancel, deploy) must be idempotent and handle all Flink states correctly. The `AppChangeEvent` annotation triggers state synchronization.
+
+- **SQL parsing and validation** (`FlinkSql`, `FlinkSqlService`, `SqlConvertUtils`): SQL validation must be version-aware (Flink 1.12-1.20 have different SQL syntax). The `sql-rev.dict` file handles MySQL-to-PostgreSQL dialect conversion.
+
+- **Kubernetes integration** (`FlinkK8sWatchController`): Uses Caffeine caches (`TrackIdCache`, `JobStatusCache`, `MetricCache`) for tracking K8s-deployed Flink jobs. Cache invalidation and TTL must be correct to avoid stale state.
+
+- **Database schema changes**: All schema changes must have corresponding upgrade scripts in `streampark-console/.../script/upgrade/` for both MySQL and PostgreSQL. The `sql-rev.dict` file must be updated if new SQL dialect differences are introduced.
+
+- **Authentication & Authorization**: `ShiroConfig`, `JWTUtil`, `ShiroRealm` — changes here affect all user access. The `@Permission` annotation and `PermissionAspect` enforce team-level resource isolation. Never weaken RBAC checks.
+
+## Design Patterns
+
+- **Lifecycle trait pattern**: `FlinkStreaming`, `FlinkTable`, `SparkStreaming`, `SparkBatch` all follow the same trait-based lifecycle: `main` -> `init` -> `ready` -> `handle` -> `start` -> `destroy`. Users override `handle()` (required) and optionally `ready()`, `config()`, `destroy()`. Never add mandatory lifecycle methods to existing traits.
+
+- **Shims / Proxy pattern**: `FlinkShimsProxy.proxy(flinkVersion, func)` isolates version-specific Flink API calls behind a `ChildFirstClassLoader`. Each Flink version has its own shims module (`streampark-flink-shims_flink-1.xx`) with the same interface. New shims methods must be added to all version modules.
+
+- **Service layer separation**: Console services are split by responsibility — `FlinkApplicationManageService` (CRUD), `FlinkApplicationActionService` (start/stop/cancel), `FlinkApplicationInfoService` (query/info). Follow this pattern when adding new application operations.
+
+- **Implicit enrichment**: Scala `implicit` conversions are used to extend Flink/Spark APIs (e.g., `DataStreamExt` adds methods to `DataStream`). New implicit conversions must be scoped to avoid polluting the global namespace.
+
+- **MyBatis-Plus entity pattern**: Entities extend `BaseEntity` (auto-fill `createTime`/`modifyTime`). Mappers extend MyBatis-Plus `BaseMapper`. Pagination uses `MybatisPager` + `PaginationInterceptor`. Follow existing patterns rather than introducing new ORM approaches.
+
+- **Enum-based configuration**: Both Java enums (`FlinkDeployMode`, `ApplicationType`) and Scala enumeratum enums (`ApiType`, `PlannerType`) are used. Prefer Scala `enumeratum` for new Scala-side enums — it provides better type safety and serialization.
+
+- **REST response pattern**: All controller methods return `RestResponse.success(data)` or `RestResponse.fail(...)`. Never return raw objects. Use `@Permission` annotation for access control, `@AppChangeEvent` for state-change auditing.
+
+- **File system abstraction**: Use `FsOperator` (with `HdfsOperator` / `LfsOperator` implementations) for file operations. Never use raw `java.io.File` or Hadoop `FileSystem` directly in business logic.
+
+## Coding Conventions
+
+### Java (Backend)
+
+- **Formatting**: Eclipse formatter via Spotless (`tools/checkstyle/spotless_streampark_formatter.xml`). Run `./mvnw spotless:apply` before committing.
+- **Import order**: `org.apache.streampark`, `org.apache.streampark.shaded`, `org.apache`, `javax`, `java`, `scala`, `\#` (static imports).
+- **Static checks**: Checkstyle (`tools/checkstyle/checkstyle.xml`) + Spotless. No wildcard imports. No `@author` tags. No JUnit 4 imports.
+- **Lombok**: Use `@Slf4j`, `@Data`, `@Builder` where appropriate. Do not use `@EqualsAndHashCode` on JPA/Hibernate entities.
+- **Testing**: JUnit 5 (`org.junit.jupiter`) + AssertJ. Use `org.junit.jupiter.api.Test` (not `org.junit.Test` from JUnit 4). Use `assertThat(...).isEqualTo(...)` style. Test classes should be in the same package as the code under test.
+- **Package structure**: Controllers in `controller/`, service interfaces in `service/`, implementations in `service/impl/`, entities in `entity/`, mappers in `mapper/`, enums in `enums/`.
+
+### Scala (Framework / Common)
+
+- **Formatting**: Scalafmt 3.7.5 (`tools/checkstyle/.scalafmt.conf`). Max column 160. Run `./mvnw spotless:apply` to format.
+- **Import ordering**: `org.apache.streampark.*` first, then other third-party, then `javax.*`, `java.*`, `scala.*`.
+- **Static checks**: Scalastyle (`tools/checkstyle/scalastyle-config.xml`). No wildcard imports. No `println` statements (use `Logger` trait).
+- **Testing**: ScalaTest 3.2.9. Use `FlatSpec` or `FunSuite` style consistent with existing tests.
+- **Style**: Use `val` over `var`. Prefer `Option` over `null` in public APIs. Use `lazy val` for expensive initialization. Use pattern matching instead of `isInstanceOf`/`asInstanceOf`.
+
+### TypeScript / Vue (Frontend)
+
+- **Formatting**: ESLint + Prettier. Run `pnpm lint:eslint` and `pnpm lint:prettier`.
+- **Vue 3 Composition API**: Use `