From eea16d0da57d886f8a2b533f13d4f3a72cadab8f Mon Sep 17 00:00:00 2001 From: ruanwenjun Date: Wed, 22 Apr 2026 14:28:30 +0800 Subject: [PATCH] [Chore] Initialize CLAUDE.md --- .gitignore | 1 + CLAUDE.md | 161 +++++++++++++++++++ dolphinscheduler-alert/CLAUDE.md | 44 +++++ dolphinscheduler-api-test/CLAUDE.md | 54 +++++++ dolphinscheduler-api/CLAUDE.md | 61 +++++++ dolphinscheduler-authentication/CLAUDE.md | 27 ++++ dolphinscheduler-bom/CLAUDE.md | 42 +++++ dolphinscheduler-common/CLAUDE.md | 53 ++++++ dolphinscheduler-dao-plugin/CLAUDE.md | 37 +++++ dolphinscheduler-dao/CLAUDE.md | 41 +++++ dolphinscheduler-datasource-plugin/CLAUDE.md | 55 +++++++ dolphinscheduler-dist/CLAUDE.md | 53 ++++++ dolphinscheduler-e2e/CLAUDE.md | 65 ++++++++ dolphinscheduler-eventbus/CLAUDE.md | 38 +++++ dolphinscheduler-extract/CLAUDE.md | 39 +++++ dolphinscheduler-master/CLAUDE.md | 64 ++++++++ dolphinscheduler-meter/CLAUDE.md | 40 +++++ dolphinscheduler-microbench/CLAUDE.md | 36 +++++ dolphinscheduler-registry/CLAUDE.md | 44 +++++ dolphinscheduler-scheduler-plugin/CLAUDE.md | 38 +++++ dolphinscheduler-service/CLAUDE.md | 41 +++++ dolphinscheduler-spi/CLAUDE.md | 39 +++++ dolphinscheduler-standalone-server/CLAUDE.md | 43 +++++ dolphinscheduler-storage-plugin/CLAUDE.md | 46 ++++++ dolphinscheduler-task-executor/CLAUDE.md | 49 ++++++ dolphinscheduler-task-plugin/CLAUDE.md | 60 +++++++ dolphinscheduler-tools/CLAUDE.md | 53 ++++++ dolphinscheduler-ui/CLAUDE.md | 73 +++++++++ dolphinscheduler-worker/CLAUDE.md | 59 +++++++ dolphinscheduler-yarn-aop/CLAUDE.md | 36 +++++ 30 files changed, 1492 insertions(+) create mode 100644 CLAUDE.md create mode 100644 dolphinscheduler-alert/CLAUDE.md create mode 100644 dolphinscheduler-api-test/CLAUDE.md create mode 100644 dolphinscheduler-api/CLAUDE.md create mode 100644 dolphinscheduler-authentication/CLAUDE.md create mode 100644 dolphinscheduler-bom/CLAUDE.md create mode 100644 dolphinscheduler-common/CLAUDE.md create mode 100644 
dolphinscheduler-dao-plugin/CLAUDE.md create mode 100644 dolphinscheduler-dao/CLAUDE.md create mode 100644 dolphinscheduler-datasource-plugin/CLAUDE.md create mode 100644 dolphinscheduler-dist/CLAUDE.md create mode 100644 dolphinscheduler-e2e/CLAUDE.md create mode 100644 dolphinscheduler-eventbus/CLAUDE.md create mode 100644 dolphinscheduler-extract/CLAUDE.md create mode 100644 dolphinscheduler-master/CLAUDE.md create mode 100644 dolphinscheduler-meter/CLAUDE.md create mode 100644 dolphinscheduler-microbench/CLAUDE.md create mode 100644 dolphinscheduler-registry/CLAUDE.md create mode 100644 dolphinscheduler-scheduler-plugin/CLAUDE.md create mode 100644 dolphinscheduler-service/CLAUDE.md create mode 100644 dolphinscheduler-spi/CLAUDE.md create mode 100644 dolphinscheduler-standalone-server/CLAUDE.md create mode 100644 dolphinscheduler-storage-plugin/CLAUDE.md create mode 100644 dolphinscheduler-task-executor/CLAUDE.md create mode 100644 dolphinscheduler-task-plugin/CLAUDE.md create mode 100644 dolphinscheduler-tools/CLAUDE.md create mode 100644 dolphinscheduler-ui/CLAUDE.md create mode 100644 dolphinscheduler-worker/CLAUDE.md create mode 100644 dolphinscheduler-yarn-aop/CLAUDE.md diff --git a/.gitignore b/.gitignore index fcf292d66eda..47e516d7c7d1 100644 --- a/.gitignore +++ b/.gitignore @@ -55,3 +55,4 @@ dolphinscheduler-master/logs dolphinscheduler-api/logs __pycache__ ds_schema_check_test +docs/superpowers/ diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 000000000000..d5e10261b67c --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,161 @@ +# CLAUDE.md — Apache DolphinScheduler + +Apache DolphinScheduler is a distributed, visual DAG workflow-scheduling platform. This is the monorepo: backend servers (master / worker / api / alert), a Vue 3 frontend, plugin families for tasks / datasources / storage / alerting / scheduling, and the release tooling. 
+ +**This file is an index.** Each module has its own `CLAUDE.md` with the details — do not duplicate module contents here. + +--- + +## Tech stack (project-wide) + +- **Java 1.8** (do not assume 11+ APIs; `dolphinscheduler-api-test` is the only Java 11 island). +- **Spring Boot 2.6.1** across servers, **Jetty** (Tomcat is excluded transitively). +- **MyBatis-Plus** for ORM; **HikariCP** for the metadata DB pool, **Druid** inside user-facing datasource plugins. +- **Quartz** for cron scheduling (via `scheduler-plugin`). +- **Netty / gRPC** for inter-server RPC (see `extract-base`). +- **Vue 3 + Vite + TypeScript + Naive UI** for the frontend. +- **Maven** multi-module reactor (26 modules in root `pom.xml` + 2 test modules). +- **Zookeeper 3.8** by default for the registry (Etcd and JDBC also supported). + +## Runnable services + +A production deployment runs **four independent services** (plus an external registry and metadata DB). A fifth entry point — `StandaloneServer` — embeds all four in one JVM for development. 
+ +| Service | Module | Main class | Default ports | +|---------|--------|------------|---------------| +| **API** | [`dolphinscheduler-api`](dolphinscheduler-api/CLAUDE.md) | `org.apache.dolphinscheduler.api.ApiApplicationServer` | `12345` (HTTP / UI + REST) | +| **Master** | [`dolphinscheduler-master`](dolphinscheduler-master/CLAUDE.md) | `org.apache.dolphinscheduler.server.master.MasterServer` | `5679` (RPC) | +| **Worker** | [`dolphinscheduler-worker`](dolphinscheduler-worker/CLAUDE.md) | `org.apache.dolphinscheduler.server.worker.WorkerServer` | `1235` (RPC) | +| **Alert** | [`dolphinscheduler-alert`](dolphinscheduler-alert/CLAUDE.md) (→ `-alert-server`) | `org.apache.dolphinscheduler.alert.AlertServer` | `50053` (HTTP), `50052` (RPC) | +| Standalone (dev only) | [`dolphinscheduler-standalone-server`](dolphinscheduler-standalone-server/CLAUDE.md) | `org.apache.dolphinscheduler.StandaloneServer` | `12345` + `50052` (API + alert; master/worker use in-JVM calls) | + +Every service is a `@SpringBootApplication` on Jetty and implements `IStoppable`. Scale Master / Worker / Alert horizontally; coordination happens via the registry (Zookeeper by default). API is stateless and also scales horizontally behind a load balancer. + +Ports are overridable via `server.port` / service-specific keys in each service's `application.yaml`. + +## Build & run + +```bash +# Full build (release profile; produces dist tarball) +./mvnw clean install -Prelease + +# Zookeeper 3.4 legacy +./mvnw clean install -Prelease -Dzk-3.4 + +# Skip UI build (faster iteration on backend only) +./mvnw -pl '!dolphinscheduler-ui' clean install + +# Build one module (+ its required siblings) +./mvnw -pl dolphinscheduler-master -am clean install + +# Format (Spotless is configured) +./mvnw spotless:apply + +# Standalone server (after building) +cd dolphinscheduler-standalone-server/target && ./bin/start.sh +``` + +Binary artifact: `dolphinscheduler-dist/target/apache-dolphinscheduler-*-bin.tar.gz`. 
+ +## Test + +```bash +# Unit tests for one module +./mvnw -pl dolphinscheduler-master test + +# API integration tests (separate reactor, requires Docker) +mvn -pl dolphinscheduler-api-test/dolphinscheduler-api-test-case test + +# E2E browser tests (Selenium + Docker) +mvn -pl dolphinscheduler-e2e/dolphinscheduler-e2e-case test + +# Apple Silicon: add -Dm1_chip=true to the Docker-driven suites +``` + +--- + +## Module index + +Click into a module's `CLAUDE.md` for details. Each description is one line here on purpose. + +### Core execution + +- [`dolphinscheduler-master`](dolphinscheduler-master/CLAUDE.md) — workflow orchestration engine; consumes `Command`s, runs the DAG state machine, dispatches to workers. +- [`dolphinscheduler-worker`](dolphinscheduler-worker/CLAUDE.md) — runs physical tasks dispatched from master; hosts task plugins. +- [`dolphinscheduler-task-executor`](dolphinscheduler-task-executor/CLAUDE.md) — reusable task-lifecycle framework embedded by the worker. +- [`dolphinscheduler-alert`](dolphinscheduler-alert/CLAUDE.md) — alert server + channel plugins (email, Feishu, DingTalk, …). + +### API layer + +- [`dolphinscheduler-api`](dolphinscheduler-api/CLAUDE.md) — REST API server (entry point for UI, Python SDK, external clients). +- [`dolphinscheduler-api-test`](dolphinscheduler-api-test/CLAUDE.md) — integration tests against the REST API (Docker Compose + Testcontainers). +- [`dolphinscheduler-authentication`](dolphinscheduler-authentication/CLAUDE.md) — Actuator-endpoint auth + AWS credential helpers (NOT the main login path). + +### Shared libraries + +- [`dolphinscheduler-common`](dolphinscheduler-common/CLAUDE.md) — foundation utilities (everything depends on this). +- [`dolphinscheduler-dao`](dolphinscheduler-dao/CLAUDE.md) — MyBatis DAO layer + SQL migration scripts. +- [`dolphinscheduler-service`](dolphinscheduler-service/CLAUDE.md) — business logic between DAO and the servers. 
+- [`dolphinscheduler-spi`](dolphinscheduler-spi/CLAUDE.md) — Service-Provider Interface root (every plugin depends on this). +- [`dolphinscheduler-extract`](dolphinscheduler-extract/CLAUDE.md) — RPC interface contracts between servers. +- [`dolphinscheduler-eventbus`](dolphinscheduler-eventbus/CLAUDE.md) — in-process event-bus abstractions. +- [`dolphinscheduler-registry`](dolphinscheduler-registry/CLAUDE.md) — pluggable registry (Zookeeper / Etcd / JDBC). +- [`dolphinscheduler-meter`](dolphinscheduler-meter/CLAUDE.md) — metrics (Prometheus) + server load-protection primitives. + +### Plugin families + +- [`dolphinscheduler-task-plugin`](dolphinscheduler-task-plugin/CLAUDE.md) — task-type plugins (shell, SQL, Spark, Flink, K8s, EMR, …). 33 concrete plugins. +- [`dolphinscheduler-datasource-plugin`](dolphinscheduler-datasource-plugin/CLAUDE.md) — user-facing datasource plugins (MySQL, Hive, Trino, Snowflake, …). 28 concrete plugins. +- [`dolphinscheduler-storage-plugin`](dolphinscheduler-storage-plugin/CLAUDE.md) — resource storage (S3, HDFS, OSS, GCS, ABS, OBS, COS). +- [`dolphinscheduler-scheduler-plugin`](dolphinscheduler-scheduler-plugin/CLAUDE.md) — cron scheduler (Quartz today). +- [`dolphinscheduler-dao-plugin`](dolphinscheduler-dao-plugin/CLAUDE.md) — metadata-DB dialect support (MySQL / PostgreSQL / H2). + +### Build, ops, tools + +- [`dolphinscheduler-bom`](dolphinscheduler-bom/CLAUDE.md) — Maven BOM; central dependency version pinning. +- [`dolphinscheduler-dist`](dolphinscheduler-dist/CLAUDE.md) — assembles the release tarball + Docker images. +- [`dolphinscheduler-standalone-server`](dolphinscheduler-standalone-server/CLAUDE.md) — all-in-one JVM with H2 (dev / smoke tests). +- [`dolphinscheduler-tools`](dolphinscheduler-tools/CLAUDE.md) — CLIs for schema upgrade + resource / lineage migration. +- [`dolphinscheduler-microbench`](dolphinscheduler-microbench/CLAUDE.md) — JMH micro-benchmarks. 
+- [`dolphinscheduler-yarn-aop`](dolphinscheduler-yarn-aop/CLAUDE.md) — AspectJ weaver capturing YARN ApplicationIds. + +### Frontend & E2E + +- [`dolphinscheduler-ui`](dolphinscheduler-ui/CLAUDE.md) — Vue 3 frontend. +- [`dolphinscheduler-e2e`](dolphinscheduler-e2e/CLAUDE.md) — Selenium browser tests. + +--- + +## Architecture overview (one paragraph) + +A **user** hits the UI, which calls the API server. The API server writes to the **metadata DB** and, for runtime operations (start / kill / pause workflow), talks to the **master** over RPC. The master consumes `t_ds_command` rows, runs the workflow state machine, and dispatches tasks to **workers**. Workers execute task plugins (shell, SQL, Spark, …) and stream lifecycle events back to master. Failures and SLA breaches flow to the **alert server**, which fans out through alert plugins. **Registry** (Zookeeper / Etcd / JDBC) provides service discovery, leader election, and distributed locks. **Storage plugins** back the resource center and distributed-task artifacts. **Quartz** (via scheduler plugin) fires scheduled workflows, which become new `Command` rows. 
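The control flow in the overview paragraph can be pictured with a toy, in-memory model. This is only a sketch under loud assumptions: every class and method name below is hypothetical, the DAG is flattened to a linear task list, and nothing here is actual DolphinScheduler code. It only shows the order of hand-offs: a pending command is consumed by the master, each task is dispatched to a worker, and a lifecycle event comes back.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Toy in-memory model of the flow in the overview paragraph.
// All names are hypothetical; this is not DolphinScheduler code.
final class ArchitectureFlowSketch {

    /** Stand-in for one pending command row (written by the API server or a cron trigger). */
    static final class Command {
        final String workflow;
        final List<String> tasks; // simplification: a linear task list instead of a DAG

        Command(String workflow, List<String> tasks) {
            this.workflow = workflow;
            this.tasks = tasks;
        }
    }

    /** Stand-in for a worker: pretends to run a task and returns a lifecycle event. */
    static String workerExecute(String workflow, String task) {
        return workflow + "/" + task + ":SUCCESS";
    }

    /** Stand-in for the master loop: drain pending commands, dispatch each task, collect events. */
    static List<String> masterDrain(Queue<Command> pending) {
        List<String> events = new ArrayList<>();
        Command command;
        while ((command = pending.poll()) != null) {
            for (String task : command.tasks) {
                events.add(workerExecute(command.workflow, task));
            }
        }
        return events;
    }

    public static void main(String[] args) {
        Queue<Command> pending = new ArrayDeque<>();
        pending.add(new Command("daily_etl", Arrays.asList("extract", "load")));
        System.out.println(masterDrain(pending));
    }
}
```

In the real system each hand-off crosses a process boundary (metadata DB, RPC, registry), so the single-threaded loop above should be read as a map of responsibilities rather than as a description of the implementation.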
+ +## Where things live (quick lookup) + +| Looking for… | Start here | +|--------------|------------| +| A REST endpoint | `dolphinscheduler-api/src/main/java/.../api/controller/` | +| Workflow execution logic | `dolphinscheduler-master/src/main/java/.../server/master/engine/` | +| Task execution logic | `dolphinscheduler-worker` + the specific `task-plugin/` | +| How "X" is stored | `dolphinscheduler-dao/src/main/java/.../dao/entity/` | +| SQL schema / upgrade | `dolphinscheduler-dao/src/main/resources/sql/` | +| RPC contract between servers | `dolphinscheduler-extract/dolphinscheduler-extract-` | +| UI page source | `dolphinscheduler-ui/src/views//` | +| API call in the UI | `dolphinscheduler-ui/src/service/modules/.ts` | +| Version of a dependency | `dolphinscheduler-bom/pom.xml` | + +## Project-wide conventions + +- **Formatting**: `./mvnw spotless:apply`. CI will fail PRs that aren't formatted. Java imports are ordered; license headers are enforced. +- **Commit style**: `[Type-ISSUE_ID] [Scope] Subject`, e.g. `[Fix-18168] [Worker] ...`. Scopes match module names. +- **Branching**: `dev` is the main integration branch (not `main`/`master`). +- **PRs must link a GitHub issue** and keep their scope tight — one module / one concern. +- **Do not break wire / DB compatibility** silently. Changes to `extract-*` RPC interfaces, `dao` entities, enum values, and `spi.DbType` ripple to deployed clusters mid-upgrade. +- **Only one registry / storage / DB dialect is active at runtime**. Code paths that check "which one" belong inside the plugin SPI, not sprinkled through services. 
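The commit-style convention in the list above (`[Type-ISSUE_ID] [Scope] Subject`) lends itself to a mechanical check. The sketch below is one possible reading of that format, not a tool that exists in the repository: the class name is hypothetical, and the regular expression assumes that `Type` is alphabetic, `ISSUE_ID` is numeric, and `Scope` is any text without square brackets.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the pattern is an assumed, strict reading of
// "[Type-ISSUE_ID] [Scope] Subject" and is not taken from the project's CI.
final class CommitSubjectSketch {

    // group(1) = type, group(2) = issue id, group(3) = scope, group(4) = subject text
    private static final Pattern SUBJECT = Pattern.compile(
            "\\[([A-Za-z]+)-(\\d+)\\] \\[([^\\[\\]]+)\\] (\\S.*)");

    /** Returns true when the whole subject line follows the assumed format. */
    static boolean isValid(String subject) {
        return subject != null && SUBJECT.matcher(subject).matches();
    }

    /** Extracts the scope (expected to be a module name) from a valid subject line. */
    static String scopeOf(String subject) {
        Matcher matcher = SUBJECT.matcher(subject);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not in [Type-ISSUE_ID] [Scope] Subject form: " + subject);
        }
        return matcher.group(3);
    }

    public static void main(String[] args) {
        String example = "[Fix-18168] [Worker] ...";
        System.out.println(isValid(example));
        System.out.println(scopeOf(example));
    }
}
```

Under this strict pattern a subject without an issue ID and scope, such as `[Chore] Initialize CLAUDE.md`, does not match; a looser pattern would be needed if such subjects are acceptable in practice.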
+ +## External references + +- Release docs (version-specific): https://dolphinscheduler.apache.org/en-us/docs +- GitHub issues: https://github.com/apache/dolphinscheduler/issues +- Python SDK: https://dolphinscheduler.apache.org/python/main/index.html +- Contribution guide: [`docs/docs/en/contribute/join/contribute.md`](docs/docs/en/contribute/join/contribute.md) diff --git a/dolphinscheduler-alert/CLAUDE.md b/dolphinscheduler-alert/CLAUDE.md new file mode 100644 index 000000000000..b4e9fda772fd --- /dev/null +++ b/dolphinscheduler-alert/CLAUDE.md @@ -0,0 +1,44 @@ +# CLAUDE.md — dolphinscheduler-alert + +Alert / notification subsystem. Master and worker publish alert events (task failure, timeout, SLA breach, …); this subsystem evaluates them and dispatches via configured channel plugins (email, webhook, Feishu, DingTalk, PagerDuty, …). + +**This directory is a Maven parent POM.** + +## Sub-modules + +- **`dolphinscheduler-alert-server`** — the runnable alert server. `@SpringBootApplication` `AlertServer` + HA coordinator (`AlertHAServer`) + RPC server (`AlertRpcServer`) + event loop. +- **`dolphinscheduler-alert-plugins`** — parent of the concrete channel plugins: + - `dolphinscheduler-alert-email`, `-http`, `-feishu`, `-dingtalk`, `-wechat`, `-webexteams`, `-pagerduty`, `-telegram`, `-slack`, `-script`, etc. (see directory listing for the current set). Each plugin implements the `AlertPlugin` SPI and is loaded dynamically at runtime. + +## How alerting works (end-to-end) + +1. Master/worker calls into `dolphinscheduler-extract-alert` RPC with an `AlertRequest`. +2. `AlertRpcServer` receives it, persists/enqueues onto `AlertEventPendingQueue`. +3. `AlertEventFetcher` pulls pending events in `AlertEventLoop`. +4. `AlertSender` looks up the alert group's configured plugins, invokes each. +5. Delivery result is recorded to the DB for audit. + +## Extension points + +- **`AlertPlugin` SPI** — new channel? 
Implement the interface in a new sub-module of `dolphinscheduler-alert-plugins` and it will be discovered by `AlertPluginManager`. +- Alert rule logic is currently fixed (type-based); extending rules means editing `AlertEventLoop` / `AlertSender`. + +## Gotchas + +- **Alert server is separate from master/worker** by design. Running it as an embedded part of master is not supported — it has its own HA, its own port, its own `application.yaml`. +- **`AlertEventPendingQueue` is DB-backed** (not just in-memory) so restarting the alert server does not lose queued alerts. +- **HA pattern mirrors master/worker**: only the leader pulls from the queue; others stand by. +- **Plugin config is per-alert-group**, stored in the DB. Adding a plugin implementation does not enable it for anyone until an admin creates an alert group referencing it. +- **Channel-specific rate limits / retries** live *inside* each plugin, not in the server. Don't add a global retry loop in `AlertSender`. + +## Tests + +- `alert-server/src/test/java` — unit tests for config, RPC, event queue, sender. +- Each plugin has its own `src/test/java` with mocked channel calls. + +## Related modules + +- `dolphinscheduler-extract-alert` — RPC contracts this server implements. +- `dolphinscheduler-dao` — persists alert events + audit. +- `dolphinscheduler-registry-all`, `dolphinscheduler-meter`, `dolphinscheduler-spi`. +- Callers: `dolphinscheduler-master`, `dolphinscheduler-worker`. diff --git a/dolphinscheduler-api-test/CLAUDE.md b/dolphinscheduler-api-test/CLAUDE.md new file mode 100644 index 000000000000..2d661b7301f6 --- /dev/null +++ b/dolphinscheduler-api-test/CLAUDE.md @@ -0,0 +1,54 @@ +# CLAUDE.md — dolphinscheduler-api-test + +Integration test harness for the REST API. **Not** bundled into the release — this module exists to run curl-style black-box tests against a full DolphinScheduler stack booted via Docker Compose + Testcontainers. 
+ +**This is a Maven parent POM.** This module is **not** declared in the root `pom.xml` `` — run it explicitly (e.g. `mvn -pl dolphinscheduler-api-test test -am`) or via CI workflows. + +## Sub-modules + +- **`dolphinscheduler-api-test-core`** — the reusable framework: + - `@DolphinScheduler` annotation (marks a test class, accepts `composeFiles`). + - `DolphinSchedulerExtension` — JUnit 5 extension that starts the Compose stack, injects `RequestClient` and page objects. + - `RequestClient` — thin HTTP client with session handling. + - Page-object base classes (`LoginPage`, `WorkflowDefinitionPage`, `ProjectPage`, `TenantPage`, `UserPage`, `WorkerGroupPage`). +- **`dolphinscheduler-api-test-case`** — the actual test classes. `WorkflowDefinitionAPITest`, `TenantAPITest`, `SchedulerAPITest`, etc. Each class uses `@DolphinScheduler(composeFiles = { "docker/basic/docker-compose.yaml" })` to declare its environment. + +## Stack bootstrap + +Compose files live under `src/test/resources/docker//docker-compose.yaml`: + +- `basic/` — standard API + Postgres. +- `tenant/` — scenario with multi-tenant configuration. +- `oidc-login/` — API with Keycloak for OIDC flow (`realm-export.json` alongside). + +The JUnit extension uses Testcontainers Compose to wait for port readiness before returning. + +## Running + +``` +# Full suite +mvn -pl dolphinscheduler-api-test/dolphinscheduler-api-test-case test + +# Single class (Mac M1) +mvn -pl dolphinscheduler-api-test/dolphinscheduler-api-test-case test \ + -Dtest=WorkflowDefinitionAPITest -Dm1_chip=true +``` + +Flags: + +- `-Dm1_chip=true` — forces `arm64/v8` platform for Docker on Apple Silicon. +- `-Dlocal=true` — skips Testcontainers and points at a locally-running DolphinScheduler instead. + +## Gotchas + +- **Java 11 required** to compile this module (rest of the repo targets 1.8). +- **`@Order` is mandatory** on test methods within a class — tests are state-dependent and must run in declared order. 
+- **`@DisableIfTestFails` (from junit-pioneer)** cascades a failure to dependent tests in the same class; don't silently disable without understanding the chain. +- **`sessionId` is threaded through page objects**: `LoginPage.login(...)` returns a session ID that subsequent page objects must receive. Don't instantiate page objects without first logging in. +- **This module is excluded from the root reactor build** — CI drives it via a dedicated workflow. Check `.github/workflows/` before assuming a PR runs these. + +## Related modules + +- `dolphinscheduler-api` — the system under test. +- `dolphinscheduler-dao` — depended on to reuse entity DTOs. +- `dolphinscheduler-e2e` — complementary browser-level tests (selenium against UI). diff --git a/dolphinscheduler-api/CLAUDE.md b/dolphinscheduler-api/CLAUDE.md new file mode 100644 index 000000000000..1f9db04fc833 --- /dev/null +++ b/dolphinscheduler-api/CLAUDE.md @@ -0,0 +1,61 @@ +# CLAUDE.md — dolphinscheduler-api + +REST API server. Entry point for the UI and external clients (curl, Python SDK). Uses Spring Boot with **Jetty** (not Tomcat) and springdoc-openapi for Swagger UI. + +## Main package + +`org.apache.dolphinscheduler.api` + +## Entry point + +`ApiApplicationServer` — `@SpringBootApplication`. On startup it: +1. Loads `DataSourcePluginManager` and `TaskPluginManager` (plugin discovery). +2. Binds to the port in `server.port` (default 12345). +3. Starts the Py4J gateway used by the Python SDK. + +## Key sub-packages + +- `api.controller` — 30+ `@RestController` classes, one per domain (workflows, tasks, users, projects, tenants, resources, data sources, alerts, monitoring, …). All URLs rooted at `/dolphinscheduler/*`. +- `api.service` / `api.service.impl` — business-logic layer wrapping `dolphinscheduler-service` and adding API-level concerns (auth checks, DTO mapping). +- `api.security` — pluggable authenticators: `PASSWORD` (default), `LDAP`, `OIDC`, `CASDOOR`, `SSO`. 
Selected by `security.authentication.type`. +- `api.interceptor` — `LoginHandlerInterceptor` (session cookie check), `RateLimitInterceptor`, `LocaleChangeInterceptor`. +- `api.configuration` — Spring config beans: Swagger, OAuth2, task-type catalog. +- `api.dto` — request/response DTOs. +- `api.exceptions` — `ApiExceptionHandler` (`@RestControllerAdvice`) maps exceptions → structured JSON responses. + +## Extension points + +- `Authenticator` — implement a new one to support additional login backends (added cases go into `AuthenticationType` enum + registered in `SecurityConfig`). +- Controllers auto-pick up new task types via `TaskTypeConfiguration` reading `task-type-config.yaml` / `dynamic-task-type-config.yaml`. + +## Configuration + +`src/main/resources`: + +- `application.yaml` — server port, datasource, registry, security mode, CORS, OpenAPI. +- `task-type-config.yaml`, `dynamic-task-type-config.yaml` — the catalog of task types exposed to the UI. +- `i18n/messages_*.properties` — English + Simplified Chinese server-side messages. +- `swagger.properties` — springdoc config (UI lives at `/dolphinscheduler/swagger-ui/index.html`). +- `logback-spring.xml` — logging. + +## Gotchas + +- **Jetty, not Tomcat**. `spring-boot-starter-tomcat` is excluded transitively via `dolphinscheduler-meter`. Avoid accidentally pulling Tomcat in. +- **Session-based auth**: `LoginHandlerInterceptor` reads the `sessionId` cookie. There is no JWT by default. OIDC/CASDOOR paths still set a session. +- **OIDC requires `casdoor-spring-boot-starter`**; `OAuth2Configuration` is `@ConditionalOnProperty(security.authentication.type = OIDC|CASDOOR)`. +- **Python SDK integration** uses Py4J gateway, not REST. If a Python SDK change misbehaves, check `api-server` logs for Py4J init messages. +- **Controllers mix `@PostMapping` with form params and JSON bodies inconsistently** — this is legacy. Follow whatever shape the adjacent endpoint uses rather than converting to JSON across the board. 
+- **Swagger annotations are required** on new endpoints (`@Operation`, `@Parameter`). Missing ones break auto-generated docs the UI team consumes. + +## Tests + +- Unit tests in `src/test/java`. +- **Integration tests live in `dolphinscheduler-api-test`** (separate module, Docker Compose + Testcontainers). + +## Related modules + +- `dolphinscheduler-service`, `dolphinscheduler-dao` — primary deps. +- `dolphinscheduler-extract-master` / `-worker` / `-alert` — RPC into servers. +- `dolphinscheduler-authentication` (actuator sub-module) — secures `/actuator/**`. +- `dolphinscheduler-ui` — primary consumer. +- `dolphinscheduler-api-test` — integration harness. diff --git a/dolphinscheduler-authentication/CLAUDE.md b/dolphinscheduler-authentication/CLAUDE.md new file mode 100644 index 000000000000..3c63d9382b35 --- /dev/null +++ b/dolphinscheduler-authentication/CLAUDE.md @@ -0,0 +1,27 @@ +# CLAUDE.md — dolphinscheduler-authentication + +Parent POM grouping two unrelated authentication helpers. They share no code — they're bundled here because both are about injecting security behavior into the server stack. + +**This directory is a Maven parent POM.** + +## Sub-modules + +- **`dolphinscheduler-actuator-authentication`** — Secures Spring Boot Actuator endpoints (`/actuator/**` and `/dolphinscheduler/actuator/**`) with HTTP Basic auth. Ships `ActuatorAuthenticationAutoConfiguration` + `ActuatorSecurityProperties` (`management.security.*` properties). Enabled when `management.security.enabled=true`. Excludable endpoints (health/info) via config list. +- **`dolphinscheduler-aws-authentication`** — AWS credential provider abstraction used by AWS-based datasource/task plugins (EMR, S3, SageMaker, DMS, DataSync, etc.). Exposes `AWSCredentialsProviderFactor` with two strategies: `STATIC` (access key + secret) and `INSTANCE_PROFILE` (EC2/EKS IAM role). 
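As a rough illustration of the two AWS credential strategies listed above, the following sketch resolves a strategy from a plain configuration map. It assumes a lot: the class name, the enum, and all three configuration keys are hypothetical, and no AWS SDK call is made. The real factory and its property names should be read from the sub-module itself.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: names and config keys are illustrative only,
// not the ones used by dolphinscheduler-aws-authentication.
final class AwsCredentialStrategySketch {

    enum Strategy { STATIC, INSTANCE_PROFILE }

    static final String TYPE_KEY = "credentials.provider.type";   // assumed key
    static final String ACCESS_KEY_ID = "access.key.id";          // assumed key
    static final String ACCESS_KEY_SECRET = "access.key.secret";  // assumed key

    /** Picks the strategy named in the config and validates what STATIC needs. */
    static Strategy resolve(Map<String, String> awsConfig) {
        String rawType = awsConfig.get(TYPE_KEY);
        if (isBlank(rawType)) {
            throw new IllegalArgumentException("Missing " + TYPE_KEY);
        }
        // valueOf throws IllegalArgumentException for an unknown strategy name
        Strategy strategy = Strategy.valueOf(rawType.trim().toUpperCase(Locale.ROOT));
        if (strategy == Strategy.STATIC
                && (isBlank(awsConfig.get(ACCESS_KEY_ID)) || isBlank(awsConfig.get(ACCESS_KEY_SECRET)))) {
            throw new IllegalArgumentException("STATIC requires both an access key id and a secret");
        }
        return strategy;
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put(TYPE_KEY, "INSTANCE_PROFILE");
        System.out.println(resolve(config));
    }
}
```

Failing fast on an incomplete `STATIC` configuration is a design choice of this sketch; it keeps a half-filled key pair from silently falling through to another credential source.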
+ +## Gotchas + +- **These two sub-modules are unrelated.** Don't treat "authentication" as a coherent module — the actuator one is about operator access; the AWS one is about cloud-task credentials. They don't share any class. +- **API login auth is NOT here**. Session/password/LDAP/OIDC/CASDOOR login for the main API lives in `dolphinscheduler-api/src/main/java/org/apache/dolphinscheduler/api/security/`. If a user says "auth", check which one they mean first. +- Actuator: the sample config uses `{noop}` password encoder. This is **development-only**; production must switch to `{bcrypt}` or an external IdP. +- AWS: `STATIC` credentials sit in plaintext config. Always prefer `INSTANCE_PROFILE` on real infrastructure. When reading code that takes an `AwsConfig` map, expect either style. + +## Tests + +Minimal — auto-config wiring only. End-to-end auth is covered by `dolphinscheduler-api-test`. + +## Related modules + +- `dolphinscheduler-meter` — exposes the actuator endpoints that `-actuator-authentication` secures. +- `dolphinscheduler-task-plugin` / `-datasource-plugin` AWS members — consume `-aws-authentication`. +- `dolphinscheduler-api` — for the real user-login code path (separate from this module). diff --git a/dolphinscheduler-bom/CLAUDE.md b/dolphinscheduler-bom/CLAUDE.md new file mode 100644 index 000000000000..d11899a51e86 --- /dev/null +++ b/dolphinscheduler-bom/CLAUDE.md @@ -0,0 +1,42 @@ +# CLAUDE.md — dolphinscheduler-bom + +Maven **Bill of Materials**. Pins versions for every third-party library the project uses, so individual modules can `` without specifying a version. + +## What goes here, what does NOT + +**Goes here**: +- `` entries for any lib used in more than one module. +- `` pinning version numbers with consistent naming (``, ``, …). +- Profile-controlled version switches (e.g. `zk-3.8` vs `zk-3.4`). + +**Does NOT go here**: +- Plugin versions (those live in the root `pom.xml`). +- Dependencies specific to one plugin (e.g. 
Spark client version → stays in `task-spark/pom.xml`). +- Anything the module defines for itself. + +## Key version properties (rough guide) + +- Spring Boot: 2.7.x (pinned by root pom, referenced here). +- MyBatis Plus: 3.5.2. +- gRPC: 1.41.0. +- Netty: 4.1.53. +- Hadoop: 3.2.4 (client side). +- AspectJ: 1.9.7. +- JDBC drivers: MySQL, PostgreSQL, Oracle, SQL Server, Snowflake, Databend, … + +When in doubt, **grep here first** for a dependency's current version before searching individual `pom.xml` files. + +## Profiles + +- `zk-3.8` (default) vs `zk-3.4` — switches Zookeeper + Curator versions. Active at build time via `-Pzk-3.4` if a legacy ZK cluster is targeted. + +## Gotchas + +- **`htrace-core` is explicitly excluded** from the Hadoop transitive graph as a CVE mitigation. Keep the exclusion even if someone reports the dep is "missing". +- **Duplicate property bug**: `zeppelin-client.version` was historically defined twice (lines ~120 and ~123 of an older revision). If you see a similar duplication when editing, consolidate — the last one wins silently. +- **No Java code** — `packaging: pom`. Don't add a `src/` directory. +- **Version bumps here ripple to every module**; bump deliberately and prefer one-dep-at-a-time commits for bisect-ability. + +## Related modules + +Every module that needs a pinned version imports this BOM in its own ``. diff --git a/dolphinscheduler-common/CLAUDE.md b/dolphinscheduler-common/CLAUDE.md new file mode 100644 index 000000000000..1b5cce2126c9 --- /dev/null +++ b/dolphinscheduler-common/CLAUDE.md @@ -0,0 +1,53 @@ +# CLAUDE.md — dolphinscheduler-common + +Foundation utility library. Every other backend module in the repo transitively depends on this one, so treat it as the lowest layer: no internal dependencies, minimal behavior, stable API. + +## What lives here + +- Shared utilities: string / date / file / map / JSON helpers, SSL, encryption, thread helpers. 
+- Global enums and constants (workflow status, task types, flag values, date formats). +- Shell command execution primitives (`ShellExecutor`, `AbstractShell`). +- Property delegation layer (`IPropertyDelegate`, `ImmutablePropertyDelegate`) used to read YAML/properties uniformly. +- Generic DAG data structure (`common.graph.DAG`) — used by master to traverse workflow graphs. +- Lifecycle interface (`IStoppable`) — implemented by every long-lived server (Master, Worker, Alert, API). + +## Main package + +`org.apache.dolphinscheduler.common` + +## Key sub-packages + +- `common.utils` — stateless helpers; the most-touched package in the module. +- `common.constants` — `Constants`, `TenantConstants`, `DateConstants`. +- `common.enums` — `WorkflowExecutionStatus`, `TaskExecutionStatus`, `Flag`, etc. Changes here ripple across every module — be careful when renaming. +- `common.config` — property-source abstractions consumed by Spring `@ConfigurationProperties` classes elsewhere. +- `common.shell` — blocking shell executor, used by task plugins. +- `common.graph` — generic DAG; not workflow-specific despite the usage. +- `common.lifecycle` — `IStoppable` only. + +## Extension points + +None. This module exposes no SPI; it is a leaf utility library. + +## Configuration + +`src/main/resources`: + +- `common.properties` — logging + cloud-storage credentials read by services. +- `remote-logging.yaml` — config for remote log fetch. +- `resource-center.yaml` — resource-center defaults. + +## Gotchas + +- Cloud-storage SDKs (Aliyun OSS, Huawei OBS, Tencent COS, Azure Blob) are declared with `true` / broad exclusions. Do **not** rely on them being on the classpath from here — the concrete storage plugins in `dolphinscheduler-storage-plugin` bring them in. +- Enum renames are breaking: workflow/task status enums are serialized into the database via MyBatis type handlers in `dolphinscheduler-dao`. Always grep for the enum value before renaming. 
+- `IStoppable` is the canonical shutdown contract across every server; do not invent a new lifecycle interface. + +## Tests + +Standard `src/test/java`. No special infra. + +## Related modules + +- `dolphinscheduler-spi` — uses `common` as `provided`; keep new SPI-facing types minimal here. +- Every `server-*` / `dao` / `service` / `plugin` module depends on this. diff --git a/dolphinscheduler-dao-plugin/CLAUDE.md b/dolphinscheduler-dao-plugin/CLAUDE.md new file mode 100644 index 000000000000..b5910dd984ab --- /dev/null +++ b/dolphinscheduler-dao-plugin/CLAUDE.md @@ -0,0 +1,37 @@ +# CLAUDE.md — dolphinscheduler-dao-plugin + +Plugin family for **database dialects** supporting the core DolphinScheduler metadata DB. Handles dialect-specific SQL generation, MyBatis-Plus `DbType` selection, and schema monitoring. + +**This directory is a Maven parent POM.** Not to be confused with `dolphinscheduler-datasource-plugin`, which is about *user-configured* external datasources — this module is about the *internal* metadata DB. + +## Sub-modules + +- **`dolphinscheduler-dao-api`** — SPI: `DaoPluginConfiguration`, `DatabaseDialect`, `DatabaseMonitor`, `DatabaseEnvironmentCondition`. +- **`dolphinscheduler-dao-plugin-all`** — uber bundle depended on by `dolphinscheduler-dao`. +- Concrete dialects: + - `dolphinscheduler-dao-mysql` — MySQL 5.7+ (production). + - `dolphinscheduler-dao-postgresql` — PostgreSQL 9.6+ (production). + - `dolphinscheduler-dao-h2` — H2 (dev / tests / standalone server). + +## How the right dialect is picked + +Each sub-module registers a `@AutoConfiguration` class (Spring Boot 2.7 style) that is `@Conditional(DatabaseEnvironmentCondition.class)`. `DatabaseEnvironmentCondition` looks at `spring.datasource.driver-class-name` and matches it to the dialect. + +Switching the DB type therefore only requires changing the driver + URL in `application.yaml` — no pom changes. + +## Gotchas + +- **This is not a user-facing SPI**. 
There are exactly three supported internal DBs; adding a fourth (e.g. MariaDB, OceanBase for the metadata DB) requires coordinated changes in `dolphinscheduler-dao` SQL scripts and `dolphinscheduler-tools` upgraders. +- **MyBatis-Plus `DbType` (`com.baomidou.mybatisplus.annotation.DbType`) is NOT the same enum as `dolphinscheduler-spi`'s `DbType`**. Internal DB uses the MyBatis-Plus one; external datasources use the spi one. When editing code here, make sure you're importing the right one. +- **Dialect-specific SQL**: pagination (MySQL `LIMIT` vs PostgreSQL `OFFSET … LIMIT`), upsert behavior, JSON column handling. The `DatabaseDialect` interface is the authoritative place to vary SQL between backends — don't add `if (dbType == X)` branches in mappers. +- **H2 is only for dev/test**. Production deployments should not run on H2. The standalone server is the only shipping configuration that uses it. + +## Tests + +Each dialect sub-module has its own `src/test/java` exercising the dialect behavior, typically against an embedded driver or Testcontainers. + +## Related modules + +- `dolphinscheduler-dao` — primary consumer (depends on `dao-plugin-all`). +- `dolphinscheduler-tools` — DB schema upgrader; knows about the same three dialects. +- `dolphinscheduler-standalone-server` — uses `-dao-h2` by default. diff --git a/dolphinscheduler-dao/CLAUDE.md b/dolphinscheduler-dao/CLAUDE.md new file mode 100644 index 000000000000..fdb937a8c379 --- /dev/null +++ b/dolphinscheduler-dao/CLAUDE.md @@ -0,0 +1,41 @@ +# CLAUDE.md — dolphinscheduler-dao + +MyBatis-based data-access layer. Holds every entity, mapper, repository, and the SQL migration scripts shipped with a release. + +## Main package + +`org.apache.dolphinscheduler.dao` + +## Key sub-packages + +- `dao.entity` — persistence POJOs (`Command`, `WorkflowInstance`, `TaskInstance`, `User`, `Project`, `Tenant`, `DataSource`, `Cluster`, etc.). These map 1:1 to DB tables. +- `dao.mapper` — MyBatis `@Mapper` interfaces. 
One per table/entity (`CommandMapper`, `WorkflowInstanceMapper`, `TaskInstanceMapper`, …). +- `dao.repository` — higher-level repository abstractions wrapping mappers; what `service`/`master` layers should call, **not** the mappers directly. +- `dao.model` — DAO-specific DTOs (query projections, aggregates). +- `dao.utils` — SQL helpers. + +## SQL schema + +`src/main/resources/sql/`: + +- `dolphinscheduler_mysql.sql`, `dolphinscheduler_postgresql.sql`, `dolphinscheduler_h2.sql` — fresh install. +- `upgrade//` — version-to-version migration DDL. Pairs with the upgraders in `dolphinscheduler-tools`. + +## Gotchas + +- **Mappers are not enough**: new code should go through `dao.repository`. Calling mappers directly from `master`/`service` is legacy and should not spread. +- **Three DB dialects supported** (MySQL, PostgreSQL, H2). Any mapper XML must be dialect-neutral or rely on the dialect abstraction in `dolphinscheduler-dao-plugin`. Grep for `` when in doubt — avoid adding more of those. +- **Entity field renames are breaking**: MyBatis maps DB columns to field names via `@TableField` / naming conventions. Rename the column OR the field — never only one. +- **Schema file excluded from jar**: `*.sql` is excluded from the built jar to keep it small; `dolphinscheduler-tools` repackages the SQL separately for the upgrade CLI. +- `HikariCP` is the pool. Don't switch to Druid — datasource-plugin uses Druid internally for its own plugin connections, but the core DolphinScheduler metadata DB uses HikariCP. + +## Tests + +`src/test/java` — a mix of unit and Testcontainers-based integration tests. The database-dialect-specific paths are also exercised by `dolphinscheduler-dao-plugin/*/src/test`. + +## Related modules + +- `dolphinscheduler-dao-plugin` — dialect implementations; depended on here as `dolphinscheduler-dao-plugin-all`. +- `dolphinscheduler-common` — utilities. +- `dolphinscheduler-task-api` — task-related entities cross-reference it. 
+- `dolphinscheduler-tools` — reads the SQL files here to run schema upgrades. diff --git a/dolphinscheduler-datasource-plugin/CLAUDE.md b/dolphinscheduler-datasource-plugin/CLAUDE.md new file mode 100644 index 000000000000..d5916b5b2d6e --- /dev/null +++ b/dolphinscheduler-datasource-plugin/CLAUDE.md @@ -0,0 +1,55 @@ +# CLAUDE.md — dolphinscheduler-datasource-plugin + +Plugin family for **datasources** — the configurable DB/warehouse/query-engine connections a user can register (MySQL, PostgreSQL, Hive, Trino, Redshift, Snowflake, …). Used by the SQL-style task plugins, the API for connection testing, and the data-lineage tool. + +**This directory is a Maven parent POM.** + +## Sub-modules + +### Framework + +- **`dolphinscheduler-datasource-api`** — SPI and base types: `DataSourceProcessor`, `AbstractDataSourceProcessor`, `BaseDataSourceParamDTO`, `BaseHdfsConnectionParam`, and the `DataSourcePluginManager` that loads plugins. +- **`dolphinscheduler-datasource-all`** — uber module bundling every plugin; depended on by `master`, `worker`, `api` so they can use any datasource at runtime. + +### Concrete plugins (28 sub-modules) + +Relational: `-mysql`, `-postgresql`, `-oracle`, `-sqlserver`, `-db2`, `-oceanbase`, `-dameng`, `-hana`, `-azure-sql`, `-vertica`. + +OLAP / warehouses: `-clickhouse`, `-doris`, `-starrocks`, `-redshift`, `-snowflake`, `-databend`, `-dolphindb`. + +Big-data engines: `-hive`, `-spark`, `-presto`, `-trino`, `-kyuubi`, `-athena`. + +Cloud / other: `-sagemaker`, `-k8s`, `-aliyunserverlessspark`, `-ssh`, `-zeppelin`. + +## SPI contract + +Two interfaces are **both** implemented by each plugin: + +1. `DataSourceChannelFactory` (from `dolphinscheduler-spi`, loaded via `PrioritySPIFactory`) — creates `DataSourceChannel`s for low-level connection handling. +2. `DataSourceProcessor` (from `datasource-api`, loaded via standard `ServiceLoader`) — validates + builds connection params, handles the UI-form contract. 
+ +Both are wired into plugins with `@AutoService(...)`. Discovery happens in `DataSourcePluginManager`. + +## How it plugs into the rest of the system + +- **API**: when the user opens the "Create Datasource" dialog, the API asks each `DataSourceProcessor` for its param descriptor and serves it to the UI. +- **Worker**: SQL-family task plugins (`task-sql`, `task-procedure`, `task-datax`, …) look up the configured datasource by id, ask `datasource-api` for a JDBC connection via the channel, run the statement. + +## Gotchas + +- **Dual SPI loading is intentional**: channels are priority-ordered (multiple providers can coexist); processors are keyed by `DbType` (one per type). Don't merge them. +- **`DbType` enum lives in `dolphinscheduler-spi`**. Adding a new datasource means: new enum value in spi + new plugin sub-module. Leaving the enum unchanged will cause `DataSourcePluginManager` to silently ignore the plugin. +- **Passwords are encrypted** via `PasswordUtils` in `common`. Never store or log the plaintext form; always go through `PasswordUtils.encodePassword` / `decodePassword`. +- **JDBC connection pooling** is **Druid** (not HikariCP). This is different from the metadata DB (`dolphinscheduler-dao` uses HikariCP). Be careful with which you're tuning. +- **Hive / Spark / Presto plugins pull massive dep trees** — exclusions in their poms are load-bearing. Adding a new version bump here can blow up the classpath. + +## Tests + +Each plugin has `src/test/java`. Most use an embedded H2 or a mocked driver; a few use Testcontainers (e.g. `-postgresql`, `-mysql`). + +## Related modules + +- `dolphinscheduler-spi` — base SPI (`DataSourceChannelFactory`, `DbType`). +- `dolphinscheduler-task-plugin` — primary consumer (SQL family plugins). +- `dolphinscheduler-api`, `dolphinscheduler-service` — consumers for configuration + validation. +- `dolphinscheduler-common` — `PasswordUtils`. 
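The dual SPI loading described under Gotchas can be pictured with a small, self-contained sketch. It does not use the real `DataSourcePluginManager`, `PrioritySPIFactory`, or `ServiceLoader`: `DualRegistrySketch`, `ChannelFactory`, and `Processor` are hypothetical stand-ins, and both rules shown (the higher priority wins when two channel factories share a name, while a second processor for the same type is rejected) are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins; these are NOT the real DolphinScheduler types.
final class DualRegistrySketch {

    // Stand-in for a channel factory: several providers may share a name.
    static final class ChannelFactory {
        final String name;
        final int priority;
        final String impl;

        ChannelFactory(String name, int priority, String impl) {
            this.name = name;
            this.priority = priority;
            this.impl = impl;
        }
    }

    // Stand-in for a processor: exactly one per datasource type.
    static final class Processor {
        final String dbType;
        final String impl;

        Processor(String dbType, String impl) {
            this.dbType = dbType;
            this.impl = impl;
        }
    }

    private final Map<String, ChannelFactory> channelFactories = new HashMap<>();
    private final Map<String, Processor> processors = new HashMap<>();

    // Priority-ordered: on a name clash, keep the higher priority (assumption).
    void registerChannelFactory(ChannelFactory candidate) {
        ChannelFactory current = channelFactories.get(candidate.name);
        if (current == null || candidate.priority > current.priority) {
            channelFactories.put(candidate.name, candidate);
        }
    }

    // Keyed by type: a second processor for the same type is an error (assumption).
    void registerProcessor(Processor processor) {
        if (processors.putIfAbsent(processor.dbType, processor) != null) {
            throw new IllegalStateException("Duplicate processor for " + processor.dbType);
        }
    }

    Optional<ChannelFactory> channelFactoryFor(String name) {
        return Optional.ofNullable(channelFactories.get(name));
    }

    Optional<Processor> processorFor(String dbType) {
        return Optional.ofNullable(processors.get(dbType));
    }

    public static void main(String[] args) {
        DualRegistrySketch registry = new DualRegistrySketch();
        registry.registerChannelFactory(new ChannelFactory("mysql", 0, "builtin"));
        registry.registerChannelFactory(new ChannelFactory("mysql", 10, "custom"));
        registry.registerProcessor(new Processor("MYSQL", "mysql-processor"));
        System.out.println(registry.channelFactoryFor("mysql").get().impl);
        System.out.println(registry.processorFor("MYSQL").get().impl);
    }
}
```

The two maps are kept separate on purpose: coexisting providers only make sense for the priority-ordered side, so merging the registries would force one rule onto both.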
diff --git a/dolphinscheduler-dist/CLAUDE.md b/dolphinscheduler-dist/CLAUDE.md new file mode 100644 index 000000000000..9213d0a36143 --- /dev/null +++ b/dolphinscheduler-dist/CLAUDE.md @@ -0,0 +1,53 @@ +# CLAUDE.md — dolphinscheduler-dist + +The **release assembler**. Produces the binary tarball (`apache-dolphinscheduler-${version}-bin.tar.gz`) and the source tarball, optionally a Docker image. Has no Java code — it is a Maven assembly + shell-script orchestration module. + +## Artifacts produced + +- `apache-dolphinscheduler-${version}-bin.tar.gz` — standalone-server + master + worker + api + alert + UI + tools + plugin jars + scripts + configs. +- `apache-dolphinscheduler-${version}-src.tar.gz` — ASF-compliant source tarball. +- Docker images — when built with `-P docker`. + +## Assembly descriptors + +`src/main/assembly/`: + +- `dolphinscheduler-bin.xml` — file layout of the binary tarball. +- `dolphinscheduler-src.xml` — file layout of the source tarball. + +## Scripts + +`src/main/scripts/` (approx): + +- `assembly-plugins.sh` — copies plugin jars into the right `lib/plugin/` subdirectories. +- `docker-build.sh` — builds Docker images (invoked when `-P docker` and Docker CLI available). + +## Build + +``` +# Binary tarball (from repo root) +./mvnw -pl dolphinscheduler-dist -am clean package + +# With Docker images +./mvnw -pl dolphinscheduler-dist -am -P docker clean package +``` + +Output lands in `dolphinscheduler-dist/target/`. + +## Gotchas + +- **`-P docker` requires a working Docker CLI + BuildKit**. CI sets `DOCKER_BUILDKIT=1` explicitly. +- **`-am` is almost always needed** (`also make`) — this module reaches into every server, plugin, UI, and tool, so their sibling modules must be built first. +- **The binary tarball layout is operator-facing**. Operators depend on `bin/`, `conf/`, `lib/`, `libs/plugin/` existing at known paths. Don't reorganize without announcing it.
+- **UI is built inside `dolphinscheduler-ui`** (Node / pnpm) during `./mvnw package`; this module picks up the `dist/` folder. A broken UI build breaks the dist build. +- **Plugin jars are sorted into `libs/plugin//` by `assembly-plugins.sh`** — new plugin families need to be added to this script. +- Source tarball rules are **ASF-compliant**: excludes binaries, `.git*`, etc. Don't introduce large binaries into the repo. + +## Tests + +None. Verification is end-to-end via `dolphinscheduler-e2e` which runs against a built tarball. + +## Related modules + +- Dependencies at package-time: `dolphinscheduler-standalone-server`, `-api`, `-alert-server`, `-ui`, `-tools`. +- Plugin families are picked up transitively via the `-all` uber modules (`task-all`, `datasource-all`, `storage-all`, `alert-all`, `scheduler-all`, `registry-all`, `dao-plugin-all`). diff --git a/dolphinscheduler-e2e/CLAUDE.md b/dolphinscheduler-e2e/CLAUDE.md new file mode 100644 index 000000000000..8a16820db199 --- /dev/null +++ b/dolphinscheduler-e2e/CLAUDE.md @@ -0,0 +1,65 @@ +# CLAUDE.md — dolphinscheduler-e2e + +End-to-end **browser** tests. Selenium drives a real Chrome instance against a full DolphinScheduler stack booted with Docker Compose via Testcontainers. Complements the REST-level tests in `dolphinscheduler-api-test`. + +**This is a Maven parent POM.** Like `dolphinscheduler-api-test`, it is **not** declared in the root `pom.xml` `<modules>` — run it explicitly or via dedicated CI. + +## Sub-modules + +- **`dolphinscheduler-e2e-core`** — the framework: + - `@DolphinScheduler` annotation — declares `composeFiles` and Selenium setup. + - `DolphinSchedulerExtension` — JUnit 5 extension managing Testcontainers Compose + `BrowserWebDriverContainer` (Selenium 4, headless Chrome). Records videos. + - Page-object base classes using Selenium `@FindBy` (`LoginPage`, `NavBarPage`, `UserPage`, `TenantPage`, `SecurityPage`, …).
+- **`dolphinscheduler-e2e-case`** — the test classes (`*E2ETest`) using page objects. One class per feature area. + +## Test framework versions + +- JUnit 5 (Jupiter). +- Selenium 4.21. +- Testcontainers 1.21 (Compose + BrowserWebDriver modules). +- AssertJ + Awaitility. + +## Running + +``` +# Full suite (from repo root) +mvn -pl dolphinscheduler-e2e/dolphinscheduler-e2e-case test + +# Single test class (Apple Silicon) +mvn -pl dolphinscheduler-e2e/dolphinscheduler-e2e-case test \ + -Dtest=UserE2ETest -Dm1_chip=true +``` + +Flags: + +- `-Dm1_chip=true` — pulls arm64 docker images. +- `-Dlocal=true` — skip Testcontainers; point at a locally running stack (for fast iteration in the IDE). + +## How a test runs + +1. `@DolphinScheduler(composeFiles = {...})` starts Compose (DolphinScheduler + DB + browser container). +2. `DolphinSchedulerExtension` injects a `RemoteWebDriver` and page objects into the test. +3. `LoginPage.login(...)` authenticates, returns a session. +4. Subsequent page objects are built around that session; `@Order` sequences the steps. +5. On completion, the Selenium container ships the recorded MP4 into a temp dir. + +## Gotchas + +- **Chrome runs inside Docker** (Selenium BrowserWebDriverContainer); headless is implicit. You don't need a local Chromedriver. +- **Video recording is RECORD_ALL by default** — check the temp directory referenced in the Maven log for `.mp4` files when debugging a failure. +- **`@Order` is mandatory** — tests assume sequential state (just like `dolphinscheduler-api-test`). +- **`@DisableIfTestFails`** (junit-pioneer) cascades a failure to dependent tests. +- **Apple Silicon**: without `-Dm1_chip=true` the default images are amd64 and run under qemu — unbearably slow. +- **Flaky UI waits**: use `Awaitility` (already on the classpath), not `Thread.sleep`. If you see a `sleep` in a page object, consider replacing it. +- **Local mode (`-Dlocal=true`)**: you need to start the backend + UI yourself beforehand. 
The extension will skip Compose but still boot a local Chrome container. + +## Compose files + +`src/test/resources/docker//docker-compose.yaml` — one scenario per test class typically. Keep them minimal; only add a service if a test needs it. + +## Related modules + +- `dolphinscheduler-ui` — the frontend under test. +- `dolphinscheduler-api` — the backend under test. +- `dolphinscheduler-api-test` — sibling test harness at the REST layer. +- `dolphinscheduler-dist` — produces the tarball some Compose scenarios install. diff --git a/dolphinscheduler-eventbus/CLAUDE.md b/dolphinscheduler-eventbus/CLAUDE.md new file mode 100644 index 000000000000..5e6b9b96e2b4 --- /dev/null +++ b/dolphinscheduler-eventbus/CLAUDE.md @@ -0,0 +1,38 @@ +# CLAUDE.md — dolphinscheduler-eventbus + +A **local, in-process** event-bus abstraction with optional delay-queue semantics. Used internally by master/worker/task-executor/alert to decouple producers from consumers without introducing a real message broker. + +## What this is NOT + +This is **not** a distributed event bus. Events do not cross JVM boundaries. For cross-process notification use the RPC interfaces in `dolphinscheduler-extract`. + +## Main package + +`org.apache.dolphinscheduler.eventbus` + +## Key types + +- `IEvent` — marker interface for every event. +- `IEventBus` — producer/consumer contract: `publish`, `poll`, `take`, `peek`, `remove`, `isEmpty`. +- `AbstractDelayEvent` — event with a fire-at timestamp (implements `java.util.concurrent.Delayed`). +- `AbstractDelayEventBus` — default in-memory implementation backed by `DelayQueue` (or plain `BlockingQueue` for non-delayed buses). + +## Gotchas + +- `AbstractDelayEvent.getDelay` is called by `DelayQueue` on every comparison. Keep it cheap and **side-effect free** — no DB reads, no clock skew corrections. +- Events are lost if the JVM restarts — the bus has no persistence. Consumers must be designed to re-derive state from the DB on startup. 
+- Subclasses of `AbstractDelayEventBus` are consumed as Spring beans in master/worker. Names (bean type) matter — grep before renaming. + +## Typical usage pattern + +Each subsystem defines its own bus: `TaskExecutorEventBus` in `task-executor`, `AlertEventLoop` + `AlertEventPendingQueue` in `alert-server`, lifecycle-event bus in `master.engine`. The pattern is: domain-specific `IEvent` hierarchy → dedicated `AbstractDelayEventBus` subclass → single producer thread per bus. + +## Tests + +Standard `src/test/java`. + +## Related modules + +- `dolphinscheduler-task-executor` — heaviest consumer; defines the task-lifecycle event hierarchy on top of this. +- `dolphinscheduler-master` — uses it inside the workflow execution engine. +- `dolphinscheduler-alert` — the alert server's event loop is built on top of this. diff --git a/dolphinscheduler-extract/CLAUDE.md b/dolphinscheduler-extract/CLAUDE.md new file mode 100644 index 000000000000..3488589bb314 --- /dev/null +++ b/dolphinscheduler-extract/CLAUDE.md @@ -0,0 +1,39 @@ +# CLAUDE.md — dolphinscheduler-extract + +RPC **interface** definitions for inter-server calls (master ↔ worker ↔ alert ↔ api). Contains contracts only — the concrete Spring `@RpcService` implementations live in the module of the server that exposes them, not here. + +**This directory is a Maven parent POM.** + +## Sub-modules + +- **`dolphinscheduler-extract-base`** — RPC transport, framing, serialization; `@RpcService` annotation; `IRpcRequest` / `IRpcResponse`; client/server bootstrap. Everything else depends on this. +- **`dolphinscheduler-extract-common`** — contracts usable by every role (e.g. `ILogService` for log fetching). +- **`dolphinscheduler-extract-master`** — interfaces callers use to talk **to** master (`ITaskInstanceController`, `IWorkflowControlClient`, `IMasterContainerService`, `IWorkflowMetricService`). Implemented inside `dolphinscheduler-master/rpc`.
+- **`dolphinscheduler-extract-worker`** — interfaces to talk **to** worker (`IPhysicalTaskExecutorOperator`, `IStreamingTaskInstanceOperator`, `ITaskExecutorQueryClient`). Implemented inside `dolphinscheduler-worker/rpc`. +- **`dolphinscheduler-extract-alert`** — interfaces to talk **to** the alert server. Implemented inside `dolphinscheduler-alert-server`. + +## The rule + +``` +A server depends on extract-<other-role> to CALL that role. +A server depends on extract-<own-role> to IMPLEMENT its own RPC surface. +``` + +Example: master depends on `extract-worker` (to dispatch tasks) **and** on `extract-master` (to declare what it implements). + +## Gotchas + +- **Interface-only**: do not put implementation helpers in `-master`/`-worker`/`-alert`. If a helper is shared, it goes in `-base` or `-common`. +- **Serialization is in `-base`**: changing method signatures in a sub-interface is a wire-protocol change, not just a Java change. Bump carefully — rolling upgrades exist in the wild. +- **`@RpcService` bean discovery**: only Spring beans annotated with `@RpcService` are exported. If a new method doesn't appear on the remote side, check that the implementing server module registered the bean. +- Don't add `dolphinscheduler-common` dependencies to `-base` — it's kept lean on purpose (like `-spi`). + +## Tests + +Per sub-module `src/test/java`. Integration-style RPC tests live in the implementing modules (master/worker/alert), not here. + +## Related modules + +- `dolphinscheduler-master` / `-worker` / `-alert-server` — the implementers. +- `dolphinscheduler-api` / `-service` — callers (API reaches into master/worker for runtime control). +- `dolphinscheduler-microbench` — RPC benchmarks against `extract-base`. diff --git a/dolphinscheduler-master/CLAUDE.md b/dolphinscheduler-master/CLAUDE.md new file mode 100644 index 000000000000..57b1e6bc0b39 --- /dev/null +++ b/dolphinscheduler-master/CLAUDE.md @@ -0,0 +1,64 @@ +# CLAUDE.md — dolphinscheduler-master + +The **Master** server.
Owns workflow orchestration: consumes `Command` rows, runs the workflow state machine, dispatches tasks to workers over RPC, handles failover. Runs as a Spring Boot application, scales horizontally (multiple masters coordinate via the registry). + +## Entry point + +`MasterServer` — `@SpringBootApplication`, implements `IStoppable`. Default port set in `application.yaml`. + +## Main package + +`org.apache.dolphinscheduler.server.master` + +## Key sub-packages + +- `server.master.engine` — the workflow execution engine: command handlers, workflow/task state machines, lifecycle event handlers, event bus. **Where the orchestration logic actually lives.** +- `server.master.rpc` — `MasterRpcServer` + RPC implementations (`TaskInstanceControllerImpl`, workflow control, master-to-master). Implements the contracts from `dolphinscheduler-extract-master`. +- `server.master.cluster` — worker cluster view, load balancing, metadata tracking. Decides *which* worker a task is dispatched to. +- `server.master.registry` — `MasterRegistryClient`: masters' own registration + discovery. +- `server.master.failover` — master and worker failover: detects a dead peer and recovers in-flight workflows. +- `server.master.runner` — workflow and task execution contexts (per-workflow and per-task runtime state holders). +- `server.master.metrics` — Micrometer gauges/counters for master health. + +## HA and coordination + +- `MasterCoordinator` extends `AbstractHAServer` and elects a leader via the registry. Cron scheduling triggers only fire on the leader. +- `MasterRegistryClient` registers an ephemeral node; peers receive a disconnect event and trigger failover. + +## Extension points + +Exposed interfaces: + +- `ITaskGroupCoordinator`, `IWorkflowSerialCoordinator` — pluggable concurrency/serialization strategies. +- `IWorkflowRepository` — where workflow runtime state lives (defaults to in-memory + DB). 
+- `ILifecycleEventHandler`, `ILifecycleEventType` — add a new lifecycle event without editing the core state machine. + +## Gotchas + +- **Command-driven execution**. Nothing runs until a row is inserted into the `t_ds_command` table. If a workflow "does nothing" after a click, follow the trail: controller → `CommandService.insertCommand` → master command consumer → `WorkflowEngine`. +- **State-machine pattern is central**. Do not sneak an ad-hoc state transition into a service; add a lifecycle event and handler so the whole engine sees it. +- **`delight-nashorn-sandbox`** is used to sandbox potentially unsafe scripts (e.g. conditional branch expressions). Upgrading the dep is sensitive — test the condition/switch task flows. +- **Scheduler integration via `SchedulerApi`** (from `dolphinscheduler-scheduler-plugin`). The only current impl is Quartz — but do not couple directly to `Scheduler` (Quartz). +- **Cross-master coordination uses the registry**, not RPC. Adding a new coordination primitive → put the key scheme in `server.master.cluster` and document it. +- **Failover is the highest-risk code path**. Any change in `server.master.failover` must be exercised against `AbstractMasterIntegrationTestCase` scenarios. +- **Heavy use of async tasks** via event buses. Do not introduce `Thread.sleep` in handlers — publish a delayed event instead. + +## Configuration + +`src/main/resources`: + +- `application.yaml` — server port, worker-group defaults, timeouts, cluster settings. +- `logback-spring.xml`, `banner.txt`. + +## Tests + +`src/test/java` — unit + integration. Integration tests extend `AbstractMasterIntegrationTestCase`; they simulate distributed scenarios including failover. + +## Related modules + +- `dolphinscheduler-extract-master` — the RPCs this module implements. +- `dolphinscheduler-extract-worker` — the RPCs this module calls. +- `dolphinscheduler-task-executor` — shared task-lifecycle event model (lifecycle events received from workers).
+- `dolphinscheduler-service` — `ProcessService` + `CommandService`. +- `dolphinscheduler-registry-all`, `dolphinscheduler-scheduler-all`, `dolphinscheduler-storage-api`, `dolphinscheduler-datasource-api` — runtime deps. +- `dolphinscheduler-eventbus` — in-process event bus inside the engine. diff --git a/dolphinscheduler-meter/CLAUDE.md b/dolphinscheduler-meter/CLAUDE.md new file mode 100644 index 000000000000..36c54fd15a16 --- /dev/null +++ b/dolphinscheduler-meter/CLAUDE.md @@ -0,0 +1,40 @@ +# CLAUDE.md — dolphinscheduler-meter + +Metrics collection + Prometheus exposure + server-load-protection primitives. Auto-configured via Spring Boot so that any server (master, worker, api, alert) gets `/actuator/prometheus` for free just by depending on this module. + +## Main package + +`org.apache.dolphinscheduler.meter` + +## Key sub-packages + +- `meter.metrics` — `MetricsProvider`, `SystemMetrics`, `DefaultMetricsProvider`. +- `meter.loadprotection` (if present) — `ServerLoadProtection`, `BaseServerLoadProtection`. + +## Key classes + +- `MeterAutoConfiguration` — Spring Boot auto-configuration entry; pulled in by any server that puts this module on the classpath. +- `MetricsProvider` — SPI for custom metrics suppliers. +- `SystemMetrics` — CPU / memory / disk via Micrometer. +- `ServerLoadProtection` — interface the worker uses to reject new tasks under load; the default impl reads thresholds from config. + +## Gotchas + +- **Jetty, not Tomcat**. This module (and everything depending on it) excludes `spring-boot-starter-tomcat` and brings `spring-boot-starter-jetty`. Introducing a dependency that transitively re-adds Tomcat will trigger port-binding conflicts at runtime — use `<exclusions>`. +- Grafana dashboards live under `grafana/` and `grafana-demo/` — **example-only**, excluded from the jar. Production operators customize their own. +- The server-load-protection thresholds are consumed by `dolphinscheduler-worker` (`WorkerServerLoadProtection`).
Changes to the interface here cascade to worker-side overrides. +- Prometheus scrape endpoint is at `/actuator/prometheus` (Spring Boot Actuator default). Auth is applied by `dolphinscheduler-actuator-authentication` when present on the classpath. + +## Configuration + +Consumers provide `management.endpoints.web.exposure.include=prometheus,health,info,metrics` in their `application.yaml`. This module does not ship its own YAML. + +## Tests + +Minimal — mostly auto-config wiring. + +## Related modules + +- `dolphinscheduler-common` — compile dep. +- `dolphinscheduler-actuator-authentication` (in `dolphinscheduler-authentication/`) — secures the actuator endpoints this module exposes. +- Every server module includes meter for metrics. diff --git a/dolphinscheduler-microbench/CLAUDE.md b/dolphinscheduler-microbench/CLAUDE.md new file mode 100644 index 000000000000..e7db93f31b98 --- /dev/null +++ b/dolphinscheduler-microbench/CLAUDE.md @@ -0,0 +1,36 @@ +# CLAUDE.md — dolphinscheduler-microbench + +JMH (Java Microbenchmark Harness) micro-benchmarks. Used to measure RPC throughput and enum / hot-path utility performance. **Not** run as part of the regular build. + +## Main package + +`org.apache.dolphinscheduler.microbench` + +## Key sub-packages / classes + +- `microbench.base.AbstractBaseBenchmark` — JMH `@Setup`/`@TearDown` plumbing. 5 warmup + 10 measurement iterations by default. Runs JMH via `Runner` from a JUnit `@Test` method (so the JMH launch is triggered by `mvn test` when you opt in). +- `microbench.rpc.RpcBenchMarkTest` — Netty-based RPC ping/pong benchmarks, exercises `dolphinscheduler-extract-base`. +- `microbench.common.EnumBenchMark` — enum lookup / switch benchmarks. 
+ +## How to run + +``` +# Build the uber-jar +./mvnw -pl dolphinscheduler-microbench -am package + +# Run (Main-Class: org.openjdk.jmh.Main) +java -jar dolphinscheduler-microbench/target/benchmarks.jar [regex] +``` + +Or run an individual `*BenchMarkTest` class from the IDE — `AbstractBaseBenchmark` invokes the JMH `Runner` inside the `@Test` method. + +## Gotchas + +- **JMH annotation processor is required** (`jmh-generator-annprocess` at `provided` scope). IDE setup must enable annotation processing for this module or `@Benchmark` methods won't be picked up. +- **Incremental compilation is disabled** in this module's pom because JMH's processor regenerates everything — do not re-enable. +- **Results are noisy** on non-dedicated hardware. Don't ship benchmark deltas from a laptop as evidence of a performance regression / improvement. +- **Module is excluded from the release tarball** (it's a developer tool). It is in the root reactor build so `mvn compile` covers it, but its `@Test` entry points are fast and harmless. + +## Related modules + +- `dolphinscheduler-extract-base` — the RPC framework under benchmark. diff --git a/dolphinscheduler-registry/CLAUDE.md b/dolphinscheduler-registry/CLAUDE.md new file mode 100644 index 000000000000..aa356b86ffa1 --- /dev/null +++ b/dolphinscheduler-registry/CLAUDE.md @@ -0,0 +1,44 @@ +# CLAUDE.md — dolphinscheduler-registry + +Pluggable registry abstraction for service discovery, metadata storage, ephemeral node management, and distributed locks. Backends: Zookeeper (default), Etcd, JDBC. + +**This directory is a Maven parent POM — no code lives here directly.** + +## Sub-modules + +- **`dolphinscheduler-registry-api`** — the `Registry` SPI interface, `Event`, `ConnectionListener`, `SubscribeListener`. Depend on this if you want to *consume* the registry. 
+- **`dolphinscheduler-registry-plugins`** — parent of the three concrete implementations: + - `dolphinscheduler-registry-zookeeper` (default, uses Curator) + - `dolphinscheduler-registry-etcd` + - `dolphinscheduler-registry-jdbc` (uses the main DolphinScheduler DB; useful when Zookeeper/Etcd aren't available) + - `dolphinscheduler-registry-it` (integration tests exercising all three) +- **`dolphinscheduler-registry-all`** — uber module that bundles every implementation; depended on by the servers so a user can switch registry via config without touching pom.xml. + +## SPI contract + +`Registry` (in `-api`): +- Connection lifecycle: `start`, `isConnected`. +- KV: `put`, `get`, `delete`. +- Subscription: `subscribe(path, listener)`. +- Locking: `lock(key)`, `tryLock`, `unlock`. +- Connection state: `addConnectionStateListener`. + +## Which backend is active at runtime? + +Selected by Spring properties (`registry.type=zookeeper|etcd|jdbc`) — implementations are `@ConditionalOnProperty`. Only one `Registry` bean is created per JVM. + +## Gotchas + +- **Zookeeper version is profile-driven**: root POM defines `zk-3.8` (default) and `zk-3.4` Maven profiles that flip Curator + Zookeeper versions (see `dolphinscheduler-bom/pom.xml`). Do not pin a Zookeeper version in this module. +- **Ephemeral nodes on disconnect**: worker/master/alert register as ephemeral; a long ZK session loss triggers failover in master. Changing registration TTL semantics here affects the HA story — cross-check with `dolphinscheduler-master`'s `MasterRegistryClient` and the failover package. +- **`jdbc` backend** shares the main DolphinScheduler datasource — do not point it at a separate DB. +- **Distributed lock keys**: whichever backend is active, `lock` semantics must be fair and re-entrant per the tests in `-it`. Don't reimplement lock logic in callers. + +## Tests + +`dolphinscheduler-registry-it` spins up each backend (Testcontainers for ZK/etcd/MySQL) and runs a shared contract suite. 
Any new `Registry` method must be exercised in `-it`. + +## Related modules + +- `dolphinscheduler-master` / `-worker` / `-api` / `-alert-server` — depend on `registry-all` and consume the `Registry` bean. +- `dolphinscheduler-service` — depends on `registry-api`. diff --git a/dolphinscheduler-scheduler-plugin/CLAUDE.md b/dolphinscheduler-scheduler-plugin/CLAUDE.md new file mode 100644 index 000000000000..798c75958d6e --- /dev/null +++ b/dolphinscheduler-scheduler-plugin/CLAUDE.md @@ -0,0 +1,38 @@ +# CLAUDE.md — dolphinscheduler-scheduler-plugin + +Plugin family for the **cron-trigger scheduler** that fires workflows at their scheduled times. Only one implementation today (Quartz), but the SPI is in place so alternatives can be added. + +**This directory is a Maven parent POM.** + +## Sub-modules + +- **`dolphinscheduler-scheduler-api`** — the `SchedulerApi` interface (`start`, `insertOrUpdateScheduleTask`, `deleteScheduleTask`, `close`) + related DTOs. +- **`dolphinscheduler-scheduler-quartz`** — Quartz-based implementation. `QuartzScheduler` (SchedulerApi impl), `QuartzSchedulerAutoConfiguration`, `QuartzSchedulerDataSourceAutoConfiguration`, `QuartzTriggerBuilder`. +- **`dolphinscheduler-scheduler-all`** — uber module consumed by master. + +## Who triggers what + +1. User defines a schedule in the UI → API persists a row. +2. Master's leader starts the scheduler; `SchedulerApi.insertOrUpdateScheduleTask` registers the Quartz job. +3. When Quartz fires, it inserts a `t_ds_command` row (WorkflowScheduler → CommandService). Master's command consumer picks it up and runs the workflow. + +## Gotchas + +- **Only the master leader runs the scheduler**. Non-leader masters hold `SchedulerApi.close()`-like quiet state. Electing a new leader must re-register all schedules. +- **Separate Quartz datasource**: `QuartzSchedulerDataSourceAutoConfiguration` configures its own datasource pointing at the same DB as DolphinScheduler but with Quartz's own tables (`QRTZ_*`). 
Upgrades must run Quartz's own DDL as well as DolphinScheduler's. +- **Job key scheme**: `jobKey` concatenates `projectId_scheduleId`. Renaming this scheme breaks in-flight scheduled tasks — avoid. +- **Cron parsing happens in two places**: `dolphinscheduler-service/cron` uses `cron-utils` for *display* (next fire time in UI); Quartz internally uses Quartz cron syntax for *firing*. The two are mostly compatible but DOW conventions differ slightly — always validate with both. +- **Do not couple to Quartz types outside this module**. Callers must depend on `SchedulerApi`, never on `org.quartz.Scheduler` directly. + +## Extension points + +A new scheduler implementation would: (1) create a sibling sub-module; (2) implement `SchedulerApi`; (3) register via Spring Boot `spring.factories` / `AutoConfiguration.imports`; (4) add itself to `scheduler-all`. + +## Tests + +Inside `-quartz/src/test/java`. + +## Related modules + +- `dolphinscheduler-master` — consumes `scheduler-all`. +- `dolphinscheduler-service` — uses `CronService` for next-fire-time display (independent of this module's runtime). diff --git a/dolphinscheduler-service/CLAUDE.md b/dolphinscheduler-service/CLAUDE.md new file mode 100644 index 000000000000..8f4e374d7738 --- /dev/null +++ b/dolphinscheduler-service/CLAUDE.md @@ -0,0 +1,41 @@ +# CLAUDE.md — dolphinscheduler-service + +Business-logic layer sitting **between** `dao` and the server modules (`api`, `master`, `worker`, `alert-server`). Owns orchestration concerns: workflow lifecycle, command processing, cron scheduling, alert dispatch, parameter expansion. + +## Main package + +`org.apache.dolphinscheduler.service` + +## Key sub-packages + +- `service.process` — `ProcessServiceImpl`: the single largest service, used by master & api. It is the de-facto orchestration facade; nearly every workflow operation passes through it. 
+- `service.command` — `CommandServiceImpl`: enqueue/consume `Command` rows that trigger workflow runs (start, restart, pause, recover). +- `service.cron` — `CronService`: cron parsing + next-fire calculation (uses the `cron-utils` lib, **not** Quartz expressions directly). +- `service.alert` — `AlertService`, `AlertNotificationService`: bridge between workflow/task state changes and the alert server. +- `service.expand` — `CuringParamsServiceImpl`: expand `${paramName}` placeholders in task parameters at execution time. +- `service.utils` — general service-layer helpers. +- `service.model` — service-layer DTOs. + +## Gotchas + +- **`ProcessServiceImpl` is a god-class** by design; it survived multiple refactor attempts. If you are about to add a new method, consider whether a thinner, purpose-specific service would be more appropriate — but expect most historical changes to land here. +- **Service methods are transactional**: `@Transactional(rollbackFor = Exception.class)` is sprinkled on write methods. Adding a new write method without this annotation is almost always a bug. +- **Depends on RPC interfaces, not implementations**: this module pulls `dolphinscheduler-extract-master` / `-extract-worker` — it can *call* master/worker RPCs but does not implement them. The real implementations live in those server modules. +- **Cron semantics**: DolphinScheduler uses Quartz under the hood for triggering, but cron *parsing* here uses `cron-utils`. The two have slightly different DOW (day-of-week) conventions; see `CronService` for the normalization. +- **Parameter expansion order**: project params → workflow params → task params → built-in params. Changing the precedence in `CuringParamsServiceImpl` is a contract change visible to every task. + +## Configuration + +None shipped; consumers pass standard Spring profiles. + +## Tests + +Standard `src/test/java`. Some tests mock `curator-test` for registry; most are pure unit. 
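The `${paramName}` expansion and scope ordering noted under Gotchas can be illustrated with a minimal sketch. It is not the `CuringParamsServiceImpl` code: the class `ParamExpansionSketch` and its methods `merge` and `expand` are hypothetical, and the sketch assumes that a scope later in the list overrides an earlier one, which the text above does not confirm.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of DolphinScheduler.
final class ParamExpansionSketch {

    // Matches placeholders of the form ${name}.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    // Merges scopes in list order.
    // ASSUMPTION: a later scope overrides an earlier one on key collisions.
    static Map<String, String> merge(List<Map<String, String>> scopesInOrder) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> scope : scopesInOrder) {
            merged.putAll(scope);
        }
        return merged;
    }

    // Replaces every known ${name}; unknown placeholders are left untouched.
    static String expand(String text, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in values literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> project = new HashMap<>();
        project.put("env", "dev");
        project.put("owner", "team-a");
        Map<String, String> workflow = Collections.singletonMap("env", "staging");
        Map<String, String> task = Collections.singletonMap("table", "orders");

        Map<String, String> params = merge(Arrays.asList(project, workflow, task));
        System.out.println(expand("load ${table} in ${env} for ${owner}; ${missing}", params));
    }
}
```

Keeping the merge step separate from the substitution step makes the precedence rule a single, reviewable line, which is where a contract change of the kind described above would show up.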
+ +## Related modules + +- `dolphinscheduler-dao` — primary dependency. +- `dolphinscheduler-spi`, `dolphinscheduler-registry-api` — plugin + service discovery. +- `dolphinscheduler-extract-master` / `-extract-worker` — RPC clients. +- `dolphinscheduler-task-api` — task-related DTOs. +- Consumers: `dolphinscheduler-api`, `dolphinscheduler-master`. diff --git a/dolphinscheduler-spi/CLAUDE.md b/dolphinscheduler-spi/CLAUDE.md new file mode 100644 index 000000000000..ed6f992d3395 --- /dev/null +++ b/dolphinscheduler-spi/CLAUDE.md @@ -0,0 +1,39 @@ +# CLAUDE.md — dolphinscheduler-spi + +Service-Provider-Interface contracts shared between the core and all plugin families (task, datasource, alert, scheduler, storage, dao, registry). Intentionally kept **tiny** so that plugin authors can depend on it without pulling in the whole backend. + +## Main package + +`org.apache.dolphinscheduler.spi` + +## Key sub-packages + +- `spi.plugin` — `PrioritySPI`, `SPIIdentify`, `PrioritySPIFactory`. The priority-loading mechanism every plugin opts into. +- `spi.datasource` — `DataSourceClient`, `BaseConnectionParam`, `DataSourceChannel`, `DataSourceChannelFactory`. Core datasource abstractions (the plugin modules implement these). +- `spi.enums` — `DbType`, `DbConnectType`, `Flag`, `ResourceType`. Cross-module enums with wire-format implications. +- `spi.params` — `PluginParamsTransfer`, `InputParam`, `RadioParam`, `SelectParam`, etc. Describe the UI form a plugin exposes to the frontend. + +## Key interfaces / classes + +- `PrioritySPI` — root marker interface for everything loaded via `PrioritySPIFactory`; subtypes return a priority so multiple implementations of the same SPI can coexist and the highest priority wins. +- `DataSourceChannelFactory` — how datasource plugins register themselves. +- `PluginParamsTransfer` — used by API + UI to render plugin configuration forms dynamically. 
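The "highest priority wins" rule described for `PrioritySPI` can be sketched roughly as follows. The `Plugin` interface, `SimplePlugin`, and `selectByPriority` are hypothetical stand-ins rather than the real `PrioritySPI` / `PrioritySPIFactory` signatures, and the tie-breaking behaviour (first seen is kept) is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the real PrioritySPIFactory.
final class PrioritySpiSketch {

    // Stand-in for a prioritized plugin: a logical name plus an integer priority.
    interface Plugin {
        String name();
        int priority();
    }

    static final class SimplePlugin implements Plugin {
        private final String name;
        private final int priority;

        SimplePlugin(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public int priority() {
            return priority;
        }
    }

    // Keeps one plugin per name: the one with the higher integer priority.
    // ASSUMPTION: on equal priority the first candidate seen is kept.
    static Map<String, Plugin> selectByPriority(Iterable<? extends Plugin> candidates) {
        Map<String, Plugin> winners = new HashMap<>();
        for (Plugin candidate : candidates) {
            winners.merge(candidate.name(), candidate,
                    (current, incoming) ->
                            incoming.priority() > current.priority() ? incoming : current);
        }
        return winners;
    }

    public static void main(String[] args) {
        // In a real setup the candidates would typically come from
        // java.util.ServiceLoader rather than a hard-coded list.
        List<SimplePlugin> candidates = Arrays.asList(
                new SimplePlugin("mysql", 1),
                new SimplePlugin("mysql", 5),
                new SimplePlugin("postgresql", 3));
        for (Plugin p : selectByPriority(candidates).values()) {
            System.out.println(p.name() + " -> priority " + p.priority());
        }
    }
}
```

When two implementations appear to collide at runtime, comparing their priorities in a structure like the `winners` map is usually the quickest way to see which one was selected.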
+ +## Gotchas + +- **Minimal dependencies on purpose**: only `slf4j-api` is `compile`; `dolphinscheduler-common` is `provided`. Resist the urge to add rich deps here — plugin authors consume this module transitively and each dep forces a version on them. +- **Plugin priority semantics**: higher integer wins. When you see two plugin implementations collide at runtime, check each impl's `getIdentify().getPriority()`. +- **`DbType` is exposed to end users** via stored configuration; renaming a value is a DB migration, not just a refactor. + +## Extension points + +This **is** the extension-point module. Every plugin module consumes an SPI defined here or in `*-api` sub-modules of the plugin families (`task-api`, `datasource-api`, `dao-api`, etc.). + +## Tests + +Standard `src/test/java`. + +## Related modules + +- `dolphinscheduler-common` — consumed `provided` only. +- `dolphinscheduler-task-plugin` / `dolphinscheduler-datasource-plugin` / `dolphinscheduler-storage-plugin` / `dolphinscheduler-dao-plugin` / `dolphinscheduler-scheduler-plugin` — all implement SPIs rooted here. diff --git a/dolphinscheduler-standalone-server/CLAUDE.md b/dolphinscheduler-standalone-server/CLAUDE.md new file mode 100644 index 000000000000..55c12de007f9 --- /dev/null +++ b/dolphinscheduler-standalone-server/CLAUDE.md @@ -0,0 +1,43 @@ +# CLAUDE.md — dolphinscheduler-standalone-server + +**Single-JVM** DolphinScheduler: master + worker + api + alert all embedded in one process with an **H2 in-memory** database. Intended for local development, smoke tests, and lightweight demos — **not** for production. + +## Entry point + +`StandaloneServer` — a thin `@SpringBootApplication` whose only job is to bring up the combined context. + +## Main package + +`org.apache.dolphinscheduler` + +## Configuration + +`src/main/resources/`: + +- `application.yaml` — H2 in-memory URL with `MODE=MySQL` for dialect compatibility, Spring profiles, embedded server ports. +- `logback-spring.xml`. 
+- `start.sh`, `jvm_args_env.sh`, `dolphinscheduler_env.sh` — startup + JVM args, packaged into the tarball. + +## Dependency scope + +**All server + plugin dependencies are declared `provided` in this module's `pom.xml`**. The standalone jar itself ships empty of those classes. At runtime they come from the classpath assembled by `dolphinscheduler-dist`. + +This means: running the jar directly (`java -jar`) without the dist's `lib/` does **not** work. Use the `start.sh` in the tarball, or run from an IDE with the reactor modules on the classpath. + +## Gotchas + +- **H2 loses state on restart** by default (in-memory URL). This is a feature for smoke tests; if you want persistence swap to file-based H2 or MySQL — and then this is no longer "standalone-server" in spirit. +- **`MODE=MySQL` on the H2 URL** is required; otherwise MyBatis-Plus generated SQL misbehaves. +- **Quartz tables (`QRTZ_*`) are auto-created** on first boot via the Quartz JDBC store. +- **Assembly excludes `*.yaml` and `*.xml` from the jar** — configs live in `conf/` in the tarball so operators can edit them. +- **Do not copy application.yaml changes from master/worker/api here blindly**. The standalone profile flattens ports and disables a few HA paths (leader election short-circuits). + +## Tests + +No `src/test/java` in this module. Standalone-server health is exercised by the UI + API test suites running against it. + +## Related modules + +- `dolphinscheduler-master`, `-worker`, `-api`, `-alert-server` — embedded. +- `-alert-all`, `-task-all`, `-datasource-all`, `-storage-all` — provided plugin bundles. +- `dolphinscheduler-dist` — packages this along with plugin jars into the tarball. 
diff --git a/dolphinscheduler-storage-plugin/CLAUDE.md b/dolphinscheduler-storage-plugin/CLAUDE.md new file mode 100644 index 000000000000..64dac49d8961 --- /dev/null +++ b/dolphinscheduler-storage-plugin/CLAUDE.md @@ -0,0 +1,46 @@ +# CLAUDE.md — dolphinscheduler-storage-plugin + +Plugin family for **resource storage** — where uploaded files, task resources, logs, and workflow artifacts live. Swappable between cloud blob stores and HDFS. + +**This directory is a Maven parent POM.** + +## Sub-modules + +- **`dolphinscheduler-storage-api`** — SPI: `StorageOperator`, `StorageOperatorFactory`, `AbstractStorageOperator`, `StorageType`, `StorageConfiguration`. +- **`dolphinscheduler-storage-all`** — uber bundle for all implementations. +- Concrete plugins: `-s3` (AWS S3), `-hdfs` (Hadoop HDFS), `-oss` (Aliyun), `-gcs` (Google Cloud Storage), `-abs` (Azure Blob), `-obs` (Huawei), `-cos` (Tencent). + +## SPI contract + +`StorageOperator` (the core API): + +- Path management: `getStorageBaseDirectory`, `mkdir`, `exists`, `delete`, `listStorageEntity`. +- I/O: `upload`, `download`, `fetchFileContent`. +- Tenancy: every method takes a `tenantCode`; multi-tenant isolation is baked in. + +Each concrete plugin ships a `StorageOperatorFactory` annotated with `@AutoService(StorageOperatorFactory.class)`. + +## Selection at runtime + +Only **one** storage backend is active per cluster. `StorageConfiguration` reads `resource.storage.type` from config, iterates `ServiceLoader`, matches on `StorageType`, and produces the single `StorageOperator` bean. + +Switching backends mid-life requires manual data migration — the system does not handle that. + +## Gotchas + +- **Tenant directory layout is part of the public contract**: `getStorageBaseDirectory(tenantCode)` determines where the UI, workers, and task plugins look for files. Changing the layout is a data-migration event. +- **`FileAlreadyExistsException`** semantics: `mkdir` on an existing dir throws, not no-ops. 
Callers must handle this — many do, but new call sites should too. +- **HDFS plugin pulls a very heavy Hadoop client tree**; exclusions in `-hdfs/pom.xml` are load-bearing. Watch out for transitive conflicts with `task-mr`, `task-spark`, `task-hivecli`. +- **S3 plugin is also exercised by the worker** for distributed-task artifact handling (not only as the resource store). This is the most battle-tested code path. +- **OBS listStorageEntity** had a bug where subdirectories were not returned — fixed recently (see commit `94bfbb048a`); if you see similar symptoms in a new plugin, compare against S3/OSS as reference impls. +- **Credentials**: cloud plugins use the SDK default chain when not explicitly configured. In AWS plugins, prefer IAM instance profile over static keys (see `dolphinscheduler-aws-authentication`). + +## Tests + +Per plugin in `src/test/java`, commonly with mocked SDK clients. A few use Testcontainers (`-s3` with LocalStack). + +## Related modules + +- `dolphinscheduler-aws-authentication` — AWS plugins' credential source. +- `dolphinscheduler-common` — utilities. +- `dolphinscheduler-worker`, `dolphinscheduler-api`, `dolphinscheduler-master` — runtime consumers. diff --git a/dolphinscheduler-task-executor/CLAUDE.md b/dolphinscheduler-task-executor/CLAUDE.md new file mode 100644 index 000000000000..89894d700532 --- /dev/null +++ b/dolphinscheduler-task-executor/CLAUDE.md @@ -0,0 +1,49 @@ +# CLAUDE.md — dolphinscheduler-task-executor + +Reusable task-execution framework. Defines how a worker **runs**, **tracks**, and **reports** a task instance — independent of any specific task type (shell, Spark, …). The worker (`dolphinscheduler-worker`) embeds this module; the task-type behavior comes from `dolphinscheduler-task-plugin`. 
+ +## Main package + +`org.apache.dolphinscheduler.task.executor` + +## Key sub-packages + +- `task.executor.container` — execution models: `ExclusiveThreadTaskExecutorContainer` (one thread per task) and `SharedThreadTaskExecutorContainer` (thread-pooled, for lightweight tasks). The container owns the lifecycle. +- `task.executor.eventbus` — `TaskExecutorEventBus` + `TaskExecutorEventBusCoordinator`; an in-process delay bus for task-lifecycle transitions (built on `dolphinscheduler-eventbus`). +- `task.executor.listener` — lifecycle listeners (start, finish, fail, timeout, kill). +- `task.executor.operations` — operation requests/responses carried over RPC (dispatch, kill, pause, reassign). Sibling to the wire types in `dolphinscheduler-extract-worker`. +- `task.executor.dto` — task state, execution context DTOs. +- `task.executor.exceptions` — `TaskExecutionException` and friends. +- `task.executor.worker` — worker-thread implementations backing the containers. + +## Key types + +- `ITaskEngine` + `TaskEngine` — facade the worker uses to submit/control tasks. +- `TaskEngineBuilder` — constructs a configured `TaskEngine` at startup. +- `TaskExecutorRepository` — in-memory registry of running tasks. +- `AbstractTaskExecutor`, `AbstractTaskExecutorContainer` — extension points for new execution strategies. +- `TaskExecutorLifecycleEventRemoteReporter` — ships lifecycle events back to master via RPC. + +## Extension points / SPI + +Exposed interfaces (consumed by `dolphinscheduler-worker`): + +- `ITaskExecutor`, `ITaskExecutorContainer`, `ITaskExecutorContainerProvider`, `ITaskExecutorFactory`. +- `ITaskExecutorEventBusCoordinator`, `ITaskExecutorLifecycleEventListener`. +- `ITaskExecutorRepository`, `ITaskExecutorStateTracker`, `ITaskExecutorWorker`. + +## Gotchas + +- **No Spring here**. This module is a plain library with no `@Component` / `@Configuration`. All beans are wired inside `dolphinscheduler-worker`. Do not add Spring deps. 
+- **State transitions are table-driven** via `TaskExecutorStateMappings`. Adding a new state or transition in one place without updating the mapping table will silently drop events. +- **No `src/main/resources`** — adding config here is wrong; config belongs in the hosting server (worker). +- **Lifecycle events are the source of truth** for master's view of task state. If master thinks a task is stuck, the most likely cause is a lifecycle event that was never published from the executor here. +- **Module has no tests** — coverage comes from `dolphinscheduler-worker` integration tests. Changes here need worker-side verification. + +## Related modules + +- `dolphinscheduler-eventbus` — underlies the task-lifecycle bus. +- `dolphinscheduler-task-api` — task DTOs/contracts. +- `dolphinscheduler-common` — utilities. +- `dolphinscheduler-worker` — the one and only consumer today. +- `dolphinscheduler-master` — receives lifecycle events (via RPC) from this module's reporter. diff --git a/dolphinscheduler-task-plugin/CLAUDE.md b/dolphinscheduler-task-plugin/CLAUDE.md new file mode 100644 index 000000000000..c957422db1b7 --- /dev/null +++ b/dolphinscheduler-task-plugin/CLAUDE.md @@ -0,0 +1,60 @@ +# CLAUDE.md — dolphinscheduler-task-plugin + +Plugin family for **task types** — shell, SQL, Spark, Flink, Python, HTTP, K8s, EMR, DataX, … — executed by the worker. The runtime `dolphinscheduler-task-executor` drives lifecycle; each plugin here supplies the "what to actually do" part. + +**This directory is a Maven parent POM.** + +## Sub-modules + +### Framework + +- **`dolphinscheduler-task-api`** — the SPI and shared types. `TaskChannelFactory`, `TaskChannel`, `AbstractTask`, `TaskExecutionContext`, parameter DTOs. **Plugin authors depend on this**. +- **`dolphinscheduler-task-all`** — uber module aggregating every plugin; depended on by `dolphinscheduler-worker` so the worker can run any task type. 
+ +### Concrete plugins (one sub-module each) + +Shell family: `task-shell`, `task-remoteshell`, `task-python`, `task-java`, `task-sql`, `task-procedure`. + +Big-data: `task-spark`, `task-flink`, `task-flink-stream`, `task-mr`, `task-hivecli`, `task-seatunnel`, `task-datax`, `task-chunjun`, `task-sqoop`, `task-linkis`. + +Cloud: `task-k8s`, `task-kubeflow`, `task-emr`, `task-emr-serverless`, `task-sagemaker`, `task-dms`, `task-datasync`, `task-datafactory`, `task-aliyunserverlessspark`. + +ML/Notebook: `task-jupyter`, `task-zeppelin`, `task-dinky`, `task-mlflow`, `task-openmldb`, `task-dvc`. + +Network: `task-http`, `task-grpc`. + +(See the directory listing for the complete live set.) + +## SPI contract + +Each plugin ships: + +1. A `TaskChannelFactory` implementation annotated with `@AutoService(TaskChannelFactory.class)`. The annotation processor generates `META-INF/services/org.apache.dolphinscheduler.plugin.task.api.TaskChannelFactory` at compile time. +2. A `TaskChannel` implementation that returns a concrete `AbstractTask` subclass given an `ITaskExecutionContext`. +3. A parameter DTO + UI-form description (via `PluginParamsTransfer`, inherited from `dolphinscheduler-spi`). + +## How plugins are discovered + +`TaskPluginManager` (in `task-api`) runs a `PrioritySPIFactory` at startup; factories register themselves in a `Map`. Startup happens inside **`dolphinscheduler-api`** (for form metadata) and **`dolphinscheduler-worker`** (for actual execution). + +## Gotchas + +- **`@AutoService`** requires the Google `auto-service` annotation processor configured in the plugin's `pom.xml`. Copy a working sibling exactly. +- **`TaskExecutionContext` carries secrets** (passwords, tokens, access keys). Never log it raw. Every plugin has its own redaction discipline — mirror the neighbors. +- **Plugin naming**: the `name()` a factory returns becomes the task type as stored in the DB. Changing it is a data-migration event. 
+- **Classpath isolation is not implemented**: every plugin shares the worker's classloader. Version conflicts between plugins (e.g. two plugins pulling different Jackson) must be resolved at the parent `pom.xml`. +- **Do not create a new plugin outside this directory.** The build scripts + `task-all` rely on the directory pattern. +- **Each plugin's `pom.xml` declares its own external deps with precise versions.** Prefer `dolphinscheduler-bom` for versions shared with the core; large client libraries (Spark, Flink, Hadoop) are plugin-local. + +## Tests + +Each plugin has `src/test/java` with mocked clients (e.g., mocked `SparkSubmit`, mocked `HttpClient`). End-to-end task runs are exercised by `dolphinscheduler-e2e` with real backends when possible. + +## Related modules + +- `dolphinscheduler-spi` — base SPI. +- `dolphinscheduler-task-executor` — the lifecycle framework plugins execute *inside*. +- `dolphinscheduler-worker` — loads `task-all` at runtime. +- `dolphinscheduler-api` — loads plugin metadata for UI forms (no execution). +- `dolphinscheduler-datasource-plugin` — SQL / big-data tasks consume datasource plugins. +- `dolphinscheduler-storage-plugin` — artifact storage for task inputs/outputs. diff --git a/dolphinscheduler-tools/CLAUDE.md b/dolphinscheduler-tools/CLAUDE.md new file mode 100644 index 000000000000..bdbf886da483 --- /dev/null +++ b/dolphinscheduler-tools/CLAUDE.md @@ -0,0 +1,53 @@ +# CLAUDE.md — dolphinscheduler-tools + +CLI tools shipped alongside the server. Each tool is a separate `@SpringBootApplication` main class, selected at runtime via a Spring profile passed on the command line. Wrapped by shell scripts in the tarball (`bin/`). 
+ +## Shipped tools + +| Tool | Main class | Profile | Script | +|------|------------|---------|--------| +| Schema init / upgrade | `UpgradeDolphinScheduler` | `upgrade` | `bin/upgrade-schema.sh` | +| Lineage data migration | `MigrateLineage` | `migrate-lineage` | `bin/migrate-lineage.sh` | +| Resource data migration | `MigrateResource` | `migrate-resource` | `bin/migrate-resource.sh` | + +## Main package + +`org.apache.dolphinscheduler.tools` + +## Key sub-packages + +- `tools.datasource` — `UpgradeDolphinScheduler`, `DolphinSchedulerManager` (the upgrade brain). +- `tools.datasource.upgrader` — version-specific upgraders, one class per version bump (e.g. `V320DolphinSchedulerUpgrader`). Implements `DolphinSchedulerUpgrader`. +- `tools.lineage` — `MigrateLineage` + supporting code. +- `tools.resource` — `MigrateResource` + supporting code. + +## How schema upgrade works + +1. `DolphinSchedulerManager` checks whether the metadata DB has existing DolphinScheduler tables. +2. If empty → **init** path: runs `sql/dolphinscheduler_.sql` from `dolphinscheduler-dao/src/main/resources/sql/`. +3. If populated → **upgrade** path: inspects the current version row, runs each `DolphinSchedulerUpgrader` in sequence up to the target version. + +Upgraders are discovered by scanning the `tools.datasource.upgrader` package. + +## Gotchas + +- **Adding a release version** means adding a new `VDolphinSchedulerUpgrader` plus any DDL under `dolphinscheduler-dao/src/main/resources/sql/upgrade//`. Skipping either half silently produces a broken upgrade. +- **Upgraders must be idempotent in principle** but in practice are not — operators run them exactly once. Design accordingly; add guards only when re-running is a realistic scenario. +- **Do not reach into `dolphinscheduler-dao` entities from upgraders**. Upgrades run against older schemas where those entities may not map; use raw SQL through the dialect-aware helpers. +- **Spring profiles are required on the command line**. 
Running the jar with no profile boots an empty Spring context and does nothing useful. +- **Shell scripts source `dolphinscheduler_env.sh`** for env vars; when debugging, check whether the expected JDBC URL is exported. + +## Configuration + +`application.yaml` — primarily datasource config so the tool can connect. + +## Tests + +`src/test/java` uses Testcontainers (MySQL + PostgreSQL) to run end-to-end schema init + upgrade against a real DB. `SchemaUtilsTest` is the main suite. + +## Related modules + +- `dolphinscheduler-dao` — source of SQL scripts + entities (read-only here). +- `dolphinscheduler-storage-all` — needed by `MigrateResource` to read/write stored artifacts. +- `dolphinscheduler-dao-plugin` — dialect selection. +- `dolphinscheduler-dist` — packages the tool jar + scripts into the tarball. diff --git a/dolphinscheduler-ui/CLAUDE.md b/dolphinscheduler-ui/CLAUDE.md new file mode 100644 index 000000000000..dc5f9d933793 --- /dev/null +++ b/dolphinscheduler-ui/CLAUDE.md @@ -0,0 +1,73 @@ +# CLAUDE.md — dolphinscheduler-ui + +Web frontend. **Vue 3 + Vite + TypeScript**, Naive UI as the component library, AntV X6 for the DAG editor, ECharts for dashboards. Built separately from the Java reactor and bundled into the dist tarball. + +## Tech stack + +- Vue 3 (Composition API) + TypeScript. +- Build: Vite (gzip compression in prod). +- State: Pinia. +- Routing: Vue Router (5 top-level route groups). +- HTTP: axios (single wrapper in `src/service/service.ts`). +- i18n: `vue-i18n` with locale files under `src/locales/{en_US,zh_CN}/`. +- UI: Naive UI (v2.33.x), AntV X6 (DAG), D3, ECharts. + +## Recommended toolchain + +- Node **16.x** (not 18+ as of this writing). +- `pnpm` **7.x**. + +(See the module's README for current pinned versions; `.nvmrc` / `packageManager` in `package.json` are authoritative.) 
+ +## Scripts + +``` +pnpm install +pnpm run dev # Vite dev server (default :5173), proxies /dolphinscheduler → backend +pnpm run build:prod # vue-tsc type check + Vite production build → dist/ +pnpm run lint # ESLint with --fix over .ts, .tsx, .vue +pnpm run prettier # Prettier format over src/ +``` + +The backend URL used by `pnpm run dev` is `VITE_APP_DEV_WEB_URL` in `.env.development`. Default expects `dolphinscheduler-api` on port 12345. + +## Top-level src layout + +- `assets/` — static images + fonts. +- `components/` — reusable UI parts (form widgets, data-display, DAG canvas pieces). +- `layouts/` — app shell / page chrome. +- `locales/` — i18n translations (`en_US`, `zh_CN`). +- `router/` — Vue Router config, one module per top-level feature. +- `service/` — axios instance + one file per backend resource (login, dag-menu, datasource, k8s, monitor, …). +- `store/` — Pinia stores: `user`, `project`, `locales`, `theme`, `timezone`, `route`, `ui-setting`, `file`. +- `views/` — page components: `home`, `projects`, `datasource`, `monitor`, `resource`, `security`, `login`, `profile`, `password`, `about`, `ui-setting`. +- `utils/` — helpers. + +## Backend integration + +- Axios base URL: `/dolphinscheduler` in dev (proxied), `VITE_APP_PROD_WEB_URL` in prod (usually same-origin behind a reverse proxy). +- Request interceptor injects `sessionId` header and a `language` cookie. +- Response interceptor unwraps `{ code, msg, data }`; `code != 0` throws; `401 / 504` redirect to `/login`. +- **There is no generated OpenAPI SDK**. Backend method signatures drift independently from these TypeScript wrappers — regressions usually show as 4xx / 5xx after a backend controller change. + +## i18n + +Two languages today: `en_US`, `zh_CN`. Language toggle stored in the `language` cookie (`js-cookie`). + +## Gotchas + +- **No unit tests** are configured (no `*.spec.ts` / `*.test.ts` found). End-to-end coverage comes from `dolphinscheduler-e2e`. 
+- **Node version drift is the #1 breakage**. Node 18/20 with modern OpenSSL breaks older webpack/vite configs — stick to the pinned Node. +- **DAG editor (AntV X6) is the single most complex view** — `views/projects/workflow/components/dag/`. Touch carefully. +- **Gzip pre-compression in Vite** is enabled for production; when debugging why a file isn't loaded, check whether the server serves `.gz` variants. +- **The built `dist/` is what `dolphinscheduler-dist` picks up**. Ensure `pnpm run build:prod` has been run before packaging from the repo. + +## Tests + +None inside this module. See `dolphinscheduler-e2e` for Selenium-driven browser tests. + +## Related modules + +- `dolphinscheduler-api` — the backend this UI calls. +- `dolphinscheduler-dist` — bundles `dist/` into the tarball under `ui/`. +- `dolphinscheduler-e2e` — tests the integrated UI+API. diff --git a/dolphinscheduler-worker/CLAUDE.md b/dolphinscheduler-worker/CLAUDE.md new file mode 100644 index 000000000000..0c98938b3486 --- /dev/null +++ b/dolphinscheduler-worker/CLAUDE.md @@ -0,0 +1,59 @@ +# CLAUDE.md — dolphinscheduler-worker + +The **Worker** server. Receives task-dispatch RPCs from the master, spins up the right task plugin, runs the physical task, and ships lifecycle events back. Multiple workers scale horizontally; each registers with the registry and advertises its worker group. + +## Entry point + +`WorkerServer` — `@SpringBootApplication`, implements `IStoppable`. + +## Main package + +`org.apache.dolphinscheduler.server.worker` + +## Key sub-packages + +- `server.worker.executor` — the bridge to `dolphinscheduler-task-executor`. `PhysicalTaskEngineDelegator`, `PhysicalTaskExecutorFactory`, `PhysicalTaskExecutorLifecycleEventReporter`. +- `server.worker.rpc` — `WorkerRpcServer` + operators implementing `IPhysicalTaskExecutorOperator`, `IStreamingTaskInstanceOperator`, `ITaskExecutorQueryClient` (all from `dolphinscheduler-extract-worker`). 
+- `server.worker.registry` — `WorkerRegistryClient`: ephemeral registration, health status reporting, worker-group membership. +- `server.worker.task` — task-context utilities shared by all task plugins. +- `server.worker.metrics` — Worker Micrometer metrics. +- `server.worker.config` — worker config + **load protection**. + +## Load protection + +`WorkerServerLoadProtection` watches CPU / memory / task-count and rejects new dispatches above thresholds (driven by properties in `application.yaml`). Rejected dispatches bounce back to the master, which picks a different worker. + +## How a task runs + +1. Master sends a `DispatchTaskRequest` → `WorkerRpcServer` receives. +2. `PhysicalTaskExecutorFactory` builds a `PhysicalTaskExecutor` (lives in `task-executor`) around the task plugin (from `task-plugin`). +3. `TaskExecutorContainer` runs it; lifecycle events land on the in-process bus. +4. `PhysicalTaskExecutorLifecycleEventReporter` publishes events back to master over RPC. + +## Gotchas + +- **Tasks run in the worker JVM** for most plugins (shell, SQL, Python, HTTP, …). Plugins that submit remotely (Spark, Flink, K8s, EMR) still need the worker to stay alive until the remote job completes — a worker restart mid-flight triggers the master's failover path. +- **AWS S3 integration is direct** (via `dolphinscheduler-storage-s3`) for distributed-task artifact handling. Credentials follow `dolphinscheduler-aws-authentication` rules. +- **`dolphinscheduler-yarn-aop`** is on the worker's classpath: it weaves `YarnClientImpl.submitApplication` to capture `ApplicationId`s into `appInfo.log`. If YARN-based task plugins (MR, Spark-on-YARN) lose their application ID, check AspectJ weaving is active. +- **Worker group names are not free-form** in practice: master-side scheduling decisions key off them. Coordinate naming with master operators. +- **Load protection thresholds live in `application.yaml`**, **not** in `dolphinscheduler-meter`. 
Meter defines the interface; worker owns the numbers. + +## Configuration + +`src/main/resources`: + +- `application.yaml` — worker resource limits, executor thread pools, task-type enable list, load-protection thresholds. +- `logback-spring.xml`, `banner.txt`. + +## Tests + +`src/test/java` — unit tests for registry, RPC, load protection, connection-state handling. End-to-end workflows are exercised by `dolphinscheduler-e2e` and the master's integration tests. + +## Related modules + +- `dolphinscheduler-task-executor` — the lifecycle framework. +- `dolphinscheduler-task-plugin` — the actual task implementations (runtime classpath via `dolphinscheduler-task-all`). +- `dolphinscheduler-extract-worker` — RPCs this module implements. +- `dolphinscheduler-extract-master` / `-extract-alert` — RPCs this module calls. +- `dolphinscheduler-yarn-aop` — AspectJ weaving on the classpath. +- `dolphinscheduler-registry-all`, `dolphinscheduler-storage-api`, `dolphinscheduler-datasource-api`. diff --git a/dolphinscheduler-yarn-aop/CLAUDE.md b/dolphinscheduler-yarn-aop/CLAUDE.md new file mode 100644 index 000000000000..7564d3969135 --- /dev/null +++ b/dolphinscheduler-yarn-aop/CLAUDE.md @@ -0,0 +1,36 @@ +# CLAUDE.md — dolphinscheduler-yarn-aop + +Tiny AspectJ module that weaves into Hadoop YARN's `YarnClientImpl.submitApplication` to capture the `ApplicationId` of every submitted YARN application. The captured IDs are written to `appInfo.log` in the working directory so that the worker (or operators) can track / kill running YARN jobs tied to a task instance. + +## Main package + +`org.apache.dolphinscheduler.aop` + +## What it actually does + +A single aspect `YarnClientAspect`: + +- `@AfterReturning` on `YarnClientImpl.submitApplication(...)` — records the returned `ApplicationId` to `appInfo.log`. 
+- `@AfterReturning` on `YarnClientImpl.getApplicationReport(...)` with a `cflow` predicate so the pointcut only fires when called *within* a `submitApplication` context (avoids noise from general status polling). + +## Weaving + +- **Compile-time weaving** via `aspectj-maven-plugin`. +- The produced jar is a normal jar; YARN-based task plugins (`task-mr`, `task-spark`, `task-sqoop`, …) put it on the worker classpath. +- For runtime weaving into third-party code (e.g. the YARN client loaded by Spark's classloader), load-time weaving with `-javaagent:aspectjweaver.jar` is also supported — operators may need to enable it depending on the task plugin. + +## Gotchas + +- **`appInfo.log` is written to the current working directory of the worker**. Operators who run multiple workers on one host with the same `cwd` will collide — each worker should have its own `cwd`. +- **Do not add more pointcuts here**. The surface is minimal on purpose: adding broad AspectJ pointcuts to Hadoop code is fragile across YARN versions. +- **AspectJ version is pinned in `dolphinscheduler-bom`** (`aspectj.version`, currently 1.9.7). Upgrading AspectJ is sensitive — test every YARN-based task plugin. +- **Tests use AspectJ syntax to mock YARN**. `YarnClientMoc`, `YarnClientAspectMoc`, `YarnClientMocTest` let the aspect be exercised without a real YARN cluster. + +## Tests + +`src/test/java` — AspectJ-woven mock classes verify the aspect fires correctly. + +## Related modules + +- `dolphinscheduler-worker` — runtime consumer (this jar is on the worker's classpath). +- `dolphinscheduler-task-plugin` → `task-mr`, `task-spark`, `task-sqoop`, `task-hivecli` — the YARN-submitting task plugins that benefit from this.