ATLAS-5312: Add delete propagation support via propagateDelete flag o…#670

Draft
bhor-sanket wants to merge 2 commits into
apache:masterfrom
bhor-sanket:ATLAS-5312

bhor-sanket commented Jun 12, 2026

Contributor

…n relationship endDefs

What changes were proposed in this pull request?

Background:

Entity relationship model relevant to this change:

hive_db ──(COMPOSITION)──► hive_table ──(COMPOSITION)──► hive_column
                                │
                                │ trino_table_hive_table (ASSOCIATION, cross-system alias)
                                ▼
trino_schema ──(AGGREGATION)──► trino_table ──(COMPOSITION)──► trino_column
       via trino_table_schema

Event flow — Trino CLI:

  • When a Trino user executes DROP SCHEMA sales CASCADE, the Trino hook sends a single ENTITY_DELETE_V2 notification to Atlas for the trino_schema entity (e.g., qualifiedName=hive.sales@inst1). Trino has already dropped the schema and all tables/columns under it in its catalog — the hook only reports the root entity deletion.

Event flow — Hive client:

  • When a Hive user executes DROP DATABASE sales CASCADE, Hive internally drops the database and all tables/columns under it. The Hive Atlas hook then sends multiple ENTITY_DELETE_V2 notifications, one for each entity deleted: hive_db and every hive_table under it. Atlas receives and processes each of these delete events individually.
  • However, for each deleted hive_table, any trino_table entities linked to it via the trino_table_hive_table ASSOCIATION relationship remain untouched — because the Hive hook has no knowledge of Trino entities and sends no delete event for them. These trino_table entities (and their trino_column children) become stale metadata representing assets that no longer exist in either Hive or Trino.

Problem Statement:

Problem 1 — Orphaned entities from AGGREGATION relationships (Trino schema drop):
When Trino executes DROP SCHEMA sales CASCADE, Atlas receives ENTITY_DELETE_V2 for trino_schema. Atlas deletes the schema entity, but because trino_table_schema is an AGGREGATION relationship (not COMPOSITION), the trino_table entities under that schema are NOT deleted. These tables, along with their trino_column children, remain as orphaned stale metadata visible in the Atlas UI, so the metadata no longer reflects reality in the source system.

Problem 2 — Stale cross-system aliases (Hive database/table drop):

When Hive executes DROP DATABASE sales CASCADE, Atlas correctly cascades deletes through COMPOSITION: hive_db → hive_table → hive_column. However, the trino_table entities that are aliases of those hive_table entities (linked via the trino_table_hive_table ASSOCIATION relationship) remain intact. These Trino entities represent the same logical asset as the deleted Hive table and cannot meaningfully exist without their Hive source, yet Atlas leaves them as stale metadata because no delete-propagation mechanism exists for ASSOCIATION relationships.

Combined scenario — full orphan chain:
Deleting hive_db → hive_table is deleted (COMPOSITION) → linked trino_table is NOT deleted (ASSOCIATION) → trino_column under that trino_table is also NOT deleted. Result: stale Trino metadata at multiple levels, all pointing to non-existent Hive entities.
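
The orphan chain can be reproduced on a toy, type-level model of the diagram above. The Python sketch below is illustrative only: the edge list, the `cascade` and `stale_after` helpers, and the direction assigned to each edge are assumptions made for this example, not Atlas code.

```python
# Toy, type-level model of the relationship diagram in this description.
# Each edge is (category, from_type, to_type); the direction is the one a
# delete would need to travel. This is a simplification for illustration.
EDGES = [
    ("COMPOSITION", "hive_db", "hive_table"),
    ("COMPOSITION", "hive_table", "hive_column"),
    ("ASSOCIATION", "hive_table", "trino_table"),    # trino_table_hive_table
    ("AGGREGATION", "trino_schema", "trino_table"),  # trino_table_schema
    ("COMPOSITION", "trino_table", "trino_column"),
]

ALL_CATEGORIES = {"COMPOSITION", "AGGREGATION", "ASSOCIATION"}


def cascade(root, categories):
    """Return the types reached from `root` using only edges in `categories`."""
    reached, stack = set(), [root]
    while stack:
        current = stack.pop()
        if current in reached:
            continue
        reached.add(current)
        for category, src, dst in EDGES:
            if src == current and category in categories:
                stack.append(dst)
    return reached


def stale_after(root):
    """Types left behind when only COMPOSITION edges cascade a delete."""
    return cascade(root, ALL_CATEGORIES) - cascade(root, {"COMPOSITION"})


if __name__ == "__main__":
    print(sorted(stale_after("hive_db")))
    print(sorted(stale_after("trino_schema")))
```

In this model both delete roots leave the same two types behind, which matches Problem 1 and Problem 2.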

How the patch resolves it:

  • Introduces a propagateDelete boolean flag on AtlasRelationshipEndDef (mirroring the existing propagateRename pattern). When an entity is deleted and its type's relationship endDef has propagateDelete=true, the delete cascades to related entities through the configured relationship edges. The flag is resolved at typedef-time into pre-computed deletePropagationTargets on AtlasEntityType, making runtime lookup efficient.
  • The typedef configuration IS the cascade signal — no hook-side changes required. Hooks continue sending standard ENTITY_DELETE_V2 events. The model definition declares which relationships should propagate deletes (e.g., trino_table_schema.endDef2.propagateDelete=true means "when a trino_schema is deleted, propagate to trino_table").
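
A minimal sketch of the typedef-time resolution step, written in Python for brevity rather than in Atlas's Java. The class names, the `resolve_delete_targets` helper, and the assignment of types to endDef1/endDef2 are assumptions for illustration; only the relationship names and the propagateDelete flag come from this description. The sketch assumes the flag sits on the end whose type is being deleted and points at the opposite end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndDef:
    type_name: str
    propagate_delete: bool = False  # stands in for propagateDelete


@dataclass(frozen=True)
class RelationshipDef:
    name: str
    end_def1: EndDef
    end_def2: EndDef


def resolve_delete_targets(rel_defs):
    """Pre-compute, per entity type, where a delete should propagate.

    Returns {deleted_type: [(relationship_name, target_type), ...]}.
    """
    targets = {}
    for rel in rel_defs:
        pairs = ((rel.end_def1, rel.end_def2), (rel.end_def2, rel.end_def1))
        for end, other in pairs:
            if end.propagate_delete:
                targets.setdefault(end.type_name, []).append(
                    (rel.name, other.type_name)
                )
    return targets


# Hypothetical end assignments: which type sits on endDef1 vs endDef2 is assumed.
REL_DEFS = [
    RelationshipDef(
        "trino_table_schema",
        EndDef("trino_table"),
        EndDef("trino_schema", propagate_delete=True),
    ),
    RelationshipDef(
        "trino_table_hive_table",
        EndDef("trino_table"),
        EndDef("hive_table", propagate_delete=True),
    ),
]

TARGETS = resolve_delete_targets(REL_DEFS)
```

Because the map is built once when typedefs are loaded, handling a delete only needs a dictionary lookup by the deleted entity's type instead of scanning every relationship definition.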

How was this patch tested?

Setup:

  • Configured Trino hook with Atlas running on Docker
  • Applied the patch changes on the Atlas setup

Use-case validation:

Delete Propagation Validation

  1. Drop Trino schema with CASCADE (DROP SCHEMA sales CASCADE from Trino CLI)
    - Verified that trino_schema entity is deleted
    - Verified that all trino_table entities under that schema are deleted via propagation
    - Verified that all trino_column entities under those tables are deleted (recursive cascade)
  2. Drop Hive table from Hive client (DROP TABLE default.orders)
    - Verified that hive_table entity is deleted
    - Verified that linked trino_table (alias via trino_table_hive_table) is deleted via cross-system propagation
    - Verified that trino_column entities under that trino_table are also deleted
  3. Drop Hive database with CASCADE from Hive client (DROP DATABASE sales CASCADE)
    - Verified that hive_db and all hive_table entities are deleted (reported by Hive hook)
    - Verified that linked trino_schema (alias via trino_schema_hive_db) is deleted via propagation
    - Verified that trino_table entities under that trino_schema are deleted via propagation
    - Verified that trino_column entities under those tables are also deleted
  4. Idempotent delete — entity already in DELETED state
    - Verified no error when propagation encounters an already-deleted target vertex
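
The idempotency case in item 4 can be sketched as a traversal that skips targets already in the DELETED state. This is a simplified Python model under stated assumptions: the dictionaries, the guid values, and `delete_with_propagation` are hypothetical, and COMPOSITION cascades (for example, table to column) are left out.

```python
ACTIVE, DELETED = "ACTIVE", "DELETED"


def delete_with_propagation(guid, status, type_of, neighbors, targets):
    """Soft-delete `guid` and cascade along pre-computed propagation targets.

    status:    {guid: ACTIVE | DELETED}, mutated in place
    type_of:   {guid: type name}
    neighbors: {(guid, relationship name): [related guids]}
    targets:   {type name: [(relationship name, target type), ...]}
    Returns the guids newly deleted by this call, in visit order.
    """
    newly_deleted, stack = [], [guid]
    while stack:
        current = stack.pop()
        if status.get(current) == DELETED:
            continue  # already deleted: skip quietly instead of raising
        status[current] = DELETED
        newly_deleted.append(current)
        for rel_name, _target_type in targets.get(type_of[current], []):
            stack.extend(neighbors.get((current, rel_name), []))
    return newly_deleted


# Hypothetical data: one schema with two tables, one of them already deleted.
status = {"s1": ACTIVE, "t1": ACTIVE, "t2": DELETED}
type_of = {"s1": "trino_schema", "t1": "trino_table", "t2": "trino_table"}
neighbors = {("s1", "trino_table_schema"): ["t1", "t2"]}
targets = {"trino_schema": [("trino_table_schema", "trino_table")]}

first = delete_with_propagation("s1", status, type_of, neighbors, targets)
second = delete_with_propagation("s1", status, type_of, neighbors, targets)
```

Checking the state before acting also stops the traversal from looping if two types propagate deletes to each other.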

Regression Testing

  • Create Trino schema and tables
    • Verified creation of Trino schema, table, and Hive entities along with their relationships in Atlas
  • Drop Trino table (without schema cascade)
    • Verified deletion of only the trino_table and its columns; schema remains intact
  • Rename propagation still works
    • Verified that propagateRename functionality is unaffected after the handler rename to …ationPropagationPatchHandler
