Fixes #24559: Add Greenplum 7 partition support#27288
Fixes #24559: Add Greenplum 7 partition support#27288RajdeepKushwaha5 wants to merge 8 commits intoopen-metadata:mainfrom
Conversation
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
There was a problem hiding this comment.
Pull request overview
Adds Greenplum 7 compatibility to the ingestion connector by detecting the Greenplum major version at runtime and switching catalog queries to PostgreSQL 10+ declarative partitioning equivalents, preventing failures caused by removed GP6 catalog tables.
Changes:
- Implement Greenplum major version detection via
SELECT version()with per-source caching and GP6/GP7 query branching. - Introduce Greenplum 7+ table listing and partition metadata queries (while preserving the existing GP6 query path).
- Add unit tests covering version detection, caching/fallback behavior, and GP6/GP7 partition detail parsing.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| ingestion/src/metadata/ingestion/source/database/greenplum/metadata.py | Adds version detection + conditional logic to choose GP6 vs GP7+ query paths for table listing and partition details. |
| ingestion/src/metadata/ingestion/source/database/greenplum/queries.py | Splits queries into GP6/GP7 variants; adds SELECT version() query and GP7+ partition/table-name queries. |
| ingestion/tests/unit/topology/database/test_greenplum.py | Adds unit tests for version parsing/caching/fallback and partition details for GP6/GP7 paths. |
160fdce to
df11c66
Compare
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
|
@RajdeepKushwaha5 address the review comments raised here #27288 (comment) |
df11c66 to
57a7949
Compare
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
57a7949 to
f901064
Compare
244ba77 to
e831870
Compare
|
The Python checkstyle failed. Please run You can install the pre-commit hooks with |
e831870 to
3c4bbf3
Compare
…tion keys When _build_partition_result receives rows from the partition catalog but all column_name values are NULL (expression-based partition keys), the table IS partitioned - we just cannot surface the column details yet. Previously returned (False, None) which caused expression-key partitioned tables to be ingested as TableType.Regular. Now returns (True, None) so they are correctly ingested as TableType.Partitioned. Updated test name and assertion to reflect the corrected behavior. Addresses Copilot review comment.
… queries Qualify pg_class, pg_namespace, and pg_partitioned_table with pg_catalog. in both GREENPLUM_PARTITION_DETAILS_V6 and GREENPLUM_PARTITION_DETAILS_V7 to avoid search_path dependency and prevent potential name shadowing by user-defined objects with the same names. Addresses Copilot review comment.
…7-partition-support # Conflicts: # ingestion/src/metadata/ingestion/source/database/greenplum/metadata.py
| " SELECT version() output, defaulting to 7" | ||
| ) | ||
| logger.debug("Full version() output: %s", version_string) | ||
| self._greenplum_version = 7 | ||
| except SQLAlchemyError as exc: | ||
| logger.debug(traceback.format_exc()) | ||
| logger.warning( | ||
| "Could not determine Greenplum version (%s), defaulting to 7", | ||
| exc, | ||
| ) | ||
| self._greenplum_version = 7 |
There was a problem hiding this comment.
The version-detection fallback sets _greenplum_version = 7 when SELECT version() is unparseable or raises SQLAlchemyError. That can cause a regression for Greenplum 6 clusters in the (rare but possible) case where version detection fails while the existing self.connection remains usable, because the GP7 table/partition queries reference catalogs/columns that don’t exist on GP6. Consider a safer fallback strategy (e.g., try the GP7 query and on UndefinedTable/UndefinedColumn fall back to GP6, or default to GP6 unless GP7 is positively detected).
| " SELECT version() output, defaulting to 7" | |
| ) | |
| logger.debug("Full version() output: %s", version_string) | |
| self._greenplum_version = 7 | |
| except SQLAlchemyError as exc: | |
| logger.debug(traceback.format_exc()) | |
| logger.warning( | |
| "Could not determine Greenplum version (%s), defaulting to 7", | |
| exc, | |
| ) | |
| self._greenplum_version = 7 | |
| " SELECT version() output, defaulting conservatively to 6" | |
| ) | |
| logger.debug("Full version() output: %s", version_string) | |
| self._greenplum_version = 6 | |
| except SQLAlchemyError as exc: | |
| logger.debug(traceback.format_exc()) | |
| logger.warning( | |
| "Could not determine Greenplum version (%s), defaulting conservatively to 6", | |
| exc, | |
| ) | |
| self._greenplum_version = 6 |
| table_names_query = ( | ||
| GREENPLUM_GET_TABLE_NAMES_V7 | ||
| if self._is_v7_or_later() | ||
| else GREENPLUM_GET_TABLE_NAMES_V6 | ||
| ) | ||
| result = self.connection.execute( | ||
| sql.text(GREENPLUM_GET_TABLE_NAMES), | ||
| sql.text(table_names_query), | ||
| {"schema": schema_name}, | ||
| ) |
There was a problem hiding this comment.
query_table_names_and_types now switches between GP6 and GP7 table-listing queries based on _is_v7_or_later(), but there’s no unit test asserting that the correct SQL is chosen/executed for each major version. Adding a small test that sets _greenplum_version to 6/7 and verifies self.connection.execute receives the expected query string would help prevent regressions in the table discovery path (the main failure reported in #24559).
|
The Python checkstyle failed. Please run You can install the pre-commit hooks with |
|
…-listing query branch tests Address reviewer feedback on PR open-metadata#27288: - Default to Greenplum 6 (conservative) when version detection fails or output is unparseable, since GP7 queries reference catalogs/columns that do not exist on GP6 (was defaulting to 7, which could regress GP6 clusters). - Add unit tests verifying query_table_names_and_types selects GREENPLUM_GET_TABLE_NAMES_V6 vs GREENPLUM_GET_TABLE_NAMES_V7 based on detected major version, covering the table discovery path that originally failed in open-metadata#24559. - Apply ruff format to satisfy py-checkstyle CI.
Code Review ✅ Approved 4 resolved / 4 findingsGreenplum 7 partition support added, with resolutions for SQLAlchemy 2.0 deprecations, potential NoneType errors in regex, and identifier quoting issues. No open issues remain. ✅ 4 resolved✅ Bug:
|
| Compact |
|
Was this helpful? React with 👍 / 👎 | Gitar
| }, | ||
| } | ||
|
|
||
| PartitionRow = namedtuple("PartitionRow", ["schema", "table_name", "partition_strategy", "column_name"]) |
There was a problem hiding this comment.
This namedtuple(...) assignment is long enough to likely exceed the project's formatter/linter line-length limit. Please wrap it across multiple lines (or switch to typing.NamedTuple with one field per line) so black/CI formatting checks don't fail.
| PartitionRow = namedtuple("PartitionRow", ["schema", "table_name", "partition_strategy", "column_name"]) | |
| PartitionRow = namedtuple( | |
| "PartitionRow", | |
| ["schema", "table_name", "partition_strategy", "column_name"], | |
| ) |
| """Expression-based partition keys yield NULL column_name rows; the | ||
| table should still be reported as partitioned (is_partitioned=True) | ||
| because a row exists in the partitioning catalog, even though we | ||
| cannot resolve the column names yet. partition details are None.""" |
There was a problem hiding this comment.
Docstring has minor grammar/formatting issues (extra double-space and sentence casing: "yet. partition details..."). Please clean this up for readability.
| cannot resolve the column names yet. partition details are None.""" | |
| cannot resolve the column names yet. Partition details are None.""" |



PR Description
Describe your changes:
Fixes #24559
Problem:
Greenplum 7 removed the GP-specific catalog tables
pg_partitionandpg_partition_rulethat the connector relied on for table listing and partition detection. This causedUndefinedTableerrors when ingesting metadata from GP7 instances.Root Cause:
GP7 adopted PostgreSQL 10+ declarative partitioning, replacing:
pg_partition/pg_partition_rule→pg_partitioned_table+relispartitioncolumn inpg_classparkind/paratts→partstrat/partattrsSolution:
Added runtime version detection and dual-path query support:
SELECT version()— parsesGreenplum Database X.Y.Zfrom the output using regex (note:SHOW server_versionreturns the underlying PostgreSQL version, not Greenplum's, so it cannot be used)pg_partition_ruleJOIN for table names,pg_partitionwithparkind/parattsfor partition detailsrelispartition = falsefor table names,pg_partitioned_tablewithpartstrat/partattrsfor partition detailsTesting:
andruche/greenplum:7):SELECT version()correctly returnsGreenplum Database 7.0.0pg_partitionandpg_partition_ruledo NOT exist in GP7pg_partitioned_tableEXISTS in GP7Files changed:
ingestion/src/metadata/ingestion/source/database/greenplum/queries.py— added V7 queries andSELECT version()queryingestion/src/metadata/ingestion/source/database/greenplum/metadata.py— added version detection, query branching, V6/V7 partition detail methodsingestion/tests/unit/topology/database/test_greenplum.py— added 8 new tests (version detection + partition details)Type of change:
Checklist:
Fixes <issue-number>: <short explanation>Summary by Gitar
_build_partition_resulthelper to handle expression-based partition keys, ensuring tables are correctly marked as partitioned.SQLAlchemyErrorhandling during version detection to ensure graceful fallback to GP6.pg_catalogschema-qualification to partition queries to ensure consistent metadata resolution.This will update automatically on new commits.