Skip to content

Enhancement request: Feature Store _lookup_feature_view_metadata fails to parse views created outside Python API #214

@StianSMN

Description

@StianSMN

Feature Store: _lookup_feature_view_metadata fails to parse views created outside Python API

Problem

FeatureStore.list_feature_views() and get_feature_view() fail with "Failed to parse query text for FeatureView" when the underlying view was created outside the Python API (e.g., via dbt or raw SQL), even when all required tags and metadata are correctly set.

Root Causes

1. SHOW VIEWS text column is stale after ALTER VIEW ... SET TAG

The library reads view definitions from SHOW VIEWStext column (feature_store.py, _lookup_feature_view_metadata). This column stores the original CREATE statement as-executed and is never updated by subsequent ALTER VIEW ... SET TAG commands. If a user creates a view and then adds Feature Store tags via ALTER VIEW, the text column still contains the original DDL without tags, causing the regex to fail.

In contrast, GET_DDL('VIEW', ...) returns the reconstructed current DDL including tags added via ALTER.

2. Regex _DT_OR_VIEW_QUERY_PATTERN is too strict

The regex (line ~119 in feature_store.py) has three limitations:

  • Does not handle WITH TAG: Snowflake's CREATE VIEW syntax allows both TAG (...) and WITH TAG (...) — the WITH keyword is optional per Snowflake docs. However, the regex only matches TAG, not WITH TAG. When views are created with WITH TAG (which is what GET_DDL() and the Snowflake UI "Get DDL" produce), parsing fails.

  • Only handles COMMENT ... TAG ordering: The regex expects COMMENT = '...' TAG (...), but GET_DDL() output and views created with certain SQL formatting produce TAG (...) COMMENT='...' (reversed order).

  • Requires spaces around = in COMMENT: The regex expects COMMENT = '...' (with spaces), but GET_DDL() produces COMMENT='...' (no spaces).

Current regex (~line 119):

_DT_OR_VIEW_QUERY_PATTERN = re.compile(
    r"""CREATE\ (OR\ REPLACE\ )?(?P<obj_type>(DYNAMIC\ ICEBERG\ TABLE|DYNAMIC\ TABLE|VIEW))\ .*
        COMMENT\ =\ '(?P<comment>.*)'\s*
        TAG.*?{fv_metadata_tag}\ =\ '(?P<fv_metadata>.*?)',?.*?
        AS\ (?P<query>.*)
    """.format(fv_metadata_tag=_FEATURE_VIEW_METADATA_TAG),
    flags=re.DOTALL | re.IGNORECASE | re.X,
)

Reproduction:

-- Method 1: CREATE with WITH TAG (valid Snowflake syntax, fails in Python)
CREATE OR REPLACE VIEW mydb.myschema.my_fv$1 (col1, col2)
WITH TAG (mydb.myschema.SNOWML_FEATURE_VIEW_METADATA = '{"entities": ["MY_ENTITY"], "timestamp_col": "TS", "is_tiled": false}')
COMMENT = 'my feature view'
AS SELECT * FROM my_table;

-- Method 2: CREATE then ALTER (tags not in SHOW VIEWS text)
CREATE OR REPLACE VIEW mydb.myschema.my_fv$2 (col1, col2)
COMMENT = 'my feature view'
AS SELECT * FROM my_table;
ALTER VIEW mydb.myschema."my_fv$2" SET TAG mydb.myschema.SNOWML_FEATURE_VIEW_METADATA = '{"entities": ["MY_ENTITY"], "timestamp_col": "TS", "is_tiled": false}';
fs.list_feature_views()  # RuntimeError: Failed to parse query text

Suggested Fix

  1. Fall back to GET_DDL() when SHOW VIEWS text doesn't match the regex. This handles the ALTER VIEW ... SET TAG case where the original DDL text is stale.

  2. Add additional regex patterns to handle:

  • WITH TAG (optional WITH keyword): (?:WITH\ )?TAG
  • Reversed clause order: TAG (...) COMMENT='...'
  • Flexible spacing: COMMENT\s*=\s*'

Proof of Concept

The following monkey-patch resolves all cases and has been tested successfully in a Snowflake Notebook:

import re
from snowflake.ml.feature_store import feature_store as fs_module

_STANDARD_PATTERN = fs_module._DT_OR_VIEW_QUERY_PATTERN
_FV_META_TAG = fs_module._FEATURE_VIEW_METADATA_TAG

# Pattern for GET_DDL format: TAG before COMMENT, WITH TAG, no spaces around =
_GET_DDL_PATTERN = re.compile(
    r"""CREATE\ (OR\ REPLACE\ )?(?P<obj_type>(DYNAMIC\ ICEBERG\ TABLE|DYNAMIC\ TABLE|VIEW))\ .*
        (?:WITH\ )?TAG.*?{tag}\s*=\s*'(?P<fv_metadata>.*?)'.*?\)
        \s*COMMENT\s*=\s*'(?P<comment>.*?)'\s*
        [Aa][Ss]\ (?P<query>.*)
    """.format(tag=_FV_META_TAG),
    flags=re.DOTALL | re.IGNORECASE | re.X,
)

# Pattern for COMMENT before TAG, but with optional WITH and flexible spacing
_RELAXED_COMMENT_FIRST = re.compile(
    r"""CREATE\ (OR\ REPLACE\ )?(?P<obj_type>(DYNAMIC\ ICEBERG\ TABLE|DYNAMIC\ TABLE|VIEW))\ .*
        COMMENT\s*=\s*'(?P<comment>.*?)'\s*
        (?:WITH\ )?TAG.*?{tag}\s*=\s*'(?P<fv_metadata>.*?)',?.*?
        AS\ (?P<query>.*)
    """.format(tag=_FV_META_TAG),
    flags=re.DOTALL | re.IGNORECASE | re.X,
)

def _patched_lookup(self, row, fv_name):
    if len(row["text"]) == 0:
        # Shared feature view - use tag lookup (unchanged from original)
        res = self._lookup_tags(
            domain="table", obj_name=fv_name,
            filter_fns=[lambda d: d["tagName"] == fs_module._FEATURE_VIEW_METADATA_TAG]
        )
        fv_metadata = fs_module._FeatureViewMetadata.from_json(res[0]["tagValue"])
        query = f"SELECT * FROM {self._get_fully_qualified_name(fv_name)}"
        return (fv_metadata, query)

    # Try all patterns on SHOW VIEWS text
    for pattern in [_STANDARD_PATTERN, _RELAXED_COMMENT_FIRST]:
        m = pattern.match(row["text"])
        if m:
            return (fs_module._FeatureViewMetadata.from_json(m.group("fv_metadata")), m.group("query"))

    # Fallback: GET_DDL returns current DDL including tags added via ALTER
    db = row["database_name"]
    schema = row["schema_name"]
    name = row["name"]
    fqn = f'{db}.{schema}."{name}"'
    ddl_text = self._session.sql(f"SELECT GET_DDL('VIEW', '{fqn}')").collect()[0][0]

    for pattern in [_STANDARD_PATTERN, _RELAXED_COMMENT_FIRST, _GET_DDL_PATTERN]:
        m = pattern.match(ddl_text)
        if m:
            return (fs_module._FeatureViewMetadata.from_json(m.group("fv_metadata")), m.group("query"))

    raise RuntimeError(f"Failed to parse query text for FeatureView {fv_name}")

fs_module.FeatureStore._lookup_feature_view_metadata = _patched_lookup

Impact

This limitation prevents users from managing Feature Store views via dbt, Terraform, or any SQL-based workflow. Users are forced to use the Python API exclusively to create feature views, which limits integration with existing data engineering pipelines.

Environment

  • snowflake-ml-python: 1.25.0
  • Snowflake Notebooks

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions