feat: Add complex type support (Map, JSON, Struct) with schema validation #5974

Open
ntkathole wants to merge 3 commits into feast-dev:master from ntkathole:fix_complex_type

@ntkathole ntkathole commented Feb 16, 2026

What this PR does / why we need it:

This PR adds comprehensive support for complex data types in Feast and implements schema validation across all compute engines:

Commit 1 (970650b): Fix Map/Dict support and implement schema validation

  • Fixed Map/Dict type support across offline stores, online stores, and type mappings
  • Added enable_validation parameter to FeatureView, BatchFeatureView, and StreamFeatureView
  • Implemented schema validation nodes in Local, Ray, and Spark compute engines (missing columns, type mismatch warnings)
  • Added Spark and Milvus map type mappings
  • Updated documentation for Map type backend mappings
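The validation behavior described in Commit 1 (missing columns, type mismatch warnings) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `validate_schema` is a hypothetical helper, column types are plain strings rather than PyArrow or Spark types, and treating a missing column as an error (rather than a warning) is an assumption.

```python
import warnings


def validate_schema(expected, actual):
    """Check an actual column->type mapping against the expected one.

    Hypothetical helper for illustration only. Both arguments map a
    column name to a type name given as a plain string.
    Raises ValueError if expected columns are missing (an assumption),
    and emits a warning for each column whose type differs.
    Returns a dict of mismatched columns: name -> (expected, actual).
    """
    missing = sorted(set(expected) - set(actual))
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")

    mismatched = {
        name: (expected[name], actual[name])
        for name in expected
        if expected[name] != actual[name]
    }
    for name, (want, got) in mismatched.items():
        warnings.warn(f"Column '{name}': expected type {want}, got {got}")
    return mismatched
```

Extra columns in `actual` are ignored in this sketch; only the columns declared in the expected schema are checked.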

Commit 2 (ac28d4d): Add JSON and Struct complex data types

  • Added JSON, JSON_LIST, STRUCT, STRUCT_LIST to the protobuf ValueType enum and Value message
  • Introduced Json primitive type and Struct class (schema-aware structured type with named, typed fields)
  • Implemented full type mapping pipeline: Feast ↔ Proto ↔ PyArrow ↔ backend-native types (PostgreSQL, BigQuery, Snowflake, Redshift, Spark, MSSQL, Athena, DynamoDB, Milvus)
  • Added JSON well-formedness validation at both proto conversion level (always active) and validation node level (when enable_validation=True)
  • Implemented Spark-native type validation using from_feast_to_spark_type() and _spark_types_compatible(), which eliminates the PyArrow-to-Spark type mismatch gap
  • Persisted Struct field schemas through the registry via Field tags (feast:struct_schema)
  • Added Go SDK type conversion support for JSON and Struct types
  • Added automatic serialization/deserialization for JSON type (Python dict/list ↔ JSON string)
  • Consolidated validation into a single _validate_schema() method per engine (missing columns + type checks + JSON content validation)
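The JSON handling listed in Commit 2 (well-formedness validation plus automatic conversion between Python dict/list and JSON strings) might look something like the sketch below. The function names `serialize_json_value` and `deserialize_json_value` are hypothetical and are not taken from the PR. Only the standard-library `json` module is used.

```python
import json


def serialize_json_value(value):
    """Convert a Python value to a JSON string, validating strings.

    Hypothetical helper for illustration only.
    - dict / list inputs are serialized with json.dumps.
    - str inputs are parsed once to confirm they are well-formed JSON
      and are then returned unchanged.
    """
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    if isinstance(value, str):
        try:
            json.loads(value)
        except json.JSONDecodeError as exc:
            raise ValueError(f"Invalid JSON value: {exc}") from exc
        return value
    raise TypeError(f"Unsupported type for JSON value: {type(value).__name__}")


def deserialize_json_value(raw):
    """Parse a stored JSON string back into a Python object."""
    return json.loads(raw)
```

For example, `deserialize_json_value(serialize_json_value({"city": "Pune"}))` round-trips the dict, while a malformed string such as `'{"city": '` raises `ValueError` before it could be stored.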

Commit 3 (a131b73): Modified default template with different types

  • Added Json, Map and Struct in driver template

@ntkathole ntkathole self-assigned this Feb 16, 2026
@ntkathole ntkathole changed the title from "feat: Fix Map/Dict support and implement schema validation" to "feat: Add complex type support (Map, JSON, Struct) with schema validation" Feb 16, 2026
@ntkathole ntkathole marked this pull request as ready for review February 16, 2026 15:49
@ntkathole ntkathole requested a review from a team as a code owner February 16, 2026 15:49
@ntkathole ntkathole force-pushed the fix_complex_type branch 2 times, most recently from a19f25a to 85287f3 on February 16, 2026 17:11
ntkathole and others added 3 commits February 18, 2026 22:54
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 20 additional findings in Devin Review.

Comment on lines +1258 to +1259
    if pa_type_as_str.startswith("map<"):
        return "super"
🔴 pa_to_redshift_value_type crashes with KeyError for JSON (large_string) and Struct PyArrow types

The pa_to_redshift_value_type function handles map< types (added in this PR at lines 1258-1259) but does not handle large_string (JSON) or struct< (Struct) PyArrow types. When these types are encountered, they fall through to the type_map dict lookup at line 1281, which raises a KeyError.

Root Cause and Impact

The PR adds JSON and Struct support across backends, including mapping Redshift's json and super types to Feast types in redshift_to_feast_value_type (sdk/python/feast/type_map.py:1187-1188). However, the reverse direction (pa_to_redshift_value_type) was not updated to handle the corresponding PyArrow types:

  • pyarrow.large_string() (str representation: "large_string") is the PyArrow type for JSON
  • pyarrow.struct(...) (str representation: "struct<...>") is the PyArrow type for Struct

Neither has a matching branch, so both hit type_map[pa_type_as_str] at line 1281, causing a KeyError crash. This would occur during materialization or table creation on Redshift when feature views contain JSON or Struct columns.

For comparison, the MSSQL equivalent (pa_to_mssql_type at sdk/python/feast/type_map.py:1139-1144) correctly handles all three new complex types.

Suggested change
-    if pa_type_as_str.startswith("map<"):
-        return "super"
+    if pa_type_as_str.startswith("map<"):
+        return "super"
+    if pa_type_as_str == "large_string":
+        return "super"
+    if pa_type_as_str.startswith("struct<") or pa_type_as_str.startswith("struct{"):
+        return "super"
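The fall-through described in this comment can be reproduced in isolation with a simplified stand-in. This is a sketch only: `to_redshift_type`, its `handle_json_and_struct` flag, and the contents of `ILLUSTRATIVE_TYPE_MAP` are hypothetical and are not copied from `sdk/python/feast/type_map.py`. Only the branch structure mirrors the reviewed code.

```python
# Illustrative subset of a PyArrow-type-string -> Redshift-type lookup.
# The entries are assumptions for this sketch, not Feast's actual map.
ILLUSTRATIVE_TYPE_MAP = {
    "int64": "int8",
    "string": "varchar",
    "bool": "bool",
}


def to_redshift_type(pa_type_as_str, handle_json_and_struct):
    """Simplified stand-in for the function under review.

    With handle_json_and_struct=False, only map types get an explicit
    branch, so "large_string" and "struct<...>" reach the dict lookup
    and raise KeyError. With True, the extra branches from the
    suggested change return "super" first.
    """
    if pa_type_as_str.startswith("map<"):
        return "super"
    if handle_json_and_struct:
        if pa_type_as_str == "large_string":
            return "super"
        if pa_type_as_str.startswith("struct<"):
            return "super"
    return ILLUSTRATIVE_TYPE_MAP[pa_type_as_str]
```

Calling `to_redshift_type("large_string", handle_json_and_struct=False)` raises `KeyError`, whereas the same call with `handle_json_and_struct=True` returns `"super"`.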