feat: Add complex type support (Map, JSON, Struct) with schema validation #5974
ntkathole wants to merge 3 commits into feast-dev:master
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
    if pa_type_as_str.startswith("map<"):
        return "super"
🔴 pa_to_redshift_value_type crashes with KeyError for JSON (large_string) and Struct PyArrow types
The pa_to_redshift_value_type function handles map< types (added in this PR at lines 1258-1259) but does not handle the large_string (JSON) or struct< (Struct) PyArrow types. When either type is encountered, it falls through to the type_map dict lookup at line 1281, which raises a KeyError.
Root Cause and Impact
The PR adds JSON and Struct support across backends, including mapping Redshift's json and super types to Feast types in redshift_to_feast_value_type (sdk/python/feast/type_map.py:1187-1188). However, the reverse direction (pa_to_redshift_value_type) was not updated to handle the corresponding PyArrow types:
- pyarrow.large_string() (str representation: "large_string") is the PyArrow type for JSON
- pyarrow.struct(...) (str representation: "struct<...>") is the PyArrow type for Struct
Neither has a matching branch, so both hit type_map[pa_type_as_str] at line 1281, causing a KeyError crash. This would occur during materialization or table creation on Redshift when feature views contain JSON or Struct columns.
For comparison, the MSSQL equivalent (pa_to_mssql_type at sdk/python/feast/type_map.py:1139-1144) correctly handles all three new complex types.
Suggested change:
-    if pa_type_as_str.startswith("map<"):
-        return "super"
+    if pa_type_as_str.startswith("map<"):
+        return "super"
+    if pa_type_as_str == "large_string":
+        return "super"
+    if pa_type_as_str.startswith("struct<") or pa_type_as_str.startswith("struct{"):
+        return "super"
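The failure mode described in this comment can be reproduced in isolation. The sketch below is a simplified stand-in, not Feast's actual pa_to_redshift_value_type: the function names and the small SCALAR_TYPE_MAP table are assumptions made for illustration, and the input is the string form of a PyArrow type rather than a real pyarrow object.

```python
# Simplified stand-in for the lookup described in the review comment.
# NOT Feast's real code: the table below is an illustrative subset.
SCALAR_TYPE_MAP = {
    "bool": "bool",
    "int32": "int4",
    "int64": "int8",
    "double": "float8",
    "string": "varchar",
}


def to_redshift_type_before(pa_type_as_str: str) -> str:
    """Only map<...> is special-cased, mirroring the reviewed diff."""
    if pa_type_as_str.startswith("map<"):
        return "super"
    # Any string without an entry raises KeyError here.
    return SCALAR_TYPE_MAP[pa_type_as_str]


def to_redshift_type_after(pa_type_as_str: str) -> str:
    """Adds JSON (large_string) and Struct branches before the lookup."""
    if pa_type_as_str.startswith("map<"):
        return "super"
    if pa_type_as_str == "large_string":
        return "super"
    if pa_type_as_str.startswith("struct<"):
        return "super"
    return SCALAR_TYPE_MAP[pa_type_as_str]


def fails_with_key_error(func, type_str: str) -> bool:
    """Return True if func(type_str) raises KeyError."""
    try:
        func(type_str)
    except KeyError:
        return True
    return False
```

With these definitions, fails_with_key_error(to_redshift_type_before, "large_string") is True, while to_redshift_type_after maps both "large_string" and "struct<a: int64>" to "super". Placing the prefix checks ahead of the dict lookup matters because parameterized types such as struct<...> have no fixed string key.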
What this PR does / why we need it:
This PR adds comprehensive support for complex data types in Feast and implements schema validation across all compute engines:
Commit 1 (970650b): Fix Map/Dict support and implement schema validation
- enable_validation parameter to FeatureView, BatchFeatureView, and StreamFeatureView

Commit 2 (ac28d4d): Add JSON and Struct complex data types
- JSON, JSON_LIST, STRUCT, STRUCT_LIST to the protobuf ValueType enum and Value message
- Json primitive type and Struct class (schema-aware structured type with named, typed fields)
- enable_validation=True)
- from_feast_to_spark_type() and _spark_types_compatible(): eliminates the PyArrow-to-Spark type mismatch gap
- feast:struct_schema)
- _validate_schema() method per engine (missing columns + type checks + JSON content validation)

Commit 3 (a131b73): Modified default template with different types
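The three checks named for _validate_schema() (missing columns, type checks, JSON content validation) can be illustrated with a small standalone sketch. This is not the PR's implementation: the validate_schema function, the dict-of-rows input, and the "json" marker in the expected schema are all hypothetical and chosen only to show the order of the checks.

```python
import json


def validate_schema(expected: dict, rows: list) -> list:
    """Collect validation errors for a list of dict rows.

    `expected` maps a column name to either a Python type or the
    string "json" (hypothetical marker for a JSON-encoded string).
    """
    errors = []
    for i, row in enumerate(rows):
        for column, kind in expected.items():
            # 1. Missing-column check.
            if column not in row:
                errors.append(f"row {i}: missing column '{column}'")
                continue
            value = row[column]
            if kind == "json":
                # 2a. A JSON column must hold a string ...
                if not isinstance(value, str):
                    errors.append(f"row {i}: '{column}' is not a string")
                    continue
                # 3. ... whose content parses as JSON.
                try:
                    json.loads(value)
                except json.JSONDecodeError:
                    errors.append(f"row {i}: '{column}' is not valid JSON")
            elif not isinstance(value, kind):
                # 2b. Plain type check for non-JSON columns.
                errors.append(
                    f"row {i}: '{column}' expected {kind.__name__}, "
                    f"got {type(value).__name__}"
                )
    return errors
```

For example, with expected = {"user_id": int, "profile": "json"}, a row such as {"user_id": "x", "profile": "{bad"} yields two errors (a type mismatch and invalid JSON), and a row lacking user_id yields a missing-column error. Returning a list of errors rather than raising on the first one is a design choice of this sketch, not something the PR description specifies.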