[Variant] Support Shredded Objects in variant_get: access as Some(DataType::Struct) (nested shredding) #8153

@alamb


Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Note: this is likely one of the most complex parts of implementing Shredded Variants, so it is not a good first task.

Please see more commentary on

We are trying to support the general case of the variant_get function, which allows runtime dynamic access to Variants (either shredded or unshredded).

This ticket tracks
Support variant_get for Some(DataType::Struct) (nested shredding)

The idea here is that the user would specify a "shredding schema" (similar to what @friendlymatthew is sketching out in #7921) and the variant_get kernel would produce a VariantArray with that schema, extracting fields as necessary.

Implementing this functionality will likely require the basic representation for shredded Variant arrays along with path traversal in variant_get. However, it does NOT cover the following (which are, or will be, broken out into separate tickets):

  • Support for retrieving as a specific non-Struct data type (e.g. Some(DataType::Utf8))
  • Retrieving any arbitrary path and returning what is there (no type specified)
  • Retrieving any arbitrary path as a Variant (aka "unshredding")

Describe the solution you'd like
@scovich sketched out a high level design for Shredded Objects (see Representing Variant In Arrow Proposal: "Shredding an Object" and Variant Shredding::Objects) in this PR

So roughly that means supporting

// get the named field of variant object as a typed field 
variant_get(array, "$.field_name", DataType::Struct <....>)

Where $.field_name represents an arbitrary VariantPath, such as a for the field "a", or a.b for the field "b" nested inside the field "a"
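To make the path notation concrete, here is a minimal sketch of splitting a dotted path such as a.b into its field names. The function name parse_field_path is hypothetical and this is not the VariantPath type in arrow-rs; it assumes only plain field names separated by dots, with an optional leading "$." as in the example above.

```rust
/// Hypothetical helper (not part of arrow-rs): split a dotted path into field names.
/// Assumes only object field access, with no list indexes or quoted names.
fn parse_field_path(path: &str) -> Vec<String> {
    // Accept both "$.a.b" and "a.b" by dropping an optional "$" root marker.
    let without_root = path.strip_prefix('$').unwrap_or(path);
    without_root
        .split('.')
        // Skip the empty segment left behind by the leading "." in "$.a".
        .filter(|segment| !segment.is_empty())
        .map(|segment| segment.to_string())
        .collect()
}
```

Each element of the returned list would then correspond to one level of object traversal, so a.b means "look up a, then look up b inside it".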

And DataType::Struct is a "shredding schema" that reflects both value and typed_value

This should work for:

  1. Variants where field_name is in typed_value
  2. Variants where field_name is not in typed_value
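The two cases can be illustrated with a deliberately simplified, single-row model. Everything here (ShreddedRow, FieldValue, get_field) is a hypothetical stand-in written for this sketch, not the arrow-rs representation: a real implementation would operate on Arrow StructArrays and binary-encoded variants rather than hash maps. It also ignores the finer rules for individual shredded fields in the shredding spec.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a field's value; real variants carry many more types.
#[derive(Debug, Clone, PartialEq)]
enum FieldValue {
    Int(i64),
    Str(String),
}

/// Hypothetical model of one row of a shredded variant object.
struct ShreddedRow {
    /// Fields that were shredded into typed columns.
    typed_value: HashMap<String, FieldValue>,
    /// Fields that remain in the unshredded variant.
    value: HashMap<String, FieldValue>,
}

/// Look up a field, preferring the shredded copy (case 1) and
/// falling back to the unshredded remainder (case 2).
fn get_field<'a>(row: &'a ShreddedRow, name: &str) -> Option<&'a FieldValue> {
    row.typed_value.get(name).or_else(|| row.value.get(name))
}

/// Build a sample row: "a" is shredded, "b" is left in `value`.
fn sample_row() -> ShreddedRow {
    let mut typed_value = HashMap::new();
    typed_value.insert("a".to_string(), FieldValue::Int(1));
    let mut value = HashMap::new();
    value.insert("b".to_string(), FieldValue::Str("x".to_string()));
    ShreddedRow { typed_value, value }
}
```

With sample_row(), looking up "a" exercises the first case, "b" exercises the second, and a name present in neither map yields None.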

Describe alternatives you've considered

  1. Add a test that manually constructs a shredded variant array (follow the example in the arrow proposal)
  2. Add a test that calls variant_get appropriately
  3. Implement the code

I suggest getting this working for non-nested objects first, and then working on nesting / pathing as a second PR

Additional context

Reference
