feat: CSV array type support via string-to-list post-processing #472
Open
shirly121 wants to merge 5 commits into
…sing
- CSV ConvertOptions: override list columns to large_utf8 for Arrow CSV Reader
- ArrowTypeCaster framework: string->list type conversion for full_read and batch_read
- ArrowTypeCaster supports nested lists, dates, timestamps, intervals
- CSV-safe schema for projection (createCsvSafeSchema)
- Add C++ unit tests for CSV array reading (test_reader.cc)
- Add C++ sniffer test for list-like column inference (test_sniffer.cc)
- Add Python end-to-end tests for LOAD FROM CSV with CAST to array types
- Add Python sniffer test for CAST to list type (xfail)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
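For reviewers, the string->list conversion described in this commit can be pictured with a small Python sketch. This is not the ArrowTypeCaster code, which is C++ and works on Arrow arrays. The function name `parse_list_literal`, the bracketed `[a,b,[c]]` syntax, and the error handling are all assumptions made for illustration.

```python
from typing import Callable


def parse_list_literal(text: str, convert: Callable[[str], object] = int) -> list:
    """Parse a bracketed literal such as "[1,2,[3,4]]" into nested Python lists.

    Hypothetical helper: `convert` turns each leaf token into a value
    (int by default; float or datetime.date.fromisoformat also work).
    """
    pos = 0
    n = len(text)

    def skip_ws() -> None:
        nonlocal pos
        while pos < n and text[pos].isspace():
            pos += 1

    def parse_value():
        nonlocal pos
        skip_ws()
        if pos < n and text[pos] == "[":
            return parse_list()  # nested list: recurse
        start = pos
        while pos < n and text[pos] not in ",]":
            pos += 1
        token = text[start:pos].strip()
        if not token:
            raise ValueError(f"empty element at offset {start}")
        return convert(token)

    def parse_list() -> list:
        nonlocal pos
        pos += 1  # consume '['
        items = []
        skip_ws()
        if pos < n and text[pos] == "]":
            pos += 1  # empty list
            return items
        while True:
            items.append(parse_value())
            skip_ws()
            if pos >= n:
                raise ValueError("unterminated list")
            if text[pos] == ",":
                pos += 1
            elif text[pos] == "]":
                pos += 1
                return items
            else:
                raise ValueError(f"unexpected character at offset {pos}")

    skip_ws()
    if pos >= n or text[pos] != "[":
        raise ValueError("expected '[' at start of list literal")
    result = parse_list()
    skip_ws()
    if pos != n:
        raise ValueError("trailing characters after list literal")
    return result
```

Passing the leaf converter as a parameter mirrors the idea that the same list parsing can serve several element types, for example `parse_list_literal("[[1.5],[2.5]]", float)`.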
ArrowArrayContextColumn::get_elem() did not support arrow::Type::LIST, causing "Unsupported arrow type: list<item: float>" when executing CAST(col, 'FLOAT[]') on CSV-loaded data. Extract scalar value logic into get_arrow_scalar_value() helper and add recursive list/fixed-size list handling for both get_elem() and nested list elements. Also remove xfail marker from test_csv_cast_list since it now passes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…assertions Add early return in Database.close() when _database is already None, preventing "I/O operation on closed file" from __del__ during interpreter shutdown. Also replace string-based assertions in test_load_array with exact type comparisons (datetime.date, datetime.datetime). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ove unused castBatch
- Remove duplicated LIST/FIXED_SIZE_LIST handling in get_elem, delegate to get_arrow_scalar_value which already covers all types
- Remove unused ArrowTypeCaster::castBatch (replaced by LazyTypeCastRecordBatch)
- Fix clang-format 10.0.1 formatting in options.h
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tamp in appendScalarValue Replace neug::common::Date::fromCString and neug::common::Timestamp::fromCString with neug::Date and neug::DateTime from utils/property/types.h, aligning with the project's standard type system. Adjust timestamp unit conversions accordingly (milli_second base instead of microsecond base). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
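To make the unit change concrete, here is a hedged sketch of normalizing timestamps to a millisecond base. It does not reproduce appendScalarValue. The unit labels (`s`, `ms`, `us`, `ns`), the function name, and the choice to floor sub-millisecond values are assumptions for illustration.

```python
# Ticks per second for each illustrative unit label.
UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def to_milliseconds(value: int, unit: str) -> int:
    """Convert an integer timestamp in `unit` to whole milliseconds."""
    per_second = UNITS_PER_SECOND[unit]
    if per_second <= 1_000:
        # Coarser than or equal to milliseconds: scale up exactly.
        return value * (1_000 // per_second)
    # Finer than milliseconds: precision is lost. Python's // floors,
    # so negative values round toward negative infinity.
    return value // (per_second // 1_000)
```

Moving from a microsecond base to a millisecond base means microsecond and nanosecond inputs are truncated, which is worth keeping in mind when writing exact-value assertions in tests.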
Summary
- Support COPY FROM with array-typed columns
- Refactor appendScalarValue to use neug::Date/DateTime and simplify get_elem by reusing get_arrow_scalar_value
- Guard Database.close() against double-close errors
Changes
- src/utils/reader/arrow_type_cast.cc + include/neug/utils/reader/arrow_type_cast.h: New string-to-list type cast logic for CSV array parsing
- src/utils/reader/reader.cc: Integrate post-processing step after CSV read to convert string columns to list types
- src/utils/reader/options.cc + include/neug/utils/reader/options.h: Add list type options for CSV format
- src/execution/common/columns/arrow_context_column.cc: Add list type handling in Arrow context column
- tools/python_bind/neug/database.py: Guard against double-close in Database.close()
- tests/: Add comprehensive C++ and Python tests for array loading and sniffer
Test plan
- test_load_array.py with 250+ lines covering various array type scenarios
- test_reader.cc with C++ unit tests for arrow type cast
🤖 Generated with Claude Code
Fixes #441