fix(file-based): support selecting Excel worksheets #1024
Draft
devin-ai-integration[bot] wants to merge 4 commits into
👋 Greetings, Airbyte Team Member!
Here are some helpful tips and reminders for your convenience.

Testing This CDK Version
You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1778706873-fix-excel-multisheet#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1778706873-fix-excel-multisheet

PR Slash Commands
Airbyte Maintainers can execute the following slash commands on your PR:
Co-Authored-By: bot_apk <apk@cognition.ai>
Summary
- Adds a `sheet_name` setting to file-based Excel streams so users can keep the existing first-sheet behavior, choose a worksheet by name or zero-indexed position, or use `*` to read all worksheets.
- Updates the standard tests to handle source constructors that take `catalog`, `config`, and `state`; read tests now pass discovered/configured runtime args into those constructors for connectors such as `source-google-drive`, while declarative source factories remain compatible.

Resolves https://github.com/airbytehq/oncall/issues/12598
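The three `sheet_name` modes described above can be illustrated with a minimal sketch. The helper name `resolve_sheet_name` is hypothetical and is not taken from this PR's diff; it only shows how a config string could be mapped to the `sheet_name` argument that `pandas.read_excel` accepts (an integer position, a worksheet name, or `None` for all worksheets). Treating any all-digit string as a position is an assumption of this sketch, so a worksheet literally named `2024` would need separate handling.

```python
from typing import Optional, Union

ALL_SHEETS = "*"  # wildcard value described in the summary


def resolve_sheet_name(sheet_name: str = "0") -> Optional[Union[int, str]]:
    """Map a config string to a value usable as pandas.read_excel(sheet_name=...).

    Hypothetical helper for illustration only:
    - "*"        -> None (pandas then reads every worksheet into a dict)
    - "0", "1".. -> zero-indexed worksheet position as an int
    - other text -> worksheet name, passed through unchanged
    """
    if sheet_name == ALL_SHEETS:
        return None
    if sheet_name.isdecimal():
        # Assumption: all-digit strings are positions, not worksheet names.
        return int(sheet_name)
    return sheet_name


if __name__ == "__main__":
    for raw in ("0", "2", "Sales", "*"):
        print(repr(raw), "->", repr(resolve_sheet_name(raw)))
```

With the default of `"0"`, the helper returns `0`, which is also the pandas default and therefore keeps the first-sheet behavior.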
Review & Testing Checklist for Human
- Confirm that the `sheet_name` values `"0"`, worksheet names, and `"*"` match the desired product behavior for file-based Excel streams.
- Check the standard-test constructor changes against `source-google-drive`.

Declarative-First Evaluation
SharePoint Enterprise is a Python file-based CDK source, not a manifest-only connector. The required behavior happens before record-level declarative transformations because the Excel worksheet must be selected while parsing the workbook. Declarative options such as `RecordFilter`, `AddFields`, `RemoveFields`, paginator settings, substream routers, HTTP error handlers, and `$ref` overrides cannot change which Excel worksheet pandas parses.

Breaking Change Evaluation
This is not a breaking change. The new config field is optional and defaults to `"0"`, preserving the previous pandas default of reading the first worksheet. No stream fields, primary keys, cursors, existing config fields, or state formats are removed or changed.

Test Coverage
- `poetry run pytest unit_tests/sources/file_based/file_types/test_excel_parser.py unit_tests/sources/file_based/test_file_based_scenarios.py unit_tests/test/test_standard_tests.py -q`
- `poetry run ruff check airbyte_cdk/test/standard_tests/declarative_sources.py airbyte_cdk/test/standard_tests/connector_base.py airbyte_cdk/test/standard_tests/source_base.py unit_tests/test/test_standard_tests.py airbyte_cdk/sources/file_based/file_types/excel_parser.py airbyte_cdk/sources/file_based/config/excel_format.py unit_tests/sources/file_based/file_types/test_excel_parser.py unit_tests/sources/file_based/scenarios/csv_scenarios.py`
- `poetry run ruff format --check airbyte_cdk/test/standard_tests/declarative_sources.py airbyte_cdk/test/standard_tests/connector_base.py airbyte_cdk/test/standard_tests/source_base.py unit_tests/test/test_standard_tests.py airbyte_cdk/sources/file_based/file_types/excel_parser.py airbyte_cdk/sources/file_based/config/excel_format.py unit_tests/sources/file_based/file_types/test_excel_parser.py unit_tests/sources/file_based/scenarios/csv_scenarios.py`
- `poetry run mypy --config-file mypy.ini airbyte_cdk`

Notes
CI initially failed in optional downstream connector standard tests because some file-based and declarative source factories use custom constructor shapes. The branch includes targeted standard-test compatibility fixes and unit coverage for the legacy file-based constructor shape.
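As a rough illustration of the constructor-shape problem, the sketch below shows one way a test harness could pass runtime args only to factories that accept them. This is not the code in this branch: `build_source`, `LegacyFileSource`, and `NoArgSource` are hypothetical names, and signature inspection is an assumed technique chosen for the example.

```python
import inspect
from typing import Any, Callable, Mapping


def build_source(factory: Callable[..., Any], runtime_args: Mapping[str, Any]) -> Any:
    """Call `factory` with only the runtime args its signature accepts.

    Hypothetical helper: the real standard-test fix may work differently.
    """
    params = inspect.signature(factory).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        # The factory takes **kwargs, so it can receive everything.
        return factory(**runtime_args)
    accepted = {name: value for name, value in runtime_args.items() if name in params}
    return factory(**accepted)


class LegacyFileSource:
    """Stand-in for a source whose constructor takes catalog, config, and state."""

    def __init__(self, catalog: Any, config: Any, state: Any) -> None:
        self.catalog, self.config, self.state = catalog, config, state


class NoArgSource:
    """Stand-in for a factory with a custom shape that takes no runtime args."""

    def __init__(self) -> None:
        self.config = None


if __name__ == "__main__":
    args = {"catalog": {"streams": []}, "config": {"sheet_name": "*"}, "state": None}
    print(build_source(LegacyFileSource, args).config)
    print(type(build_source(NoArgSource, args)).__name__)
```

Filtering by parameter name keeps both shapes working without a per-connector special case, at the cost of silently ignoring args a factory does not declare.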
Devin session