fix(file-based): support selecting Excel worksheets#1024

Draft
devin-ai-integration[bot] wants to merge 4 commits into main from devin/1778706873-fix-excel-multisheet

@devin-ai-integration devin-ai-integration Bot commented May 13, 2026

Summary

  • Add an optional sheet_name setting to file-based Excel streams so users can keep the existing first-sheet behavior, choose a worksheet by name or zero-indexed position, or use * to read all worksheets.
  • Pass the configured worksheet into both Calamine-first parsing and the OpenPyXL fallback, and merge schemas across parsed worksheets when all sheets are selected.
  • Update the generated file-based spec expectation and add regression coverage for default first-sheet parsing, named-sheet parsing, all-sheet parsing, and multi-sheet schema inference.
  • Update FAST standard-test connector instantiation to support legacy file-based source constructors that require catalog, config, and state; read tests now pass discovered/configured runtime args into those constructors for connectors such as source-google-drive, while declarative source factories remain compatible.
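The selection and merge rules in the first two bullets can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `resolve_sheet_selector` and `merge_sheet_schemas` are hypothetical names, treating an all-digit value as a zero-indexed position is an assumption drawn from the summary, and the "widen conflicting types to string" rule is assumed purely for the example. The only library fact relied on is that `pandas.read_excel` accepts an integer, a string, or `None` (all sheets) for its `sheet_name` argument.

```python
from typing import Dict, Union

# Value suitable for pandas.read_excel(sheet_name=...).
SheetSelector = Union[int, str, None]


def resolve_sheet_selector(sheet_name: str = "0") -> SheetSelector:
    """Map the config string to a pandas-style sheet selector (hypothetical helper)."""
    if sheet_name == "*":
        return None  # pandas reads every worksheet when sheet_name is None
    if sheet_name.isdecimal():
        return int(sheet_name)  # assumed: digits mean a zero-indexed position
    return sheet_name  # anything else is treated as a worksheet name


def merge_sheet_schemas(schemas: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Union the columns of all sheets (hypothetical helper)."""
    merged: Dict[str, str] = {}
    for sheet_schema in schemas.values():
        for column, column_type in sheet_schema.items():
            if column in merged and merged[column] != column_type:
                merged[column] = "string"  # assumed widening rule on conflict
            else:
                merged[column] = column_type
    return merged
```

Under these assumptions, the default config resolves to the first worksheet, `"*"` resolves to "all sheets", and a column that is an integer in one sheet and a string in another ends up typed as a string in the merged schema.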

Resolves https://github.com/airbytehq/oncall/issues/12598

Review & Testing Checklist for Human

  • Confirm that the accepted sheet_name values (a zero-indexed position such as "0", a worksheet name, or "*" for all worksheets) match the desired product behavior for file-based Excel streams.
  • Confirm downstream connectors such as SharePoint Enterprise expose the updated CDK spec after consuming a CDK release with this change.
  • If desired, rerun FAST connector checks for a file-based connector that uses the legacy constructor shape, such as source-google-drive.

Declarative-First Evaluation

SharePoint Enterprise is a Python file-based CDK source, not a manifest-only connector. The required behavior happens before record-level declarative transformations because the Excel worksheet must be selected while parsing the workbook. Declarative options such as RecordFilter, AddFields, RemoveFields, paginator settings, substream routers, HTTP error handlers, and $ref overrides cannot change which Excel worksheet pandas parses.

Breaking Change Evaluation

This is not a breaking change. The new config field is optional and defaults to "0", preserving the previous pandas default of reading the first worksheet. No stream fields, primary keys, cursors, existing config fields, or state formats are removed or changed.
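The backward-compatibility argument can be illustrated with a small stand-in for the format config. This is a sketch under stated assumptions: `ExcelFormatSketch` and `load_format` are hypothetical and use a plain dataclass rather than the CDK's actual config model; only the optional field and its "0" default come from the description above.

```python
from dataclasses import dataclass


@dataclass
class ExcelFormatSketch:
    """Hypothetical stand-in for the Excel format config, not the CDK model."""

    filetype: str = "excel"
    sheet_name: str = "0"  # optional; "0" selects the first worksheet


def load_format(raw: dict) -> ExcelFormatSketch:
    """Build the format from a user config, ignoring keys this sketch does not model."""
    known = {k: v for k, v in raw.items() if k in ("filetype", "sheet_name")}
    return ExcelFormatSketch(**known)


# A config written before this change has no sheet_name key.
legacy = load_format({"filetype": "excel"})
# A config that opts in to reading every worksheet.
updated = load_format({"filetype": "excel", "sheet_name": "*"})
```

Here `legacy.sheet_name` falls back to "0", so a config saved before the field existed keeps reading the first worksheet.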

Test Coverage

  • poetry run pytest unit_tests/sources/file_based/file_types/test_excel_parser.py unit_tests/sources/file_based/test_file_based_scenarios.py unit_tests/test/test_standard_tests.py -q
  • poetry run ruff check airbyte_cdk/test/standard_tests/declarative_sources.py airbyte_cdk/test/standard_tests/connector_base.py airbyte_cdk/test/standard_tests/source_base.py unit_tests/test/test_standard_tests.py airbyte_cdk/sources/file_based/file_types/excel_parser.py airbyte_cdk/sources/file_based/config/excel_format.py unit_tests/sources/file_based/file_types/test_excel_parser.py unit_tests/sources/file_based/scenarios/csv_scenarios.py
  • poetry run ruff format --check airbyte_cdk/test/standard_tests/declarative_sources.py airbyte_cdk/test/standard_tests/connector_base.py airbyte_cdk/test/standard_tests/source_base.py unit_tests/test/test_standard_tests.py airbyte_cdk/sources/file_based/file_types/excel_parser.py airbyte_cdk/sources/file_based/config/excel_format.py unit_tests/sources/file_based/file_types/test_excel_parser.py unit_tests/sources/file_based/scenarios/csv_scenarios.py
  • poetry run mypy --config-file mypy.ini airbyte_cdk

Notes

CI initially failed in optional downstream connector standard tests because some file-based and declarative source factories use custom constructor shapes. The branch includes targeted standard-test compatibility fixes and unit coverage for the legacy file-based constructor shape.
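One way to tolerate differing constructor shapes is to inspect the constructor signature and pass only the arguments it declares. The sketch below is an assumption about the general approach, not the fix in this branch: `build_source`, `LegacyFileBasedSource`, and `NoArgSource` are hypothetical names, and only the `catalog`, `config`, and `state` parameter names come from the description above.

```python
import inspect
from typing import Any, Dict, Optional


def build_source(
    source_cls: type,
    catalog: Optional[Any] = None,
    config: Optional[Dict[str, Any]] = None,
    state: Optional[Any] = None,
) -> Any:
    """Instantiate source_cls with only the runtime args its constructor names."""
    available = {"catalog": catalog, "config": config, "state": state}
    params = inspect.signature(source_cls).parameters
    kwargs = {name: value for name, value in available.items() if name in params}
    return source_cls(**kwargs)


class LegacyFileBasedSource:
    """Hypothetical source whose constructor requires all three runtime args."""

    def __init__(self, catalog: Any, config: Dict[str, Any], state: Any) -> None:
        self.catalog, self.config, self.state = catalog, config, state


class NoArgSource:
    """Hypothetical source whose constructor takes no runtime args."""

    def __init__(self) -> None:
        self.ready = True
```

With this shape, `build_source(LegacyFileBasedSource, catalog=..., config=..., state=...)` forwards all three values, while `build_source(NoArgSource, catalog=...)` silently drops them. A constructor that only accepts `**kwargs` would receive nothing in this sketch, which a real implementation would need to handle separately.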



@devin-ai-integration (Contributor, Author)

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

@github-actions

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1778706873-fix-excel-multisheet#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1778706873-fix-excel-multisheet

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

github-actions Bot commented May 13, 2026

PyTest Results (Fast)

4 071 tests  +12   4 060 ✅ +12   7m 43s ⏱️ -2s
    1 suites ± 0      11 💤 ± 0 
    1 files   ± 0       0 ❌ ± 0 

Results for commit 2882774. ± Comparison against base commit d3d1346.

♻️ This comment has been updated with latest results.

github-actions Bot commented May 13, 2026

PyTest Results (Full)

4 074 tests  +12   4 062 ✅ +12   11m 1s ⏱️ +5s
    1 suites ± 0      12 💤 ± 0 
    1 files   ± 0       0 ❌ ± 0 

Results for commit 2882774. ± Comparison against base commit d3d1346.

♻️ This comment has been updated with latest results.

devin-ai-integration Bot and others added 3 commits May 13, 2026 22:06
Co-Authored-By: bot_apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>