feat(excel): add support for Excel file processing in VisionAgent by danyalxahid-askui · Pull Request #121 · askui/python-sdk

danyalxahid-askui · 2025-08-18T12:54:30Z

- Introduced `ExcelSource` class to handle Excel files as input sources.
- Updated `AgentBase` and related APIs to accept `ExcelSource` alongside existing image and PDF sources.
- Implemented error handling for unsupported Excel processing in specific models.
- Added tests for Excel file handling and processing.
- Created utility functions for converting Excel content to markdown format.
- Added dummy Excel file for testing purposes.
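
To illustrate the Excel-to-markdown conversion mentioned in the description, here is a minimal sketch using only the standard library. It is not the code in `src/askui/utils/excel_utils.py`; the function name `rows_to_markdown`, the column names, and the assumption that the first row is a header are all hypothetical.

```python
from typing import Sequence


def rows_to_markdown(sheet_name: str, rows: Sequence[Sequence[object]]) -> str:
    """Render one sheet as a Markdown table under a heading (hypothetical helper).

    Assumes the first row holds the column headers.
    """
    if not rows:
        return f"## {sheet_name}\n"

    header, *body = rows
    width = len(header)

    def fmt(row: Sequence[object]) -> str:
        # Empty cells become "", pipes are escaped so they do not break the table.
        cells = [("" if c is None else str(c)).replace("|", "\\|") for c in row]
        # Pad short rows and truncate long ones to the header width.
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells[:width]) + " |"

    lines = [
        f"## {sheet_name}",  # keep the sheet name so the model can refer to it
        "",
        fmt(header),
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += [fmt(r) for r in body]
    return "\n".join(lines) + "\n"


# Illustrative data only; the column names are assumptions.
print(rows_to_markdown("Sheet1", [["Name", "Salary"], ["John", 10000]]))
```

Including the sheet name as a heading keeps each table attributable when a workbook has several sheets.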

adi-wan-askui

Nice work :)

pyproject.toml

src/askui/utils/excel_utils.py

src/askui/models/anthropic/messages_api.py

src/askui/models/askui/google_genai_api.py

tests/e2e/agent/test_get.py

- Changed the type hint for the `root` attribute from `bytes` to `bytes | Path` to accurately reflect that it can be either the underlying Excel bytes or a file path.
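
The `bytes | Path` hint on `root` can be sketched as follows. This is an illustration under assumptions, not the SDK's `ExcelSource`: the class name `DocumentSourceSketch` and its `read()` behaviour are hypothetical, and `Union[bytes, Path]` is used as the equivalent spelling of `bytes | Path` so the snippet also runs on older Python versions.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass(frozen=True)
class DocumentSourceSketch:
    """Hypothetical stand-in for a source whose root is raw bytes or a file path."""

    # Equivalent to `bytes | Path` on Python 3.10+.
    root: Union[bytes, Path]

    def read(self) -> bytes:
        # A path is read lazily from disk; in-memory bytes are returned unchanged.
        if isinstance(self.root, Path):
            return self.root.read_bytes()
        return self.root


print(DocumentSourceSketch(b"raw bytes").read())
```

Widening the hint lets a type checker accept both call styles without a cast.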

…e related references
- Renamed the `ExcelSource` class to `OfficeDocumentSource` to better reflect its functionality for handling various office document types.
- Updated all references to `ExcelSource` across the codebase to `OfficeDocumentSource`.
- Adjusted error messages to specify "Office Document" processing instead of just "Excel".
- Enhanced the `Source` type to include `OfficeDocumentSource` for broader compatibility.

- Replaced occurrences of `Img` with `InputSource` in the codebase to standardize the type used for image, file, and data URI inputs across various functions and classes.
- Updated the `screenshot` parameter in methods to use `InputSource | None` instead of `Img | None` for better clarity and consistency.
- Removed the `Excel` and `Pdf` type definitions from their respective modules, consolidating input types under `InputSource`.
- Enhanced the `__all__` exports to include the new `InputSource` type for better accessibility.

- Updated docstrings for `source` and `screenshot` parameters in `AgentBase` class to clarify the types of input sources accepted, including image, PDF, and office document files.
- Improved error messages in `AnthropicMessagesApi`, `AskUiInferenceApi`, and `OpenRouterModel` classes to provide clearer context regarding unsupported PDF and office document processing.
- Reformatted docstring for `InputSource` type to enhance readability.
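
The error handling for unsupported source types might look roughly like the sketch below. Everything in it is an assumption made for illustration: the exception class `UnsupportedSourceError`, the helper `ensure_source_supported`, the model name, and the message wording are hypothetical and are not taken from `AnthropicMessagesApi`, `AskUiInferenceApi`, or `OpenRouterModel`.

```python
from typing import Iterable


class UnsupportedSourceError(NotImplementedError):
    """Hypothetical error for a model that cannot process a given source kind."""


def ensure_source_supported(
    model_name: str, source_kind: str, supported_kinds: Iterable[str]
) -> None:
    """Raise a descriptive error if `source_kind` is not in `supported_kinds`."""
    supported = sorted(supported_kinds)
    if source_kind not in supported:
        # Name the model, the rejected kind, and the alternatives in one message.
        raise UnsupportedSourceError(
            f"{source_kind} processing is not supported by model {model_name!r}. "
            f"Supported source kinds: {', '.join(supported)}."
        )


try:
    ensure_source_supported("example-model", "Office Document", {"Image"})
except UnsupportedSourceError as error:
    print(error)
```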

- Eliminated the file size validation for `OfficeDocumentSource` in the `read()` method, as it was deemed unnecessary for the current implementation.

…t.py`
- Modified the test case `test_get_with_xlsx_with_default_model_with_chart_data` to change the query from "What does the chart show?" to "What is the salary of John?".
- Updated the assertion to check for "10000" in the response instead of "count of names" to reflect the new query context.

- Introduced new pytest fixtures `path_fixtures_docs` and `path_fixtures_dummy_doc` to provide paths for the docs directory and a dummy document, respectively.
- Added a test case `test_get_with_docs_with_default_model` to verify the response from the `VisionAgent` when querying with a dummy document.
- Included a dummy document `dummy.docx` for testing purposes.

- Updated the test case `test_get_with_xlsx_with_gemini_model` to change the query to "What is the salary of Doe?" for clarity.
- Introduced a new test case `test_get_with_xlsx_with_gemini_model_with_response_schema` to validate the response structure using the `SalaryResponse` schema.
- Added `Salary` and `SalaryResponse` classes to define the expected response format.
- Included assertions to verify the correctness of the salary data returned for multiple individuals.
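
A structured response such as `SalaryResponse` could be shaped along the lines below. This sketch uses plain dataclasses as a stand-in and does not reproduce the classes added in the PR; the field names (`name`, `salary`, `salaries`), the helper `parse_salary_response`, and the JSON payload are assumptions for illustration only.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Salary:
    # Field names are assumptions; the PR's schema may differ.
    name: str
    salary: int


@dataclass
class SalaryResponse:
    salaries: List[Salary]


def parse_salary_response(raw: str) -> SalaryResponse:
    """Parse a JSON string of the assumed shape into a SalaryResponse."""
    data = json.loads(raw)
    return SalaryResponse(
        salaries=[
            Salary(name=str(item["name"]), salary=int(item["salary"]))
            for item in data["salaries"]
        ]
    )


# Illustrative payload, not real model output.
response = parse_salary_response('{"salaries": [{"name": "John", "salary": 10000}]}')
assert response.salaries[0].salary == 10000
```

A typed response lets a test assert on individual fields instead of searching for substrings in free text.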

- Introduced a new section detailing the use of the `markitdown` library for extracting data from documents like Docs and Excel files.
- Highlighted key features of `markitdown`, including LLM-friendly output, inclusion of sheet names, enhanced image descriptions, no local inference requirements, optional dependencies, and Microsoft maintenance.

- Clarified the description of the `InputSource` type to specify that it includes both images and files for `askui.VisionAgent.get()` and images for `askui.VisionAgent.locate()`.

adi-wan-askui

Nicely done 💯

danyalxahid-askui added 2 commits August 18, 2025 14:53

chore(toml): reformatting pyproject.toml

9cb5eb3

danyalxahid-askui marked this pull request as ready for review August 18, 2025 20:35

adi-wan-askui self-requested a review August 19, 2025 11:22

adi-wan-askui suggested changes Aug 20, 2025

danyalxahid-askui added 10 commits August 20, 2025 13:26

fix(excel): update type hint for root attribute in ExcelSource class

e285cb9

refactor: remove redundant file size check in AskUiGoogleGenAiApi

cfe4c65

docs(source_utils): improve docstring for InputSource type

104e62b

danyalxahid-askui requested a review from adi-wan-askui August 20, 2025 14:19

adi-wan-askui approved these changes Aug 20, 2025

adi-wan-askui merged commit 82806bf into main Aug 20, 2025
1 check passed

adi-wan-askui deleted the CL-1675-release-2025-09-python-extract-data-from-excel branch August 20, 2025 14:26

adi-wan-askui merged 12 commits into main from CL-1675-release-2025-09-python-extract-data-from-excel
