adding source checks and source_text option by djarecka · Pull Request #67 · sensein/structsense

djarecka · 2026-02-20T15:12:07Z

@tekrajchhetri - this is just the beginning of the PR, but I want you to know what I'm thinking about and to get your feedback. As mentioned in #62, I believe that because source doesn't check whether its value is an existing file path, structsense can end up running on the path string itself instead of the real file by mistake (it definitely happened to me when running the tutorial).

I suggest (and started changing) that:

- we have a separate argument source_text for when we want to provide the text instead of a file,
- we check that the path exists when source is provided,
- we run process_input_data only for files, before passing the result to StructSenseFlow.
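
A minimal sketch of the three proposals above, assuming a hypothetical helper `resolve_input` (it is not part of structsense) that stands in for the checks done before the text reaches `StructSenseFlow`:

```python
from pathlib import Path
from typing import Optional


def resolve_input(source: Optional[str] = None,
                  source_text: Optional[str] = None) -> str:
    """Return the text to process (hypothetical helper, for illustration only)."""
    # Exactly one of the two arguments must be provided.
    if (source is None) == (source_text is None):
        raise ValueError("Provide exactly one of source or source_text.")

    # Raw text is passed through untouched.
    if source_text is not None:
        return source_text

    # A source must be an existing file; never fall back to treating
    # the path string itself as the text to analyse.
    path = Path(source)
    if not path.is_file():
        raise FileNotFoundError(f"source is not an existing file: {source}")
    return path.read_text(encoding="utf-8")
```

With this shape, a mistyped path raises `FileNotFoundError` instead of being silently processed as if it were the document.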

Also, I noticed that you say that source can be a folder, but I don't think you support this option, am I right?

btw, I just noticed that pre-commit introduced some additional formatting changes. I could try to remove them later, but we should definitely run pre-commit on everything once #62 is merged.

gemini-code-assist · 2026-02-20T15:12:28Z

Summary of Changes

Hello @djarecka, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors how input sources are handled within the structsense application, particularly for its command-line interface. It introduces a clear distinction between providing raw text directly and specifying a file path, enhancing robustness by validating file existence and streamlining the internal input processing logic. These changes aim to prevent common user errors and improve the overall clarity and reliability of input handling.

Highlights

- Input Source Handling Refinement: Introduced a dedicated --source_text CLI option for direct text input, separating it from the --source option which now strictly expects a file path.
- File Path Validation: Enhanced the CLI to validate that the path provided to --source exists, preventing structsense from attempting to process non-existent file names as content.
- Refactored Input Processing: The process_input_data utility function was streamlined to exclusively handle file-based inputs, with the responsibility of determining input type (file vs. text) and path validation moved to the CLI layer.
- Code Formatting: Applied extensive code formatting across app.py, cli.py, and utils.py, likely due to pre-commit hooks, improving readability and consistency.

Changelog

src/structsense/app.py
- Updated StructSenseFlow.__init__ to accept source_text and commented out the direct call to process_input_data for input_source.
- Applied extensive code formatting, including line breaks and spacing, to various sections.
src/structsense/cli.py
- Modified the extract command to introduce a new --source_text option for direct text input.
- Changed the --source option to specifically accept a file path and added validation to ensure only one of --source or --source_text is provided.
- Integrated process_input_data call conditionally for file-based --source input.
- Updated option types for --config and --env_file to click.Path(exists=True).
- Applied formatting adjustments to function signatures and imports.
src/utils/utils.py
- Refactored process_input_data to remove logic for detecting raw text and handling non-existent paths, making it solely responsible for reading content from an existing file path.
- Adjusted type hints for source_path in process_input_data.
- Applied minor formatting changes to str_to_bool and check_ollama_health.
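
The cli.py entries could look roughly like the following. This is a sketch, not the code from the PR: the command body, the help strings, and the use of `dir_okay=False` are assumptions. Only the option names `--source`, `--source_text`, `--config` and the `click.Path(exists=True)` type are taken from the changelog.

```python
import click


@click.command()
@click.option("--source", type=click.Path(exists=True, dir_okay=False),
              default=None, help="Path to an existing input file.")
@click.option("--source_text", type=str, default=None,
              help="Raw text to process instead of a file.")
@click.option("--config", type=click.Path(exists=True), default=None,
              help="Path to an existing configuration file.")
def extract(source, source_text, config):
    """Illustrative stand-in for the extract command."""
    # Reject both-given and neither-given before doing any work.
    if (source is None) == (source_text is None):
        raise click.UsageError("Provide exactly one of --source or --source_text.")
    kind = "file" if source is not None else "text"
    click.echo(f"input kind: {kind}")
```

Here click itself rejects a non-existent `--source` path before the command body runs, so the body only has to enforce that exactly one of the two options was given.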

Activity

- The pull request is currently a Work In Progress (WIP).
- The author, djarecka, is actively seeking feedback from @tekrajchhetri.
- The author noted that some additional formatting changes were introduced by pre-commit hooks, which they might try to remove later, indicating a focus on functional changes first.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

The pull request introduces a separate source_text option and adds path existence checks for the source argument in the CLI, which improves input validation. However, there is a critical bug in StructSenseFlow.__init__ where the variable processed_data is used but its definition has been commented out, leading to a NameError. Additionally, the run_agent command in the CLI remains inconsistent with the updated extract command, and error handling in process_input_data has been reduced. Please address the NameError in app.py as a priority.
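
The failure mode described in this review can be reproduced in isolation. The classes below are a hypothetical reduction, not the code from app.py: they only show what happens when an assignment is commented out while a later line still reads the variable.

```python
class BrokenFlow:
    """Hypothetical reduction of the bug: the assignment is commented out."""

    def __init__(self, input_source: str):
        # processed_data = input_source.strip()   # definition commented out
        self.text = processed_data  # raises NameError when the class is instantiated


class FixedFlow:
    """Same constructor with the name bound before it is used."""

    def __init__(self, input_source: str):
        processed_data = input_source.strip()
        self.text = processed_data


def constructs(cls, value: str) -> bool:
    """Return True if cls(value) succeeds, False if it raises NameError."""
    try:
        cls(value)
        return True
    except NameError:
        return False
```

Note that the module containing `BrokenFlow` still imports cleanly; the error only surfaces when the constructor runs, which is why a test that instantiates the class is the quickest way to catch it.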

src/structsense/app.py

src/structsense/cli.py

src/utils/utils.py

tekrajchhetri · 2026-02-20T16:36:57Z

We do not support batch input (for example, all files in a folder), but we do support raw text content and txt files, all passed through the source argument. Initially we had separate arguments, but we went back to a single one.

djarecka · 2026-02-20T17:56:00Z

@tekrajchhetri - I was asking about the folder since that was in the description, but I will remove it.

Could you please comment on the idea of separating files from text to avoid confusion? If you prefer, we can introduce source_type with options ["file", "text"].
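
For comparison, the `source_type` alternative could be sketched like this. The function name `load_input`, the default value `"file"`, and the error handling are assumptions for illustration; only the parameter name `source_type` and its two values come from the comment above.

```python
from pathlib import Path


def load_input(source: str, source_type: str = "file") -> str:
    """Interpret source as a file path or as raw text, as stated by the caller."""
    if source_type == "text":
        # The caller says this is raw text, so no path check is done.
        return source
    if source_type == "file":
        path = Path(source)
        if not path.is_file():
            raise FileNotFoundError(f"source is not an existing file: {source}")
        return path.read_text(encoding="utf-8")
    raise ValueError(f"source_type must be 'file' or 'text', got {source_type!r}")
```

Either design removes the guesswork: the caller states what the value is, so a mistyped path produces an error rather than being processed as text.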

tekrajchhetri · 2026-02-20T20:13:22Z

@djarecka I have no preference on it. Either way is fine.

codecov-commenter · 2026-02-26T03:44:05Z

Codecov Report

❌ Patch coverage is 52.63158% with 72 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (improvement@9c0c9d3). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
src/utils/utils.py	23.80%	48 Missing ⚠️
src/structsense/app.py	28.57%	15 Missing ⚠️
src/structsense/cli.py	64.00%	9 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff               @@
##             improvement      #67   +/-   ##
==============================================
  Coverage               ?   13.78%           
==============================================
  Files                  ?       21           
  Lines                  ?     4920           
  Branches               ?        0           
==============================================
  Hits                   ?      678           
  Misses                 ?     4242           
  Partials               ?        0
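
The figures in this report are consistent with each other, which can be checked with a few lines of arithmetic. The per-file and overall patch line totals below are not stated in the report; they are inferred from the reported percentages and missing-line counts, so treat them as an assumption.

```python
# Project coverage: hits / lines, as reported in the coverage diff.
hits, lines = 678, 4920
project_pct = round(100 * hits / lines, 2)

# Per-file patch data from the report: (reported percent, missing lines).
files = {
    "src/utils/utils.py": (23.80, 48),
    "src/structsense/app.py": (28.57, 15),
    "src/structsense/cli.py": (64.00, 9),
}

# Inferred changed-line totals per file: missing / (1 - covered fraction).
inferred_totals = {
    name: round(missing / (1 - pct / 100))
    for name, (pct, missing) in files.items()
}

missing_total = sum(missing for _, missing in files.values())

# Overall patch size implied by 52.63158% coverage with 72 lines missing.
patch_lines = round(missing_total / (1 - 52.63158 / 100))
patch_pct = 100 * (patch_lines - missing_total) / patch_lines
```

Under these inferred totals, the three listed files account for 109 of the 152 patch lines, so the remaining 43 lines would sit in files with no missing coverage.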

tekrajchhetri

@djarecka could you also update the readme so that it aligns with the changes made?

djarecka · 2026-03-02T17:56:09Z

Which one? I updated at least one README.

djarecka added 4 commits February 10, 2026 19:06

30eabaa Merge branch 'improvement' of https://github.com/sensein/structsense into ups/improvement

4094d08 Merge branch 'improvement' of https://github.com/sensein/structsense into ups/improvement

2498908 Merge branch 'improvement' of https://github.com/sensein/structsense into ups/improvement

33927b3 [wip] adding source_text option for text and adding check for source if it is an existing path; adding click.Paths to all options that should be existing texts

gemini-code-assist bot reviewed Feb 20, 2026

djarecka added 7 commits February 23, 2026 10:16

d659a70 Merge branch 'improvement' of https://github.com/sensein/structsense into add_source_checks

f5ee673 changing process_input_source to process_files, since it only processes the files that are passed with source argument; removing the text processing from StructSenseFlow and doing the processing before; updating cli.run_agent

edb5fee finishing changes to cli.run_agent

0f335c9 moving back processing to the StructSenseFlow to minimize the changes to api (but keep it in a separate function); adding the same arguments to StructSenseFlow as in cli: source and source_text

36ae0a4 updating docs and tutorial

3eaaafd adding simple tests to check that the source and source_text is processed properly; removing src/tests from gitignore

65f7ec0 update poetry.lock

djarecka changed the title ~~[wip] adding source checks and source_text option~~ adding source checks and source_text option Feb 26, 2026

djarecka requested a review from tekrajchhetri February 26, 2026 03:41

djarecka mentioned this pull request Feb 27, 2026

Add NER tests #79

Open

tekrajchhetri requested changes Mar 2, 2026

djarecka wants to merge 11 commits into sensein:improvement from djarecka:add_source_checks
