adding source checks and source_text option#67

Open
djarecka wants to merge 11 commits into sensein:improvement from djarecka:add_source_checks

Conversation

@djarecka
Contributor

@tekrajchhetri - this is just the beginning of the PR, but I want you to know what I'm thinking about and to get your feedback. As mentioned in #62, I believe that if source is not checked to be an existing file path, structsense can end up running on the path string itself instead of the real file by mistake (it definitely happened to me when running the tutorial).

I suggest (and started changing) that:

  • we have a separate argument source_text if we want to provide the text instead of file,
  • check that the path exists when source is provided
  • run process_input_data only for files before passing it to StructSenseFlow
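
A minimal sketch of the three points above, under stated assumptions: `resolve_input` is a hypothetical helper (not a function in structsense), and files are read as plain UTF-8 text, whereas the real process_input_data may handle more formats.

```python
from pathlib import Path
from typing import Optional


def resolve_input(source: Optional[str] = None, source_text: Optional[str] = None) -> str:
    """Return the text to process from exactly one of `source` or `source_text`.

    Hypothetical helper for illustration; not part of structsense.
    """
    if (source is None) == (source_text is None):
        raise ValueError("Provide exactly one of source or source_text.")

    if source_text is not None:
        # Raw text is passed through untouched; it is never interpreted as a path.
        return source_text

    path = Path(source)
    if not path.is_file():
        # Fail loudly instead of silently treating the path string as content.
        raise FileNotFoundError(f"source is not an existing file: {source}")
    return path.read_text(encoding="utf-8")
```

With this split, a mistyped file name passed as `source` raises an error, while the same string passed as `source_text` is processed as text on purpose.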

Also, I noticed that you say that source can be a folder, but I don't think you support this option, am I right?

Btw, I just noticed that there are some additional formatting changes due to pre-commit. I could try to remove them later, but we should definitely run pre-commit on everything once #62 is merged.

@gemini-code-assist
Contributor

Summary of Changes

Hello @djarecka, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors how input sources are handled within the structsense application, particularly for its command-line interface. It introduces a clear distinction between providing raw text directly and specifying a file path, enhancing robustness by validating file existence and streamlining the internal input processing logic. These changes aim to prevent common user errors and improve the overall clarity and reliability of input handling.

Highlights

  • Input Source Handling Refinement: Introduced a dedicated --source_text CLI option for direct text input, separating it from the --source option which now strictly expects a file path.
  • File Path Validation: Enhanced the CLI to validate that the path provided to --source exists, preventing structsense from attempting to process non-existent file names as content.
  • Refactored Input Processing: The process_input_data utility function was streamlined to exclusively handle file-based inputs, with the responsibility of determining input type (file vs. text) and path validation moved to the CLI layer.
  • Code Formatting: Applied extensive code formatting across app.py, cli.py, and utils.py, likely due to pre-commit hooks, improving readability and consistency.
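
The first two highlights can be sketched as a small command-line parser. This is an illustration only: the PR uses click (with click.Path(exists=True) for path options), while the sketch below swaps in the standard-library argparse so it runs without extra dependencies. The names `existing_file`, `build_parser`, `main`, and the program name `extract-sketch` are hypothetical.

```python
import argparse
from pathlib import Path


def existing_file(value: str) -> Path:
    """argparse type: accept only paths that point at an existing file."""
    path = Path(value)
    if not path.is_file():
        raise argparse.ArgumentTypeError(f"file does not exist: {value}")
    return path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="extract-sketch")
    # Exactly one of the two options must be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--source", type=existing_file, help="path to an input file")
    group.add_argument("--source_text", help="raw text to process")
    return parser


def main(argv=None) -> str:
    """Parse the arguments and return the text that would be processed."""
    args = build_parser().parse_args(argv)
    if args.source is not None:
        # Only file input is read from disk; raw text is returned as given.
        return args.source.read_text(encoding="utf-8")
    return args.source_text
```

Calling `main(["--source_text", "some text"])` returns the text unchanged, whereas a non-existent `--source` path, both options together, or neither option ends in a usage error from the parser.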

Changelog
  • src/structsense/app.py
    • Updated StructSenseFlow.__init__ to accept source_text and commented out the direct call to process_input_data for input_source.
    • Applied extensive code formatting, including line breaks and spacing, to various sections.
  • src/structsense/cli.py
    • Modified the extract command to introduce a new --source_text option for direct text input.
    • Changed the --source option to specifically accept a file path and added validation to ensure only one of --source or --source_text is provided.
    • Integrated process_input_data call conditionally for file-based --source input.
    • Updated option types for --config and --env_file to click.Path(exists=True).
    • Applied formatting adjustments to function signatures and imports.
  • src/utils/utils.py
    • Refactored process_input_data to remove logic for detecting raw text and handling non-existent paths, making it solely responsible for reading content from an existing file path.
    • Adjusted type hints for source_path in process_input_data.
    • Applied minor formatting changes to str_to_bool and check_ollama_health.
Activity
  • The pull request is currently a Work In Progress (WIP).
  • The author, djarecka, is actively seeking feedback from @tekrajchhetri.
  • The author noted that some additional formatting changes were introduced by pre-commit hooks, which they might try to remove later, indicating a focus on functional changes first.
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces a separate source_text option and adds path existence checks for the source argument in the CLI, which improves input validation. However, there is a critical bug in StructSenseFlow.__init__ where the variable processed_data is used but its definition has been commented out, leading to a NameError. Additionally, the run_agent command in the CLI remains inconsistent with the updated extract command, and error handling in process_input_data has been reduced. Please address the NameError in app.py as a priority.
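
The failure mode described in this review can be reproduced in isolation. The sketch below is hypothetical (the function names are invented and it is not the code from app.py): it shows that Python raises the NameError only when the function is called, which is why a simple smoke test that exercises the constructor would catch it.

```python
def init_with_bug(input_source: str) -> dict:
    # processed_data = process_input_data(input_source)  # binding commented out
    return {"input": processed_data}  # noqa: F821 - undefined name, fails when called


def init_fixed(input_text: str) -> dict:
    # The caller has already turned the file or raw text into a string,
    # so the name is bound before it is used.
    processed_data = input_text
    return {"input": processed_data}


def raises_name_error(func, *args) -> bool:
    """Return True if calling func(*args) raises NameError."""
    try:
        func(*args)
    except NameError:
        return True
    return False
```

Defining `init_with_bug` succeeds without complaint; only `raises_name_error(init_with_bug, "x")` reveals the problem, while the same check on `init_fixed` reports no error.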

@tekrajchhetri
Collaborator

We do not support batch content (for example, all files in a folder), but we do support raw text content as well as .txt files, all passed with the source argument. Initially we had separate arguments, but we went back to a single one.

@djarecka
Contributor Author

@tekrajchhetri - I was asking about the folder since that was in the description, but I will remove it.

Could you please comment on the idea of separating files from text to avoid confusion? If you prefer, we can introduce source_type with options ["file", "text"].
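
A sketch of what the source_type alternative could look like, assuming a hypothetical helper `load_input`, plain UTF-8 text files, and "file" as the default (the default is an assumption, not something decided in this thread).

```python
from pathlib import Path
from typing import Literal

SourceType = Literal["file", "text"]


def load_input(source: str, source_type: SourceType = "file") -> str:
    """Hypothetical helper: interpret `source` according to an explicit `source_type`."""
    if source_type == "text":
        # The string is the content itself.
        return source
    if source_type == "file":
        path = Path(source)
        if not path.is_file():
            raise FileNotFoundError(f"source is not an existing file: {source}")
        return path.read_text(encoding="utf-8")
    raise ValueError(f"source_type must be 'file' or 'text', got {source_type!r}")
```

Either design makes the caller state the intent explicitly; the difference is only whether that intent is carried by the argument name or by a second argument.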

@tekrajchhetri
Collaborator

@djarecka I have no preference on it. Either way is fine.

…es the files that are passed with source argument; removing the text processing from StructSenseFlow and doing the processing before; updating cli.run_agent
… to api (but keep it in a separate function); adding the same arguments to StructSenseFlow as in cli: source and source_text
…ssed properly; removing src/tests from gitignore
@djarecka djarecka changed the title from "[wip] adding source checks and source_text option" to "adding source checks and source_text option" Feb 26, 2026
@codecov-commenter

Codecov Report

❌ Patch coverage is 52.63158% with 72 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (improvement@9c0c9d3). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/utils/utils.py | 23.80% | 48 Missing ⚠️ |
| src/structsense/app.py | 28.57% | 15 Missing ⚠️ |
| src/structsense/cli.py | 64.00% | 9 Missing ⚠️ |

Additional details and impacted files
@@              Coverage Diff               @@
##             improvement      #67   +/-   ##
==============================================
  Coverage               ?   13.78%           
==============================================
  Files                  ?       21           
  Lines                  ?     4920           
  Branches               ?        0           
==============================================
  Hits                   ?      678           
  Misses                 ?     4242           
  Partials               ?        0           

@djarecka djarecka mentioned this pull request Feb 27, 2026
Collaborator

@tekrajchhetri tekrajchhetri left a comment

@djarecka could you also update the readme so that it aligns with the changes made?

@djarecka
Contributor Author

djarecka commented Mar 2, 2026

@djarecka could you also update the readme so that it aligns with the changes made?

Which one? I updated at least one README.
