Skip to content

feat: new formats detection#58

Merged
fgvieira merged 6 commits intosnakemake:masterfrom
tdayris:compression_formats
Apr 24, 2026
Merged

feat: new formats detection#58
fgvieira merged 6 commits intosnakemake:masterfrom
tdayris:compression_formats

Conversation

@tdayris
Copy link
Copy Markdown
Contributor

@tdayris tdayris commented Apr 24, 2026

This is being related to PR on snakemake-wrappers.

This PR adds new formats detection, allows to return compression format without breaking retro-compatibility.

New formats can easily be added in the future with addition in the dictionaries.

Summary by CodeRabbit

  • Bug Fixes

    • Improved formatting of error messages for invalid memory overhead inputs
  • New Features

    • Enhanced file format detection with better handling of compressed file types and explicit FASTA/FASTQ recognition
    • Added an option to control whether compression suffixes are ignored during format detection

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 24, 2026

Warning

Rate limit exceeded

@tdayris has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 36 minutes and 25 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 36 minutes and 25 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 71103bc1-121a-45b5-8759-bccef0efe475

📥 Commits

Reviewing files that changed from the base of the PR and between 8a24b6e and 74bd4a5.

📒 Files selected for processing (1)
  • snakemake_wrapper_utils/snakemake.py
📝 Walkthrough

Walkthrough

Updated two functions in the snakemake module: get_mem now uses a multi-line f-string for its invalid mem_overhead_factor ValueError message; get_format gains an ignore_compression=True parameter and replaces hardcoded extension checks with dictionary-driven normalization for compression and base formats (FASTA/FASTQ), and uses rstrip(".") for the returned extension.

Changes

Cohort / File(s) Summary
Function Updates
snakemake_wrapper_utils/snakemake.py
get_mem: reformats the ValueError message for invalid mem_overhead_factor using a multi-line f-string. get_format: signature changed to get_format(path, ignore_compression=True); replaces hardcoded extension logic with dictionaries mapping compression suffixes and bioinfo base extensions; conditionally strips compression only when ignore_compression is true; requires base extension for compressed paths; returns normalized extension via rstrip(".").

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'feat: new formats detection' is vague and overly broad, failing to highlight the primary change which is extending get_format() with ignore_compression parameter and refactoring format normalization logic. Improve the title to be more specific about the main change, such as 'feat: add ignore_compression parameter to get_format()' or 'refactor: improve format detection with compression handling'.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
snakemake_wrapper_utils/snakemake.py (1)

53-55: Misplaced docstring.

Line 55 is not actually a docstring — it follows the from pathlib import Path import on line 53, so Python treats it as a plain string expression statement. Tools like help(get_format), Sphinx, and IDEs will not pick it up. Move the string to be the first statement of the function (or convert from pathlib import Path to a module-level import).

♻️ Proposed fix
 def get_format(path, ignore_compression=True):
-    from pathlib import Path
-
     """Get file format from extension, ignoring common compressions on user request"""
+    from pathlib import Path
+
     if not path:
         raise ValueError("Path cannot be empty")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakemake_wrapper_utils/snakemake.py` around lines 53 - 55, The triple-quoted
string after the import is a misplaced docstring and should be moved so tools
can see it: either relocate that string as the first statement inside the
get_format function (making it the function docstring) or convert the `from
pathlib import Path` line into a module-level import and place the descriptive
string at top-of-file as the module docstring; update the get_format function
signature/location if necessary to ensure help(get_format) and Sphinx/IDEs pick
up the docstring.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@snakemake_wrapper_utils/snakemake.py`:
- Line 58: The code lowercases suffixes via "exts = [s.lower() for s in
Path(path).suffixes]" but the compression mapping contains an uppercase key
'".Z": "zlib"', making ".Z" unreachable; update the mapping in the same scope to
use lowercase keys (e.g. replace '".Z"' with '".z"') or add both lowercase and
uppercase variants for the key so the lookup matches the lowercased suffixes,
and while editing verify the intended format label for the classic ".Z"
compression entry (function/variable around the suffixes/mapping logic).
- Around line 83-84: The dictionary compression_fmt contains misspelled entries
for deflate; update it so the canonical extension ".deflate" maps to the value
"deflate" and remove the incorrect keys ".deflat" and ".defalte" (and the
misspelled value "defalte") so callers of compression_fmt receive the correct
"deflate" string; locate the compression_fmt dict in snakemake.py and replace
the two buggy entries with a single correct mapping ".deflate": "deflate".
- Around line 111-113: The branch handling ignore_compression=False currently
maps only the last suffix (exts[-1]) through compression_fmt then bioinfo_fmt,
losing the underlying base format; change the logic to extract both the
compression suffix and the base suffix (e.g., base_ext = exts[-2] if len(exts) >
1 else exts[-1] and comp_ext = exts[-1] when they differ), map base_ext through
bioinfo_fmt and comp_ext through compression_fmt, and return both pieces (e.g.,
a tuple (base_fmt, compression_fmt_or_None) or a small dataclass) instead of a
single string; update any callers of this function to accept the new return
shape.

---

Nitpick comments:
In `@snakemake_wrapper_utils/snakemake.py`:
- Around line 53-55: The triple-quoted string after the import is a misplaced
docstring and should be moved so tools can see it: either relocate that string
as the first statement inside the get_format function (making it the function
docstring) or convert the `from pathlib import Path` line into a module-level
import and place the descriptive string at top-of-file as the module docstring;
update the get_format function signature/location if necessary to ensure
help(get_format) and Sphinx/IDEs pick up the docstring.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f4adef37-3c71-4866-a7f9-264c956b801e

📥 Commits

Reviewing files that changed from the base of the PR and between 53c8556 and da26ef7.

📒 Files selected for processing (1)
  • snakemake_wrapper_utils/snakemake.py

Comment thread snakemake_wrapper_utils/snakemake.py
Comment thread snakemake_wrapper_utils/snakemake.py Outdated
Comment thread snakemake_wrapper_utils/snakemake.py Outdated
Comment thread snakemake_wrapper_utils/snakemake.py
Co-authored-by: Filipe G. Vieira <1151762+fgvieira@users.noreply.github.com>
Comment thread snakemake_wrapper_utils/snakemake.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
snakemake_wrapper_utils/snakemake.py (1)

100-100: Nit: drop the redundant .keys().

in on a dict already checks keys; .keys() builds an intermediate view unnecessarily.

♻️ Proposed refactor
-    if ignore_compression and exts[-1] in compression_fmt.keys():
+    if ignore_compression and exts[-1] in compression_fmt:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakemake_wrapper_utils/snakemake.py` at line 100, The condition uses an
unnecessary .keys() call: replace the membership test exts[-1] in
compression_fmt.keys() with exts[-1] in compression_fmt to avoid constructing
the intermediate keys view; locate the expression in
snakemake_wrapper_utils/snakemake.py (the if statement that checks
ignore_compression and compression_fmt) and update it accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@snakemake_wrapper_utils/snakemake.py`:
- Line 110: The return currently uses ext.rstrip(".") which removes trailing
dots and leaves a leading dot on unknown extensions (ext variable) when neither
compression_fmt nor bioinfo_fmt map it; change the logic to strip a leading dot
instead (use lstrip on ext) so unmapped extensions like ".xyz" become "xyz"
before returning—update the return in the function that looks up compression_fmt
and bioinfo_fmt to return ext.lstrip(".") where ext is the fallback extension.
- Around line 53-55: The triple-quoted string after the import is not a function
docstring; move that documentation so it is the first statement inside the
get_format function (or make it a module-level docstring at the top of the file)
so get_format.__doc__ is populated; specifically, place the existing string
immediately after the def get_format(...) line (before any code or imports
inside the function) and remove the dead string literal after from pathlib
import Path.

---

Nitpick comments:
In `@snakemake_wrapper_utils/snakemake.py`:
- Line 100: The condition uses an unnecessary .keys() call: replace the
membership test exts[-1] in compression_fmt.keys() with exts[-1] in
compression_fmt to avoid constructing the intermediate keys view; locate the
expression in snakemake_wrapper_utils/snakemake.py (the if statement that checks
ignore_compression and compression_fmt) and update it accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 328ca7ca-5500-4795-b374-06532bbd1154

📥 Commits

Reviewing files that changed from the base of the PR and between da26ef7 and 8a24b6e.

📒 Files selected for processing (1)
  • snakemake_wrapper_utils/snakemake.py

Comment thread snakemake_wrapper_utils/snakemake.py Outdated
Comment thread snakemake_wrapper_utils/snakemake.py Outdated
@fgvieira fgvieira merged commit f85d8bf into snakemake:master Apr 24, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants