Skip to content

feat: Read and write variant files#57

Closed
tdayris wants to merge 5 commits intosnakemake:masterfrom
tdayris:feat_read_write_vcf
Closed

feat: Read and write variant files#57
tdayris wants to merge 5 commits intosnakemake:masterfrom
tdayris:feat_read_write_vcf

Conversation

@tdayris
Copy link
Copy Markdown
Contributor

@tdayris tdayris commented Mar 24, 2026

This is a draft for #55

Tools like picard don't handle writing to stderr, so I chose to use mkfifo instead of pipes.

Does this mean bcftools would have to be included in snakemake-wrappers-utils ?

Usage within wrappers would look like:

from snakemake_wrapper_utils.bcftools import get_bcftools_opts, read_write_variants

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
read_write_commands, input_file, output_file = read_write_variants(snakemake)

shell(
    "{read_write_commands} "
    "picard RenameSampleInVcf {snakemake.params.extra} "
    "INPUT={input_file} OUTPUT={output_file} {log}"
)

Usage within Snakemake would be something like:

rule test_picard_renamesampleinvcf:
    input:
        call="a.bcf",
    output:
       call="renamed.bcf",
    threads: 3
    log: 
        "test_picard_renamesampleinvcf.log",
    params:
       extra="",
    wrapper:
        "master/bio/picard/renamesampleinvcf"

Do you think it's worth testing?

Summary by CodeRabbit

  • New Features

    • Added streaming support for variant data via named pipes, handling compressed (gz) and BCF formats and returning the appropriate paths/commands; includes automatic thread and resource validation for safe parallel use.
  • Bug Fixes

    • Corrected GATK argument-file option naming so generated commands use the correct plural form for argument files.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 24, 2026

📝 Walkthrough

Walkthrough

Adds read_write_variants(snakemake, variant_key_name="call") to build bcftools mkfifo/streaming commands, resolve input/output keys, and enforce minimum thread counts; updates GATK option handling to use --arguments_file instead of --argument_file.

Changes

Cohort / File(s) Summary
BCFTools Variant Streaming
snakemake_wrapper_utils/bcftools.py
Added read_write_variants() returning (command_lines, input_file_name, output_file_name). Resolves input/output by key or index, enforces snakemake.threads >= min_threads, creates named pipes for .gz/.bcf inputs/outputs, and appends mkfifo + bcftools view streaming commands; adjusts required threads accordingly.
GATK Argument File Naming
snakemake_wrapper_utils/gatk.py
Replaced --argument_file with --arguments_file in validation messages and when joining multiple argument-file entries used to build GATK command-line options.

Sequence Diagram(s)

sequenceDiagram
    participant Snakemake
    participant read_write_variants
    participant FS as Filesystem
    participant BC as bcftools

    Snakemake->>read_write_variants: call with snakemake, variant_key_name
    read_write_variants->>read_write_variants: resolve input/output paths and compute min_threads
    read_write_variants->>read_write_variants: validate snakemake.threads >= min_threads
    alt input is .gz or .bcf
        read_write_variants->>FS: mkfifo <input_fifo>
        FS-->>read_write_variants: fifo ready
        read_write_variants->>BC: start "bcftools view ... > <input_fifo> &"
    end
    alt output is .gz or .bcf
        read_write_variants->>FS: mkfifo <output_fifo>
        FS-->>read_write_variants: fifo ready
        read_write_variants->>BC: start "bcftools view <output_fifo> ... > <output> &"
    end
    read_write_variants-->>Snakemake: return (command_lines, input_file_name, output_file_name)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main addition: a new function for reading and writing variant files in VCF/BCF format.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
snakemake_wrapper_utils/bcftools.py (3)

161-161: Consider unique FIFO names to prevent conflicts in parallel execution.

The hardcoded FIFO names (snakemake_wrapper_utils_read_variants.vcf and snakemake_wrapper_utils_write_variants.vcf) could cause conflicts when multiple jobs run concurrently in the same working directory. Consider using unique identifiers (e.g., based on output filename or a UUID).

♻️ Suggested approach using output filename for uniqueness
+import os
+import uuid
...
     if str(in_call).endswith((".gz", ".bcf")):
-        input_file_name = "snakemake_wrapper_utils_read_variants.vcf"
+        unique_id = uuid.uuid4().hex[:8]
+        input_file_name = f"snakemake_wrapper_utils_read_variants_{unique_id}.vcf"
...
     if str(out_call).endswith((".gz", ".bcf")):
-        output_file_name = "snakemake_wrapper_utils_write_variants.vcf"
+        unique_id = uuid.uuid4().hex[:8]
+        output_file_name = f"snakemake_wrapper_utils_write_variants_{unique_id}.vcf"

Also applies to: 181-181

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakemake_wrapper_utils/bcftools.py` at line 161, The hardcoded FIFO names
(snakemake_wrapper_utils_read_variants.vcf and
snakemake_wrapper_utils_write_variants.vcf) can collide in parallel runs; change
how input_file_name and the corresponding write FIFO name are generated to
include a unique suffix (e.g., a UUID or a sanitized version of the output
filename) so each job gets distinct FIFO names, use that unique name when
creating the FIFO (os.mkfifo) and when opening/cleaning it, and ensure any
cleanup logic removes the job-specific FIFOs on exit.

149-150: Docstring is incomplete.

The docstring mentions only reading, but the function also handles writing compressed variants. Consider updating to reflect the full functionality.

📝 Suggested docstring
 def read_write_variants(snakemake, variant_key_name="call"):
-    """Obtain command lines to read gzipped VCF or BCF files and path to named pipes"""
+    """Obtain command lines to read/write gzipped VCF or BCF files via named pipes.
+    
+    Returns a tuple of (command_lines, input_file_name, output_file_name) where
+    command_lines contains mkfifo and bcftools commands to be prepended to the
+    shell command, and the file names are either the original paths or FIFO paths.
+    """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakemake_wrapper_utils/bcftools.py` around lines 149 - 150, The docstring
for read_write_variants is incomplete because it only mentions reading; update
the docstring to document both reading and writing of gzipped VCF/BCF (including
named pipe creation), describe the parameters (snakemake,
variant_key_name="call"), explain the return values (command lines for reading
and writing compressed variants and paths to any created named pipes), and note
expected behavior for input formats (VCF/BCF) and compression handling so
callers know how to use the function.

173-176: Consider adding FIFO cleanup to prevent orphan files.

The generated commands create FIFOs but don't clean them up after use. Consider adding cleanup commands (e.g., rm -f at the end or using a trap).

Also applies to: 186-189

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakemake_wrapper_utils/bcftools.py` around lines 173 - 176, The mkfifo usage
in the command construction (where command_lines is appended with "mkfifo
{input_file_name} ; bcftools view {bcftools_opts} {in_call} > {input_file_name}
&") leaves FIFOs behind; modify the command generation to ensure the FIFO named
by input_file_name is removed after use (either append a synchronous cleanup "rm
-f {input_file_name}" once the bcftools process completes or add a shell trap
that removes the FIFO on EXIT). Update both the instance at the shown block and
the similar block around lines 186-189 so that input_file_name is always cleaned
up even on error or interruption, and ensure background process management is
handled so cleanup runs after bcftools finishes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@snakemake_wrapper_utils/bcftools.py`:
- Line 152: The line uses a type-annotation instead of assignment for the
variable min_threads (min_threads: 1), so min_threads is never defined; change
it to an actual assignment (min_threads = 1) so the variable exists when
referenced later (e.g., where min_threads is used) and ensure this assignment is
placed before any uses in the module or function that reference min_threads.
- Around line 178-180: The output handling currently does not validate the
retrieved value from snakemake.output.get(variant_key_name): check that out_call
(and thus output_file_name) is not None like the input handling does; if
snakemake.output.get(variant_key_name) returns None, raise a descriptive error
(e.g., ValueError or RuntimeError) indicating the missing output key/filename so
downstream code doesn't receive None, and then proceed with the existing check
for str(out_call).endswith((".gz", ".bcf")).

---

Nitpick comments:
In `@snakemake_wrapper_utils/bcftools.py`:
- Line 161: The hardcoded FIFO names (snakemake_wrapper_utils_read_variants.vcf
and snakemake_wrapper_utils_write_variants.vcf) can collide in parallel runs;
change how input_file_name and the corresponding write FIFO name are generated
to include a unique suffix (e.g., a UUID or a sanitized version of the output
filename) so each job gets distinct FIFO names, use that unique name when
creating the FIFO (os.mkfifo) and when opening/cleaning it, and ensure any
cleanup logic removes the job-specific FIFOs on exit.
- Around line 149-150: The docstring for read_write_variants is incomplete
because it only mentions reading; update the docstring to document both reading
and writing of gzipped VCF/BCF (including named pipe creation), describe the
parameters (snakemake, variant_key_name="call"), explain the return values
(command lines for reading and writing compressed variants and paths to any
created named pipes), and note expected behavior for input formats (VCF/BCF) and
compression handling so callers know how to use the function.
- Around line 173-176: The mkfifo usage in the command construction (where
command_lines is appended with "mkfifo {input_file_name} ; bcftools view
{bcftools_opts} {in_call} > {input_file_name} &") leaves FIFOs behind; modify
the command generation to ensure the FIFO named by input_file_name is removed
after use (either append a synchronous cleanup "rm -f {input_file_name}" once
the bcftools process completes or add a shell trap that removes the FIFO on
EXIT). Update both the instance at the shown block and the similar block around
lines 186-189 so that input_file_name is always cleaned up even on error or
interruption, and ensure background process management is handled so cleanup
runs after bcftools finishes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 516c90cc-92f0-4c9c-9173-bfa92cbed0df

📥 Commits

Reviewing files that changed from the base of the PR and between 53c8556 and 6b1b8cd.

📒 Files selected for processing (2)
  • snakemake_wrapper_utils/bcftools.py
  • snakemake_wrapper_utils/gatk.py

Comment thread snakemake_wrapper_utils/bcftools.py Outdated
Comment thread snakemake_wrapper_utils/bcftools.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
snakemake_wrapper_utils/bcftools.py (1)

197-201: Consider adding FIFO cleanup to returned commands.

The named pipes created by mkfifo are not cleaned up. If the calling code or bcftools fails, orphaned FIFOs will remain in the filesystem. Consider appending cleanup commands or documenting that callers should handle cleanup.

💡 Example approach: return cleanup commands separately or document requirement

One option is to add a trap for cleanup in the command string:

# At the start of command_lines when FIFOs are used:
cleanup_files = []
# ... when creating FIFOs, track them in cleanup_files ...

# Prepend trap to command_lines:
if cleanup_files:
    cleanup_cmd = " ".join(f"rm -f {f}" for f in cleanup_files)
    command_lines = f"trap '{cleanup_cmd}' EXIT ; " + command_lines
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakemake_wrapper_utils/bcftools.py` around lines 197 - 201, The returned
command_lines currently creates named pipes (FIFOs) but does not ensure they are
removed on exit; modify the code that constructs command_lines in bcftools.py to
track all created FIFO paths (e.g., collect into a cleanup_files list when
mkfifo is added) and prepend a shell trap (or append cleanup commands) so that
"rm -f" is executed on those FIFO paths on EXIT or ERR; ensure the final return
still returns command_lines, input_file_name, output_file_name but with the
trap/cleanup integrated (or alternatively return a separate cleanup command
string alongside command_lines) and reference the existing variables
command_lines, input_file_name, output_file_name when implementing the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@snakemake_wrapper_utils/bcftools.py`:
- Around line 186-195: get_bcftools_opts currently hardcodes snakemake.output[0]
for the --output flag which causes bcftools to write to the wrong file when the
variant output is not the first output; fix by resolving the actual variant
output path using variant_key_name (e.g., variant_output =
snakemake.output[variant_key_name]) and ensure that value is passed into or used
by get_bcftools_opts (or override its --output inside the caller before building
bcftools_opts) so bcftools writes to the intended variant file; update the call
site around this block (where get_bcftools_opts(snakemake) is invoked) to use
the variant_key_name-resolved output instead of snakemake.output[0].
- Around line 160-176: The hardcoded FIFO name stored in input_file_name and
created with mkfifo causes collisions and mkfifo failures; change this to
generate a unique FIFO path (e.g., use tempfile.mkstemp/mktemp or a UUID
combined with snakemake.rule or snakemake.jobid) before calling mkfifo, update
the mkfifo/redirect command built into command_lines to reference that unique
name, and ensure existing stale FIFOs are detected and unlinked before creating
(or create atomically via tempfile APIs) so functions using input_file_name, the
mkfifo command and the bcftools view redirect remain correct; keep
get_bcftools_opts and the rest of the command construction unchanged except for
replacing the fixed input_file_name with the generated unique path.

---

Nitpick comments:
In `@snakemake_wrapper_utils/bcftools.py`:
- Around line 197-201: The returned command_lines currently creates named pipes
(FIFOs) but does not ensure they are removed on exit; modify the code that
constructs command_lines in bcftools.py to track all created FIFO paths (e.g.,
collect into a cleanup_files list when mkfifo is added) and prepend a shell trap
(or append cleanup commands) so that "rm -f" is executed on those FIFO paths on
EXIT or ERR; ensure the final return still returns command_lines,
input_file_name, output_file_name but with the trap/cleanup integrated (or
alternatively return a separate cleanup command string alongside command_lines)
and reference the existing variables command_lines, input_file_name,
output_file_name when implementing the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 858eb9ea-da56-4985-a31f-5d2a09f2d916

📥 Commits

Reviewing files that changed from the base of the PR and between 6b1b8cd and b6d9977.

📒 Files selected for processing (1)
  • snakemake_wrapper_utils/bcftools.py

Comment thread snakemake_wrapper_utils/bcftools.py
Comment thread snakemake_wrapper_utils/bcftools.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
snakemake_wrapper_utils/bcftools.py (2)

164-179: ⚠️ Potential issue | 🟠 Major

FIFO naming can still collide across parallel jobs using the same variant paths.

Line 164 and Line 192 derive FIFO paths deterministically from in_call/out_call. Parallel jobs operating on the same file path can race on mkfifo or fail due to stale FIFOs from prior runs.

🔧 Proposed fix (unique FIFO names)
+import uuid
@@
-        input_file_name = f"{in_call}.snakemake_wrapper_utils_read_variants.vcf"
+        input_file_name = (
+            f"{in_call}.{uuid.uuid4().hex}.snakemake_wrapper_utils_read_variants.vcf"
+        )
@@
-        output_file_name = f"{out_call}.snakemake_wrapper_utils_write_variants.vcf"
+        output_file_name = (
+            f"{out_call}.{uuid.uuid4().hex}.snakemake_wrapper_utils_write_variants.vcf"
+        )

Also applies to: 192-200

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakemake_wrapper_utils/bcftools.py` around lines 164 - 179, The FIFO name is
derived deterministically from in_call/out_call causing collisions; change
creation of input_file_name/output_file_name to use unique temporary FIFO names
(e.g., include a UUID or process PID+random suffix or use tempfile to generate a
unique path in /tmp) and update the mkfifo command usage that builds
command_lines (the f"mkfifo {input_file_name} ; bcftools view ..." and the
analogous block for out_call) to use that unique name; also ensure the FIFO is
removed/cleaned up after use.

195-200: ⚠️ Potential issue | 🟠 Major

Output routing still depends on snakemake.output[0] instead of resolved out_call.

At Line 196, get_bcftools_opts(snakemake) injects --output/--output-type from snakemake.output[0], which can mismatch variant_key_name and write to the wrong file.

🔧 Proposed fix
-        # This time, we include threading and output formats
-        bcftools_opts = get_bcftools_opts(snakemake)
+        # Keep shared options, but bind output to resolved variant target
+        bcftools_opts = get_bcftools_opts(
+            snakemake,
+            parse_output=False,
+            parse_output_format=False,
+        )
+        out_format = infer_out_format(
+            out_call, snakemake.params.get("uncompressed_bcf", False)
+        )
         command_lines += str(
             f"mkfifo {output_file_name} ; "
-            f"bcftools view {bcftools_opts} {output_file_name} & "
+            f"bcftools view {bcftools_opts} --output {out_call} --output-type {out_format} {output_file_name} & "
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakemake_wrapper_utils/bcftools.py` around lines 195 - 200, The bcftools
invocation is built using get_bcftools_opts(snakemake) which pulls
--output/--output-type from snakemake.output[0] and can write to the wrong file;
update the call site so bcftools_opts is built or overridden to target the
resolved out_call/output_file_name (or remove any --output flags produced by
get_bcftools_opts and append "--output-type z -o {output_file_name}" explicitly)
— modify the symbols get_bcftools_opts, bcftools_opts, and the command
construction that uses output_file_name/out_call so the final bcftools command
writes to output_file_name (matching variant_key_name) rather than
snakemake.output[0].
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@snakemake_wrapper_utils/bcftools.py`:
- Around line 151-154: The code incorrectly treats any integer variant_key_name
as selecting snakemake.input[0]; update the logic in the block that sets in_call
(and the analogous block that sets out_call) to index snakemake.input and
snakemake.output using the provided integer variant_key_name (e.g.,
snakemake.input[variant_key_name] and snakemake.output[variant_key_name])
instead of hardcoding 0, and add a simple bounds check to raise a clear error if
the provided integer is out of range; refer to the variables variant_key_name,
in_call, out_call, snakemake.input, and snakemake.output to locate the two
affected sections.

---

Duplicate comments:
In `@snakemake_wrapper_utils/bcftools.py`:
- Around line 164-179: The FIFO name is derived deterministically from
in_call/out_call causing collisions; change creation of
input_file_name/output_file_name to use unique temporary FIFO names (e.g.,
include a UUID or process PID+random suffix or use tempfile to generate a unique
path in /tmp) and update the mkfifo command usage that builds command_lines (the
f"mkfifo {input_file_name} ; bcftools view ..." and the analogous block for
out_call) to use that unique name; also ensure the FIFO is removed/cleaned up
after use.
- Around line 195-200: The bcftools invocation is built using
get_bcftools_opts(snakemake) which pulls --output/--output-type from
snakemake.output[0] and can write to the wrong file; update the call site so
bcftools_opts is built or overridden to target the resolved
out_call/output_file_name (or remove any --output flags produced by
get_bcftools_opts and append "--output-type z -o {output_file_name}" explicitly)
— modify the symbols get_bcftools_opts, bcftools_opts, and the command
construction that uses output_file_name/out_call so the final bcftools command
writes to output_file_name (matching variant_key_name) rather than
snakemake.output[0].

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 22992318-fdab-4f5c-937a-7db10d4dda74

📥 Commits

Reviewing files that changed from the base of the PR and between b6d9977 and 8d306b3.

📒 Files selected for processing (1)
  • snakemake_wrapper_utils/bcftools.py

Comment on lines +151 to +154
if isinstance(variant_key_name, int):
in_call = snakemake.input[0]
else:
in_call = snakemake.input.get(variant_key_name)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

variant_key_name integer path is ignored (always index 0).

Line 152 and Line 182 always read index 0, so passing variant_key_name=1 (or any other int) silently selects the wrong input/output.

🔧 Proposed fix
-    if isinstance(variant_key_name, int):
-        in_call = snakemake.input[0]
+    if isinstance(variant_key_name, int):
+        in_call = snakemake.input[variant_key_name]
@@
-    if isinstance(variant_key_name, int):
-        out_call = snakemake.output[0]
+    if isinstance(variant_key_name, int):
+        out_call = snakemake.output[variant_key_name]

Also applies to: 181-184

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakemake_wrapper_utils/bcftools.py` around lines 151 - 154, The code
incorrectly treats any integer variant_key_name as selecting snakemake.input[0];
update the logic in the block that sets in_call (and the analogous block that
sets out_call) to index snakemake.input and snakemake.output using the provided
integer variant_key_name (e.g., snakemake.input[variant_key_name] and
snakemake.output[variant_key_name]) instead of hardcoding 0, and add a simple
bounds check to raise a clear error if the provided integer is out of range;
refer to the variables variant_key_name, in_call, out_call, snakemake.input, and
snakemake.output to locate the two affected sections.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
snakemake_wrapper_utils/bcftools.py (3)

151-154: ⚠️ Potential issue | 🟠 Major

Use the integer key when resolving the input.

Line 152 still ignores variant_key_name; variant_key_name=1 reads snakemake.input[0] but writes snakemake.output[1], which can pair the wrong input with the requested output.

🐛 Proposed fix
     if isinstance(variant_key_name, int):
-        in_call = snakemake.input[0]
+        in_call = snakemake.input[variant_key_name]
     else:
         in_call = snakemake.input.get(variant_key_name)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakemake_wrapper_utils/bcftools.py` around lines 151 - 154, The code that
resolves the input wrongly uses index 0 when variant_key_name is an int; change
the branch so that when variant_key_name is an int you index snakemake.input
with that integer (use snakemake.input[variant_key_name]) instead of always
using 0, leaving the else branch to use snakemake.input.get(variant_key_name)
for non-int keys; update the assignment to in_call accordingly to ensure the
input selected matches the output index.

196-196: ⚠️ Potential issue | 🟠 Major

Write compressed output to the resolved variant output, not output[0].

Line 196 calls get_bcftools_opts(snakemake), which builds --output and --output-type from snakemake.output[0]. If variant_key_name points to any other output, bcftools writes the converted stream to the wrong file.

🐛 Proposed fix
         min_threads += 1
 
         # This time, we include threading and output formats
-        bcftools_opts = get_bcftools_opts(snakemake)
+        bcftools_opts = get_bcftools_opts(
+            snakemake,
+            parse_output=False,
+            parse_output_format=False,
+        )
+        out_format = infer_out_format(
+            str(out_call), snakemake.params.get("uncompressed_bcf", False)
+        )
+        bcftools_opts += f" --output {out_call} --output-type {out_format}"
         command_lines += str(

Read-only check:

#!/bin/bash
rg -n -C3 'def get_bcftools_opts|--output \{snakemake\.output\[0\]\}|bcftools_opts = get_bcftools_opts\(snakemake\)' snakemake_wrapper_utils/bcftools.py
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakemake_wrapper_utils/bcftools.py` at line 196, get_bcftools_opts builds
--output/--output-type using snakemake.output[0], so bcftools_opts will always
point to output[0] even when the desired resolved variant file is a different
key (variant_key_name); change the call site to pass the correct resolved output
path (the value at snakemake.output[variant_key_name] or the
resolved_variant_path variable) into get_bcftools_opts or add an argument to
get_bcftools_opts to accept the target path, and ensure bcftools_opts is
constructed from that target rather than snakemake.output[0] so the compressed
output is written to the intended variant output file (update references to
bcftools_opts and any invocation of get_bcftools_opts accordingly).

163-164: ⚠️ Potential issue | 🟠 Major

Make FIFO paths unique per invocation.

Deriving FIFO names from in_call/out_call still collides when parallel jobs use the same variant path, and stale FIFOs from failed runs can break the next attempt.

🛡️ Proposed fix using unique FIFO names
+import os
 import sys
+import tempfile
+from uuid import uuid4
 from snakemake_wrapper_utils.snakemake import get_mem, is_arg
@@
+def _variant_fifo_path(direction):
+    return os.path.join(
+        tempfile.gettempdir(),
+        f"snakemake_wrapper_utils_{direction}_{uuid4().hex}.vcf",
+    )
+
+
 def read_write_variants(snakemake, variant_key_name="call"):
@@
     input_file_name = in_call
     if str(in_call).endswith((".gz", ".bcf")):
-        input_file_name = f"{in_call}.snakemake_wrapper_utils_read_variants.vcf"
+        input_file_name = _variant_fifo_path("read_variants")
@@
     if str(out_call).endswith((".gz", ".bcf")):
-        output_file_name = f"{out_call}.snakemake_wrapper_utils_write_variants.vcf"
+        output_file_name = _variant_fifo_path("write_variants")

You can verify the remaining deterministic FIFO construction with:

#!/bin/bash
rg -n -C2 'snakemake_wrapper_utils_(read|write)_variants\.vcf|mkfifo' snakemake_wrapper_utils/bcftools.py

Also applies to: 191-192

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakemake_wrapper_utils/bcftools.py` around lines 163 - 164, The FIFO
filenames derived from in_call/out_call are deterministic and can collide across
parallel jobs or with stale FIFOs; change the construction of
input_file_name/output_file_name (the
".snakemake_wrapper_utils_read_variants.vcf" and
".snakemake_wrapper_utils_write_variants.vcf" names) to include a unique
per-invocation suffix (e.g. uuid.uuid4() or a combination of os.getpid() and a
timestamp) so each mkfifo target is unique, and add existence checks/cleanup
before creating the FIFO; apply the same change to the analogous code around the
output creation (the lines referenced at 191-192).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@snakemake_wrapper_utils/bcftools.py`:
- Around line 151-154: The code that resolves the input wrongly uses index 0
when variant_key_name is an int; change the branch so that when variant_key_name
is an int you index snakemake.input with that integer (use
snakemake.input[variant_key_name]) instead of always using 0, leaving the else
branch to use snakemake.input.get(variant_key_name) for non-int keys; update the
assignment to in_call accordingly to ensure the input selected matches the
output index.
- Line 196: get_bcftools_opts builds --output/--output-type using
snakemake.output[0], so bcftools_opts will always point to output[0] even when
the desired resolved variant file is a different key (variant_key_name); change
the call site to pass the correct resolved output path (the value at
snakemake.output[variant_key_name] or the resolved_variant_path variable) into
get_bcftools_opts or add an argument to get_bcftools_opts to accept the target
path, and ensure bcftools_opts is constructed from that target rather than
snakemake.output[0] so the compressed output is written to the intended variant
output file (update references to bcftools_opts and any invocation of
get_bcftools_opts accordingly).
- Around line 163-164: The FIFO filenames derived from in_call/out_call are
deterministic and can collide across parallel jobs or with stale FIFOs; change
the construction of input_file_name/output_file_name (the
".snakemake_wrapper_utils_read_variants.vcf" and
".snakemake_wrapper_utils_write_variants.vcf" names) to include a unique
per-invocation suffix (e.g. uuid.uuid4() or a combination of os.getpid() and a
timestamp) so each mkfifo target is unique, and add existence checks/cleanup
before creating the FIFO; apply the same change to the analogous code around the
output creation (the lines referenced at 191-192).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c688ce7a-d9c5-40ca-a201-4f8e6576c087

📥 Commits

Reviewing files that changed from the base of the PR and between 8d306b3 and d86c7ba.

📒 Files selected for processing (1)
  • snakemake_wrapper_utils/bcftools.py

Comment on lines +176 to +179
command_lines += str(
f"mkfifo {input_file_name} ; "
f"bcftools view {bcftools_opts} {in_call} > {input_file_name} & "
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

{prefix} tool_using_fifos ... ; wait

Repository: snakemake/snakemake-wrapper-utils

Length of output: 123


🏁 Script executed:

#!/bin/bash
rg -n -C4 '\bread_write_variants\s*\('
rg -n -C4 '\bwait\b'

Repository: snakemake/snakemake-wrapper-utils

Length of output: 702


🏁 Script executed:

# Get the file structure and lines in question
head -220 snakemake_wrapper_utils/bcftools.py | tail -80

Repository: snakemake/snakemake-wrapper-utils

Length of output: 2867


🏁 Script executed:

# Find all callers of read_write_variants
rg -n 'read_write_variants' --type py

Repository: snakemake/snakemake-wrapper-utils

Length of output: 177


🏁 Script executed:

# Search in snakemake wrappers repo for usage patterns
git log --oneline --all | head -5

Repository: snakemake/snakemake-wrapper-utils

Length of output: 310


🏁 Script executed:

# Search for all references to read_write_variants across the repo
rg 'read_write_variants' -A 10 -B 2

Repository: snakemake/snakemake-wrapper-utils

Length of output: 1001


🏁 Script executed:

# Check if this is exported in __init__.py or similar
find . -name "*.py" -type f | xargs grep -l "read_write_variants"

Repository: snakemake/snakemake-wrapper-utils

Length of output: 114


🏁 Script executed:

# Check documentation or examples
find . -name "README*" -o -name "*.md" | xargs grep -l "read_write_variants" 2>/dev/null || echo "No markdown found"

Repository: snakemake/snakemake-wrapper-utils

Length of output: 94


🏁 Script executed:

# Check if this is published/used in snakemake-wrappers
curl -s "https://raw.githubusercontent.com/snakemake/snakemake-wrappers/master/bio/bcftools/index/wrapper.smk" | head -50 || echo "Could not fetch"

Repository: snakemake/snakemake-wrapper-utils

Length of output: 91


🏁 Script executed:

# Look at the complete function to understand the intended usage pattern
sed -n '149,210p' snakemake_wrapper_utils/bcftools.py

Repository: snakemake/snakemake-wrapper-utils

Length of output: 2221


🏁 Script executed:

# Check if there's any documentation on how command_lines should be used
cat snakemake_wrapper_utils/bcftools.py | head -30

Repository: snakemake/snakemake-wrapper-utils

Length of output: 849


🏁 Script executed:

# Look for test files or examples
find . -type f -name "*test*" -o -name "*example*" | head -20

Repository: snakemake/snakemake-wrapper-utils

Length of output: 146


🏁 Script executed:

# Check git history for usage context
git log --all --oneline -- snakemake_wrapper_utils/bcftools.py | head -10

Repository: snakemake/snakemake-wrapper-utils

Length of output: 812


🏁 Script executed:

# Search in the entire git history for references to read_write_variants
git log -p --all -S "read_write_variants" | head -200

Repository: snakemake/snakemake-wrapper-utils

Length of output: 59


🏁 Script executed:

# Check __init__.py to see what's exported
cat snakemake_wrapper_utils/__init__.py 2>/dev/null || echo "No __init__.py or empty"

Repository: snakemake/snakemake-wrapper-utils

Length of output: 98


🏁 Script executed:

# Check the test file for usage patterns
cat tests/test_snakemake_wrapper_utils.py

Repository: snakemake/snakemake-wrapper-utils

Length of output: 180


🏁 Script executed:

# Check if there are any examples in comments or docstrings
rg -A 20 "read_write_variants" snakemake_wrapper_utils/bcftools.py

Repository: snakemake/snakemake-wrapper-utils

Length of output: 843


🏁 Script executed:

# Look at the git commit that introduced read_write_variants
git show c6f0fa7 -- snakemake_wrapper_utils/bcftools.py | head -100

Repository: snakemake/snakemake-wrapper-utils

Length of output: 2991


🏁 Script executed:

# Check recent commits to understand context better
git log --oneline -20

Repository: snakemake/snakemake-wrapper-utils

Length of output: 103


🏁 Script executed:

# See if there's any documentation or README
cat README.md 2>/dev/null || echo "No README"

Repository: snakemake/snakemake-wrapper-utils

Length of output: 367


🏁 Script executed:

# Look at the full context of how command_lines is assembled
sed -n '149,220p' snakemake_wrapper_utils/bcftools.py | cat -n

Repository: snakemake/snakemake-wrapper-utils

Length of output: 2627


🏁 Script executed:

# Check the complete docstring and surrounding context
sed -n '149,160p' snakemake_wrapper_utils/bcftools.py

Repository: snakemake/snakemake-wrapper-utils

Length of output: 532


🏁 Script executed:

# Search for any usage documentation or patterns in comments
rg -B 5 -A 30 "def read_write_variants" snakemake_wrapper_utils/bcftools.py

Repository: snakemake/snakemake-wrapper-utils

Length of output: 1278


Background bcftools processes need explicit wait to propagate failures.

The function returns a command string with backgrounded bcftools processes (&) but doesn't enforce or document that callers must append wait to detect conversion failures. If a caller's wrapper script exits before waiting for background processes, failed conversions will be masked while the foreground command succeeds, risking silent data corruption.

Either:

  1. Return a separate teardown string (e.g., "; wait") that must be appended to the command, or
  2. Expand the docstring to explicitly require that callers append wait to the returned prefix

This applies to lines 176-179, 197-199, and the return statement at line 206.

@tdayris tdayris closed this Apr 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant