feat: Read and write variant files#57
Conversation
📝 WalkthroughWalkthroughAdds Changes
Sequence Diagram(s)sequenceDiagram
participant Snakemake
participant read_write_variants
participant FS as Filesystem
participant BC as bcftools
Snakemake->>read_write_variants: call with snakemake, variant_key_name
read_write_variants->>read_write_variants: resolve input/output paths and compute min_threads
read_write_variants->>read_write_variants: validate snakemake.threads >= min_threads
alt input is .gz or .bcf
read_write_variants->>FS: mkfifo <input_fifo>
FS-->>read_write_variants: fifo ready
read_write_variants->>BC: start "bcftools view ... > <input_fifo> &"
end
alt output is .gz or .bcf
read_write_variants->>FS: mkfifo <output_fifo>
FS-->>read_write_variants: fifo ready
read_write_variants->>BC: start "bcftools view <output_fifo> ... > <output> &"
end
read_write_variants-->>Snakemake: return (command_lines, input_file_name, output_file_name)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
snakemake_wrapper_utils/bcftools.py (3)
161-161: Consider unique FIFO names to prevent conflicts in parallel execution.The hardcoded FIFO names (
snakemake_wrapper_utils_read_variants.vcfandsnakemake_wrapper_utils_write_variants.vcf) could cause conflicts when multiple jobs run concurrently in the same working directory. Consider using unique identifiers (e.g., based on output filename or a UUID).♻️ Suggested approach using output filename for uniqueness
+import os +import uuid ... if str(in_call).endswith((".gz", ".bcf")): - input_file_name = "snakemake_wrapper_utils_read_variants.vcf" + unique_id = uuid.uuid4().hex[:8] + input_file_name = f"snakemake_wrapper_utils_read_variants_{unique_id}.vcf" ... if str(out_call).endswith((".gz", ".bcf")): - output_file_name = "snakemake_wrapper_utils_write_variants.vcf" + unique_id = uuid.uuid4().hex[:8] + output_file_name = f"snakemake_wrapper_utils_write_variants_{unique_id}.vcf"Also applies to: 181-181
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@snakemake_wrapper_utils/bcftools.py` at line 161, The hardcoded FIFO names (snakemake_wrapper_utils_read_variants.vcf and snakemake_wrapper_utils_write_variants.vcf) can collide in parallel runs; change how input_file_name and the corresponding write FIFO name are generated to include a unique suffix (e.g., a UUID or a sanitized version of the output filename) so each job gets distinct FIFO names, use that unique name when creating the FIFO (os.mkfifo) and when opening/cleaning it, and ensure any cleanup logic removes the job-specific FIFOs on exit.
149-150: Docstring is incomplete.The docstring mentions only reading, but the function also handles writing compressed variants. Consider updating to reflect the full functionality.
📝 Suggested docstring
def read_write_variants(snakemake, variant_key_name="call"): - """Obtain command lines to read gzipped VCF or BCF files and path to named pipes""" + """Obtain command lines to read/write gzipped VCF or BCF files via named pipes. + + Returns a tuple of (command_lines, input_file_name, output_file_name) where + command_lines contains mkfifo and bcftools commands to be prepended to the + shell command, and the file names are either the original paths or FIFO paths. + """🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@snakemake_wrapper_utils/bcftools.py` around lines 149 - 150, The docstring for read_write_variants is incomplete because it only mentions reading; update the docstring to document both reading and writing of gzipped VCF/BCF (including named pipe creation), describe the parameters (snakemake, variant_key_name="call"), explain the return values (command lines for reading and writing compressed variants and paths to any created named pipes), and note expected behavior for input formats (VCF/BCF) and compression handling so callers know how to use the function.
173-176: Consider adding FIFO cleanup to prevent orphan files.The generated commands create FIFOs but don't clean them up after use. Consider adding cleanup commands (e.g.,
rm -fat the end or using a trap).Also applies to: 186-189
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@snakemake_wrapper_utils/bcftools.py` around lines 173 - 176, The mkfifo usage in the command construction (where command_lines is appended with "mkfifo {input_file_name} ; bcftools view {bcftools_opts} {in_call} > {input_file_name} &") leaves FIFOs behind; modify the command generation to ensure the FIFO named by input_file_name is removed after use (either append a synchronous cleanup "rm -f {input_file_name}" once the bcftools process completes or add a shell trap that removes the FIFO on EXIT). Update both the instance at the shown block and the similar block around lines 186-189 so that input_file_name is always cleaned up even on error or interruption, and ensure background process management is handled so cleanup runs after bcftools finishes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@snakemake_wrapper_utils/bcftools.py`:
- Line 152: The line uses a type-annotation instead of assignment for the
variable min_threads (min_threads: 1), so min_threads is never defined; change
it to an actual assignment (min_threads = 1) so the variable exists when
referenced later (e.g., where min_threads is used) and ensure this assignment is
placed before any uses in the module or function that reference min_threads.
- Around line 178-180: The output handling currently does not validate the
retrieved value from snakemake.output.get(variant_key_name): check that out_call
(and thus output_file_name) is not None like the input handling does; if
snakemake.output.get(variant_key_name) returns None, raise a descriptive error
(e.g., ValueError or RuntimeError) indicating the missing output key/filename so
downstream code doesn't receive None, and then proceed with the existing check
for str(out_call).endswith((".gz", ".bcf")).
---
Nitpick comments:
In `@snakemake_wrapper_utils/bcftools.py`:
- Line 161: The hardcoded FIFO names (snakemake_wrapper_utils_read_variants.vcf
and snakemake_wrapper_utils_write_variants.vcf) can collide in parallel runs;
change how input_file_name and the corresponding write FIFO name are generated
to include a unique suffix (e.g., a UUID or a sanitized version of the output
filename) so each job gets distinct FIFO names, use that unique name when
creating the FIFO (os.mkfifo) and when opening/cleaning it, and ensure any
cleanup logic removes the job-specific FIFOs on exit.
- Around line 149-150: The docstring for read_write_variants is incomplete
because it only mentions reading; update the docstring to document both reading
and writing of gzipped VCF/BCF (including named pipe creation), describe the
parameters (snakemake, variant_key_name="call"), explain the return values
(command lines for reading and writing compressed variants and paths to any
created named pipes), and note expected behavior for input formats (VCF/BCF) and
compression handling so callers know how to use the function.
- Around line 173-176: The mkfifo usage in the command construction (where
command_lines is appended with "mkfifo {input_file_name} ; bcftools view
{bcftools_opts} {in_call} > {input_file_name} &") leaves FIFOs behind; modify
the command generation to ensure the FIFO named by input_file_name is removed
after use (either append a synchronous cleanup "rm -f {input_file_name}" once
the bcftools process completes or add a shell trap that removes the FIFO on
EXIT). Update both the instance at the shown block and the similar block around
lines 186-189 so that input_file_name is always cleaned up even on error or
interruption, and ensure background process management is handled so cleanup
runs after bcftools finishes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 516c90cc-92f0-4c9c-9173-bfa92cbed0df
📒 Files selected for processing (2)
snakemake_wrapper_utils/bcftools.pysnakemake_wrapper_utils/gatk.py
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
snakemake_wrapper_utils/bcftools.py (1)
197-201: Consider adding FIFO cleanup to returned commands.The named pipes created by
mkfifoare not cleaned up. If the calling code or bcftools fails, orphaned FIFOs will remain in the filesystem. Consider appending cleanup commands or documenting that callers should handle cleanup.💡 Example approach: return cleanup commands separately or document requirement
One option is to add a trap for cleanup in the command string:
# At the start of command_lines when FIFOs are used: cleanup_files = [] # ... when creating FIFOs, track them in cleanup_files ... # Prepend trap to command_lines: if cleanup_files: cleanup_cmd = " ".join(f"rm -f {f}" for f in cleanup_files) command_lines = f"trap '{cleanup_cmd}' EXIT ; " + command_lines🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@snakemake_wrapper_utils/bcftools.py` around lines 197 - 201, The returned command_lines currently creates named pipes (FIFOs) but does not ensure they are removed on exit; modify the code that constructs command_lines in bcftools.py to track all created FIFO paths (e.g., collect into a cleanup_files list when mkfifo is added) and prepend a shell trap (or append cleanup commands) so that "rm -f" is executed on those FIFO paths on EXIT or ERR; ensure the final return still returns command_lines, input_file_name, output_file_name but with the trap/cleanup integrated (or alternatively return a separate cleanup command string alongside command_lines) and reference the existing variables command_lines, input_file_name, output_file_name when implementing the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@snakemake_wrapper_utils/bcftools.py`:
- Around line 186-195: get_bcftools_opts currently hardcodes snakemake.output[0]
for the --output flag which causes bcftools to write to the wrong file when the
variant output is not the first output; fix by resolving the actual variant
output path using variant_key_name (e.g., variant_output =
snakemake.output[variant_key_name]) and ensure that value is passed into or used
by get_bcftools_opts (or override its --output inside the caller before building
bcftools_opts) so bcftools writes to the intended variant file; update the call
site around this block (where get_bcftools_opts(snakemake) is invoked) to use
the variant_key_name-resolved output instead of snakemake.output[0].
- Around line 160-176: The hardcoded FIFO name stored in input_file_name and
created with mkfifo causes collisions and mkfifo failures; change this to
generate a unique FIFO path (e.g., use tempfile.mkstemp/mktemp or a UUID
combined with snakemake.rule or snakemake.jobid) before calling mkfifo, update
the mkfifo/redirect command built into command_lines to reference that unique
name, and ensure existing stale FIFOs are detected and unlinked before creating
(or create atomically via tempfile APIs) so functions using input_file_name, the
mkfifo command and the bcftools view redirect remain correct; keep
get_bcftools_opts and the rest of the command construction unchanged except for
replacing the fixed input_file_name with the generated unique path.
---
Nitpick comments:
In `@snakemake_wrapper_utils/bcftools.py`:
- Around line 197-201: The returned command_lines currently creates named pipes
(FIFOs) but does not ensure they are removed on exit; modify the code that
constructs command_lines in bcftools.py to track all created FIFO paths (e.g.,
collect into a cleanup_files list when mkfifo is added) and prepend a shell trap
(or append cleanup commands) so that "rm -f" is executed on those FIFO paths on
EXIT or ERR; ensure the final return still returns command_lines,
input_file_name, output_file_name but with the trap/cleanup integrated (or
alternatively return a separate cleanup command string alongside command_lines)
and reference the existing variables command_lines, input_file_name,
output_file_name when implementing the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 858eb9ea-da56-4985-a31f-5d2a09f2d916
📒 Files selected for processing (1)
snakemake_wrapper_utils/bcftools.py
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
snakemake_wrapper_utils/bcftools.py (2)
164-179:⚠️ Potential issue | 🟠 MajorFIFO naming can still collide across parallel jobs using the same variant paths.
Line 164 and Line 192 derive FIFO paths deterministically from
in_call/out_call. Parallel jobs operating on the same file path can race onmkfifoor fail due to stale FIFOs from prior runs.🔧 Proposed fix (unique FIFO names)
+import uuid @@ - input_file_name = f"{in_call}.snakemake_wrapper_utils_read_variants.vcf" + input_file_name = ( + f"{in_call}.{uuid.uuid4().hex}.snakemake_wrapper_utils_read_variants.vcf" + ) @@ - output_file_name = f"{out_call}.snakemake_wrapper_utils_write_variants.vcf" + output_file_name = ( + f"{out_call}.{uuid.uuid4().hex}.snakemake_wrapper_utils_write_variants.vcf" + )Also applies to: 192-200
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@snakemake_wrapper_utils/bcftools.py` around lines 164 - 179, The FIFO name is derived deterministically from in_call/out_call causing collisions; change creation of input_file_name/output_file_name to use unique temporary FIFO names (e.g., include a UUID or process PID+random suffix or use tempfile to generate a unique path in /tmp) and update the mkfifo command usage that builds command_lines (the f"mkfifo {input_file_name} ; bcftools view ..." and the analogous block for out_call) to use that unique name; also ensure the FIFO is removed/cleaned up after use.
195-200:⚠️ Potential issue | 🟠 MajorOutput routing still depends on
snakemake.output[0]instead of resolvedout_call.At Line 196,
get_bcftools_opts(snakemake)injects--output/--output-typefromsnakemake.output[0], which can mismatchvariant_key_nameand write to the wrong file.🔧 Proposed fix
- # This time, we include threading and output formats - bcftools_opts = get_bcftools_opts(snakemake) + # Keep shared options, but bind output to resolved variant target + bcftools_opts = get_bcftools_opts( + snakemake, + parse_output=False, + parse_output_format=False, + ) + out_format = infer_out_format( + out_call, snakemake.params.get("uncompressed_bcf", False) + ) command_lines += str( f"mkfifo {output_file_name} ; " - f"bcftools view {bcftools_opts} {output_file_name} & " + f"bcftools view {bcftools_opts} --output {out_call} --output-type {out_format} {output_file_name} & " )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@snakemake_wrapper_utils/bcftools.py` around lines 195 - 200, The bcftools invocation is built using get_bcftools_opts(snakemake) which pulls --output/--output-type from snakemake.output[0] and can write to the wrong file; update the call site so bcftools_opts is built or overridden to target the resolved out_call/output_file_name (or remove any --output flags produced by get_bcftools_opts and append "--output-type z -o {output_file_name}" explicitly) — modify the symbols get_bcftools_opts, bcftools_opts, and the command construction that uses output_file_name/out_call so the final bcftools command writes to output_file_name (matching variant_key_name) rather than snakemake.output[0].
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@snakemake_wrapper_utils/bcftools.py`:
- Around line 151-154: The code incorrectly treats any integer variant_key_name
as selecting snakemake.input[0]; update the logic in the block that sets in_call
(and the analogous block that sets out_call) to index snakemake.input and
snakemake.output using the provided integer variant_key_name (e.g.,
snakemake.input[variant_key_name] and snakemake.output[variant_key_name])
instead of hardcoding 0, and add a simple bounds check to raise a clear error if
the provided integer is out of range; refer to the variables variant_key_name,
in_call, out_call, snakemake.input, and snakemake.output to locate the two
affected sections.
---
Duplicate comments:
In `@snakemake_wrapper_utils/bcftools.py`:
- Around line 164-179: The FIFO name is derived deterministically from
in_call/out_call causing collisions; change creation of
input_file_name/output_file_name to use unique temporary FIFO names (e.g.,
include a UUID or process PID+random suffix or use tempfile to generate a unique
path in /tmp) and update the mkfifo command usage that builds command_lines (the
f"mkfifo {input_file_name} ; bcftools view ..." and the analogous block for
out_call) to use that unique name; also ensure the FIFO is removed/cleaned up
after use.
- Around line 195-200: The bcftools invocation is built using
get_bcftools_opts(snakemake) which pulls --output/--output-type from
snakemake.output[0] and can write to the wrong file; update the call site so
bcftools_opts is built or overridden to target the resolved
out_call/output_file_name (or remove any --output flags produced by
get_bcftools_opts and append "--output-type z -o {output_file_name}" explicitly)
— modify the symbols get_bcftools_opts, bcftools_opts, and the command
construction that uses output_file_name/out_call so the final bcftools command
writes to output_file_name (matching variant_key_name) rather than
snakemake.output[0].
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 22992318-fdab-4f5c-937a-7db10d4dda74
📒 Files selected for processing (1)
snakemake_wrapper_utils/bcftools.py
| if isinstance(variant_key_name, int): | ||
| in_call = snakemake.input[0] | ||
| else: | ||
| in_call = snakemake.input.get(variant_key_name) |
There was a problem hiding this comment.
variant_key_name integer path is ignored (always index 0).
Line 152 and Line 182 always read index 0, so passing variant_key_name=1 (or any other int) silently selects the wrong input/output.
🔧 Proposed fix
- if isinstance(variant_key_name, int):
- in_call = snakemake.input[0]
+ if isinstance(variant_key_name, int):
+ in_call = snakemake.input[variant_key_name]
@@
- if isinstance(variant_key_name, int):
- out_call = snakemake.output[0]
+ if isinstance(variant_key_name, int):
+ out_call = snakemake.output[variant_key_name]Also applies to: 181-184
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@snakemake_wrapper_utils/bcftools.py` around lines 151 - 154, The code
incorrectly treats any integer variant_key_name as selecting snakemake.input[0];
update the logic in the block that sets in_call (and the analogous block that
sets out_call) to index snakemake.input and snakemake.output using the provided
integer variant_key_name (e.g., snakemake.input[variant_key_name] and
snakemake.output[variant_key_name]) instead of hardcoding 0, and add a simple
bounds check to raise a clear error if the provided integer is out of range;
refer to the variables variant_key_name, in_call, out_call, snakemake.input, and
snakemake.output to locate the two affected sections.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
snakemake_wrapper_utils/bcftools.py (3)
151-154:⚠️ Potential issue | 🟠 MajorUse the integer key when resolving the input.
Line 152 still ignores
variant_key_name;variant_key_name=1readssnakemake.input[0]but writessnakemake.output[1], which can pair the wrong input with the requested output.🐛 Proposed fix
if isinstance(variant_key_name, int): - in_call = snakemake.input[0] + in_call = snakemake.input[variant_key_name] else: in_call = snakemake.input.get(variant_key_name)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@snakemake_wrapper_utils/bcftools.py` around lines 151 - 154, The code that resolves the input wrongly uses index 0 when variant_key_name is an int; change the branch so that when variant_key_name is an int you index snakemake.input with that integer (use snakemake.input[variant_key_name]) instead of always using 0, leaving the else branch to use snakemake.input.get(variant_key_name) for non-int keys; update the assignment to in_call accordingly to ensure the input selected matches the output index.
196-196:⚠️ Potential issue | 🟠 MajorWrite compressed output to the resolved variant output, not
output[0].Line 196 calls
get_bcftools_opts(snakemake), which builds--outputand--output-typefromsnakemake.output[0]. Ifvariant_key_namepoints to any other output,bcftoolswrites the converted stream to the wrong file.🐛 Proposed fix
min_threads += 1 # This time, we include threading and output formats - bcftools_opts = get_bcftools_opts(snakemake) + bcftools_opts = get_bcftools_opts( + snakemake, + parse_output=False, + parse_output_format=False, + ) + out_format = infer_out_format( + str(out_call), snakemake.params.get("uncompressed_bcf", False) + ) + bcftools_opts += f" --output {out_call} --output-type {out_format}" command_lines += str(Read-only check:
#!/bin/bash rg -n -C3 'def get_bcftools_opts|--output \{snakemake\.output\[0\]\}|bcftools_opts = get_bcftools_opts\(snakemake\)' snakemake_wrapper_utils/bcftools.py🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@snakemake_wrapper_utils/bcftools.py` at line 196, get_bcftools_opts builds --output/--output-type using snakemake.output[0], so bcftools_opts will always point to output[0] even when the desired resolved variant file is a different key (variant_key_name); change the call site to pass the correct resolved output path (the value at snakemake.output[variant_key_name] or the resolved_variant_path variable) into get_bcftools_opts or add an argument to get_bcftools_opts to accept the target path, and ensure bcftools_opts is constructed from that target rather than snakemake.output[0] so the compressed output is written to the intended variant output file (update references to bcftools_opts and any invocation of get_bcftools_opts accordingly).
163-164:⚠️ Potential issue | 🟠 MajorMake FIFO paths unique per invocation.
Deriving FIFO names from
in_call/out_callstill collides when parallel jobs use the same variant path, and stale FIFOs from failed runs can break the next attempt.🛡️ Proposed fix using unique FIFO names
+import os import sys +import tempfile +from uuid import uuid4 from snakemake_wrapper_utils.snakemake import get_mem, is_arg @@ +def _variant_fifo_path(direction): + return os.path.join( + tempfile.gettempdir(), + f"snakemake_wrapper_utils_{direction}_{uuid4().hex}.vcf", + ) + + def read_write_variants(snakemake, variant_key_name="call"): @@ input_file_name = in_call if str(in_call).endswith((".gz", ".bcf")): - input_file_name = f"{in_call}.snakemake_wrapper_utils_read_variants.vcf" + input_file_name = _variant_fifo_path("read_variants") @@ if str(out_call).endswith((".gz", ".bcf")): - output_file_name = f"{out_call}.snakemake_wrapper_utils_write_variants.vcf" + output_file_name = _variant_fifo_path("write_variants")You can verify the remaining deterministic FIFO construction with:
#!/bin/bash rg -n -C2 'snakemake_wrapper_utils_(read|write)_variants\.vcf|mkfifo' snakemake_wrapper_utils/bcftools.pyAlso applies to: 191-192
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@snakemake_wrapper_utils/bcftools.py` around lines 163 - 164, The FIFO filenames derived from in_call/out_call are deterministic and can collide across parallel jobs or with stale FIFOs; change the construction of input_file_name/output_file_name (the ".snakemake_wrapper_utils_read_variants.vcf" and ".snakemake_wrapper_utils_write_variants.vcf" names) to include a unique per-invocation suffix (e.g. uuid.uuid4() or a combination of os.getpid() and a timestamp) so each mkfifo target is unique, and add existence checks/cleanup before creating the FIFO; apply the same change to the analogous code around the output creation (the lines referenced at 191-192).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@snakemake_wrapper_utils/bcftools.py`:
- Around line 151-154: The code that resolves the input wrongly uses index 0
when variant_key_name is an int; change the branch so that when variant_key_name
is an int you index snakemake.input with that integer (use
snakemake.input[variant_key_name]) instead of always using 0, leaving the else
branch to use snakemake.input.get(variant_key_name) for non-int keys; update the
assignment to in_call accordingly to ensure the input selected matches the
output index.
- Line 196: get_bcftools_opts builds --output/--output-type using
snakemake.output[0], so bcftools_opts will always point to output[0] even when
the desired resolved variant file is a different key (variant_key_name); change
the call site to pass the correct resolved output path (the value at
snakemake.output[variant_key_name] or the resolved_variant_path variable) into
get_bcftools_opts or add an argument to get_bcftools_opts to accept the target
path, and ensure bcftools_opts is constructed from that target rather than
snakemake.output[0] so the compressed output is written to the intended variant
output file (update references to bcftools_opts and any invocation of
get_bcftools_opts accordingly).
- Around line 163-164: The FIFO filenames derived from in_call/out_call are
deterministic and can collide across parallel jobs or with stale FIFOs; change
the construction of input_file_name/output_file_name (the
".snakemake_wrapper_utils_read_variants.vcf" and
".snakemake_wrapper_utils_write_variants.vcf" names) to include a unique
per-invocation suffix (e.g. uuid.uuid4() or a combination of os.getpid() and a
timestamp) so each mkfifo target is unique, and add existence checks/cleanup
before creating the FIFO; apply the same change to the analogous code around the
output creation (the lines referenced at 191-192).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c688ce7a-d9c5-40ca-a201-4f8e6576c087
📒 Files selected for processing (1)
snakemake_wrapper_utils/bcftools.py
| command_lines += str( | ||
| f"mkfifo {input_file_name} ; " | ||
| f"bcftools view {bcftools_opts} {in_call} > {input_file_name} & " | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
{prefix} tool_using_fifos ... ; waitRepository: snakemake/snakemake-wrapper-utils
Length of output: 123
🏁 Script executed:
#!/bin/bash
rg -n -C4 '\bread_write_variants\s*\('
rg -n -C4 '\bwait\b'Repository: snakemake/snakemake-wrapper-utils
Length of output: 702
🏁 Script executed:
# Get the file structure and lines in question
head -220 snakemake_wrapper_utils/bcftools.py | tail -80Repository: snakemake/snakemake-wrapper-utils
Length of output: 2867
🏁 Script executed:
# Find all callers of read_write_variants
rg -n 'read_write_variants' --type pyRepository: snakemake/snakemake-wrapper-utils
Length of output: 177
🏁 Script executed:
# Search in snakemake wrappers repo for usage patterns
git log --oneline --all | head -5Repository: snakemake/snakemake-wrapper-utils
Length of output: 310
🏁 Script executed:
# Search for all references to read_write_variants across the repo
rg 'read_write_variants' -A 10 -B 2Repository: snakemake/snakemake-wrapper-utils
Length of output: 1001
🏁 Script executed:
# Check if this is exported in __init__.py or similar
find . -name "*.py" -type f | xargs grep -l "read_write_variants"Repository: snakemake/snakemake-wrapper-utils
Length of output: 114
🏁 Script executed:
# Check documentation or examples
find . -name "README*" -o -name "*.md" | xargs grep -l "read_write_variants" 2>/dev/null || echo "No markdown found"Repository: snakemake/snakemake-wrapper-utils
Length of output: 94
🏁 Script executed:
# Check if this is published/used in snakemake-wrappers
curl -s "https://raw.githubusercontent.com/snakemake/snakemake-wrappers/master/bio/bcftools/index/wrapper.smk" | head -50 || echo "Could not fetch"Repository: snakemake/snakemake-wrapper-utils
Length of output: 91
🏁 Script executed:
# Look at the complete function to understand the intended usage pattern
sed -n '149,210p' snakemake_wrapper_utils/bcftools.pyRepository: snakemake/snakemake-wrapper-utils
Length of output: 2221
🏁 Script executed:
# Check if there's any documentation on how command_lines should be used
cat snakemake_wrapper_utils/bcftools.py | head -30Repository: snakemake/snakemake-wrapper-utils
Length of output: 849
🏁 Script executed:
# Look for test files or examples
find . -type f -name "*test*" -o -name "*example*" | head -20Repository: snakemake/snakemake-wrapper-utils
Length of output: 146
🏁 Script executed:
# Check git history for usage context
git log --all --oneline -- snakemake_wrapper_utils/bcftools.py | head -10Repository: snakemake/snakemake-wrapper-utils
Length of output: 812
🏁 Script executed:
# Search in the entire git history for references to read_write_variants
git log -p --all -S "read_write_variants" | head -200Repository: snakemake/snakemake-wrapper-utils
Length of output: 59
🏁 Script executed:
# Check __init__.py to see what's exported
cat snakemake_wrapper_utils/__init__.py 2>/dev/null || echo "No __init__.py or empty"Repository: snakemake/snakemake-wrapper-utils
Length of output: 98
🏁 Script executed:
# Check the test file for usage patterns
cat tests/test_snakemake_wrapper_utils.pyRepository: snakemake/snakemake-wrapper-utils
Length of output: 180
🏁 Script executed:
# Check if there are any examples in comments or docstrings
rg -A 20 "read_write_variants" snakemake_wrapper_utils/bcftools.pyRepository: snakemake/snakemake-wrapper-utils
Length of output: 843
🏁 Script executed:
# Look at the git commit that introduced read_write_variants
git show c6f0fa7 -- snakemake_wrapper_utils/bcftools.py | head -100Repository: snakemake/snakemake-wrapper-utils
Length of output: 2991
🏁 Script executed:
# Check recent commits to understand context better
git log --oneline -20Repository: snakemake/snakemake-wrapper-utils
Length of output: 103
🏁 Script executed:
# See if there's any documentation or README
cat README.md 2>/dev/null || echo "No README"Repository: snakemake/snakemake-wrapper-utils
Length of output: 367
🏁 Script executed:
# Look at the full context of how command_lines is assembled
sed -n '149,220p' snakemake_wrapper_utils/bcftools.py | cat -nRepository: snakemake/snakemake-wrapper-utils
Length of output: 2627
🏁 Script executed:
# Check the complete docstring and surrounding context
sed -n '149,160p' snakemake_wrapper_utils/bcftools.pyRepository: snakemake/snakemake-wrapper-utils
Length of output: 532
🏁 Script executed:
# Search for any usage documentation or patterns in comments
rg -B 5 -A 30 "def read_write_variants" snakemake_wrapper_utils/bcftools.pyRepository: snakemake/snakemake-wrapper-utils
Length of output: 1278
Background bcftools processes need explicit wait to propagate failures.
The function returns a command string with backgrounded bcftools processes (&) but doesn't enforce or document that callers must append wait to detect conversion failures. If a caller's wrapper script exits before waiting for background processes, failed conversions will be masked while the foreground command succeeds, risking silent data corruption.
Either:
- Return a separate teardown string (e.g.,
"; wait") that must be appended to the command, or - Expand the docstring to explicitly require that callers append
waitto the returned prefix
This applies to lines 176-179, 197-199, and the return statement at line 206.
This is a draft for #55
Tools like
picarddon't handle writing tostderr, so I chose to usemkfifoinstead of pipes.Does this mean
bcftoolswould have to be included insnakemake-wrappers-utils?Usage within wrappers would look like:
Usage within Snakemake would be something like:
Do you think it's worth testing?
Summary by CodeRabbit
New Features
Bug Fixes