Inverting the simulated genomes #25

@SimiliSerpent

Hi Rory,

I am simulating SARS-CoV-2 diluted in a bacterial environment. My configuration file looks as follows:

output_path = "$SIM_DIR/pod5_files"
target_yield = $TARGET_YIELD
pore_type = "R10"
nucleotide_type = "DNA"

[parameters]
sample_name = "test"
experiment_name = "sim_$SIMULATION_ID"
flowcell_name = "FAQ1234"
experiment_duration_set = 10240000
device_id = "Bantersaurus"
position = "FenceSitter"
sample_rate = 5000

[[sample]]
name = "NC_045512"
input_genome = "$SIM_DIR/ref/${SIM_VIRUS_REF}.fasta"
mean_read_length = $SIM_VIRUS_LEN
weight = $SIM_VIRUS_W
amplicon = false

[[sample]]
name = "U00096_3"
input_genome = "$SIM_DIR/ref/${SIM_NOISE_REF}.fasta"
mean_read_length = $SIM_NOISE_LEN
weight = $SIM_NOISE_W
amplicon = false

For instance, say the weight is 1 for the virus and 150 for the bacteria. Sometimes, however, Icarust appears to invert the two weights. I notice it because I use Readfish to selectively reject all DNA that is not SARS-CoV-2, and in some runs almost no reads are rejected. When I check after such a run, I find that only 1/151 of the reads Icarust generated are bacterial and 150/151 are viral, instead of the other way round.
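
To make the expected proportions explicit, here is a minimal Python sketch. It is not Icarust's code: it assumes the weights are simply normalised into per-read sampling probabilities, and `expected_fractions` is a hypothetical helper name used only for illustration.

```python
from fractions import Fraction


def expected_fractions(weights):
    """Normalise integer sample weights into expected read fractions.

    Assumption: each read is drawn from a sample with probability
    weight / sum(weights). This is an illustration, not Icarust's logic.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: Fraction(w, total) for name, w in weights.items()}


# Weights from the example above: 1 for the virus, 150 for the bacteria.
weights = {"NC_045512": 1, "U00096_3": 150}
expected = expected_fractions(weights)

for name, frac in expected.items():
    print(f"{name}: {frac} of reads")
```

With these weights, a correct run should yield 1/151 viral reads and 150/151 bacterial reads; the faulty runs show the two fractions swapped.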

If I restart the simulation without changing anything, everything works fine! So it is not that big a deal: I just have to monitor the start of each simulation and restart if necessary. But it is a bit worrying, and definitely unexpected behavior. It happens randomly, and I have seen the issue in different simulation environments (different lab clusters).
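
The monitoring step could be automated along these lines. This is only a sketch under assumptions: `looks_inverted` is a hypothetical helper, it handles exactly two samples, and the per-reference read counts are assumed to come from mapping the first reads of a run with whatever tool is at hand.

```python
def looks_inverted(counts, weights):
    """Return True if observed read counts match the swapped weights
    better than the configured ones (two-sample case only).

    counts  : reads observed per sample name early in the run (assumed input)
    weights : configured weight per sample name
    """
    names = list(weights)
    if len(names) != 2:
        raise ValueError("this sketch only handles exactly two samples")
    a, b = names

    total_reads = counts.get(a, 0) + counts.get(b, 0)
    if total_reads == 0:
        raise ValueError("no reads observed yet")
    total_weight = weights[a] + weights[b]

    observed_a = counts.get(a, 0) / total_reads
    expected_a = weights[a] / total_weight  # fraction of sample a as configured
    swapped_a = weights[b] / total_weight   # fraction of sample a if inverted

    # Inverted if the observation is closer to the swapped expectation.
    return abs(observed_a - swapped_a) < abs(observed_a - expected_a)


weights = {"NC_045512": 1, "U00096_3": 150}

# Hypothetical early-run counts, not real data.
healthy_run = {"NC_045512": 10, "U00096_3": 1490}
faulty_run = {"NC_045512": 1490, "U00096_3": 10}

print(looks_inverted(healthy_run, weights))
print(looks_inverted(faulty_run, weights))
```

A check like this only flags the symptom so that a run can be restarted early; it says nothing about why the inversion happens.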

Do you have any clues why that is? Does chance intervene at some point in the choice of the weights?

I hope you are doing well, and thank you for your help.
Sincerely,
Ben
