Usage feedback #47

@RagnarGrootKoerkamp

So I'm trying to use AWRY again to include it in an FM-index benchmark, but I must say that it's rather annoying to use and I'm close to giving up... Here is my user journey for trying it on a human genome:

  • The input must be from disk, so I have to write my data to a file first.
  • Actually, it must be a fasta file, so I have to prepend it with >\n.
  • Ah but it must be ASCII, not u8 integer values 0/1/2/3. (ok that's on me though)
  • Ah but libsufr writes in $TMPDIR which is /tmp which is tmpfs/RAM for me, and libsufr needs more than the 32GB available there.
  • OK, I create a local directory tmp and set export TMPDIR=tmp, but now this causes compile-time errors because Rust tries creating sufr/libsufr/tmp/... and AWRY/tmp/... instead of my_crate/tmp/....
  • So I build without the env var, and run with the env var.
  • It then uses ~40GB of disk, and 17GB of RAM for most of the time. That's fine.
  • Then, after 11 minutes, I think after suffix array construction is done, RAM suddenly spikes to 32 GB and it OOMs.
  • I do have 64 GB of RAM, but it turns out 32GB of that is taken by temporary files that are still lingering in /tmp (so #6, "add clarification to github actions to only test on main branch.", is still not really fixed apparently?).
  • So now I'm going to have to wait 11min again for a new attempt at construction, which may or may not end up failing.
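
For reference, the first three steps above (getting in-memory 0/1/2/3 values into an ASCII FASTA file on disk) can be sketched roughly as follows. This is only a minimal sketch using the Rust standard library, not AWRY's API: the helper names `codes_to_ascii` and `write_fasta`, the output file name, and the A/C/G/T ordering of the codes are all assumptions made for illustration.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Map integer codes 0/1/2/3 to ASCII bases.
/// The ordering A, C, G, T is an assumption; adjust it to your encoding.
/// Panics if a code is larger than 3.
fn codes_to_ascii(codes: &[u8]) -> Vec<u8> {
    const BASES: [u8; 4] = *b"ACGT";
    codes.iter().map(|&c| BASES[c as usize]).collect()
}

/// Write a single-record FASTA file: a `>` header line (the name may be
/// empty), followed by the whole sequence on one line.
fn write_fasta(path: &Path, name: &str, seq: &[u8]) -> io::Result<()> {
    let mut out = BufWriter::new(File::create(path)?);
    writeln!(out, ">{}", name)?;
    out.write_all(seq)?;
    out.write_all(b"\n")?;
    out.flush()
}

fn main() -> io::Result<()> {
    // Tiny stand-in for the real genome, stored as 0/1/2/3 values.
    let codes = [0u8, 1, 2, 3, 3, 0];
    // Hypothetical output location; any writable path works.
    let path = std::env::temp_dir().join("awry_input_example.fa");
    write_fasta(&path, "", &codes_to_ascii(&codes))?;
    println!("wrote {}", path.display());
    Ok(())
}
```

An API that accepted an in-memory byte slice directly would make this whole detour unnecessary.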

In fact, I already have the BWT on disk, but I don't think there's a way for me to give that to AWRY directly?
That would have saved me a lot of pain here. It's also not quite clear whether the option to keep/remove the temporary suffix array would actually reuse it between runs.
