
Example of a user issue when running in parallel: an error in one job affects all jobs on that core #185

@irc47


Hi!
This is not a bug in the radEmu code, but an example of how a user error can produce a confusing outcome, which I thought might be useful for others.

I'm following the instructions in the "Parallelizing computation for score tests with radEmu" vignette, and in general I'm impressed by how easy it was to get this working.

I've encountered the following issue:

Warning: scheduled cores 21, 20 encountered errors in user code, all values of the jobs will be affected

This seems to mean that the results of all jobs run on those particular cores are affected, and I think it is the same issue as the one described here: https://stat.ethz.ch/pipermail/r-sig-hpc/2019-September/002100.html. Indeed, if I run on n cores, the errors show up every n outputs.
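The "every n outputs" pattern follows from how, as I understand it, `parallel::mclapply` schedules work by default (`mc.preschedule = TRUE`): the inputs are split round-robin into one batch per core, so job i lands on core ((i - 1) mod n) + 1, and if any job in a batch errors, every value from that batch is replaced by the same error. The Python sketch below models only that scheduling rule; it is not the radEmu or `parallel` source, and `preschedule`, `run_prescheduled`, `fit_taxon` and `CoreError` are hypothetical names. With 10 jobs on 4 cores and only job 10 failing, jobs 2, 6 and 10 (spaced 4 apart) are all lost.

```python
from typing import Callable, List, Tuple, Union


class CoreError:
    """Hypothetical stand-in for the try-error that fills every slot of a failed core."""

    def __init__(self, core: int, message: str) -> None:
        self.core = core
        self.message = message

    def __repr__(self) -> str:
        return f"CoreError(core={self.core})"


def preschedule(n_jobs: int, n_cores: int) -> List[List[int]]:
    """Round-robin split: core k (1-based) gets jobs k, k + n_cores, k + 2 * n_cores, ..."""
    return [list(range(k, n_jobs + 1, n_cores)) for k in range(1, n_cores + 1)]


def run_prescheduled(
    f: Callable[[int], object], n_jobs: int, n_cores: int
) -> Tuple[List[Union[object, CoreError]], List[int]]:
    """Run f on jobs 1..n_jobs, one batch per core; one error poisons the whole batch."""
    results: List[object] = [None] * n_jobs
    failed_cores: List[int] = []
    for core, jobs in enumerate(preschedule(n_jobs, n_cores), start=1):
        try:
            # Each core evaluates its whole batch in one go (modeled on lapply() in a forked child).
            values = [f(i) for i in jobs]
        except Exception as exc:
            # The batch is discarded: every job on this core gets the same error object.
            failed_cores.append(core)
            err = CoreError(core, str(exc))
            values = [err] * len(jobs)
        for i, value in zip(jobs, values):
            results[i - 1] = value
    return results, failed_cores


def fit_taxon(j: int) -> int:
    """Hypothetical per-taxon job that only knows about taxa 1..9."""
    if j > 9:
        raise IndexError(f"taxon {j} does not exist")
    return 10 * j


results, failed = run_prescheduled(fit_taxon, n_jobs=10, n_cores=4)
affected = [i + 1 for i, r in enumerate(results) if isinstance(r, CoreError)]
print(failed)    # [2]
print(affected)  # [2, 6, 10]
```

If that model is right, `mclapply(..., mc.preschedule = FALSE)` forks one process per job instead, which confines an error to its own job at the cost of extra forking overhead.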

It turns out that I caused this problem by passing taxa indices that did not exist in the data (i.e., using j = 1:1002 when there were only 1000 taxa). Because the last two indices failed, when running on 40 cores every 39th and 40th job also failed.
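A cheap guard against this mistake is to check the requested indices against the number of taxa before launching any parallel work, so the error surfaces once, with a clear message, instead of being spread across every job on the affected cores. A minimal Python sketch of that check, assuming 1-based indices as in R; `out_of_range` and `check_taxa_indices` are hypothetical helpers, and in the real R workflow the number of taxa would come from the data being fitted:

```python
from typing import Iterable, List


def out_of_range(j: Iterable[int], n_taxa: int) -> List[int]:
    """Return the 1-based taxa indices in j that fall outside 1..n_taxa."""
    return [k for k in j if not 1 <= k <= n_taxa]


def check_taxa_indices(j: Iterable[int], n_taxa: int) -> None:
    """Raise before any jobs are dispatched if j refers to taxa that do not exist."""
    bad = out_of_range(j, n_taxa)
    if bad:
        raise ValueError(f"{len(bad)} taxa index(es) outside 1..{n_taxa}: {bad[:5]}")


# Mirrors the issue: j = 1:1002 requested, but the data only has 1000 taxa.
print(out_of_range(range(1, 1003), n_taxa=1000))  # [1001, 1002]
```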

It would have been nice to have seen this example when I was troubleshooting, so I'm just posting it here as a comment; I don't think any action is needed.

--Ilana
