Changes that should not cause crash, but do. #525

Open
danpovey wants to merge 1 commit into k2-fsa:master from danpovey:show_crash

@danpovey

Collaborator

No description provided.

@danpovey

danpovey commented Dec 18, 2020

Collaborator Author

With this change applied, running build/bin/cu_intersect_test crashes on this line:

K2_CHECK_LT(backward_loglike, -src_state_forward_loglike + 2.0);

The output is shown below. By printing intermediate values I found that the crash is due to atomicMax() not working: the compiler appears to be picking up the __host__ version of atomicMax() that I have declared instead of the CUDA one.
I am using CUDA toolkit version 10.1.

I am creating this pull request to demonstrate the issue to the NVIDIA guys (I think it is a compiler problem).

[ -100.39 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf ]
] }
[F] [F] [F] [F] /ceph-dan/k2/k2/csrc/intersect_pruned.cu:lambda [](signed int)->void::operator()(signed int)->void:1045 /ceph-dan/k2/k2/csrc/intersect_pruned.cu:lambda [](signed int)->void::operator()(signed int)->void:1045 /ceph-dan/k2/k2/csrc/intersect_pruned.cu:lambda [](signed int)->void::operator()(signed int)->void:1045 /ceph-dan/k2/k2/csrc/intersect_pruned.cu:lambda [](signed int)->void::operator()(signed int)->void:1045 block:[0,0,0], thread: [8,0,0] block:[0,0,0], thread: [42,0,0] block:[0,0,0], thread: [43,0,0] block:[0,0,0], thread: [61,0,0] Check failed: Check failed: Check failed: Check failed: backward_loglikebackward_loglikebackward_loglikebackward_loglike    <<<<    -src_state_forward_loglike + 2.0-src_state_forward_loglike + 2.0-src_state_forward_loglike + 2.0-src_state_forward_loglike + 2.0 ( ( ( (220.366562149.052094158.026581159.356491 vs.  vs.  vs.  vs. 220.320801147.468399147.468399155.392899) ) ) )
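
For context on why the choice of atomicMax() overload matters, here is a minimal host-only C++ sketch of the two behaviours that could be getting confused. It is not k2 code: PlainMaxSketch and AtomicMaxSketch are hypothetical names, and how the __host__ atomicMax() is actually declared in k2 is not shown in this thread. The sketch assumes the host fallback is a plain read-modify-write, which would be fine on a single CPU thread but could lose updates if many GPU threads ran it on the same address.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical host-style fallback (an assumption, not k2's declaration):
// a plain read-modify-write. Correct for one thread, but two concurrent
// callers can both read the old value and the smaller write can win.
inline int32_t PlainMaxSketch(int32_t *address, int32_t val) {
  int32_t old = *address;
  if (old < val) *address = val;
  return old;
}

// Hypothetical atomic version built on compare-and-swap. Like CUDA's
// atomicMax(), it stores max(old, val) and returns the previous value.
inline int32_t AtomicMaxSketch(std::atomic<int32_t> *address, int32_t val) {
  int32_t old = address->load();
  // On failure compare_exchange_weak reloads `old`, so the loop retries
  // until the store succeeds or the stored value is already >= val.
  while (old < val && !address->compare_exchange_weak(old, val)) {
  }
  return old;
}
```

Called from a single thread the two functions return the same results; the difference only shows up under concurrent updates, which is the situation inside a CUDA kernel.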

@luitjens

Have you tried with CUDA 11.1 to see if the issue persists?

@zhu-han

zhu-han commented Dec 29, 2020

"build/bin/cu_intersect_test" also crashes with CUDA 11.1. The output of "cu_intersect_test" is in the attached log:
cu_intersect_test.log

@danpovey

Collaborator Author

Thanks!!
