There seems to be a race condition when:
1. TX or joining is requested, but no airtime is available. `engineUpdate()` then schedules itself to run again when airtime becomes available. In my tests, I added a packet to TX, which autostarted joining, but I believe this can also happen after joining.
2. Another TX is requested around the time airtime becomes available, before the scheduled job has run. `engineUpdate()` for this TX request sees that airtime is now available, so it starts TX. As part of that, `LMIC.osjob.func` is set to `jreqDone` or `updataDone`.
3. While the transceiver is starting TX, the scheduler runs the job scheduled in step 1. However, because step 2 changed the function, it runs the TX complete handler instead of `engineUpdate()`.
4. The TX complete handler sets up an RX window, but because the irq handler did not run, `LMIC.txend` was not updated, so the RX setup might run right away (I'm not entirely sure whether this always happens, or whether it depends on timing or the `OP_NEXTCHNL` flag). If it runs immediately, TX is still ongoing, so RX setup hits an assert that expects to find a sleeping transceiver.
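To make the sequence above easier to follow, here is a minimal toy model of it. It is only a sketch: every `toy_` name is hypothetical and stands in for the corresponding LMIC piece (the shared job, the scheduler, `engineUpdate()` and the TX complete handler), and it assumes a single shared job whose function pointer can be overwritten while the job is still queued.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not the LMIC API. */
typedef struct toy_job toy_job_t;
typedef void (*toy_cb_t)(toy_job_t *job);

struct toy_job {
    toy_cb_t func;      /* callback to run; also reused to park the next handler */
    int      scheduled; /* nonzero while the job is queued in the scheduler */
    long     deadline;  /* time at which the scheduler should run the job */
};

static int engine_update_runs = 0;
static int tx_done_runs = 0;

/* Stand-ins for engineUpdate() and the TX complete handler. */
static void toy_engine_update(toy_job_t *job) { (void)job; engine_update_runs++; }
static void toy_tx_done(toy_job_t *job)       { (void)job; tx_done_runs++; }

/* Queue the job to run `cb` at time `when`. */
static void toy_schedule(toy_job_t *job, long when, toy_cb_t cb) {
    job->func = cb;
    job->deadline = when;
    job->scheduled = 1;
}

/* Run the job if it is due, calling whatever `func` holds right now. */
static void toy_run_due(toy_job_t *job, long now) {
    if (job->scheduled && now >= job->deadline) {
        job->scheduled = 0;
        assert(job->func != NULL);
        job->func(job);
    }
}

/* Replays the sequence; returns 1 if the TX handler ran instead of the update. */
int toy_replay_race(void) {
    toy_job_t job = { NULL, 0, 0 };
    engine_update_runs = 0;
    tx_done_runs = 0;

    /* No airtime yet: the update reschedules itself for t = 100. */
    toy_schedule(&job, 100, toy_engine_update);

    /* A second request starts TX and parks the completion handler in
       `func`, without touching the pending schedule. */
    job.func = toy_tx_done;

    /* The scheduler runs the pending job, but through the overwritten pointer. */
    toy_run_due(&job, 100);

    return tx_done_runs == 1 && engine_update_runs == 0;
}
```

Calling `toy_replay_race()` reports that the completion handler ran even though no TX-complete interrupt was modelled, which is the early handler call described in the sequence.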
The cause of this problem is probably the reuse of `LMIC.osjob`. It is used for multiple jobs, but since scheduling a new job clears any previous schedule of the same job, that should probably be ok (though it's hard to guarantee this will never "unschedule" an important job). However, `LMIC.osjob.func` is also reused to store just a function pointer, to be scheduled later by the irq handler, and that is what causes problems here. A possible fix could be to clear the job whenever `LMIC.osjob.func` is changed (since a job that is still scheduled at that point will definitely cause issues), but perhaps the logic should be improved in other ways to prevent these issues?
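As a rough illustration of the "clear the job whenever the function pointer changes" idea, here is a minimal sketch. The `guard_` names are hypothetical and not part of LMIC; in LMIC itself the cancel step would presumably be a call to `os_clearCallback()` on `LMIC.osjob`, but I have not tested that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the LMIC definitions. */
typedef struct guard_job guard_job_t;
typedef void (*guard_cb_t)(guard_job_t *job);

struct guard_job {
    guard_cb_t func;      /* handler, possibly parked for the irq path */
    int        scheduled; /* nonzero while the job is queued */
};

static int guard_calls = 0;

static void guard_engine_update(guard_job_t *job) { (void)job; guard_calls += 1; }
static void guard_tx_done(guard_job_t *job)       { (void)job; guard_calls += 100; }

/* Park a handler for later use. Any pending schedule is cancelled first,
   so the scheduler cannot call the new handler early. */
static void guard_set_func(guard_job_t *job, guard_cb_t cb) {
    assert(job != NULL);
    job->scheduled = 0;
    job->func = cb;
}

/* One scheduler step: run the job only if it is still queued. */
static void guard_run_if_scheduled(guard_job_t *job) {
    if (job->scheduled) {
        job->scheduled = 0;
        job->func(job);
    }
}

/* Returns the call counter; 0 means no handler ran early. */
int guard_demo(void) {
    guard_job_t job = { guard_engine_update, 1 }; /* pending update job */
    guard_calls = 0;
    guard_set_func(&job, guard_tx_done); /* TX start parks the handler */
    guard_run_if_scheduled(&job);        /* nothing is queued any more */
    return guard_calls;
}
```

Note that this also drops the pending update job. In the scenario above that looks harmless, since the second request already started TX, but it is exactly the kind of silent "unscheduling" that is hard to reason about in general.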