
fix: resolve 5 known bugs (#50, #51, #52, #53, #54)#56

Merged
LeoBuron merged 1 commit into main from fix-known-bugs
Apr 9, 2026

Conversation


@LeoBuron LeoBuron commented Apr 5, 2026

Summary

Distribution test range bounds widened from 4.5σ to 5σ: for n=10000 samples the expected max z-score is ~4.29σ, while the Box-Muller theoretical max is ~5.68σ. A 5σ bound sits between the two, giving a ~0.06% false-positive rate.

New tests: UnitTestRNG (4 tests), testLinearLayerInitNonTrainable.

Test plan

  • All 26 unit tests pass (ctest --preset unit_test)
  • Build compiles without warnings on unit_test and unit_test_debug presets
  • linearLayerInitNonTrainable no longer segfaults
  • Softmax backward produces identical results with the O(n) algorithm

🤖 Generated with Claude Code

@LeoBuron LeoBuron requested a review from HerrCooker April 5, 2026 08:08
Base automatically changed from ci-pipeline to main April 5, 2026 15:43
@LeoBuron LeoBuron force-pushed the fix-known-bugs branch 2 times, most recently from 27839fb to 43fb95a April 5, 2026 15:48
Comment on lines +456 to +464
TEST_ASSERT_EQUAL(LINEAR, layer->type);

linearConfig_t *config = layer->config->linear;
TEST_ASSERT_NOT_NULL(config->weights);
TEST_ASSERT_NOT_NULL(config->bias);
TEST_ASSERT_EQUAL_PTR(weights, config->weights->param);
TEST_ASSERT_NULL(config->weights->grad);
TEST_ASSERT_EQUAL_PTR(bias, config->bias->param);
TEST_ASSERT_NULL(config->bias->grad);

I would try to reduce the number of assertions here

- #50: Refactor tensorInitWithDistribution to use Distributions.c functions,
  fixing wrong enum names (HE_NORMAL/HE_UNIFORM), wrong formulas for
  Xavier normal and Kaiming uniform, randomNormal/randomUniform mixup,
  and memset ONES bug
- #51: Fix NULL pointer dereference in linearLayerInitNonTrainable by
  using parameterInit() to properly allocate parameter_t structs
- #52: Restructure DLEVEL macro in Common.h to #if/#elif/#else chain,
  eliminating -Wmacro-redefined warnings
- #53: Replace O(n^2) VLA Jacobian in Softmax backward with O(n)
  dot-product algorithm — same math, no stack allocation
- #54: Unify RNG under XorShift32, replacing all rand()/srand() calls
  with rngNextFloat()/rngSetSeed(); add rngNextFloat to RNG module;
  migrate rngShuffleIndices to use global RNG state

Distribution test range bounds widened from 4.5σ to 5σ:
The old 3σ×1.5=4.5σ bound was fragile — for n=10000 normal samples the
expected max z-score is ~4.29σ (√(2·ln(n))), giving ~6.6% false-failure
rate per test. The Box-Muller transform with the u1≥1e-7 guard has a
theoretical z_max of ~5.68σ. A 5σ bound sits above the expected max
(4.29) but below the theoretical max (5.68), catching real distribution
bugs with a false-positive rate of ~0.06%.

Fix XorShift32 zero-state bug: initialize global RNG state to 1 (XorShift
with state 0 is absorbing — always outputs 0). Fix UnitTestMSE buffer
overflow: SymInt32 buffers were sized as numberOfElements bytes instead of
numberOfElements * sizeof(int32_t), causing stack corruption.

New tests: UnitTestRNG (4 tests), testLinearLayerInitNonTrainable
