fix(#69): fix tensorInitWithDistribution buffer overflow and add rank check to doDimensionsMatch #73
Merged
HerrCooker approved these changes Apr 14, 2026
tensorInitWithDistribution computed the element count by summing dimensions instead of multiplying them (`+=` instead of `*=`, init 0 instead of 1). This caused two simultaneous failures:

1. Buffer overflow for bias tensors with shape [1, N]: sum = 1+N > product = N, so memset wrote one float past the end of the caller's data buffer.
2. Under-initialization for weight tensors with shape [M, N] where M, N > 1: sum = M+N << product = M*N, so only a fraction of values received their intended distribution (e.g. Xavier), with the rest staying at zero.

The overflow alone would not explain the crash, but this codebase uses zero-copy shape storage: initTensorWithQFloat calls setShape(shape, dims, ...), which stores shape->dimensions = dims, a raw pointer into the caller's stack frame, not a deep copy. This is a deliberate design choice to minimize heap allocations on MCU targets where memory is scarce.

The consequence is that the 4-byte memset overflow from bias0Data into the adjacent bias0Dims array on main()'s stack corrupted dims[0] from 1 to 0. Because the tensor's shape points directly at that array, the bias permanently read shape [0, N] instead of [1, N]. The forward pass then hit doDimensionsMatch comparing output [1, N] against bias [0, N], producing the "Dim 1: 0, Dims 2: 20" error from the issue.

Additionally, doDimensionsMatch did not verify that both tensors have the same rank (numberOfDimensions). When ranks differed, the loop iterated using the first tensor's rank and read out of bounds from the second tensor's VLA. This is now a hard error (exit) rather than silent undefined behavior.

Also:

- MnistExperiment: extracted model construction into buildModel(), added a CSV header row. Data arrays are declared static so they outlive the function, as required by the zero-copy shape/data pointer design.
Tests added:

- UnitTestTensorApi: 4 tests verifying tensorInitWithDistribution initializes the correct number of elements (product, not sum) for ZEROS, ONES, NORMAL
- UnitTestArithmetic: 2 tests for doDimensionsMatch (same shape, different dims)
- UnitTestMultiLayerTraining: 3 integration tests exercising a 4-layer model (Linear→ReLU→Linear→Softmax) with CrossEntropy loss, the exact architecture that crashed in MnistExperiment