Skip to content

Comments

POC FS LAYOUT#266

Open
Adit2607 wants to merge 3 commits intodevelopfrom
poc/fs_layout
Open

POC FS LAYOUT#266
Adit2607 wants to merge 3 commits intodevelopfrom
poc/fs_layout

Conversation

@Adit2607
Copy link
Contributor

@Adit2607 Adit2607 commented Jan 7, 2026

🔁 Pull Request Template – BharatMLStack

Please fill out the following sections to help us review your changes efficiently.


📌 Summary

e.g., Adds optimizes Redis fetch latency in online-feature-store, or improves search UI responsiveness in trufflebox-ui.


📂 Modules Affected

  • horizon (Real-time systems / networking)
  • online-feature-store (Feature serving infra)
  • trufflebox-ui (Admin panel / UI)
  • infra (Docker, CI/CD, GCP/AWS setup)
  • docs (Documentation updates)
  • Other: ___________

✅ Type of Change

  • Feature addition
  • Bug fix
  • Infra / build system change
  • Performance improvement
  • Refactor
  • Documentation
  • Other: ___________

📊 Benchmark / Metrics (if applicable)

Summary by CodeRabbit

  • New Features

    • Added Layout2 feature storage format with bitmap optimization for improved compression in sparse data scenarios.
    • Integrated default value support for more efficient feature handling.
  • Tests

    • Added comprehensive compression comparison tests validating Layout2 advantages across multiple data types and sparsity levels.
  • Documentation

    • Added detailed test results report documenting Layout2 vs. Layout1 performance, trade-offs, and migration guidance.

@coderabbitai
Copy link

coderabbitai bot commented Feb 24, 2026

Walkthrough

This pull request introduces Layout2, a bitmap-optimized feature-vector compression format, alongside Layout1. Changes span deserialization, serialization, storage, and feature retrieval to support bitmap-based sparse data optimization, with comprehensive testing and documentation.

Changes

Cohort / File(s) Summary
Layout2 Core Implementation
online-feature-store/internal/data/blocks/deserialized_psdb_v2.go, online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go
Introduced bitmap support for layout-2: added BitmapMeta field, deserializePSDBForLayout2 function, bitmap validation, and helper countSetBitsBefore. Updated GetNumericScalarFeature to accept numFeatures and defaultValue, returning default when bitmap marks feature as default. Builder methods SetBitmap and SetupBitmapMeta added for bitmap composition during serialization.
Test Suite & Validation
online-feature-store/internal/data/blocks/layout_comparison_test.go, online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go, online-feature-store/internal/data/blocks/deserialized_psdb_v2_test.go
New comprehensive layout comparison tests (TestLayout1VsLayout2Compression, TestLayout2BitmapOptimization) evaluating compression across sparsity, data types, and compression strategies. Existing tests updated to pass new GetNumericScalarFeature signature (pos, numFeatures, defaultValue).
Documentation & Reports
LAYOUT_TEST_RESULTS.md, online-feature-store/internal/data/blocks/layout_comparison_results.txt
Added detailed markdown and ASCII reports documenting Layout2 vs Layout1 head-to-head comparison, test results across multiple scenarios, key findings, trade-offs, production implications, and migration guidance.
Feature Retrieval & Persistence
online-feature-store/internal/handler/feature/retrieve.go, online-feature-store/internal/handler/feature/persist.go
Extended GetFeature signature to accept defaultValue parameter, propagating it to GetNumericScalarFeature calls. Updated BuildPSDBBlock to accept featureBitmap parameter, passing it to SetBitmap on the builder.
System-Level Parsing & Accessors
online-feature-store/internal/system/system.go
All feature value parsers and getters (ParseFeatureValue, GetInt32, GetFP32, GetStringVector, etc.) updated to return (data, []byte, error) tuple with bitmap payload. Updated 15\+ method signatures to support bitmap/default-tracking alongside feature data.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant PersistHandler as Persist Handler
    participant System
    participant PermStorageDataBlock
    participant PSDB as DeserializedPSDB

    Client->>PersistHandler: preparePersistData with feature values
    PersistHandler->>System: ParseFeatureValue(featureLabels, features, dataType, featureMeta)
    System-->>PersistHandler: (featureData, featureBitmap, error)
    PersistHandler->>PermStorageDataBlock: BuildPSDBBlock(data, featureBitmap, config)
    PermStorageDataBlock->>PermStorageDataBlock: SetBitmap(featureBitmap)
    PermStorageDataBlock->>PermStorageDataBlock: SetupBitmapMeta(numFeatures)
    PermStorageDataBlock->>PermStorageDataBlock: Serialize (layout-2 path)
    PermStorageDataBlock-->>PersistHandler: serialized PSDB with bitmap
    
    Note over PermStorageDataBlock: If layout==2:<br/>- Embed bitmap in payload<br/>- Set bitmapMeta header
    
    Client->>PersistHandler: RetrieveFeature(seq)
    PersistHandler->>System: GetDefaultValueByte(...)
    System-->>PersistHandler: defaultValue
    PersistHandler->>PSDB: GetFeature(dataType, seq, numFeatures, ..., defaultValue)
    PSDB->>PSDB: GetNumericScalarFeature(seq, numFeatures, defaultValue)
    alt Bitmap indicates default
        PSDB-->>PersistHandler: defaultValue
    else Bitmap indicates stored value
        PSDB->>PSDB: countSetBitsBefore(bitmap, seq, numFeatures)
        PSDB->>PSDB: extract from dense payload at calculated index
        PSDB-->>PersistHandler: feature value
    end
Loading
🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Krd Checker ⚠️ Warning PR description contains only template without KRD link or valid KRD exemption (missing Pod Type H0/H1/H2 or insufficient justification). Add KRD link (KRD: https://docs.google.com/document/...) or KRD exemption with Pod Type and 20+ character justification.
✅ Passed checks (1 passed)
Check name Status Explanation
Dynamic Configuration Validation ✅ Passed No modifications to application-dyn-*.yml files found; PR changes limited to Go source files, Markdown, and text files in online-feature-store module.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Nitpick comments (2)
online-feature-store/internal/handler/feature/retrieve.go (1)

883-889: Consider caching default values instead of per-feature lookups.

GetDefaultValueByte is now called for every feature in the hot path. If config access isn’t O(1), this could add latency. Consider caching defaults per FG/version (or reuse values preloaded in preProcessForKeys) and pass them into GetFeature.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/handler/feature/retrieve.go` around lines 883 -
889, GetDefaultValueByte is being called for each feature in the hot path
(h.config.GetDefaultValueByte in the loop) which can be expensive; change the
code to cache default values per feature-group/version (keyed by fgId and
version) — populate that cache once (preferably in preProcessForKeys or at the
start of the retrieval flow) and then look up the defaultValue from the cache
inside the loop instead of calling GetDefaultValueByte repeatedly, and update
the call to GetFeature to accept the cached defaultValue so you no longer call
h.config.GetDefaultValueByte for every featureLabel.
online-feature-store/internal/data/blocks/layout_comparison_test.go (1)

303-446: Results file should go to a temp dir or be opt‑in.

Tests always write layout_comparison_results.txt to the repo root, which can pollute CI or fail in read‑only environments. Consider t.TempDir() or gating behind an env flag.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go` around
lines 303 - 446, The generateResultsFile function always writes
layout_comparison_results.txt to the repo root which can pollute CI or fail in
read‑only environments; change generateResultsFile to accept an output path
(e.g., outputDir string) or honor an env flag (e.g., LAYOUT_RESULTS_DIR or
LAYOUT_RESULTS_ENABLED) and, in tests, use t.TempDir() to pass a temp directory
(or skip writing when the flag is off); update callers/tests that invoke
generateResultsFile to provide the temp dir or check the flag so file creation
is opt‑in and not written to the repo root.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@LAYOUT_TEST_RESULTS.md`:
- Around line 11-22: The table in LAYOUT_TEST_RESULTS.md has inconsistent
"Improvement" percentages compared to layout_comparison_results.txt; regenerate
the results so both files come from the same successful test run and match
exactly: rerun the layout comparison test suite, produce fresh outputs, and
update LAYOUT_TEST_RESULTS.md entries (e.g., rows "High sparsity | 500 | 80%",
"Very high sparsity | 850 | 95%", "Low sparsity | 1000 | 23%", "Medium sparsity
| 100 | 50%", "Low sparsity | 200 | 20%", "FP16 high sparsity | 500 | 70%") to
the values generated by layout_comparison_results.txt (or vice versa) ensuring
date stamps remain identical and no manual edits alter the numeric percentages.
- Line 131: The summary line stating "10 different scenarios" is inconsistent
with the compressed-size table (9 rows) and layout_comparison_results.txt (13
scenarios); pick the authoritative source used by the test suite (preferably
layout_comparison_results.txt if it’s the executed output) and update the
phrasing in LAYOUT_TEST_RESULTS.md (replace the "10 different scenarios" text)
to match that authoritative scenario count, or alternatively update the
compressed-size table to include the missing scenarios so all three sources (the
"compressed-size" table, the "10 different scenarios" sentence, and
layout_comparison_results.txt) consistently report the same number.

In
`@online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go`:
- Line 67: Tests call ddb.GetNumericScalarFeature(i, 3, []byte{0,0,0}) with
hardcoded numFeatures and default bytes which are wrong; replace the placeholder
3 and []byte{0,0,0} with the actual feature count and a default-sized slice
derived from the feature's dataType.Size(). Locate calls to
GetNumericScalarFeature (and similar numeric scalar lookup helpers) and pass the
real feature count variable used in the test and construct the default value as
make([]byte, dataType.Size()) or equivalent so bitmap-path and larger numeric
types get correct-sized defaults.

In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2_test.go`:
- Line 394: Calls to d.GetNumericScalarFeature are using a hardcoded
numFeatures=3 and a 3‑byte default which is incorrect; compute the real feature
count from the test's values slice (e.g. numFeatures := len(values)) and create
defaultValue with the proper byte length from the feature data type (e.g.
defaultValue := make([]byte, dataType.Size())). Then pass those into
GetNumericScalarFeature(pos, numFeatures, defaultValue). Update every occurrence
(the call sites that currently pass 3 and a 3‑byte default) to use the
corresponding values length and dataType.Size() for the default.

In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2.go`:
- Around line 46-51: Tests that assert an "invalid layout version" error must be
updated because layoutVersion 2 is now handled by deserializePSDBForLayout2;
find the switch on layoutVersion in deserialized_psdb_v2.go (cases for 1 and 2
call deserializePSDBForLayout1/deserializePSDBForLayout2) and update any unit
tests that previously used version 2 as the invalid value to use a truly
unsupported value (e.g., 3) so they exercise the default error path instead of
the new layout-2 handler.
- Around line 19-20: The DeserializedPSDB.Copy() implementation currently omits
the new BitmapMeta flag (and any bitmap-related state), causing cloned instances
to lose bitmap metadata; update the Copy() method on DeserializedPSDB to copy
BitmapMeta from the source into the destination and also deep-copy any
bitmap-related fields (e.g., bitmap slices, pointers or flags that indicate
presence/layout) so cloned objects preserve bitmap presence and layout-2 payload
semantics.
- Around line 329-388: The bitmap branch in
DeserializedPSDB.GetNumericScalarFeature fails to validate pos, allowing
negative or >= numFeatures values to panic or misbehave; add explicit bounds
checks near the top of the bitmap path (before computing byteIdx/bitIdx or
indexing bitmap) to return an error (e.g., "position out of bounds") if pos < 0
or pos >= numFeatures, and then proceed with the existing
bitmapSize/byteIdx/bitIdx logic so countSetBitsBefore and bitmap indexing cannot
be invoked with an invalid pos.

In `@online-feature-store/internal/data/blocks/layout_comparison_results.txt`:
- Around line 1-337: The committed static snapshot layout_comparison_results.txt
is stale because the Go test suite for internal/data/blocks is failing (exit
code 1); fix by running and resolving the failing tests (go test
./internal/data/blocks), then regenerate the snapshot file (overwrite
layout_comparison_results.txt with the new output) or remove and recreate it as
a CI artifact only after tests pass, and include the regenerated file in the
same commit; ensure the commit message references the fixed tests and
regenerated snapshot so CI can validate the updated numbers.

In `@online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go`:
- Around line 268-305: The loop uses p.bitmap[i/8] without bounds checking which
can panic if the bitmap is too short; before the loop in the p.layoutVersion ==
2 && len(p.bitmap) > 0 branch compute needed := (len(values)+7)/8 and validate
len(p.bitmap) >= needed, and if not return a clear error (e.g.,
fmt.Errorf("invalid bitmap length: need %d bytes", needed)); add this guard so
indexing p.bitmap[i/8] is always safe.
- Around line 124-130: Serialize() incorrectly routes layoutVersion==2 through
serializeLayout1(), causing non-FP32 serializers to omit the layout‑2
header/bitmapMeta and corrupt deserialization; update Serialize() to handle case
2 by calling the layout‑2 serializer(s) (e.g., call serializeFP32AndLessV2 for
FP32 paths and implement or route other types to a serializeLayout2 variant that
appends bitmapMeta) or explicitly return an error for unsupported types; ensure
serializeFP32AndLessV2’s bitmapMeta format is applied consistently for all
layoutVersion==2 code paths and update any serializer functions referenced
(serializeLayout1, serializeFP32AndLessV2, or new serializeLayout2) so the
deserializer sees the header byte for every layout‑2 block.

---

Nitpick comments:
In `@online-feature-store/internal/data/blocks/layout_comparison_test.go`:
- Around line 303-446: The generateResultsFile function always writes
layout_comparison_results.txt to the repo root which can pollute CI or fail in
read‑only environments; change generateResultsFile to accept an output path
(e.g., outputDir string) or honor an env flag (e.g., LAYOUT_RESULTS_DIR or
LAYOUT_RESULTS_ENABLED) and, in tests, use t.TempDir() to pass a temp directory
(or skip writing when the flag is off); update callers/tests that invoke
generateResultsFile to provide the temp dir or check the flag so file creation
is opt‑in and not written to the repo root.

In `@online-feature-store/internal/handler/feature/retrieve.go`:
- Around line 883-889: GetDefaultValueByte is being called for each feature in
the hot path (h.config.GetDefaultValueByte in the loop) which can be expensive;
change the code to cache default values per feature-group/version (keyed by fgId
and version) — populate that cache once (preferably in preProcessForKeys or at
the start of the retrieval flow) and then look up the defaultValue from the
cache inside the loop instead of calling GetDefaultValueByte repeatedly, and
update the call to GetFeature to accept the cached defaultValue so you no longer
call h.config.GetDefaultValueByte for every featureLabel.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11efebf and af24654.

📒 Files selected for processing (10)
  • LAYOUT_TEST_RESULTS.md
  • online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go
  • online-feature-store/internal/data/blocks/deserialized_psdb_v2.go
  • online-feature-store/internal/data/blocks/deserialized_psdb_v2_test.go
  • online-feature-store/internal/data/blocks/layout_comparison_results.txt
  • online-feature-store/internal/data/blocks/layout_comparison_test.go
  • online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go
  • online-feature-store/internal/handler/feature/persist.go
  • online-feature-store/internal/handler/feature/retrieve.go
  • online-feature-store/internal/system/system.go

Comment on lines +11 to +22
| Test Scenario | Features | Default Ratio | Compression | Improvement |
|---------------|----------|---------------|-------------|-------------|
| High sparsity | 500 | 80% | ZSTD | **21.66%** ✅ |
| Very high sparsity | 850 | 95% | ZSTD | **10.23%** ✅ |
| Low sparsity | 1000 | 23% | ZSTD | **6.39%** ✅ |
| Medium sparsity | 100 | 50% | ZSTD | **24.47%** ✅ |
| Low sparsity | 200 | 20% | ZSTD | **8.90%** ✅ |
| Edge case: All non-zero | 50 | 0% | ZSTD | **-3.50%** ⚠️ |
| Edge case: All zeros | 100 | 100% | ZSTD | **18.75%** ✅ |
| FP16 high sparsity | 500 | 70% | ZSTD | **28.54%** ✅ |
| No compression | 500 | 60% | None | **56.85%** ✅ |

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Compressed size improvement percentages are inconsistent with layout_comparison_results.txt.

Six of the nine comparable rows in this table show different percentages from the matching entries in layout_comparison_results.txt, despite both files sharing the same generation date (January 7, 2026):

Scenario This file .txt file
500 features, 80% defaults 21.66% 23.72%
850 features, 95% defaults 10.23% 6.85%
1000 features, 23% defaults 6.39% 6.02%
100 features, 50% defaults 24.47% 23.66%
200 features, 20% defaults 8.90% 7.77%
500 FP16, 70% defaults 28.54% 27.11%

This suggests the two files were authored from different test runs or edited manually after generation. Both should be regenerated from a single, passing test run to ensure they agree.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@LAYOUT_TEST_RESULTS.md` around lines 11 - 22, The table in
LAYOUT_TEST_RESULTS.md has inconsistent "Improvement" percentages compared to
layout_comparison_results.txt; regenerate the results so both files come from
the same successful test run and match exactly: rerun the layout comparison test
suite, produce fresh outputs, and update LAYOUT_TEST_RESULTS.md entries (e.g.,
rows "High sparsity | 500 | 80%", "Very high sparsity | 850 | 95%", "Low
sparsity | 1000 | 23%", "Medium sparsity | 100 | 50%", "Low sparsity | 200 |
20%", "FP16 high sparsity | 500 | 70%") to the values generated by
layout_comparison_results.txt (or vice versa) ensuring date stamps remain
identical and no manual edits alter the numeric percentages.


### Test Coverage

- ✅ 10 different scenarios covering sparsity from 0% to 100%
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Scenario count in Test Coverage doesn't match either document.

Line 131 states "10 different scenarios" but the compressed-size table above (lines 11–21) has 9 rows, and layout_comparison_results.txt documents 13 scenarios. Please align the count with whichever authoritative set is actually exercised by the test suite.

✏️ Suggested fix
-  - ✅ 10 different scenarios covering sparsity from 0% to 100%
+  - ✅ 13 different scenarios covering sparsity from 0% to 100%
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-10 different scenarios covering sparsity from 0% to 100%
-13 different scenarios covering sparsity from 0% to 100%
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@LAYOUT_TEST_RESULTS.md` at line 131, The summary line stating "10 different
scenarios" is inconsistent with the compressed-size table (9 rows) and
layout_comparison_results.txt (13 scenarios); pick the authoritative source used
by the test suite (preferably layout_comparison_results.txt if it’s the executed
output) and update the phrasing in LAYOUT_TEST_RESULTS.md (replace the "10
different scenarios" text) to match that authoritative scenario count, or
alternatively update the compressed-size table to include the missing scenarios
so all three sources (the "compressed-size" table, the "10 different scenarios"
sentence, and layout_comparison_results.txt) consistently report the same
number.

// Verify all values
for i, expected := range []int32{1, 2, 3} {
feature, err := ddb.GetNumericScalarFeature(i)
feature, err := ddb.GetNumericScalarFeature(i, 3, []byte{0, 0, 0})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Pass the correct feature count and default size.

numFeatures=3 and []byte{0,0,0} are placeholders that don’t match actual feature counts or data type sizes. This can hide bitmap-path defects and will fail when defaults are returned for larger numeric types. Please pass the real count and a default sized via dataType.Size().

Also applies to: 124-124, 279-279, 336-336, 492-492, 549-549, 705-705, 762-762, 917-917, 978-978, 1146-1146, 1203-1203, 1359-1359, 1416-1416

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go`
at line 67, Tests call ddb.GetNumericScalarFeature(i, 3, []byte{0,0,0}) with
hardcoded numFeatures and default bytes which are wrong; replace the placeholder
3 and []byte{0,0,0} with the actual feature count and a default-sized slice
derived from the feature's dataType.Size(). Locate calls to
GetNumericScalarFeature (and similar numeric scalar lookup helpers) and pass the
real feature count variable used in the test and construct the default value as
make([]byte, dataType.Size()) or equivalent so bitmap-path and larger numeric
types get correct-sized defaults.

// Test each position
for pos := 0; pos < 3; pos++ {
feature, err := d.GetNumericScalarFeature(pos)
feature, err := d.GetNumericScalarFeature(pos, 3, []byte{0, 0, 0})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use real numFeatures and correctly sized defaultValue.

Many calls pass numFeatures=3 and a 3‑byte default regardless of actual feature count/type. This hides layout‑2 bitmap behavior and will break if a default is returned for int32/float64 (size mismatch). Please pass the real feature count and a default sized to dataType.Size().

Example pattern:

defaultValue := make([]byte, types.DataTypeInt32.Size())
feature, err := d.GetNumericScalarFeature(pos, len(values), defaultValue)

Also applies to: 566-566, 590-590, 614-614, 638-638, 662-662, 999-999, 1104-1104, 1131-1131, 1158-1158, 1185-1185, 1212-1212

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2_test.go` at
line 394, Calls to d.GetNumericScalarFeature are using a hardcoded numFeatures=3
and a 3‑byte default which is incorrect; compute the real feature count from the
test's values slice (e.g. numFeatures := len(values)) and create defaultValue
with the proper byte length from the feature data type (e.g. defaultValue :=
make([]byte, dataType.Size())). Then pass those into
GetNumericScalarFeature(pos, numFeatures, defaultValue). Update every occurrence
(the call sites that currently pass 3 and a 3‑byte default) to use the
corresponding values length and dataType.Size() for the default.

Comment on lines +19 to +20
// NEW (optional)
BitmapMeta byte
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

BitmapMeta isn’t preserved when cloning DeserializedPSDB.

BitmapMeta is now a functional flag, but Copy() doesn’t carry it over. Cached copies will lose bitmap presence metadata and can misread layout‑2 payloads as dense. Please copy BitmapMeta (and any related bitmap state) in Copy().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2.go` around
lines 19 - 20, The DeserializedPSDB.Copy() implementation currently omits the
new BitmapMeta flag (and any bitmap-related state), causing cloned instances to
lose bitmap metadata; update the Copy() method on DeserializedPSDB to copy
BitmapMeta from the source into the destination and also deep-copy any
bitmap-related fields (e.g., bitmap slices, pointers or flags that indicate
presence/layout) so cloned objects preserve bitmap presence and layout-2 payload
semantics.

Comment on lines 46 to 51
switch layoutVersion {
case 1:
ddb, err = deserializePSDBForLayout1(data)
case 2:
ddb, err = deserializePSDBForLayout2(data)
default:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Update “invalid layout version” tests after enabling layout‑2.

With layout version 2 now supported, tests expecting an error for version 2 should be updated (e.g., use version 3) to avoid CI failures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2.go` around
lines 46 - 51, Tests that assert an "invalid layout version" error must be
updated because layoutVersion 2 is now handled by deserializePSDBForLayout2;
find the switch on layoutVersion in deserialized_psdb_v2.go (cases for 1 and 2
call deserializePSDBForLayout1/deserializePSDBForLayout2) and update any unit
tests that previously used version 2 as the invalid value to use a truly
unsupported value (e.g., 3) so they exercise the default error path instead of
the new layout-2 handler.

Comment on lines +329 to +388
func (dd *DeserializedPSDB) GetNumericScalarFeature(
pos int,
numFeatures int,
defaultValue []byte,
) ([]byte, error) {

size := dd.DataType.Size()
start := pos * size
end := start + size
if start >= len(dd.OriginalData) || end > len(dd.OriginalData) {
data := dd.OriginalData
offset := 0

// ─────────────────────────────
// Layout-2 bitmap handling
// ─────────────────────────────
if dd.LayoutVersion == 2 && (dd.BitmapMeta&0x08) != 0 {

bitmapSize := (numFeatures + 7) / 8
if len(data) < bitmapSize {
return nil, fmt.Errorf("corrupt bitmap payload")
}

bitmap := data[:bitmapSize]
dense := data[bitmapSize:]

byteIdx := pos / 8
bitIdx := pos % 8

if byteIdx >= len(bitmap) {
return nil, fmt.Errorf("bitmap index out of bounds")
}

// Feature is default
if (bitmap[byteIdx] & (1 << bitIdx)) == 0 {
return defaultValue, nil
}

denseIdx := countSetBitsBefore(bitmap, pos, numFeatures)
start := denseIdx * size
end := start + size

if end > len(dense) {
return nil, fmt.Errorf(
"dense offset out of bounds (idx=%d start=%d len=%d)",
denseIdx, start, len(dense),
)
}

return dense[start:end], nil
}

// ─────────────────────────────
// Dense value access
// ─────────────────────────────
offset = pos * size
end := offset + size

if offset < 0 || end > len(data) {
return nil, fmt.Errorf("position out of bounds")
}
return dd.OriginalData[start:end], nil

return data[offset:end], nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add bounds checks in the bitmap path.

The bitmap branch doesn’t guard pos < 0 or pos >= numFeatures. A negative pos panics, and oversized pos can silently return defaults from padding bits.

🐛 Suggested fix
 	if dd.LayoutVersion == 2 && (dd.BitmapMeta&0x08) != 0 {
+		if pos < 0 || pos >= numFeatures {
+			return nil, fmt.Errorf("position out of bounds")
+		}
 
 		bitmapSize := (numFeatures + 7) / 8
 		if len(data) < bitmapSize {
 			return nil, fmt.Errorf("corrupt bitmap payload")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (dd *DeserializedPSDB) GetNumericScalarFeature(
pos int,
numFeatures int,
defaultValue []byte,
) ([]byte, error) {
size := dd.DataType.Size()
start := pos * size
end := start + size
if start >= len(dd.OriginalData) || end > len(dd.OriginalData) {
data := dd.OriginalData
offset := 0
// ─────────────────────────────
// Layout-2 bitmap handling
// ─────────────────────────────
if dd.LayoutVersion == 2 && (dd.BitmapMeta&0x08) != 0 {
bitmapSize := (numFeatures + 7) / 8
if len(data) < bitmapSize {
return nil, fmt.Errorf("corrupt bitmap payload")
}
bitmap := data[:bitmapSize]
dense := data[bitmapSize:]
byteIdx := pos / 8
bitIdx := pos % 8
if byteIdx >= len(bitmap) {
return nil, fmt.Errorf("bitmap index out of bounds")
}
// Feature is default
if (bitmap[byteIdx] & (1 << bitIdx)) == 0 {
return defaultValue, nil
}
denseIdx := countSetBitsBefore(bitmap, pos, numFeatures)
start := denseIdx * size
end := start + size
if end > len(dense) {
return nil, fmt.Errorf(
"dense offset out of bounds (idx=%d start=%d len=%d)",
denseIdx, start, len(dense),
)
}
return dense[start:end], nil
}
// ─────────────────────────────
// Dense value access
// ─────────────────────────────
offset = pos * size
end := offset + size
if offset < 0 || end > len(data) {
return nil, fmt.Errorf("position out of bounds")
}
return dd.OriginalData[start:end], nil
return data[offset:end], nil
func (dd *DeserializedPSDB) GetNumericScalarFeature(
pos int,
numFeatures int,
defaultValue []byte,
) ([]byte, error) {
size := dd.DataType.Size()
data := dd.OriginalData
offset := 0
// ─────────────────────────────
// Layout-2 bitmap handling
// ─────────────────────────────
if dd.LayoutVersion == 2 && (dd.BitmapMeta&0x08) != 0 {
if pos < 0 || pos >= numFeatures {
return nil, fmt.Errorf("position out of bounds")
}
bitmapSize := (numFeatures + 7) / 8
if len(data) < bitmapSize {
return nil, fmt.Errorf("corrupt bitmap payload")
}
bitmap := data[:bitmapSize]
dense := data[bitmapSize:]
byteIdx := pos / 8
bitIdx := pos % 8
if byteIdx >= len(bitmap) {
return nil, fmt.Errorf("bitmap index out of bounds")
}
// Feature is default
if (bitmap[byteIdx] & (1 << bitIdx)) == 0 {
return defaultValue, nil
}
denseIdx := countSetBitsBefore(bitmap, pos, numFeatures)
start := denseIdx * size
end := start + size
if end > len(dense) {
return nil, fmt.Errorf(
"dense offset out of bounds (idx=%d start=%d len=%d)",
denseIdx, start, len(dense),
)
}
return dense[start:end], nil
}
// ─────────────────────────────
// Dense value access
// ─────────────────────────────
offset = pos * size
end := offset + size
if offset < 0 || end > len(data) {
return nil, fmt.Errorf("position out of bounds")
}
return data[offset:end], nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2.go` around
lines 329 - 388, The bitmap branch in DeserializedPSDB.GetNumericScalarFeature
fails to validate pos, allowing negative or >= numFeatures values to panic or
misbehave; add explicit bounds checks near the top of the bitmap path (before
computing byteIdx/bitIdx or indexing bitmap) to return an error (e.g., "position
out of bounds") if pos < 0 or pos >= numFeatures, and then proceed with the
existing bitmapSize/byteIdx/bitIdx logic so countSetBitsBefore and bitmap
indexing cannot be invoked with an invalid pos.

Comment on lines +1 to +337
╔════════════════════════════════════════════════════════════════════════════════╗
║ Layout1 vs Layout2 Compression Test Results ║
║ Generated: 2026-01-07 15:32:12 ║
╚════════════════════════════════════════════════════════════════════════════════╝

┌────────────────────────────────────────────────────────────────────────────────┐
│ Test Results Summary │
└────────────────────────────────────────────────────────────────────────────────┘

Test Name | Features | Defaults | Original Δ | Compressed Δ
--------------------------------------------------------------------------------------------------------------
500 features with 80% defaults (high sparsity) | 500 | 80.0% | 76.85% | 23.72% ✅
850 features with 95% defaults (very high spars... | 850 | 95.0% | 91.79% | 6.85% ✅
850 features with 0% defaults (very high sparsity) | 850 | 0.0% | -3.15% | -0.23% ⚠️
850 features with 100% defaults (very high spar... | 850 | 100.0% | 96.85% | 6.67% ✅
850 features with 80% defaults (very high spars... | 850 | 80.0% | 76.85% | 18.78% ✅
850 features with 50% defaults (very high spars... | 850 | 50.0% | 46.85% | 18.08% ✅
1000 features with 23% defaults (low sparsity) | 1000 | 23.0% | 19.88% | 6.02% ✅
100 features with 50% defaults (medium sparsity) | 100 | 50.0% | 46.75% | 23.66% ✅
200 features with 20% defaults (low sparsity) | 200 | 20.0% | 16.88% | 7.77% ✅
50 features with 0% defaults (all non-zero) - b... | 50 | 0.0% | -3.50% | -3.50% ⚠️
100 features with 100% defaults (all zeros) | 100 | 100.0% | 96.75% | 18.75% ✅
500 features FP16 with 70% defaults | 500 | 70.0% | 63.70% | 27.11% ✅
500 features with 60% defaults (No compression) | 500 | 60.0% | 56.85% | 56.85% ✅


┌────────────────────────────────────────────────────────────────────────────────┐
│ Detailed Results │
└────────────────────────────────────────────────────────────────────────────────┘

1. 500 features with 80% defaults (high sparsity)
──────────────────────────────────────────────────────────────────────────────
Configuration:
Features: 500 total | 100 non-zero (20.0%) | 400 defaults (80.0%)
Data Type: DataTypeFP32
Compression: 1

Layout1 (Baseline):
Original Size: 2000 bytes
Compressed Size: 607 bytes

Layout2 (Optimized):
Original Size: 463 bytes
Compressed Size: 463 bytes

Improvements:
Original Size: +1537 bytes (76.85%)
Compressed Size: +144 bytes (23.72%)
Total Size: 23.21% reduction
Result: ✅ Layout2 is BETTER

2. 850 features with 95% defaults (very high sparsity)
──────────────────────────────────────────────────────────────────────────────
Configuration:
Features: 850 total | 43 non-zero (5.1%) | 807 defaults (95.0%)
Data Type: DataTypeFP32
Compression: 1

Layout1 (Baseline):
Original Size: 3400 bytes
Compressed Size: 292 bytes

Layout2 (Optimized):
Original Size: 279 bytes
Compressed Size: 272 bytes

Improvements:
Original Size: +3121 bytes (91.79%)
Compressed Size: +20 bytes (6.85%)
Total Size: 6.31% reduction
Result: ✅ Layout2 is BETTER

3. 850 features with 0% defaults (very high sparsity)
──────────────────────────────────────────────────────────────────────────────
Configuration:
Features: 850 total | 850 non-zero (100.0%) | 0 defaults (0.0%)
Data Type: DataTypeFP32
Compression: 1

Layout1 (Baseline):
Original Size: 3400 bytes
Compressed Size: 3097 bytes

Layout2 (Optimized):
Original Size: 3507 bytes
Compressed Size: 3104 bytes

Improvements:
Original Size: -107 bytes (-3.15%)
Compressed Size: -7 bytes (-0.23%)
Total Size: -0.26% reduction
Result: ⚠️ Layout2 has overhead (expected for 0% defaults)

4. 850 features with 100% defaults (very high sparsity)
──────────────────────────────────────────────────────────────────────────────
Configuration:
Features: 850 total | 0 non-zero (0.0%) | 850 defaults (100.0%)
Data Type: DataTypeFP32
Compression: 1

Layout1 (Baseline):
Original Size: 3400 bytes
Compressed Size: 15 bytes

Layout2 (Optimized):
Original Size: 107 bytes
Compressed Size: 14 bytes

Improvements:
Original Size: +3293 bytes (96.85%)
Compressed Size: +1 bytes (6.67%)
Total Size: 0.00% reduction
Result: ✅ Layout2 is BETTER

5. 850 features with 80% defaults (very high sparsity)
──────────────────────────────────────────────────────────────────────────────
Configuration:
Features: 850 total | 170 non-zero (20.0%) | 680 defaults (80.0%)
Data Type: DataTypeFP32
Compression: 1

Layout1 (Baseline):
Original Size: 3400 bytes
Compressed Size: 969 bytes

Layout2 (Optimized):
Original Size: 787 bytes
Compressed Size: 787 bytes

Improvements:
Original Size: +2613 bytes (76.85%)
Compressed Size: +182 bytes (18.78%)
Total Size: 18.51% reduction
Result: ✅ Layout2 is BETTER

6. 850 features with 50% defaults (very high sparsity)
──────────────────────────────────────────────────────────────────────────────
Configuration:
Features: 850 total | 425 non-zero (50.0%) | 425 defaults (50.0%)
Data Type: DataTypeFP32
Compression: 1

Layout1 (Baseline):
Original Size: 3400 bytes
Compressed Size: 2063 bytes

Layout2 (Optimized):
Original Size: 1807 bytes
Compressed Size: 1690 bytes

Improvements:
Original Size: +1593 bytes (46.85%)
Compressed Size: +373 bytes (18.08%)
Total Size: 17.95% reduction
Result: ✅ Layout2 is BETTER

7. 1000 features with 23% defaults (low sparsity)
──────────────────────────────────────────────────────────────────────────────
Configuration:
Features: 1000 total | 770 non-zero (77.0%) | 230 defaults (23.0%)
Data Type: DataTypeFP32
Compression: 1

Layout1 (Baseline):
Original Size: 4000 bytes
Compressed Size: 3125 bytes

Layout2 (Optimized):
Original Size: 3205 bytes
Compressed Size: 2937 bytes

Improvements:
Original Size: +795 bytes (19.88%)
Compressed Size: +188 bytes (6.02%)
Total Size: 5.97% reduction
Result: ✅ Layout2 is BETTER

8. 100 features with 50% defaults (medium sparsity)
──────────────────────────────────────────────────────────────────────────────
Configuration:
Features: 100 total | 50 non-zero (50.0%) | 50 defaults (50.0%)
Data Type: DataTypeFP32
Compression: 1

Layout1 (Baseline):
Original Size: 400 bytes
Compressed Size: 279 bytes

Layout2 (Optimized):
Original Size: 213 bytes
Compressed Size: 213 bytes

Improvements:
Original Size: +187 bytes (46.75%)
Compressed Size: +66 bytes (23.66%)
Total Size: 22.57% reduction
Result: ✅ Layout2 is BETTER

9. 200 features with 20% defaults (low sparsity)
──────────────────────────────────────────────────────────────────────────────
Configuration:
Features: 200 total | 160 non-zero (80.0%) | 40 defaults (20.0%)
Data Type: DataTypeFP32
Compression: 1

Layout1 (Baseline):
Original Size: 800 bytes
Compressed Size: 721 bytes

Layout2 (Optimized):
Original Size: 665 bytes
Compressed Size: 665 bytes

Improvements:
Original Size: +135 bytes (16.88%)
Compressed Size: +56 bytes (7.77%)
Total Size: 7.53% reduction
Result: ✅ Layout2 is BETTER

10. 50 features with 0% defaults (all non-zero) - bitmap overhead expected
──────────────────────────────────────────────────────────────────────────────
Configuration:
Features: 50 total | 50 non-zero (100.0%) | 0 defaults (0.0%)
Data Type: DataTypeFP32
Compression: 1

Layout1 (Baseline):
Original Size: 200 bytes
Compressed Size: 200 bytes

Layout2 (Optimized):
Original Size: 207 bytes
Compressed Size: 207 bytes

Improvements:
Original Size: -7 bytes (-3.50%)
Compressed Size: -7 bytes (-3.50%)
Total Size: -3.83% reduction
Result: ⚠️ Layout2 has overhead (expected for 0% defaults)

11. 100 features with 100% defaults (all zeros)
──────────────────────────────────────────────────────────────────────────────
Configuration:
Features: 100 total | 0 non-zero (0.0%) | 100 defaults (100.0%)
Data Type: DataTypeFP32
Compression: 1

Layout1 (Baseline):
Original Size: 400 bytes
Compressed Size: 16 bytes

Layout2 (Optimized):
Original Size: 13 bytes
Compressed Size: 13 bytes

Improvements:
Original Size: +387 bytes (96.75%)
Compressed Size: +3 bytes (18.75%)
Total Size: 8.00% reduction
Result: ✅ Layout2 is BETTER

12. 500 features FP16 with 70% defaults
──────────────────────────────────────────────────────────────────────────────
Configuration:
Features: 500 total | 150 non-zero (30.0%) | 350 defaults (70.0%)
Data Type: DataTypeFP16
Compression: 1

Layout1 (Baseline):
Original Size: 1000 bytes
Compressed Size: 498 bytes

Layout2 (Optimized):
Original Size: 363 bytes
Compressed Size: 363 bytes

Improvements:
Original Size: +637 bytes (63.70%)
Compressed Size: +135 bytes (27.11%)
Total Size: 26.43% reduction
Result: ✅ Layout2 is BETTER

13. 500 features with 60% defaults (No compression)
──────────────────────────────────────────────────────────────────────────────
Configuration:
Features: 500 total | 200 non-zero (40.0%) | 300 defaults (60.0%)
Data Type: DataTypeFP32
Compression: 0

Layout1 (Baseline):
Original Size: 2000 bytes
Compressed Size: 2000 bytes

Layout2 (Optimized):
Original Size: 863 bytes
Compressed Size: 863 bytes

Improvements:
Original Size: +1137 bytes (56.85%)
Compressed Size: +1137 bytes (56.85%)
Total Size: 56.55% reduction
Result: ✅ Layout2 is BETTER


┌────────────────────────────────────────────────────────────────────────────────┐
│ Aggregate Statistics │
└────────────────────────────────────────────────────────────────────────────────┘

Tests Passed: 11/13 scenarios
Layout2 Better: 11/13 scenarios (84.6%)

Average Improvements (excluding 0% defaults):
Original Size: 57.50% reduction
Compressed Size: 17.85% reduction

Maximum Improvements:
Original Size: 96.85% reduction
Compressed Size: 56.85% reduction

Minimum Improvements (with defaults present):
Original Size: 16.88% reduction
Compressed Size: 6.02% reduction


┌────────────────────────────────────────────────────────────────────────────────┐
│ Conclusion │
└────────────────────────────────────────────────────────────────────────────────┘

✅ Layout2 should be used as the default layout version.

Rationale:
• Consistent improvements in 11 out of 13 scenarios (84.6%)
• Average compressed size reduction: 17.85%
• Maximum original size reduction: 96.85%
• Minimal overhead (3.5%) only in edge case with 0% defaults
• Production ML feature vectors typically have 20-95% sparsity

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Test results are potentially stale due to CI pipeline failure.

The Go test suite for internal/data/blocks is currently failing with exit code 1. This file appears to have been committed as a static snapshot, but if the underlying tests are broken, the numbers here may not reflect the current state of the code. The file should be regenerated (or removed and regenerated as part of the CI artifact) only after all tests pass.

🧰 Tools
🪛 GitHub Actions: Online Feature Store CI

[error] 1-1: Go test failed for package internal/data/blocks (exit code 1).

🪛 LanguageTool

[grammar] ~38-~38: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 200...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~59-~59: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 340...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~80-~80: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 340...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~101-~101: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 340...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~122-~122: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 340...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~143-~143: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 340...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~164-~164: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 400...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~185-~185: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 40...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~206-~206: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 80...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~227-~227: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 20...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~248-~248: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 40...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~269-~269: Ensure spelling is correct
Context: ...ataTypeFP16 Compression: 1 Layout1 (Baseline): Original Size: 100...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~290-~290: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 0 Layout1 (Baseline): Original Size: 200...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~309-~309: Ensure spelling is correct
Context: ...────────┘ Tests Passed: 11/13 scenarios Layout2 Better: 11/13 scenarios (84.6%) Averag...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_results.txt`
around lines 1 - 337, The committed static snapshot
layout_comparison_results.txt is stale because the Go test suite for
internal/data/blocks is failing (exit code 1); fix by running and resolving the
failing tests (go test ./internal/data/blocks), then regenerate the snapshot
file (overwrite layout_comparison_results.txt with the new output) or remove and
recreate it as a CI artifact only after tests pass, and include the regenerated
file in the same commit; ensure the commit message references the fixed tests
and regenerated snapshot so CI can validate the updated numbers.

Comment on lines 124 to 130
func (p *PermStorageDataBlock) Serialize() ([]byte, error) {
switch p.layoutVersion {
case 1:
return p.serializeLayout1()
case 2:
return p.serializeLayout1()
default:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Layout‑2 header byte is missing for non‑FP32 types.

layoutVersion==2 routes through serializeLayout1 for all data types, but only serializeFP32AndLessV2 appends bitmapMeta. For other types, the deserializer will treat the first payload byte as bitmapMeta, corrupting reads. Either append bitmapMeta for all layout‑2 serializers or reject layout‑2 for unsupported types.

🐛 Possible guard to prevent silent corruption
 	switch p.layoutVersion {
 	case 1:
 		return p.serializeLayout1()
 	case 2:
-		return p.serializeLayout1()
+		switch p.dataType {
+		case types.DataTypeFP32, types.DataTypeFP16, types.DataTypeFP8E4M3, types.DataTypeFP8E5M2:
+			return p.serializeLayout1()
+		default:
+			return nil, fmt.Errorf("layout-2 not supported for data type: %s", p.dataType)
+		}
 	default:
 		return nil, fmt.Errorf("unsupported layout version: %d", p.layoutVersion)
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (p *PermStorageDataBlock) Serialize() ([]byte, error) {
switch p.layoutVersion {
case 1:
return p.serializeLayout1()
case 2:
return p.serializeLayout1()
default:
func (p *PermStorageDataBlock) Serialize() ([]byte, error) {
switch p.layoutVersion {
case 1:
return p.serializeLayout1()
case 2:
switch p.dataType {
case types.DataTypeFP32, types.DataTypeFP16, types.DataTypeFP8E4M3, types.DataTypeFP8E5M2:
return p.serializeLayout1()
default:
return nil, fmt.Errorf("layout-2 not supported for data type: %s", p.dataType)
}
default:
return nil, fmt.Errorf("unsupported layout version: %d", p.layoutVersion)
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go`
around lines 124 - 130, Serialize() incorrectly routes layoutVersion==2 through
serializeLayout1(), causing non-FP32 serializers to omit the layout‑2
header/bitmapMeta and corrupt deserialization; update Serialize() to handle case
2 by calling the layout‑2 serializer(s) (e.g., call serializeFP32AndLessV2 for
FP32 paths and implement or route other types to a serializeLayout2 variant that
appends bitmapMeta) or explicitly return an error for unsupported types; ensure
serializeFP32AndLessV2’s bitmapMeta format is applied consistently for all
layoutVersion==2 code paths and update any serializer functions referenced
(serializeLayout1, serializeFP32AndLessV2, or new serializeLayout2) so the
deserializer sees the header byte for every layout‑2 block.

Comment on lines +268 to +305
if p.layoutVersion == 2 && len(p.bitmap) > 0 {

for i, v := range values {
if (p.bitmap[i/8] & (1 << (i % 8))) == 0 {
continue
}
putFloat(p.originalData[idx:idx+unitSize], v)
idx += unitSize
}

p.originalData = p.originalData[:idx]
} else {
for _, v := range values {
putFloat(p.originalData[idx:idx+unitSize], v)
idx += unitSize
}
}

// ─────────────────────────────
// Step 2: layout-2 payload handling
// ─────────────────────────────
if p.layoutVersion == 2 {
// prepend bitmap to payload if present
if len(p.bitmap) > 0 {
p.bitmapMeta = p.bitmapMeta | 1<<3 // bitmapPresent = 1
tmp := make([]byte, 0, len(p.bitmap)+len(p.originalData))
tmp = append(tmp, p.bitmap...)
tmp = append(tmp, p.originalData...)
p.originalData = tmp
}

// append bitmapMeta to header
if len(p.buf) != PSDBLayout1LengthBytes {
return nil, fmt.Errorf("invalid base header length for layout-2")
}
p.buf = append(p.buf, p.bitmapMeta)
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate bitmap length before indexing.

The loop indexes p.bitmap[i/8] without checking size; a malformed bitmap can panic. Add a guard that the bitmap has at least ceil(len(values)/8) bytes.

🐛 Suggested fix
 	if p.layoutVersion == 2 && len(p.bitmap) > 0 {
+		if len(p.bitmap)*8 < len(values) {
+			return nil, fmt.Errorf("bitmap too small for %d values", len(values))
+		}
 
 		for i, v := range values {
 			if (p.bitmap[i/8] & (1 << (i % 8))) == 0 {
 				continue
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if p.layoutVersion == 2 && len(p.bitmap) > 0 {
for i, v := range values {
if (p.bitmap[i/8] & (1 << (i % 8))) == 0 {
continue
}
putFloat(p.originalData[idx:idx+unitSize], v)
idx += unitSize
}
p.originalData = p.originalData[:idx]
} else {
for _, v := range values {
putFloat(p.originalData[idx:idx+unitSize], v)
idx += unitSize
}
}
// ─────────────────────────────
// Step 2: layout-2 payload handling
// ─────────────────────────────
if p.layoutVersion == 2 {
// prepend bitmap to payload if present
if len(p.bitmap) > 0 {
p.bitmapMeta = p.bitmapMeta | 1<<3 // bitmapPresent = 1
tmp := make([]byte, 0, len(p.bitmap)+len(p.originalData))
tmp = append(tmp, p.bitmap...)
tmp = append(tmp, p.originalData...)
p.originalData = tmp
}
// append bitmapMeta to header
if len(p.buf) != PSDBLayout1LengthBytes {
return nil, fmt.Errorf("invalid base header length for layout-2")
}
p.buf = append(p.buf, p.bitmapMeta)
}
if p.layoutVersion == 2 && len(p.bitmap) > 0 {
if len(p.bitmap)*8 < len(values) {
return nil, fmt.Errorf("bitmap too small for %d values", len(values))
}
for i, v := range values {
if (p.bitmap[i/8] & (1 << (i % 8))) == 0 {
continue
}
putFloat(p.originalData[idx:idx+unitSize], v)
idx += unitSize
}
p.originalData = p.originalData[:idx]
} else {
for _, v := range values {
putFloat(p.originalData[idx:idx+unitSize], v)
idx += unitSize
}
}
// ─────────────────────────────
// Step 2: layout-2 payload handling
// ─────────────────────────────
if p.layoutVersion == 2 {
// prepend bitmap to payload if present
if len(p.bitmap) > 0 {
p.bitmapMeta = p.bitmapMeta | 1<<3 // bitmapPresent = 1
tmp := make([]byte, 0, len(p.bitmap)+len(p.originalData))
tmp = append(tmp, p.bitmap...)
tmp = append(tmp, p.originalData...)
p.originalData = tmp
}
// append bitmapMeta to header
if len(p.buf) != PSDBLayout1LengthBytes {
return nil, fmt.Errorf("invalid base header length for layout-2")
}
p.buf = append(p.buf, p.bitmapMeta)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go`
around lines 268 - 305, The loop uses p.bitmap[i/8] without bounds checking
which can panic if the bitmap is too short; before the loop in the
p.layoutVersion == 2 && len(p.bitmap) > 0 branch compute needed :=
(len(values)+7)/8 and validate len(p.bitmap) >= needed, and if not return a
clear error (e.g., fmt.Errorf("invalid bitmap length: need %d bytes", needed));
add this guard so indexing p.bitmap[i/8] is always safe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants