POC FS LAYOUT by Adit2607 · Pull Request #266 · Meesho/BharatMLStack

Adit2607 · 2026-01-07T10:17:32Z

🔁 Pull Request Template – BharatMLStack

Please fill out the following sections to help us review your changes efficiently.

📌 Summary

e.g., Adds optimizes Redis fetch latency in online-feature-store, or improves search UI responsiveness in trufflebox-ui.

📂 Modules Affected

horizon (Real-time systems / networking)
online-feature-store (Feature serving infra)
trufflebox-ui (Admin panel / UI)
infra (Docker, CI/CD, GCP/AWS setup)
docs (Documentation updates)
Other: ___________

✅ Type of Change

📊 Benchmark / Metrics (if applicable)

Summary by CodeRabbit

New Features
- Added Layout2 feature storage format with bitmap optimization for improved compression in sparse data scenarios.
- Integrated default value support for more efficient feature handling.
Tests
- Added comprehensive compression comparison tests validating Layout2 advantages across multiple data types and sparsity levels.
Documentation
- Added detailed test results report documenting Layout2 vs. Layout1 performance, trade-offs, and migration guidance.

…to develop

…_layout

coderabbitai · 2026-02-24T08:51:41Z

Walkthrough

This pull request introduces Layout2, a bitmap-optimized feature-vector compression format, alongside Layout1. Changes span deserialization, serialization, storage, and feature retrieval to support bitmap-based sparse data optimization, with comprehensive testing and documentation.

Changes

Cohort / File(s)	Summary
Layout2 Core Implementation `online-feature-store/internal/data/blocks/deserialized_psdb_v2.go`, `online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go`	Introduced bitmap support for layout-2: added BitmapMeta field, deserializePSDBForLayout2 function, bitmap validation, and helper countSetBitsBefore. Updated GetNumericScalarFeature to accept numFeatures and defaultValue, returning default when bitmap marks feature as default. Builder methods SetBitmap and SetupBitmapMeta added for bitmap composition during serialization.
Test Suite & Validation `online-feature-store/internal/data/blocks/layout_comparison_test.go`, `online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go`, `online-feature-store/internal/data/blocks/deserialized_psdb_v2_test.go`	New comprehensive layout comparison tests (TestLayout1VsLayout2Compression, TestLayout2BitmapOptimization) evaluating compression across sparsity, data types, and compression strategies. Existing tests updated to pass new GetNumericScalarFeature signature (pos, numFeatures, defaultValue).
Documentation & Reports `LAYOUT_TEST_RESULTS.md`, `online-feature-store/internal/data/blocks/layout_comparison_results.txt`	Added detailed markdown and ASCII reports documenting Layout2 vs Layout1 head-to-head comparison, test results across multiple scenarios, key findings, trade-offs, production implications, and migration guidance.
Feature Retrieval & Persistence `online-feature-store/internal/handler/feature/retrieve.go`, `online-feature-store/internal/handler/feature/persist.go`	Extended GetFeature signature to accept defaultValue parameter, propagating it to GetNumericScalarFeature calls. Updated BuildPSDBBlock to accept featureBitmap parameter, passing it to SetBitmap on the builder.
System-Level Parsing & Accessors `online-feature-store/internal/system/system.go`	All feature value parsers and getters (ParseFeatureValue, GetInt32, GetFP32, GetStringVector, etc.) updated to return (data, []byte, error) tuple with bitmap payload. Updated 15\+ method signatures to support bitmap/default-tracking alongside feature data.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant PersistHandler as Persist Handler
    participant System
    participant PermStorageDataBlock
    participant PSDB as DeserializedPSDB

    Client->>PersistHandler: preparePersistData with feature values
    PersistHandler->>System: ParseFeatureValue(featureLabels, features, dataType, featureMeta)
    System-->>PersistHandler: (featureData, featureBitmap, error)
    PersistHandler->>PermStorageDataBlock: BuildPSDBBlock(data, featureBitmap, config)
    PermStorageDataBlock->>PermStorageDataBlock: SetBitmap(featureBitmap)
    PermStorageDataBlock->>PermStorageDataBlock: SetupBitmapMeta(numFeatures)
    PermStorageDataBlock->>PermStorageDataBlock: Serialize (layout-2 path)
    PermStorageDataBlock-->>PersistHandler: serialized PSDB with bitmap
    
    Note over PermStorageDataBlock: If layout==2:<br/>- Embed bitmap in payload<br/>- Set bitmapMeta header
    
    Client->>PersistHandler: RetrieveFeature(seq)
    PersistHandler->>System: GetDefaultValueByte(...)
    System-->>PersistHandler: defaultValue
    PersistHandler->>PSDB: GetFeature(dataType, seq, numFeatures, ..., defaultValue)
    PSDB->>PSDB: GetNumericScalarFeature(seq, numFeatures, defaultValue)
    alt Bitmap indicates default
        PSDB-->>PersistHandler: defaultValue
    else Bitmap indicates stored value
        PSDB->>PSDB: countSetBitsBefore(bitmap, seq, numFeatures)
        PSDB->>PSDB: extract from dense payload at calculated index
        PSDB-->>PersistHandler: feature value
    end

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Krd Checker	⚠️ Warning	PR description contains only template without KRD link or valid KRD exemption (missing Pod Type H0/H1/H2 or insufficient justification).	Add KRD link (KRD: https://docs.google.com/document/...) or KRD exemption with Pod Type and 20+ character justification.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Dynamic Configuration Validation	✅ Passed	No modifications to application-dyn-*.yml files found; PR changes limited to Go source files, Markdown, and text files in online-feature-store module.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 10

🧹 Nitpick comments (2)

online-feature-store/internal/handler/feature/retrieve.go (1)
883-889: Consider caching default values instead of per-feature lookups.

GetDefaultValueByte is now called for every feature in the hot path. If config access isn’t O(1), this could add latency. Consider caching defaults per FG/version (or reuse values preloaded in preProcessForKeys) and pass them into GetFeature.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/handler/feature/retrieve.go` around lines 883 -
889, GetDefaultValueByte is being called for each feature in the hot path
(h.config.GetDefaultValueByte in the loop) which can be expensive; change the
code to cache default values per feature-group/version (keyed by fgId and
version) — populate that cache once (preferably in preProcessForKeys or at the
start of the retrieval flow) and then look up the defaultValue from the cache
inside the loop instead of calling GetDefaultValueByte repeatedly, and update
the call to GetFeature to accept the cached defaultValue so you no longer call
h.config.GetDefaultValueByte for every featureLabel.
online-feature-store/internal/data/blocks/layout_comparison_test.go (1)
303-446: Results file should go to a temp dir or be opt‑in.

Tests always write layout_comparison_results.txt to the repo root, which can pollute CI or fail in read‑only environments. Consider t.TempDir() or gating behind an env flag.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go` around
lines 303 - 446, The generateResultsFile function always writes
layout_comparison_results.txt to the repo root which can pollute CI or fail in
read‑only environments; change generateResultsFile to accept an output path
(e.g., outputDir string) or honor an env flag (e.g., LAYOUT_RESULTS_DIR or
LAYOUT_RESULTS_ENABLED) and, in tests, use t.TempDir() to pass a temp directory
(or skip writing when the flag is off); update callers/tests that invoke
generateResultsFile to provide the temp dir or check the flag so file creation
is opt‑in and not written to the repo root.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@LAYOUT_TEST_RESULTS.md`:
- Around line 11-22: The table in LAYOUT_TEST_RESULTS.md has inconsistent
"Improvement" percentages compared to layout_comparison_results.txt; regenerate
the results so both files come from the same successful test run and match
exactly: rerun the layout comparison test suite, produce fresh outputs, and
update LAYOUT_TEST_RESULTS.md entries (e.g., rows "High sparsity | 500 | 80%",
"Very high sparsity | 850 | 95%", "Low sparsity | 1000 | 23%", "Medium sparsity
| 100 | 50%", "Low sparsity | 200 | 20%", "FP16 high sparsity | 500 | 70%") to
the values generated by layout_comparison_results.txt (or vice versa) ensuring
date stamps remain identical and no manual edits alter the numeric percentages.
- Line 131: The summary line stating "10 different scenarios" is inconsistent
with the compressed-size table (9 rows) and layout_comparison_results.txt (13
scenarios); pick the authoritative source used by the test suite (preferably
layout_comparison_results.txt if it’s the executed output) and update the
phrasing in LAYOUT_TEST_RESULTS.md (replace the "10 different scenarios" text)
to match that authoritative scenario count, or alternatively update the
compressed-size table to include the missing scenarios so all three sources (the
"compressed-size" table, the "10 different scenarios" sentence, and
layout_comparison_results.txt) consistently report the same number.

In
`@online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go`:
- Line 67: Tests call ddb.GetNumericScalarFeature(i, 3, []byte{0,0,0}) with
hardcoded numFeatures and default bytes which are wrong; replace the placeholder
3 and []byte{0,0,0} with the actual feature count and a default-sized slice
derived from the feature's dataType.Size(). Locate calls to
GetNumericScalarFeature (and similar numeric scalar lookup helpers) and pass the
real feature count variable used in the test and construct the default value as
make([]byte, dataType.Size()) or equivalent so bitmap-path and larger numeric
types get correct-sized defaults.

In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2_test.go`:
- Line 394: Calls to d.GetNumericScalarFeature are using a hardcoded
numFeatures=3 and a 3‑byte default which is incorrect; compute the real feature
count from the test's values slice (e.g. numFeatures := len(values)) and create
defaultValue with the proper byte length from the feature data type (e.g.
defaultValue := make([]byte, dataType.Size())). Then pass those into
GetNumericScalarFeature(pos, numFeatures, defaultValue). Update every occurrence
(the call sites that currently pass 3 and a 3‑byte default) to use the
corresponding values length and dataType.Size() for the default.

In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2.go`:
- Around line 46-51: Tests that assert an "invalid layout version" error must be
updated because layoutVersion 2 is now handled by deserializePSDBForLayout2;
find the switch on layoutVersion in deserialized_psdb_v2.go (cases for 1 and 2
call deserializePSDBForLayout1/deserializePSDBForLayout2) and update any unit
tests that previously used version 2 as the invalid value to use a truly
unsupported value (e.g., 3) so they exercise the default error path instead of
the new layout-2 handler.
- Around line 19-20: The DeserializedPSDB.Copy() implementation currently omits
the new BitmapMeta flag (and any bitmap-related state), causing cloned instances
to lose bitmap metadata; update the Copy() method on DeserializedPSDB to copy
BitmapMeta from the source into the destination and also deep-copy any
bitmap-related fields (e.g., bitmap slices, pointers or flags that indicate
presence/layout) so cloned objects preserve bitmap presence and layout-2 payload
semantics.
- Around line 329-388: The bitmap branch in
DeserializedPSDB.GetNumericScalarFeature fails to validate pos, allowing
negative or >= numFeatures values to panic or misbehave; add explicit bounds
checks near the top of the bitmap path (before computing byteIdx/bitIdx or
indexing bitmap) to return an error (e.g., "position out of bounds") if pos < 0
or pos >= numFeatures, and then proceed with the existing
bitmapSize/byteIdx/bitIdx logic so countSetBitsBefore and bitmap indexing cannot
be invoked with an invalid pos.

In `@online-feature-store/internal/data/blocks/layout_comparison_results.txt`:
- Around line 1-337: The committed static snapshot layout_comparison_results.txt
is stale because the Go test suite for internal/data/blocks is failing (exit
code 1); fix by running and resolving the failing tests (go test
./internal/data/blocks), then regenerate the snapshot file (overwrite
layout_comparison_results.txt with the new output) or remove and recreate it as
a CI artifact only after tests pass, and include the regenerated file in the
same commit; ensure the commit message references the fixed tests and
regenerated snapshot so CI can validate the updated numbers.

In `@online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go`:
- Around line 268-305: The loop uses p.bitmap[i/8] without bounds checking which
can panic if the bitmap is too short; before the loop in the p.layoutVersion ==
2 && len(p.bitmap) > 0 branch compute needed := (len(values)+7)/8 and validate
len(p.bitmap) >= needed, and if not return a clear error (e.g.,
fmt.Errorf("invalid bitmap length: need %d bytes", needed)); add this guard so
indexing p.bitmap[i/8] is always safe.
- Around line 124-130: Serialize() incorrectly routes layoutVersion==2 through
serializeLayout1(), causing non-FP32 serializers to omit the layout‑2
header/bitmapMeta and corrupt deserialization; update Serialize() to handle case
2 by calling the layout‑2 serializer(s) (e.g., call serializeFP32AndLessV2 for
FP32 paths and implement or route other types to a serializeLayout2 variant that
appends bitmapMeta) or explicitly return an error for unsupported types; ensure
serializeFP32AndLessV2’s bitmapMeta format is applied consistently for all
layoutVersion==2 code paths and update any serializer functions referenced
(serializeLayout1, serializeFP32AndLessV2, or new serializeLayout2) so the
deserializer sees the header byte for every layout‑2 block.

---

Nitpick comments:
In `@online-feature-store/internal/data/blocks/layout_comparison_test.go`:
- Around line 303-446: The generateResultsFile function always writes
layout_comparison_results.txt to the repo root which can pollute CI or fail in
read‑only environments; change generateResultsFile to accept an output path
(e.g., outputDir string) or honor an env flag (e.g., LAYOUT_RESULTS_DIR or
LAYOUT_RESULTS_ENABLED) and, in tests, use t.TempDir() to pass a temp directory
(or skip writing when the flag is off); update callers/tests that invoke
generateResultsFile to provide the temp dir or check the flag so file creation
is opt‑in and not written to the repo root.

In `@online-feature-store/internal/handler/feature/retrieve.go`:
- Around line 883-889: GetDefaultValueByte is being called for each feature in
the hot path (h.config.GetDefaultValueByte in the loop) which can be expensive;
change the code to cache default values per feature-group/version (keyed by fgId
and version) — populate that cache once (preferably in preProcessForKeys or at
the start of the retrieval flow) and then look up the defaultValue from the
cache inside the loop instead of calling GetDefaultValueByte repeatedly, and
update the call to GetFeature to accept the cached defaultValue so you no longer
call h.config.GetDefaultValueByte for every featureLabel.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11efebf and af24654.

📒 Files selected for processing (10)

LAYOUT_TEST_RESULTS.md
online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go
online-feature-store/internal/data/blocks/deserialized_psdb_v2.go
online-feature-store/internal/data/blocks/deserialized_psdb_v2_test.go
online-feature-store/internal/data/blocks/layout_comparison_results.txt
online-feature-store/internal/data/blocks/layout_comparison_test.go
online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go
online-feature-store/internal/handler/feature/persist.go
online-feature-store/internal/handler/feature/retrieve.go
online-feature-store/internal/system/system.go

coderabbitai · 2026-02-24T09:04:46Z

LAYOUT_TEST_RESULTS.md

+| Test Scenario | Features | Default Ratio | Compression | Improvement |
+|---------------|----------|---------------|-------------|-------------|
+| High sparsity | 500 | 80% | ZSTD | **21.66%** ✅ |
+| Very high sparsity | 850 | 95% | ZSTD | **10.23%** ✅ |
+| Low sparsity | 1000 | 23% | ZSTD | **6.39%** ✅ |
+| Medium sparsity | 100 | 50% | ZSTD | **24.47%** ✅ |
+| Low sparsity | 200 | 20% | ZSTD | **8.90%** ✅ |
+| Edge case: All non-zero | 50 | 0% | ZSTD | **-3.50%** ⚠️ |
+| Edge case: All zeros | 100 | 100% | ZSTD | **18.75%** ✅ |
+| FP16 high sparsity | 500 | 70% | ZSTD | **28.54%** ✅ |
+| No compression | 500 | 60% | None | **56.85%** ✅ |
+


⚠️ Potential issue | 🟠 Major

Compressed size improvement percentages are inconsistent with layout_comparison_results.txt.

Six of the nine comparable rows in this table show different percentages from the matching entries in layout_comparison_results.txt, despite both files sharing the same generation date (January 7, 2026):

Scenario This file .txt file

500 features, 80% defaults 21.66% 23.72%

850 features, 95% defaults 10.23% 6.85%

1000 features, 23% defaults 6.39% 6.02%

100 features, 50% defaults 24.47% 23.66%

200 features, 20% defaults 8.90% 7.77%

500 FP16, 70% defaults 28.54% 27.11%

This suggests the two files were authored from different test runs or edited manually after generation. Both should be regenerated from a single, passing test run to ensure they agree.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@LAYOUT_TEST_RESULTS.md` around lines 11 - 22, The table in LAYOUT_TEST_RESULTS.md has inconsistent "Improvement" percentages compared to layout_comparison_results.txt; regenerate the results so both files come from the same successful test run and match exactly: rerun the layout comparison test suite, produce fresh outputs, and update LAYOUT_TEST_RESULTS.md entries (e.g., rows "High sparsity | 500 | 80%", "Very high sparsity | 850 | 95%", "Low sparsity | 1000 | 23%", "Medium sparsity | 100 | 50%", "Low sparsity | 200 | 20%", "FP16 high sparsity | 500 | 70%") to the values generated by layout_comparison_results.txt (or vice versa) ensuring date stamps remain identical and no manual edits alter the numeric percentages.

coderabbitai · 2026-02-24T09:04:46Z

LAYOUT_TEST_RESULTS.md

+
+### Test Coverage
+
+- ✅ 10 different scenarios covering sparsity from 0% to 100%


⚠️ Potential issue | 🟡 Minor

Scenario count in Test Coverage doesn't match either document.

Line 131 states "10 different scenarios" but the compressed-size table above (lines 11–21) has 9 rows, and layout_comparison_results.txt documents 13 scenarios. Please align the count with whichever authoritative set is actually exercised by the test suite.

✏️ Suggested fix

- - ✅ 10 different scenarios covering sparsity from 0% to 100% + - ✅ 13 different scenarios covering sparsity from 0% to 100%

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- ✅ 10 different scenarios covering sparsity from 0% to 100%

- ✅ 13 different scenarios covering sparsity from 0% to 100%

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@LAYOUT_TEST_RESULTS.md` at line 131, The summary line stating "10 different scenarios" is inconsistent with the compressed-size table (9 rows) and layout_comparison_results.txt (13 scenarios); pick the authoritative source used by the test suite (preferably layout_comparison_results.txt if it’s the executed output) and update the phrasing in LAYOUT_TEST_RESULTS.md (replace the "10 different scenarios" text) to match that authoritative scenario count, or alternatively update the compressed-size table to include the missing scenarios so all three sources (the "compressed-size" table, the "10 different scenarios" sentence, and layout_comparison_results.txt) consistently report the same number.

coderabbitai · 2026-02-24T09:04:46Z

online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go

 				// Verify all values
 				for i, expected := range []int32{1, 2, 3} {
-					feature, err := ddb.GetNumericScalarFeature(i)
+					feature, err := ddb.GetNumericScalarFeature(i, 3, []byte{0, 0, 0})


⚠️ Potential issue | 🟡 Minor

Pass the correct feature count and default size.

numFeatures=3 and []byte{0,0,0} are placeholders that don’t match actual feature counts or data type sizes. This can hide bitmap-path defects and will fail when defaults are returned for larger numeric types. Please pass the real count and a default sized via dataType.Size().

Also applies to: 124-124, 279-279, 336-336, 492-492, 549-549, 705-705, 762-762, 917-917, 978-978, 1146-1146, 1203-1203, 1359-1359, 1416-1416

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go` at line 67, Tests call ddb.GetNumericScalarFeature(i, 3, []byte{0,0,0}) with hardcoded numFeatures and default bytes which are wrong; replace the placeholder 3 and []byte{0,0,0} with the actual feature count and a default-sized slice derived from the feature's dataType.Size(). Locate calls to GetNumericScalarFeature (and similar numeric scalar lookup helpers) and pass the real feature count variable used in the test and construct the default value as make([]byte, dataType.Size()) or equivalent so bitmap-path and larger numeric types get correct-sized defaults.

coderabbitai · 2026-02-24T09:04:46Z

online-feature-store/internal/data/blocks/deserialized_psdb_v2_test.go

 				// Test each position
 				for pos := 0; pos < 3; pos++ {
-					feature, err := d.GetNumericScalarFeature(pos)
+					feature, err := d.GetNumericScalarFeature(pos, 3, []byte{0, 0, 0})


⚠️ Potential issue | 🟡 Minor

Use real numFeatures and correctly sized defaultValue.

Many calls pass numFeatures=3 and a 3‑byte default regardless of actual feature count/type. This hides layout‑2 bitmap behavior and will break if a default is returned for int32/float64 (size mismatch). Please pass the real feature count and a default sized to dataType.Size().

Example pattern:

defaultValue := make([]byte, types.DataTypeInt32.Size()) feature, err := d.GetNumericScalarFeature(pos, len(values), defaultValue)

Also applies to: 566-566, 590-590, 614-614, 638-638, 662-662, 999-999, 1104-1104, 1131-1131, 1158-1158, 1185-1185, 1212-1212

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2_test.go` at line 394, Calls to d.GetNumericScalarFeature are using a hardcoded numFeatures=3 and a 3‑byte default which is incorrect; compute the real feature count from the test's values slice (e.g. numFeatures := len(values)) and create defaultValue with the proper byte length from the feature data type (e.g. defaultValue := make([]byte, dataType.Size())). Then pass those into GetNumericScalarFeature(pos, numFeatures, defaultValue). Update every occurrence (the call sites that currently pass 3 and a 3‑byte default) to use the corresponding values length and dataType.Size() for the default.

coderabbitai · 2026-02-24T09:04:46Z

online-feature-store/internal/data/blocks/deserialized_psdb_v2.go

+	// NEW (optional)
+	BitmapMeta byte


⚠️ Potential issue | 🟠 Major

BitmapMeta isn’t preserved when cloning DeserializedPSDB.

BitmapMeta is now a functional flag, but Copy() doesn’t carry it over. Cached copies will lose bitmap presence metadata and can misread layout‑2 payloads as dense. Please copy BitmapMeta (and any related bitmap state) in Copy().

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2.go` around lines 19 - 20, The DeserializedPSDB.Copy() implementation currently omits the new BitmapMeta flag (and any bitmap-related state), causing cloned instances to lose bitmap metadata; update the Copy() method on DeserializedPSDB to copy BitmapMeta from the source into the destination and also deep-copy any bitmap-related fields (e.g., bitmap slices, pointers or flags that indicate presence/layout) so cloned objects preserve bitmap presence and layout-2 payload semantics.

coderabbitai · 2026-02-24T09:04:47Z

online-feature-store/internal/data/blocks/deserialized_psdb_v2.go

 	switch layoutVersion {
 	case 1:
 		ddb, err = deserializePSDBForLayout1(data)
+	case 2:
+		ddb, err = deserializePSDBForLayout2(data)
 	default:


⚠️ Potential issue | 🟡 Minor

Update “invalid layout version” tests after enabling layout‑2.

With layout version 2 now supported, tests expecting an error for version 2 should be updated (e.g., use version 3) to avoid CI failures.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2.go` around lines 46 - 51, Tests that assert an "invalid layout version" error must be updated because layoutVersion 2 is now handled by deserializePSDBForLayout2; find the switch on layoutVersion in deserialized_psdb_v2.go (cases for 1 and 2 call deserializePSDBForLayout1/deserializePSDBForLayout2) and update any unit tests that previously used version 2 as the invalid value to use a truly unsupported value (e.g., 3) so they exercise the default error path instead of the new layout-2 handler.

coderabbitai · 2026-02-24T09:04:47Z

online-feature-store/internal/data/blocks/deserialized_psdb_v2.go

+func (dd *DeserializedPSDB) GetNumericScalarFeature(
+	pos int,
+	numFeatures int,
+	defaultValue []byte,
+) ([]byte, error) {
+
 	size := dd.DataType.Size()
-	start := pos * size
-	end := start + size
-	if start >= len(dd.OriginalData) || end > len(dd.OriginalData) {
+	data := dd.OriginalData
+	offset := 0
+
+	// ─────────────────────────────
+	// Layout-2 bitmap handling
+	// ─────────────────────────────
+	if dd.LayoutVersion == 2 && (dd.BitmapMeta&0x08) != 0 {
+
+		bitmapSize := (numFeatures + 7) / 8
+		if len(data) < bitmapSize {
+			return nil, fmt.Errorf("corrupt bitmap payload")
+		}
+
+		bitmap := data[:bitmapSize]
+		dense := data[bitmapSize:]
+
+		byteIdx := pos / 8
+		bitIdx := pos % 8
+
+		if byteIdx >= len(bitmap) {
+			return nil, fmt.Errorf("bitmap index out of bounds")
+		}
+
+		// Feature is default
+		if (bitmap[byteIdx] & (1 << bitIdx)) == 0 {
+			return defaultValue, nil
+		}
+
+		denseIdx := countSetBitsBefore(bitmap, pos, numFeatures)
+		start := denseIdx * size
+		end := start + size
+
+		if end > len(dense) {
+			return nil, fmt.Errorf(
+				"dense offset out of bounds (idx=%d start=%d len=%d)",
+				denseIdx, start, len(dense),
+			)
+		}
+
+		return dense[start:end], nil
+	}
+
+	// ─────────────────────────────
+	// Dense value access
+	// ─────────────────────────────
+	offset = pos * size
+	end := offset + size
+
+	if offset < 0 || end > len(data) {
 		return nil, fmt.Errorf("position out of bounds")
 	}
-	return dd.OriginalData[start:end], nil
+
+	return data[offset:end], nil


⚠️ Potential issue | 🟠 Major

Add bounds checks in the bitmap path.

The bitmap branch doesn’t guard pos < 0 or pos >= numFeatures. A negative pos panics, and oversized pos can silently return defaults from padding bits.

🐛 Suggested fix

if dd.LayoutVersion == 2 && (dd.BitmapMeta&0x08) != 0 { + if pos < 0 || pos >= numFeatures { + return nil, fmt.Errorf("position out of bounds") + } bitmapSize := (numFeatures + 7) / 8 if len(data) < bitmapSize { return nil, fmt.Errorf("corrupt bitmap payload")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func (dd *DeserializedPSDB) GetNumericScalarFeature(

pos int,

numFeatures int,

defaultValue []byte,

) ([]byte, error) {

size := dd.DataType.Size()

start := pos * size

end := start + size

if start >= len(dd.OriginalData) || end > len(dd.OriginalData) {

data := dd.OriginalData

offset := 0

// ─────────────────────────────

// Layout-2 bitmap handling

// ─────────────────────────────

if dd.LayoutVersion == 2 && (dd.BitmapMeta&0x08) != 0 {

bitmapSize := (numFeatures + 7) / 8

if len(data) < bitmapSize {

return nil, fmt.Errorf("corrupt bitmap payload")

}

bitmap := data[:bitmapSize]

dense := data[bitmapSize:]

byteIdx := pos / 8

bitIdx := pos % 8

if byteIdx >= len(bitmap) {

return nil, fmt.Errorf("bitmap index out of bounds")

}

// Feature is default

if (bitmap[byteIdx] & (1 << bitIdx)) == 0 {

return defaultValue, nil

}

denseIdx := countSetBitsBefore(bitmap, pos, numFeatures)

start := denseIdx * size

end := start + size

if end > len(dense) {

return nil, fmt.Errorf(

"dense offset out of bounds (idx=%d start=%d len=%d)",

denseIdx, start, len(dense),

)

}

return dense[start:end], nil

}

// ─────────────────────────────

// Dense value access

// ─────────────────────────────

offset = pos * size

end := offset + size

if offset < 0 || end > len(data) {

return nil, fmt.Errorf("position out of bounds")

}

return dd.OriginalData[start:end], nil

return data[offset:end], nil

func (dd *DeserializedPSDB) GetNumericScalarFeature(

pos int,

numFeatures int,

defaultValue []byte,

) ([]byte, error) {

size := dd.DataType.Size()

data := dd.OriginalData

offset := 0

// ─────────────────────────────

// Layout-2 bitmap handling

// ─────────────────────────────

if dd.LayoutVersion == 2 && (dd.BitmapMeta&0x08) != 0 {

if pos < 0 || pos >= numFeatures {

return nil, fmt.Errorf("position out of bounds")

}

bitmapSize := (numFeatures + 7) / 8

if len(data) < bitmapSize {

return nil, fmt.Errorf("corrupt bitmap payload")

}

bitmap := data[:bitmapSize]

dense := data[bitmapSize:]

byteIdx := pos / 8

bitIdx := pos % 8

if byteIdx >= len(bitmap) {

return nil, fmt.Errorf("bitmap index out of bounds")

}

// Feature is default

if (bitmap[byteIdx] & (1 << bitIdx)) == 0 {

return defaultValue, nil

}

denseIdx := countSetBitsBefore(bitmap, pos, numFeatures)

start := denseIdx * size

end := start + size

if end > len(dense) {

return nil, fmt.Errorf(

"dense offset out of bounds (idx=%d start=%d len=%d)",

denseIdx, start, len(dense),

)

}

return dense[start:end], nil

}

// ─────────────────────────────

// Dense value access

// ─────────────────────────────

offset = pos * size

end := offset + size

if offset < 0 || end > len(data) {

return nil, fmt.Errorf("position out of bounds")

}

return data[offset:end], nil

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2.go` around lines 329 - 388, The bitmap branch in DeserializedPSDB.GetNumericScalarFeature fails to validate pos, allowing negative or >= numFeatures values to panic or misbehave; add explicit bounds checks near the top of the bitmap path (before computing byteIdx/bitIdx or indexing bitmap) to return an error (e.g., "position out of bounds") if pos < 0 or pos >= numFeatures, and then proceed with the existing bitmapSize/byteIdx/bitIdx logic so countSetBitsBefore and bitmap indexing cannot be invoked with an invalid pos.

coderabbitai · 2026-02-24T09:04:47Z

online-feature-store/internal/data/blocks/layout_comparison_results.txt

+╔════════════════════════════════════════════════════════════════════════════════╗
+║            Layout1 vs Layout2 Compression Test Results                        ║
+║                    Generated: 2026-01-07 15:32:12                                 ║
+╚════════════════════════════════════════════════════════════════════════════════╝
+
+┌────────────────────────────────────────────────────────────────────────────────┐
+│ Test Results Summary                                                           │
+└────────────────────────────────────────────────────────────────────────────────┘
+
+Test Name                                          | Features |     Defaults |   Original Δ | Compressed Δ
+--------------------------------------------------------------------------------------------------------------
+500 features with 80% defaults (high sparsity)     |      500 |       80.0% |      76.85% |      23.72% ✅
+850 features with 95% defaults (very high spars... |      850 |       95.0% |      91.79% |       6.85% ✅
+850 features with 0% defaults (very high sparsity) |      850 |        0.0% |      -3.15% |      -0.23% ⚠️ 
+850 features with 100% defaults (very high spar... |      850 |      100.0% |      96.85% |       6.67% ✅
+850 features with 80% defaults (very high spars... |      850 |       80.0% |      76.85% |      18.78% ✅
+850 features with 50% defaults (very high spars... |      850 |       50.0% |      46.85% |      18.08% ✅
+1000 features with 23% defaults (low sparsity)     |     1000 |       23.0% |      19.88% |       6.02% ✅
+100 features with 50% defaults (medium sparsity)   |      100 |       50.0% |      46.75% |      23.66% ✅
+200 features with 20% defaults (low sparsity)      |      200 |       20.0% |      16.88% |       7.77% ✅
+50 features with 0% defaults (all non-zero) - b... |       50 |        0.0% |      -3.50% |      -3.50% ⚠️ 
+100 features with 100% defaults (all zeros)        |      100 |      100.0% |      96.75% |      18.75% ✅
+500 features FP16 with 70% defaults                |      500 |       70.0% |      63.70% |      27.11% ✅
+500 features with 60% defaults (No compression)    |      500 |       60.0% |      56.85% |      56.85% ✅
+
+
+┌────────────────────────────────────────────────────────────────────────────────┐
+│ Detailed Results                                                               │
+└────────────────────────────────────────────────────────────────────────────────┘
+
+1. 500 features with 80% defaults (high sparsity)
+   ──────────────────────────────────────────────────────────────────────────────
+   Configuration:
+     Features:        500 total | 100 non-zero (20.0%) | 400 defaults (80.0%)
+     Data Type:       DataTypeFP32
+     Compression:     1
+
+   Layout1 (Baseline):
+     Original Size:     2000 bytes
+     Compressed Size:    607 bytes
+
+   Layout2 (Optimized):
+     Original Size:      463 bytes
+     Compressed Size:    463 bytes
+
+   Improvements:
+     Original Size:    +1537 bytes (76.85%)
+     Compressed Size:   +144 bytes (23.72%)
+     Total Size:      23.21% reduction
+     Result: ✅ Layout2 is BETTER
+
+2. 850 features with 95% defaults (very high sparsity)
+   ──────────────────────────────────────────────────────────────────────────────
+   Configuration:
+     Features:        850 total | 43 non-zero (5.1%) | 807 defaults (95.0%)
+     Data Type:       DataTypeFP32
+     Compression:     1
+
+   Layout1 (Baseline):
+     Original Size:     3400 bytes
+     Compressed Size:    292 bytes
+
+   Layout2 (Optimized):
+     Original Size:      279 bytes
+     Compressed Size:    272 bytes
+
+   Improvements:
+     Original Size:    +3121 bytes (91.79%)
+     Compressed Size:    +20 bytes (6.85%)
+     Total Size:      6.31% reduction
+     Result: ✅ Layout2 is BETTER
+
+3. 850 features with 0% defaults (very high sparsity)
+   ──────────────────────────────────────────────────────────────────────────────
+   Configuration:
+     Features:        850 total | 850 non-zero (100.0%) | 0 defaults (0.0%)
+     Data Type:       DataTypeFP32
+     Compression:     1
+
+   Layout1 (Baseline):
+     Original Size:     3400 bytes
+     Compressed Size:   3097 bytes
+
+   Layout2 (Optimized):
+     Original Size:     3507 bytes
+     Compressed Size:   3104 bytes
+
+   Improvements:
+     Original Size:     -107 bytes (-3.15%)
+     Compressed Size:     -7 bytes (-0.23%)
+     Total Size:      -0.26% reduction
+     Result: ⚠️  Layout2 has overhead (expected for 0% defaults)
+
+4. 850 features with 100% defaults (very high sparsity)
+   ──────────────────────────────────────────────────────────────────────────────
+   Configuration:
+     Features:        850 total | 0 non-zero (0.0%) | 850 defaults (100.0%)
+     Data Type:       DataTypeFP32
+     Compression:     1
+
+   Layout1 (Baseline):
+     Original Size:     3400 bytes
+     Compressed Size:     15 bytes
+
+   Layout2 (Optimized):
+     Original Size:      107 bytes
+     Compressed Size:     14 bytes
+
+   Improvements:
+     Original Size:    +3293 bytes (96.85%)
+     Compressed Size:     +1 bytes (6.67%)
+     Total Size:      0.00% reduction
+     Result: ✅ Layout2 is BETTER
+
+5. 850 features with 80% defaults (very high sparsity)
+   ──────────────────────────────────────────────────────────────────────────────
+   Configuration:
+     Features:        850 total | 170 non-zero (20.0%) | 680 defaults (80.0%)
+     Data Type:       DataTypeFP32
+     Compression:     1
+
+   Layout1 (Baseline):
+     Original Size:     3400 bytes
+     Compressed Size:    969 bytes
+
+   Layout2 (Optimized):
+     Original Size:      787 bytes
+     Compressed Size:    787 bytes
+
+   Improvements:
+     Original Size:    +2613 bytes (76.85%)
+     Compressed Size:   +182 bytes (18.78%)
+     Total Size:      18.51% reduction
+     Result: ✅ Layout2 is BETTER
+
+6. 850 features with 50% defaults (very high sparsity)
+   ──────────────────────────────────────────────────────────────────────────────
+   Configuration:
+     Features:        850 total | 425 non-zero (50.0%) | 425 defaults (50.0%)
+     Data Type:       DataTypeFP32
+     Compression:     1
+
+   Layout1 (Baseline):
+     Original Size:     3400 bytes
+     Compressed Size:   2063 bytes
+
+   Layout2 (Optimized):
+     Original Size:     1807 bytes
+     Compressed Size:   1690 bytes
+
+   Improvements:
+     Original Size:    +1593 bytes (46.85%)
+     Compressed Size:   +373 bytes (18.08%)
+     Total Size:      17.95% reduction
+     Result: ✅ Layout2 is BETTER
+
+7. 1000 features with 23% defaults (low sparsity)
+   ──────────────────────────────────────────────────────────────────────────────
+   Configuration:
+     Features:        1000 total | 770 non-zero (77.0%) | 230 defaults (23.0%)
+     Data Type:       DataTypeFP32
+     Compression:     1
+
+   Layout1 (Baseline):
+     Original Size:     4000 bytes
+     Compressed Size:   3125 bytes
+
+   Layout2 (Optimized):
+     Original Size:     3205 bytes
+     Compressed Size:   2937 bytes
+
+   Improvements:
+     Original Size:     +795 bytes (19.88%)
+     Compressed Size:   +188 bytes (6.02%)
+     Total Size:      5.97% reduction
+     Result: ✅ Layout2 is BETTER
+
+8. 100 features with 50% defaults (medium sparsity)
+   ──────────────────────────────────────────────────────────────────────────────
+   Configuration:
+     Features:        100 total | 50 non-zero (50.0%) | 50 defaults (50.0%)
+     Data Type:       DataTypeFP32
+     Compression:     1
+
+   Layout1 (Baseline):
+     Original Size:      400 bytes
+     Compressed Size:    279 bytes
+
+   Layout2 (Optimized):
+     Original Size:      213 bytes
+     Compressed Size:    213 bytes
+
+   Improvements:
+     Original Size:     +187 bytes (46.75%)
+     Compressed Size:    +66 bytes (23.66%)
+     Total Size:      22.57% reduction
+     Result: ✅ Layout2 is BETTER
+
+9. 200 features with 20% defaults (low sparsity)
+   ──────────────────────────────────────────────────────────────────────────────
+   Configuration:
+     Features:        200 total | 160 non-zero (80.0%) | 40 defaults (20.0%)
+     Data Type:       DataTypeFP32
+     Compression:     1
+
+   Layout1 (Baseline):
+     Original Size:      800 bytes
+     Compressed Size:    721 bytes
+
+   Layout2 (Optimized):
+     Original Size:      665 bytes
+     Compressed Size:    665 bytes
+
+   Improvements:
+     Original Size:     +135 bytes (16.88%)
+     Compressed Size:    +56 bytes (7.77%)
+     Total Size:      7.53% reduction
+     Result: ✅ Layout2 is BETTER
+
+10. 50 features with 0% defaults (all non-zero) - bitmap overhead expected
+   ──────────────────────────────────────────────────────────────────────────────
+   Configuration:
+     Features:        50 total | 50 non-zero (100.0%) | 0 defaults (0.0%)
+     Data Type:       DataTypeFP32
+     Compression:     1
+
+   Layout1 (Baseline):
+     Original Size:      200 bytes
+     Compressed Size:    200 bytes
+
+   Layout2 (Optimized):
+     Original Size:      207 bytes
+     Compressed Size:    207 bytes
+
+   Improvements:
+     Original Size:       -7 bytes (-3.50%)
+     Compressed Size:     -7 bytes (-3.50%)
+     Total Size:      -3.83% reduction
+     Result: ⚠️  Layout2 has overhead (expected for 0% defaults)
+
+11. 100 features with 100% defaults (all zeros)
+   ──────────────────────────────────────────────────────────────────────────────
+   Configuration:
+     Features:        100 total | 0 non-zero (0.0%) | 100 defaults (100.0%)
+     Data Type:       DataTypeFP32
+     Compression:     1
+
+   Layout1 (Baseline):
+     Original Size:      400 bytes
+     Compressed Size:     16 bytes
+
+   Layout2 (Optimized):
+     Original Size:       13 bytes
+     Compressed Size:     13 bytes
+
+   Improvements:
+     Original Size:     +387 bytes (96.75%)
+     Compressed Size:     +3 bytes (18.75%)
+     Total Size:      8.00% reduction
+     Result: ✅ Layout2 is BETTER
+
+12. 500 features FP16 with 70% defaults
+   ──────────────────────────────────────────────────────────────────────────────
+   Configuration:
+     Features:        500 total | 150 non-zero (30.0%) | 350 defaults (70.0%)
+     Data Type:       DataTypeFP16
+     Compression:     1
+
+   Layout1 (Baseline):
+     Original Size:     1000 bytes
+     Compressed Size:    498 bytes
+
+   Layout2 (Optimized):
+     Original Size:      363 bytes
+     Compressed Size:    363 bytes
+
+   Improvements:
+     Original Size:     +637 bytes (63.70%)
+     Compressed Size:   +135 bytes (27.11%)
+     Total Size:      26.43% reduction
+     Result: ✅ Layout2 is BETTER
+
+13. 500 features with 60% defaults (No compression)
+   ──────────────────────────────────────────────────────────────────────────────
+   Configuration:
+     Features:        500 total | 200 non-zero (40.0%) | 300 defaults (60.0%)
+     Data Type:       DataTypeFP32
+     Compression:     0
+
+   Layout1 (Baseline):
+     Original Size:     2000 bytes
+     Compressed Size:   2000 bytes
+
+   Layout2 (Optimized):
+     Original Size:      863 bytes
+     Compressed Size:    863 bytes
+
+   Improvements:
+     Original Size:    +1137 bytes (56.85%)
+     Compressed Size:  +1137 bytes (56.85%)
+     Total Size:      56.55% reduction
+     Result: ✅ Layout2 is BETTER
+
+
+┌────────────────────────────────────────────────────────────────────────────────┐
+│ Aggregate Statistics                                                           │
+└────────────────────────────────────────────────────────────────────────────────┘
+
+Tests Passed: 11/13 scenarios
+Layout2 Better: 11/13 scenarios (84.6%)
+
+Average Improvements (excluding 0% defaults):
+  Original Size:    57.50% reduction
+  Compressed Size:  17.85% reduction
+
+Maximum Improvements:
+  Original Size:    96.85% reduction
+  Compressed Size:  56.85% reduction
+
+Minimum Improvements (with defaults present):
+  Original Size:    16.88% reduction
+  Compressed Size:  6.02% reduction
+
+
+┌────────────────────────────────────────────────────────────────────────────────┐
+│ Conclusion                                                                     │
+└────────────────────────────────────────────────────────────────────────────────┘
+
+✅ Layout2 should be used as the default layout version.
+
+Rationale:
+  • Consistent improvements in 11 out of 13 scenarios (84.6%)
+  • Average compressed size reduction: 17.85%
+  • Maximum original size reduction: 96.85%
+  • Minimal overhead (3.5%) only in edge case with 0% defaults
+  • Production ML feature vectors typically have 20-95% sparsity
+


⚠️ Potential issue | 🔴 Critical

Test results are potentially stale due to CI pipeline failure.

The Go test suite for internal/data/blocks is currently failing with exit code 1. This file appears to have been committed as a static snapshot, but if the underlying tests are broken, the numbers here may not reflect the current state of the code. The file should be regenerated (or removed and regenerated as part of the CI artifact) only after all tests pass.

🧰 Tools

🪛 GitHub Actions: Online Feature Store CI

[error] 1-1: Go test failed for package internal/data/blocks (exit code 1).

🪛 LanguageTool

[grammar] ~38-~38: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 200...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~59-~59: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 340...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~80-~80: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 340...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~101-~101: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 340...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~122-~122: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 340...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~143-~143: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 340...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~164-~164: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 400...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~185-~185: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 40...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~206-~206: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 80...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~227-~227: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 20...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~248-~248: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 1 Layout1 (Baseline): Original Size: 40...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~269-~269: Ensure spelling is correct
Context: ...ataTypeFP16 Compression: 1 Layout1 (Baseline): Original Size: 100...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~290-~290: Ensure spelling is correct
Context: ...ataTypeFP32 Compression: 0 Layout1 (Baseline): Original Size: 200...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~309-~309: Ensure spelling is correct
Context: ...────────┘ Tests Passed: 11/13 scenarios Layout2 Better: 11/13 scenarios (84.6%) Averag...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@online-feature-store/internal/data/blocks/layout_comparison_results.txt` around lines 1 - 337, The committed static snapshot layout_comparison_results.txt is stale because the Go test suite for internal/data/blocks is failing (exit code 1); fix by running and resolving the failing tests (go test ./internal/data/blocks), then regenerate the snapshot file (overwrite layout_comparison_results.txt with the new output) or remove and recreate it as a CI artifact only after tests pass, and include the regenerated file in the same commit; ensure the commit message references the fixed tests and regenerated snapshot so CI can validate the updated numbers.

coderabbitai · 2026-02-24T09:04:47Z

online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go

 func (p *PermStorageDataBlock) Serialize() ([]byte, error) {
 	switch p.layoutVersion {
 	case 1:
 		return p.serializeLayout1()
+	case 2:
+		return p.serializeLayout1()
 	default:


⚠️ Potential issue | 🟠 Major

Layout‑2 header byte is missing for non‑FP32 types.

layoutVersion==2 routes through serializeLayout1 for all data types, but only serializeFP32AndLessV2 appends bitmapMeta. For other types, the deserializer will treat the first payload byte as bitmapMeta, corrupting reads. Either append bitmapMeta for all layout‑2 serializers or reject layout‑2 for unsupported types.

🐛 Possible guard to prevent silent corruption

switch p.layoutVersion { case 1: return p.serializeLayout1() case 2: - return p.serializeLayout1() + switch p.dataType { + case types.DataTypeFP32, types.DataTypeFP16, types.DataTypeFP8E4M3, types.DataTypeFP8E5M2: + return p.serializeLayout1() + default: + return nil, fmt.Errorf("layout-2 not supported for data type: %s", p.dataType) + } default: return nil, fmt.Errorf("unsupported layout version: %d", p.layoutVersion) }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func (p *PermStorageDataBlock) Serialize() ([]byte, error) {

switch p.layoutVersion {

case 1:

return p.serializeLayout1()

case 2:

return p.serializeLayout1()

default:

func (p *PermStorageDataBlock) Serialize() ([]byte, error) {

switch p.layoutVersion {

case 1:

return p.serializeLayout1()

case 2:

switch p.dataType {

case types.DataTypeFP32, types.DataTypeFP16, types.DataTypeFP8E4M3, types.DataTypeFP8E5M2:

return p.serializeLayout1()

default:

return nil, fmt.Errorf("layout-2 not supported for data type: %s", p.dataType)

}

default:

return nil, fmt.Errorf("unsupported layout version: %d", p.layoutVersion)

}

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go` around lines 124 - 130, Serialize() incorrectly routes layoutVersion==2 through serializeLayout1(), causing non-FP32 serializers to omit the layout‑2 header/bitmapMeta and corrupt deserialization; update Serialize() to handle case 2 by calling the layout‑2 serializer(s) (e.g., call serializeFP32AndLessV2 for FP32 paths and implement or route other types to a serializeLayout2 variant that appends bitmapMeta) or explicitly return an error for unsupported types; ensure serializeFP32AndLessV2’s bitmapMeta format is applied consistently for all layoutVersion==2 code paths and update any serializer functions referenced (serializeLayout1, serializeFP32AndLessV2, or new serializeLayout2) so the deserializer sees the header byte for every layout‑2 block.

coderabbitai · 2026-02-24T09:04:47Z

online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go

+	if p.layoutVersion == 2 && len(p.bitmap) > 0 {
+
+		for i, v := range values {
+			if (p.bitmap[i/8] & (1 << (i % 8))) == 0 {
+				continue
+			}
+			putFloat(p.originalData[idx:idx+unitSize], v)
+			idx += unitSize
+		}
+
+		p.originalData = p.originalData[:idx]
+	} else {
+		for _, v := range values {
+			putFloat(p.originalData[idx:idx+unitSize], v)
+			idx += unitSize
+		}
+	}
+
+	// ─────────────────────────────
+	// Step 2: layout-2 payload handling
+	// ─────────────────────────────
+	if p.layoutVersion == 2 {
+		// prepend bitmap to payload if present
+		if len(p.bitmap) > 0 {
+			p.bitmapMeta = p.bitmapMeta | 1<<3 // bitmapPresent = 1
+			tmp := make([]byte, 0, len(p.bitmap)+len(p.originalData))
+			tmp = append(tmp, p.bitmap...)
+			tmp = append(tmp, p.originalData...)
+			p.originalData = tmp
+		}
+
+		// append bitmapMeta to header
+		if len(p.buf) != PSDBLayout1LengthBytes {
+			return nil, fmt.Errorf("invalid base header length for layout-2")
+		}
+		p.buf = append(p.buf, p.bitmapMeta)
 	}
+


⚠️ Potential issue | 🟠 Major

Validate bitmap length before indexing.

The loop indexes p.bitmap[i/8] without checking size; a malformed bitmap can panic. Add a guard that the bitmap has at least ceil(len(values)/8) bytes.

🐛 Suggested fix

if p.layoutVersion == 2 && len(p.bitmap) > 0 { + if len(p.bitmap)*8 < len(values) { + return nil, fmt.Errorf("bitmap too small for %d values", len(values)) + } for i, v := range values { if (p.bitmap[i/8] & (1 << (i % 8))) == 0 { continue

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if p.layoutVersion == 2 && len(p.bitmap) > 0 {

for i, v := range values {

if (p.bitmap[i/8] & (1 << (i % 8))) == 0 {

continue

}

putFloat(p.originalData[idx:idx+unitSize], v)

idx += unitSize

}

p.originalData = p.originalData[:idx]

} else {

for _, v := range values {

putFloat(p.originalData[idx:idx+unitSize], v)

idx += unitSize

}

}

// ─────────────────────────────

// Step 2: layout-2 payload handling

// ─────────────────────────────

if p.layoutVersion == 2 {

// prepend bitmap to payload if present

if len(p.bitmap) > 0 {

p.bitmapMeta = p.bitmapMeta | 1<<3 // bitmapPresent = 1

tmp := make([]byte, 0, len(p.bitmap)+len(p.originalData))

tmp = append(tmp, p.bitmap...)

tmp = append(tmp, p.originalData...)

p.originalData = tmp

}

// append bitmapMeta to header

if len(p.buf) != PSDBLayout1LengthBytes {

return nil, fmt.Errorf("invalid base header length for layout-2")

}

p.buf = append(p.buf, p.bitmapMeta)

}

if p.layoutVersion == 2 && len(p.bitmap) > 0 {

if len(p.bitmap)*8 < len(values) {

return nil, fmt.Errorf("bitmap too small for %d values", len(values))

}

for i, v := range values {

if (p.bitmap[i/8] & (1 << (i % 8))) == 0 {

continue

}

putFloat(p.originalData[idx:idx+unitSize], v)

idx += unitSize

}

p.originalData = p.originalData[:idx]

} else {

for _, v := range values {

putFloat(p.originalData[idx:idx+unitSize], v)

idx += unitSize

}

}

// ─────────────────────────────

// Step 2: layout-2 payload handling

// ─────────────────────────────

if p.layoutVersion == 2 {

// prepend bitmap to payload if present

if len(p.bitmap) > 0 {

p.bitmapMeta = p.bitmapMeta | 1<<3 // bitmapPresent = 1

tmp := make([]byte, 0, len(p.bitmap)+len(p.originalData))

tmp = append(tmp, p.bitmap...)

tmp = append(tmp, p.originalData...)

p.originalData = tmp

}

// append bitmapMeta to header

if len(p.buf) != PSDBLayout1LengthBytes {

return nil, fmt.Errorf("invalid base header length for layout-2")

}

p.buf = append(p.buf, p.bitmapMeta)

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go` around lines 268 - 305, The loop uses p.bitmap[i/8] without bounds checking which can panic if the bitmap is too short; before the loop in the p.layoutVersion == 2 && len(p.bitmap) > 0 branch compute needed := (len(values)+7)/8 and validate len(p.bitmap) >= needed, and if not return a clear error (e.g., fmt.Errorf("invalid bitmap length: need %d bytes", needed)); add this guard so indexing p.bitmap[i/8] is always safe.

Adit2607 and others added 3 commits January 6, 2026 14:32

Merge branch 'develop' of github.com:Meesho/BharatMLStack-internal in…

4fcc58f

…to develop

Adding 1 More layout for decreasig FS network

a14f1df

Merge branch 'develop' of github.com:Meesho/BharatMLStack into poc/fs…

af24654

…_layout

coderabbitai bot reviewed Feb 24, 2026

View reviewed changes

Scenario	This file	`.txt` file
500 features, 80% defaults	21.66%	23.72%
850 features, 95% defaults	10.23%	6.85%
1000 features, 23% defaults	6.39%	6.02%
100 features, 50% defaults	24.47%	23.66%
200 features, 20% defaults	8.90%	7.77%
500 FP16, 70% defaults	28.54%	27.11%


		### Test Coverage

		- ✅ 10 different scenarios covering sparsity from 0% to 100%

	- ✅ 10 different scenarios covering sparsity from 0% to 100%
	- ✅ 13 different scenarios covering sparsity from 0% to 100%

Comments

Conversation

Adit2607 commented Jan 7, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔁 Pull Request Template – BharatMLStack

📌 Summary

📂 Modules Affected

✅ Type of Change

📊 Benchmark / Metrics (if applicable)

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Adit2607 commented Jan 7, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 24, 2026 •

edited

Loading