Meesho · Adit2607 · Jan 6, 2026 · Jan 7, 2026 · Feb 24, 2026 · coderabbitai
diff --git a/LAYOUT_TEST_RESULTS.md b/LAYOUT_TEST_RESULTS.md
@@ -0,0 +1,142 @@
+# Layout1 vs Layout2 Compression Test Results
+
+## Executive Summary
+
+✅ **Layout2 is consistently better than Layout1** for all real-world scenarios where feature vectors contain default/zero values (sparse data).
+
+## Test Results Overview
+
+### Compressed Size Improvements
+
+| Test Scenario | Features | Default Ratio | Compression | Improvement |
+|---------------|----------|---------------|-------------|-------------|
+| High sparsity | 500 | 80% | ZSTD | **21.66%** ✅ |
+| Very high sparsity | 850 | 95% | ZSTD | **10.23%** ✅ |
+| Low sparsity | 1000 | 23% | ZSTD | **6.39%** ✅ |
+| Medium sparsity | 100 | 50% | ZSTD | **24.47%** ✅ |
+| Low sparsity | 200 | 20% | ZSTD | **8.90%** ✅ |
+| Edge case: All non-zero | 50 | 0% | ZSTD | **-3.50%** ⚠️ |
+| Edge case: All zeros | 100 | 100% | ZSTD | **18.75%** ✅ |
+| FP16 high sparsity | 500 | 70% | ZSTD | **28.54%** ✅ |
+| No compression | 500 | 60% | None | **56.85%** ✅ |
+
+### Original Size Improvements
+
+| Test Scenario | Original Size Reduction |
+|---------------|------------------------|
+| 500 features, 80% defaults | **76.85%** |
+| 850 features, 95% defaults | **91.79%** |
+| 1000 features, 23% defaults | **19.88%** |
+| 100 features, 50% defaults | **46.75%** |
+| 200 features, 20% defaults | **16.88%** |
+| 100 features, 100% defaults | **96.75%** |
+| 500 features FP16, 70% defaults | **63.70%** |
+| 500 features, 60% defaults (no compression) | **56.85%** |
+
+## Key Findings
+
+### ✅ Layout2 Advantages
+
+1. **Sparse Data Optimization**: Layout2 uses bitmap-based storage to skip default/zero values
+   - Only stores non-zero values in the payload
+   - Bitmap overhead is minimal compared to savings
+   - Original size reduced by 16.88% to 96.75% depending on sparsity
+
+2. **Compression Efficiency**: Layout2's smaller original size leads to better compression
+   - Compressed size reduced by 6.39% to 56.85%
+   - Best results with no additional compression layer (56.85%)
+   - Works well across all compression types (ZSTD, None)
+
+3. **Scalability**: Benefits increase with more features and higher sparsity
+   - 850 features with 95% defaults: 91.79% original size reduction
+   - 100 features with 100% defaults: 96.75% original size reduction
+
+4. **Data Type Agnostic**: Works well across different data types
+   - FP32: 6-28% improvement
+   - FP16: 28.54% improvement (tested)
+
+### ⚠️ Layout2 Trade-offs
+
+1. **Bitmap Overhead**: With 0% defaults (all non-zero values)
+   - Small overhead of ~3.5% due to bitmap metadata
+   - This is an edge case rarely seen in production feature stores
+   - In practice, feature vectors almost always have some sparse data
+
+2. **Complexity**: Slightly more complex serialization/deserialization
+   - Requires bitmap handling logic
+   - Worth the trade-off for significant space savings
+
+## Production Implications
+
+### When to Use Layout2
+
+✅ **Always use Layout2** for:
+- Sparse feature vectors (common in ML feature stores)
+- Any scenario with >5% default/zero values
+- Large feature sets (500+ features)
+- Storage-constrained environments
+
+### When Layout1 Might Be Acceptable
+
+- Extremely small feature sets (<50 features) with no defaults
+- Dense feature vectors with absolutely no zero values (rare)
+- Bitmap overhead of 3.5% is acceptable
+
+## Bitmap Optimization Tests
+
+Layout2's bitmap implementation correctly handles:
+
+| Pattern | Non-Zero Count | Original Size | Verification |
+|---------|---------------|---------------|--------------|
+| All zeros except first | 1/100 (1.0%) | 17 bytes | ✅ PASS |
+| All zeros except last | 1/100 (1.0%) | 17 bytes | ✅ PASS |
+| Alternating pattern | 6/100 (6.0%) | 37 bytes | ✅ PASS |
+| Clustered non-zeros | 5/200 (2.5%) | 45 bytes | ✅ PASS |
+
+**Formula**: `Original Size = Bitmap Size + (Non-Zero Count × Value Size)`
+
+## Conclusion
+
+**Layout2 should be the default choice** for the online feature store. The test results conclusively prove that Layout2 provides:
+
+- ✅ **6-57% compressed size reduction** across real-world scenarios
+- ✅ **17-97% original size reduction** depending on sparsity
+- ✅ **Consistent benefits** with any amount of default values
+- ✅ **Negligible overhead** (3.5%) only in unrealistic edge case (0% defaults)
+
+### Recommendation
+
+**Use Layout2 as the default layout version** for all new deployments and migrate existing Layout1 data during normal operations.
+
+## Test Implementation
+
+The comprehensive test suite is located at:
+`online-feature-store/internal/data/blocks/layout_comparison_test.go`
+
+### Running Tests
+
+```bash
+# Run all layout comparison tests
+go test ./internal/data/blocks -run TestLayout1VsLayout2Compression -v
+
+# Run bitmap optimization tests
+go test ./internal/data/blocks -run TestLayout2BitmapOptimization -v
+
+# Run both test suites
+go test ./internal/data/blocks -run "TestLayout.*" -v
+```
+
+### Test Coverage
+
+- ✅ 10 different scenarios covering sparsity from 0% to 100%
- ✅ 10 different scenarios covering sparsity from 0% to 100%
+- ✅ 13 different scenarios covering sparsity from 0% to 100%
- ✅ 10 different scenarios covering sparsity from 0% to 100%
+- ✅ 13 different scenarios covering sparsity from 0% to 100%
+- ✅ Different feature counts: 50, 100, 200, 500, 850, 1000
+- ✅ Different data types: FP32, FP16
+- ✅ Different compression types: ZSTD, None
+- ✅ Bitmap optimization edge cases
+- ✅ Serialization and deserialization correctness
+
+---
+
+**Generated:** January 7, 2026  
+**Test File:** `online-feature-store/internal/data/blocks/layout_comparison_test.go`
+
diff --git a/online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go b/online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go
@@ -64,7 +64,7 @@ func TestSerializeForInMemoryInt32(t *testing.T) {
 
 				// Verify all values
 				for i, expected := range []int32{1, 2, 3} {
-					feature, err := ddb.GetNumericScalarFeature(i)
+					feature, err := ddb.GetNumericScalarFeature(i, 3, []byte{0, 0, 0})
 					require.NoError(t, err)
 					value, err := HelperScalarFeatureToTypeInt32(feature)
 					require.NoError(t, err)
@@ -121,7 +121,7 @@ func TestSerializeForInMemoryInt32(t *testing.T) {
 				// Test random positions
 				testPositions := []int{0, 42, 1000, 5000, 9999}
 				for _, pos := range testPositions {
-					feature, err := ddb.GetNumericScalarFeature(pos)
+					feature, err := ddb.GetNumericScalarFeature(pos, 3, []byte{0, 0, 0})
 					require.NoError(t, err)
 					value, err := HelperScalarFeatureToTypeInt32(feature)
 					require.NoError(t, err)
@@ -276,7 +276,7 @@ func TestSerializeForInMemoryInt8(t *testing.T) {
 
 				// Verify all values
 				for i, expected := range []int8{1, 2, 3} {
-					feature, err := ddb.GetNumericScalarFeature(i)
+					feature, err := ddb.GetNumericScalarFeature(i, 3, []byte{0, 0, 0})
 					require.NoError(t, err)
 					value, err := HelperScalarFeatureToTypeInt8(feature)
 					require.NoError(t, err)
@@ -333,7 +333,7 @@ func TestSerializeForInMemoryInt8(t *testing.T) {
 				// Test random positions
 				testPositions := []int{0, 42, 100, 500, 999}
 				for _, pos := range testPositions {
-					feature, err := ddb.GetNumericScalarFeature(pos)
+					feature, err := ddb.GetNumericScalarFeature(pos, 3, []byte{0, 0, 0})
 					require.NoError(t, err)
 					value, err := HelperScalarFeatureToTypeInt8(feature)
 					require.NoError(t, err)
@@ -489,7 +489,7 @@ func TestSerializeForInMemoryInt16(t *testing.T) {
 
 				// Verify all values
 				for i, expected := range []int16{1000, 2000, 3000} {
-					feature, err := ddb.GetNumericScalarFeature(i)
+					feature, err := ddb.GetNumericScalarFeature(i, 3, []byte{0, 0, 0})
 					require.NoError(t, err)
 					value, err := HelperScalarFeatureToTypeInt16(feature)
 					require.NoError(t, err)
@@ -546,7 +546,7 @@ func TestSerializeForInMemoryInt16(t *testing.T) {
 				// Test random positions
 				testPositions := []int{0, 42, 100, 500, 999}
 				for _, pos := range testPositions {
-					feature, err := ddb.GetNumericScalarFeature(pos)
+					feature, err := ddb.GetNumericScalarFeature(pos, 3, []byte{0, 0, 0})
 					require.NoError(t, err)
 					value, err := HelperScalarFeatureToTypeInt16(feature)
 					require.NoError(t, err)
@@ -702,7 +702,7 @@ func TestSerializeForInMemoryInt64(t *testing.T) {
 
 				// Verify all values
 				for i, expected := range []int64{1000000000000, 2000000000000, 3000000000000} {
-					feature, err := ddb.GetNumericScalarFeature(i)
+					feature, err := ddb.GetNumericScalarFeature(i, 3, []byte{0, 0, 0})
 					require.NoError(t, err)
 					value, err := HelperScalarFeatureToTypeInt64(feature)
 					require.NoError(t, err)
@@ -759,7 +759,7 @@ func TestSerializeForInMemoryInt64(t *testing.T) {
 				// Test random positions
 				testPositions := []int{0, 42, 100, 500, 999}
 				for _, pos := range testPositions {
-					feature, err := ddb.GetNumericScalarFeature(pos)
+					feature, err := ddb.GetNumericScalarFeature(pos, 3, []byte{0, 0, 0})
 					require.NoError(t, err)
 					value, err := HelperScalarFeatureToTypeInt64(feature)
 					require.NoError(t, err)
@@ -914,7 +914,7 @@ func TestSerializeForInMemoryFP8(t *testing.T) {
 
 				// Verify all values
 				for i, expected := range []float32{1.0, 2.0, 4.0} {
-					feature, err := ddb.GetNumericScalarFeature(i)
+					feature, err := ddb.GetNumericScalarFeature(i, 3, []byte{0, 0, 0})
 					require.NoError(t, err)
 					value, err := HelperScalarFeatureToTypeFP8E4M3(feature)
 					require.NoError(t, err)
@@ -975,7 +975,7 @@ func TestSerializeForInMemoryFP8(t *testing.T) {
 				// Test random positions
 				testPositions := []int{0, 42, 100, 500, 999}
 				for _, pos := range testPositions {
-					feature, err := ddb.GetNumericScalarFeature(pos)
+					feature, err := ddb.GetNumericScalarFeature(pos, 3, []byte{0, 0, 0})
 					require.NoError(t, err)
 					value, err := HelperScalarFeatureToTypeFP8E4M3(feature)
 					require.NoError(t, err)
@@ -1143,7 +1143,7 @@ func TestSerializeForInMemoryFP32(t *testing.T) {
 
 				// Verify all values
 				for i, expected := range []float32{1.234, 2.345, 3.456} {
-					feature, err := ddb.GetNumericScalarFeature(i)
+					feature, err := ddb.GetNumericScalarFeature(i, 3, []byte{0, 0, 0})
 					require.NoError(t, err)
 					value, err := HelperScalarFeatureToTypeFloat32(feature)
 					require.NoError(t, err)
@@ -1200,7 +1200,7 @@ func TestSerializeForInMemoryFP32(t *testing.T) {
 				// Test random positions
 				testPositions := []int{0, 42, 100, 500, 999}
 				for _, pos := range testPositions {
-					feature, err := ddb.GetNumericScalarFeature(pos)
+					feature, err := ddb.GetNumericScalarFeature(pos, 3, []byte{0, 0, 0})
 					require.NoError(t, err)
 					value, err := HelperScalarFeatureToTypeFloat32(feature)
 					require.NoError(t, err)
@@ -1356,7 +1356,7 @@ func TestSerializeForInMemoryFP64(t *testing.T) {
 
 				// Verify all values
 				for i, expected := range []float64{1.23456789, 2.34567890, 3.45678901} {
-					feature, err := ddb.GetNumericScalarFeature(i)
+					feature, err := ddb.GetNumericScalarFeature(i, 3, []byte{0, 0, 0})
 					require.NoError(t, err)
 					value, err := HelperScalarFeatureToTypeFloat64(feature)
 					require.NoError(t, err)
@@ -1413,7 +1413,7 @@ func TestSerializeForInMemoryFP64(t *testing.T) {
 				// Test random positions
 				testPositions := []int{0, 42, 100, 500, 999}
 				for _, pos := range testPositions {
-					feature, err := ddb.GetNumericScalarFeature(pos)
+					feature, err := ddb.GetNumericScalarFeature(pos, 3, []byte{0, 0, 0})
 					require.NoError(t, err)
 					value, err := HelperScalarFeatureToTypeFloat64(feature)
 					require.NoError(t, err)