23 commits
fef0cba
autoresearch: scaffold Pebble engine perf loop
pquerna May 25, 2026
f5b94e1
fix(equivalence/test): silence gosec G602 false positive
pquerna May 25, 2026
20a8ecd
P1.2b: L0CompactionThreshold 2→8. writepack_1m 4245.86→3575.28ms (-15…
pquerna May 25, 2026
e872976
P3.8: scratch byte buffers reused across PutGrantRecords loop + proto…
pquerna May 25, 2026
47a1ec9
Hoist resolveSyncBytes out of PutGrantRecords loop with last-value ca…
pquerna May 25, 2026
63c0869
Split PutGrantRecords into two pebble.Batches: primary keys vs index …
pquerna May 25, 2026
799a0c0
Use NewBatchWithSize(len*600) and len*140 to pre-allocate the priBatc…
pquerna May 25, 2026
b1a1700
Skip read-before-write Get during the first PutGrantRecords call of a…
pquerna May 25, 2026
99c76cd
Parallel-build PutGrantRecords for fresh-sync skipGet path with len(r…
pquerna May 25, 2026
1916594
autoresearch.md: document loop results (-52.3% from RFC baseline)
pquerna May 25, 2026
de09954
4-way shard the priBatch build in the parallel skipGet path. Each sha…
pquerna May 25, 2026
c9ca9a7
autoresearch.md: update with priBatch sharding win (-56.1% cumulative)
pquerna May 25, 2026
a0e1ce0
autoresearch.md: capture noise-floor calibration + parallel-close dea…
pquerna May 25, 2026
a864d68
Pre-sort the idxBatch via 4-way parallel sort + k-way merge before Se…
pquerna May 25, 2026
3d660b9
Arena-style storage for the parallel idx-key sort. Replaces the 2M in…
pquerna May 25, 2026
8525c14
Move priBatch.Commit and idxBatch.Commit INSIDE their respective para…
pquerna May 25, 2026
9b8fc47
After both parallel commits, call e.db.AsyncFlush() to force Pebble's…
pquerna May 25, 2026
d919128
Replace V2GrantToV3's per-grant builder allocations with a per-call g…
pquerna May 25, 2026
d4b7e29
Parallelize the V2→V3 grant translation in Adapter.PutGrants wit…
pquerna May 25, 2026
4995f17
Async tmpdir cleanup in registeredStore.Close: spawn os.RemoveAll in …
pquerna May 25, 2026
77b0324
autoresearch.ideas.md: capture closed axes (do-not-retry list)
pquerna May 25, 2026
eacc8df
autoresearch.ideas.md: consolidate WriteEnvelope parallel-read attemp…
pquerna May 25, 2026
fdde009
autoresearch.ideas.md: close parallel-large-alloc axis (3 attempts co…
pquerna May 25, 2026
38 changes: 38 additions & 0 deletions autoresearch.checks.sh
@@ -0,0 +1,38 @@
#!/usr/bin/env bash
# autoresearch.checks.sh — correctness gate. Runs after each successful bench.
# Suppresses verbose progress; only failures bubble up to the agent.
set -euo pipefail

export CGO_ENABLED=0

echo ">> engine + adapter + compactor + equivalence + envelope tests"
go test -tags=batonsdkv2 -count=1 -timeout=5m \
  ./pkg/dotc1z/engine/pebble/... \
  ./pkg/dotc1z/engine/equivalence/... \
  ./pkg/synccompactor/pebble/... \
  ./pkg/dotc1z/format/v3/... 2>&1 | tail -60

echo ">> SQLite engine regression guard"
go test -tags=baton_lambda_support -short -count=1 -timeout=5m \
  ./pkg/dotc1z/ 2>&1 | tail -40

echo ">> golangci-lint"
golangci-lint run --timeout=3m --build-tags=batonsdkv2 \
  ./pkg/dotc1z/engine/... \
  ./pkg/synccompactor/pebble/... 2>&1 | tail -40

echo ">> go.mod / go.sum drift"
if ! git diff --quiet -- go.mod go.sum; then
  echo "FAIL: go.mod or go.sum modified — new dependency introduced"
  git diff --stat -- go.mod go.sum
  exit 1
fi

echo ">> proto wire format drift"
if ! git diff --quiet -- proto/c1/storage/v3/; then
  echo "FAIL: proto wire format changed — out of scope for perf loop"
  git diff --stat -- proto/c1/storage/v3/
  exit 1
fi

echo "OK"
4 changes: 4 additions & 0 deletions autoresearch.config.json
@@ -0,0 +1,4 @@
{
  "maxIterations": 200,
  "workingDir": "/data/squire/src/baton-sdk"
}
70 changes: 70 additions & 0 deletions autoresearch.ideas.md
@@ -0,0 +1,70 @@
# Ideas backlog — Pebble engine perf

Free-form scratch. Append new ideas as bullets; mark tried ones with
status (kept / discarded / crashed) so we don't repeat them.

## To try (priority order)

- [ ] **P1.1** Memtable size 64 MiB → 256 MiB (`MemTableSize` in `options.go`).
- [ ] **P1.2** `L0CompactionThreshold` sweep: 2 → 4, 8.
- [ ] **P1.3** `MaxConcurrentCompactions` upper bound: 8 → 12 (gate on GOMAXPROCS).
- [ ] **P1.4** Enable bloom filters on L0 (FilterPolicy + FilterType).
- [ ] **P1.5** Mixed compression: Snappy at L0, zstd at L6.
- [ ] **P2.6** Per-record-type per-level options (grants vs resources).
- [ ] **P2.7** Codec codegen via `cmd/protoc-gen-batonstore` — replaces reflection path. Big change; may need human approval.
- [ ] **P3.8** Pool tuple encoder buffer (`AppendTupleString`) — kill per-record slice alloc.
- [ ] **P3.9** Larger SST block size (32 KiB → 64 KiB) — amortize header overhead.

## Tried — see jsonl for verdicts

(populated by the loop)

## Follow-up / human review

- Split-batch in PutGrantRecords (commit 63c0869b) breaks cross-batch atomicity:
  if priBatch commits but idxBatch fails, primary records exist without
  by_entitlement / by_principal index entries. Fresh-sync replays the
  whole sync from the connector, so it's OK there, but incremental Put
  paths (mid-sync upserts) might be left with un-indexed primary records.
  The RFC stack-6 grant expansion path could be a concrete victim. Consider:
  - Apply the split only when IsFreshSync() is true; keep one-batch atomic
    semantics outside fresh-sync.
  - Or: document the contract change.
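
The first option above could look roughly like the sketch below. This is an illustration only: `batch`, `store`, and `putGrants` are hypothetical stand-ins and not the Pebble or engine API, and the `freshSync` flag is assumed to carry whatever `IsFreshSync()` returns. It shows that the split path can leave a primary op without its index op when the second commit fails, while the single-batch path applies both or neither.

```go
package main

import (
	"errors"
	"fmt"
)

// batch is a hypothetical stand-in for a write batch; it is not the Pebble API.
type batch struct {
	name string
	ops  []string
}

// store is a toy in-memory sink that can be told to fail one named batch.
type store struct {
	failOn  string
	applied []string
}

// commit applies every op in b, or none of them if b is configured to fail.
func (s *store) commit(b *batch) error {
	if b.name == s.failOn {
		return errors.New("commit failed: " + b.name)
	}
	s.applied = append(s.applied, b.ops...)
	return nil
}

// putGrants writes one primary op and one index op per grant ID.
// With freshSync it uses two batches (not atomic across them);
// otherwise it uses a single batch so both kinds of op land together.
func putGrants(s *store, ids []string, freshSync bool) error {
	if !freshSync {
		one := &batch{name: "combined"}
		for _, id := range ids {
			one.ops = append(one.ops, "pri/"+id, "idx/"+id)
		}
		return s.commit(one)
	}
	pri, idx := &batch{name: "pri"}, &batch{name: "idx"}
	for _, id := range ids {
		pri.ops = append(pri.ops, "pri/"+id)
		idx.ops = append(idx.ops, "idx/"+id)
	}
	if err := s.commit(pri); err != nil {
		return err
	}
	return s.commit(idx)
}

func main() {
	// Simulate an index-batch failure on the split path.
	s := &store{failOn: "idx"}
	err := putGrants(s, []string{"g1"}, true)
	fmt.Println(err, s.applied)
}
```

Running `main` reports the failed index commit alongside the already-applied primary op, which is the orphaned-record case described above. Calling `putGrants` with `freshSync` set to false and a failing combined batch leaves `applied` empty.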

## Closed axes (do NOT retry — multiple attempts confirm dead)

- **Parallel engine.Close + WriteEnvelope** (tried at #19, #28, #45 — three baselines).
Mechanism is theoretically safe (CheckpointTo creates self-contained dir), but
goroutine + channel coordination overhead exceeds the engine.Close wallclock
savings (~30-50 ms). At smaller scales the overhead dominates and regresses
10-15%. Not a clean win at any size.
- **Parallelize large heap allocations across goroutines** (#47 priBatch/idxBatch,
#48 priBatch sub-shards). Three different attempts. Go's heap allocator
serializes large (>32 KB) allocations through the central heap-arena mutex;
OS mmap underneath has kernel-level locks. Concurrent 150 MB-class allocs
from N goroutines queue serially, plus goroutine scheduling adds overhead
proportional to N. Stick to single-goroutine allocation for the big buffers.
- **FlushSplitBytes axis** (tried 2 MiB → 16 MiB at #21, #31; 2 MiB → 64 MiB at #37).
Pebble doesn't honor very large hints, or bigger SSTs lose write parallelism.
All flat-to-mildly-negative across multiple baselines.
- **Tournament tree / prefix-skip merge optimizations** (#39, #40). The naive
4-way bytes.Compare scan is already optimally branch-predictable and SIMD-tight;
wrapping with anything in Go costs more than it saves at k=4.
- **Parallel reads for WriteEnvelope** (#43 bulk-pre-read; #46 streaming with bounded
lookahead). Two different failure modes: #43 didn't actually overlap reads with
writes (3 serial phases); #46 did overlap but per-file os.ReadFile allocated
~530 MB of one-shot buffers vs io.Copy's reused 32 KB buffer. Pebble checkpoint
files are page-cache-hot anyway — io.Copy pulls them at memory speed, so serial
reading is already efficient. Closed axis.
- **Background WAL fsync** (WALBytesPerSync=4MiB, #38). On this hardware fsync
isn't a meaningful bottleneck; spreading it via background syncs doesn't help.
- **MemTableSize > 64 MiB** (#1 256 MiB, #16 128 MiB). Larger memtable lets entire
100k workload fit in memory → no during-write flushes → forced serial flush at
EndSync. 100k workload regresses ~30%.
- **L0CompactionThreshold ≠ 8** axis fully mapped (2/4/6/16). 8 is the knee.
- **CompactionConcurrencyRange** (#7). With L0=8 compactor isn't the bottleneck.
- **DisableAutomaticCompactions** (#20). With L0=8 it's already idle.
- **proto.MarshalAppend with SetDeferred + cached size** (#23). proto.Size
double-traversal eats the memcpy savings.
- **appendEscaped bytes.IndexByte fast path** (#22). Tuple encoder is on the
smaller goroutine; max(A,B) wallclock means optimizing B doesn't help when B<A.
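
For reference, the "naive 4-way bytes.Compare scan" named under the tournament-tree axis can be sketched as below. This is a minimal illustration, assuming the sorted shards are held as plain `[][]byte` slices; `mergeSorted` is a hypothetical name, and the real code (arena-backed keys, feeding the idxBatch) is not reproduced here.

```go
package main

import (
	"bytes"
	"fmt"
)

// mergeSorted merges already-sorted runs of keys into one sorted slice.
// Each step scans the head of every non-empty run with bytes.Compare and
// takes the smallest, which costs at most k-1 comparisons per output key.
func mergeSorted(runs [][][]byte) [][]byte {
	total := 0
	for _, r := range runs {
		total += len(r)
	}
	out := make([][]byte, 0, total)
	pos := make([]int, len(runs)) // next unread index in each run
	for len(out) < total {
		best := -1
		for i, r := range runs {
			if pos[i] >= len(r) {
				continue // this run is exhausted
			}
			if best < 0 || bytes.Compare(r[pos[i]], runs[best][pos[best]]) < 0 {
				best = i
			}
		}
		out = append(out, runs[best][pos[best]])
		pos[best]++
	}
	return out
}

func main() {
	runs := [][][]byte{
		{[]byte("a"), []byte("e")},
		{[]byte("b"), []byte("f")},
		{[]byte("c")},
		{[]byte("d"), []byte("g")},
	}
	for _, k := range mergeSorted(runs) {
		fmt.Print(string(k), " ")
	}
	fmt.Println()
}
```

With k fixed at 4 the inner loop is three comparisons per key, so a heap or tournament tree has little work left to save.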