ASMap optimizations by l0rinc · Pull Request #174 · l0rinc/bitcoin

l0rinc · 2026-05-17T13:12:46Z

Motivation

ASMap lookups are on hot paths for netgrouping and addrman behavior. This PR keeps the implementation shape close to the existing code, but removes a few avoidable costs in NetGroupManager::GetMappedAS() and the ASMap varint decoder.

Changes

- Avoid heap allocation when building the 128-bit ASMap lookup key.
- Reuse the GetNetClass() result instead of rechecking IPv4 classification.
- Avoid an IPv6 temporary byte vector by copying through GetIn6Addr().
- Simplify ASMap mantissa decoding while keeping the generic DecodeBits() path.
- Mark DecodeBits() inline; this helps GCC substantially and is neutral for Clang.
- Return before address classification when ASMap is disabled.

Benchmarks

Median ns/op over 10 runs, -min-time=10000, pinned with taskset -c 0.

ASMap-enabled lookup, final stack vs origin/master:

| Compiler | Benchmark | Before (ns/op) | After (ns/op) | Speedup |
|---|---|---|---|---|
| GCC 12.2.0 | `ASMapGetMappedASMulti` | 6071.9 | 4014.8 | +51.2% |
| Clang 17.0.6 | `ASMapGetMappedASMulti` | 4639.3 | 4180.5 | +11.0% |

No-ASMap path, last commit vs previous commit:

| Compiler | Benchmark | Before (ns/op) | After (ns/op) | Speedup |
|---|---|---|---|---|
| GCC 12.2.0 | `AddrManAdd` | 132114160.6 | 117119345.0 | +12.8% |
| GCC 12.2.0 | `AddrManAddThenGood` | 150168118.5 | 133389867.5 | +12.6% |
| Clang 17.0.6 | `AddrManAdd` | 102457638.5 | 96694378.8 | +6.0% |
| Clang 17.0.6 | `AddrManAddThenGood` | 110488499.0 | 103380926.0 | +6.9% |

The ASMap benchmark added in PR 35285 makes GetMappedAS() measurable. perf record on ASMapGetMappedASMulti showed most time under Interpret()/DecodeBits(), but address classification and lookup-key setup were still visible.

GetMappedAS() already calls GetNetClass() before building the lookup key, and GetNetClass() classifies IPv4 by using the same HasLinkedIPv4() predicate. Reuse that net_class result for the early IPv4/IPv6 filter and for selecting the IPv4 mapping path, and use a fixed stack array for the 128-bit lookup key instead of allocating a 16-byte vector on every lookup.

Benchmarked with GCC 12.2.0 and Clang 17.0.6. Numbers are median ns/op over 10 runs with -min-time=10000.

GCC 12.2.0, previous commit -> this commit:

- ASMapGetMappedASMulti: 6071.9 -> 5875.7 saved 196.1 ns/op, +3.3% faster
- ASMapGetMappedASCloudflarev4: 6730.7 -> 6585.8 saved 144.9 ns/op, +2.2% faster
- ASMapGetMappedASCloudflarev6: 5328.0 -> 5087.2 saved 240.9 ns/op, +4.7% faster
- ASMapGetMappedASQuad9v6: 4099.8 -> 3866.8 saved 233.0 ns/op, +6.0% faster
- ASMapGetMappedASUnmappedv6: 1095.4 -> 854.4 saved 241.0 ns/op, +28.2% faster

Clang 17.0.6, previous commit -> this commit:

- ASMapGetMappedASMulti: 4639.3 -> 4492.5 saved 146.8 ns/op, +3.3% faster
- ASMapGetMappedASCloudflarev4: 5048.3 -> 4937.3 saved 111.0 ns/op, +2.2% faster
- ASMapGetMappedASCloudflarev6: 3940.9 -> 3806.9 saved 134.0 ns/op, +3.5% faster
- ASMapGetMappedASQuad9v6: 3013.9 -> 2881.3 saved 132.5 ns/op, +4.6% faster
- ASMapGetMappedASUnmappedv6: 693.4 -> 567.8 saved 125.6 ns/op, +22.1% faster
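
A minimal sketch of the stack lookup key idea, under stated assumptions: `NetClass`, `LookupKey`, `kIPv4InIPv6Prefix`, and `BuildLookupKey` are hypothetical stand-ins rather than the actual Bitcoin Core types, and the IPv4 path is assumed to place the address behind the ::ffff:0:0/96 mapped prefix. The point is only that one classification result selects the path and that the 128-bit key never touches the heap.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the result of GetNetClass().
enum class NetClass { IPV4, IPV6, OTHER };

// Assumed IPv4-mapped IPv6 prefix (::ffff:0:0/96).
constexpr std::array<std::uint8_t, 12> kIPv4InIPv6Prefix{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xFF, 0xFF};

// 128-bit key held in a fixed-size array, so it lives on the stack.
using LookupKey = std::array<std::uint8_t, 16>;

// Fill `key` from the raw address bytes, reusing one classification result
// both as the early filter and to pick the IPv4 path.
// Returns false for classes an ASMap lookup does not handle.
bool BuildLookupKey(NetClass net_class, const std::uint8_t* addr, LookupKey& key)
{
    switch (net_class) {
    case NetClass::IPV4:
        std::memcpy(key.data(), kIPv4InIPv6Prefix.data(), kIPv4InIPv6Prefix.size());
        std::memcpy(key.data() + kIPv4InIPv6Prefix.size(), addr, 4);
        return true;
    case NetClass::IPV6:
        std::memcpy(key.data(), addr, key.size());
        return true;
    default:
        return false;
    }
}
```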

For IPv6 ASMap lookups, GetAddrBytes() builds a temporary vector even though GetMappedAS() only needs the fixed 16-byte IPv6 address. Copy the address through GetIn6Addr() into the existing stack lookup key instead. This keeps the same IPv6-only assertion while avoiding the temporary byte vector.

Benchmarked with GCC 12.2.0 and Clang 17.0.6. Numbers are median ns/op over 10 runs with -min-time=10000.

GCC 12.2.0, previous commit -> this commit:

- ASMapGetMappedASMulti: 5875.7 -> 5817.7 saved 58.0 ns/op, +1.0% faster
- ASMapGetMappedASCloudflarev6: 5087.2 -> 5000.6 saved 86.5 ns/op, +1.7% faster
- ASMapGetMappedASGooglev6: 5251.3 -> 5159.3 saved 92.0 ns/op, +1.8% faster
- ASMapGetMappedASQuad9v6: 3866.8 -> 3785.4 saved 81.4 ns/op, +2.2% faster
- ASMapGetMappedASUnmappedv6: 854.4 -> 769.8 saved 84.6 ns/op, +11.0% faster

Clang 17.0.6, previous commit -> this commit:

- ASMapGetMappedASMulti: 4492.5 -> 4425.4 saved 67.1 ns/op, +1.5% faster
- ASMapGetMappedASCloudflarev6: 3806.9 -> 3746.4 saved 60.5 ns/op, +1.6% faster
- ASMapGetMappedASGooglev6: 3857.1 -> 3799.2 saved 57.9 ns/op, +1.5% faster
- ASMapGetMappedASQuad9v6: 2881.3 -> 2810.4 saved 70.9 ns/op, +2.5% faster
- ASMapGetMappedASUnmappedv6: 567.8 -> 506.6 saved 61.2 ns/op, +12.1% faster
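
As a rough illustration of the copy described in this commit, the sketch below moves the 16 bytes of a POSIX `in6_addr` straight into a fixed-size key. `CopyIPv6IntoKey` is a hypothetical helper; how the PR actually obtains the `in6_addr` from GetIn6Addr() and asserts on it is not shown here.

```cpp
#include <netinet/in.h> // in6_addr (POSIX)

#include <array>
#include <cstdint>
#include <cstring>

// Copy the fixed 16-byte IPv6 address into an existing stack key,
// with no intermediate std::vector. Hypothetical helper, not PR code.
void CopyIPv6IntoKey(const in6_addr& addr, std::array<std::uint8_t, 16>& key)
{
    static_assert(sizeof(in6_addr) == 16, "IPv6 addresses are 128 bits");
    std::memcpy(key.data(), addr.s6_addr, key.size());
}
```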

DecodeBits() consumes an operand one bit at a time in big-endian order, but the old mantissa loop used a variable shift for every bit and added into the final value directly. Build the mantissa by shifting the accumulated value left and OR-ing the next consumed bit. This preserves the existing bit order and EOF checks without introducing a separate packed-bit reader.

DecodeBits() itself is an anonymous-namespace helper, so it is covered indirectly through Interpret(), SanityCheckAsmap(), CheckStandardAsmap(), GetMappedAS(), and the ASMap fuzz targets rather than by direct unit tests. Keep this as the smallest mechanical change to the existing decoder.

Benchmarked with GCC 12.2.0 and Clang 17.0.6. Numbers are median ns/op over 10 runs with -min-time=10000.

GCC 12.2.0, previous commit -> this commit:

- ASMapGetMappedASMulti: 5817.7 -> 5622.1 saved 195.6 ns/op, +3.5% faster
- ASMapGetMappedASCloudflarev4: 6591.4 -> 6487.4 saved 104.0 ns/op, +1.6% faster
- ASMapGetMappedASCloudflarev6: 5000.6 -> 4912.0 saved 88.6 ns/op, +1.8% faster
- ASMapGetMappedASGooglev4: 6805.4 -> 6647.2 saved 158.2 ns/op, +2.4% faster
- ASMapGetMappedASQuad9v4: 4785.5 -> 4644.4 saved 141.2 ns/op, +3.0% faster

Clang 17.0.6, previous commit -> this commit:

- ASMapGetMappedASMulti: 4425.4 -> 4176.2 saved 249.1 ns/op, +6.0% faster
- ASMapGetMappedASCloudflarev4: 4936.1 -> 4699.0 saved 237.0 ns/op, +5.0% faster
- ASMapGetMappedASCloudflarev6: 3746.4 -> 3550.5 saved 196.0 ns/op, +5.5% faster
- ASMapGetMappedASGooglev4: 5093.1 -> 4823.7 saved 269.4 ns/op, +5.6% faster
- ASMapGetMappedASQuad9v4: 3633.4 -> 3449.4 saved 184.1 ns/op, +5.3% faster
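
The two mantissa loop shapes can be compared in isolation. This is a simplified sketch, not the PR's decoder: `BitReader` is a hypothetical bit source standing in for the real ASMap buffer and its EOF handling, and both function names are invented for the comparison. It shows that shift-left-then-OR yields the same big-endian value as the per-bit variable shift.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical bit source; the real decoder reads from the ASMap data.
struct BitReader {
    const std::vector<bool>& bits;
    std::size_t pos{0};

    // Returns the next bit, or std::nullopt at end of input.
    std::optional<bool> Next()
    {
        if (pos == bits.size()) return std::nullopt;
        return bits[pos++];
    }
};

// Old shape: a variable shift per bit, added straight into the result.
std::optional<std::uint32_t> MantissaVariableShift(BitReader& reader, int width)
{
    std::uint32_t val{0};
    for (int b = 0; b < width; ++b) {
        const std::optional<bool> bit = reader.Next();
        if (!bit) return std::nullopt; // EOF inside the mantissa
        val += static_cast<std::uint32_t>(*bit) << (width - 1 - b);
    }
    return val;
}

// New shape: shift the accumulator left by one and OR in the next bit.
std::optional<std::uint32_t> MantissaShiftOr(BitReader& reader, int width)
{
    std::uint32_t val{0};
    for (int b = 0; b < width; ++b) {
        const std::optional<bool> bit = reader.Next();
        if (!bit) return std::nullopt; // same EOF check as before
        val = (val << 1) | static_cast<std::uint32_t>(*bit);
    }
    return val;
}
```

Both functions consume bits in the same order and fail at the same position on truncated input, so the change is local to how the value is accumulated.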

With the mantissa loop simplified, DecodeBits() is small enough that asking for inlining gives GCC better visibility into the fixed operand tables used by DecodeType(), DecodeASN(), DecodeMatch(), and DecodeJump(). Keep the generic DecodeType() path: after the mantissa and inlining changes, it benchmarks faster than the custom direct opcode decoder while preserving the smaller diff.

Some IDEs mark this inline keyword redundant, and Clang 17 is indeed effectively neutral here. GCC 12 does not treat it as redundant in this build: the explicit hint substantially improves the ASMap benchmarks. A forced ALWAYS_INLINE version was measured separately and was slower than the plain inline hint, so keep the weaker standard keyword.

Benchmarked with GCC 12.2.0 and Clang 17.0.6. Numbers are median ns/op over 10 runs with -min-time=10000.

GCC 12.2.0, previous commit -> this commit:

- ASMapGetMappedASMulti: 5622.1 -> 4014.8 saved 1607.4 ns/op, +40.0% faster
- ASMapGetMappedASCloudflarev4: 6487.4 -> 4564.4 saved 1923.0 ns/op, +42.1% faster
- ASMapGetMappedASCloudflarev6: 4912.0 -> 3435.1 saved 1477.0 ns/op, +43.0% faster
- ASMapGetMappedASGooglev4: 6647.2 -> 4742.7 saved 1904.5 ns/op, +40.2% faster
- ASMapGetMappedASQuad9v6: 3714.9 -> 2606.8 saved 1108.1 ns/op, +42.5% faster
- ASMapGetMappedASUnmappedv6: 771.1 -> 609.2 saved 161.9 ns/op, +26.6% faster

Clang 17.0.6, previous commit -> this commit:

- ASMapGetMappedASMulti: 4176.2 -> 4180.5 changed -4.3 ns/op, -0.1% slower
- ASMapGetMappedASCloudflarev4: 4699.0 -> 4695.6 saved 3.5 ns/op, +0.1% faster
- ASMapGetMappedASGooglev6: 3639.3 -> 3608.7 saved 30.6 ns/op, +0.8% faster
- ASMapGetMappedASQuad9v6: 2670.5 -> 2704.1 changed -33.6 ns/op, -1.2% slower
- ASMapGetMappedASUnmappedv6: 482.3 -> 482.4 changed -0.1 ns/op, -0.0% slower

GetMappedAS() first checks whether an ASMap is configured. When it is not, there is no need to classify the address before returning the reserved unmapped ASN value. This keeps the ASMap-enabled path unchanged except for the already-predictable non-empty check, while making the default no-ASMap path cheaper for callers such as AddrMan that repeatedly ask for netgroups.

Benchmarked with GCC 12.2.0 and Clang 17.0.6. Numbers are median ns/op over 10 runs with -min-time=10000.

GCC 12.2.0, previous commit -> this commit:

- AddrManAdd: 132114160.6 -> 117119345.0 saved 14994815.6 ns/op, +12.8% faster
- AddrManAddThenGood: 150168118.5 -> 133389867.5 saved 16778251.0 ns/op, +12.6% faster

Clang 17.0.6, previous commit -> this commit:

- AddrManAdd: 102457638.5 -> 96694378.8 saved 5763259.8 ns/op, +6.0% faster
- AddrManAddThenGood: 110488499.0 -> 103380926.0 saved 7107573.0 ns/op, +6.9% faster
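
The reordering can be sketched with a toy model that counts classification calls. Everything here is hypothetical scaffolding (`FakeAddr`, `MappedASSketch`, the placeholder return value 1), and the unmapped ASN is assumed to be 0; the real function performs an Interpret() lookup where the placeholder is.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

enum class NetClass { IPV4, IPV6, OTHER };

// Hypothetical address type that records how often it is classified.
struct FakeAddr {
    NetClass net_class;
    mutable int classify_calls{0};

    NetClass GetNetClass() const
    {
        ++classify_calls;
        return net_class;
    }
};

// Toy model of the ordering only; not the NetGroupManager implementation.
class MappedASSketch
{
    std::vector<std::uint8_t> m_asmap;

public:
    explicit MappedASSketch(std::vector<std::uint8_t> asmap) : m_asmap(std::move(asmap)) {}

    std::uint32_t GetMappedAS(const FakeAddr& addr) const
    {
        // Early return: with no ASMap configured, skip classification.
        if (m_asmap.empty()) return 0; // assumed "unmapped" ASN value

        const NetClass net_class = addr.GetNetClass();
        if (net_class != NetClass::IPV4 && net_class != NetClass::IPV6) return 0;

        return 1; // placeholder for the real ASMap lookup
    }
};
```

With an empty map, `GetMappedAS()` returns before `GetNetClass()` is ever called, which is the saving the no-ASMap benchmarks measure.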

l0rinc closed this May 17, 2026

l0rinc reopened this May 17, 2026

l0rinc closed this May 17, 2026

l0rinc reopened this May 17, 2026

l0rinc added 5 commits May 18, 2026 18:17

l0rinc force-pushed the detached534 branch from db50a14 to 72f90cc Compare May 18, 2026 15:17

l0rinc wants to merge 5 commits into master from detached534
