Allow optionally passed string allocator and expand benchmarks #264

Open
jashook wants to merge 14 commits into maxmind:main from jashook:allow_optional_string_allocator

Conversation

@jashook (Contributor) commented Feb 25, 2026

The change also includes an example string-intern method, which reduces allocations by nearly 50%.

| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|
| CityMemoryMappedLookup | 6.526 ms | 0.1058 ms | 0.0883 ms | 468.7500 | 3875.97 KB |
| CityMemoryMappedCachedLookup | 2.539 ms | 0.0328 ms | 0.0307 ms | 66.4063 | 552.98 KB |
| CityMemoryLookup | 6.538 ms | 0.0891 ms | 0.0875 ms | 468.7500 | 3875.97 KB |
| CityMemoryCachedLookup | 2.510 ms | 0.0147 ms | 0.0130 ms | 66.4063 | 552.98 KB |

@jashook (Contributor, Author) commented Feb 25, 2026

cc @horgh

@horgh (Contributor) left a comment

Thanks for the PR! I had a few comments. For future changes, it would probably make sense to open an issue to talk about the design and whether we would be likely to accept a change before going through the effort of writing a PR.

Comment thread MaxMind.Db/Decoder.cs Outdated
Comment thread MaxMind.Db/Reader.cs Outdated
/// <param name="file">The file.</param>
public Reader(string file) : this(file, FileAccessMode.MemoryMapped)
/// <param name="stringAllocator">Optional allocator method for strings created from byte arrays.</param>
public Reader(string file, AllocatorDelegates.GetString? stringAllocator = null)
Contributor

I think we'd need to figure out how callers would use this without bugs. Design wise I'm not sure it's something we'd want as is.

For example, Reader needs to be thread safe, so the allocator would need to be as well. That seems like something easy to misuse. e.g. the benchmark use seems to not be thread safe.

As well, we might need to have a way to limit the size of the cache, e.g. evict older entries or something.

Potentially ideally we would do something similar to the caching in the Java reader: https://github.com/maxmind/MaxMind-DB-Reader-java?tab=readme-ov-file#caching. That caches based on the data section offset I believe, so we could cache more than only strings.
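A thread-safe variant of such a string allocator might look like the following sketch. The delegate shape (bytes in, string out) mirrors the `AllocatorDelegates.GetString` idea discussed above, but the class name, the unbounded `ConcurrentDictionary`, and the lack of eviction are illustrative assumptions, not part of the PR:

```csharp
using System.Collections.Concurrent;
using System.Text;

// Hypothetical sketch: an interning string allocator that is safe to
// share across threads. A real implementation would want a bounded
// cache and ideally a byte-based key to avoid decoding on every call.
public static class InterningAllocator
{
    private static readonly ConcurrentDictionary<string, string> Cache = new();

    public static string GetString(byte[] bytes, int offset, int count)
    {
        // Decoding first is wasteful but keeps the example simple.
        var s = Encoding.UTF8.GetString(bytes, offset, count);

        // GetOrAdd is atomic per key, so concurrent callers always
        // observe a single canonical instance for equal strings.
        return Cache.GetOrAdd(s, s);
    }
}
```

Note this still pays the transient decode allocation on a cache hit; it only deduplicates the retained strings.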

Contributor Author

> I think we'd need to figure out how callers would use this without bugs. Design wise I'm not sure it's something we'd want as is.

A bit of this is me learning what this library does. That said, noted: I can open an issue first in the future. The issue here is more or less clear, though: the library allocates quite a bit.

Contributor Author

> For example, Reader needs to be thread safe, so the allocator would need to be as well. That seems like something easy to misuse. e.g. the benchmark use seems to not be thread safe.

+1. I moved the cache into the library. The cache is internal, and I added a new constructor that takes a cache capacity.

Contributor Author

> Potentially ideally we would do something similar to the caching in the Java reader: https://github.com/maxmind/MaxMind-DB-Reader-java?tab=readme-ov-file#caching. That caches based on the data section offset I believe, so we could cache more than only strings.

I am not particularly married to a specific design here. Please give feedback on the current change, and I have no problem making changes. This has not been time-intensive.

@jashook force-pushed the allow_optional_string_allocator branch from 4139fe6 to 25ed1f4 on March 2, 2026 05:13
throw new InvalidOperationException($"{dbPathVarName} was not set");
_reader = new Reader(dbPath);
_memMapReader = new Reader(dbPath, FileAccessMode.MemoryMapped);
_memMapCachedReader = new Reader(dbPath, FileAccessMode.MemoryMapped, 4_096);
Contributor

It may make sense to have a separate PR for improvements to the benchmark tool, if they are unrelated to the PR. I realize in a prior version the changes were more relevant as you were exercising new parameters to the reader.

Contributor Author

There will still need to be a change to exercise the new parameters, but it will make the PR easier to review. Thank you.

Comment thread MaxMind.Db/AllocatorDelegates.cs
</ItemGroup>

<ItemGroup>
<PackageReference Include="System.Memory" Version="4.5.5" />
Contributor

Do we need this?

Comment thread MaxMind.Db/Reader.cs
/// <param name="file">The MaxMind DB file.</param>
/// <param name="mode">The mode by which to access the DB file.</param>
/// <param name="cacheSize">The size of the optional internal cache.</param>
public Reader(string file, FileAccessMode mode, int cacheSize)
Contributor

This seems like an improvement in terms of not breaking the API. However, I wonder if we would want to provide some way to swap out the cache implementation. I'm not sure, nor am I sure what that would best look like.

Comment thread MaxMind.Db/Reader.cs
/// Cache size
/// </summary>
/// <returns></returns>
public int CacheSize()
Contributor

Maybe we could leave this out unless it's necessary, to reduce adding to the public API for now.

Comment thread MaxMind.Db/Decoder.cs
Network? network
)
{
ValueTuple<object, long> returnValue = DecodeFromCacheOrCreate(offset, size, expectedType, type, injectables, network, static (Buffer database, long offset, int size, Type type, ObjectType objectType, Decoder decoder, InjectableValues? injectableValues, Network? network) =>
Contributor

I wonder if it would be simpler if we only cached by data section offset / pointer. You can see we actually have caching by that already for when iterating the tree in FindAll():

if (!dataCache.TryGetValue(node.Pointer, out var data))

We cache by that in MaxMind-DB-Reader-java as well.

It may make sense to extend that caching too so we don't have two kinds of caches. I didn't realize we had that cache when you first made your PR.
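The offset-keyed approach could be sketched roughly as follows. The `OffsetCache` class and the decode delegate are hypothetical stand-ins to keep the example self-contained; the library's actual decode path and the existing `dataCache` in `FindAll()` differ in detail:

```csharp
using System;
using System.Collections.Concurrent;

// Sketch: cache decoded records keyed only by their data-section
// offset (pointer), similar to the Java reader's caching. A single
// cache entry then covers every value reached via that pointer,
// not only strings.
public sealed class OffsetCache
{
    private readonly ConcurrentDictionary<long, object> _cache = new();

    public object GetOrDecode(long pointer, Func<long, object> decode)
    {
        if (_cache.TryGetValue(pointer, out var data))
            return data; // hit: skip decoding entirely

        data = decode(pointer);
        _cache.TryAdd(pointer, data); // losing the race is harmless
        return data;
    }
}
```

Because most lookups resolve to a small set of data-section offsets, this also deduplicates whole decoded records rather than individual fields.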

@horgh (Contributor) left a comment

Sorry, I saw a few more things!

Comment thread MaxMind.Db/Decoder.cs
{
var type = CtrlData(offset, out var size, out offset);
return DecodeByType(expectedType, type, offset, size, out outOffset, injectables, network);
return DecodeByTypeFromCacheOrCreate(expectedType, type, offset, size, out outOffset, injectables, network);
Contributor

I think we should also have a test looking up different IPs that resolve to the same offset where we use Inject. It looks like we may cache the first IP for example if we inject IP, which doesn't seem desirable. Same issue with Network.

Comment thread MaxMind.Db/SNECache.cs

// Else we can add. Below will most likely end up as a tail call. Do not
// mark the method as aggressive inline.
return this._Cache.TryAdd((offset, size, type), item);
Contributor

Would the size be incorrectly bumped if the key ends up existing already?
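One way to avoid that is to bump the counter only when `TryAdd` reports an actual insert. A self-contained sketch of the pattern (the class and field names are stand-ins for SNECache's internals, not the PR's actual code):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Sketch: a bounded cache whose size counter is only incremented
// when the key was genuinely inserted, so re-adding an existing key
// does not inflate the count.
public sealed class BoundedCache
{
    private readonly ConcurrentDictionary<(long, int, Type), (object, long)> _cache = new();
    private long _count;
    private readonly long _capacity;

    public BoundedCache(long capacity) => _capacity = capacity;

    public bool TryAdd(long offset, int size, Type type, (object, long) item)
    {
        if (Interlocked.Read(ref _count) >= _capacity)
            return false; // cache is full

        if (!_cache.TryAdd((offset, size, type), item))
            return false; // key already present: no size bump

        Interlocked.Increment(ref _count);
        return true;
    }
}
```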

Comment thread MaxMind.Db/Decoder.cs
ReflectionUtil.CheckType(expectedType, typeof(byte[]));

return _database.Read(offset, size);
return (byte[])DecodeFromCacheOrCreate(offset, size, expectedType, static (Buffer database, long offset, int size) =>
Contributor

These lower level DecodeFromCacheOrCreate calls seem to cause us to bump size a second time. Once here, and once in the call to DecodeByType.

Comment thread MaxMind.Db/SNECache.cs
public bool TryGet(long offset, int size, Type type, out ValueTuple<object, long> returnValue)
{
// Read, attempt to return a cached value
if (this._Cache.TryGetValue((offset, size, type), out returnValue))
Contributor

Would the cached value be modified if the caller modified the returned object? We would need to document how to handle that at least.
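One option is to hand out a defensive copy of mutable cached values such as `byte[]`, so a caller mutating the result cannot corrupt the cache. A minimal sketch, assuming a cache shaped roughly like SNECache (the class and member names here are hypothetical):

```csharp
using System;
using System.Collections.Concurrent;

// Sketch: on a cache hit, clone mutable payloads before returning
// them. Immutable payloads (strings, numbers) are returned as-is.
public sealed class DefensiveCache
{
    private readonly ConcurrentDictionary<(long, int, Type), (object, long)> _cache = new();

    public void Add(long offset, int size, Type type, (object, long) item) =>
        _cache.TryAdd((offset, size, type), item);

    public bool TryGet(long offset, int size, Type type, out (object, long) value)
    {
        if (!_cache.TryGetValue((offset, size, type), out value))
            return false;

        if (value.Item1 is byte[] bytes)
            value = ((byte[])bytes.Clone(), value.Item2); // copy, don't alias

        return true;
    }
}
```

The copy trades back some of the allocation savings for safety; the alternative is documenting that returned values must not be mutated.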

jashook added a commit to jashook/MaxMind-DB-Reader-dotnet that referenced this pull request Mar 25, 2026
jashook added a commit to jashook/MaxMind-DB-Reader-dotnet that referenced this pull request Mar 25, 2026
horgh added a commit that referenced this pull request Apr 23, 2026