Allow optionally passed string allocator and expand benchmarks #264
jashook wants to merge 14 commits into maxmind:main from
Conversation
cc @horgh
horgh left a comment
Thanks for the PR! I had a few comments. For future changes, it would probably make sense to open an issue to talk about the design and whether we would be likely to accept a change before going through the effort of writing a PR.
```csharp
/// <param name="file">The file.</param>
public Reader(string file) : this(file, FileAccessMode.MemoryMapped)

/// <param name="stringAllocator">Optional allocator method for strings created from byte arrays.</param>
public Reader(string file, AllocatorDelegates.GetString? stringAllocator = null)
```
I think we'd need to figure out how callers would use this without bugs. Design wise I'm not sure it's something we'd want as is.
For example, Reader needs to be thread safe, so the allocator would need to be as well. That seems like something easy to misuse; e.g., the allocator used in the benchmark does not appear to be thread safe.
As well, we might need to have a way to limit the size of the cache, e.g. evict older entries or something.
Potentially ideally we would do something similar to the caching in the Java reader: https://github.com/maxmind/MaxMind-DB-Reader-java?tab=readme-ov-file#caching. That caches based on the data section offset I believe, so we could cache more than only strings.
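The Java reader's offset-keyed approach could be sketched in C# roughly like this. This is purely illustrative: `OffsetCache`, `decode`, and the capacity handling are assumptions, not code from either library.

```csharp
using System;
using System.Collections.Concurrent;

// Sketch of an offset-keyed decode cache in the spirit of the Java reader's
// cache: keys are data-section offsets, values are any decoded object (not
// only strings), and the cache stops accepting entries once it hits capacity.
sealed class OffsetCache
{
    private readonly ConcurrentDictionary<long, object> _map = new();
    private readonly int _capacity;

    public OffsetCache(int capacity) => _capacity = capacity;

    public object GetOrAdd(long offset, Func<long, object> decode)
    {
        if (_map.TryGetValue(offset, out var cached))
            return cached;
        var value = decode(offset);
        if (_map.Count < _capacity)   // crude bound; the eviction policy is an open design question
            _map.TryAdd(offset, value);
        return value;
    }
}
```

Because records reached through pointers share data-section offsets, a cache like this would cover repeated strings as a byproduct of caching all decoded values.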
> I think we'd need to figure out how callers would use this without bugs. Design wise I'm not sure it's something we'd want as is.
A bit of this is me learning what this library does. That said, noted; I can open issues in the future. The issue here is more or less clear: the library allocates quite a bit.
> For example, Reader needs to be thread safe, so the allocator would need to be as well. That seems like something easy to misuse. e.g. the benchmark use seems to not be thread safe.
+1. I moved the cache into the library, left it internal, and added a new constructor that takes a cache capacity.
> Potentially ideally we would do something similar to the caching in the Java reader: https://github.com/maxmind/MaxMind-DB-Reader-java?tab=readme-ov-file#caching. That caches based on the data section offset I believe, so we could cache more than only strings.
I am not particularly married to a specific design here. Please give feedback on the current change, and I have no problem making changes. This has not been time intensive.
Force-pushed from 4139fe6 to 25ed1f4
```csharp
throw new InvalidOperationException($"{dbPathVarName} was not set");
_reader = new Reader(dbPath);
_memMapReader = new Reader(dbPath, FileAccessMode.MemoryMapped);
_memMapCachedReader = new Reader(dbPath, FileAccessMode.MemoryMapped, 4_096);
```
It may make sense to have a separate PR for improvements to the benchmark tool, if they are unrelated to the PR. I realize in a prior version the changes were more relevant as you were exercising new parameters to the reader.
There will still need to be a change to exercise the new parameters, but it will make the PR easier to review. Thank you.
```xml
</ItemGroup>

<ItemGroup>
  <PackageReference Include="System.Memory" Version="4.5.5" />
```
```csharp
/// <param name="file">The MaxMind DB file.</param>
/// <param name="mode">The mode by which to access the DB file.</param>
/// <param name="cacheSize">Cache size for optional internal cache</param>
public Reader(string file, FileAccessMode mode, int cacheSize)
```
This seems like an improvement in terms of not breaking the API. However, I wonder if we would want to provide some way to swap out the cache implementation. I'm not sure, and not sure what that would best look like if so.
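One way swapping the cache could look is a small interface keyed on the same tuple the PR's cache uses. All of these names are hypothetical; none exist in the library today.

```csharp
using System;

// Purely illustrative abstraction: callers could plug in their own policy
// (LRU, no-op, size-bounded) without the Reader knowing the details.
// The key mirrors the (offset, size, type) tuple used in this PR's cache.
public interface IDecodeCache
{
    bool TryGet((long Offset, int Size, Type Type) key, out (object Value, long Offset) entry);
    bool TryAdd((long Offset, int Size, Type Type) key, (object Value, long Offset) entry);
}

// The Reader could then grow an overload such as:
//   public Reader(string file, FileAccessMode mode, IDecodeCache cache)
```

The tradeoff is that any user-supplied implementation inherits the thread-safety obligation discussed above.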
```csharp
/// Cache size
/// </summary>
/// <returns></returns>
public int CacheSize()
```
Maybe we could leave this out unless it's necessary, to reduce adding to the public API for now.
```csharp
    Network? network
)
{
    ValueTuple<object, long> returnValue = DecodeFromCacheOrCreate(offset, size, expectedType, type, injectables, network, static (Buffer database, long offset, int size, Type type, ObjectType objectType, Decoder decoder, InjectableValues? injectableValues, Network? network) =>
```
I wonder if it would be simpler if we only cached by data section offset / pointer. You can see we actually have caching by that already when iterating the tree in FindAll(): MaxMind.Db/Reader.cs, line 297 (commit 31eb3d2).
We cache by that in MaxMind-DB-Reader-java as well.
It may make sense to extend that caching too so we don't have two kinds of caches. I didn't realize we had that cache when you first made your PR.
horgh left a comment
Sorry, I saw a few more things!
```diff
 {
     var type = CtrlData(offset, out var size, out offset);
-    return DecodeByType(expectedType, type, offset, size, out outOffset, injectables, network);
+    return DecodeByTypeFromCacheOrCreate(expectedType, type, offset, size, out outOffset, injectables, network);
```
I think we should also have a test looking up different IPs that resolve to the same offset where we use Inject. It looks like we may cache the first IP's injected value, for example, which doesn't seem desirable. Same issue with Network.
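A sketch of the kind of test being suggested, assuming xunit. The record type `RecordWithInjectedIp`, its `InjectedIp` property, the database path, and the chosen IPs are all placeholders; only `Reader.Find<T>(IPAddress)` and the three-argument constructor from this PR are taken from the surrounding diff.

```csharp
using System.Net;
using MaxMind.Db;
using Xunit;

public class CacheInjectionTests
{
    // Hypothetical test: two IPs resolving to the same data-section offset
    // must not leak each other's injected values through the cache.
    [Fact]
    public void CachedRecordsDoNotShareInjectedValues()
    {
        using var reader = new Reader("test-data/MaxMind-DB-test-decoder.mmdb",
                                      FileAccessMode.MemoryMapped, 4_096);
        var a = reader.Find<RecordWithInjectedIp>(IPAddress.Parse("1.1.1.1"));
        var b = reader.Find<RecordWithInjectedIp>(IPAddress.Parse("1.1.1.2"));

        // If the first decode were cached wholesale, b would report a's IP.
        Assert.NotEqual(a!.InjectedIp, b!.InjectedIp);
    }
}
```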
```csharp
// Else we can add. Below will most likely end up as a tail call. Do not
// mark the method as aggressive inline.
return this._Cache.TryAdd((offset, size, type), item);
```
Would the size be incorrectly bumped if the key ends up existing already?
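One way to avoid that double count is to bump the counter only when the add actually succeeds. A minimal self-contained sketch, where `CountingCache` and `_entryCount` stand in for the PR's internal cache and size accounting:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

sealed class CountingCache
{
    private readonly ConcurrentDictionary<(long, int, Type), (object, long)> _cache = new();
    private int _entryCount;   // hypothetical counter standing in for the size bump

    public bool TryAdd((long, int, Type) key, (object, long) item)
    {
        // Only count the entry if TryAdd actually inserted it; a racing
        // thread (or an earlier decode) may have stored the same key already.
        if (_cache.TryAdd(key, item))
        {
            Interlocked.Increment(ref _entryCount);
            return true;
        }
        return false;
    }

    public int EntryCount => _entryCount;
}
```

`ConcurrentDictionary.TryAdd` returns false when the key exists, so the counter only moves for genuinely new entries.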
```diff
 ReflectionUtil.CheckType(expectedType, typeof(byte[]));

-return _database.Read(offset, size);
+return (byte[])DecodeFromCacheOrCreate(offset, size, expectedType, static (Buffer database, long offset, int size) =>
```
These lower level DecodeFromCacheOrCreate calls seem to cause us to bump size a second time. Once here, and once in the call to DecodeByType.
```csharp
public bool TryGet(long offset, int size, Type type, out ValueTuple<object, long> returnValue)
{
    // Read, attempt to return a cached value
    if (this._Cache.TryGetValue((offset, size, type), out returnValue))
```
Would the cached value be modified if the caller modified the returned object? We would need to at least document how to handle that.
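One defensive option is to copy mutable values such as byte arrays on the way out, so a caller cannot corrupt the cached entry. A self-contained sketch mirroring the `TryGet` shape above (`DefensiveCache` is hypothetical):

```csharp
using System;
using System.Collections.Concurrent;

sealed class DefensiveCache
{
    private readonly ConcurrentDictionary<(long, int, Type), (object, long)> _cache = new();

    public bool TryGet((long, int, Type) key, out (object, long) entry)
    {
        if (_cache.TryGetValue(key, out entry))
        {
            // Byte arrays are mutable: hand back a copy so a caller writing
            // into the result cannot corrupt the entry other threads see.
            if (entry.Item1 is byte[] bytes)
                entry.Item1 = (byte[])bytes.Clone();
            return true;
        }
        return false;
    }

    public bool TryAdd((long, int, Type) key, (object, long) entry) => _cache.TryAdd(key, entry);
}
```

The copy trades some of the cache's allocation savings for safety; the alternative is documenting that returned values must be treated as immutable.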
Split benchmark changes out of #264.
The change also includes an example string-intern method which drops allocations by nearly 50%.
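The string-intern idea can be sketched as follows. This is a hypothetical helper, not the PR's actual method, and the ~50% figure above comes from the PR description, not this sketch:

```csharp
using System.Collections.Concurrent;
using System.Text;

static class StringPool
{
    private static readonly ConcurrentDictionary<string, string> Pool = new();

    // Decode UTF-8 bytes, then return a pooled instance so repeated decodes
    // of the same value (e.g. "United States") share one string object.
    // ConcurrentDictionary keeps the pool safe to share across threads.
    public static string Get(byte[] buffer, int offset, int count)
    {
        var s = Encoding.UTF8.GetString(buffer, offset, count);
        return Pool.GetOrAdd(s, s);
    }
}
```

Note this still allocates the decoded string before the pool lookup; only the long-lived duplicates are eliminated, which is why offset-keyed caching can do better.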