Skip to content

07. Collections

David Shnayder edited this page Jul 26, 2024 · 7 revisions

Collections

Sharpify has multiple custom collections such as:

SortedList{T}

SortedList<T> is a re-implementation of List<T> with custom crud operations:

  • Add -> O(log n)
  • Remove -> O(log n)
  • Get by sorted index O(1) - i.e min is [0] and max is [length - 1], also second max is [length - 2]...
  • Option to disallow duplicates

The SortedList<T> also has convenience features, such as AsSpan, Clear methods, exposure of the List<T>.Enumerator which is an efficient struct, and also an implicit operator which can return the inner list in places which require List<T> (however be careful as the receiver may use the inner list and it may no longer maintain the features above)

PersistentDictionary

PersistentDictionary is a thread-safe Dictionary<string, string that is optimized for concurrency. The abstract class provides most of the important implementation to allow all the features, and should be used as the type for the object when you want to use a PersistentDictionary.

PersistentDictionary has many convenience methods such as automatic conversions that allow getting any value that implements IParsable and adding any value that implements IConvertible which at the vary least are most of the primitive types in .NET

The main differences between the api of this and a regular dictionary is that it is best to use the async overloads

// Upsert
public ValueTask UpsertAsync<T>(string key, T value) where T : struct, IConvertible
public virtual async ValueTask UpsertAsync(string key, string value)
// Retrieval
public async ValueTask<T> GetOrCreateAsync<T>(string key, T @default) where T : struct, IParsable<T>
public virtual async ValueTask<string> GetOrCreateAsync(string key, string @default)

you can also get values with a synchronous operation if you require by using PersistentDictionary[key]

but upsert is not available asynchronously due to the synchronization mechanisms that are used to optimize the concurrency

To configure the type for usage you can implement the class and it will show you the specific things required to make everything work. In addition, there are 2 built-in implementations:

  • LocalPersistentDictionary is an implementation that serializes and restores the dictionary from a local path
  • LazyLocalPersistentDictionary is an implementation that also serializes and restores the dictionary from a local path, doesn't maintain an in-memory version, allowing it to be garbage collected if it was even created, this is for very memory constrained scenarios. Reading from it, doesn't even create a dictionary.

StringBuffer and AllocateStringBuffer

StringBuffer and AllocatedStringBuffer are unique ref structs that efficiently build a string from ReadOnlySpan<char>s, chars and any other ISpanFormattable implemented types. both require knowing the maximum potential length ahead of time, and they differ by the way buffer works, StringBuffer rents a buffer from the array pool and is still efficient in large capacities, while AllocatedStringBuffer works on a pre-allocated buffer and best used in conjunction with stackalloc on smaller buffers (less than 1024).

Both use internal indexes to properly append elements, requiring basically no tracking from the user.

StringBuffer has a factory methods for both variants:

  • Rent(capacity, clearBuffer) method with clearBuffer being an optional argument that is used to enforce cleaning of the rented buffer (normally unnecessary), and returns a StringBuffer.
  • Create(Span<char>) method that returns an AllocatedStringBuffer, and best used with stackalloc.

StringBuffer also implements a Dispose function and can be used with a using keyword or statement, it is required to make sure the rented buffer returns to the array pool. Failure to dispose of it properly will not lead to any "critical" issues, other than the need of the array pool to allocate more memory to the replace the lost buffers, which may degrade performance.

Appending

Because each implementation has its own methods, the return signature is different, but for the argument, lets say that TBuffer is either a StringBuffer or AllocatedStringBuffer.

ref TBuffer Append(char c);
ref TBuffer Append(ReadOnlySpan<char> str);
ref TBuffer Append<T>(T value, ReadOnlySpan<char> format = default, IFormatProvider? provider = null) where T : ISpanFormattable {}
ref TBuffer AppendLine();
ref TBuffer AppendLine(char c);
ref TBuffer AppendLine(ReadOnlySpan<char> str);
ref TBuffer AppendLine<T>(T value, ReadOnlySpan<char> format = default, IFormatProvider? provider = null) where T : ISpanFormattable {};

All the append methods return a reference to self, this is to enable usage of the builder pattern.

Finalization (Get string)

Allocate(bool trimIfShorter = true, bool trimEndWhiteSpace = false)
// trimIfShorter -> will trim the buffer to the end of the latest appended segment
// trimEndWhiteSpaces -> will trim white spaces at the end
ToString() // Will call Allocate(true, false)
implicit string operator // Will also call Allocate(true, false)
implicit ReadOnlySpan<char> operator // A readonly span of the same sequence of Allocate(true, false), but no allocation.
implicit ReadOnlyMemory<char> operator // save as span but exclusive to StringBuffer
[Range] // Will allocate by range

Example

public string GetHello() {
  // Option 1 - Rented
  using var buffer = StringBuffer.Rent(50); // The number is ballpark but overestimated
  // Option 2 - Stack Allocated
  var buffer = StringBuffer.Create(stackalloc char[50]);

  buffer.Append("Hello");
  buffer.Append(' ');
  buffer.Append("Everyone");
  buffer.Append('!');
  return buffer;
  // We sample text is separated for api showcase.
}
// The implicit operator will kick in and Allocate(true, false)
// The returned result will be "Hello Everyone!"
// Example of usage with the builder pattern - similar to StringBuilder
public string GetHello() {
  return StringBuffer.Create(stackalloc char[50])
                           .Append("Hello")
                           .Append(' ')
                           .Append("Everyone")
                           .Append('!');
}

In the functionality they are very similar in some use cases to a StringBuilder however, the types themselves are stack allocated, and together with an option of either renting the buffer or stack allocating it as well, they can be much more efficient.

RentedBufferWriter{T}

RentedBufferWriter{T} is an allocation friendly alternative to ArrayBufferWriter{T} which implements IBufferWriter{T}, an interface that represent a bucket that data can be written to. while it is not a commonly used interface, created to optimize specific hot paths, such as networking and IO pipes, using them is not very straightforward, and while ArrayBufferWriter{T} is a rather useful tool for some cases, it's limitation is that it isn't bound to any capacity, thus it always allocates arrays, and when it runs out of space, it allocates bigger arrays to resize, and that puts unneeded pressure of the GC.

RentedBufferWriter{T} fixes this by restricting the capacity at initialization, and renting the buffer from the shared array pool. Note that SizeHint in GetSpan and GetMemory is completely ignored in this implementation as resizing the inner buffer is currently not possible, by design. In case you are not sure what can exact capacity needed is, overestimate, it won't have much negative effects on the shared array pool.

Aside from implementing the interface IBufferWriter{T}, it also explicitly implements IDisposable to make sure the inner buffer is returned to the shared array pool after use. And implements many convenience methods and properties, such as:

int ActualCapacity;
int FreeCapacity;
ReadOnlySpan<T> WrittenSpan;
ReadOnlyMemory<T> WrittenMemory;
ArraySegment WrittenSegment;
ReadOnlySpan<T> GetSpanSlice(int start, int length);
ReadOnlyMemory<T> GetMemorySlice(int start, int length);
void Advance(int count);
bool WriteAndAdvance(T item);
bool WriteAndAdvance(ReadOnlySpan<T> data);
void Reset();
T[] Buffer; // Which returns the instance of the inner buffer, be careful with this.
ref T[] GetReferenceUnsafe(); // returns the reference for the inner buffer, be extra careful with this

Clone this wiki locally