Consider storing 8 bytes inline #1

@dtolnay

ColdString only permits UTF-8 contents, but is not currently taking advantage of this to compress the representation. 8-byte strings are stored using 17 bytes of memory, when they could be stored in only 8 bytes (less than half as much).

There are 18,446,744,073,709,551,616 possible bit patterns in 8 bytes. ColdString's current representation dedicates 50% of these to represent a 63-bit pointer (2-byte aligned pointer) and 1,166,029,402,208,257 (0.006321%) to represent inline strings of 0 through 7 bytes of UTF-8. So 49.993679% of the state space is unused.

There are only 167,404,246,927,409,152 possible UTF-8 strings with length 8 bytes. This is only 0.9075% of the state space. So there should be plenty of room to store the 8-byte strings inline and still continue to have more than 49.08% of the available states unused.
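The counts above can be reproduced with a short recurrence. The following is only a sketch: the function names `utf8_string_counts` and `percent_of_state_space` are made up for illustration and are not part of ColdString. It relies on the number of Unicode scalar values that encode to 1, 2, 3, and 4 bytes (128, 1,920, 61,440, and 1,048,576 respectively, with surrogates excluded from the 3-byte range). A valid string of a given length is one character followed by a valid shorter string.

```rust
/// Number of valid UTF-8 byte strings of each length `0..=max_len`.
/// Hypothetical helper for illustration; not part of ColdString.
fn utf8_string_counts(max_len: usize) -> Vec<u128> {
    // Unicode scalar values whose encoding is 1, 2, 3, and 4 bytes wide.
    // The 3-byte count excludes the 2,048 surrogate code points.
    const CHARS_BY_WIDTH: [u128; 4] = [128, 1_920, 61_440, 1_048_576];

    let mut counts = vec![0u128; max_len + 1];
    counts[0] = 1; // the empty string
    for len in 1..=max_len {
        for (i, &chars) in CHARS_BY_WIDTH.iter().enumerate() {
            let width = i + 1;
            if width <= len {
                // Choose the first character, then any valid remainder.
                counts[len] += chars * counts[len - width];
            }
        }
    }
    counts
}

/// Share of the 2^64 possible 8-byte bit patterns, as a percentage.
fn percent_of_state_space(patterns: u128) -> f64 {
    patterns as f64 / 2f64.powi(64) * 100.0
}
```

Summing `utf8_string_counts(8)[..8]` gives the figure for strings of 0 through 7 bytes, and index 8 gives the figure for strings of exactly 8 bytes.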

While the analysis above shows that it is definitely possible to support up to 8 bytes inline alongside 63-bit pointers, you may find that there is a simpler and more performant implementation by reducing the pointers to 62 or 61 bits. For example, a very simple representation would use the fact that valid 8-byte UTF-8 cannot begin with 10xxxxxx in the first byte (a continuation byte) and cannot end with 11xxxxxx in the last byte (such a byte either starts a multi-byte sequence or never appears in UTF-8). With this information, 8-byte UTF-8 can be stored inline as-is, 61-bit pointers can be stored with 111xxxxx in the last byte, and strings of 0 through 7 bytes can be stored with 110xxxxx in the last byte.
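A minimal sketch of that last-byte tagging scheme follows. It is not ColdString's actual layout. The names `Repr`, `classify`, `pack_short`, `pack_full`, and `unpack` are hypothetical, and keeping the short-string length in the low 3 bits of the tag byte is an assumption. How the pointer would be packed into the remaining 61 bits is left out.

```rust
/// What an 8-byte word holds in this sketch (hypothetical names).
#[derive(Debug, PartialEq)]
enum Repr {
    /// Exactly 8 bytes of UTF-8, stored as-is.
    Inline8,
    /// 0 through 7 bytes of UTF-8; the length is kept in the tag byte.
    InlineShort { len: usize },
    /// A heap pointer occupying the remaining 61 bits.
    Pointer,
}

/// Decide the representation from the top 3 bits of the last byte.
fn classify(word: [u8; 8]) -> Repr {
    let last = word[7];
    match last >> 5 {
        0b111 => Repr::Pointer,
        // Assumption: the length lives in the low 3 bits of the tag byte.
        0b110 => Repr::InlineShort { len: (last & 0b111) as usize },
        // Valid UTF-8 always ends in 0xxxxxxx or 10xxxxxx.
        _ => Repr::Inline8,
    }
}

/// Pack a string of 0 through 7 bytes, tagging the last byte with 110xxxxx.
fn pack_short(s: &str) -> Option<[u8; 8]> {
    if s.len() > 7 {
        return None;
    }
    let mut word = [0u8; 8];
    word[..s.len()].copy_from_slice(s.as_bytes());
    word[7] = 0b1100_0000 | s.len() as u8;
    Some(word)
}

/// Pack a string of exactly 8 bytes with no tag at all.
fn pack_full(s: &str) -> Option<[u8; 8]> {
    if s.len() != 8 {
        return None;
    }
    let mut word = [0u8; 8];
    word.copy_from_slice(s.as_bytes());
    Some(word)
}

/// Read an inline string back; returns `None` for the pointer case.
fn unpack(word: &[u8; 8]) -> Option<&str> {
    let bytes = match classify(*word) {
        Repr::Inline8 => &word[..],
        Repr::InlineShort { len } => &word[..len],
        Repr::Pointer => return None,
    };
    std::str::from_utf8(bytes).ok()
}
```

Because the 8-byte case needs no tag, reading it back is a plain borrow of the word's bytes. Only the short-string and pointer cases need any masking.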
