
Adjustable alignment #5

@erenberg

Thank you for doing this! A proper standard for this kind of content has long been overdue.

I'm not asking for changes; this is just an analysis of how BBF fits a particular storage system.

I am interested in storing books in immutable, location-independent, content-addressed storage. Any storage system that fits this definition should be able to store a book in any single-file format, and a subset of those systems can also store books as a tree of files and directories, as they would appear on a file system.

I've done some tests storing BBF in the Encoding for Robust Immutable Storage (ERIS). ERIS encodes BBF files with a 32KiB block size, compared with BBF's 4KiB alignment. The ERIS overhead consists of some padding plus a Merkle tree of 32KiB metadata blocks. In my tests the overhead is typically 0.004%, and it decreases relative to the total size as the file gets larger.

I currently archive these sorts of books by extracting them into folders, encoding the folders with ERIS-FS, and viewing them through a VFS. ERIS-FS encodes each file separately, plus an additional encoding of the file-system metadata.

With BBF's 4KiB alignment, there is unlikely to be any block deduplication between content in a folder and the same content inside a BBF file, because of the initial 4KiB shift: content inside the BBF starts on 4KiB boundaries, which generally don't coincide with ERIS's 32KiB block boundaries. If the alignment could be increased to 32KiB, the content blocks would line up and produce identical ERIS-encoded blocks for the BBF and for the files before muxing. The exception is the final block of each file: ERIS pads with the ISO/IEC 7816-4 scheme (a 0x80 byte, then zeros to the end of the block) instead of BBF's simple zero fill.
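The effect of alignment and final-block padding can be checked with a small simulation. This is a sketch under stated assumptions, not ERIS itself: each 32KiB block is identified by a SHA-256 hash of its plaintext, standing in for an ERIS reference (with convergent encryption and the same convergence secret, identical plaintext blocks should encrypt to identical blocks). The "archive" is hypothetical, `offset` bytes of other data followed by one file padded to `align`, and is not the real BBF layout; all function names are made up for illustration.

```python
import hashlib
import random

ERIS_BLOCK = 32 * 1024  # ERIS block size used in the tests above


def iso7816_pad(data: bytes, block: int) -> bytes:
    """ISO/IEC 7816-4 padding: a 0x80 byte, then zeros up to the next block boundary."""
    data += b"\x80"
    return data + b"\x00" * (-len(data) % block)


def zero_fill(data: bytes, block: int) -> bytes:
    """Plain zero fill up to the next block boundary (BBF-style)."""
    return data + b"\x00" * (-len(data) % block)


def block_ids(stream: bytes) -> set[str]:
    """Hash each 32KiB block of the ERIS-padded stream (stand-in for ERIS references)."""
    padded = iso7816_pad(stream, ERIS_BLOCK)
    return {hashlib.sha256(padded[i:i + ERIS_BLOCK]).hexdigest()
            for i in range(0, len(padded), ERIS_BLOCK)}


def shared_blocks(payload: bytes, offset: int, align: int, iso_pad: bool = False) -> int:
    """Blocks shared by the loose file and a hypothetical archive holding it at `offset`."""
    pad = iso7816_pad if iso_pad else zero_fill
    archive = b"H" * offset + pad(payload, align)  # other data, then the padded file
    return len(block_ids(payload) & block_ids(archive))


payload = random.Random(0).randbytes(5 * ERIS_BLOCK + 1000)  # 5 full blocks plus a tail
print(shared_blocks(payload, 4096, 4096))                    # 0: 4KiB shift
print(shared_blocks(payload, ERIS_BLOCK, ERIS_BLOCK))        # 5: aligned, zero fill
print(shared_blocks(payload, ERIS_BLOCK, ERIS_BLOCK, True))  # 6: aligned, ISO padding
```

With 32KiB alignment every full block is shared but the final one still differs under zero fill; using ISO/IEC 7816-4 padding inside the archive makes the final block match as well.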

From an initial analysis, the only BBF optimisations for ERIS would be making the alignment adjustable per archive, probably as a multiple of 4KiB, and changing the padding scheme.
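A minimal sketch of what a per-archive alignment setting could look like, assuming a hypothetical layout of a header followed by entries in order (the real BBF layout may differ; `entry_offsets` and the 64-byte header are made up for illustration). The only rule enforced is the one suggested here: the alignment must be a positive multiple of 4KiB.

```python
PAGE = 4 * 1024  # BBF's current alignment


def align_up(n: int, align: int) -> int:
    """Round n up to the next multiple of align."""
    return n + (-n % align)


def entry_offsets(header_size: int, sizes: list[int], align: int = PAGE) -> list[int]:
    """Start offset of each entry when every entry begins on an `align` boundary.

    Hypothetical layout: header, then entries in order. `align` stands for the
    proposed per-archive setting and must be a positive multiple of 4KiB.
    """
    if align <= 0 or align % PAGE:
        raise ValueError(f"alignment must be a positive multiple of {PAGE} bytes")
    offsets = []
    pos = align_up(header_size, align)
    for size in sizes:
        offsets.append(pos)
        pos = align_up(pos + size, align)
    return offsets


print(entry_offsets(64, [1_200_000, 850_000]))             # [4096, 1204224]
print(entry_offsets(64, [1_200_000, 850_000], 32 * 1024))  # [32768, 1245184]
```

With the default 4KiB alignment neither entry starts on a 32KiB boundary, so their ERIS blocks would not line up with the loose files; with 32KiB every entry starts on an ERIS block boundary.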

Obviously a larger alignment means more padding overhead, but I imagine it is less than storing a generic file-and-folder structure. If multiple BBF archives contained some of the same content and used a common alignment, there would be ERIS block deduplication between them.
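The padding cost is easy to estimate: each entry is zero-filled up to the next alignment boundary, so it costs at most `align - 1` bytes, and roughly half the alignment on average when sizes are effectively random. The sizes below are hypothetical, for illustration only.

```python
def padding_bytes(sizes: list[int], align: int) -> int:
    """Total zero fill needed so that every entry ends on an `align` boundary."""
    return sum(-size % align for size in sizes)


# Hypothetical page-image sizes in bytes (illustration only).
sizes = [1_200_000, 850_000, 2_048_000]
for align in (4 * 1024, 32 * 1024):
    print(f"{align // 1024:>2} KiB alignment: {padding_bytes(sizes, align)} bytes of padding")
```

For these made-up sizes this prints 2096 bytes at 4KiB and 30768 bytes at 32KiB, still under 1% of the roughly 4MB total.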

(Content-addressed storage systems that use variable-sized blocks should automatically realign blocks after padding. ERIS doesn't use variable-sized blocks, for the sake of simplicity and privacy.)
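To illustrate the variable-block-size point, here is a toy content-defined chunker built on a gear rolling hash (the kind of hash FastCDC uses). It is a swapped-in technique for illustration only, not ERIS and not the chunker of any particular store; the table seed, mask and data sizes are arbitrary assumptions. Because a cut point depends only on the last 13 bytes, prefixing the data with 4KiB of zeros shifts everything yet leaves almost every chunk unchanged, whereas fixed 32KiB blocks share none.

```python
import hashlib
import random

_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # random 64-bit value per byte value
MASK = (1 << 13) - 1  # cut when the low 13 bits are zero: ~8KiB average chunks
U64 = (1 << 64) - 1


def cdc_chunks(data: bytes) -> list[bytes]:
    """Variable-size chunks; the hash's low 13 bits depend only on the last 13 bytes."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & U64
        if (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_chunks(data: bytes, size: int = 32 * 1024) -> list[bytes]:
    """Fixed-size blocks, as ERIS uses."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared(a: list[bytes], b: list[bytes]) -> int:
    """Number of distinct chunks that appear in both lists."""
    return len({hashlib.sha256(c).digest() for c in a} & {hashlib.sha256(c).digest() for c in b})


data = random.Random(1).randbytes(300_000)
shifted = b"\x00" * 4096 + data  # same content behind 4KiB of padding
print(shared(fixed_chunks(data), fixed_chunks(shifted)), "of", len(fixed_chunks(data)))  # 0 of 10
print(shared(cdc_chunks(data), cdc_chunks(shifted)), "of", len(cdc_chunks(data)))
```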
