MDF Binary Format Specification (MDFB)

Version: 1
Extension: .mdfb
Byte order: Little-endian

Overview

MDFB is the compact binary serialization of MDF documents. It provides faster parsing and smaller file sizes compared to the text format while preserving full type fidelity.

Key features:

  • String deduplication via a shared string table
  • Optimal integer/float sizing (i32/i64, f32/f64)
  • CRC32 integrity verification
  • Single-pass streaming write, random-access read

File Layout

+========================+
|   Header (56 bytes)    |
+========================+
|   String Table         |
|   (variable size)      |
+========================+
|   Data Section         |
|   (variable size)      |
+========================+

Header

56 bytes, packed (no padding):

Offset  Size  Type  Field              Description
0       4     u32   magic              0x4246444D (stored as bytes 4D 44 46 42, "MDFB")
4       4     u32   version            Format version (currently 1)
8       4     u32   flags              Reserved (must be 0)
12      4     u32   stringTableCount   Number of strings in the string table
16      8     u64   stringTableOffset  Byte offset to the string table
24      8     u64   dataOffset         Byte offset to the data section
32      8     u64   dataSize           Size of the data section in bytes
40      4     u32   rootCount          Number of root nodes
44      4     u32   checksum           CRC32 of the data section
48      8     u64   reserved           Reserved for future use (must be 0)
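
In Python, the packed header corresponds to a single `struct` format string. The `<` prefix selects little-endian byte order with no alignment padding, so the computed size comes out at exactly 56 bytes. This is a sketch: `pack_header` and `unpack_header` are illustrative names, not part of an existing MDF library.

```python
import struct

# "<" = little-endian, no padding. Field order: magic, version, flags,
# stringTableCount, stringTableOffset, dataOffset, dataSize, rootCount,
# checksum, reserved.
HEADER_FORMAT = "<IIIIQQQIIQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 56
MDFB_MAGIC = 0x4246444D
HEADER_FIELDS = ("magic", "version", "flags", "stringTableCount",
                 "stringTableOffset", "dataOffset", "dataSize",
                 "rootCount", "checksum", "reserved")


def pack_header(string_count, string_offset, data_offset, data_size,
                root_count, checksum):
    """Build a version-1 header; flags and reserved are always written as 0."""
    return struct.pack(HEADER_FORMAT, MDFB_MAGIC, 1, 0, string_count,
                       string_offset, data_offset, data_size,
                       root_count, checksum, 0)


def unpack_header(buf):
    """Decode the first 56 bytes of buf into a field-name -> value dict."""
    return dict(zip(HEADER_FIELDS, struct.unpack_from(HEADER_FORMAT, buf, 0)))


header = pack_header(string_count=1, string_offset=56, data_offset=64,
                     data_size=0, root_count=0, checksum=0)
print(header[:4])                                  # b'MDFB'
print(unpack_header(header)["stringTableOffset"])  # 56
```

Packing the magic number as a little-endian u32 is what produces the readable "MDFB" byte sequence at the start of the file.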

String Table

Located at stringTableOffset. Contains stringTableCount entries, each:

u32  length       // UTF-8 byte count (NOT null-terminated)
u8[] bytes        // Raw UTF-8 bytes

Strings are deduplicated: each unique string appears exactly once. Nodes and values reference strings by their 0-based index. The special index 0xFFFFFFFF represents "no string" (used for unnamed nodes).
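
A minimal Python sketch of the string table, assuming a hypothetical `StringTableBuilder` helper: it hands out 0-based indices in first-seen order so each unique string is stored once, and maps a missing name to the 0xFFFFFFFF sentinel. The length prefix counts UTF-8 bytes, not characters.

```python
import struct

NO_STRING = 0xFFFFFFFF  # "no string" sentinel, e.g. for unnamed nodes


class StringTableBuilder:
    """Deduplicates strings, assigning 0-based indices in first-seen order."""

    def __init__(self):
        self.strings = []
        self.index = {}

    def intern(self, s):
        if s is None:
            return NO_STRING
        if s not in self.index:
            self.index[s] = len(self.strings)
            self.strings.append(s)
        return self.index[s]

    def encode(self):
        out = bytearray()
        for s in self.strings:
            raw = s.encode("utf-8")
            out += struct.pack("<I", len(raw))  # UTF-8 byte count
            out += raw                          # no null terminator
        return bytes(out)


def read_string_table(buf, offset, count):
    """Read count length-prefixed UTF-8 strings starting at offset."""
    strings = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        strings.append(bytes(buf[offset:offset + length]).decode("utf-8"))
        offset += length
    return strings


table = StringTableBuilder()
print(table.intern("Player"), table.intern("name"), table.intern("Player"))  # 0 1 0
print(table.encode().hex(" "))  # 06 00 00 00 50 6c 61 79 65 72 04 00 00 00 6e 61 6d 65
```

A non-ASCII string such as "é" gets a length prefix of 2, because it occupies two bytes in UTF-8.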

Data Section

Located at dataOffset, contains rootCount root nodes encoded sequentially.

Node Encoding

u32  typeStringIndex      // Index into string table
u32  nameStringIndex      // Index into string table (or 0xFFFFFFFF)
u32  propertyCount        // Number of properties
u32  childCount           // Number of child nodes
[Property...]             // propertyCount properties
[Node...]                 // childCount child nodes (recursive)

Property Encoding

u32  keyStringIndex       // Index into string table
Value                     // Typed value (see below)
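
The node and property layouts compose recursively. In this Python sketch, `encode_node` (an illustrative name) takes property values that are already encoded as tag-plus-payload bytes and children that are already encoded nodes; the string indices in the usage lines are made up for the example.

```python
import struct

NO_STRING = 0xFFFFFFFF


def encode_property(key_idx, value_bytes):
    """A property is its u32 key index followed by one tagged value."""
    return struct.pack("<I", key_idx) + value_bytes


def encode_node(type_idx, name_idx, properties, children):
    """Encode one node.

    properties: list of (key_idx, value_bytes), value_bytes being an
                already-encoded tagged value (tag byte + payload).
    children:   list of already-encoded child nodes (bytes).
    """
    out = bytearray(struct.pack("<IIII", type_idx, name_idx,
                                len(properties), len(children)))
    for key_idx, value_bytes in properties:
        out += encode_property(key_idx, value_bytes)
    for child in children:  # all children come after all properties
        out += child
    return bytes(out)


# Hypothetical indices: unnamed type 0 with one Int32 property (key 1,
# tag 2, value 7) and one unnamed child of type 2.
leaf = encode_node(2, NO_STRING, [], [])
root = encode_node(0, NO_STRING, [(1, b"\x02" + struct.pack("<i", 7))], [leaf])
print(len(root))  # 41 = 16 (node header) + 9 (property) + 16 (child)
```

Because nothing in the encoding records a node's total size, a reader has to walk every property and child to find where the next sibling starts.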

Value Encoding

Each value is a 1-byte (u8) type tag followed by a tag-specific payload:

Tag  Type      Payload
0    Null      (none)
1    Bool      u8 (0 = false, 1 = true)
2    Int32     i32
3    Int64     i64
4    Float32   f32
5    Float64   f64
6    String    u32 string index
7    Vec2      f32 x, f32 y
8    Vec3      f32 x, f32 y, f32 z
9    Vec4      f32 x, f32 y, f32 z, f32 w
10   Quat      f32 x, f32 y, f32 z, f32 w
11   UUID      u32 string index
12   AssetRef  u32 string index
13   Array     u32 count, then count values (each with its own type tag)
14   (unassigned, reserved)
15   Enum      u32 string index
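
A sketch of value encoding in Python, with the caller choosing the tag explicitly. It reads "then count values" as count complete tagged values, which follows from the rule that every value starts with a tag byte. Payloads for String, UUID, AssetRef and Enum are string indices the caller has already resolved; `encode_value` and the constant names are illustrative.

```python
import struct

# Tag constants from the table above (14 is unassigned).
NULL, BOOL, INT32, INT64, FLOAT32, FLOAT64, STRING = 0, 1, 2, 3, 4, 5, 6
VEC2, VEC3, VEC4, QUAT, UUID, ASSET_REF, ARRAY, ENUM = 7, 8, 9, 10, 11, 12, 13, 15

# Little-endian struct format of each fixed-size payload.
PAYLOAD_FORMAT = {
    BOOL: "<B", INT32: "<i", INT64: "<q", FLOAT32: "<f", FLOAT64: "<d",
    STRING: "<I", UUID: "<I", ASSET_REF: "<I", ENUM: "<I",
    VEC2: "<2f", VEC3: "<3f", VEC4: "<4f", QUAT: "<4f",
}


def encode_value(tag, payload=None):
    """Encode one tagged value.

    payload is None for Null, a bool/int/float or string index for scalar
    tags, a tuple of floats for Vec2/Vec3/Vec4/Quat, and a list of
    (tag, payload) pairs for Array.
    """
    out = bytearray([tag])
    if tag == NULL:
        pass                                      # no payload
    elif tag == ARRAY:
        out += struct.pack("<I", len(payload))    # element count
        for elem_tag, elem_payload in payload:    # each element has its own tag
            out += encode_value(elem_tag, elem_payload)
    elif isinstance(payload, tuple):
        out += struct.pack(PAYLOAD_FORMAT[tag], *payload)
    else:
        out += struct.pack(PAYLOAD_FORMAT[tag], payload)
    return bytes(out)


print(encode_value(VEC3, (1.0, 2.0, 3.0)).hex(" "))
# 08 00 00 80 3f 00 00 00 40 00 00 40 40
```

Since array elements carry individual tags, one Array can in principle mix element types; whether MDF's text format allows that is not stated here.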

Type Optimization

The writer automatically selects the most compact representation:

  • Integers: if the value fits in [-2^31, 2^31-1], uses Int32 (4 bytes); otherwise Int64 (8 bytes)
  • Floats: if (double)(float)value == value, uses Float32 (4 bytes); otherwise Float64 (8 bytes)
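
Both rules translate directly into Python. Packing with `struct`'s `f` format rounds a double to the nearest f32, which plays the role of the C cast `(float)value`; Python raises `OverflowError` for doubles outside the f32 range, and this sketch sends those to Float64. The function names are illustrative.

```python
import struct

INT32, INT64, FLOAT32, FLOAT64 = 2, 3, 4, 5


def int_tag(value):
    """Int32 when the value fits in [-2**31, 2**31 - 1], otherwise Int64."""
    return INT32 if -2**31 <= value <= 2**31 - 1 else INT64


def float_tag(value):
    """Float32 when a round trip through f32 reproduces the value exactly."""
    try:
        as_f32 = struct.unpack("<f", struct.pack("<f", value))[0]
    except OverflowError:      # magnitude too large for f32
        return FLOAT64
    return FLOAT32 if as_f32 == value else FLOAT64


print(int_tag(100), int_tag(2**31))      # 2 3
print(float_tag(0.5), float_tag(0.1))    # 4 5
```

0.1 has no exact f32 representation, so it keeps all 8 bytes. The same equality test also sends NaN to Float64, since NaN never compares equal to itself.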

CRC32 Checksum

The checksum field contains the CRC32 of the entire data section, computed with the IEEE 802.3 polynomial (0xEDB88320 in reflected form). It allows data corruption to be detected: a reader that verifies the checksum (see Compatibility) reports a mismatch as a parse error.

Checksum algorithm:

Initial CRC: 0xFFFFFFFF
Polynomial:  0xEDB88320 (reversed IEEE)
Final XOR:   0xFFFFFFFF
Input:       the dataSize bytes at file offsets dataOffset through dataOffset + dataSize - 1
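
A bit-at-a-time Python sketch with exactly these parameters. It agrees with the standard library's `zlib.crc32`, which implements the same CRC-32 variant, and with the conventional check value 0xCBF43926 for the ASCII input "123456789". Real code would normally call `zlib.crc32` or use a table-driven version instead; the function names here are illustrative.

```python
import zlib


def crc32_ieee(data):
    """Bitwise CRC-32: init 0xFFFFFFFF, reflected poly 0xEDB88320, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):                       # least significant bit first (reflected)
            crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def data_section_crc(file_bytes, data_offset, data_size):
    """CRC over exactly data_size bytes starting at data_offset (half-open slice)."""
    return crc32_ieee(file_bytes[data_offset:data_offset + data_size])


print(hex(crc32_ieee(b"123456789")))               # 0xcbf43926
print(crc32_ieee(b"MDFB") == zlib.crc32(b"MDFB"))  # True
```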

Write Process

The binary writer works in five phases:

  1. Collect strings -- traverse the entire document, collecting all unique strings
  2. Write placeholder header -- 56 zero bytes
  3. Write string table -- all unique strings with length prefixes
  4. Write data section -- recursively encode all root nodes
  5. Patch header -- fill in actual offsets, sizes, and CRC32 checksum
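
The five phases fit in a short Python sketch. It assumes a hypothetical `Node` dataclass and a reduced value model: Python `None`, `bool`, `int`, `float` and `str`, tuples of 2, 3 or 4 floats for Vec2/Vec3/Vec4, and lists for Array. Quat, UUID, AssetRef and Enum are left out because they would need distinct wrapper types. Output is buffered in `io.BytesIO` for simplicity, and `zlib.crc32` computes the CRC-32 variant specified above.

```python
import io
import struct
import zlib
from dataclasses import dataclass, field
from typing import Optional

HEADER_FORMAT = "<IIIIQQQIIQ"   # 56-byte packed little-endian header
NO_STRING = 0xFFFFFFFF


@dataclass
class Node:  # hypothetical in-memory document model
    type: str
    name: Optional[str] = None
    props: list = field(default_factory=list)     # [(key, value), ...]
    children: list = field(default_factory=list)  # [Node, ...]


def encode_value(value, index):
    """Tag byte + payload for the value subset this sketch supports."""
    if value is None:
        return b"\x00"                                        # Null
    if isinstance(value, bool):                               # before int: bool is an int
        return struct.pack("<BB", 1, value)
    if isinstance(value, int):
        if -2**31 <= value <= 2**31 - 1:
            return struct.pack("<Bi", 2, value)
        return struct.pack("<Bq", 3, value)
    if isinstance(value, float):
        try:
            if struct.unpack("<f", struct.pack("<f", value))[0] == value:
                return struct.pack("<Bf", 4, value)
        except OverflowError:                                 # too large for f32
            pass
        return struct.pack("<Bd", 5, value)
    if isinstance(value, str):
        return struct.pack("<BI", 6, index[value])
    if isinstance(value, tuple) and len(value) in (2, 3, 4):  # Vec2/Vec3/Vec4 = tags 7/8/9
        return struct.pack(f"<B{len(value)}f", 5 + len(value), *value)
    if isinstance(value, list):                               # Array
        return struct.pack("<BI", 13, len(value)) + b"".join(
            encode_value(v, index) for v in value)
    raise TypeError(f"unsupported value: {value!r}")


def write_mdfb(roots):
    strings, index = [], {}

    def intern(s):
        if s not in index:
            index[s] = len(strings)
            strings.append(s)

    def collect(item):  # visits type, name, then each key/value, then children
        if isinstance(item, Node):
            intern(item.type)
            if item.name is not None:
                intern(item.name)
            for key, value in item.props:
                intern(key)
                collect(value)
            for child in item.children:
                collect(child)
        elif isinstance(item, str):
            intern(item)
        elif isinstance(item, list):
            for v in item:
                collect(v)

    for root in roots:                            # Phase 1: collect strings
        collect(root)
    out = io.BytesIO()
    out.write(bytes(56))                          # Phase 2: placeholder header
    string_offset = out.tell()
    for s in strings:                             # Phase 3: string table
        raw = s.encode("utf-8")
        out.write(struct.pack("<I", len(raw)) + raw)
    data_offset = out.tell()

    def write_node(node):                         # Phase 4: data section
        name_idx = NO_STRING if node.name is None else index[node.name]
        out.write(struct.pack("<IIII", index[node.type], name_idx,
                              len(node.props), len(node.children)))
        for key, value in node.props:
            out.write(struct.pack("<I", index[key]) + encode_value(value, index))
        for child in node.children:
            write_node(child)

    for root in roots:
        write_node(root)
    data = out.getvalue()[data_offset:]
    out.seek(0)                                   # Phase 5: patch header
    out.write(struct.pack(HEADER_FORMAT, 0x4246444D, 1, 0, len(strings),
                          string_offset, data_offset, len(data), len(roots),
                          zlib.crc32(data), 0))
    return out.getvalue()


doc = [Node("Player", props=[("name", "Alice"), ("health", 100),
                             ("position", (1.0, 2.0, 3.0))])]
print(len(write_mdfb(doc)))  # 156 = 56 header + 49 string table + 51 data
```

Collecting strings in first-seen traversal order reproduces the index assignment used in the example that follows. Patching the header in phase 5 requires a seekable output, which is why the sketch buffers in memory.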

Example

Given this MDF text:

Player {
    name: "Alice"
    health: 100
    position: vec3(1.0, 2.0, 3.0)
}

The binary layout would be approximately:

[Header: 56 bytes]
  magic:            4D 44 46 42    ("MDFB")
  version:          01 00 00 00
  stringTableCount: 05 00 00 00    (5 strings: "Player", "name", "Alice", "health", "position")
  ...

[String Table]
  "Player":   06 00 00 00 50 6C 61 79 65 72
  "name":     04 00 00 00 6E 61 6D 65
  "Alice":    05 00 00 00 41 6C 69 63 65
  "health":   06 00 00 00 68 65 61 6C 74 68
  "position": 08 00 00 00 70 6F 73 69 74 69 6F 6E

[Data Section]
  Node:
    type: 00 00 00 00             (string index 0 = "Player")
    name: FF FF FF FF             (no name)
    propCount: 03 00 00 00
    childCount: 00 00 00 00

    Property "name":
      key: 01 00 00 00            (string index 1 = "name")
      value type: 06              (String)
      value: 02 00 00 00          (string index 2 = "Alice")

    Property "health":
      key: 03 00 00 00            (string index 3 = "health")
      value type: 02              (Int32)
      value: 64 00 00 00          (100)

    Property "position":
      key: 04 00 00 00            (string index 4 = "position")
      value type: 08              (Vec3)
      value: 00 00 80 3F          (1.0f)
             00 00 00 40          (2.0f)
             00 00 40 40          (3.0f)
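
The byte counts in this example can be cross-checked with a short Python sketch that assembles the string table and data section exactly as listed. It assumes, as the write process implies, that the string table starts immediately after the 56-byte header.

```python
import struct

strings = ["Player", "name", "Alice", "health", "position"]
string_table = b"".join(struct.pack("<I", len(s.encode("utf-8"))) + s.encode("utf-8")
                        for s in strings)

data = b"".join([
    struct.pack("<IIII", 0, 0xFFFFFFFF, 3, 0),  # node: type 0, unnamed, 3 props, 0 children
    struct.pack("<IBI", 1, 6, 2),               # "name": String, index 2 ("Alice")
    struct.pack("<IBi", 3, 2, 100),             # "health": Int32 100
    struct.pack("<IB3f", 4, 8, 1.0, 2.0, 3.0),  # "position": Vec3
])

string_table_offset = 56                        # assumed: right after the header
data_offset = string_table_offset + len(string_table)
print(len(string_table), len(data), data_offset)  # 49 51 105
```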

Compatibility

  • Readers MUST reject files with magic != 0x4246444D
  • Readers MUST reject files with version > 1
  • Readers SHOULD verify the CRC32 checksum
  • Writers MUST set flags and reserved to 0
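  • Value type tags 14 and 16+ are reserved and may be assigned in future versions. Because value payloads are not length-prefixed, a reader cannot skip a tag it does not recognize, so an unknown tag is a parse error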
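
A sketch of a version-1 reader in Python that applies these rules: it rejects a bad magic number or a newer version, verifies the CRC32 (switchable, since verification is a SHOULD), and fails on any value tag it does not recognize. `read_mdfb`, `MdfbError`, and the dict-based node representation are illustrative, not part of an existing library.

```python
import struct
import zlib

HEADER_FORMAT = "<IIIIQQQIIQ"
NO_STRING = 0xFFFFFFFF
# Payload format per tag (Array, tag 13, is handled separately).
PAYLOAD = {0: "", 1: "B", 2: "i", 3: "q", 4: "f", 5: "d", 6: "I", 7: "2f",
           8: "3f", 9: "4f", 10: "4f", 11: "I", 12: "I", 15: "I"}
STRING_TAGS = {6, 11, 12, 15}  # payload is a string-table index


class MdfbError(ValueError):
    pass


def read_mdfb(buf, verify_checksum=True):
    (magic, version, _flags, string_count, string_offset, data_offset,
     data_size, root_count, checksum, _reserved) = struct.unpack_from(
        HEADER_FORMAT, buf, 0)
    if magic != 0x4246444D:
        raise MdfbError("not an MDFB file (bad magic)")
    if version > 1:
        raise MdfbError(f"unsupported version {version}")
    if verify_checksum and zlib.crc32(buf[data_offset:data_offset + data_size]) != checksum:
        raise MdfbError("CRC32 mismatch in data section")

    strings, pos = [], string_offset
    for _ in range(string_count):
        (length,) = struct.unpack_from("<I", buf, pos)
        strings.append(bytes(buf[pos + 4:pos + 4 + length]).decode("utf-8"))
        pos += 4 + length

    def read_value(pos):
        tag = buf[pos]
        pos += 1
        if tag == 13:                           # Array: count, then tagged values
            (count,) = struct.unpack_from("<I", buf, pos)
            pos += 4
            items = []
            for _ in range(count):
                item, pos = read_value(pos)
                items.append(item)
            return items, pos
        if tag not in PAYLOAD:                  # payload size unknown: cannot skip
            raise MdfbError(f"unknown value tag {tag}")
        if tag == 0:
            return None, pos                    # Null has no payload
        fmt = "<" + PAYLOAD[tag]
        fields = struct.unpack_from(fmt, buf, pos)
        pos += struct.calcsize(fmt)
        if tag == 1:
            return fields[0] != 0, pos
        if tag in STRING_TAGS:
            return strings[fields[0]], pos
        return (fields if len(fields) > 1 else fields[0]), pos

    def read_node(pos):
        type_idx, name_idx, n_props, n_children = struct.unpack_from("<IIII", buf, pos)
        pos += 16
        props, children = [], []
        for _ in range(n_props):
            (key_idx,) = struct.unpack_from("<I", buf, pos)
            value, pos = read_value(pos + 4)
            props.append((strings[key_idx], value))
        for _ in range(n_children):
            child, pos = read_node(pos)
            children.append(child)
        name = None if name_idx == NO_STRING else strings[name_idx]
        return {"type": strings[type_idx], "name": name,
                "props": props, "children": children}, pos

    roots, pos = [], data_offset
    for _ in range(root_count):
        root, pos = read_node(pos)
        roots.append(root)
    return roots
```

Vector and quaternion values come back as tuples of floats; a fuller reader would wrap them (and UUID, AssetRef and Enum strings) in distinct types so the tag is not lost.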