File Structure Analysis
u/Phenakist has delineated the general regions of the flash chip as follows (see below for Segment 4-Region 13 correction):
So the structure of the firmware seems to be:
A) Chapter Map
B) Chapter
C) Transition
D) Chapter
E) Transition
... (repeat for number of chapters in cassette) ...
F) [8552] bytes of code, common across all chip dumps
G) Empty space (FF FF FF FF...)
H) 16 byte chip ID and/or key (or possibly the chapter marks)
I) FF's to end of file
They also mentioned that there seem to be 12 segments of "chapters," which correlates with the carousel's 12 windows.
Cartridge data has been collected as raw binary. Multi-byte values are stored as 4-byte little-endian words, meaning the least significant byte comes first: a 4-byte sequence (eg: 04 5c 78 00) is read in reverse byte order as the address 0x00785c04 (that is, 0x785c04). Big-endian notation would instead read forwards, like natural reading (0x045c7800), but this is not the case for cartridge data!
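As a quick check of the byte-order rule above, here is a minimal Python sketch that decodes the same four example bytes both ways. The variable names are only illustrative; `int.from_bytes` and `struct.unpack` are standard-library calls.

```python
import struct

raw = bytes.fromhex("045c7800")  # the four bytes as they appear in the dump

# Little-endian: least significant byte first (how the cartridge data is read).
little = int.from_bytes(raw, byteorder="little")
# Big-endian: shown only for contrast; not how the cartridge data is read.
big = int.from_bytes(raw, byteorder="big")

print(hex(little))  # 0x785c04
print(hex(big))     # 0x45c7800

# struct gives the same result; "<I" means little-endian unsigned 32-bit.
(also_little,) = struct.unpack("<I", raw)
assert also_little == little
```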
Table 1. General stratification of cartridge data by segment. Addresses are reported in hexadecimal.
| Arbitrary Designation | Address Start | Address End (inclusive) | Size (bytes) | Description |
|---|---|---|---|---|
| Segment 1 | 0x0 | 0x6B | 108 | This is a highly variable region with a few conserved markers. These cartridges always begin with this pattern. The pattern suggests that it consists of repeated 4-byte addresses. In conversing with Gemini, it seems that these may make up a pointer table. The first pointer always seems to be 0x4EE8, and the other 23 pointers are ascending and unique. In total, this yields 12 unique regions and 12 conserved regions. |
| Segment 2 | 0x6C | 0x4EE7 | 20092 | This area is completely conserved (validated). It is further split into Segments 2.1 and 2.2 as per the pointer table in Segment 1. |
| Segment 2.1 | 0x6C | 0x2439 | 9166 | This region is completely conserved. |
| Segment 2.2 | 0x243A | 0x4EE7 | 10926 | This region is completely conserved. |
| Segment 3 | 0x4EE8 | variable | variable | Unique data segment. |
| Segment 4 | variable | variable | 8552 | This is the last significant data chunk. It is completely conserved (validated). |
| Segment 5 | variable | 0xFFF7F | variable | This is purely empty space. It takes up a decent amount of the chip. |
| Segment 6 | 0xFFF80 | 0xFFF8F | 16 | This is the last line of data, unique to each book "series" (ie: Puppy, Elephant, and Lion are one trio "series"). |
| Segment 7 | 0xFFF90 | 0xFFFFF | 112 | This is the last segment. It is simply empty space. |
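
The pointer-table idea for Segment 1 can be explored with a small Python sketch like the one below. It reads consecutive little-endian 32-bit words and pairs each pointer with the next to form candidate Region spans. Several things here are assumptions and not established facts: the exact offset of the table inside Segment 1, the idea that each Region ends where the next pointer begins, and the helper names `read_le_words` and `pointers_to_regions`, which are hypothetical. The demo uses made-up pointers apart from 0x4EE8.

```python
import struct

def read_le_words(data: bytes, start: int, end: int) -> list[int]:
    """Return consecutive little-endian 32-bit words from data[start:end]."""
    count = (end - start) // 4
    return list(struct.unpack_from(f"<{count}I", data, start))

def pointers_to_regions(pointers: list[int], final_end: int) -> list[tuple[int, int]]:
    """Pair each pointer with the next one to get (start, end_exclusive) spans.

    ASSUMPTION: a Region runs until the next pointer; final_end closes the last one.
    """
    bounds = pointers + [final_end]
    return [(bounds[i], bounds[i + 1]) for i in range(len(pointers))]

# Synthetic stand-in for a pointer table (only 0x4EE8 comes from the analysis).
demo = struct.pack("<3I", 0x4EE8, 0x5000, 0x6000)
words = read_le_words(demo, 0, len(demo))
print([hex(w) for w in words])
print(pointers_to_regions(words, 0x7000))
```

On a real dump, `read_le_words(dump, 0x0, 0x6C)` would list every word in Segment 1, which may make it easier to see which of them are the 24 ascending pointers.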
A list of cartridge-specific segment specifications
Inter-Cartridge Region Comparisons
Table of Region References by Cartridge
There are 24 Regions across Segments 3 and 4, split evenly: Regions 1-12 are in Segment 3 and Regions 13-24 are in Segment 4. Every Region starts with a 4-byte "size" field, followed by the two bytes "80 3E."
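A minimal Python sketch of reading that Region header is shown here. The function name `read_region_header` is hypothetical, and the demo bytes are synthetic: they reuse the 42 99 size example from this page and assume the upper two bytes of the size field are zero.

```python
import struct

REGION_MARKER = bytes.fromhex("803e")  # the two bytes reported after the size field

def read_region_header(data: bytes, region_start: int) -> tuple[int, bool]:
    """Return (size, marker_ok) for the Region beginning at region_start."""
    # The size is a little-endian unsigned 32-bit value.
    (size,) = struct.unpack_from("<I", data, region_start)
    marker = data[region_start + 4 : region_start + 6]
    return size, marker == REGION_MARKER

# Synthetic header: size bytes 42 99 00 00 followed by the 80 3E marker.
demo = bytes.fromhex("42990000" "803e")
size, marker_ok = read_region_header(demo, 0)
print(size, marker_ok)
```

Running this against each pointer from Segment 1 would be one way to confirm that all 24 Regions really do carry the marker.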
As previously stated, all Regions start with a 4-byte "size." In Regions 1-12, the span given by this size does not reach the end of the Region; it stops at the start of a terminal table-like data structure. See "Possible PCM Areas" below for an explanation. Conversely, in Regions 13-24, the size does reach the end of the Region. This further supports the idea of a "delta-table," and suggests that the sound is located in Regions 1-12 rather than Regions 13-24, since there are 12 film slides and Regions 1-12 are the longest. The shorter Regions 13-24 are conserved between cartridges, which indicates that the audio is not stored there. The lengths of Regions 13-24 are also unchanging between cartridges, which is evidence against their being per-Region decryption or decompression keys for their corresponding Region 1-12 pair (ie: 1 & 13, 2 & 14, ... 12 & 24). If they were, I would anticipate their lengths to be proportional to the lengths of their Region 1-12 counterparts, but Regions 1-12 vary in length between cartridges while Regions 13-24 do not vary at all. It remains possible that Regions 13-24 are constant keys used to decrypt each corresponding Region 1-12 through some algorithm...
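The "conserved between cartridges" claim can be checked mechanically. Because Segment 4 begins at a different address in each dump, the Python sketch below compares slices taken from a per-dump start offset, not from one fixed address. The function name `slices_match` and the toy byte strings are made up for illustration.

```python
def slices_match(dumps_and_starts: list[tuple[bytes, int]], length: int) -> bool:
    """True if every dump holds identical bytes in its own [start, start + length)."""
    slices = [dump[start : start + length] for dump, start in dumps_and_starts]
    # Also require that each slice is complete, so a truncated dump cannot pass.
    if any(len(s) != length for s in slices):
        return False
    return all(s == slices[0] for s in slices)

# Toy dumps: the same 3-byte block sits at offset 2 in one and offset 4 in the other.
dump_a = bytes([9, 9, 1, 2, 3, 7])
dump_b = bytes([8, 8, 8, 8, 1, 2, 3])
print(slices_match([(dump_a, 2), (dump_b, 4)], 3))
print(slices_match([(dump_a, 0), (dump_b, 0)], 3))
```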
Phenakist originally noted that the 8556 bytes prior to the start of Segment 5 are conserved among all books ("Segment 4"). However, the pointer table in Segment 1 suggests that Region 13 starts 8552 bytes prior to Segment 5. The 4 bytes that make up the difference may differ between books. See Segment 4 Region 13 Differentiation for an analysis of this 4-byte area, which led to the correction of Segment 4's starting address.
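One way to locate these boundaries in a dump is sketched below in Python: walk backwards over the FF padding of Segment 5 to find where it begins, then subtract 8552. This assumes Segment 4 does not itself end in FF bytes, which would shift the result. The function names are hypothetical, and the demo buffer is synthetic, not cartridge data.

```python
SEGMENT_5_END = 0xFFF7F  # last byte of the empty space, per Table 1
SEGMENT_4_SIZE = 8552    # size implied by the pointer table

def find_segment_5_start(data: bytes, end_inclusive: int = SEGMENT_5_END) -> int:
    """Walk backwards from end_inclusive over 0xFF bytes; return where the run begins."""
    offset = end_inclusive
    while offset >= 0 and data[offset] == 0xFF:
        offset -= 1
    return offset + 1

def segment_4_start(data: bytes, end_inclusive: int = SEGMENT_5_END) -> int:
    """Start of Segment 4, assuming it directly precedes the FF padding."""
    return find_segment_5_start(data, end_inclusive) - SEGMENT_4_SIZE

# Synthetic buffer: 10000 zero bytes of "data" followed by 100 bytes of FF padding.
demo = bytes(10000) + b"\xff" * 100
print(find_segment_5_start(demo, len(demo) - 1))
print(segment_4_start(demo, len(demo) - 1))
```

The start originally reported by Phenakist would be 4 bytes earlier than the value returned by `segment_4_start`, since 8556 - 8552 = 4.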
PCM Breakdown for Regions 1-12 by Cartridge
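Pulse-Code Modulation (PCM) is a method of digitally encoding sampled audio. Plain PCM is uncompressed; compressed variants such as differential PCM (DPCM) or adaptive differential PCM (ADPCM) store the changes between samples instead of the samples themselves, which would be consistent with the "delta-table" idea above. Each Region can be further broken down into 2-3 distinct parts. Regions 1-12 can be divided into Address, Body, and Map. Regions 13-24 can be simply divided into Address and Body.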
The Address is a 4-byte field, the very first 4 bytes of each Region. Read as a little-endian integer, these 4 bytes give the size of the Region's PCM Body in bytes. Remember to reverse the byte order before converting hex to decimal (eg: 42 99 = 0x9942 = 39234 bytes).
The Body is suspected to be where the compressed audio is stored (in PCM format). To decompress the audio, a map of predictive changes based on the last value is provided to help reconstruct the audio coherently.
As far as I understand, the Map is a table of values that help predict the holes in the sampled audio of the Body.
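To illustrate what "predictive changes based on the last value" could mean in general, here is a toy delta-decoding sketch in Python. It is not the cartridge's actual scheme, which has not been determined: the step table, the codes, and the function name `delta_decode` are all made up purely to show the mechanism of rebuilding samples from a table of changes.

```python
def delta_decode(codes: list[int], step_table: list[int], initial: int = 0) -> list[int]:
    """Rebuild samples by adding a table-selected step to the previous sample."""
    samples = []
    current = initial
    for code in codes:
        # Each code is an index into the table of changes (the "map").
        current += step_table[code]
        samples.append(current)
    return samples

# Made-up table and codes, purely to show the mechanism.
toy_table = [-2, -1, 0, 1, 2]
print(delta_decode([4, 4, 3, 2, 1, 0], toy_table))  # [2, 4, 5, 5, 4, 2]
```

If the Map does act like `toy_table` here, the Body would hold the codes and the Map the steps, but that pairing is only a guess at this point.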