livedecode
2022-10-25
A super hacky tool to decode unknown binary formats
The Issue
I often work with new or unknown binary formats, and decoding those with a hex editor is hard work. Especially when trying to understand and reverse-engineer an unknown format, a hex editor alone is often not enough.
I have converged on the same solution several times over the last few years, and the last time I thought:
Let’s make it an actual tool, not just another hacky script for the current purpose.
The Solution
The tool I created is called livedecode. It consumes two files, and dumps its output on stdout:
[user@host project]$ livedecode example/png.spec example/example.png
This will read a specification file (png.spec) and apply the decoding instructions to example.png.
The png.spec looks something like this:
endian be
print "File Header:"
u8 magic_8bit # should be 0x89
str 5 magic # PNG\r\n
u8 magic_1a # 0x1A
u8 magic_0a # 0x0a
def tEXt 1950701684
def IHDR 1229472850
def tIME 1950960965
call chunk "IHDR"
call chunk "gAMA"
call chunk "cHRM"
...
As you can see, the format is line-based and has an assembler-like syntax. Each thing you decode (e.g. u8 magic_1a) prints its result to stdout.
You can also invoke subprograms with call <pgm> <args..>, as in call chunk "IHDR".
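To make the header part of png.spec more concrete, here is a minimal Python sketch of the same decoding steps. It is not how livedecode works internally; the helper names fourcc and read_signature are made up for illustration. It also checks an observation about the def lines: the three numbers match the chunk names tEXt, IHDR and tIME read as big-endian 32-bit integers.

```python
import struct

# The 8-byte PNG signature: 0x89, "PNG\r\n", 0x1A, 0x0A.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def fourcc(name: str) -> int:
    """Interpret a 4-character chunk name as a big-endian u32 (hypothetical helper)."""
    return int.from_bytes(name.encode("ascii"), "big")


def read_signature(data: bytes) -> dict:
    """Split the first 8 bytes into the same four fields that png.spec names."""
    # ">" = big-endian, no padding; B = u8, 5s = 5 raw bytes.
    magic_8bit, magic, magic_1a, magic_0a = struct.unpack(">B5sBB", data[:8])
    return {
        "magic_8bit": magic_8bit,
        "magic": magic,
        "magic_1a": magic_1a,
        "magic_0a": magic_0a,
    }


fields = read_signature(PNG_SIGNATURE)
for key, value in fields.items():
    print(key, value)

for name in ("tEXt", "IHDR", "tIME"):
    print(name, fourcc(name))
```

Each printed field corresponds to one decode line of the spec, which is roughly the kind of output the tool dumps to stdout.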
A smaller specification might look like this:
endian le
u32 magic
u32 type
u32 offset
u32 length
print section 1
seek *offset
dump *length
.if *type 10
seek 0x200
str 10 description
.endif
This format has a header consisting of a magic number, a file type, an offset and a length. The program then seeks to the specified offset and prints a hex dump of the length given in the header.
Also, if the type is 10, it seeks to offset 0x200 (512) and prints a 10-character string labelled description.
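For comparison, the same steps could be written by hand in Python roughly as follows. This is only a sketch of what the small spec describes, not livedecode's implementation: the decode function and the synthetic sample file are invented for the example, and the description is assumed to be ASCII.

```python
import io
import struct


def decode(stream) -> dict:
    """Follow the small spec: read the header, dump the section, read the optional string."""
    # endian le + four u32 fields = "<4I" (16 bytes).
    magic, file_type, offset, length = struct.unpack("<4I", stream.read(16))
    result = {"magic": magic, "type": file_type, "offset": offset, "length": length}

    # seek *offset / dump *length
    stream.seek(offset)
    result["dump"] = stream.read(length).hex(" ")

    # .if *type 10 ... .endif
    if file_type == 10:
        stream.seek(0x200)
        result["description"] = stream.read(10).decode("ascii", errors="replace")
    return result


# Build a synthetic file: header, a 4-byte section at offset 16, a string at 0x200.
blob = bytearray(0x200 + 10)
struct.pack_into("<4I", blob, 0, 0xCAFEBABE, 10, 16, 4)
blob[16:20] = b"\x01\x02\x03\x04"
blob[0x200:0x20A] = b"hello file"

info = decode(io.BytesIO(bytes(blob)))
for key, value in info.items():
    print(key, value)
```

The spec expresses the same thing in twelve short lines and can be changed without touching any parsing code, which is what makes it convenient while the format is still unknown.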
Usage
I use livedecode in VSCode by running it periodically in a terminal:
[user@host project]$ while true; do
    clear
    date
    livedecode docs/wmb6.spec data/wmb/wmb6/block.wmb > /tmp/dump.txt
    sleep 1
done
Then I view dump.txt side by side with my spec file.
This way, I can type, save and immediately see the new decoding result. Working this way is very efficient and really supports an explorative workflow.
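The shell loop re-runs the decoder once per second whether or not anything changed. One possible variation, sketched here in Python, re-runs it only when the spec or the data file has been modified. The function names snapshot and watch and their parameters are hypothetical. The only assumption about livedecode is the invocation shown earlier: two positional file arguments, with output on stdout.

```python
import os
import subprocess
import time


def snapshot(paths):
    """Return the modification times of the given files as a tuple."""
    return tuple(os.stat(path).st_mtime_ns for path in paths)


def watch(spec, data, out, command="livedecode", interval=0.2, max_runs=None):
    """Run `command spec data` and write its output to `out` whenever an input changes.

    `max_runs` limits the number of decoder runs; None means run until interrupted.
    """
    last = None
    runs = 0
    while max_runs is None or runs < max_runs:
        current = snapshot([spec, data])
        if current != last:
            last = current
            with open(out, "w") as handle:
                # Capture stderr too, so spec errors show up in the dump file.
                subprocess.run([command, spec, data], stdout=handle,
                               stderr=subprocess.STDOUT)
            runs += 1
        else:
            time.sleep(interval)
```

Calling watch("docs/wmb6.spec", "data/wmb/wmb6/block.wmb", "/tmp/dump.txt") would then play the role of the loop above and can be stopped with Ctrl+C.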
The Future
At roughly the same time that livedecode was finished, I started working on a new project that has a similar goal but takes a different approach:
BFDL uses a formal syntax to describe file formats instead of just executing a loose set of instructions. The benefit of that approach is that once you’re done discovering or designing your file format, you can simply generate a serializer/deserializer for it straight from your specification.