Forensics with old friends: Hachoir file carving with Jupyter
by Jeff
Old Friends
When you do infosec for a while you gather a collection of ‘old friends’: tools you rely on over the years to help you get your job done. Some are simple, like dd or xxd. Some are complex (VS Code?), but it’s fun to revisit some old friends in a new context to see how they play out.
Note to reader: Taylor is not actually an old friend.
Forensics
I recently had the chance to take Sarah Edwards’ SANS course FOR518 for Mac Forensics and it was a blast! A week’s worth of digging through Mac/iPhone artifacts, culminating in a team challenge to solve a case using what we had learned. It was great fun and a chance to revisit some core skills.
File Carving
If you do forensics for any length of time, you’ll have occasion to carve a file out of a binary blob of nothing. It could be a broken disk or a stream intercepted from the network; regardless, having an old friend that helps you find and retrieve a file is invaluable.
Hachoir
Hachoir means ‘meat grinder’ in French and is the name of a Python project used to parse out structures in all sorts of data. The project has been a big help to me over the years when I needed to, oh, say, parse out MSTask job files to find malicious entries.
Jupyter
I decided to revisit this old friend to see if I could whip up a Jupyter notebook for file carving that makes use of the variety of parsers available in Hachoir.
Scenario
In this scenario we’ve been handed a blob of data and asked to retrieve the last frame of a .gif within the blob. This could have come from a network stream, a bad flash drive, memory, anywhere. To Hachoir it doesn’t matter.
The Notebook
Let’s dig into the notebook for this task. If you want to play along at home, here is the source file: a blob of who knows what.
First off, we can get a peek at the parsers embedded in Hachoir:
Any chance we just get it right off the bat?
No joy. Do we have a wild guess from the header?
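A header guess boils down to comparing the first few bytes against known magic numbers. Here is a minimal sketch of that idea using only the standard library. The signature table and the function name are made up for illustration; this is not how Hachoir does it internally, and real tools know far more formats.

```python
# Hypothetical signature table for illustration only.
MAGIC_NUMBERS = {
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "ZIP archive",
    b"%PDF-": "PDF document",
}


def guess_from_header(data: bytes):
    """Return a format name if data starts with a known signature, else None."""
    for magic, name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return None


# A blob that starts with padding gives no match, just like our scenario.
print(guess_from_header(b"\x00" * 16 + b"GIF89a"))   # None
print(guess_from_header(b"GIF89a\x01\x00\x01\x00"))  # GIF image
```

A check like this only works when the file starts at offset 0, which is exactly why a blob with junk in front of the real file comes back empty.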
No luck. OK, let’s turn to our old friend and see if Hachoir can step through the file and find anything.
# step through the file to see if we can recognize a portion of it
import io
from hachoir.parser import guessParser
from hachoir.stream import InputIOStream

# target is the path to the blob; read it all and take a memoryview over the bytes
with open(target, "rb") as blob:
    view = io.BytesIO(blob.read()).getbuffer()
# step through the first 4096 bytes, 8 bytes at a time, looking for recognized files
for x in range(0, 4096, 8):
    parser = guessParser(InputIOStream(io.BytesIO(view[x:])))
    if parser:
        print(f"{parser} found at position {x}")
Voila:
<GifFile path=/, current_size=104, current length=3> found at position 512
Now we’ve got the bytes from our blob in a readable, scannable view thanks to io.BytesIO and its getbuffer() method.
Let’s take a peek at this header to validate it’s a GIF:
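One way to do that check by hand, sketched with only the standard library: a GIF starts with the 6-byte signature GIF87a or GIF89a, followed by a 7-byte Logical Screen Descriptor. The function name and the synthetic blob below are assumptions for illustration, not cells from the notebook.

```python
import struct


def describe_gif_header(data: bytes, offset: int = 0) -> dict:
    """Parse the 6-byte signature and the 7-byte Logical Screen Descriptor."""
    header = data[offset:offset + 13]
    if len(header) < 13 or header[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError(f"no GIF signature at offset {offset}")
    # "<HHBBB": width and height (little-endian uint16), packed flags,
    # background color index, pixel aspect ratio.
    width, height, packed, _background, _aspect = struct.unpack("<HHBBB", header[6:])
    return {
        "version": header[3:6].decode("ascii"),
        "width": width,
        "height": height,
        "has_global_color_table": bool(packed & 0x80),
    }


# Synthetic stand-in for the blob: 512 bytes of padding, then a 2x3 GIF89a header.
blob = b"\x00" * 512 + b"GIF89a" + struct.pack("<HHBBB", 2, 3, 0x80, 0, 0)
print(blob[512:518])  # b'GIF89a'
print(describe_gif_header(blob, 512))
```

With the real blob you would pass the bytes you read from disk and the offset Hachoir reported, 512 in this case.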
Sure enough, Hachoir found it. Now let’s see if we can find our way around it and export the last frame.
Inspection
After re-initializing our parser:
parser = guessParser(InputIOStream(io.BytesIO(view[512:])))
Let’s see the tags Hachoir associated:
Fields in Hachoir are lazily evaluated. Let’s take a look at some of what it can find:
Within the fields there are tree structures for components of the various parts, like the individual frames of the image:
You can inspect values for individual fields:
You can see the image will loop forever with a loop count of 0.
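For reference, that loop count lives in the NETSCAPE2.0 application extension, stored as a little-endian 16-bit value where 0 means loop forever. The sketch below finds it with a plain byte search rather than Hachoir. Treat it as a shortcut under assumptions: a byte search could in principle match inside image data, so a careful parser would walk the GIF blocks instead. The function and constant names are mine, not from the notebook.

```python
import struct

# Extension introducer (0x21), application label (0xFF), block size 11,
# "NETSCAPE2.0", then a 3-byte sub-block whose first byte is 0x01.
NETSCAPE_HEADER = b"\x21\xff\x0bNETSCAPE2.0\x03\x01"


def gif_loop_count(data: bytes):
    """Return the NETSCAPE2.0 loop count (0 = loop forever), or None if absent."""
    start = data.find(NETSCAPE_HEADER)
    if start == -1:
        return None
    value_at = start + len(NETSCAPE_HEADER)
    if value_at + 2 > len(data):
        return None  # truncated extension
    (count,) = struct.unpack_from("<H", data, value_at)
    return count


# Synthetic example: a bare signature followed by a looping extension.
looping = b"GIF89a" + NETSCAPE_HEADER + struct.pack("<H", 0) + b"\x00"
print(gif_loop_count(looping))    # 0
print(gif_loop_count(b"GIF89a"))  # None
```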
Normally in file carving, you can count on your old friend dd to grab some bytes, like so:
dd if=/Users/jeff/work/ablob_of_who_knows_what ibs=1 skip=<offset> count=<length> of=put_it_here
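If you would rather stay inside the notebook, the same byte-range copy is a few lines of Python. This is a minimal sketch, not a dd replacement: the function name and file names are hypothetical, and it reads the whole range into memory, which is fine for small carves but not for multi-gigabyte ones.

```python
import os
import tempfile


def carve(source_path: str, dest_path: str, offset: int, length: int) -> int:
    """Copy `length` bytes starting at `offset` from source to dest.

    Returns the number of bytes actually written, which is smaller than
    `length` if the source ends early.
    """
    with open(source_path, "rb") as src, open(dest_path, "wb") as dst:
        src.seek(offset)
        chunk = src.read(length)
        dst.write(chunk)
    return len(chunk)


# Demo on a throwaway blob: 512 bytes of padding, a GIF signature, then filler.
with tempfile.TemporaryDirectory() as workdir:
    blob_path = os.path.join(workdir, "blob.bin")
    out_path = os.path.join(workdir, "carved.bin")
    with open(blob_path, "wb") as handle:
        handle.write(b"\x00" * 512 + b"GIF89a" + b"\xff" * 10)
    written = carve(blob_path, out_path, offset=512, length=6)
    with open(out_path, "rb") as handle:
        print(written, handle.read())  # 6 b'GIF89a'
```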
Here our task is to grab the last frame, and GIFs make that a bit harder because each frame can have a local color map or share a global one. Luckily, Hachoir can tell us which one we are dealing with:
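The notebook asks Hachoir for this. As a cross-check that needs no third-party library, here is a minimal sketch that walks the GIF block structure by hand and reports whether there is a global color table and which frames carry a local one. The function names and the hand-built sample GIF are assumptions for illustration, and the walker does no bounds checking on truncated data.

```python
import struct


def skip_sub_blocks(data: bytes, pos: int) -> int:
    """Advance past a chain of length-prefixed sub-blocks; return the next position."""
    while True:
        size = data[pos]
        pos += 1
        if size == 0:  # a zero-length sub-block terminates the chain
            return pos
        pos += size


def frame_color_tables(data: bytes):
    """Return (has_global_table, [has_local_table for each frame])."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    packed = data[10]
    has_global = bool(packed & 0x80)
    pos = 13
    if has_global:
        pos += 3 * (2 << (packed & 0x07))  # 3 bytes per entry, 2^(N+1) entries
    local_flags = []
    while pos < len(data):
        block = data[pos]
        pos += 1
        if block == 0x3B:    # trailer: end of the GIF
            break
        elif block == 0x21:  # extension: one label byte, then sub-blocks
            pos = skip_sub_blocks(data, pos + 1)
        elif block == 0x2C:  # image descriptor: 9 bytes, flags in the last one
            frame_packed = data[pos + 8]
            has_local = bool(frame_packed & 0x80)
            local_flags.append(has_local)
            pos += 9
            if has_local:
                pos += 3 * (2 << (frame_packed & 0x07))
            pos = skip_sub_blocks(data, pos + 1)  # +1 skips the LZW code size byte
        else:
            raise ValueError(f"unexpected block 0x{block:02x} at offset {pos - 1}")
    return has_global, local_flags


def make_frame(local_table: bool) -> bytes:
    """Build a 1x1 frame for the demo; the image data bytes are placeholders."""
    descriptor = b"\x2c" + struct.pack("<HHHHB", 0, 0, 1, 1, 0x80 if local_table else 0)
    table = b"\x00\x00\x00\xff\xff\xff" if local_table else b""
    return descriptor + table + b"\x02\x02\x4c\x01\x00"


sample = (b"GIF89a" + struct.pack("<HHBBB", 1, 1, 0x80, 0, 0)
          + b"\x00\x00\x00\xff\xff\xff"
          + make_frame(False) + make_frame(True) + b"\x3b")
print(frame_color_tables(sample))  # (True, [False, True])
```

The length of the returned list doubles as a frame count, which is handy for sanity-checking whatever the imaging library reports later.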
So we will need the help of an image processing library. Luckily, the PIL/Pillow library does just the trick.
In the home stretch, we load up the bytestream, count the frames, and save out just the last one:
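A minimal sketch of that step, assuming Pillow is installed (pip install Pillow). The function name, the choice of PNG as the output format, and the generated three-frame test GIF are my assumptions rather than the notebook's actual cell.

```python
import io

from PIL import Image  # third-party dependency: Pillow


def last_frame_png(gif_bytes: bytes):
    """Return (frame_count, PNG bytes of the final frame) for a GIF byte string."""
    with Image.open(io.BytesIO(gif_bytes)) as gif:
        frame_count = getattr(gif, "n_frames", 1)
        gif.seek(frame_count - 1)  # frames are zero-indexed
        output = io.BytesIO()
        # Converting to RGBA sidesteps palette differences between frames.
        gif.convert("RGBA").save(output, format="PNG")
    return frame_count, output.getvalue()


# Build a small three-frame GIF in memory to try it out.
frames = [Image.new("RGB", (4, 4), color) for color in ("red", "green", "blue")]
buffer = io.BytesIO()
frames[0].save(buffer, format="GIF", save_all=True,
               append_images=frames[1:], duration=100, loop=0)

count, png_bytes = last_frame_png(buffer.getvalue())
print(count, len(png_bytes) > 0)
```

With the blob from this post, the input would be the bytes from offset 512 onward, for example bytes(view[512:]). Because Pillow composes each frame for you, it does not matter whether the frame used a local or a global color map.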
Did it work?! Let’s use Jupyter’s embedded image viewer to find out!
You can find other cells in the notebook to save out the entire GIF, change the loop count, etc., if that’s of interest.
File Carving: Check!
Thanks to our old friend Hachoir, we now have a handy Jupyter playbook for traversing binary data and pulling out pretty much anything!
tags: infosec - forensics - jupyter - hachoir