Reverse engineering blobs with replay and sed

I recently started revisiting the HP Pavilion m6 1035dx with a recent coreboot master. As usual, the benign VGA BIOS got in the way again. This time I decided to use coreboot’s YABEL real-mode emulator to tell the story. Let’s dive in:

Executing Initialization Vector...
[0000f2cb]c000:3b69 inb(0x03c3) = 0x20
[0000f2cc]c000:3b6f inl(0x204c) = 0x00000000
[0000f2ce]c000:3e06 inl(0x2000) = 0x9ffffffc
[0000f2cf]c000:3b69 inb(0x03c3) = 0x20
[0000f2d0]c000:3b6f inl(0x204c) = 0x00000000
[0000f2d3]c000:3b69 inb(0x03c3) = 0x20
[0000f2d4]c000:3b6f inl(0x204c) = 0x00000000
[0000f2d5]c000:3e61 outl(0x00001728, 0x2000)
[0000f2d5]c000:3e67 outl(0x0008c000, 0x2004)
[0000f2d8]c000:3b69 inb(0x03c3) = 0x20
[0000f2d9]c000:3b6f inl(0x204c) = 0x00000000
[0000f2da]c000:3e49 outl(0x00003f54, 0x2000)
...

That’s how the coreboot log looks when we enable YABEL traces. (Enabling direct hardware access produces a cleaner log to start with.) We still want to clean it up a little, and a bit of simple regex substitution gets us there. Excuse the wrap-around:

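# Keep only the I/O trace lines, halts, and interrupt entries
grep -E "c000:[0-9a-f]{4}|x86emuOp_halt|runInt[0-9a-f]{2}.*starting" "$1" |
grep -v "Running option rom at c000:0003" |
# Replace the segment:offset with a tab (\t here needs GNU sed)
sed "s/c000:[0-9a-f]\{4\} /\t/g" |

# Join all lines with \r so later patterns can span several lines
tr '\n' '\r' |

# Drop the bracketed prefix, terminate each call, turn "= 0x.." into a comment
sed "s/\[[0-9a-f]\{8\}\]//g" |
sed "s/)/);/g" |
sed "s|= 0x\([0-9a-f]*\)|/* \1 */|g" |

tr '\r' '\n'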

That’s enough to get our trace to something that looks more like C code, which could be used as a replay function (HINT!):

 inb(0x03c3); /* 20 */
 inl(0x204c); /* 00000000 */
 inl(0x2000); /* 9ffffffc */
 inb(0x03c3); /* 20 */
 inl(0x204c); /* 00000000 */
 inb(0x03c3); /* 20 */
 inl(0x204c); /* 00000000 */
 outl(0x00001728, 0x2000);
 outl(0x0008c000, 0x2004);
 inb(0x03c3); /* 20 */
 inl(0x204c); /* 00000000 */
 outl(0x00003f54, 0x2000);
...

We’ve neatly kept the return values of IO input operations for future reference, but the current form doesn’t tell us much. We do, however, see patterns emerging. IO to 0x2000 followed by 0x2004 is fairly common. That looks like an index/data pair. Also, access to 0x03c3 and 0x204c before poking the above pair is all too common. Let’s extend our script with:

iport=0x2000
dport=0x2004
stsport=0x204c

..

sed "s/outl(0x0\{0,4\}\([0-9a-f]*\), $iport);[^\r]*\r\toutl(0x\([0-9a-f]*\), $dport);/radeon_write(0x\1, 0x\2);/g" |
sed "s/outl(0x0\{0,4\}\([0-9a-f]*\), $iport);[^\r]*\r\tinl($dport);/radeon_read(0x\1);/g" |

sed "s/inb(0x03c3);[^\r]*\r\tinl($stsport);[^\r]*/sync_read();/g" |

Since we’ve converted all our newlines to carriage returns, we can match patterns that span multiple lines. It doesn’t matter what we think these patterns do, or what we call the new functions. We’re just interested in grouping them to see the bigger picture. I could as well have called these hamburger() and french_fries().
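To see the trick in isolation, here is a minimal sketch that applies only the index/data write substitution to two sample lines from the trace. The function name fold_writes is made up for this example and, like the pipeline above, it assumes GNU sed (for \t and \r inside patterns).

```shell
#!/bin/sh
# Port numbers taken from the trace; fold_writes is a hypothetical helper name.
iport=0x2000
dport=0x2004

fold_writes() {
    # Join lines with \r so one sed expression can span two of them,
    # collapse the index write + data write pair, then restore newlines.
    tr '\n' '\r' |
    sed "s/outl(0x0\{0,4\}\([0-9a-f]*\), $iport);[^\r]*\r\toutl(0x\([0-9a-f]*\), $dport);/radeon_write(0x\1, 0x\2);/g" |
    tr '\r' '\n'
}

# Two consecutive lines from the cleaned-up trace collapse into one call:
printf '\toutl(0x00001728, 0x2000);\n\toutl(0x0008c000, 0x2004);\n' | fold_writes
```

Running the full set of substitutions over the whole log gives: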

 sync_read();
 inl(0x2000); /* 9ffffffc */
 sync_read();
 sync_read();
 radeon_write(0x1728, 0x0008c000);
 sync_read();
 radeon_read(0x3f54); /* 002badc3 */
...
 sync_read();
 radeon_read(0x0670); /* 0000fe04 */
 sync_read();
 radeon_write(0x0670, 0x0000fe04);

Much better! We’ve replaced the low-level patterns with more meaningful descriptions. We could grep out the sync_read() calls or, if we were using this for replay code, fold the sync step into the radeon_*() functions themselves. Even in this form, we can start looking for higher-level patterns. If we look into the disassembly of the video BIOS provided by AtomDis, we see:

 0009: 07a59c01fc AND reg[019c] [.X..] <- fc
 000e: 0d659c0180 OR reg[019c] [..X.] <- 80

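Since the registers are 32 bits wide, the byte address of a register is its index shifted left by two (reg << 2). Thus [019c] becomes 0x0670. These read-modify-write sequences appear in the replay trace, as in the radeon_read(0x0670) / radeon_write(0x0670, ...) pair above.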
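The conversion is a two-bit shift in either direction. A small sketch using shell arithmetic, with helper names that are purely illustrative:

```shell
#!/bin/sh
# AtomBIOS register index -> byte offset seen in the trace (index << 2).
reg_to_addr() { printf '0x%04x\n' "$(( $1 << 2 ))"; }

# Byte offset seen in the trace -> AtomBIOS register index (offset >> 2).
addr_to_reg() { printf '0x%04x\n' "$(( $1 >> 2 ))"; }

reg_to_addr 0x019c   # prints 0x0670
addr_to_reg 0x0670   # prints 0x019c
```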

Now that we have an idea of where we are with respect to the AtomBIOS tables, we can see, at a low level, how those tables translate to hardware accesses. As more patterns are identified, they can be transformed into more meaningful function calls.

Now we have the opportunity to look for more advanced patterns, and even to identify any code that is not in the AtomBIOS tables. This, in turn, can allow us to figure out what actually turns on the display. There is still a huge gap before we have anything close to native init. What we’ve done here is develop a coreboot-level understanding of the init process and take the first tiny step towards native VGA initialization.

While there is only so much regex substitution can do for us, it is a necessary first step towards a larger understanding of the problem. Transforming 0x0670 to 0x019c is ill-suited to regex. Identifying more complex patterns, such as waiting for a condition or delaying for a set amount of time, is also a more demanding task. The power, however, lies in the ability to script and automate the conversion from a nonsensical log into something more human-friendly. It then becomes trivial to diff several traces and see higher-level patterns. I’ll discuss those some other time.
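As an illustration of a step that is easier outside of sed, the following sketch rewrites the address argument of radeon_read() and radeon_write() into AtomDis-style reg[...] notation by shifting it right by two. The function name to_reg_index and the reg[...] output format are assumptions chosen for this example; they are not part of the script above.

```shell
#!/bin/sh
# Hypothetical post-processing pass: byte offsets -> AtomDis register indices.
to_reg_index() {
    while IFS= read -r line; do
        case $line in
        *"radeon_read(0x"*|*"radeon_write(0x"*)
            pre=${line%%(*}        # text before the first "("
            rest=${line#*(}        # everything after the first "("
            addr=${rest%%[,)]*}    # first argument, e.g. 0x0670
            post=${rest#"$addr"}   # remainder of the line, unchanged
            printf '%s(reg[%04x]%s\n' "$pre" "$(( $addr >> 2 ))" "$post"
            ;;
        *)
            # Anything else (sync_read(), raw inl/outl, ...) passes through.
            printf '%s\n' "$line"
            ;;
        esac
    done
}

printf '\tradeon_read(0x0670); /* 0000fe04 */\n' | to_reg_index
```

The result is no longer valid C. It is meant for lining the trace up against the AtomDis listing, not for replay.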