Reverse engineering blobs with replay and sed

I recently started re-visiting the HP Pavilion m6 1035dx with a recent coreboot master. As usual the benign VGA BIOS got in the way again. This time I decided to use coreboot’s YABEL realmode emulator to tell the story. Let’s dive in:

Executing Initialization Vector...
[0000f2cb]c000:3b69 inb(0x03c3) = 0x20
[0000f2cc]c000:3b6f inl(0x204c) = 0x00000000
[0000f2ce]c000:3e06 inl(0x2000) = 0x9ffffffc
[0000f2cf]c000:3b69 inb(0x03c3) = 0x20
[0000f2d0]c000:3b6f inl(0x204c) = 0x00000000
[0000f2d3]c000:3b69 inb(0x03c3) = 0x20
[0000f2d4]c000:3b6f inl(0x204c) = 0x00000000
[0000f2d5]c000:3e61 outl(0x00001728, 0x2000)
[0000f2d5]c000:3e67 outl(0x0008c000, 0x2004)
[0000f2d8]c000:3b69 inb(0x03c3) = 0x20
[0000f2d9]c000:3b6f inl(0x204c) = 0x00000000
[0000f2da]c000:3e49 outl(0x00003f54, 0x2000)
...

That’s how the coreboot log looks when we enable YABEL traces. Enabling direct hardware access produces a cleaner log to start with. We want to clean that up a little bit. A bit of simple regex substitution gets us there. Excuse the wrap-around:

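# Keep only the I/O trace lines, halts, and interrupt handler starts
grep -E "c000:[0-9a-f]{4}|x86emuOp_halt|runInt[0-9a-f]{2}.*starting" "$1" |
grep -v "Running option rom at c000:0003" |
sed "s/c000\:[0-9a-f]\{4\}\ /\t/g" |

# Join lines with carriage returns so later patterns can span several lines
tr '\n' '\r' |

sed "s/\[[0-9a-f]\{8\}\]//g" |
sed "s/)/);/g" |
sed "s/=\ 0x\([0-9a-f]*\)/\/\*\ \1\ \*\//g" |

tr '\r' '\n'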

That’s enough to get our trace to something that looks more like C code, which could be used as a replay function (HINT!):

 inb(0x03c3); /* 20 */
 inl(0x204c); /* 00000000 */
 inl(0x2000); /* 9ffffffc */
 inb(0x03c3); /* 20 */
 inl(0x204c); /* 00000000 */
 inb(0x03c3); /* 20 */
 inl(0x204c); /* 00000000 */
 outl(0x00001728, 0x2000);
 outl(0x0008c000, 0x2004);
 inb(0x03c3); /* 20 */
 inl(0x204c); /* 00000000 */
 outl(0x00003f54, 0x2000);
...
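For readers who prefer a scripting language over a sed pipeline, the same first pass can be sketched in Python. This is only an illustration under assumptions: the function name trace_to_c() is made up here, and it handles only the I/O lines shown above (the halt and interrupt lines that the grep keeps are simply skipped).

```python
import re

# Matches one YABEL trace line such as
#   [0000f2cb]c000:3b69 inb(0x03c3) = 0x20
# The "= 0x.." part is only present for reads.
TRACE_LINE = re.compile(
    r"^\[[0-9a-f]{8}\]c000:[0-9a-f]{4} "
    r"(?P<call>\w+\([^)]*\))"
    r"(?: = 0x(?P<value>[0-9a-f]+))?$"
)


def trace_to_c(lines):
    """Turn raw trace lines into C-like statements (hypothetical helper).

    Values returned by reads are kept as trailing comments.
    """
    out = []
    for line in lines:
        m = TRACE_LINE.match(line.strip())
        if m is None:
            continue  # banner, "...", or anything that is not an I/O trace line
        stmt = "\t" + m.group("call") + ";"
        if m.group("value") is not None:
            stmt += " /* " + m.group("value") + " */"
        out.append(stmt)
    return out
```

Feeding it the log excerpt from the top of this post should give the same statements as the sed version, one list entry per line.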

We’ve neatly kept the return values of IO input operations for future reference, but the current form doesn’t tell us much. We do, however, see patterns emerging. IO to 0x2000 followed by 0x2004 is fairly common. That looks like an index/data pair. Also, access to 0x03c3 and 0x204c before poking the above pair is all too common. Let’s extend our script with:

iport=0x2000
dport=0x2004
stsport=0x204c

..

sed "s/outl(0x0\{0,4\}\([0-9a-f]*\), $iport);[^\r]*\r\toutl(0x\([0-9a-f]*\), $dport);/radeon_write(0x\1, 0x\2);/g" |
sed "s/outl(0x0\{0,4\}\([0-9a-f]*\), $iport);[^\r]*\r\tinl($dport);/radeon_read(0x\1);/g" |

sed "s/inb(0x03c3);[^\r]*\r\tinl($stsport);[^\r]*/sync_read();/g" |

Since we’ve converted all our newlines to carriage returns, we can match patterns that span multiple lines. It doesn’t matter what we think these patterns do, or what we call the new functions. We’re just interested in grouping them to see the bigger picture. I could just as well have called these hamburger() and french_fries().

 sync_read();
 inl(0x2000); /* 9ffffffc */
 sync_read();
 sync_read();
 radeon_write(0x1728, 0x0008c000);
 sync_read();
 radeon_read(0x3f54); /* 002badc3 */
...
 sync_read();
 radeon_read(0x0670); /* 0000fe04 */
 sync_read();
 radeon_write(0x0670, 0x0000fe04);
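The multi-line grouping can also be written without the newline-to-carriage-return trick, by walking the statements two at a time. The sketch below is one possible way to do it, not the script used for this post; group() and short_reg() are names invented for the example, and the port numbers are the ones from the script above.

```python
import re

IPORT, DPORT, STSPORT = "0x2000", "0x2004", "0x204c"

OUTL = re.compile(r"^\s*outl\(0x([0-9a-f]+), (0x[0-9a-f]+)\);")
INL = re.compile(r"^\s*inl\((0x[0-9a-f]+)\);(.*)$")
INB = re.compile(r"^\s*inb\((0x[0-9a-f]+)\);")


def short_reg(hexdigits):
    """Drop up to four leading zeros, like the zero-skipping part of the sed pattern."""
    n = 0
    while n < 4 and n < len(hexdigits) and hexdigits[n] == "0":
        n += 1
    return hexdigits[n:]


def group(stmts):
    """Collapse index/data pairs and status polls into higher-level calls."""
    out = []
    i = 0
    while i < len(stmts):
        cur = stmts[i]
        nxt = stmts[i + 1] if i + 1 < len(stmts) else ""
        m_out = OUTL.match(cur)
        if m_out and m_out.group(2) == IPORT:
            reg = short_reg(m_out.group(1))
            m_data = OUTL.match(nxt)
            m_read = INL.match(nxt)
            if m_data and m_data.group(2) == DPORT:
                out.append("\tradeon_write(0x%s, 0x%s);" % (reg, m_data.group(1)))
                i += 2
                continue
            if m_read and m_read.group(1) == DPORT:
                # Keep the trailing comment holding the value that was read
                out.append("\tradeon_read(0x%s);%s" % (reg, m_read.group(2)))
                i += 2
                continue
        m_inb = INB.match(cur)
        m_sts = INL.match(nxt)
        if m_inb and m_inb.group(1) == "0x03c3" and m_sts and m_sts.group(1) == STSPORT:
            out.append("\tsync_read();")
            i += 2
            continue
        out.append(cur)  # no pattern matched, keep the statement as-is
        i += 1
    return out
```

Because each rule looks only at the current and the next statement, adding a new pattern is a matter of adding another branch rather than another long regular expression.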

Much better! We’ve replaced the low-level patterns with more meaningful descriptions. We could grep out the sync_read() calls or, if we were using this for replay code, fold the sync_read() into the radeon_*() functions. Even in this form, we can start looking for higher-level patterns. If we look into the disassembly of the video BIOS provided by AtomDis, we see:

 0009: 07a59c01fc AND reg[019c] [.X..] <- fc
 000e: 0d659c0180 OR reg[019c] [..X.] <- 80

Since the registers are 32-bit, their address would be reg << 2. Thus [019c] becomes 0x0670. These read-modify-write sequences appear in the replay trace, as the read and write of 0x0670 above.
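The address arithmetic is small enough to script. The following sketch converts between AtomBIOS register numbers and the byte offsets seen in the trace, and tags a replay statement with the matching reg[...] number. All three function names are hypothetical and exist only for this example.

```python
import re

CALL = re.compile(r"radeon_(?:read|write)\(0x([0-9a-f]+)")


def atom_reg_to_index(reg):
    """AtomBIOS numbers registers in 32-bit words; the trace uses byte offsets."""
    return reg << 2


def index_to_atom_reg(index):
    """Inverse mapping; only 32-bit aligned offsets correspond to a register."""
    if index & 0x3:
        raise ValueError("offset 0x%x is not 32-bit aligned" % index)
    return index >> 2


def annotate(stmt):
    """Append the AtomBIOS register number to a radeon_read/radeon_write line."""
    m = CALL.search(stmt)
    if m is None:
        return stmt
    reg = index_to_atom_reg(int(m.group(1), 16))
    return "%s /* reg[%04x] */" % (stmt, reg)
```

With this, the 0x0670 accesses from the trace can be matched against the reg[019c] lines in the AtomDis output by a plain text search.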

Now that we have an idea where we are with respect to the AtomBIOS tables, we can see, at a low level, how those tables translate to hardware accesses. As more patterns are identified, they can be transformed into more meaningful function calls.

Now we have the opportunity to look for more advanced patterns, and even identify any code that is not in the AtomBIOS tables. This, in turn, can allow us to figure out what actually turns on the display. There is still a huge gap before having anything close to native init. What we’ve done here is develop a coreboot-level understanding of the init process, and make the first tiny step towards native VGA initialization.

While there is only so much regex substitution can do for us, it is a necessary first step towards a larger understanding of the problem. Transforming 0x0670 to 0x019c is ill suited for regex. Identifying more complex patterns, such as waiting for a condition or delaying a set amount of time, is also a more demanding task; however, the power lies in the ability to script and automate the conversion from a nonsensical log into something more human-friendly. It then becomes trivial to diff several traces and see higher-level patterns. I’ll discuss those some other time.

The pink room – a coreboot developer meeting in Prague

A highly unusual sight? Yes. A pink room filled to the brim with hardware, hackers and pizza boxes.
An unexpected place? Indeed. The architecture faculty building of the Czech Technical University in Prague.
An unusual schedule for four days in Prague? Of course. Intense hacking, talks, discussions and hardware destruction/modding.
Highly unusual sightseeing? Absolutely. A steam-powered wastewater treatment plant and the Prague Signal Festival.
Excellent food? Oh yes. Czech specialties at really low prices.
Disasters? Not really. There were canceled trains/planes from and to Prague affecting quite a few of the attendees, but those were eventually overcome.
Bad things? Possibly. We did unspeakable things to BIOS and EFI images, and we’re proud of it, so I’m not sure if that qualifies as bad.
Great meeting? Awesome meeting!

So what did we do?
– Found a way forward for the dozens of boards which haven’t been tested in ages.
– Overhauled the board status reporting model, to be implemented soon.
– Evaluated ways to get board testing into a state where people can do so easily without manual work.
– Discussed hardware vendor interaction.
– Agreed to ship verified boot functionality by default in a way which still keeps all the freedom for the user.
– coreboot will get a compelling security story out of the box.
– Fixed bugs in some ports.
– Ported a few new laptops.
– Learned new tricks to improve suspend-to-RAM functionality the easy way.
– Made a plan to upstream essential drivers in the chromiumos branch of flashrom.
– Adjusted the flashrom development model to something that works with the scarcity of reviewer time currently available.
– Found out that a video projector is a nice way to test VGA output in case the only VGA capable monitor is in use.
– Noticed that pretty much every question in the form of “does anybody have X” can be answered with “yes” if X is something that may be connected to a computer or is remotely useful for coreboot.
– Didn’t get a lot of sleep.
– Listened to talks about the present and future of coreboot.
– Met some longtime community members for the first time.

Photos of the meeting and sightseeing will be added soon.

coreboot.org website updates

Woot! A new look for coreboot.org. We have shifted the landing page from the mediawiki to WordPress. DON’T PANIC! We are still using the wiki as the primary location for developer content. The new landing page and WordPress site are more visually appealing and are the location for news, blogs, and other basic information for those who are just discovering and learning about coreboot.

GSoC [infrastructure] : Along the way, something went terribly wrong

I started working with AMD platforms’ infrastructure with high hopes of being able to better manage the CBMEM setup. While I have a selection of family14 boards to work with, things did not go so well:

First off, the tree had literally thousands of lines of copy-pasted or misplaced AGESA interface code. A lot of that should have been caught in the reviews, but it appears that a few years back the attitude was that if the coreboot project was lucky enough to get some patches from an industry partner, the code must be good (as the development was paid for!), so it just got rubber-stamped and committed.

Second, the agreement I have for chipset documentation is not open-source friendly. It contains a clause saying all documentation behind the site login is to be used for internal evaluation only. I was well aware of this at the beginning of GSoC, and at that time I expected my mentor organisation would be able to get me in contact with the right people at AMD to get this fixed. But that never happened. I am also concerned about how little feedback I have received, as essentially nobody in the community has fam16kb hardware to test.

So currently I am weighing which parts of my work on AGESA I should and can publish and which I cannot. Furthermore, the first evidence that the vendor has decided to stop releasing AGESA sources has appeared for review. In practice there has been zero communication with the coreboot community on this, so I anticipate that the mistakes that were made with FSP binary blobs will be repeated.

Needless to say, this has hurt my motivation to work further on AGESA, as my efforts are likely to be wasted if new boards use blobs. I guess this leaves plenty of opportunities for future positive surprises once we have things like complete timestamps, CBMEM and USBDEBUG consoles and generally any working debug output from AGESA implemented. I attempted to initiate communication around these topics already in fall 2013 without success, and it is sad to see that communication between the different parties interested in overall tree maintenance has not improved at all since then.

GSoC 2014 [flashrom] Support for Intel Bay Trail, Rangeley/Avoton and Wildcat Point

While we were busy updating our AMD driver code to accommodate the new SPI controller found in Kabini and Temash, Intel also changed their SPI interface(s) in a way that required quite some effort to support in flashrom. A pending patch set is the result of the work of a number of parties, and I will briefly explain some details below.

It all started with a patch for ChromiumOS’s flashrom fork in fall 2013 that introduced support for Intel’s Bay Trail SoCs, which are used in a number of currently shipping or announced Chromebooks. Bay Trail is part of the Silvermont architecture, which is also used in other SoCs intended for different use cases like mobile phones (Merrifield and Moorefield) or special-purpose servers (Avoton and Rangeley). At least for the latter two, the SPI interface is equivalent to Bay Trail’s. This was handy for Sage when they were developing their support package for Intel’s Mohon Peak (Rangeley reference board), which was upstreamed to the coreboot repository shortly before this blog post was written. They ported the patch to vanilla flashrom, added the necessary PCI IDs and submitted the result to our mailing list at the end of May.

Because we, the flashrom maintainers, are very picky, the code could not be incorporated as is. I took the patch and completely reworked and refactored it so that more code could be shared and we are hopefully better prepared for future variations of similar changes. Additionally, I have also backported the Intel Wildcat Point support that ChromiumOS got already in May.

The major part of my contribution was not simply integrating foreign code, but refactoring and refining it where possible as well as verifying it against datasheets. While digging through these numerous datasheets and SPI programming guides, I also fixed the problem of hardware sequencing not working on Lynx Point (and Wildcat Point), which was reported in March when I had no time to correct it.

None of this is committed to the main repository yet, but it will be soon. It is mostly untested so far, and I would very much welcome any testers with the respective hardware.

[GSoC-2014] Payload Loading – Success!

This post marks the completion of the payload re-structuring that I did as a second part of the project. The following summarizes the major highlights of the work:

  • We change the ‘struct payload’ (in src/include/payload_loader.h) to have a ‘struct cbfs_media’ and a cbfs_file_handle. That way we have the two things we need to read the contents of the file.
  • In build_self_segment_lists() we use the above data structures to media->read() the payload metadata.
  • We use the metadata to separate segments by type and form a linked list of segments for later reference.
  • Then we do mapping followed by decompression for the compressed segments, and direct reading for the uncompressed segments.
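The steps above can be modelled in a few lines of Python. This is a sketch under assumptions, not the coreboot code: read(offset, size) stands in for media->read(), the 28-byte big-endian header layout and the four-character type tags (CODE, ENTR, and so on) are assumed from the SELF payload format, and build_segment_list() here is a simplified stand-in for the real function.

```python
import struct
from collections import namedtuple

# Assumed on-media layout of one segment header (28 bytes, big-endian):
# 4-byte type tag, compression, offset, load address, file length, memory length.
SEGMENT = struct.Struct(">4sIIQII")
Segment = namedtuple("Segment", "type compression offset load_addr len mem_len")


def build_segment_list(read, payload_offset):
    """Read segment headers one at a time until the entry segment is found.

    Only SEGMENT.size bytes are read per header, so nothing has to be
    mapped just to learn where the segments are.
    """
    segments = []
    pos = payload_offset
    while True:
        seg = Segment(*SEGMENT.unpack(read(pos, SEGMENT.size)))
        segments.append(seg)
        if seg.type == b"ENTR":  # assumed tag of the entry segment, which ends the list
            return segments
        pos += SEGMENT.size


def read_uncompressed(read, payload_offset, seg):
    """Direct read of an uncompressed segment's data (offset is payload-relative)."""
    return read(payload_offset + seg.offset, seg.len)
```

A Python list takes the place of the linked list here; the point is only that the metadata can be fetched with small reads, leaving map() for the compressed segments that really need it.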

The major issues were working out the data_offset and the segment offsets in order to get the right src_address to read from.

I was able to get past all the issues and finally got a successful working boot. This completes the revamp of stage and payload loading 😀

During the past week I also worked with gerrit, where I submitted my patches and received feedback. I am more involved with the development process of coreboot now and, leaving aside some minor glitches, the process was very smooth.

 

[GSoC-2014] [cbfs_media] Stage 2

As per the plan, we set out to investigate the decompression algorithm that is being employed. Midway through, we found something interesting that appealed to us: the payload loading process. In our existing architecture, we memory map the entire payload. The current API of selfload() assumes the payload has already been memory mapped. That’s the bad assumption that needs to change. Even if we investigate and resolve the decompression algorithm, and get a pipelined architecture relying on a smaller buffer size, this mapping will still cost resources. Hence this needs to be rectified before we go on to the decompression work. So we decided this is what we will be targeting.

I then spent some time breaking down the payload loading to see where we can save on map() calls. Some details of the process are shown below:
1. First we locate the payload. Here the process for stage loading happens where we have the following:
default_media->open()
Reading done. size = 24 bytes
Load entry 0x14440 file name (32 bytes)…
Mapping size is equal to 32
Found file:offset = 0x14478, len=95716
CBFS: Found file.
default_media->map(0x14478, 0x1761c)
Mapping size is equal to 95804
CBFS: located payload @ 7ec14298, 95716 bytes.
Thus we map all the segments with this one big mapping.
2. The loading procedure begins by building the segment list (build_self_segment_list() does this). Here, for each of the payload_segment_types, we check for a proper destination address, file_size, etc. After checking a segment, we link it into the list by setting its prev and next pointers appropriately, then move on to the next segment, and so on.
3. In load_self_segments() we run a simple for loop covering all segments. The loading that happens is
(i) A PAYLOAD_SEGMENT_CODE
–> Loading segment from rom address 0x7ec14298
code (compression=1)
New segment dstaddr 0x4a000000 memsize 0x39929 srcaddr 0x7ec142d0
filesize 0x175ac
(cleaned up) New segment addr 0x4a000000 size 0x39929 offset 0x7ec142d0 filesize 0x175ac
(ii) Next a PAYLOAD_SEGMENT_ENTRY
–> Loading segment from rom address 0x7ec142b4
Entry Point 0x4a000000
(iii) After this we come to load_self_segments()
First a bounce buffer is created:  Bounce Buffer at 7ffcf000, 186192 bytes.
One segment is worked upon; it is compressed, hence ulzma(src, dest) reads it.
–>Loading Segment: addr: 0x000000004a000000 memsz: 0x0000000000039929 filesz: 0x00000000000175ac
Post relocation: addr: 0x000000004a000000 memsz: 0x0000000000039929 filesz: 0x00000000000175ac
using LZMA
After that one segment, the log says
–> Loaded segments
Hence process complete. In essence we had only 3 segments.
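The addresses in the log can be cross-checked with a little arithmetic. The snippet below only uses numbers copied from the log lines above; the variable names are made up for the example.

```python
# Numbers copied from the log above
payload_base = 0x7ec14298   # "located payload @ 7ec14298"
payload_len = 95716         # "... 95716 bytes"
entry_header = 0x7ec142b4   # rom address of the second segment header
code_src = 0x7ec142d0       # srcaddr of the code segment data
code_filesize = 0x175ac     # filesize of the code segment

# Distance between two consecutive segment headers
header_size = entry_header - payload_base

# Bytes between the start of the payload and the start of the segment data
header_bytes = code_src - payload_base

# Header bytes plus compressed data should account for the whole file
accounted = header_bytes + code_filesize

print(header_size, header_bytes, accounted == payload_len)
```

The segment data starts exactly two header sizes after the payload base, and headers plus compressed data add up to the 95716 bytes reported for the file, so only the first few dozen bytes are metadata and the rest of the big mapping is compressed data.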
Currently, I am working on a strategy for modifying the architecture of the API so as to reduce SRAM consumption as much as possible.

[GSoC 2014][cbfs_media] Stage 1 : Mission Accomplished

Firstly, sorry for the delay in posting an update on the work. I had been busy turning the design into code and wanted to post after its successful completion.

As I mentioned in the previous post, we did a detailed analysis of the existing read() and map() calls. The original log, with all the extra gibberish removed, can be seen here. The first design modification was to remove the mappings done to get the cbfs_header. These were the 0x20-size mappings we see in the log. They were unnecessary and could be done away with. And we did! 😛 This log shows the first optimized build; Stage 1 -> Part 1 -> done.

Next we moved on to the more complex and colossal mappings. A function cbfs_find_file() was created, which returns the absolute data_offset of the file based on the name and type we ask for. Once we have the whereabouts of the file, modifications were made in cbfs_load_stage() to appropriately read() and/or map() the various files.

The files are arranged as  -> [  cbfs_file  ] [  cbfs_stage  ] [  data  ] <Thanks Aaron for this visualization >

cbfs_find_file() : worked with the cbfs_file to get details about the whereabouts of the file

cbfs_load_stage() : we first read fundamental information about the stage; and then do corresponding map() or read()
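To make the idea concrete, here is a rough Python model of a lookup in the spirit of cbfs_find_file(). It is a sketch under assumptions rather than the code from the patches: read(offset, size) stands in for media->read(), the 24-byte big-endian file header with the LARCHIVE magic is assumed from the CBFS format, and the 64-byte alignment default is a guess that a real image would take from its master header.

```python
import struct

# Assumed cbfs_file header: magic, len, type, checksum, offset (big-endian)
CBFS_FILE = struct.Struct(">8sIIII")
MAGIC = b"LARCHIVE"


def align_up(value, align):
    return (value + align - 1) & ~(align - 1)


def find_file(read, start, end, name, ftype, align=64):
    """Return (absolute data offset, length) of the named file, or None.

    Only the fixed header and the file name are read for each entry,
    so no file data is touched until the caller decides what to do.
    """
    wanted = name.encode("ascii")
    pos = start
    while pos + CBFS_FILE.size <= end:
        magic, length, typ, _checksum, offset = CBFS_FILE.unpack(
            read(pos, CBFS_FILE.size))
        if magic != MAGIC:
            pos += align  # empty space, try the next aligned slot
            continue
        raw_name = read(pos + CBFS_FILE.size, offset - CBFS_FILE.size)
        if raw_name.split(b"\0", 1)[0] == wanted and typ == ftype:
            return pos + offset, length
        pos = align_up(pos + offset + length, align)
    return None
```

The caller can then read() the small stage header at the returned offset and map() only the data that follows it.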

Voila!! Stage 1 Complete! 😀

Now, the major issue that persists is that the decompression of file data assumes memory-mapped access to its contents, and hence is quite inefficient due to that ‘one’ large buffer. So this is what we tackle next: to be more precise, a pipelined decompression strategy which would eliminate the need for one large data buffer.
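What a pipelined strategy could look like is easiest to show with a streaming decompressor. The sketch below uses Python’s standard lzma module purely as a stand-in for coreboot’s ulzma(), which works on whole source and destination buffers; read(offset, size) is again a hypothetical model of media->read(), and the chunk size is arbitrary.

```python
import lzma


def pipelined_decompress(read, offset, size, chunk_size=4096):
    """Decompress a legacy-format LZMA stream without holding it all at once.

    Compressed input is fetched in chunks of at most chunk_size bytes and
    fed to an incremental decompressor; output is yielded as it appears.
    """
    dec = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
    pos = offset
    end = offset + size
    while pos < end and not dec.eof:
        chunk = read(pos, min(chunk_size, end - pos))
        pos += len(chunk)
        out = dec.decompress(chunk)
        if out:
            yield out
```

Only one chunk of compressed input is held at a time, which is the property we are after. The decompressor still keeps its own dictionary state, so the saving is on the input buffer, not on everything.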

It’s getting more fascinating to work on the project by the day! Until the next post, signing off.

P.S. Thanks Aaron for helping out with any and every issue I face, and always finding the time to reply, even on Sundays! 😀

GSoC 2014 [cbfs_media] Updates

This past week went into looking at the internal working of the cbfs_media interface. Some of the major observations were:

Locations of map() and read() calls
No read() calls at all. Also for the map() calls that were made, there weren’t any unmap() calls.

Size of mappings
The entire cbfs is pulled into the iram. There is a map call which puts about 28KB into the sram to load romstage. The a10 has an sram of 32KB, hence we are using up most of the available RAM.
The sequence followed is open() -> map()’s -> close().

Total Resources
This is just the sum of all the mappings, since there are no unmaps to subtract. It gives a benchmark to work from. Resource utilization is now calculated automatically each time coreboot loads, so progress can be tracked.
Now we are giving some thought to how to reduce the size of the mappings, one possibility being to define a limit (bound) on their size. Currently the size is determined dynamically, and hence some mappings are quite large. If we define a bound on it, and then repeatedly call ‘smaller’ map()s instead of one big one, that could do the job. But this won’t always work, as the decompression algorithm (LZMA) expects memory-mapped access to the entire compressed buffer. By the end of this week, we hope to find a workaround for this and get a more resource-efficient cbfs interface.
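A toy model makes the effect of a bound easy to see. Everything in the sketch below is hypothetical: MappingTracker and bounded_maps() are invented for illustration and only mimic the bookkeeping described above (total mapped, currently mapped, peak), with a bytes object standing in for the flash.

```python
class MappingTracker:
    """Toy model of a media with map()/unmap() accounting."""

    def __init__(self, media):
        self.media = media
        self.live = 0    # bytes currently mapped
        self.peak = 0    # largest value 'live' ever reached
        self.total = 0   # sum of all mappings ever made

    def map(self, offset, size):
        self.live += size
        self.total += size
        self.peak = max(self.peak, self.live)
        return self.media[offset:offset + size]

    def unmap(self, size):
        self.live -= size


def bounded_maps(tracker, offset, size, bound):
    """Walk a region with mappings of at most 'bound' bytes each."""
    pos = offset
    end = offset + size
    while pos < end:
        n = min(bound, end - pos)
        window = tracker.map(pos, n)
        yield window
        tracker.unmap(n)  # released once the consumer is done with the window
        pos += n
```

Walking a 28KB region with a 4KB bound keeps the peak at 4KB while the total stays at 28KB, whereas a single map() without unmap() leaves all 28KB live. The sketch also shows why the bound alone is not enough: the consumer must be able to work on one window at a time, which LZMA decompression over a single buffer cannot.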