[GSoC] Ghidra firmware utilities, weeks 6-8

Hello everyone. It’s been a few weeks since I wrote my last blog post, and during that time I’ve been working on the FS loader for UEFI firmware images. This FS loader aims to implement functionality similar to UEFITool in Ghidra.

As described in the previous blog post, Intel platforms divide the flash chip into several regions, including the BIOS region. On UEFI systems, the BIOS region is used to store UEFI firmware components, which are organized in a hierarchy. This hierarchy begins with UEFI firmware volumes, which consist of FFS (firmware file system) files. In turn, these FFS files can contain multiple sections. Firmware volumes can also be nested within FFS files. This helpful reference by Trammell Hudson as well as this presentation from OpenSecurityTraining have some additional information regarding UEFI firmware volumes.

For example, a UEFI firmware implementation could have a firmware volume specifically for the Driver eXecution Environment (DXE phase). Stored as FFS files, DXE drivers within the firmware volume could consist of a PE32 section to store the actual driver binary, as well as a UI section to store the name of the driver.

So far, I’ve implemented basic firmware volume parsing in the FS loader; I’ve pushed this to the GitHub repository. Currently, this doesn’t handle FFS file or section parsing.
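As a rough illustration of what basic firmware volume parsing involves, here is a minimal Python sketch. It is not the loader's actual code. The field layout follows the fixed part of EFI_FIRMWARE_VOLUME_HEADER from the UEFI PI specification; the function name and the sanity checks are assumptions made for this example. It only finds volumes stored uncompressed in the image, so volumes nested inside compressed sections would be missed.

```python
import struct
import uuid

# Fixed part of EFI_FIRMWARE_VOLUME_HEADER (56 bytes, little-endian):
# ZeroVector[16], FileSystemGuid[16], FvLength (u64), Signature ("_FVH"),
# Attributes (u32), HeaderLength (u16), Checksum (u16),
# ExtHeaderOffset (u16), Reserved (u8), Revision (u8)
FV_HEADER = struct.Struct("<16s16sQ4sIHHHBB")
FV_SIGNATURE = b"_FVH"
SIGNATURE_OFFSET = 40  # offset of Signature within the header


def find_firmware_volumes(image):
    """Return (offset, filesystem GUID, length) for each plausible volume header."""
    volumes = []
    pos = image.find(FV_SIGNATURE)
    while pos != -1:
        start = pos - SIGNATURE_OFFSET
        if start >= 0 and start + FV_HEADER.size <= len(image):
            (_zero, guid, fv_length, _sig, _attrs, header_length,
             _checksum, _ext, _reserved, _rev) = FV_HEADER.unpack_from(image, start)
            # Hypothetical sanity checks: the header must be at least as long
            # as its fixed part, and the volume must fit inside the image.
            if header_length >= FV_HEADER.size and start + fv_length <= len(image):
                volumes.append((start, uuid.UUID(bytes_le=guid), fv_length))
        pos = image.find(FV_SIGNATURE, pos + 1)
    return volumes
```

A real parser would go on to verify the header checksum and walk the FFS files inside each volume.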

FFS file and section parsing is still a work-in-progress, but here’s a preview:

This is mostly complete, but there are still some nasty bugs related to FFS alignment that I’m working on fixing. My focus for this week is to finish up this FS loader.

Update (2019-07-19)

I have committed support for UEFI FFS file/section parsing in the GitHub repo. Please open an issue report if you encounter any issues with it (such as missing files/sections that UEFITool or other tools parse without issues).

[GSoC] Ghidra firmware utilities, week 5

Hi everyone. As stated in my previous blog post, I have been working on an FS loader for Intel Flash Descriptor (IFD) images. The IFD is used on Intel x86 platforms to define various regions in the SPI flash. These may include the Intel ME firmware region, BIOS region, Gigabit Ethernet firmware region, etc. The IFD also defines read/write permissions for each flash region, and it may also contain various configurable chipset parameters (PCH straps). Additional information about the firmware descriptor can be found in this helpful post by plutomaniac on the Win-Raid forum, as well as these slides from Open Security Training.

For a filesystem loader, the flash regions are exposed as files. FLMAP0 in the descriptor map and the component/region sections are parsed to determine the base and limit addresses for each region; both IFD v1 and v2 (used since Skylake) are supported. Ghidra supports nested filesystem loaders, so the FMAP and CBFS loaders that I’ve previously written can be used for parsing the BIOS region.
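To make the base/limit computation concrete, here is a small Python sketch of how a flash region register (FLREG) might be decoded. It is not taken from the loader. The bit masks follow the layout used by coreboot's ifdtool as I understand it (12-bit fields for IFD v1, 15-bit fields for v2, both in 4 KiB units), and the function names are made up for this example.

```python
def frba_from_flmap0(flmap0):
    """Flash Region Base Address: bits 23:16 of FLMAP0, in 16-byte units."""
    return ((flmap0 >> 16) & 0xFF) << 4


def decode_flreg(flreg, ifd_version=1):
    """Decode one FLREG value into (base, limit) byte offsets.

    Returns None when base > limit, which is treated here as an unused region.
    """
    # Assumed field widths: 12 bits for IFD v1, 15 bits for IFD v2.
    field_mask = 0x7FFF if ifd_version >= 2 else 0xFFF
    base = (flreg & field_mask) << 12
    limit = (((flreg >> 16) & field_mask) << 12) | 0xFFF
    if base > limit:
        return None
    return base, limit
```

For example, decode_flreg(0x07FF0300) describes a region spanning 0x300000 through 0x7FFFFF, which a filesystem loader could then expose as a single file.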

If you encounter any issues with the IFD FS loader, please feel free to submit an issue report in the GitHub repository.

Plans for this week

I have started working on a filesystem loader for UEFI firmware volumes. In conjunction with the IFD loader, this will allow UEFI firmware images to be imported for analysis in Ghidra (behaving somewhat similarly to the excellent UEFITool).

[GSoC] Ghidra firmware utilities, week 3

Last week, I finalized my work on the PCI option ROM loader, which was the first part described in my initial proposal for this project. This consists of a filesystem loader for hybrid/UEFI option ROMs and a binary loader for x86 option ROMs.

Background information on PCI option ROMs

Option ROMs may contain more than one executable image; for example, a graphics card may have a legacy x86 option ROM for VGA BIOS support as well as a UEFI option ROM to support the UEFI Graphics Output Protocol. x86 option ROMs are raw 16-bit binaries. The entry point is stored as a short JMP instruction in the option ROM header; the BIOS will execute this instruction to jump to the entry point. In contrast, UEFI images contain a UEFI driver, which is a PE32+ binary. This binary can be (and frequently is) compressed with the EFI compression algorithm, which is a combination of Huffman encoding and the LZ77 algorithm.

Filesystem loader

The filesystem loader allows hybrid/UEFI option ROMs to be imported. It also transparently handles the extraction of compressed UEFI executables.

Initially, I attempted to write a Java implementation of the EFI Compression Algorithm for use in the FS loader, but ran into several issues when handling the decompression of certain blocks. I eventually decided to reuse the existing C decompression implementation in EDK2, and wrote a Java Native Interface (JNI) wrapper to call the functions in the C library.

With the FS loader, UEFI drivers in option ROMs can be imported for analysis with Ghidra’s native PE32+ loader.

x86 option ROM binary loader

This loader allows x86 option ROMs to be imported for analysis. Various PCI structures are automatically defined, and the entry function is resolved by decoding the JMP instruction in the option ROM header.

PCI option ROM header data type
PCI data structure data type
Disassembled entry point
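The entry point resolution described above can be sketched in a few lines of Python. This is an illustration rather than the loader's code. It assumes the legacy header layout (0x55 0xAA signature at offset 0, image size at offset 2, JMP instruction at offset 3) and handles only the two relative JMP encodings; the function name is hypothetical.

```python
import struct

ENTRY_OFFSET = 3  # the JMP instruction follows the signature and size byte


def option_rom_entry_point(rom):
    """Return the offset the header JMP instruction branches to."""
    if rom[0:2] != b"\x55\xaa":
        raise ValueError("missing 0x55 0xAA option ROM signature")
    opcode = rom[ENTRY_OFFSET]
    if opcode == 0xEB:    # JMP rel8: 2-byte instruction, signed 8-bit offset
        (rel,) = struct.unpack_from("<b", rom, ENTRY_OFFSET + 1)
        next_ip = ENTRY_OFFSET + 2
    elif opcode == 0xE9:  # JMP rel16: 3-byte instruction, signed 16-bit offset
        (rel,) = struct.unpack_from("<h", rom, ENTRY_OFFSET + 1)
        next_ip = ENTRY_OFFSET + 3
    else:
        raise ValueError("unexpected opcode 0x%02x at entry point" % opcode)
    # The offset is relative to the next instruction; wrap to 16 bits.
    return (next_ip + rel) & 0xFFFF
```

With a header starting 55 AA 40 EB 4B, the function returns 0x50, which is where disassembly would begin.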

Plans for this week

I’ve started to work on a filesystem loader for FMAP/CBFS (used by coreboot firmware images). After that, I plan on working on additional FS loaders for Intel flash images (IFD parsing) and UEFI firmware volumes.

As usual, the source code is available in my GitHub repository. Installation and usage instructions are included in the README; feel free to open an issue report if anything goes awry.

[GSoC] Ghidra firmware utilities, weeks 1-2

Hi everyone. I’m Alex James (theracermaster on IRC) and I’m working on developing modules for Ghidra to assist with firmware reverse engineering as a part of GSoC 2019. Martin Roth and Raul Rangel are my mentors for this project; I would like to thank them for their support thus far.

Ghidra is an open-source software reverse engineering suite developed by the NSA, offering similar functionality to existing tools such as IDA Pro. My GSoC project aims to augment its functionality for firmware RE. This project will consist of three parts: a loader for PCI option ROMs, a loader for firmware images, and various scripts to assist with UEFI binary reverse engineering (importing common types, GUIDs, etc).

The source code for this project is available here.

Week 1

During my first week, I started implementing the filesystem loader for PCI option ROMs. This allows option ROMs (and their enclosed images) to be loaded into Ghidra for analysis. So far, option ROMs containing uncompressed UEFI binaries can be successfully loaded as PE32+ executables in Ghidra. The loader also calculates the entry point address for legacy x86 option ROMs.

Plans for this week

So far this week, I’ve worked on writing a simple JNI wrapper for the reference C implementation of the EFI decompressor from EDK2, and have used this to add support for compressed EFI images to the option ROM FS loader. Additionally, I plan on making further improvements to the option ROM loader for legacy option ROMs; while the entry point address is properly calculated, they still have to be manually imported as a raw binary.

Update: coreboot conference in Europe, October 2015

UPDATE: Invitations published, venue is decided, a few bed+breakfast rooms at the venue are still available

TL;DR: coreboot conference Oct 9-11, more info at http://coreboot.org/Coreboot_conference_Bonn_2015

 

Dear coreboot developers, users and interested parties,

we are currently trying to organize a coreboot conference and developer meeting in October 2015 in Germany.

This is not intended to be a pure developer meeting, we also hope to reach out to manufacturers of processors, chipsets, mainboards and servers/laptops/tablets/desktops with an interest in coreboot and the possibilities it offers.

My plan (which is not final yet) is to have the Federal Office for Information Security (BSI) in Germany host the conference in Bonn, Germany. As a national cyber security authority, the goal of the BSI is to promote IT security in Germany. For this reason, the BSI has funded coreboot development in the past.

The preliminary plans are to coordinate the exact date of the conference to be before or after Embedded Linux Conference Europe, scheduled for October 5-7 in Dublin, Ireland. Planned duration is 3 days. This means we can either use the time window from Thursday Oct 1 to Sunday Oct 4, or from Thursday Oct 8 to Monday Oct 12. The former has the advantage of having cheaper hotel rooms available in Bonn, while the latter has the advantage of avoiding Oct 3, a national holiday in Germany (all shops closed). UPDATE: Preliminary dates are Friday Oct 9 to Sunday Oct 11. The doodle has been updated accordingly. Thursday and Monday could be filled with some cultural attractions if desired.

ATTENTION vendors/manufacturers: If your main interest is forging business relationships and/or strategic coordination and you want to skip the technical workshops and soldering, we’ll try to make sure there is one outreach day of talks, presentations and discussions on a regular business day. Please indicate that with “(strategic)” next to your name in the doodle linked below.

If you wonder about how to reach Bonn, there are three options available by plane:
The closest is Cologne Airport (CGN), 30 minutes by bus to Bonn main station.
Next is Düsseldorf Airport (DUS), 1 hour by train to Bonn main station.
The airport with most international destinations is Frankfurt Airport (FRA), 2.5 hours by train to Bonn main station.
There’s the option to travel by train as well. Bonn is reachable by high-speed train (ICE), and other high-speed train stations are reasonably close (30 minutes).

What I’m looking for right now is a rough show of hands of who’d like to attend so I can book a conference venue. I’d also like feedback on which weekend would be preferable for you. If you have any questions, please feel free to ask me directly <c-d.hailfinger.devel.2006@gmx.net> or our mailing list <coreboot@coreboot.org>.

Please enter your participation abilities in the doodle below:
http://doodle.com/bw52xs4fc7pxte6d

Regards,
Carl-Daniel Hailfinger

Reverse engineering blobs: adding diff to the toolkit

Last time I talked about the benefits of using sed to transform repetitive low-level patterns into meaningful function calls. And still, doing all that regex magic did not get us a fully working replay. A great portion of the hardware initialization flow is based on situational awareness. What hardware is connected? What are our capabilities? What if …?

That means a simple sequence of writes is, in most cases, not sufficient. We may need to modify registers, wait on other hardware, or respond differently to hardware states. While that seems daunting and tedious, it gives us an unexpected advantage: that every execution of the blob produces a different trace.

This is where diff comes in. By getting a bunch of traces and diffing them, we can see the points where the firmware makes different decisions, and the states which determine those decisions. It won’t tell us what condition triggers path A or path B, but it allows us to infer that by comparing the hardware states. Let’s have a look:

@@ -305,28 +305,28 @@ void run_replay(void)
 radeon_read_sync(0x6430); /* 04040101 */
 radeon_write_sync(0x6430, 0x04000101);
 radeon_write_sync(0x3f50, 0x00000000);
- radeon_read(0x3f54); /* 000dda12 */
- inl(0x2004); /* 000dda8a */
- inl(0x2004); /* 000ddb1a */
- inl(0x2004); /* 000ddb9c */
- ...
- inl(0x2004); /* 000de3a0 */
- inl(0x2004); /* 000de41a */
+ radeon_read(0x3f54); /* 000d9efb */
+ inl(0x2004); /* 000d9f73 */
+ inl(0x2004); /* 000da003 */
+ inl(0x2004); /* 000da081 */
+ ...
+ inl(0x2004); /* 000da883 */
+ inl(0x2004); /* 000da8fd */
 radeon_read_sync(0x611c); /* 00000000 */
 radeon_write_sync(0x611c, 0x00000002);
 radeon_write_sync(0x6ccc, 0x00007fff);

I’ve chosen an example that shows similarities rather than differences, as I find this to be a more interesting case. Since we’ve already established that 0x2004 is our data port, as long as we don’t touch the index port, we’ll be reading from the same register, in this case 0x3f54.

Now the values returned by this register are completely different in every trace, yet the behavior of the blob is strikingly similar every time. The first key observation is that this register increases monotonically in every trace. It also increases by roughly the same amount on successive reads. The differences between the last read and first read of the register are also strikingly similar in both traces: 0xa08 and 0xa02 respectively.

This register looks to be a monotonic timer, and the loop has all the elements of a delay loop. To determine the actual delay, we could try to extract absolute timing information when collecting the trace; however, in this specific case, I had the AtomBIOS tables handy. By comparing register accesses around this loop, I was able to figure out where in the tables this delay is occurring:

 0200: 0d250c1901 OR reg[190c] [...X] <- 01
 0205: 54300c19 CLEAR reg[190c] [.X..]
 0209: 5132 DELAY_MicroSec 32

The ’32’ in the delay is a hex number, so the requested delay is 0x32 = 50 microseconds. Doing a bit of hex math (0xa08 = 2568 ticks over 50 microseconds), we see we’re waiting about 51 ticks per microsecond. Comparing more loops, we get between 50 and 52 ticks per microsecond. Since a delay loop normally waits until the minimum time has elapsed, we now have a very convincing case that register 0x3f54 implements a 50MHz monotonic timer. Every time before accessing this register, we also poke register 0x3f50. That looks very much like the timer control register.
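The arithmetic is easy to check with a few lines of Python. The register values are the first and last timer reads from the two traces in the diff above; the trace names and the helper function are just labels for this sketch.

```python
DELAY_US = 0x32  # DELAY_MicroSec operand from the AtomBIOS table (50 us)

# (first read, last read) of the timer register in each trace
traces = {
    "trace A": (0x000dda12, 0x000de41a),
    "trace B": (0x000d9efb, 0x000da8fd),
}


def ticks_per_microsecond(first, last, delay_us=DELAY_US):
    """Timer ticks elapsed during the loop, divided by the requested delay."""
    return (last - first) / delay_us


for name, (first, last) in traces.items():
    rate = ticks_per_microsecond(first, last)
    print(f"{name}: {hex(last - first)} ticks, {rate:.2f} ticks/us")
# trace A: 0xa08 ticks, 51.36 ticks/us
# trace B: 0xa02 ticks, 51.24 ticks/us
```

Both rates sit slightly above 50, which fits a loop that polls until at least the minimum time has elapsed on a 50MHz timer.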

We now extend our sed script with:

timerctl=0x3f50
timer=0x3f54
...
sed "s/radeon_read($timer);[^$hex\r]*\([$hex]*\)[^\r]*\(\r\tinl($dport);[^$hex\r]*\([$hex]*\)[^\r]*\)*/radeon_delay(0x\3 - 0x\1);/g" |
sed "s/radeon_write_sync($timerctl, 0x0\{1,8\});[^\r]*\r\tradeon_delay(/radeon_delay(/g"

Now when we rerun our logs through the script, the results decrease in size from 20K lines to 13K lines. The diffs between processed logs also decrease in size significantly. All the more proof we were right!
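For readers who find the sed expressions hard to follow, the same transformation can be sketched in Python. This is an approximate equivalent and not the script used for the results above. It assumes one trace statement per line, with the comment format shown in the diff, and the function name is made up. Unlike the sed version, a timer read with no following polls becomes a zero-length delay.

```python
import re

TIMERCTL = 0x3f50
TIMER = 0x3f54
DPORT = 0x2004

# First timer read through the index/data pair, with the value in a comment
READ_RE = re.compile(r"radeon_read\(0x%x\); /\* ([0-9a-f]+) \*/" % TIMER)
# Subsequent polls of the same register through the data port
POLL_RE = re.compile(r"inl\(0x%x\); /\* ([0-9a-f]+) \*/" % DPORT)
# Write of zero to the timer control register just before a delay loop
CTL_RE = re.compile(r"radeon_write_sync\(0x%x, 0x0+\);" % TIMERCTL)


def collapse_delays(lines):
    """Replace each timer polling loop with a single radeon_delay() call."""
    out = []
    i = 0
    while i < len(lines):
        match = READ_RE.search(lines[i])
        if not match:
            out.append(lines[i])
            i += 1
            continue
        first = last = match.group(1)
        i += 1
        # Swallow every poll that follows, remembering the last value read
        while i < len(lines) and (poll := POLL_RE.search(lines[i])):
            last = poll.group(1)
            i += 1
        # Drop the timer control write that precedes the loop, if present
        if out and CTL_RE.search(out[-1]):
            out.pop()
        out.append("\tradeon_delay(0x%s - 0x%s);" % (last, first))
    return out
```

Run on the first trace from the diff, the whole polling loop and the preceding write to 0x3f50 collapse into the single line radeon_delay(0x000de41a - 0x000dda12);.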

There’s another way in which diff is excellent for our purpose. We can implement our helpers to generate the same output as the processed logs. That allows us to poke the replay from userspace, yet get the same output format. Now we can diff the replay and original log, and observe how the hardware state changes. We can even go as far as implementing our delay with usleep() instead of the timer at 0x3f54. When the diff is independent of the delay method we use, we have another strong proof our assumptions are true. This is the case here.

‘diff’ is an extremely powerful tool. Despite its name, it can show similarities just as well as differences. While regular expressions exhaust their usability with simple patterns, diff can take us a lot further. Now that we’ve cracked the delay implementation of the blob, we can more easily see delay and wait loops, again using diff. Complex, multivariable patterns are too awkward to handle with sed. I’ll go over those some other time. However, once such patterns are simplified to a function call, diff can once again show the story. Different GPU model? diff. New display? diff. HDMI connected? diff. It’s almost as versatile as det cord.

cbfs_media [Week 1]

This post covers the complete set-up and build of coreboot for the cubieboard. Due to the lack of documentation for this, I had to spend some time figuring out the details; hence I decided to write it up myself to help others in the future.

1. Step 1: Build Payload

As mentioned here, what we have to do first is to build a payload to use later for coreboot.rom. A suitable ARM payload is the sunxi/uboot. Now, there are two ways to build uboot: natively or from another system. To build from another system, we need to get a suitable toolchain. For that you need to do:

apt-get install gcc-arm-linux-gnueabihf

I ran into too many issues trying to get this toolchain set up right 😐 A more suitable and convenient method is to build uboot natively on the cubieboard itself (thanks #cubieboard for the tip :P). For this, follow the instructions here. tl;dr Clone the repository, choose your target board, make (without CROSS_COMPILE). This completes building the payload. NOTE: The correct file to use as the payload is “u-boot”, not u-boot.bin. The “u-boot” file is the non-SPL part of uboot in ELF format. The log for a successful build can be seen here.

file u-boot
u-boot: ELF 32-bit LSB shared object, ARM, version 1 (SYSV), dynamically linked (uses shared libs), not stripped

2. Step 2: Build coreboot

Before following the instructions on the coreboot/Build_HOWTO, you first need the latest development code, which contains the MMC driver needed to load romstage, etc.

You need to make crossgcc first. This might take a lot of time: be patient 😛 Some missing toolchain errors can arise. Get past them by:

apt-get install bison flex patch
add-apt-repository ppa:linaro-maintainers/toolchain

Once this is done, set a suitable configuration in make menuconfig. Make sure to disable CONFIG_VGA_ROM_RUN (set by default), since it doesn’t work for ARM boards. Just make and wait. Your coreboot.rom is ready. 🙂

This image needs to be placed on the SD card:

dd if=build/BOOT0 of=/path/to/sdcard/blockdev bs=1024 seek=8

Now pfff! this is enough to get coreboot up and running! 😀

For next week, our plan is to identify the locations of the map() and read() calls, and to determine the size of each map(). The driver is currently configured to pull the entire CBFS into RAM, so we will work to reduce the size of these mappings.

GSoC [early debugging] The very short introduction

Oh dear, what did I get into again. My GSoC 2014 project page gives you an idea of things to expect during this summer of code, and it seems like I have promised to deliver a lot this time. More like a complete in-circuit-debugging solution of x86 boot firmware over some readily available and low-cost USB hardware. I only scratched the surface with my GSoC 2013 when I did not get much further than a working usbdebug and some intense clean-up and preparation on the CBMEM side.

There has been serious use of usbdebug combined with SerialICE to troubleshoot and/or reverse-engineer proprietary firmwares. Tests have been done to connect the GDB stub built into coreboot over BeagleBone even before the debug target has initialized RAM. Other pieces of my project plan have also already seen proof-of-concepts, but the quality or the flexibility has not reached the level required to see them in widespread use in the coreboot community.

We can see more practical uses for SerialICE if we could connect QEMU, GDB, SerialICE and radare together, and visualize some of the system bus topologies at runtime. With the amount of support CPU and chipset vendors have shown towards open-source firmware development in recent years, I consider this a key part for any further community-driven mainboard ports on coreboot.

GSoC (coreboot): Test interface board complete

Apologies for the late update. The design that I posted in the last post was more challenging than I had thought. However I’m happy to announce that my test hardware that I call ‘coreboot test interface board’ (TIB) is now complete. Only some of the software interface part is remaining in the project. So let me share with you a very quick update of last month.

A brief progress sheet

Last week, I laid out a list of things to do in order to get more of the protocol finalized. Most of the items are crossed off; however, one little item remains standing. This item, while innocent and seemingly harmless, is more painful than falling on your buttocks from a 10-story building onto solid granite. Let’s have a look at why this small item is of such significance.

  • new API call set_chip_size()

When people think of QiProg, they think of one gadget with one flash chip connected. This is the common case, and, for the foreseeable future, will be the de facto way of using QiProg. However, the original USB specification was intended for a broader use case: a programmer with several, individually addressable chips connected. One who observes the qiprog_read_chip_id() call will notice that it translates to a READ_CHIP_ID request over USB. This request will return identification data for up to nine chips. Aye, there’s the rub.

How does this play into set_chip_size()? Simple: set_chip_size for which chip? Do we send a flat list of nine uint32_t sizes, thus only needing one round-trip (control request) for all chips? Do we use the wIndex field of the round-trip, at the cost of needing one such trip for each chip? Once this question is answered, it will determine the answer for the set_[erase/write]_[size/command] calls and their respective USB round-trips, thus completing the USB protocol, and bringing QiProg to a usable state.

It’s easy to see why this one little detail is a blocker for all other remaining issues. I am leaning towards the use of wIndex (not the glass cleaner). Implementing a new control request in software and firmware is a matter of minutes. Testing it, and making sure it works properly is, at most, a two hour endeavor. Getting the design right: priceless.
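To make the trade-off concrete, here is a Python sketch of what the two candidate encodings could look like on the wire. Only the 8-byte setup packet layout (bmRequestType, bRequest, wValue, wIndex, wLength) is standard USB. The request code, the placement of the size in the data stage, and the function names are assumptions invented for this illustration, not part of the QiProg specification.

```python
import struct

SETUP = struct.Struct("<BBHHH")  # bmRequestType, bRequest, wValue, wIndex, wLength
VENDOR_OUT = 0x40                # host-to-device, vendor request, device recipient
REQ_SET_CHIP_SIZE = 0x30         # hypothetical request code
MAX_CHIPS = 9


def flat_list_request(sizes):
    """Option 1: one control request carrying all nine uint32_t sizes."""
    if len(sizes) != MAX_CHIPS:
        raise ValueError("expected exactly %d sizes" % MAX_CHIPS)
    data = struct.pack("<%dI" % MAX_CHIPS, *sizes)
    setup = SETUP.pack(VENDOR_OUT, REQ_SET_CHIP_SIZE, 0, 0, len(data))
    return [(setup, data)]


def per_chip_requests(sizes):
    """Option 2: one control request per chip, chip number in wIndex."""
    requests = []
    for chip_index, size in enumerate(sizes):
        data = struct.pack("<I", size)
        setup = SETUP.pack(VENDOR_OUT, REQ_SET_CHIP_SIZE, 0, chip_index, len(data))
        requests.append((setup, data))
    return requests
```

The flat list costs one round-trip with a 36-byte payload, while the wIndex variant costs up to nine round-trips of 4 bytes each but lets the host address a single chip without touching the others.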