[GSoC] Ghidra firmware utilities, wrap-up

Hi everyone. The official programming period for GSoC 2019 is now over, and it’s time for final evaluations. I will use this post to summarize what I’ve worked on this summer, as well as how to use the Ghidra plugin.

The project is available on GitHub: https://github.com/al3xtjames/ghidra-firmware-utils

Project details

In my initial project proposal, I planned on writing various filesystem loaders (for hybrid PCI option ROMs, Intel flash descriptor images, coreboot File System images, and UEFI firmware volumes), a binary loader for legacy x86 PCI option ROMs, and a UEFI helper script. I ended up implementing all of these in the Ghidra plugin, and also worked on a UEFI Terse Executable binary loader. You can look at my previous blogposts to see my progress throughout the summer.

Here is a description of the components included in the project:

FS loaders allow files stored within binary images to be imported directly into Ghidra. The following FS loaders are implemented in this project:

Hybrid PCI option ROM

Some PCI option ROMs may contain multiple executable ROMs. This is usually used to support multiple firmware types (e.g. a video card with legacy BIOS VGA support and UEFI Graphics Output Protocol support). The FS loader allows each embedded executable ROM image to be imported.
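As a rough illustration of what the FS loader has to do, here is a minimal Python sketch (not the plugin's Java code) that walks the images in an option ROM. It relies on the standard PCI option ROM layout: a 0x55 0xAA signature, a 16-bit pointer to the PCI data structure at offset 0x18, and an image length, code type, and last-image indicator inside that structure. The function name `list_option_rom_images` is made up for this example.

```python
import struct

def list_option_rom_images(rom: bytes):
    """Return (offset, length_in_bytes, code_type) for each image in a PCI option ROM."""
    images = []
    offset = 0
    while offset + 0x1A <= len(rom):
        # Every image starts with the 0x55 0xAA ROM signature.
        if rom[offset:offset + 2] != b"\x55\xaa":
            break
        # Offset 0x18 holds a 16-bit pointer to the PCI data structure ("PCIR").
        (pcir_ptr,) = struct.unpack_from("<H", rom, offset + 0x18)
        pcir = offset + pcir_ptr
        if pcir + 0x16 > len(rom) or rom[pcir:pcir + 4] != b"PCIR":
            break
        # The image length is stored in 512-byte units.
        (image_units,) = struct.unpack_from("<H", rom, pcir + 0x10)
        code_type = rom[pcir + 0x14]   # 0 = legacy x86, 3 = EFI
        indicator = rom[pcir + 0x15]   # bit 7 set = last image in the ROM
        images.append((offset, image_units * 512, code_type))
        if indicator & 0x80 or image_units == 0:
            break
        offset += image_units * 512
    return images
```

A hybrid video card ROM would typically yield one entry with code type 0 and one with code type 3.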

Intel flash descriptor (IFD)

Recent Intel platforms have multiple regions on the SPI flash (used to store system firmware). The descriptor region describes the layout of these flash regions. The FS loader allows each flash region to be imported. Ghidra supports nested FS loaders, so other FS loaders (FMAP/CBFS or UEFI FV) can be used to parse certain regions, such as the BIOS region.

Flash Map (FMAP)

This is another standard for describing flash regions, used by coreboot and various Google devices. Like the IFD FS loader, this allows each defined flash region to be imported, and it can be used with other FS loaders (e.g. the COREBOOT region can be parsed with the CBFS loader).
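To make the format concrete, here is a small Python sketch (again, not the plugin's code) that locates an FMAP header and lists its areas. It assumes the usual FMAP layout: an `__FMAP__` signature, version bytes, a 64-bit base, a 32-bit size, a 32-byte name, and an area count, followed by 42-byte area records. The names `parse_fmap`, `FMAP_HEADER`, and `FMAP_AREA` are chosen for this example only.

```python
import struct

FMAP_SIGNATURE = b"__FMAP__"
# signature, ver_major, ver_minor, base, size, name, nareas (little-endian, packed)
FMAP_HEADER = struct.Struct("<8sBBQI32sH")
# offset, size, name, flags
FMAP_AREA = struct.Struct("<II32sH")

def parse_fmap(image: bytes):
    """Return a list of (area_name, offset, size) tuples from the first FMAP found."""
    start = image.find(FMAP_SIGNATURE)
    if start < 0:
        raise ValueError("no FMAP signature found")
    header = FMAP_HEADER.unpack_from(image, start)
    area_count = header[6]
    areas = []
    pos = start + FMAP_HEADER.size
    for _ in range(area_count):
        offset, size, raw_name, _flags = FMAP_AREA.unpack_from(image, pos)
        # Area names are NUL-padded ASCII strings.
        name = raw_name.split(b"\x00", 1)[0].decode("ascii")
        areas.append((name, offset, size))
        pos += FMAP_AREA.size
    return areas
```

Each returned tuple corresponds to one "file" that an FS loader could expose, such as the COREBOOT area.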

coreboot File System (CBFS)

coreboot uses a simple file system to store independent binaries and data files. The CBFS loader can be used to import each CBFS file for analysis; for example, PCI option ROMs stored as CBFS files can be imported. Optional CBFS file compression (LZ4/LZMA) is supported.
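For a sense of how simple the format is, the following Python sketch reads a single CBFS file entry. It assumes the common CBFS file header layout (an `LARCHIVE` magic followed by four big-endian 32-bit fields: length, type, attributes offset, and data offset, then a NUL-terminated file name). The helper name `parse_cbfs_file` is hypothetical, and decompression of LZ4/LZMA payloads is deliberately left out.

```python
import struct

CBFS_FILE_MAGIC = b"LARCHIVE"
# magic, len, type, attributes_offset, offset (all integers big-endian)
CBFS_FILE_HEADER = struct.Struct(">8sIIII")

def parse_cbfs_file(region: bytes, start: int):
    """Return (name, type, raw_data) for the CBFS file header at `start`."""
    magic, length, file_type, _attrs, data_offset = CBFS_FILE_HEADER.unpack_from(region, start)
    if magic != CBFS_FILE_MAGIC:
        raise ValueError("not a CBFS file header")
    # The file name sits between the fixed header and the data.
    name_start = start + CBFS_FILE_HEADER.size
    name = region[name_start:start + data_offset].split(b"\x00", 1)[0].decode("ascii")
    # The data offset is relative to the start of the file header.
    data = region[start + data_offset:start + data_offset + length]
    return name, file_type, data
```

The returned data is the stored payload as-is; a compressed file would still need to be decompressed before analysis.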

UEFI firmware volume (FV)/firmware file system (FFS)

UEFI firmware images use firmware volumes for storing firmware files, which may consist of multiple sections. The UEFI FV FS loader allows UEFI firmware volumes to be imported, including embedded firmware files/sections.

This project also implements a couple of binary loaders:

Legacy x86 option ROM

PCI option ROMs that target the x86 legacy BIOS contain a raw 16-bit executable image. They also have additional header fields, including a field with the entry point instruction. The binary loader resolves the entry point and specifies that 16-bit x86 disassembly should be used.

UEFI Terse Executable (TE)

UEFI binaries can use one of two executable formats: the Portable Executable (PE32) format (also used on Windows), and the Terse Executable (TE) format. Terse Executables are essentially simplified PE32 binaries – the numerous DOS/NT/optional headers are condensed into a single TE header, without any superfluous header fields. The binary loader resolves the entry point and defines memory blocks corresponding to the sections defined in the TE header.
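To show how compact the TE header is, here is a Python sketch that unpacks it. The field layout follows the 40-byte EFI_TE_IMAGE_HEADER structure (signature "VZ", machine, section count, subsystem, stripped size, entry point, base of code, image base, and two data directory entries). The function name and the returned dictionary keys are made up for this example; the real loader is written in Java against Ghidra's APIs.

```python
import struct

# Signature, Machine, NumberOfSections, Subsystem, StrippedSize,
# AddressOfEntryPoint, BaseOfCode, ImageBase, DataDirectory[2] (RVA + size each)
TE_HEADER = struct.Struct("<2sHBBHIIQ4I")

def parse_te_header(data: bytes):
    """Unpack a TE header and compute the file offset of the entry point."""
    (sig, machine, num_sections, subsystem, stripped_size,
     entry_rva, base_of_code, image_base, *_dirs) = TE_HEADER.unpack_from(data, 0)
    if sig != b"VZ":
        raise ValueError("not a Terse Executable")
    # Addresses in the header still refer to the original PE32 layout, so file
    # offsets must be adjusted by the number of stripped header bytes.
    adjust = stripped_size - TE_HEADER.size
    return {
        "machine": machine,
        "num_sections": num_sections,
        "subsystem": subsystem,
        "image_base": image_base,
        "entry_rva": entry_rva,
        "entry_file_offset": entry_rva - adjust,
    }
```

The section headers follow immediately after the 40-byte header and use the same layout as PE32 section headers.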

Finally, a helper script for assisting with the analysis of UEFI binaries is included. The UEFI helper script does the following:

  • Imports a UEFI data type library
  • Defines the entry point signature
  • Searches for known EFI GUIDs in the .data/.text segments
  • Attempts to locate global EFI table pointers (gST/gBS/gRT)
  • Attempts to perform propagation of some EFI types to called functions

Project usage

Instructions for how to build and use the Ghidra plugin are included in the project’s README, but I’ll restate them here.

Building the plugin

Like other Ghidra plugins (and Ghidra itself), this project uses Gradle as the build system. Set the GHIDRA_INSTALL_DIR environment variable (point it to your Ghidra installation directory) and run gradle to build the plugin. Install the generated ZIP (in the dist directory) by selecting File > Install Extensions in Ghidra, and then clicking the green plus icon.

Using the FS loaders

Load the specified input file into Ghidra (drag and drop or use File > Import File). Assuming the input file is supported by a FS loader, Ghidra should indicate that a container file was detected, and will allow you to batch import all enclosed files or view the file system.

Note that Ghidra does support parsing nested filesystems with multiple FS loaders. For example, UEFI firmware volumes in the BIOS region of an Intel firmware image can be parsed by first importing the Intel firmware image and then importing the BIOS region (select Import or Open File System in the right-click menu).

Using the UEFI helper script

After loading a UEFI executable (PE32 or TE), you can run the UEFI Helper script from the Script Manager window (under Window). Select UEFIHelper.java and click the green “Run Script” button.

Currently, the UEFI helper script assumes the entry point matches the standard driver/application signature (with EFI_HANDLE and EFI_SYSTEM_TABLE * parameters). SEC/PEI/SMM modules have different entry point parameters, which will have to be manually specified.

Future work

While my work for GSoC 2019 is complete, I think the following additions would be useful for this project (and UEFI reverse-engineering in general):

Processor module for disassembling EFI Byte Code (EBC)

EFI Byte Code is a byte code format used for platform-independent UEFI applications/drivers. Ghidra currently doesn’t support the EBC virtual machine architecture. Fortunately, it is possible to add support for an architecture by creating a SLEIGH processor specification.

Upstreamed Terse Executable loader

As previously described, TE binaries are very similar to PE binaries. Ghidra already has parsers for the data directory and section header structures, which are present in both PE and TE binaries. My TE loader had to reimplement these parsers, as the existing parsers depended on the NT header, which isn’t present in TE binaries. Removing the NT header dependency from the data directory/section header parsers would allow Ghidra’s existing parsers to be reused by the TE loader. This would also make it easier to upstream the TE loader.

Support for SEC/PEI/SMM modules (UEFI helper script)

Instead of assuming the entry point parameters, the script could prompt the user to select the module type, or somehow retrieve the module type from the FFS header (if the FS loader was used).

Additional GUID heuristics (UEFI helper script)

The script could locate calls to EFI_BOOT_SERVICES/EFI_RUNTIME_SERVICES functions with GUID parameters and automatically apply the EFI_GUID data type.

Protocol database (UEFI helper script)

Similar to the existing GUID->name database (imported from UEFITool), a database for mapping protocol GUIDs to the corresponding protocol structure names could be created. The script could use this database to automatically apply the correct protocol structure type in calls to LocateProtocol/etc.

Very basic dependency graph (inspired by this UEFITool issue) (UEFI helper script)

The script could locate all calls to protocol consumption/production functions in EFI_BOOT_SERVICES (such as LocateProtocol, InstallProtocolInterface, etc.) and use this to generate a basic overview of the protocols used by the current UEFI binary.

Acknowledgements

I would like to thank my mentors Martin Roth and Raul Rangel for their continued assistance during the past 12 weeks. This has been a great opportunity, and it certainly wouldn’t have been possible without their help. I look forward to contributing to coreboot and other related projects (including Ghidra) in the future.

[GSoC] Ghidra firmware utilities, week 10

As stated in last week’s blogpost, I have started working on the UEFI helper script (aptly named UEFIHelper). The aim of this script is to assist with reverse engineering UEFI binaries. Similar projects exist for IDA Pro, including ida-efiutils, ida-efitools, and EFISwissKnife.

Background information

UEFI executables are either PE32(+) or TE binaries. The signature of the entry point function depends on the module type; some examples are PEI modules, DXE drivers, and UEFI applications. DXE drivers and standard UEFI applications use the following entry point function:

EFI_STATUS
_ModuleEntryPoint (
  IN EFI_HANDLE        ImageHandle,
  IN EFI_SYSTEM_TABLE  *SystemTable
  );

Other types of modules (such as PEI modules and PEI/DXE core modules) may have different parameters and return types for the entry point function. Nevertheless, we’ll focus on the standard entry point for now. ImageHandle is a firmware-allocated handle for the current EFI application. SystemTable is a pointer to the EFI_SYSTEM_TABLE structure, which in turn has pointers to other EFI tables (such as EFI_BOOT_SERVICES and EFI_RUNTIME_SERVICES). These tables provide data structures and function pointers for standard UEFI functionality, such as getting/setting NVRAM variables, locating/installing UEFI protocols, loading additional UEFI images, rebooting the system, etc.

UEFI’s extensibility is largely implemented through the use of protocols. Protocols are data structures used to enable communication between different UEFI modules, and can be identified by a GUID. A simplified example could be a UEFI driver for a graphics card. It could support pre-boot graphics output by installing an implementation of the EFI_GRAPHICS_OUTPUT_PROTOCOL, which could then be located and used by other UEFI applications and drivers for graphics output.

UEFIHelper progress

MdePkg in EDK2 includes headers for core UEFI types and protocols. Given a parser configuration file, the C parser in Ghidra can be used to generate data type archives, which are used for storing type definitions in Ghidra. I used the MdePkg headers to generate UEFI data type archives for x86, x86_64, ARMv7, and ARMv8 (AArch64). UEFIHelper will automatically load the correct data type library for the current program’s architecture.

UEFIHelper will search for known GUIDs in the .data segment of the current UEFI program and apply the EFI_GUID type definition. UEFIHelper will also fix the entry point function signature to match the standard entry point for UEFI DXE drivers and applications.
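The GUID search can be sketched in a few lines of Python. This is only an illustration of the idea, not UEFIHelper itself: it scans a raw byte buffer rather than a Ghidra memory block, the one-entry `KNOWN_GUIDS` table stands in for the much larger database, and the 4-byte alignment is an assumption made to keep the scan cheap. EFI_GUID values are stored with their first three fields in little-endian order, which matches Python's `bytes_le` representation.

```python
import uuid

# A tiny stand-in for the real GUID database.
KNOWN_GUIDS = {
    uuid.UUID("9042a9de-23dc-4a38-96fb-7aded080516a"): "EFI_GRAPHICS_OUTPUT_PROTOCOL_GUID",
}

def find_guids(data: bytes, known=KNOWN_GUIDS, alignment=4):
    """Return (offset, name) for every known GUID found at an aligned offset."""
    # Index the table by the in-memory (mixed-endian) encoding of each GUID.
    by_bytes = {guid.bytes_le: name for guid, name in known.items()}
    hits = []
    for offset in range(0, len(data) - 15, alignment):
        name = by_bytes.get(data[offset:offset + 16])
        if name is not None:
            hits.append((offset, name))
    return hits
```

In a script, each hit would then be turned into an EFI_GUID data definition and a label at the matching address.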

UEFIHelper is a part of the ghidra-firmware-utils extension, and is available on GitHub as usual. I will continue to work on UEFIHelper during the next week.

[GSoC] Ghidra firmware utilities, week 9

Last week, I finished up my work on the UEFI firmware volume FS loader. This was the last FS loader I planned on writing for this project, so now it’s time to work on writing additional binary loaders and helper scripts to assist with UEFI reverse engineering. During the past couple of days, I’ve been working on a loader for Terse Executable (TE) binaries.

For the most part, UEFI binaries are standard PE32(+) executables. Standard headers such as the DOS stub, COFF header, and image headers are present. In order to reduce the size of binaries required for UEFI Platform Initialization, the TE binary format was created. The TE header only includes the fields needed for execution, dropping unnecessary fields such as the DOS stub. TE binaries are otherwise similar to PE32 binaries. EDK2 has additional documentation regarding the TE header.

Like the existing PE32 binary loader, the TE binary loader defines the program sections and defines the entry point function. It can be used in conjunction with the UEFI firmware volume FS loader to import TE image sections for analysis.

The TE binary loader is included in the latest commit in ghidra-firmware-utils. As always, feel free to submit an issue report if you encounter any problems with it.

Plans for this week

I have started working on the UEFI helper script. This script aims to assist with UEFI reverse engineering by loading UEFI type definitions, defining GUIDs, and fixing the entry point.

[GSoC] Ghidra firmware utilities, weeks 6-8

Hello everyone. It’s been a few weeks since I wrote my last blog post, and during that time I’ve been working on the FS loader for UEFI firmware images. This FS loader aims to implement functionality similar to UEFITool in Ghidra.

As described in the previous blog post, Intel platforms divide the flash chip into several regions, including the BIOS region. On UEFI systems, the BIOS region is used to store UEFI firmware components, which are organized in a hierarchy. This hierarchy begins with UEFI firmware volumes, which consist of FFS (firmware file system) files. In turn, these FFS files can contain multiple sections. Firmware volumes can also be nested within FFS files. This helpful reference by Trammell Hudson as well as this presentation from OpenSecurityTraining have some additional information regarding UEFI firmware volumes.

For example, a UEFI firmware implementation could have a firmware volume specifically for the Driver eXecution Environment (DXE phase). Stored as FFS files, DXE drivers within the firmware volume could consist of a PE32 section to store the actual driver binary, as well as a UI section to store the name of the driver.
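The DXE driver example can be illustrated with a short Python sketch that walks the sections inside an FFS file body. It assumes the common section header layout (a 24-bit little-endian size that includes the 4-byte header, followed by a one-byte type), 4-byte alignment between sections, and the standard type values for PE32 (0x10), TE (0x12), and UI (0x15) sections. Extended-size sections, compressed sections, and GUID-defined sections are not handled, and the function names are invented for this example.

```python
SECTION_NAMES = {0x10: "PE32", 0x12: "TE", 0x15: "UI"}

def walk_sections(body: bytes):
    """Return (section_kind, payload) for each section in an FFS file body."""
    sections = []
    pos = 0
    while pos + 4 <= len(body):
        size = int.from_bytes(body[pos:pos + 3], "little")
        sec_type = body[pos + 3]
        # Stop on malformed, extended-size, or truncated sections.
        if size < 4 or size == 0xFFFFFF or pos + size > len(body):
            break
        kind = SECTION_NAMES.get(sec_type, hex(sec_type))
        sections.append((kind, body[pos + 4:pos + size]))
        # The next section starts at the next 4-byte boundary.
        pos = (pos + size + 3) & ~3
    return sections

def driver_name(sections):
    """Return the name stored in the UI section, if there is one."""
    for kind, payload in sections:
        if kind == "UI":
            # UI sections hold a NUL-terminated UTF-16LE string.
            return payload.decode("utf-16-le").rstrip("\x00")
    return None
```

For a typical DXE driver this yields a PE32 payload that can be handed to a binary loader, plus a human-readable name for the file.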

So far, I’ve implemented basic firmware volume parsing in the FS loader; I’ve pushed this to the GitHub repository. Currently, this doesn’t handle FFS file or section parsing.

FFS file and section parsing is still a work-in-progress, but here’s a preview:

This is mostly complete, but there are still some nasty bugs related to FFS alignment that I’m working on fixing. My focus for this week is to finish up this FS loader.

Update (2019-07-19)

I have committed support for UEFI FFS file/section parsing in the GitHub repo. Please open an issue report if you encounter any issues with it (such as missing files/sections that UEFITool or other tools parse without issues).

[GSoC] Ghidra firmware utilities, week 5

Hi everyone. As stated in my previous blogpost, I have been working on a FS loader for Intel Flash Descriptor (IFD) images. The IFD is used on Intel x86 platforms to define various regions in the SPI flash. These may include the Intel ME firmware region, BIOS region, Gigabit Ethernet firmware region, etc. The IFD also defines read/write permissions for each flash region, and it may also contain various configurable chipset parameters (PCH straps). Additional information about the firmware descriptor can be found in this helpful post by plutomaniac on the Win-Raid forum, as well as these slides from Open Security Training.

For a filesystem loader, the flash regions are exposed as files. FLMAP0 in the descriptor map and the component/region sections are parsed to determine the base and limit addresses for each region; both IFD v1 and IFD v2 (used since Skylake) are supported. Ghidra supports nested filesystem loaders, so the FMAP and CBFS loaders that I’ve previously written can be used for parsing the BIOS region.
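The region lookup can be sketched roughly as follows in Python. This is a simplified illustration rather than the loader's actual logic: it assumes the descriptor signature 0x0FF0A55A appears within the first 0x20 bytes, that FLMAP0 follows it directly, and that the flash region base address (FRBA) is stored in bits 16-23 of FLMAP0 in 16-byte units. The width of the base/limit fields differs between descriptor versions, so the `field_mask` default used here is an assumption, as are the function name and the short region name list.

```python
import struct

IFD_SIGNATURE = 0x0FF0A55A
REGION_NAMES = ["Descriptor", "BIOS", "ME", "GbE", "Platform Data"]

def parse_ifd_regions(image: bytes, field_mask: int = 0x7FFF):
    """Return {region_name: (base, limit)} for the regions that are in use."""
    sig_offset = image.find(struct.pack("<I", IFD_SIGNATURE), 0, 0x20)
    if sig_offset < 0:
        raise ValueError("flash descriptor signature not found")
    # FLMAP0 immediately follows the signature.
    (flmap0,) = struct.unpack_from("<I", image, sig_offset + 4)
    frba = ((flmap0 >> 16) & 0xFF) << 4
    regions = {}
    for index, name in enumerate(REGION_NAMES):
        (flreg,) = struct.unpack_from("<I", image, frba + 4 * index)
        # Base and limit are stored in 4 KiB units; the limit is inclusive.
        base = (flreg & field_mask) << 12
        limit = (((flreg >> 16) & field_mask) << 12) | 0xFFF
        if base <= limit:  # unused regions have a base above their limit
            regions[name] = (base, limit)
    return regions
```

Each (base, limit) pair maps directly onto a slice of the flash image, which is what gets exposed as a file.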

If you encounter any issues with the IFD FS loader, please feel free to submit an issue report in the GitHub repository.

Plans for this week

I have started working on a filesystem loader for UEFI firmware volumes. In conjunction with the IFD loader, this will allow UEFI firmware images to be imported for analysis in Ghidra (behaving somewhat similarly to the excellent UEFITool).

[GSoC] Ghidra firmware utilities, week 4

During the previous week, I worked on additional filesystem loaders to support parsing Flash Map (FMAP) images and the coreboot file system (CBFS). As of this week, these FS loaders are mostly complete, and can be used to import raw binaries within compiled coreboot ROMs. Support for CBFS file compression (with either LZMA or LZ4) is also implemented; compressed files will be automatically extracted. Here are some screenshots of the new FS loaders:

While these might not be the most useful FS loaders (as FMAP and CBFS are mainly used by coreboot itself), I gained additional familiarity with Ghidra’s plugin APIs for FS loaders. This will be useful, as I will be writing additional FS loaders for this project.

Plans for this week

I’ll continue to make minor changes to the existing FS loaders (various cleanups/etc). I’ll also start to write a FS loader for parsing ROMs with an Intel flash descriptor (IFD), which shouldn’t be too complicated. After this is completed, I plan on writing a FS loader for UEFI firmware volumes (ideally similar to UEFITool or uefi-firmware-parser). I anticipate that this loader will be more complex, so I’ve reserved additional time to ensure its completion.

[GSoC] Ghidra firmware utilities, week 3

Last week, I finalized my work on the PCI option ROM loader, which was the first part described in my initial proposal for this project. This consists of a filesystem loader for hybrid/UEFI option ROMs and a binary loader for x86 option ROMs.

Background information on PCI option ROMs

Option ROMs may contain more than one executable image; for example, a graphics card may have a legacy x86 option ROM for VGA BIOS support as well as a UEFI option ROM to support the UEFI Graphics Output Protocol. x86 option ROMs are raw 16-bit binaries. The entry point is stored as a short JMP instruction in the option ROM header; the BIOS will execute this instruction to jump to the entry point. In contrast, UEFI images contain a UEFI driver, which is a PE32+ binary. This binary can be (and frequently is) compressed with the EFI compression algorithm, which is a combination of Huffman encoding and the LZ77 algorithm.

Filesystem loader

The filesystem loader allows hybrid/UEFI option ROMs to be imported. It also transparently handles the extraction of compressed UEFI executables.

Initially, I attempted to write a Java implementation of the EFI Compression Algorithm for use in the FS loader, but ran into several issues when handling the decompression of certain blocks. I eventually decided to reuse the existing C decompression implementation in EDK2, and wrote a Java Native Interface (JNI) wrapper to call the functions in the C library.

With the FS loader, UEFI drivers in option ROMs can be imported for analysis with Ghidra’s native PE32+ loader.

x86 option ROM binary loader

This loader allows x86 option ROMs to be imported for analysis. Various PCI structures are automatically defined, and the entry function is resolved by decoding the JMP instruction in the option ROM header.

[Screenshots: PCI option ROM header data type; PCI data structure data type; disassembled entry point]
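Resolving the entry point from the header JMP can be sketched in Python like this. It is a simplified stand-in for what the loader does: it assumes the initialization entry point sits at offset 3 (right after the 0x55 0xAA signature and the size byte) and only understands the two relative JMP encodings, short (0xEB) and near (0xE9). The function name is made up for this example.

```python
import struct

def resolve_entry_point(rom: bytes) -> int:
    """Return the ROM offset that the header's JMP instruction transfers control to."""
    if rom[0:2] != b"\x55\xaa":
        raise ValueError("missing option ROM signature")
    entry = 3  # the BIOS calls into the ROM at offset 3
    opcode = rom[entry]
    if opcode == 0xEB:
        # JMP rel8: the target is relative to the end of the 2-byte instruction.
        (rel,) = struct.unpack_from("<b", rom, entry + 1)
        return entry + 2 + rel
    if opcode == 0xE9:
        # JMP rel16: the target is relative to the end of the 3-byte instruction.
        (rel,) = struct.unpack_from("<h", rom, entry + 1)
        return (entry + 3 + rel) & 0xFFFF
    # Anything else is treated as code that starts right at offset 3.
    return entry
```

The returned offset is where 16-bit disassembly should start.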

Plans for this week

I’ve started to work on a filesystem loader for FMAP/CBFS (used by coreboot firmware images). After that, I plan on working on additional FS loaders for Intel flash images (IFD parsing) and UEFI firmware volumes.

As usual, the source code is available in my GitHub repository. Installation and usage instructions are included in the README; feel free to open an issue report if anything goes awry.

[GSoC] Ghidra firmware utilities, weeks 1-2

Hi everyone. I’m Alex James (theracermaster on IRC) and I’m working on developing modules for Ghidra to assist with firmware reverse engineering as a part of GSoC 2019. Martin Roth and Raul Rangel are my mentors for this project; I would like to thank them for their support thus far.

Ghidra is an open-source software reverse engineering suite developed by the NSA, offering similar functionality to existing tools such as IDA Pro. My GSoC project aims to augment its functionality for firmware RE. This project will consist of three parts: a loader for PCI option ROMs, a loader for firmware images, and various scripts to assist with UEFI binary reverse engineering (importing common types, GUIDs, etc).

The source code for this project is available here.

Week 1

During my first week, I started implementing the filesystem loader for PCI option ROMs. This allows option ROMs (and their enclosed images) to be loaded into Ghidra for analysis. So far, option ROMs containing uncompressed UEFI binaries can be successfully loaded as PE32+ executables in Ghidra. The loader also calculates the entry point address for legacy x86 option ROMs.

Plans for this week

So far this week, I’ve worked on writing a simple JNI wrapper for the reference C implementation of the EFI decompressor from EDK2, and have used this to add support for compressed EFI images to the option ROM FS loader. Additionally, I plan on making further improvements to the option ROM loader for legacy option ROMs; while the entry point address is properly calculated, they still have to be manually imported as a raw binary.

GSoC 2010, TianoCore as a payload :(

It’s been an interesting summer.  It didn’t at all turn out how I expected, but it is what it is.  TianoCore as a software project turned out to be massively more complex than I anticipated when I submitted my proposal, and the level of knowledge required was quite a bit deeper than I expected… it’s one of those cases where I didn’t know what I didn’t know.  I’ll have to talk to a couple of my professors about that, to see if there’s some elective class that explains the things I’ve missed.

Sorry, that’s vague, let me give an example.  Coreboot does its thing, hardware initialization, then passes control to the payload.  This seems to be the equivalent of the dreaded “goto”, which is actually pretty cool.  Coreboot doesn’t care what happens next.  So hypothetically, I have some code, anything, I want to use it as a payload.  I compile it, then what?  Well, it depends (as you all know), how was it compiled?  Is it an ELF?  PE32?  Something else?  Where exactly is the entry point to this binary blob?  (That’s a rhetorical question, please don’t answer it in the comments.)  You would have thought at some point in one of my classes executable formats would have come up, just as an example.  Or calling conventions.  Or hundreds of other little things that I’d never seen or heard of before I suddenly realized that I needed to understand them.  So that’s what I ended up spending much of the summer on.  Write code… stop and realize what I’m doing doesn’t make sense/won’t work/is the wrong approach, then start over.

One of the things that drew me to coreboot as a project was that as a computer engineering student, I took a lot of classes focusing on the physical side of computing, starting with physics and circuits classes, moving up through logic gates to chip design.  On the other side, programming started at a pretty high level with C++, then worked down, till I got to the computer architecture and operating system classes, and assembly language (not x86, unfortunately).  I would expect that as a “computer engineer” I should understand the whole stack, that the physical, EE stuff and the CS stuff would meet in the middle.  But they haven’t (and they won’t: I’m about to graduate, and there aren’t any crucial classes left to take).  I knew this going into GSoC, and coreboot seemed like the perfect project to fill the gap (and give something back to the open source world that I’ve gotten so much out of).  Well, like I said, the gap turned out to be a lot bigger than I expected.  (To abuse the metaphor a little more, anyone remember when Evel Knievel tried to jump the Snake River Canyon?  That’s kind of how I feel about my summer of code.)