[GSoC] Coreboot Coverity, Final Update

It is now the final week of GSoC, and it is time for me to write my final blog post. Over the past summer I have worked on fixing the Coverity scan issues in coreboot, with the goal of making the code base “Coverity clean”. This has involved writing a substantial number of patches, the vast majority of which are in coreboot, with a sprinkling in a few other projects:

  • 146 patches in coreboot
  • 6 patches in flashrom
  • 6 patches in vboot
  • 3 patches in Chromium EC
  • 4 patches in OpenSBI
  • 2 patches in em100
  • 1 in the Linux kernel

At the time of writing, a few of my patches are still under review on Gerrit, so it is possible (and hopeful!) that this list will increase over the next few weeks.

In total, these patches resolved 172 Coverity issue reports of actual bugs. However, Coverity also isn’t always right, and some issues weren’t actually problems that required patches. These issues, 91 in total, were either false positives or intentional and were ignored. At the moment, there are currently 223 remaining reports in the issue tracker for coreboot. Despite being a substantial number, this is almost entirely composed of issues from third-party projects (such as OpenSBI or vboot, which probably shouldn’t be counted in the coreboot tracker anyway), and the AMD vendorcode. The original plan at the beginning of the summer was to work on the AMD vendorcode; however, after discussion with my mentors we decided to skip it, since with the upcoming deprecations for coreboot 4.11 it might not be around much longer. Aside from this, there are roughly 20 remaining issues, which mostly required refactoring or technical knowledge that I don’t have.

With the summary out of the way, I’d like to give everyone a sample of the sort of bugs I’ve worked on during the project, and hopefully give advice for avoiding them in the future. Here is a list of the most common, nasty, or subtle types of bugs I’ve found over the summer.

Missing Break Statements

In switch statements in C, every case statement implicitly falls through to the next one. However, this is almost never the desired behavior, and so to avoid this every case needs to be manually terminated by a break to prevent the fall-through. This unfortunately is very tedious to do and is often accidentally left out. For a prototypical example, let’s look at CB:32180 from the AGESA vendorcode.

switch (AccessWidth) {
        case AccessS3SaveWidth8:
                RegValue = *(UINT8 *) Value;
                break;
        case AccessS3SaveWidth16:
                RegValue = *(UINT16 *) Value;
                break;
        case AccessS3SaveWidth32:
                RegValue = *(UINT32 *) Value;
        default:
                ASSERT (FALSE);
}

In this switch there is a missing break after the AccessS3SaveWidth32 case, which will then fall-through to the false assertion. Clearly not intentional! Other examples of this, though not as severe, can be found in CB:32088 and CB:34293. Fortunately, these errors today can be prevented by the compiler. GCC recently added the -Wimplicit-fallthrough option, which will warn on all implicit fall throughs and alert to a potentially missing break. However, some fall throughs are intentional, and these can be annotated by a /* fall through */ comment to silence the warning. Since CB:34297 and CB:34300 this warning has been enabled in coreboot, so this should be the last we see of missing break statements.

Off-by-One Errors

There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.

Anonymous

Everyone has been bitten by off-by-one errors. Let’s take a look at CB:32125 from the Baytrail graphics code.

static void gfx_lock_pcbase(struct device *dev)
{
        const u16 gms_size_map[17] = { 0, 32, 64, 96, 128, 160, 192, 224, 256,
                                       288, 320, 352, 384, 416, 448, 480, 512 };
        ...
        u32 gms, gmsize, pcbase;
        gms = pci_read_config32(dev, GGC) & GGC_GSM_SIZE_MASK;
        gms >>= 3;
        if (gms > ARRAY_SIZE(gms_size_map))
                return;
        gmsize = gms_size_map[gms];
        ...
}

Here we have an array gms_size_map of 17 elements, and a bounds check on the gms variable before it is used to index into the array. However, there’s a problem. The bounds check misses the case when gms == ARRAY_SIZE(gms_size_map) == 17, which is one past 16 – the index of the last array element. The fix is to use >= in the check instead of >. This exact error when performing a bounds check is very common: see at least CB:32244, CB:34498, and CL:1752766 for other examples.

Another nasty place where off-by-one errors strike is with strings – in particular, when making sure they are null terminated. Here is CB:34374 from the ACPI setup of the Getac P470.

static long acpi_create_ecdt(acpi_ecdt_t * ecdt)
{
        ...
        static const char ec_id[] = "\_SB.PCI0.LPCB.EC0";
        ...
        strncpy((char *)ecdt->ec_id, ec_id, strlen(ec_id));
        ...
}

The problem is that strncpy() will only copy at most strlen(ec_id) characters, which excludes the null character. The author might have been thinking of the similar strlcpy(), which does explicitly null terminate the string buffer even if it never reaches a null character. In this case none of the string-copying functions are needed, since ec_id is a string buffer and so can be copied using a simple memcpy().

Boolean vs Bitwise Operators

In C, all integers are implicitly convertible to boolean values and can be used with all boolean operators. While somewhat convenient, this also makes it very easy to mistakenly use a boolean operator when a bitwise one was intended. Let’s take a look at CB:33454 from the CIMX southbridge code.

void sb_poweron_init(void)
{
        u8 data;
        ...
        data = inb(0xCD7);
        data &= !BIT0;
        if (!CONFIG(PCIB_ENABLE)) {
                data |= BIT0;
        }
        outb(data, 0xCD7);
        ...
}

Here BIT0 is the constant 0x1, so !BIT0 expands to 0, with the net effect of data being completely cleared, regardless of the previous value from inb(). The intended operator to use was the bitwise negation ~, which would only clear the lowest bit. For more examples of this sort of bug, see CB:34560 and OpenSBI 3f738f5.

Implicit Integer Conversions

C allows implicit conversions between all integer types, which opens the door for many accidental or unintentional bugs. For an extremely subtle example of this, let’s take a look at OpenSBI 5e4021a.

void *sbi_memset(void *s, int c, size_t count);

void sbi_fifo_init(struct sbi_fifo *fifo, void *queue_mem,
                   u16 entries, u16 entry_size)
{
        ...
        sbi_memset(fifo->queue, 0, entries * entry_size);
}

Do you see the problem? The issue is that entries and entry_size are both 16-bit integers, and by the rules of C are implicitly converted to int before the multiplication. An int cannot hold all possible values of a u16 * u16, and so if the multiplication overflows the intermediate result could be a negative number. On 64-bit platforms size_t will be a u64, and the negative result will then be sign-extended to a massive integer. As the last argument to sbi_memset(), this could lead to a very large out-of-bounds write. The solution is to cast one of the variables to a size_t before the multiplication, which is wide enough to prevent the implicit promotion to int. For other examples of this problem, see CB:33986 and CB:34529.

Another situation where implicit conversions strike is in error handling. Here is CB:33962 in the x86 ACPI code.

static ssize_t acpi_device_path_fill(const struct device *dev, char *buf,
                                     size_t buf_len, size_t cur);

const char *acpi_device_path_join(const struct device *dev, const char *name)
{
        static char buf[DEVICE_PATH_MAX] = {};
        size_t len;
        if (!dev)
                return NULL;

        /* Build the path of this device */
        len = acpi_device_path_fill(dev, buf, sizeof(buf), 0);
        if (len <= 0)
                return NULL;
        ...
}

With the function prototype right there, the problem is obvious: acpi_device_path_fill() returns negative values in a ssize_t to indicate errors, but len is a size_t, so all those negative error values are converted to extremely large integers, thus passing the subsequent error check. During code review this may not at all be obvious though.

Both these errors could be prevented using the -Wconversion compiler option, which will warn about all implicit integer conversions. However, there are an incredible number of such conversions in coreboot, and it would be a mammoth task to fix them all.

Null Pointers

Null pointers need no introduction – they are well known to cause all sorts of problems. For a simple example, let’s take a look at CB:33134 from the HiFive Unleashed mainboard.

static void fixup_fdt(void *unused)
{
        void *fdt_rom;
        struct device_tree *tree;
        
        /* load flat dt from cbfs */
        fdt_rom = cbfs_boot_map_with_leak("fallback/DTB", CBFS_TYPE_RAW, NULL);

        /* Expand DT into a tree */
        tree = fdt_unflatten(fdt_rom);
        ...
}

This code attempts to load a device tree from a location in the CBFS. However, cbfs_boot_map_with_leak() will return a null pointer if the object in the CBFS can’t be found, which will then be dereferenced in the call to fdt_unflatten(). On most systems dereferencing a null pointer will lead to a segfault, since the operating system has set up permissions that prevent accessing the memory at address 0. However, coreboot runs before the operating systems has even started, so there are no memory permissions at all! If fdt_rom is a null pointer, fdt_unflatten() will attempt to expand the device tree from whatever memory is at address 0, leading to who knows what problems. A simple null check will avoid this, but requires the programmer to always remember to put them in.

Another common issues with null pointers is that even if you do a check, it might not actually matter if the pointer has already been dereferenced. For example, here is a problem with the EDID parser in CB:32055.

int decode_edid(unsigned char *edid, int size, struct edid *out)
{
        ...
        dump_breakdown(edid);

        memset(out, 0, sizeof(*out));

        if (!edid || memcmp(edid, "\x00\xFF\xFF\xFF\xFF\xFF\xFF\x00", 8)) {
                printk(BIOS_SPEW, "No header found\n");
                return EDID_ABSENT;
        }
        ...
}

In this case the EDID is dumped doing the null pointer check, but at worst there should only be a wonky dump if edid is null, right? Not necessarily. Since dereferencing a null pointer is undefined behavior, the compiler is allowed to assume that no null pointer dereferences occur in the program. In this case, dereferencing the edid pointer in dump_breakdown() is an implicit assertion that edid is not null, so an over-zealous compiler could remove the following null check! This optimization can be disabled using -fno-delete-null-pointer-checks (which is done in coreboot), but does not prevent any problems that could have happened in the null dereference before the check took place. See this article in LWN for details on how a vulnerability from this problem was dealt with in the Linux kernel.

Conclusion

C has always had the mantra of “trust the programmer”, which makes mistakes and errors very easy to do. Some of these errors can be prevented at compile time using compiler warnings, but many cannot. Coverity and other static analyzers like it are very useful and powerful tools for catching bugs that slip past the compiler and through peer review. However, it is no silver bullet. All of these errors were present in production code that were only caught after the fact, and there are certainly bugs of this sort left that Coverity hasn’t found. What do we do about them, and how can we ever be sure that we’ve caught them all? Today, there are new languages designed from the beginning to enable safe and correct programming. For example, libgfxinit is written in SPARK, a subset of Ada that can be formally verified at compile time to avoid essentially all of the above errors. There is also the new oreboot project written in Rust, which has similar compile time guarantees due to its extensive type system. I hope to see these languages and others increasingly used in the future so that at some point, this job will have become obsolete. 🙂

[GSoC] How to Use ARM Trusted Firmware in Coreboot

Hello, I’m Asami. In this article, I’m going to talk about how to use ARM Trusted Firmware in coreboot for ARMv8 (AArch64). ARM Trusted Firmware provides a reference implementation of secure world software for Armv8-A and Armv8-M. You can see the code via https://github.com/ARM-software/arm-trusted-firmware.

Trusted Firmware has 5 steps which are called as BL1, BL2, BL3-1, BL3-2, and BL3-3. BL1 step is similar to the bootblock stage and BL2 is similar to the romstage of coreboot. In the coreboot project, we only use the BL3-1 part that is expected to work on EL3 exception level. The code of BL3-1 will execute just after the ramstage and before the payload when we enable the Trusted Firmware.

How to enable Trusted Firmware

It’s very easy to enable Trusted Firmware on coreboot. You just need to ‘select ARM64_USE_ARM_TRUSTED_FIRMWARE’ in your Kconfig. If you want to run coreboot on QEMU/AArch64, you need to add the ‘select ARM64_USE_ARM_TRUSTED_FIRMWARE’ at src/mainboard/emulation/qemu-aarch64/Kconfig. The next step switches depending on the configuration at src/arch/arm64/boot.c. Here is the code to switch the next step:

// src/arch/arm64/boot.c
static void run_payload(struct prog *prog)
{
	void (*doit)(void *);
	void *arg;

	doit = prog_entry(prog);
	arg = prog_entry_arg(prog);
	u64 payload_spsr = get_eret_el(EL2, SPSR_USE_L);

	if (CONFIG(ARM64_USE_ARM_TRUSTED_FIRMWARE))
		arm_tf_run_bl31((u64)doit, (u64)arg, payload_spsr);
	else
		transition_to_el2(doit, arg, payload_spsr);
}

Why using Trusted Firmware

Coreboot for ARMv8 has 2 options to pass an execution from it to a payload. The first is passing execution to a payload directly and the second one is passing to the BL3-1 code before a payload. You always don’t have to use Trusted Firmware. However, you need to enable Trusted Firmware if you want to run Linux because it expects to work with PSCI. PSCI is an abbreviation of Power State Coordination Interface which is a standard interface for power management that can be used by OS vendors for supervisory software working at different levels of privilege on an ARM device. Coreboot doesn’t have the setup for PSCI but Trusted Firmware does.

Current Status

Unfortunately, QEMU/AArch64 in coreboot doesn’t support Trusted Firmware yet. It means we can’t run Linux with QEMU for ARMv8. I’m now trying to support Trusted Firmware for QEMU/AArch64.

[GSoC] Coreboot Coverity, weeks 11-12

Hello again! For the past two weeks I have been working on Coverity issues in various third party repositories, notably flashrom and vboot. The majority of issues in both repositories are now fixed, with the remaining ones mostly being memory leaks. Also, support for OpenSBI (a RISC-V supervisor binary interface) was recently added to coreboot, and Coverity picked up several issues in that. With one more week left in the summer, the plan now is to tidy up the last few remaining issues in vboot, and shepherd my still in-progress patches through review. As usual, you can see the current status of my patches on Gerrit.

Following up from my post last week about neutralizing the ME, this week I decided to try replacing SeaBIOS with GRUB. While originally designed as a bootloader, GRUB can be compiled as a coreboot payload, which removes the need of having a BIOS or UEFI interface at all. Not only does this mean having a legacy-free boot process, but GRUB also has an incredible amount of flexibility over the previous systems. For example, one problem with current full disk encryption schemes is that the bootloader must always remain unencrypted, since factory BIOS/UEFI implementations are unable of unlocking encrypted disks. By embedding GRUB in the coreboot image we can move it off the hard drive and avoid this problem entirely.

With this in mind, I decided to try installing GRUB on a system with a single LUKS 1 partition formatted with Btrfs. (GRUB currently doesn’t support LUKS 2, which is what many distributions use by default.) This setup will require two configuration files: one embedded in the coreboot image with instructions on how to unlock the partition, and one on the hard drive for booting the kernel. The first configuration file has to be hand-written, while the second can be generated using grub-mkconfig -o /boot/grub/grub.cfg. The advantage of this two-stage system is that the first configuration file only has to be flashed once, and then all subsequent changes (e.g. adding kernel parameters) can be done with the system file. After tinkering around with the official GRUB documentation, the coreboot wiki, and this blog post, I pieced together the following bare-bones config file for the payload:

# Tell GRUB where to find its modules
set prefix=(memdisk)/boot/grub

# Keep the same screen resolution when switching to Linux
set gfxpayload=keep

# I'm not exactly sure what these do, but they look important
terminal_output --append cbmemc
terminal_output --append gfxterm

# Default to first option, automatically boot after 3 seconds
set default="0"
set timeout=3

# Enable the pager when viewing large files
set pager=1

menuentry 'Load Operating System' {

        # Load the LUKS and Btrfs modules
        insmod luks
        insmod btrfs

        # Unlock all crypto devices (should only be one)
        cryptomount -a

        # Load the system GRUB file from the first crypto device
        set root=(crypto0)
        configfile /boot/grub/grub.cfg
}

Using menuconfig, we can now configure coreboot to use GRUB as a payload.

Payload ---> Add a Payload ---> GRUB2
        ---> GRUB2 version ---> 2.04
        ---> Extra modules to include in GRUB image
             (luks btrfs gcry_sha256 gcry_rijndael all_video cat)
             [*] Include GRUB2 runtime config file into ROM image
                 (grub.cfg)

We need to add a few modules to the GRUB image to enable all the features we want: luks and btrfs are self-explanatory, gcry_sha256 and gry_rijndael add support for SHA256 and AES (the default hash and cipher for LUKS), all_video adds support for more video and graphics drivers, and cat is useful for printing config files during debugging. We also set here the path to the above grub.cfg file, which by default is in the root of the coreboot tree.

After flashing the new coreboot image, you will be greeted by two GRUB menus after booting: the above one for unlocking the partition, and then the system one for booting the kernel.

The above GRUB config file is extremely basic, and although all advanced functionality can be done manually by dropping to the GRUB command line, this is very inconvenient. Let’s add two menu entries to the config file for shutting down and rebooting the system. Fortunately, this can be done without rebuilding the ROM from scratch. By default, the GRUB config file is stored as etc/grub.cfg in the CBFS (coreboot file system), which is used to store objects in the coreboot image. We can print the layout of the CBFS using the cbfstool utility:

$ cbfstool coreboot.rom print
FMAP REGION: COREBOOT
Name                           Offset     Type           Size   Comp
cbfs master header             0x0        cbfs header        32 none
fallback/romstage              0x80       stage           58068 none
cpu_microcode_blob.bin         0xe3c0     microcode      122880 none
fallback/ramstage              0x2c440    stage          105072 none
config                         0x45f00    raw               488 none
revision                       0x46140    raw               674 none
cmos.default                   0x46440    cmos_default      256 none
vbt.bin                        0x46580    raw              1412 LZMA (3863 decompressed)
cmos_layout.bin                0x46b40    cmos_layout      1808 none
fallback/postcar               0x472c0    stage           18540 none
fallback/dsdt.aml              0x4bb80    raw             15257 none
fallback/payload               0x4f780    simple elf     470937 none
etc/grub.cfg                   0xc2780    raw               670 none
(empty)                        0xc2a80    null          7523608 none
bootblock                      0x7ef7c0   bootblock        1512 none

This shows all the objects located in the image, including the stages (bootblock, romstage, ramstage, and payload) and various other configuration files and binaries. Next, we extract the existing config file from this image:

$ cbfstool coreboot.rom extract -n etc/grub.cfg -f grub.cfg

Now add the following two entries to the end of the extracted file:

menuentry 'Poweroff' {
        halt
}

menuentry 'Reboot' {
        reboot
}

Now we delete the existing configuration file, and then add our new one back in:

$ cbfstool coreboot.rom remove -n etc/grub.cfg
$ cbfstool coreboot.rom add -n etc/grub.cfg -f grub.cfg -t raw

That’s it! Flash back the new ROM, and you’re good to go.

Even with these changes, we still have a very basic configuration file. For more advanced setups (such as booting from a live USB, LVM support, signing kernels with GPG, etc.) I recommend looking at the comprehensive Libreboot GRUB file and its hardening guide.

[GSoC] Coreboot Coverity, weeks 8-10

Hello everyone! Coverity has come back online again after its long upgrade, and with it a pile of new scan issues (over 100). I put aside a week to fix all the new issues in parts of the code base I had already worked through, and then spent the next two weeks finishing soc and the Cavium vendorcode. Referring to the overview page, there are now 300 scan issues left in the code base. While a considerable number, this consists almost entirely of patches still going through review, vboot, and the AMD vendorcode — everything outside of this is done. At this point, the plan at the beginning of the summer was to continue working on the AMD vendorcode, but after discussing with my mentors we have decided to do something else. The AMD vendorcode is extremely dense, and with the upcoming deprecations in 4.11 it may not be around for much longer. The plan now is to work on vboot and the recently resurrected scan page for flashrom.

Anyway, enough with the boring schedule stuff. I finally got around this week to wiping the ME on my T500, and like last time it was a lot more involved than I expected. Onwards!

The Intel Management Engine (ME) is an autonomous coprocessor integrated into all Intel chipsets since 2006. Providing the basis for Intel features such as AMT, Boot Guard, and Protected Audio Video Path (used for DRM), the ME has complete access to the system memory and networking. The ME runs continuously as long as the system has power, even if the computer is asleep or turned off, and on systems with AMT can even be accessed remotely. Naturally, this is an incredible security risk. On modern systems it is impossible to disable the ME completely, since any attempt to do so will start a watchdog timer that will shutdown the computer after 30 minutes. However, in older chipsets (such as the one in the T500), no such timer exists, allowing attempts to disable it entirely.

The ME firmware is located in the SPI flash chip, which is divided into multiple regions. Using the coreboot utility ifdtool, we can analyze a dumped ROM to check the regions in the layout along with their offsets within the flash image.

$ ifdtool -f layout.txt factory.rom
File factory.rom is 8388608 bytes
Wrote layout to layout.txt
$ cat layout.txt
00000000:00000fff fd
00600000:007fffff bios
00001000:005f5fff me
005f6000:005f7fff gbe
005f8000:005fffff pd

These regions are as follows:

  • Intel Flash Descriptor (fd) – Describes the layout of the rest of the flash chip, as well as various chipset configuration options. This blog post by Alex James has further information about the IFD.
  • BIOS (bios) – Contains the factory BIOS. Normally this is the only part of the flash chip that you overwrite when installing coreboot.
  • Management Engine (me) – Firmware for the ME coprocessor, including a kernel (ThreadX on older models, MINIX 3 on newer ones), as well as a variety of other modules for network access and remote management.
  • Gigabit Ethernet (gbe) – Configuration for the Gigabit Ethernet controller, including the system’s MAC address.
  • Platform Data (pd) – Other miscellaneous data for the factory BIOS, which coreboot doesn’t use.

The IFD also contains read and write permissions for the rest of the flash chip, which we can analyze using ifdtool.

$ ifdtool -d factory.rom
...
FLMSTR1:   0x1a1b0000 (Host CPU/BIOS)
   Platform Data Region Write Access: enabled
   GbE Region Write Access:           enabled
   Intel ME Region Write Access:      disabled
   Host CPU/BIOS Region Write Access: enabled
   Flash Descriptor Write Access:     disabled
   Platform Data Region Read Access:  enabled
   GbE Region Read Access:            enabled
   Intel ME Region Read Access:       disabled
   Host CPU/BIOS Region Read Access:  enabled
   Flash Descriptor Read Access:      enabled
   Requester ID:                      0x0000
...

Unfortunately, ME write access is disabled for the host CPU, which prevents us from overwriting it using an internal flash — we’re gonna have to get out the external programmer. If you recall from last time, accessing the SOIC of the T500 required a laborious process of extracting the motherboard from the case, so this time I cut a little hole in the frame to make flashing easier. As it turned out, having this easy access port was very handy.

The old way.
The new — a bit hacky, but it works.

After reading back the ROM image, I decided to use try using me_cleaner, a tool for partially deblobbing the ME firmware on newer laptops, and for mine removing it entirely. Following the instructions for an external flash, we use the -S flag to clean the firmware.

$ python me_cleaner.py -S -O coreboot-nome.rom coreboot-orig.rom
Full image detected
Found FPT header at 0x1010
Found 13 partition(s)
Found FTPR header: FTPR partition spans from 0xd2000 to 0x142000
ME/TXE firmware version 4.1.3.1038 (generation 1)
Public key match: Intel ME, firmware versions 4.x.x.x
The meDisable bit in ICHSTRP0 is NOT SET, setting it now…
The meDisable bit in MCHSTRP0 is NOT SET, setting it now…
Disabling the ME region…
Wiping the ME region…
Done! Good luck!

After wiping the image, we can use ifdtool again to see that the ME region has been completely erased.

00000000:00000fff fd
00600000:007fffff bios
005f6000:005f7fff gbe
005f8000:005fffff pd

While this is the most sure-fire way to ensure the ME has been disabled, it’s not perfect — on boot the ME coprocessor will still attempt to read from its firmware in the flash region, and finding it not there will hang in an error state. However, it has been discovered through reverse engineering that Intel has incorporated various bits into the IFD that will disable the ME soon after boot. (One of these, the High Assurance Platform “HAP” bit, was incorporated at the request of the US government, which also understands the security risks of an unfettered ME.) For my laptop, this consists of setting two bits in the ICHSTRP and MCHSTRP, which me_cleaner does by default. After enabling this, the ME should gracefully shutdown after boot, which hopefully will cause the fewest system stability issues.

Finally after all that, let’s use ifdtool again to unlock all regions of the flash chip, which should ideally negate the need of ever doing an external flash again.

$ ifdtool -u coreboot-nome.rom

Let’s write it! Unlike last time we write the entire image, not just the BIOS region.

$ flashrom -p linux_spi:dev=/dev/spidev1.0,spispeed=4096 -w coreboot-nome-unlock.rom

Now the moment of truth. After hitting the power button … nothing. Gah.

Well, not exactly nothing. The screen doesn’t turn on, but the CD drive is making noises, so it’s not completely dead. Hmmm, let’s try again, but this time only setting the disable bits without wiping the ME region. This can be done using the -s flag of me_cleaner (which incidentally has a bug that required a small patch).

$ python me_cleaner.py -s -O coreboot-nome.rom coreboot-orig.rom

Reflashing again … same problem. Nothing.

Not completely sure what to do and worried that the ME was actually needed to boot, I decided to flash a pre-compiled Libreboot image on the laptop, which has the ME disabled out of the box and should hopefully Just Work. … And it did! The computer booted gracefully into the OS with no glitches or stability issues that I could see. Perfect. So disabling the ME is technically possible, but how exactly do I do it? Asking around a bit on IRC, I was directed to the coreboot bincfg utility, which is capable of generating an ME-less IFD from scratch. Following this similar guide for the X200, I was able to piece together a complete ROM.

First, we need to generate a new IFD. The bincfg directory contains configuration files for the X200, which we can fortunately re-use since the T500 has the same controller hub.

$ bincfg ifd-x200.spec ifd-x200.set ifd.bin

This IFD comes with only three regions, and has the ME disable bits set by default.

00000000:00000fff fd
00003000:007fffff bios
00001000:00002fff gbe

There are also configuration files for generating a new GbE region, but that requires certain fiddlings with the MAC address that I don’t know how to do — for now we can re-use the GbE from the old ROM, using ifdtool to chop it out.

$ ifdtool -x coreboot-orig.rom
File coreboot-orig.rom is 8388608 bytes
  Flash Region 0 (Flash Descriptor): 00000000 - 00000fff
  Flash Region 1 (BIOS): 00600000 - 007fffff
  Flash Region 2 (Intel ME): 00001000 - 005f5fff
  Flash Region 3 (GbE): 005f6000 - 005f7fff
  Flash Region 4 (Platform Data): 005f8000 - 005fffff
$ mv flashregion_3_gbe.bin gbe.bin

Next, we place the IFD and GbE regions in a blobs directory in the root of the coreboot tree, and then direct the build system to use them when generating an image. Here are the menuconfig settings:

General setup ---> [*] Use CMOS for configuration values
              ---> [*] Allow use of binary-only repository
Mainboard ---> Mainboard vendor ---> Lenovo
          ---> Mainboard model ---> ThinkPad T500
          ---> Size of CBFS filesystem in ROM (0x7fd000)
Chipset ---> [*] Add Intel descriptor.bin file (blobs/ifd.bin)
        ---> [*] Add gigabit ethernet configuration (blobs/gbe.bin)
        ---> Protect flash regions ---> Unlock flash regions
Devices ---> Display ---> Linear "high-resolution" framebuffer
Payloads ---> Add a payload ---> SeaBIOS
         ---> SeaBIOS version ---> master

Note that since the ME and Platform Data regions have been deleted from the new layout, we can expand the CBFS to the entire bios size — 0x7fd000 bytes, nearly the entire flash chip. This will allow in the future installing bigger and more interesting payloads, such as GRUB or LinuxBoot.

Finally, I wrote this new image to the flash chip. It worked. Huzzah!

After installation, we can check the status of the ME using intelmetool.

$ sudo intelmetool -m
Can't find ME PCI device

Perfect.

[GSoC] Ghidra firmware utilities, week 10

As stated in last week’s blogpost, I have started working on the UEFI helper script (aptly named UEFIHelper). The aim of this script is to assist with reverse engineering UEFI binaries. Similar projects exist for IDA Pro, including ida-efiutils, ida-efitools, and EFISwissKnife.

Background information

UEFI executables are either PE32(+) or TE binaries. The signature of the entry point function depends on the module type; some examples are PEI modules, DXE drivers, and UEFI applications. DXE drivers and standard UEFI applications use the following entry point function:

EFI_STATUS
_ModuleEntryPoint (
  IN EFI_HANDLE        ImageHandle,
  IN EFI_SYSTEM_TABLE  *SystemTable
  );

Other types of modules (such as PEI modules and PEI/DXE core modules) may have different parameters and return types for the entry point function. Nevertheless, we’ll focus on the standard entry point for now. ImageHandle is a firmware-allocated handle for the current EFI application. SystemTable is a pointer to the EFI_SYSTEM_TABLE structure, which in turn has pointers to other EFI tables (such as EFI_BOOT_SERVICES and EFI_RUNTIME_SERVICES). These tables provide data structures and function pointers for standard UEFI functionality, such as getting/setting NVRAM variables, locating/installing UEFI protocols, loading additional UEFI images, rebooting the system, etc.

UEFI’s extensibility is largely implemented through the use of protocols. Protocols are data structures used to enable communication between different UEFI modules, and can be identified by a GUID. A simplified example could be a a UEFI driver for a graphics card. It could support pre-boot graphics output by installing an implementation of the EFI_GRAPHICS_OUTPUT_PROTOCOL, which could then be located and used by other UEFI applications and drivers for graphics output.

UEFIHelper progress

MdePkg in EDK2 includes headers for core UEFI types and protocols. Given a parser configuration file, the C parser in Ghidra can be used to generate data type archives, which are used for storing type definitions in Ghidra. I used the MdePkg headers to generate UEFI data type archives for x86, x86_64, ARMv7, and ARMv8 (AArch64). UEFIHelper will automatically load the correct data type library for the current program’s architecture.

UEFIHelper will search for known GUIDs in the .data segment of the current UEFI program and apply the EFI_GUID type definition. UEFIHelper will also fix the entry point function signature to match the standard entry point for UEFI DXE drivers and applications.

UEFIHelper is a part of the ghidra-firmware-utils extension, and is available on GitHub as usual. I will continue to work on UEFIHelper during the next week.

Announcing coreboot 4.10

The 4.10 release covers commit a2faaa9a2 to commit ae317695e3 There is a pgp signed 4.10 tag in the git repository, and a branch will be created as needed.

In nearly 8 months since 4.9 we had 198 authors commit 2538 changes to master. Of these, 85 authors made their first commit to coreboot: Welcome!

Between the releases the tree grew by about 11000 lines of code plus 5000 lines of comments.

Again, a big Thank You to all contributors who helped shape the coreboot project, community and code with their effort, no matter if through development, review, testing, documentation or by helping people asking questions on our venues like IRC or our mailing list.

What’s New

Most of the changes were to mainboards, and on the chipset side, lots of activity concentrated on x86. However compared to previous releases activity (and therefore interest, probably) increased in vboot and in non-x86 architectures. However it’s harder this time to give this release a single topic like the last: This release accumulates some of everything.

Clean Up

As usual, there was a lot of cleaning up going on, and there notably, a good chunk of this year’s Google Summer of Code project to clean out the issues reported by Coverity Scan is already in.

The only larger scale change that was registered in the pre-release notes was also about cleaning up the tree:

device_t is no more

coreboot used to have a data type, device_t that changed shape depending on whether it is compiled for romstage (with limited memory) or ramstage (with unlimited memory as far as coreboot is concerned). It’s an old relic from the time when romstage wasn’t operated in Cache-As-RAM mode, but compiled with our romcc compiler.

That data type is now gone.

Release Notes maintenance

Speaking of pre-release notes: After 4.10 we’ll start a document for 4.11 in the git repository. Feel free to add notable achievements there so we remember to give them a shout out in the next release’s notes.

Known Issues

Sadly, Google Cyan is broken in this release. It doesn’t work with the "C environment" bootblock (as compared to the old romcc type bootblock) which is now the default. Sadly it doesn’t help to simply revert that change because doing so breaks other boards.

If you want to use Google Cyan with the release (or if you’re tracking the master branch), please keep an eye on https://review.coreboot.org/c/coreboot/+/34304 where a solution for this issue is sought.

Deprecations

As announced in the 4.9 release notes, there are no deprecations after 4.10. While 4.10 is also released late and we target a 4.11 release in October we nonetheless want to announce deprecations this time: These are under discussion since January, people are working on mitigations for about as long and so it should be possible to resolve the outstanding issues by the end of October.

Specifically, we want to require code to work with the following Kconfig options so we can remove the options and the code they disable:

  • C_ENVIRONMENT_BOOTBLOCK
  • NO_CAR_GLOBAL_MIGRATION
  • RELOCATABLE_RAMSTAGE

These only affect x86. If your platform only works without them, please look into fixing that.

Added 28 mainboards:

  • ASROCK H110M-DVS
  • ASUS H61M-CS
  • ASUS P5G41T-M-LX
  • ASUS P5QPL-AM
  • ASUS P8Z77-M-PRO
  • FACEBOOK FBG1701
  • FOXCONN G41M
  • GIGABYTE GA-H61MA-D3V
  • GOOGLE BLOOG
  • GOOGLE FLAPJACK
  • GOOGLE GARG
  • GOOGLE HATCH-WHL
  • GOOGLE HELIOS
  • GOOGLE KINDRED
  • GOOGLE KODAMA
  • GOOGLE KOHAKU
  • GOOGLE KRANE
  • GOOGLE MISTRAL
  • HP COMPAQ-8200-ELITE-SFF-PC
  • INTEL COMETLAKE-RVP
  • INTEL KBLRVP11
  • LENOVO R500
  • LENOVO X1
  • MSI MS7707
  • PORTWELL M107
  • PURISM LIBREM13-V4
  • PURISM LIBREM15-V4
  • SUPERMICRO X10SLM-PLUS-F
  • UP SQUARED

Removed 7 mainboards:

  • GOOGLE BIP
  • GOOGLE DELAN
  • GOOGLE ROWAN
  • PCENGINES ALIX1C
  • PCENGINES ALIX2C
  • PCENGINES ALIX2D
  • PCENGINES ALIX6

Removed 3 processors:

  • src/cpu/amd/geode_lx
  • src/cpu/intel/model_69x
  • src/cpu/intel/model_6dx

Added 2 socs:

  • src/soc/amd/picasso
  • src/soc/qualcomm/qcs405

Toolchain

  • Update to gcc 8.3.0, binutils 2.32, IASL 20190509, clang 8

[GSoC] Ghidra firmware utilities, weeks 6-8

Hello everyone. It’s been a few weeks since I’ve written my last blog post, and during that time I’ve been working on the FS loader for UEFI firmware images. This FS loader aims to implement functionality similar to UEFITool in Ghidra.

As described in the previous blog post, Intel platforms divide the flash chip into several regions, including the BIOS region. On UEFI systems, the BIOS region is used to store UEFI firmware components, which are organized in a hierarchy. This hierarchy begins with UEFI firmware volumes, which consist of FFS (firmware file system) files. In turn, these FFS files can contain multiple sections. Firmware volumes can also be nested within FFS files. This helpful reference by Trammell Hudson as well as this presentation from OpenSecurityTraining have some additional information regarding UEFI firmware volumes.

For example, a UEFI firmware implementation could have a firmware volume specifically for the Driver eXecution Environment (DXE phase). Stored as FFS files, DXE drivers within the firmware volume could consist of a PE32 section to store the actual driver binary, as well as a UI section to store the name of the driver.

So far, I’ve implemented basic firmware volume parsing in the FS loader; I’ve pushed this to the GitHub repository. Currently, this doesn’t handle FFS file or section parsing.

FFS file and section parsing is still a work-in-progress, but here’s a preview:

This is mostly complete, but there are still some nasty bugs related to FFS alignment that I’m working on fixing. My focus for this week is to finish up this FS loader.

Update (2019-07-19)

I have committed support for UEFI FFS file/section parsing in the GitHub repo. Please open an issue report if you encounter any issues with it (such as missing files/sections that UEFITool or other tools parse without issues).

[GSoC] How to Run C Code in Bootblock Stage for QEMU/AArch64

Hi, I’m Asami. My project “adding QEMU/AArch64 support to coreboot” is making good progress. I’ve almost done to write porting code and I’m now writing a new tool to make a FIT payload for QEMU/AArch64. Here is my CL. In this article, I’m going to talk about how to run C code in the bootblock stage.

The way to run C code before DRAM has been initialized is various in the coreboot project. The most famous and used in the x86 system is known as the Cache-As-Ram (CAR). CAR can be set up by 1. enable CPU cache 2. enable the ‘no eviction’ mode 3. change cache mode from write-through to write-back. ‘No-eviction’ means that the CPU doesn’t write any data to DRAM as long as the size of data is less than the CPU cache. Then, you can get all the data from the CPU cache. The implementation for CAR is src/cpu/intel/car/non-evict/cache_as_ram.S.

Another way to run C code especially implemented in ARM system is relocating the bootblock code to SRAM. System on a Chip (SoC) has an ARM CPU and often includes SRAM as a cache. For example, QEMU VExpress machine has SRAM is located at 0x48000000 that we can know in qemu/hw/arm/vexpress.c#L120.

However, what should the system that doesn’t have SRAM do? My target machine, QEMU virt machine, doesn’t have SRAM according to the implementation of qemu/hw/arm/virt.c. In this case, should we initialize DRAM earlier?

The answer is No but it’s only for my project. Because QEMU is not actual hardware so DRAM already works. I just need to relocate the bootblock code to DRAM directly. I believe that other ARM systems that have no SRAM should initialize DRAM earlier.

You can see the relocation code in bootblock_custom.S under review. https://review.coreboot.org/c/coreboot/+/33387

[GSoC] Coreboot Coverity, weeks 5-7

Hello again! It’s been three weeks since my last post (well, one of those weeks I was on vacation, so more like two weeks), so there are many many updates to write about. If you recall from my last post, Coverity has been down for maintenance, so I started looking around for other things to do. Here is a list of some of the highlights:

  • Coverity isn’t the only static analyzer we use – coreboot also runs nightly scans using the Clang Static Analyzer, an open source tool from the LLVM project. The results from these scans aren’t as detailed as Coverity and unfortunately contain many duplicates and false positives, but I was able to submit patches for some of the issues. Usability of the analyzer is also still somewhat limited, since we have no way of recording fixes for past issues or ignoring false positives. (A framework like CodeChecker might be able to help with this.)
  • coreboot will (hopefully) soon be (almost) VLA free! Variable length arrays (VLAs) were a feature added in C99 that allows the length of an array to be determined at run time, rather than compile time. While convenient, this is also very dangerous, since it allows use of an unlimited amount of stack memory, potentially leading to stack overflow. This is especially dangerous in coreboot, which often has very little stack space to begin with. Fortunately, almost all of the VLAs in coreboot can be replaced with small fixed-sized buffers, aside from one tricky exception that will require a larger rewrite. This patch was inspired by a similar one for Linux, and is currently under review.
  • Similar to the above, coreboot will also soon be free of implicit fall throughs! Switches in C have the unfortunate property of each case statement implicitly falling through to the next one, which has led to untold number of bugs over the decades and at least 38 in coreboot itself (by Coverity’s count). GCC however recently added the -Wimplicit-fallthrough warning, which we can use to banish them for good. As of this patch, all accidental fall throughs have been eliminated, with the intentional ones marked with the /* fall through */ comment. Hopefully this is the last we see of this type of bug.
  • coreboot now has a single implementation of <stdint.h>. Previously, each of the six supported architectures (arm, arm64, mips, ppc64, riscv, and x86) had their own custom headers with arch-specific quirks, but now they’ve all been tidied up and merged together. This should make implementing further standard library headers easier to do (such as <inttypes.h>).
  • Finally, many coreboot utilities and tools such as flashrom, coreinfo, nvramtool, inteltool, ifdtool, and cbmem have stricter compiler warnings, mostly from -Wmissing-prototypes and -Wextra. Fixing and adding compiler warnings like these are very easy to do, and make great first time contributions (-Wconversion and -Wsign-compare in particular can always use more work).

That’s it for this week! Thankfully Coverity is back online now, so expect more regular blog posts in the future.

Creating hardware where no hardware exists

The laptop industry was still in its infancy back in 1990, but it still faced a core problem that we do today - power and thermal management are hard, but also critical to a good user experience (and potentially to the lifespan of the hardware). This is in the days where DOS and Windows had no memory protection, so handling these problems at the OS level would have been an invitation for someone to overwrite your management code and potentially kill your laptop. The safe option was pushing all of this out to an external management controller of some sort, but vendors in the 90s were the same as vendors now and would do basically anything to avoid having to drop an extra chip on the board. Thankfully(?), Intel had a solution. The 386SL was released in October 1990 as a low-powered mobile-optimised version of the 386. Critically, it included a feature that let vendors ensure that their power management code could run without OS interference. A small window of RAM was hidden behind the VGA memory[1] and the CPU configured so that various events would cause the CPU to stop executing the OS and jump to this protected region. It could then do whatever power or thermal management tasks were necessary and return control to the OS, which would be none the wiser. Intel called this System Management Mode, and we've never really recovered. Step forward to the late 90s. USB is now a thing, but even the operating systems that support USB usually don't in their installers (and plenty of operating systems still didn't have USB drivers). The industry needed a transition path, and System Management Mode was there for them. By configuring the chipset to generate a System Management Interrupt (or SMI) whenever the OS tried to access the PS/2 keyboard controller, the CPU could then trap into some SMM code that knew how to talk to USB, figure out what was going on with the USB keyboard, fake up the results and pass them back to the OS. As far as the OS was concerned, it was talking to a normal keyboard controller - but in reality, the "hardware" it was talking to was entirely implemented in software on the CPU. Since then we've seen even more stuff get crammed into SMM, which is annoying because in general it's much harder for an OS to do interesting things with hardware if the CPU occasionally stops in order to run invisible code to touch hardware resources you were planning on using, and that's even ignoring the fact that operating systems in general don't really appreciate the entire world stopping and then restarting some time later without any notification. So, overall, SMM is a pain for OS vendors. Change of topic. When Apple moved to x86 CPUs in the mid 2000s, they faced a problem. Their hardware was basically now just a PC, and that meant people were going to try to run their OS on random PC hardware. For various reasons this was unappealing, and so Apple took advantage of the one significant difference between their platforms and generic PCs. x86 Macs have a component called the System Management Controller that (ironically) seems to do a bunch of the stuff that the 386SL was designed to do on the CPU. It runs the fans, it reports hardware information, it controls the keyboard backlight, it does all kinds of things. So Apple embedded a string in the SMC, and the OS tries to read it on boot. If it fails, so does boot[2]. Qemu has a driver that emulates enough of the SMC that you can provide that string on the command line and boot OS X in qemu, something that's documented further here. What does this have to do with SMM? It turns out that you can configure x86 chipsets to trap into SMM on arbitrary IO port ranges, and older Macs had SMCs in IO port space[3]. After some fighting with Intel documentation[4] I had Coreboot's SMI handler responding to writes to an arbitrary IO port range. With some more fighting I was able to fake up responses to reads as well. And then I took qemu's SMC emulation driver and merged it into Coreboot's SMM code. Now, accesses to the IO port range that the SMC occupies on real hardware generate SMIs, trap into SMM on the CPU, run the emulation code, handle writes, fake up responses to reads and return control to the OS. From the OS's perspective, this is entirely invisible[5]. We've created hardware where none existed. The tree where I'm working on this is here, and I'll see if it's possible to clean this up in a reasonable way to get it merged into mainline Coreboot. Note that this only handles the SMC - actually booting OS X involves a lot more, but that's something for another time. [1] If the OS attempts to access this range, the chipset directs it to the video card instead of to actual RAM. [2] It's actually more complicated than that - see here for more. [3] IO port space is a weird x86 feature where there's an entire separate IO bus that isn't part of the memory map and which requires different instructions to access. It's low performance but also extremely simple, so hardware that has no performance requirements is often implemented using it. [4] Some current Intel hardware has two sets of registers defined for setting up which IO ports should trap into SMM. I can't find anything that documents what the relationship between them is, but if you program the obvious ones nothing happens and if you program the ones that are hidden in the section about LPC decoding ranges things suddenly start working. [5] Eh technically a sufficiently enthusiastic OS could notice that the time it took for the access to occur didn't match what it should on real hardware, or could look at the CPU's count of the number of SMIs that have occurred and correlate that with accesses, but good enough comment count unavailable comments