Updating LZ4 during my GSoC

Introduction

In this blog post, I will give an insight into my Google Summer of Code project, where I worked on coreboot. My target goals were updating compression and decompression algorithms LZ4 and LZMA to their newest version.

In the whole process, I learned a lot about coreboot as an open-source community, and about GSoC, which could help students considering applying for a GSoC themselves.

This blog won’t be addressing technical details, there will be a section at the end about my work and the results of it, but the rest of the blog will be a bit more meta.

What did I learn about open-source communities?

Before this project, I never contributed to any open-source project. My studies don’t involve much programming, therefore I’m not that professional, my programming skills are homebrewed. Now open source communities, especially as big as coreboot, really intimidated me (they still do somehow). There are many people, all of them probably better programmers than me, having more knowledge of the project as well. How could I help them with anything useful at all? And what if I push my first patch and it’s just embarrassing? Having a mentor helped, I think without someone to talk to, I would have never stuck my nose into any big open-source project.

And all the people I met were very kind and helpful if needed. And I learned that critics, especially in text form, may sound harsher in my head than they are meant to.

Thoughts about Google’s Summer of Code

GSoC is a really good opportunity to get a feeling for working on open source. If you are a student searching for insight into open source, guided by a mentor, a GSoC project can be just right for you.

But something I underestimated, was, how time-intensive such a project (even a medium one) is. I probably was too naive, but beforehand, I just talked myself into “Yeah this won’t be a problem, I can just work around two days a week for this, just shift stuff to other days.” Well, it turns out, that’s not how workload and mental load work. For me at least. I do work besides my studies and in the first weeks of GSoC, I was overwhelmed by the combination. Besides having fewer hours to do other things, just having more things to think about can be quite stressful under some conditions.

GSoC allows you to stretch your project, so there is less work per week. In total, the project hours stay the same. This opportunity really helped me because I underestimated the load the weekly hours would apply.

If I were to apply to GSoC again, I would double-check if such a commitment in terms of work is feasible for me. And that I would recommend to everyone thinking about applying. GSoC can be interesting and fun, but you need to be able to provide the time needed for the project.

What did I do and achieve?

I started my project by updating LZ4 code in coreboot. After that, I planned to move on to another compression algorithm, LZMA. My hopes were to increase the decompression speed and the compression ratio at the same time. Looking at the release notes for LZ4 since the version currently used by coreboot (spanning 8 years of releases), they stated an increase in speed and compression factor.

To get an overview, I first searched for all places where LZ4 was used and checked the version which the code was from. I found out, that there are five files containing code from the LZ4 repository, where every file contains another subset of functions from the original file. 3 of these files were imported from LZ4 wrapper files.

Then I fetched LZ4 code from the newest LZ4 release and merged it into the files where old LZ4 code was used. After merging, there were many LZ4 functions never used in our use case. Therefore I searched for each source file through each wrapper file to find out which functions were used and which I could remove. This way I could ensure there was no unused code in these files, providing a better overview

After that, there still were many errors that needed to be resolved. That took the most time, being more complicated than I assumed. In the end, all tests passed, and building a rom and running it with QEMU worked just fine, my last ToDo was to test how much the new version was faster than the old if it was at all.

Release notes stated many speed improvements, so I hoped this would show up after my work.

The whole process took longer than I thought. Therefore I will miss the goal of updating LZMA. As I am writing this, my patch is open for review and I am in the process of creating statistics that may show that there is a speed improvement. If there is not, maybe there is no good reason to change to a newer LZ4 version. As one comment states, coreboot does require safe in-place decompression, which may be a problem with the new version and thus would have to be checked.

My work is public on the coreboot Gerrit under this link https://review.coreboot.org/c/coreboot/+/77148. I do hope my patch will be merged, as I want to contribute to this project. But even if a merge may be rejected, that is part of open-source work too. I’ll try to improve on my next patch. This is my first open source contribution and it’s not perfect, but it certainly won’t be my last.

What could be done in the future?

As stated in a comment on my patch, in-place decompression is important for coreboot. There is a merged pull request from 2016 resolving that issue, but its functionality may have been lost in further development. Therefore, the new LZ4 version has to be checked for in-place safety.

To tweak compression/decompression speeds and compression factor one might want to edit compression parameters. One could do this using a Kconfig entry.

Also, it occurred that after building coreboot (running make in coreboot dictionary), when memory regions and sizes of the coreboot.rom are printed, there is no compression listed for the payload, even when it certainly has been compressed.

Statistics

I tested this on my laptop with KVM and QEMU, on an AMD® Ryzen 5 5500u. I changed my system’s CPU affiliation, to make 2 of my cores unused (they still got used, but not much, usually ~1%). Then I assigned these 2 CPUs to my VM to make different runs more comparable. Still, different runs got different timings, so I made 6 runs for the old LZ4 and the new LZ4 version each.

Avg ramstage decompression timeRamstage decompression time std devAvg payload decompression timePayload decompression time std dev
Current version420µs54µs229µs64µs
Updated version520µs113µs219µs4µs
Ramstage compression ratioPayload compression ratio
Current version1,5071,452
Updated version1,511,454

These values indicate that there is a very small improvement in the compression ratio. Regarding decompression time, most tests have a relatively high standard deviation, which makes the results less statistically relevant. Ramstage decompression seems to have slowed down (~24%) while average payload decompression got 4-5% faster.

All in all, these results are everything but exciting to me. Although a newer version may have other advantages, it seems that there is no decompression time improvement as I hoped and compression ratio improvements are so small, that they might not be noticeable.

[GSoC] Optimize Erase Function Selection, wrap-up

GSoC 2022 coding period is about to come to an end this week. It has been an enriching 12 weeks of reading old code, designing algorithms and structures, coding, testing, and hanging out over IRC! I’d like to take this opportunity to present my work and details on how it has impacted Flashrom. 🙂

You can find the complete list of commits I made during GSoC with this gerrit query. Some of the patches aren’t currently merged and are under review. In any case, you are most welcome to join the review (which will likely be very helpful for me).

Definitions

Let me clarify some terminologies used hereafter

  • Region: A list of contiguous addresses on the chip
  • Layout: List of regions
  • Locked region: There are some regions on the chip that cannot be accessed directly or in some cases cannot be accessed at all.
  • Page: The entire flash memory is divided into uniform pages that can be written on. You have to write on an entire page at once.
  • Erase block/sector: According to each erase function, flash memory is divided into regions (not necessarily uniform sized) that can be erased. You have to erase an entire erase block at once. So each erase function has a different erase block layout.
  • Sub-block: An erase block is a sub-block of another erase block if it is completely inside the latter.
  • Homogenous erase layout: All erase blocks in a layout are of uniform size.
  • Nonhomogenous erase layout: There might be erase blocks of varying sizes in this layout.

Need for this project

To change any 0 to a 1 in the contents of a NOR-flash chip, a full block of data needs to be erased. So to change a single byte, it is sometimes required to erase a whole block, which we’ll call erase overhead. Most flash chips support multiple erase-block sizes. Flashrom keeps a list of these sizes and ranges for each supported flash chip and maps them to internal erase functions. Most of these lists were sorted by ascending size. I added a simple test to ensure this and corrected the ones which gave errors.

Earlier, Flashrom tried all available erase functions for a flash chip during writes. Usually, only the first function was used, but if anything went wrong on the wire, or the programmer rejected a function, it falls back to the next. As the functions are sorted by size, this results in a minimum erase overhead.

However, if big portions of the flash contents are re-written, using the smallest erase block size results in transaction overhead, and also erasing bigger blocks at once often results in shorter erase times inside the flash chip.

So in a nutshell, the erase function selection in Flashrom was sequential, i.e. they started from the function with the smallest erase block and moved to the next one only in case of an error. This was inefficient in cases where bigger chunks of the flash had to be erased.

Current implementation

After rigorous discussion with my mentors and with the community, we decided on a look-ahead algorithm for selecting the erase functions or the erase blocks to be more precise. From the layout of erase blocks as seen by different erase functions, we make an optimal selection.

We start with the first erase function (remember this erase function has the smallest erase blocks) and mark those blocks which lie completely inside the region to be erased. Then moving on to subsequent erase layouts and marking those erase blocks that have more than half of the sub-blocks marked and then unmarking the marked sub-blocks. Finally, erase all the marked erase blocks. This strategy ensures that if we have a large contiguous region to be erased then we would use an appropriately large erase function to erase it.

The above-mentioned strategy had some challenges and required a few tweaks:

  • The region to be erased may not be aligned to any erase blocks. This could be easily overcome by extending the ‘to be erased’ region to align it with some erase block.
  • The above solution of extending the region brought a new challenge – the newly aligned region might overlap with write-protected regions. In this case, the erase function would fail to erase the erase block containing this region.
  • It may happen that some erase functions are not supported by the programmer. This would lead to catastrophic failure. To this end, we thought of first getting a list of erase functions supported by the programmer and then running the algorithm on this subset of erase functions.

So with this, we were all set to begin the implementation of the algorithm. Or so I thought. Flashrom had stored information regarding the erase layout in a nested format which was very difficult to use so I had to flatten them out and fit them in a data structure that would give me quick information about sub-blocks as well. 😮‍💨

For a better understanding of the algorithm, I made the following flowchart.

Testing

Testing on hardware is difficult because you need all the different chips and programmers and test on each one of them to be sure that your code is working correctly. On top of that, identifying all possible scenarios adds to the burden. This definitely takes a lot of time and effort and my mentors and the community is helping a lot with it.

The current progress of testing can be seen here. Feel free to suggest more cases and help with the testing. It will definitely help a lot.

Future work

Definitely, the code or algorithm isn’t fully optimized and can be improved further. Also one can fix the progress bar feature to Flashrom, which got broken due to my code😅.

Acknowledgments

I’d like to thank Thomas Heijligen and Simon Buhrow for being excellent mentors, Nico for providing constant code reviews and giving regular inputs, and everyone in the Flashrom community for being helpful and friendly.

[GSoC] Address Sanitizer, Wrap-up

Hello everyone. The coding period for GSoC 2020 is now officially over and it’s time for the final evaluation. I’ll use this blog post to summarize the project details, illustrate the instructions to use ASan, and discuss some ideas on what can be done further to enhance this feature.

You can find the complete list of commits I made during GSoC with this Gerrit query.


Project details

Memory safety is hard to achieve. We, as humans, are bound to make mistakes in our code. While it may be straightforward to detect memory corruption bugs in few lines of code, it becomes quite challenging to find those bugs in a massive code. In such cases, ‘Address Sanitizer’ may prove to be useful and could help save time.

Address Sanitizer, also known as ASan, is a runtime memory debugger designed to find out-of-bounds accesses and use-after-scope bugs. Over the past couple of weeks, I’ve been working to add support for ASan to coreboot. You can read my previous blog posts (Part 1, Part 2, and Part 3) to see my progress throughout the summer.

Here is a description of the components included in the project:

GCC patch

The design of ASan in coreboot is based on its implementation in Linux kernel, also known as Kernel Address Sanitizer (KASAN). However, we can’t directly port the code from Linux.

Unlike the Linux kernel which has a static shadow region layout, we have multiple stages in coreboot and thus require a different shadow offset address. Unfortunately, GCC currently only supports adding a static shadow offset at compile time using -fasan-shadow-offset flag. Therefore the foremost task was to add support for dynamic shadow offset to GCC.

We enabled GCC to determine the shadow offset address at runtime using a callback function named __asan_shadow_offset. This supersedes the need to specify this address at compile time. GCC then makes use of this shadow offset in its internal mem_to_shadow translation function to poison stack variables’ redzones.

The patch further allowed us to place the shadow region in a separate linker section. This ensured if a platform didn’t have enough memory space to hold the shadow buffer, the build would fail.

The way the patch was introduced to GCC’s code base ensures that if
one compiles a piece of code with the new switch enabled i.e. --param asan-use-shadow-offset-callback=1 but has not applied the patch itself to GCC, the compiler will throw the following error because the newly introduced switch is unknown for an out of box GCC: invalid --param name 'asan-use-shadow-offset-callback‘.

I believe this patch might also be useful to the developers who contribute to other open-source projects. Hence, I’ve put this patch on GCC’s mailing list and asked GCC’s developers to include this feature in their upcoming release.

ASan in ramstage

Since ramstage uses DRAM, regardless of the platform, it should always have enough room in the memory to hold the shadow buffer. Therefore, I began by adding support for ASan in ramstage on x86 architecture.

To reserve space in memory for the shadow region, I created a separate linker section and named it asan_shadow. Here, instead of allocating shadow memory for the whole memory region which includes drivers and hardware mapped addresses, I only defined shadow region for the data and heap sections.

Then I started porting KASAN library functions, tweaking them to make them suitable for coreboot.

The next task was to initialize the shadow memory at runtime. I created a function called asan_init which unpoisons i.e. sets the shadow memory corresponding to the addresses in the data and heap sections to zero.

In the case of global variables, instead of poisoning the redzones directly, the compiler inserts constructors invoking the library function named __asan_register_globals to populate the relevant shadow memory regions. So, I wrote a function named asan_ctors which calls these constructors at runtime and added a call to this function to asan_init().

After doing some tests, I realized that compiler’s ASan instrumentation cannot insert asan_load or asan_store state checks in the memory functions like memset, memmove and memcpy as they are written in assembly. So, I added manual checks using the library function named check_memory_region for both source and destination pointers.

ASan in romstage

Once I had ASan in ramstage working as expected, I started adding support for ASan to romstage.

It was challenging because of two reasons. First, even within the same architecture, the size of L1 cache varies across the platforms from 32KB in Braswell to 80KB in Ice Lake and thus we can’t enable ASan in romstage for all platforms by doing tests on a handful of devices. Second, the size of a cache is very small compared to RAM making it difficult to fit asan_shadow section in the limited memory.

Thankfully, the latter issue, to a large extent, was solved by our GCC patch which allowed us to append asan_shadow section to the region already occupied by the coreboot program and make efficient use of limited memory.

Now to resolve the first issue, I introduced a Kconfig option called HAVE_ASAN_IN_ROMSTAGE to denote if a particular platform supports ASan in romstage. This allowed us to enable ASan in romstage only for the platforms which have been tested.

Based on the hardware available with me and my mentor, I enabled ASan in romstage for Haswell and Apollo Lake platforms, apart from QEMU.


Project usage

Instructions for how to use ASan are included in ASan documentation. I’ll restate them with an example here.

Suppose there is a stack-out-of-bounds error in cbfs.c that we aren’t aware of. Let’s see if ASan can help us detect it.

int cbfs_boot_region_device(struct region_device *rdev)
{
	int stack_array[5], i;
	boot_device_init();

	for (i = 10; i > 0; i--)
		stack_array[i] = i;

	return vboot_locate_cbfs(rdev) &&
	       fmap_locate_area_as_rdev("COREBOOT", rdev);
}

First, we have to enable ASan from the configuration menu. Just select Address sanitizer support from General setup menu. Now, build coreboot and run the image.

ASan will report the following error in the console log:

ASan: stack-out-of-bounds in 0x7f7432fd
Write of 4 bytes at addr 0x7f7c2ac8

Here 0x7f7432fd is the address of the last good instruction before the bad access. In coreboot, stages are relocated. So, we have to normalize this address to find the instruction which causes this error.

For this, let’s subtract the start address of the stage i.e. 0x7f72c000. The difference we get is 0x000172fd. As per our console log, this error happened in the ramstage. So, let’s look at the sections headers of ramstage from ramstage.debug.

 $ objdump -h build/cbfs/fallback/ramstage.debug

build/cbfs/fallback/ramstage.debug:     file format elf32-i386

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00070b20  00e00000  00e00000  00001000  2**12
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .ctors        0000036c  00e70b20  00e70b20  00071b20  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
  2 .data         0001c8f4  00e70e8c  00e70e8c  00071e8c  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
  3 .bss          00012940  00e8d780  00e8d780  0008e780  2**7
                  ALLOC
  4 .heap         00004000  00ea00c0  00ea00c0  0008e780  2**0
                  ALLOC

Here the offset of the text segment is 0x00e00000. Let’s add this offset to the difference we calculated earlier. The resultant address is 0x00e172fd.

Next, we read the contents of the symbol table and search for a function having an address closest to 0x00e172fd.

 $ nm -n build/cbfs/fallback/ramstage.debug
........
........
00e17116 t _GLOBAL__sub_I_65535_1_gfx_get_init_done
00e17129 t tohex16
00e171db T cbfs_load_and_decompress
00e1729b T cbfs_boot_region_device
00e17387 T cbfs_boot_locate
00e1740d T cbfs_boot_map_with_leak
00e174ef T cbfs_boot_map_optionrom
........

The symbol having an address closest to 0x00e172fd is cbfs_boot_region_device and its address is 0x00e1729b. This is the function in which our memory bug is present.

Now, as we know the affected function, we read the assembly contents of cbfs_boot_region_device which is present in cbfs.o to find the faulty instruction.

 $ objdump -d build/ramstage/lib/cbfs.o
........
........
  51:   e8 fc ff ff ff          call   52 <cbfs_boot_region_device+0x52>
  56:   83 ec 0c                sub    $0xc,%esp
  59:   57                      push   %edi
  5a:   83 ef 04                sub    $0x4,%edi
  5d:   e8 fc ff ff ff          call   5e <cbfs_boot_region_device+0x5e>
  62:   83 c4 10                add    $0x10,%esp
  65:   89 5f 04                mov    %ebx,0x4(%edi)
  68:   4b                      dec    %ebx
  69:   75 eb                   jne    56 <cbfs_boot_region_device+0x56>
........

Let’s look for the last good instruction before the error happens. It would be the one present at the offset 62 (0x00e172fd0x00e1729b).

The instruction is add $0x10,%esp and it corresponds to for (i = 10; i > 0; i--) in our code. It means the very next instruction i.e. mov %ebx,0x4(%edi) is the one that causes the error. Now, if you look at C code of cbfs_boot_region_device() again, you’ll find that this instruction corresponds to stack_array[i] = i.

Voilà! we just caught the memory bug using ASan.


Future work

While my work for GSoC 2020 is complete, I think the following extensions would be useful for this project:

Heap buffer overflow

Presently, ASan doesn’t detect out-of-bounds accesses for the objects defined in heap. Fortunately, the support for these types of memory bugs can be added easily.

We just have to make sure that whenever some block of memory is allocated in the heap, the surrounding areas (redzones) are poisoned. Correspondingly, these redzones should be unpoisoned when the memory block is de-allocated.

Post-processing script

Unlike Linux, coreboot doesn’t have %pS printk format to dereference a pointer to its symbolic name. Therefore, we normalize the pointer address manually as I showed above to determine the name of the affected function and further use it to find the instruction which causes the error.

A custom script can be written to automate this process.

Support for other platforms and architectures

Jenkins builder built successfully for all x86 boards except for the ones that hold either Braswell SoC or i440bx northbridge where the cache area got full and thus couldn’t fit the asan_shadow section. It shows that support for ASan in romstage can be easily added to most x86 platforms. We just have to test them by selecting HAVE_ASAN_IN_ROMSTAGE option and resolve the compilation errors if any.

Enabling ASan in ramstage on other architectures like ARM or RISC-V should be easy too. We just have to make sure the shadow memory is initialized as early as possible when ramstage is loaded. This can be done by making a function call to asan_init() at the appropriate place.

Similarly, ASan in romstage can be enabled for other architectures. I have mentioned some key points in ASan documentation which could be used by someone who might be interested in doing so.

For the platforms that don’t have enough space in the cache to hold the asan_shadow section, we have to come up with a new translation function that uses a much compact shadow memory. Since the stack buffers are protected by the compiler, we’ll also have to create another GCC patch forcing it to use the new translation function for this particular platform.


Acknowledgement

I’d like to thank my mentor Werner Zeh for his continued assistance during the past 13 weeks. This project certainly wouldn’t have been possible without his valuable suggestions and the knowledge he shared. I’d also like to thank Patrick Georgi for helping me with the work authorization initially and later supervising my work during the time when Werner was on vacation.

Further, I am grateful to every member of the community for assisting me whenever I got stuck, reviewing my code, reading my blogs, and sharing their feedback.


It has been an amazing journey and I look forward to contributing to coreboot in the future.

[GSoC] Address Sanitizer, Part 3

Hello again! The third and final phase of GSoC is coming to an end and I’m glad that I made it this far. In this blog post, I’d like to outline the work done in the last two weeks.


Memory functions

Compiler’s ASan instrumentation cannot insert asan_load or asan_store memory checks in the memory functions like memset, memmove and memcpy. This is because these functions are written in assembly.

While Linux kernel replaces these functions with their variants which are written in C, I took a different approach.

In coreboot, the assembly instructions for these memory functions are embedded into C code using GNU’s asm extension. This provided me with an opportunity to use the ASan library function named check_memory_region to manually check the memory state before performing each of these operations. At the start of each function, I added the following code snippet:

#if (ENV_ROMSTAGE && CONFIG(ASAN_IN_ROMSTAGE)) || (ENV_RAMSTAGE && CONFIG(ASAN_IN_RAMSTAGE))
	check_memory_region((unsigned long)src, n, false, _RET_IP_);
	check_memory_region((unsigned long)dest, n, true, _RET_IP_);
#endif

This way neither I had to fiddle with the assembly instructions nor I had to replace these functions with their C variants which might have caused some performance issues. [CB:44307]


Documentation

Since I finished a little early with what I had proposed to deliver, Werner suggested that I should write documentation on ASan and I am happy that he did. When I read the intro of the documentation guidelines, I realized how a feature as significant as ASan might go unnoticed and unused by many if it lacks proper documentation.

In ASan documentation, I have tried my best to answer questions like how to use ASan, what kind of bugs can be detected, what devices are currently supported, and how ASan support can be added to other architectures like ARM or RISC-V.

The documentation is not final yet and I’d really appreciate inputs from the community. So, please give it a read and share your feedback. [CB:44814]


In the end, I’d like to announce that ASan patches have been merged into the coreboot source tree. You can go ahead and make use of this debugging tool to look for memory corruption bugs in your code.

[ASan patches]

[GSoC] Address Sanitizer, Part 2

Hello again! Its been a month since my last blog post. So, there are many updates I’d like to share. I’ll first cover the Address Sanitizer (ASan) algorithm in detail and then summarize the progress made until the second evaluation period.


If you recall my last post, I had briefly talked about the principle behind ASan. Now, let us discuss the ASan algorithm in much more depth. But first, we need to understand the memory layout and how the compiler’s instrumentation works.

Memory mapping

ASan divides the memory space into 2 disjoint classes:

  • Main Memory (mem): This represents the original memory used by our coreboot program in a particular stage.
  • Shadow Memory (shadow): This memory region contains the shadow values. The state of each 8 aligned bytes of mem is encoded in a byte in shadow. As a consequence, the size of this region is equal to 1/8th of the size of mem. To reserve a space in memory for this class, we added a new linker section and named it asan_shadow.

This is how we added asan_shadow in romstage:

#if ENV_ROMSTAGE && CONFIG(ASAN_IN_ROMSTAGE)
	_shadow_size = (_ebss - _car_region_start) >>  3;
	REGION(asan_shadow, ., _shadow_size, ARCH_POINTER_ALIGN_SIZE)
#endif

and in ramstage:

#if ENV_RAMSTAGE && CONFIG(ASAN_IN_RAMSTAGE)
	_shadow_size = (_eheap - _data) >> 3;
	REGION(asan_shadow, ., _shadow_size, ARCH_POINTER_ALIGN_SIZE)
#endif

The linker symbol pairs (_car_region_start, _ebss) and ( _data, _eheap) are references to the boundary addresses of mem in romstage and ramstage respectively.

Now, there exists a correspondence between mem and shadow classes and we have a function named asan_mem_to_shadow that performs this translation:

void *asan_mem_to_shadow(const void *addr)
{
#if ENV_ROMSTAGE
	return (void *)((uintptr_t)&_asan_shadow + (((uintptr_t)addr -
		(uintptr_t)&_car_region_start) >> ASAN_SHADOW_SCALE_SHIFT));
#elif ENV_RAMSTAGE
	return (void *)((uintptr_t)&_asan_shadow + (((uintptr_t)addr -
		(uintptr_t)&_data) >> ASAN_SHADOW_SCALE_SHIFT));
#endif
}

In other words, asan_mem_to_shadow maps each 8 bytes of mem to 1 byte of shadow.

You may wonder what is stored in shadow? Well, there are only 9 possible shadow values for any aligned 8 bytes of mem:

  • The shadow value is 0 if all 8 bytes in qword are unpoisoned (i.e. addressable).
  • The shadow value is negative if all 8 bytes in qword are poisoned (i.e. not addressable).
  • The shadow value is k if the first k bytes are unpoisoned but the rest 8-k bytes are poisoned. Here k could be any integer between 1 and 7 (1 <= k <= 7).

When we say a byte in mem is poisoned, we mean one of these special values are written into the corresponding shadow.

Instrumentation

Compiler’s ASan instrumentation adds a runtime check to every memory instruction in our program i.e. before each memory access of size 1, 2, 4, 8, or 16, a function call to either __asan_load(addr) or __asan_store(addr) is added.

Next, it protects stack variables by inserting gaps around them called ‘redzones’. Let’s look at an example:

int foo ()
{
char a[24] = {0};
int b[2] = {0};
int i;

a[5] = 1;

for (i = 0; i < 10; i++)
    b[i] = i;

return a[5] + b[1];
}

For this function, the instrumented code will look as follows:

int foo ()
{
char redzone1[32]; // Slot 1, 32-byte aligned
char redzone2[8];  // Slot 2
char a[24] = {0};  // Slot 3
char redzone3[32]; // Slot 4, 32-byte aligned
int redzone4[6];   // Slot 5
int b[2] = {0};    // Slot 6
int redzone5[8];   // Slot 7, 32-byte aligned
int redzone6[7];
int i;
int redzone7[8];

__asan_store1(&a[5]);
a[5] = 1;

for (i = 0; i < 10; i++)
    __asan_store4(&b[i]);
    b[i] = i;

__asan_load1(&a[5]);
__asan_load4(&b[1]);
return a[5] + b[1];
}

As you can see, the compiler has inserted redzones to pad each stack variable. Also, it has inserted function calls to __asan_store and __asan_load before writes and reads respectively.

The shadow memory for this stack layout is going to look like this:

Slot 1:        0xF1F1F1F1
Slots 2, 3:    0xF1000000  
Slot 4:        0xF1F1F1F1
Slots 5, 6:    0xF1F1F100  
Slot 7:        0xF1F1F1F1

Here F1 being a negative value represents that all 8 bytes in qword are poisoned whereas the shadow value of 0 represents that all 8 bytes in qword are accessible. Notice that in the slots 2 and 3, the variable ‘a’ is concatenated with a partial redzone of 8 bytes to make it 32 bytes aligned. Similarly, a partial redzone of 24 bytes is added to pad the variable ‘b’.

The process of protecting global variables is a little different from this. We’ll talk about it later in this blog post.

Now, as we have looked at the memory mapping and the instrumented code, let’s dive into the algorithm.

ASan algorithm

The algorithm is pretty simple. For every read and write operation, we do the following:

  • First, we find the address of the corresponding shadow memory for the location we are writing to or reading from. This is done by asan_mem_to_shadow().
  • Then we determine if the access is valid based on the state stored in the shadow memory and the size requested. For this, we pass the address returned from asan_mem_to_shadow() to memory_is_poisoned(). This function dereferences the shadow pointer to check the memory state.
  • If the access is not valid, it reports an error in the console mentioning the instruction pointer address, the address where the bug was found, and whether it was read or write. In our ASan library, we have a dedicated function asan_report() for this.
  • Finally, we perform the operation (read or write).
  • Then, we continue and move over to the next instruction.

Here’s a pictorial representation of the algorithm:

Note that whether access is valid or not, ASan never aborts the current operation. However, the calls to ASan functions do add a performance penalty of about ~1.5x.

Now, as we have understood the algorithm, let’s look at the function foo() again.

int foo ()
{
char a[24] = {0};
int b[2] = {0};
int i;

__asan_store1(&a[5]);
a[5] = 1;

for (i = 0; i < 10; i++)
    __asan_store4(&b[i]);
    b[i] = i;

__asan_load1(&a[5]);
__asan_load4(&b[1]);
return a[5] + b[1];
}

Have a look at the for loop. The array ‘b’ is of length 2 but we are writing to it even beyond index 1.

As the loop is executed for index 2, ASan checks the state of the corresponding shadow memory for &b[2]. Now, let’s look at the shadow memory state for slot 7 again (shown in the previous section). It is 0xF1F1F1F1. So, the shadow value for the location b[2] is F1. It means the address we are trying to access is poisoned and thus ASan is triggered and reports the following error in the console log:

ASan: stack-out-of-bounds in 0x07f7ccb1
Write of 4 bytes at addr 0x07fc8e48

Notice that 0x07f7ccb1 is the address where the instruction pointer was pointing to and 0x07fc8e48 is the address of the location b[2].


ASan support for romstage

In the romstage, coreboot uses cache to act as a memory for our stack and heap. This poses a challenge when adding support for ASan to romstage because of two reasons.

First, even within the same architecture, the size of cache varies across the platforms. So, unlike ramstage, we can’t enable ASan in romstage for all platforms by doing tests on a handful of devices. We have to test ASan on each platform before adding this feature. So, we decided to introduce a new Kconfig option HAVE_ASAN_IN_ROMSTAGE to denote if a particular platform supports ASan in romstage. Now, for each platform for which ASan in romstage has been tested, we’ll just select this config option. Similarly, we also introduced HAVE_ASAN_IN_RAMSTAGE to denote if a given platform supports ASan in ramstage.

The second reason is that the size of a cache is very small compared to the RAM. This is critical because the available memory in the cache is quite low. In order to fit the asan_shadow section on as many platforms as possible, we have to make efficient use of the limited memory available. Thankfully, to a large extent, this problem was solved by our GCC patch which allowed us to append the shadow memory buffer to the region already occupied by the coreboot program. (I ran Jenkins builder and it built successfully for all boards except for the ones that hold either Braswell SoC or i440bx northbridge where the cache area got full and thus couldn’t fit the asan_shadow section.)

In the first stage, we have enabled ASan in romstage for QEMU, Haswell and Apollolake platforms as they have been tested.

Further, the results of Jenkins builder indicate that, with the current translation function, the support for ASan in romstage can be added on all x86 platforms except the two mentioned above. Therefore, I’ve asked everyone in the community to participate in the testing of ASan so that this debugging tool can be made available on as many platforms as possible before GSoC ends.


Global variables

When we initially added support for ASan in ramstage, it wasn’t able to detect out-of-bounds bugs in case of global variables. After debugging the code, I found that the redzones for the global variables were not poisoned. So, I went through GCC’s ASan and Linux’s KASAN implementation again and realized that the way in which the compiler protects global variables was very much different from its instrumentation for stack variables.

Instead of padding the global variables directly with redzones, it inserts constructors invoking the library function named __asan_register_globals to populate the relevant shadow memory regions. To this function, compiler also passes an instance of the following type:

struct asan_global {
       /* Address of the beginning of the global variable. */
       const void *beg;

       /* Initial size of the global variable. */
       size_t size;

       /* Size of the global variable + size of the red zone. This
          size is 32 bytes aligned. */
       size_t size_with_redzone;

       /* Name of the global variable. */
       const void *name;

       /* Name of the module where the global variable is declared. */
       const void *module_name;

       /* A pointer to struct that contains source location, could be NULL. */
       struct asan_source_location *location;
     }

So, to enable the poisoning of global variables’ redzones, I created a function named asan_ctors which calls these constructors at runtime and added it to ASan initialization code for the ramstage.

You may wonder why asan_ctors() is only added to ramstage? This is because the use of global variables is prohibited in coreboot for romstage and thus there is no need to detect global out-of-bounds bugs.


Next steps..

In the third coding period, I’ve started working on adding support for ASan to memory functions like memset, memmove and memcpy. I’ll push the patch on Gerrit pretty soon.

Once, this is done, I’ll start writing documentation on ASan answering questions like how to use ASan, what kind of bugs can be detected, what devices are currently supported, and how ASan support can be added to other architectures like ARM or RISC-V.

[ASan patches]

[GSoC] Address Sanitizer, Part 1

Hello everyone. My name is Harshit Sharma (hst on IRC). I am working on the project to add the “Address Sanitizer” feature to coreboot as a part of GSoC 2020. Werner Zeh is my mentor for this project and I’d like to thank him for his constant support and valuable suggestions.

It’s been a fun couple of weeks since I started working on this project. Though I found the initial few weeks quite challenging, I am glad that I was able to get past that and learned some amazing stuff I’d cherish for a long time.

Also, being a student, I find it incredible to have got a chance to work with and learn from such passionate, knowledgeable, and helpful people who are always available over IRC to assist.

A few of you may already know about the progress we’ve made through Werner’s message on the mailing list. This is a much comprehensive report on the work we had done prior to the first evaluation period.


What is Address Sanitizer ?

Address Sanitizer (ASan) is a runtime memory debugging tool, designed to find out of bounds and use after free memory bugs. It is a part of toolset Google has developed which also includes Undefined Behaviour Sanitizer (UBSan), a Thread Sanitizer (TSan), and a Leak Sanitizer (LSan).

This tool would help in improving code quality and thus would make our runtime code more robust.


Compile with ASan

The firstmost task was to ensure that coreboot compiles without any errors after Asan is enabled. For this, we created dummy functions and added the relevant GCC flags to build coreboot with ASan. We also introduced a Kconfig option (currently only available on x86 architecture) to enable this feature. (Patch [1])


Porting code from Linux

The design of ASan in coreboot is based on its implementation in Linux kernel, also known as Kernel Address Sanitizer (KASAN). However, coreboot differs a lot from Linux kernel due to multiple stages and that is what poses a challenge.


Design of ASan

ASan uses compile-time instrumentation that adds a run-time check before every memory instructions.

The basic idea is to encode the state of each 8 aligned bytes of memory in a byte in the shadow region. As a consequence, the shadow memory is allocated to 1/8th of the available memory. The instrumentation of the compiler adds __asan_load and __asan_store function calls before memory accesses which seeks the address of the corresponding shadow memory and then decides if access is legitimate depending on the stored state.

If the access is not valid, it throws an error telling the instruction pointer address, the address where a bug is found and whether it was read or write.


Need for GCC patch

Unlike Linux kernel which has a static shadow memory layout, coreboot has multiple stages with very different memory maps. Thus we require different shadow offset addresses for each stage.

Unfortunately, GCC presently, only supports using a static shadow offset address which is specified at compile time using -fasan-shadow-offset flag.

So, we came up with a GCC patch that introduces a feature to use a dynamic shadow offset address. Through this, we enable GCC to determine shadow offset address at runtime using a callback function __asan_shadow_offset().

Though we’ve presented our patch to GCC developers to convince them to include this change in their upcoming version, for now, ASan on coreboot only works if this patch is applied. (Patches [2] and [3])


Allocating and initializing shadow buffer

Instead of allocating a shadow buffer for the whole memory region, we decided to only define shadow memory for data and heap sections. Since we don’t want to run ASan for hardware mapped addresses, this way we save a lot of memory space.

Once the shadow region is allocated by the linker, the next task is to initialize it as early as possible at runtime. This is done by a call to asan_init() which unpoisons i.e. sets the shadow memory corresponding to the addresses in the data and heap sections to zero.


ASan support for ramstage

We decided to begin in a comparatively simpler stage i.e. ramstage. Since ramstage uses DRAM, the implementation is common across all platforms of a particular architecture. Moreover, ramstage provides enough room in the memory to place our shadow region.

For now, we have only enabled this feature for x86 architecture and it is able to detect stack out of bounds and use after scope bugs. Though the patches are still in the review stage, we did a test on QEMU and siemens mc_apl3 (Apollo Lake based) and so far it looks good to us. (Patch [4])


Next steps..

In the second coding period, we’ve started working on adding ASan to romstage. I will push a change on Gerrit once we make some considerable progress.


We further encourage everyone in the community to review the patches and test this feature on their hardware for better test coverage. The more feedback we get, the more are the chances to have a high-quality debugging tool.

[GSoC] Wrap-up for Adding QEMU/AArch64 Support to Coreboot

Hello, I’m Asami. It is the time for the final evaluation of GSoC 2019 now. My project is adding QEMU/AArch64 support to coreboot. It would help developers for compatibility testing and make sure that changes to architecture code don’t break current implementations for ARMv8.

Here is my work on Gerrit. CL 33387 is the main patch for this project and it is successfully merged. We can now run coreboot on QEMU/AArch64.

What I’ve Done for My Main Project

Firstly I made a new directory qemu-aarch64 in src/mainboard/emulation/ and I was basically working on in it. In the bootblock stage, I have written custom bootblock code in assembly because an ARM virt machine doesn’t have SRAM, which means I had to relocate code inside ROM to DRAM during the first stage. I also made a memory layout with reference to the QEMU implementation. In the romstage and the ramstage, I registered a custom handler for detecting DRAM size because AArch64 throws a Synchronous External Abort that happens when you try to access something that is not memory.

I wrote documentation for how to use coreboot with QEMU/AArch64. Here is the page: https://doc.coreboot.org/mainboard/emulation/qemu-aarch64.html

Future Work

ARMv8 architecture has an integrated LinuxBios as a payload. However, I can’t run coreboot with it yet. It now causes an error while building. I also need to make sure that coreboot works well with ARM Trusted Firmware. So, I’d like to solve problems and I hope to see “Hello World” with LinuxBios.

Conclusion

I believe I almost succeed in my project. The main code has already merged and I also solved small problems found while working on the main project. The members of coreboot are really great and they, especially my mentor and reviewers, helped me a lot. It was not an easy project for me because I had never experienced to work on coreboot and I had to know the basic code flow of it. I’ve read much code and it made me grown up. Thank you all of the members for such an excellent time.

My Previous Blog Posts

[GSoC] Coreboot Coverity, Final Update

It is now the final week of GSoC, and it is time for me to write my final blog post. Over the past summer I have worked on fixing the Coverity scan issues in coreboot, with the goal of making the code base “Coverity clean”. This has involved writing a substantial number of patches, the vast majority of which are in coreboot, with a sprinkling in a few other projects:

  • 146 patches in coreboot
  • 6 patches in flashrom
  • 6 patches in vboot
  • 3 patches in Chromium EC
  • 4 patches in OpenSBI
  • 2 patches in em100
  • 1 in the Linux kernel

At the time of writing, a few of my patches are still under review on Gerrit, so it is possible (and hopeful!) that this list will increase over the next few weeks.

In total, these patches resolved 172 Coverity issue reports of actual bugs. However, Coverity also isn’t always right, and some issues weren’t actually problems that required patches. These issues, 91 in total, were either false positives or intentional and were ignored. At the moment, there are currently 223 remaining reports in the issue tracker for coreboot. Despite being a substantial number, this is almost entirely composed of issues from third-party projects (such as OpenSBI or vboot, which probably shouldn’t be counted in the coreboot tracker anyway), and the AMD vendorcode. The original plan at the beginning of the summer was to work on the AMD vendorcode; however, after discussion with my mentors we decided to skip it, since with the upcoming deprecations for coreboot 4.11 it might not be around much longer. Aside from this, there are roughly 20 remaining issues, which mostly required refactoring or technical knowledge that I don’t have.

With the summary out of the way, I’d like to give everyone a sample of the sort of bugs I’ve worked on during the project, and hopefully give advice for avoiding them in the future. Here is a list of the most common, nasty, or subtle types of bugs I’ve found over the summer.

Missing Break Statements

In switch statements in C, every case statement implicitly falls through to the next one. However, this is almost never the desired behavior, and so to avoid this every case needs to be manually terminated by a break to prevent the fall-through. This unfortunately is very tedious to do and is often accidentally left out. For a prototypical example, let’s look at CB:32180 from the AGESA vendorcode.

switch (AccessWidth) {
        case AccessS3SaveWidth8:
                RegValue = *(UINT8 *) Value;
                break;
        case AccessS3SaveWidth16:
                RegValue = *(UINT16 *) Value;
                break;
        case AccessS3SaveWidth32:
                RegValue = *(UINT32 *) Value;
        default:
                ASSERT (FALSE);
}

In this switch there is a missing break after the AccessS3SaveWidth32 case, which will then fall-through to the false assertion. Clearly not intentional! Other examples of this, though not as severe, can be found in CB:32088 and CB:34293. Fortunately, these errors today can be prevented by the compiler. GCC recently added the -Wimplicit-fallthrough option, which will warn on all implicit fall throughs and alert to a potentially missing break. However, some fall throughs are intentional, and these can be annotated by a /* fall through */ comment to silence the warning. Since CB:34297 and CB:34300 this warning has been enabled in coreboot, so this should be the last we see of missing break statements.

Off-by-One Errors

There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.

Anonymous

Everyone has been bitten by off-by-one errors. Let’s take a look at CB:32125 from the Baytrail graphics code.

static void gfx_lock_pcbase(struct device *dev)
{
        const u16 gms_size_map[17] = { 0, 32, 64, 96, 128, 160, 192, 224, 256,
                                       288, 320, 352, 384, 416, 448, 480, 512 };
        ...
        u32 gms, gmsize, pcbase;
        gms = pci_read_config32(dev, GGC) & GGC_GSM_SIZE_MASK;
        gms >>= 3;
        if (gms > ARRAY_SIZE(gms_size_map))
                return;
        gmsize = gms_size_map[gms];
        ...
}

Here we have an array gms_size_map of 17 elements, and a bounds check on the gms variable before it is used to index into the array. However, there’s a problem. The bounds check misses the case when gms == ARRAY_SIZE(gms_size_map) == 17, which is one past 16 – the index of the last array element. The fix is to use >= in the check instead of >. This exact error when performing a bounds check is very common: see at least CB:32244, CB:34498, and CL:1752766 for other examples.

Another nasty place where off-by-one errors strike is with strings – in particular, when making sure they are null terminated. Here is CB:34374 from the ACPI setup of the Getac P470.

static long acpi_create_ecdt(acpi_ecdt_t * ecdt)
{
        ...
        static const char ec_id[] = "\_SB.PCI0.LPCB.EC0";
        ...
        strncpy((char *)ecdt->ec_id, ec_id, strlen(ec_id));
        ...
}

The problem is that strncpy() will only copy at most strlen(ec_id) characters, which excludes the null character. The author might have been thinking of the similar strlcpy(), which does explicitly null terminate the string buffer even if it never reaches a null character. In this case none of the string-copying functions are needed, since ec_id is a string buffer and so can be copied using a simple memcpy().

Boolean vs Bitwise Operators

In C, all integers are implicitly convertible to boolean values and can be used with all boolean operators. While somewhat convenient, this also makes it very easy to mistakenly use a boolean operator when a bitwise one was intended. Let’s take a look at CB:33454 from the CIMX southbridge code.

void sb_poweron_init(void)
{
        u8 data;
        ...
        data = inb(0xCD7);
        data &= !BIT0;
        if (!CONFIG(PCIB_ENABLE)) {
                data |= BIT0;
        }
        outb(data, 0xCD7);
        ...
}

Here BIT0 is the constant 0x1, so !BIT0 expands to 0, with the net effect of data being completely cleared, regardless of the previous value from inb(). The intended operator to use was the bitwise negation ~, which would only clear the lowest bit. For more examples of this sort of bug, see CB:34560 and OpenSBI 3f738f5.

Implicit Integer Conversions

C allows implicit conversions between all integer types, which opens the door for many accidental or unintentional bugs. For an extremely subtle example of this, let’s take a look at OpenSBI 5e4021a.

void *sbi_memset(void *s, int c, size_t count);

void sbi_fifo_init(struct sbi_fifo *fifo, void *queue_mem,
                   u16 entries, u16 entry_size)
{
        ...
        sbi_memset(fifo->queue, 0, entries * entry_size);
}

Do you see the problem? The issue is that entries and entry_size are both 16-bit integers, and by the rules of C are implicitly converted to int before the multiplication. An int cannot hold all possible values of a u16 * u16, and so if the multiplication overflows the intermediate result could be a negative number. On 64-bit platforms size_t will be a u64, and the negative result will then be sign-extended to a massive integer. As the last argument to sbi_memset(), this could lead to a very large out-of-bounds write. The solution is to cast one of the variables to a size_t before the multiplication, which is wide enough to prevent the implicit promotion to int. For other examples of this problem, see CB:33986 and CB:34529.

Another situation where implicit conversions strike is in error handling. Here is CB:33962 in the x86 ACPI code.

static ssize_t acpi_device_path_fill(const struct device *dev, char *buf,
                                     size_t buf_len, size_t cur);

const char *acpi_device_path_join(const struct device *dev, const char *name)
{
        static char buf[DEVICE_PATH_MAX] = {};
        size_t len;
        if (!dev)
                return NULL;

        /* Build the path of this device */
        len = acpi_device_path_fill(dev, buf, sizeof(buf), 0);
        if (len <= 0)
                return NULL;
        ...
}

With the function prototype right there, the problem is obvious: acpi_device_path_fill() returns negative values in a ssize_t to indicate errors, but len is a size_t, so all those negative error values are converted to extremely large integers, thus passing the subsequent error check. During code review this may not at all be obvious though.

Both these errors could be prevented using the -Wconversion compiler option, which will warn about all implicit integer conversions. However, there are an incredible number of such conversions in coreboot, and it would be a mammoth task to fix them all.

Null Pointers

Null pointers need no introduction – they are well known to cause all sorts of problems. For a simple example, let’s take a look at CB:33134 from the HiFive Unleashed mainboard.

static void fixup_fdt(void *unused)
{
        void *fdt_rom;
        struct device_tree *tree;
        
        /* load flat dt from cbfs */
        fdt_rom = cbfs_boot_map_with_leak("fallback/DTB", CBFS_TYPE_RAW, NULL);

        /* Expand DT into a tree */
        tree = fdt_unflatten(fdt_rom);
        ...
}

This code attempts to load a device tree from a location in the CBFS. However, cbfs_boot_map_with_leak() will return a null pointer if the object in the CBFS can’t be found, which will then be dereferenced in the call to fdt_unflatten(). On most systems dereferencing a null pointer will lead to a segfault, since the operating system has set up permissions that prevent accessing the memory at address 0. However, coreboot runs before the operating systems has even started, so there are no memory permissions at all! If fdt_rom is a null pointer, fdt_unflatten() will attempt to expand the device tree from whatever memory is at address 0, leading to who knows what problems. A simple null check will avoid this, but requires the programmer to always remember to put them in.

Another common issues with null pointers is that even if you do a check, it might not actually matter if the pointer has already been dereferenced. For example, here is a problem with the EDID parser in CB:32055.

int decode_edid(unsigned char *edid, int size, struct edid *out)
{
        ...
        dump_breakdown(edid);

        memset(out, 0, sizeof(*out));

        if (!edid || memcmp(edid, "\x00\xFF\xFF\xFF\xFF\xFF\xFF\x00", 8)) {
                printk(BIOS_SPEW, "No header found\n");
                return EDID_ABSENT;
        }
        ...
}

In this case the EDID is dumped doing the null pointer check, but at worst there should only be a wonky dump if edid is null, right? Not necessarily. Since dereferencing a null pointer is undefined behavior, the compiler is allowed to assume that no null pointer dereferences occur in the program. In this case, dereferencing the edid pointer in dump_breakdown() is an implicit assertion that edid is not null, so an over-zealous compiler could remove the following null check! This optimization can be disabled using -fno-delete-null-pointer-checks (which is done in coreboot), but does not prevent any problems that could have happened in the null dereference before the check took place. See this article in LWN for details on how a vulnerability from this problem was dealt with in the Linux kernel.

Conclusion

C has always had the mantra of “trust the programmer”, which makes mistakes and errors very easy to do. Some of these errors can be prevented at compile time using compiler warnings, but many cannot. Coverity and other static analyzers like it are very useful and powerful tools for catching bugs that slip past the compiler and through peer review. However, it is no silver bullet. All of these errors were present in production code that were only caught after the fact, and there are certainly bugs of this sort left that Coverity hasn’t found. What do we do about them, and how can we ever be sure that we’ve caught them all? Today, there are new languages designed from the beginning to enable safe and correct programming. For example, libgfxinit is written in SPARK, a subset of Ada that can be formally verified at compile time to avoid essentially all of the above errors. There is also the new oreboot project written in Rust, which has similar compile time guarantees due to its extensive type system. I hope to see these languages and others increasingly used in the future so that at some point, this job will have become obsolete. 🙂

[GSoC] How to Use ARM Trusted Firmware in Coreboot

Hello, I’m Asami. In this article, I’m going to talk about how to use ARM Trusted Firmware in coreboot for ARMv8 (AArch64). ARM Trusted Firmware provides a reference implementation of secure world software for Armv8-A and Armv8-M. You can see the code via https://github.com/ARM-software/arm-trusted-firmware.

Trusted Firmware has 5 steps which are called as BL1, BL2, BL3-1, BL3-2, and BL3-3. BL1 step is similar to the bootblock stage and BL2 is similar to the romstage of coreboot. In the coreboot project, we only use the BL3-1 part that is expected to work on EL3 exception level. The code of BL3-1 will execute just after the ramstage and before the payload when we enable the Trusted Firmware.

How to enable Trusted Firmware

It’s very easy to enable Trusted Firmware on coreboot. You just need to ‘select ARM64_USE_ARM_TRUSTED_FIRMWARE’ in your Kconfig. If you want to run coreboot on QEMU/AArch64, you need to add the ‘select ARM64_USE_ARM_TRUSTED_FIRMWARE’ at src/mainboard/emulation/qemu-aarch64/Kconfig. The next step switches depending on the configuration at src/arch/arm64/boot.c. Here is the code to switch the next step:

// src/arch/arm64/boot.c
static void run_payload(struct prog *prog)
{
	void (*doit)(void *);
	void *arg;

	doit = prog_entry(prog);
	arg = prog_entry_arg(prog);
	u64 payload_spsr = get_eret_el(EL2, SPSR_USE_L);

	if (CONFIG(ARM64_USE_ARM_TRUSTED_FIRMWARE))
		arm_tf_run_bl31((u64)doit, (u64)arg, payload_spsr);
	else
		transition_to_el2(doit, arg, payload_spsr);
}

Why using Trusted Firmware

Coreboot for ARMv8 has 2 options to pass an execution from it to a payload. The first is passing execution to a payload directly and the second one is passing to the BL3-1 code before a payload. You always don’t have to use Trusted Firmware. However, you need to enable Trusted Firmware if you want to run Linux because it expects to work with PSCI. PSCI is an abbreviation of Power State Coordination Interface which is a standard interface for power management that can be used by OS vendors for supervisory software working at different levels of privilege on an ARM device. Coreboot doesn’t have the setup for PSCI but Trusted Firmware does.

Current Status

Unfortunately, QEMU/AArch64 in coreboot doesn’t support Trusted Firmware yet. It means we can’t run Linux with QEMU for ARMv8. I’m now trying to support Trusted Firmware for QEMU/AArch64.

[GSoC] Coreboot Coverity, weeks 11-12

Hello again! For the past two weeks I have been working on Coverity issues in various third party repositories, notably flashrom and vboot. The majority of issues in both repositories are now fixed, with the remaining ones mostly being memory leaks. Also, support for OpenSBI (a RISC-V supervisor binary interface) was recently added to coreboot, and Coverity picked up several issues in that. With one more week left in the summer, the plan now is to tidy up the last few remaining issues in vboot, and shepherd my still in-progress patches through review. As usual, you can see the current status of my patches on Gerrit.

Following up from my post last week about neutralizing the ME, this week I decided to try replacing SeaBIOS with GRUB. While originally designed as a bootloader, GRUB can be compiled as a coreboot payload, which removes the need of having a BIOS or UEFI interface at all. Not only does this mean having a legacy-free boot process, but GRUB also has an incredible amount of flexibility over the previous systems. For example, one problem with current full disk encryption schemes is that the bootloader must always remain unencrypted, since factory BIOS/UEFI implementations are unable of unlocking encrypted disks. By embedding GRUB in the coreboot image we can move it off the hard drive and avoid this problem entirely.

With this in mind, I decided to try installing GRUB on a system with a single LUKS 1 partition formatted with Btrfs. (GRUB currently doesn’t support LUKS 2, which is what many distributions use by default.) This setup will require two configuration files: one embedded in the coreboot image with instructions on how to unlock the partition, and one on the hard drive for booting the kernel. The first configuration file has to be hand-written, while the second can be generated using grub-mkconfig -o /boot/grub/grub.cfg. The advantage of this two-stage system is that the first configuration file only has to be flashed once, and then all subsequent changes (e.g. adding kernel parameters) can be done with the system file. After tinkering around with the official GRUB documentation, the coreboot wiki, and this blog post, I pieced together the following bare-bones config file for the payload:

# Tell GRUB where to find its modules
set prefix=(memdisk)/boot/grub

# Keep the same screen resolution when switching to Linux
set gfxpayload=keep

# I'm not exactly sure what these do, but they look important
terminal_output --append cbmemc
terminal_output --append gfxterm

# Default to first option, automatically boot after 3 seconds
set default="0"
set timeout=3

# Enable the pager when viewing large files
set pager=1

menuentry 'Load Operating System' {

        # Load the LUKS and Btrfs modules
        insmod luks
        insmod btrfs

        # Unlock all crypto devices (should only be one)
        cryptomount -a

        # Load the system GRUB file from the first crypto device
        set root=(crypto0)
        configfile /boot/grub/grub.cfg
}

Using menuconfig, we can now configure coreboot to use GRUB as a payload.

Payload ---> Add a Payload ---> GRUB2
        ---> GRUB2 version ---> 2.04
        ---> Extra modules to include in GRUB image
             (luks btrfs gcry_sha256 gcry_rijndael all_video cat)
             [*] Include GRUB2 runtime config file into ROM image
                 (grub.cfg)

We need to add a few modules to the GRUB image to enable all the features we want: luks and btrfs are self-explanatory, gcry_sha256 and gry_rijndael add support for SHA256 and AES (the default hash and cipher for LUKS), all_video adds support for more video and graphics drivers, and cat is useful for printing config files during debugging. We also set here the path to the above grub.cfg file, which by default is in the root of the coreboot tree.

After flashing the new coreboot image, you will be greeted by two GRUB menus after booting: the above one for unlocking the partition, and then the system one for booting the kernel.

The above GRUB config file is extremely basic, and although all advanced functionality can be done manually by dropping to the GRUB command line, this is very inconvenient. Let’s add two menu entries to the config file for shutting down and rebooting the system. Fortunately, this can be done without rebuilding the ROM from scratch. By default, the GRUB config file is stored as etc/grub.cfg in the CBFS (coreboot file system), which is used to store objects in the coreboot image. We can print the layout of the CBFS using the cbfstool utility:

$ cbfstool coreboot.rom print
FMAP REGION: COREBOOT
Name                           Offset     Type           Size   Comp
cbfs master header             0x0        cbfs header        32 none
fallback/romstage              0x80       stage           58068 none
cpu_microcode_blob.bin         0xe3c0     microcode      122880 none
fallback/ramstage              0x2c440    stage          105072 none
config                         0x45f00    raw               488 none
revision                       0x46140    raw               674 none
cmos.default                   0x46440    cmos_default      256 none
vbt.bin                        0x46580    raw              1412 LZMA (3863 decompressed)
cmos_layout.bin                0x46b40    cmos_layout      1808 none
fallback/postcar               0x472c0    stage           18540 none
fallback/dsdt.aml              0x4bb80    raw             15257 none
fallback/payload               0x4f780    simple elf     470937 none
etc/grub.cfg                   0xc2780    raw               670 none
(empty)                        0xc2a80    null          7523608 none
bootblock                      0x7ef7c0   bootblock        1512 none

This shows all the objects located in the image, including the stages (bootblock, romstage, ramstage, and payload) and various other configuration files and binaries. Next, we extract the existing config file from this image:

$ cbfstool coreboot.rom extract -n etc/grub.cfg -f grub.cfg

Now add the following two entries to the end of the extracted file:

menuentry 'Poweroff' {
        halt
}

menuentry 'Reboot' {
        reboot
}

Now we delete the existing configuration file, and then add our new one back in:

$ cbfstool coreboot.rom remove -n etc/grub.cfg
$ cbfstool coreboot.rom add -n etc/grub.cfg -f grub.cfg -t raw

That’s it! Flash back the new ROM, and you’re good to go.

Even with these changes, we still have a very basic configuration file. For more advanced setups (such as booting from a live USB, LVM support, signing kernels with GPG, etc.) I recommend looking at the comprehensive Libreboot GRUB file and its hardening guide.