
Embracing a New Era: 9elements and AMD Collaborate to Bring Open-Source Firmware to 4th Gen EPYC Server Systems

The 4.19 release was completed on the 16th of January 2023.
Since the last release, the coreboot project has merged over 1600 commits from over 150 authors. Of those authors, around 25 were first-time committers to the coreboot project.
As always, we are very grateful to all of the contributors for helping to keep the project going. The coreboot project is different from many open source projects in that we need to keep constantly updating the codebase to stay relevant with the latest processors and technologies. It takes constant effort to just stay afloat, let alone improve the codebase. Thank you very much to everyone who has contributed, both in this release and in previous times.
The 4.20 release is planned for the 20th of April, 2023.
The coreboot build system automatically adds a ‘config’ file to CBFS that lists the exact Kconfig configuration that the image was built with. This is useful to reproduce a build after the fact or to check whether support for a specific feature is enabled in the image.
This file has been generated using the ‘savedefconfig’ Kconfig command, which generates the minimal .config file that is needed to produce the required config in a coreboot build. This is fine for reproduction, but bad when you want to check if a certain config was enabled, since many options get enabled by default or pulled in through another option’s ‘select’ statement and thus don’t show up in the defconfig.
Instead coreboot now includes a larger .config instead. In order to save some space, all of the comments disabling options are removed from the file, except for those included in the defconfig.
We can also LZMA compress the file since it is never read by firmware itself and only intended for later re-extraction via cbfstool, which always has LZMA support included.
Until recently, coreboot still contained lots of code using the legacy ASL syntax. However, all ASL code was ported over to make use of the ASL 2.0 syntax and from this point on new ASL code should make use of it.
Intel Icelake is unmaintained and the only user of this platform ever was the Intel CRB (Customer Reference Board). From the looks of the code, it was never ready for production as only engineering sample CPUIDs are supported.
Intel Icelake code will be removed following 4.19 and any maintenance will be done on the 4.19 branch. This consists of the Intel Icelake SoC and Intel Icelake RVP mainboard.
The SoC Intel Quark is unmaintained and different efforts to revive it failed. Also, the only user of this platform ever was the Galileo board.
Thus, to reduce the maintenance overhead for the community, support for the following components will be removed from the master branch and will be maintained on the release 4.20 branch.
Issues from the coreboot bugtracker: https://ticket.coreboot.org/
# | Subject |
449 | ThinkPad T440p fail to start, continuous beeping & LED blinking |
448 | Thinkpad T440P ACPI Battery Value Issues |
446 | Optiplex 9010 No Post |
445 | Thinkpad X200 wifi issue |
439 | Lenovo X201 Turbo Boost not working (stuck on 2,4GHz) |
427 | x200: Two battery charging issues |
414 | X9SAE-V: No USB keyboard init on SeaBIOS using Radeon RX 6800XT |
412 | x230 reboots on suspend |
393 | T500 restarts rather than waking up from suspend |
350 | I225 PCIe device not detected on Harcuvar |
327 | OperationRegion (OPRG, SystemMemory, ASLS, 0x2000) causes BSOD |
The 4.18 release was quite late, but was completed on October 16, 2022.
In the 4 months since the 4.17 release, the coreboot project has merged more than 1800 commits from over 200 different authors. Over 50 of those authors submitted their first patches.
Welcome and thank you to all of our new contributors, and of course the work of all of the seasoned contributors is greatly appreciated.
Currently we only have runtime mechanisms to assign device operations to a node in our devicetree (with one exception: the root device). The most common method is to map PCI IDs to the device operations with a struct pci_driver
. Another accustomed way is to let a chip driver assign them.
For very common drivers, e.g. those in soc/intel/common/blocks/, the PCI ID lists grew very large and are incredibly error-prone. Often, IDs are missing and sometimes IDs are added almost mechanically without checking the code for compatibility. Maintaining these lists in a central place also reduces flexibility.
Now, for onboard devices it is actually unnecessary to assign the device operations at runtime. We already know exactly what operations should be assigned. And since we are using chipset devicetrees, we have a perfect place to put that information.
This patch adds a simple mechanism to sconfig
. It allows us to speci- fy operations per device, e.g.
device pci 00.0 alias system_agent on ops system_agent_ops end
The operations are given as a C identifier. In this example, we simply assume that a global struct device_operations system_agent_ops
exists.
Historically, ChromeOS devices have worked around the problem of OEMs using several different parts for touchpads/touchscreens by using a ChromeOS kernel-specific ‘probed’ flag (rejected by the upstream kernel) to indicate that the device may or may not be present, and that the driver should probe to confirm device presence.
Since release 4.18, coreboot supports detection for i2c devices at runtime when creating the device entries for the ACPI/SSDT tables, rendering the ‘probed’ flag obsolete for touchpads. Switch all touchpads in the tree from using the ‘probed’ flag to the ‘detect’ flag.
Touchscreens require more involved power sequencing, which will be done at some future time, after which they will switch over as well.
Firmware is typically delivered as one large binary image that gets flashed. Since this final image consists of binaries and data from a vast number of different people and companies, it’s hard to determine what all the small parts included in it are. The goal of the software bill of materials (SBOM) is to take a firmware image and make it easy to find out what it consists of and where those pieces came from.
Basically, this answers the question, who supplied the code that’s running on my system right now? For example, buyers of a system can use an SBOM to perform an automated vulnerability check or license analysis, both of which can be used to evaluate risk in a product. Furthermore, one can quickly check to see if the firmware is subject to a new vulnerability included in one of the software parts (with the specified version) of the firmware.
Further reference: https://web.archive.org/web/20220310104905/https://blogs.gnome.org/hughsie/2022/03/10/firmware-software-bill-of-materials/
The following are changes across a number of patches, or changes worth noting, but not needing a full description.
coreboot uses TianoCore interchangeably with EDK II, and whilst the meaning is generally clear, it’s not the payload it uses. Consequentially, TianoCore has been renamed to EDK II (2).
The option to use the already deprecated CorebootPayloadPkg has been removed.
Recent changes to both coreboot and EDK means that UefiPayloadPkg seems to work on all hardware. It has been tested on:
CorebootPayloadPkg can still be found here.
The recommended option to use is EDK2_UEFIPAYLOAD_MRCHROMEBOX
as EDK2_UEFIPAYLOAD_OFFICIAL
will no longer work on any SoC.
Legacy SMP init will be removed from the coreboot master branch immediately following this release. Anyone looking for the latest version of the code should find it on the 4.18 branch or tag.
This also includes the codepath for SMM_ASEG. This code is used to start APs and do some feature programming on each AP, but also set up SMM. This has largely been superseded by PARALLEL_MP, which should be able to cover all use cases of LEGACY_SMP_INIT, with little code changes. The reason for deprecation is that having 2 codepaths to do the virtually the same increases maintenance burden on the community a lot, while also being rather confusing.
Intel Icelake is unmaintained. Also, the only user of this platform ever was the Intel CRB (Customer Reference Board). From the looks of it the code was never ready for production as only engineering sample CPUIDs are supported. This reduces the maintanence overhead for the coreboot project.
Intel Icelake code will be removed with release 4.19 and any maintenence will be done on the 4.19 branch. This consists of the Intel Icelake SoC and Intel Icelake RVP mainboard.
The SoC Intel Quark is unmaintained and different efforts to revive it failed. Also, the only user of this platform ever was the Galileo board.
Thus, to reduce the maintanence overhead for the community, support for the following components will be removed from the master branch and will be maintained on the release 4.20 branch.
A couple of issues were discovered immediately following the release that will be fixed in a follow-on point release in the upcoming weeks.
A pair of changes (CB:67754 + CB:67662) merged shortly before the 4.18 release have created an issue on Intel Apollo Lake platform boards which prevents SMM/SMI from functioning; this affects only Apollo Lake (but not Gemini Lake) devices. A fix has been identified and tested and will be available soon.
Another issue applies to all Intel-based boards with onboard I2C TPMs when verified boot is not enabled. The I2C buses don’t get initialized until after the TPM, causing timeouts, TPM initialization failures, and long boot times.
GSoC 2022 coding period is about to come to an end this week. It has been an enriching 12 weeks of reading old code, designing algorithms and structures, coding, testing, and hanging out over IRC! I’d like to take this opportunity to present my work and details on how it has impacted Flashrom. 🙂
You can find the complete list of commits I made during GSoC with this gerrit query. Some of the patches aren’t currently merged and are under review. In any case, you are most welcome to join the review (which will likely be very helpful for me).
Definitions
Let me clarify some terminologies used hereafter
Need for this project
To change any 0 to a 1 in the contents of a NOR-flash chip, a full block of data needs to be erased. So to change a single byte, it is sometimes required to erase a whole block, which we’ll call erase overhead. Most flash chips support multiple erase-block sizes. Flashrom keeps a list of these sizes and ranges for each supported flash chip and maps them to internal erase functions. Most of these lists were sorted by ascending size. I added a simple test to ensure this and corrected the ones which gave errors.
Earlier, Flashrom tried all available erase functions for a flash chip during writes. Usually, only the first function was used, but if anything went wrong on the wire, or the programmer rejected a function, it falls back to the next. As the functions are sorted by size, this results in a minimum erase overhead.
However, if big portions of the flash contents are re-written, using the smallest erase block size results in transaction overhead, and also erasing bigger blocks at once often results in shorter erase times inside the flash chip.
So in a nutshell, the erase function selection in Flashrom was sequential, i.e. they started from the function with the smallest erase block and moved to the next one only in case of an error. This was inefficient in cases where bigger chunks of the flash had to be erased.
Current implementation
After rigorous discussion with my mentors and with the community, we decided on a look-ahead algorithm for selecting the erase functions or the erase blocks to be more precise. From the layout of erase blocks as seen by different erase functions, we make an optimal selection.
We start with the first erase function (remember this erase function has the smallest erase blocks) and mark those blocks which lie completely inside the region to be erased. Then moving on to subsequent erase layouts and marking those erase blocks that have more than half of the sub-blocks marked and then unmarking the marked sub-blocks. Finally, erase all the marked erase blocks. This strategy ensures that if we have a large contiguous region to be erased then we would use an appropriately large erase function to erase it.
The above-mentioned strategy had some challenges and required a few tweaks:
So with this, we were all set to begin the implementation of the algorithm. Or so I thought. Flashrom had stored information regarding the erase layout in a nested format which was very difficult to use so I had to flatten them out and fit them in a data structure that would give me quick information about sub-blocks as well. 😮💨
For a better understanding of the algorithm, I made the following flowchart.
Testing
Testing on hardware is difficult because you need all the different chips and programmers and test on each one of them to be sure that your code is working correctly. On top of that, identifying all possible scenarios adds to the burden. This definitely takes a lot of time and effort and my mentors and the community is helping a lot with it.
The current progress of testing can be seen here. Feel free to suggest more cases and help with the testing. It will definitely help a lot.
Future work
Definitely, the code or algorithm isn’t fully optimized and can be improved further. Also one can fix the progress bar feature to Flashrom, which got broken due to my code😅.
Acknowledgments
I’d like to thank Thomas Heijligen and Simon Buhrow for being excellent mentors, Nico for providing constant code reviews and giving regular inputs, and everyone in the Flashrom community for being helpful and friendly.
The coreboot 4.17 release was done on June 3, 2022.
Since the 4.16 release, we’ve had over 1300 new commits by around 150 contributors. Of those people, roughly 15 were first-time contributors.
As always, we appreciate everyone who has contributed and done the hard work to make the coreboot project successful.
These changes are a few that were selected as a sampling of particularly interesting commits.
Instead of having per stage x_CBMEM_INIT_HOOK, we now have only 2 hooks:
coreDOOM is a port of DOOM to libpayload, based on the doomgeneric source port. It renders the game to the coreboot linear framebuffer, and loads WAD files from CBFS.
This code was hard to read as it did too much and had a lot of state to keep track of.
It also looks like the staggered entry points were first copied and only later the parameters of the first stub were filled in. This means that only the BSP stub is actually jumping to the permanent smihandler. On the APs the stub would jump to wherever c_handler happens to point to, which is likely 0. This effectively means that on APs it’s likely easy to have arbitrary code execution in SMM which is a security problem.
Note: This patch fixes CVE-2022-29264 for the 4.17 release.
This code is much easier to read if one does not have to keep track of mutable variables.
This also fixes the alignment code on the TSEG smihandler setup code. It was aligning the code upwards instead of downwards which would cause it to encroach a part of the save state.
The sinkhole exploit exists in placing the lapic base such that it messes with GDT. This can be mitigated by checking the lapic MSR against the current program counter.
This removes the need for a tool to generate simple identity pages. Future patches will link this page table directly into the stages on some platforms so having an assembly file makes a lot of sense.
This also optimizes the size of the page of each 4K page by placing the PDPE_table below the PDE.
This change will allow the SMI handler to write to the cbmem console buffer. Normally SMIs can only be debugged using some kind of serial port (UART). By storing the SMI logs into cbmem we can debug SMIs using ‘cbmem -1’. Now that these logs are available to the OS we could also verify there were no errors in the SMI handler.
Since SMM can write to all of DRAM, we can’t trust any pointers provided by cbmem after the OS has booted. For this reason we store the cbmem console pointer as part of the SMM runtime parameters. The cbmem console is implemented as a circular buffer so it will never write outside of this area.
On platforms where the bootblock is not included in CBFS anymore because it is part of another firmware section (IFWI or a different CBFS), the CRTM measurement fails.
This patch adds a new function to provide a way at SoC level to measure the bootblock. Following patches will add functionality to retrieve the bootblock from the SoC related location and measure it from there. In this way the really executed code will be measured.
Add Platform Secure Boot (PSB) enablement via the PSP if it is not already enabled. Upon receiving psb command, PSP will program PSB fuses as long as BIOS signing key token is valid. Refer to the AMD PSB user guide doc# 56654, Revision# 1.00. Unfortunately this document is only available with NDA customers.
This patch implements coreboot native debug handler to manage the FSP event messages.
‘FSP Event Handlers’ feature introduced in FSP to generate event messages to aid in the debugging of firmware issues. This eliminates the need for FSP to directly write debug messages to the UART and FSP might not need to know the board related UART port configuration. Instead FSP signals the bootloader to inform it of a new debug message. This allows the coreboot to provide board specific methods of reporting debug messages, example: legacy UART or LPSS UART etc.
This implementation has several advantages as:
Note: Need to modify FSP source code to remove ‘MDEPKG_NDEBUG’ as compilation flag for release build and generate FSP binary with non-NULL FSP debug wrapper module injected (to allow FSP event handler to execute even with FSP non-serial image) in the final FSP.fd.
In order to abstract bus-specific logic from TPM logic, the prototype for two vendor-specific tis functions are added in this patch. tis_vendor_read() can be used to read directly from TPM registers, and tis_vendor_write() can be used to write directly to TPM registers.
This commit adds support for catching null dereferences and execution through x86’s debug registers. This is particularly useful when running 32-bit coreboot as paging is not enabled to catch these through page faults. This commit adds three new configs to support this feature: DEBUG_HW_BREAKPOINTS, DEBUG_NULL_DEREF_BREAKPOINTS and DEBUG_NULL_DEREF_HALT.
Add ‘detect’ flag which can be attached to devices which may or may not be present at runtime, and for which coreboot should probe the i2c bus to confirm device presence prior to adding an entry for it in the SSDT.
This is useful for boards which may utilize touchpads/touchscreens from multiple vendors, so that only the device(s) present are added to the SSDT. This relieves the burden from the OS to detect/probe if a device is actually present and allows the OS to trust the ACPI _STA value.
Flame graphs are used to visualize hierarchical data, like call stacks. Timestamps collected by coreboot can be processed to resemble profiler-like output, and thus can be feed to flame graph generation tools.
Generating flame graph using https://github.com/brendangregg/FlameGraph:
cbmem -S > trace.txt FlameGraph/flamegraph.pl --flamechart trace.txt > output.svg
This patch adds an option to disable loglevel prefixes. This patch helps to achieve clear messages when low loglevel is used and very few messages are displayed on a terminal. This option also allows to maintain compatibility with log readers and continuous integration systems that depend on fixed log content.
If the code contains: printk(BIOS_DEBUG, “This is a debug message!\n”) it will show as: [DEBUG] This is a debug message! but if the Kconfig contains: CONFIG_CONSOLE_USE_LOGLEVEL_PREFIX=n the same message will show up as This is a debug message!
Add an option to the cbmem utility that can be used to append an entry to the cbmem timestamp table from userspace. This is useful for bookkeeping of post-coreboot timing information while still being able to use cbmem-based tooling for processing the generated data.
-a | --add-timestamp ID: append timestamp with ID\n
The following are changes across a number of patches, or changes worth noting, but not needing a full description.
Intel Icelake is unmaintained. Also, the only user of this platform ever was the CRB board. From the looks of it the code never was ready for production as only engineering sample CPUIDs are supported.
Thus, to reduce the maintanence overhead for the community, it is deprecated from this release on and support for the following components will be dropped with the release 4.19.
As of release 4.18 (August 2022) we plan to deprecate LEGACY_SMP_INIT. This also includes the codepath for SMM_ASEG. This code is used to start APs and do some feature programming on each AP, but also set up SMM. This has largely been superseded by PARALLEL_MP, which should be able to cover all use cases of LEGACY_SMP_INIT, with little code changes. The reason for deprecation is that having 2 codepaths to do the virtually the same increases maintenance burden on the community a lot, while also being rather confusing.
No platforms in the tree have any hardware limitations that would block migrating to PARALLEL_MP / a simple !CONFIG_SMP codebase.
SBoM is “who built what from where”SBoM is here to change this and provides a list of used software parts that have been put together. Firmware consists of multiple parts, often lumped together as one binary. Even coreboot, as an open-source firmware projects, has to consume multiple closed-source binaries in order to work properly. Now the end user has an easy way to scroll through the SBoM and check in more detail what really is inside this binary blob called firmware.
coreboot pulls in SBOM templates, modifies those that they fit for needs, and stich them into the image.With the patchset that adds SBOM, we added three things to coreboot:
src/sbom
and builds coSWID files out of this. To generate these files, we developed our own tooling called goSWID
. The code can be found here, but it will also be checked into coreboot together with the patchset.
Within the build log from coreboot, we do see that we have our own sbom CBFS section now.
goswid
tooling.
Pluton (HSP) X86 Firmware Support Enable/Disable X86 firmware HSP related code path, including AGESA HSP module, SBIOS HSP related drivers. Auto - Depends on PcdAmdHspCoreEnable build value NOTE: PSP directory entry 0xB BIT36 have the highest priority. NOTE: This option will NOT put HSP hardware in disable state, to disable HSP hardware, you need setup PSP directory entry 0xB, BIT36 to 1. // EntryValue[36] = 0: Enable, HSP core is enabled. // EntryValue[36] = 1: Disable, HSP core is disabled then PSP will gate the HSP clock, no further PSP to HSP commands. System will boot without HSP."HSP" here means "Hardware Security Processor" - a generic term that refers to Pluton in this case. This is a configuration setting that determines whether Pluton is "enabled" or not - my interpretation of this is that it doesn't directly influence Pluton, but disables all mechanisms that would allow the OS to communicate with it. In this scenario, Pluton has its firmware loaded and could conceivably be functional if the OS knew how to speak to it directly, but the firmware will never speak to it itself. I took a quick look at the Windows drivers for Pluton and it looks like they won't do anything unless the firmware wants to expose Pluton, so this should mean that Windows will do nothing. So what about the reference to "PSP directory entry 0xB BIT36 have the highest priority"? The PSP is the AMD Platform Security Processor - it's an ARM core on the CPU package that boots before the x86. The PSP firmware lives in the same flash image as the x86 firmware, so the PSP looks for a header that points it towards the firmware it should execute. This gives a pointer to a "directory" - a list of different object types and where they're located in flash (there's a description of this for slightly older AMDs here). Type 0xb is treated slightly specially. Where most types contain the address of where the actual object is, type 0xb contains a 64-bit value that's interpreted as enabling or disabling various features - something AMD calls "soft fusing" (Intel have something similar that involves setting bits in the Firmware Interface Table). The PSP looks at the bits that are set here and alters its behaviour. If bit 36 is set, the PSP tells Pluton to turn itself off and will no longer send any commands to it. So, we have two mechanisms to disable Pluton - the PSP can tell it to turn itself off, or the x86 firmware can simply never speak to it or admit that it exists. Both of these imply that Pluton has started executing before it's shut down, so it's reasonable to wonder whether it can still do stuff. In the image I'm looking at, there's a blob starting at 0x0069b610 that appears to be firmware for Pluton - it contains chunks that appear to be the reference TPM2 implementation, and it broadly decompiles as valid ARM code. It should be viable to figure out whether it can do anything in the face of being "disabled" via either of the above mechanisms. Unfortunately for me, the system I'm looking at does set bit 36 in the 0xb entry - as a result, Pluton is disabled before x86 code starts running and I can't investigate further in any straightforward way. The implication that the user-controllable mechanism for disabling Pluton merely disables x86 communication with it rather than turning it off entirely is a little concerning, although (assuming Pluton is behaving as a TPM rather than having an enhanced set of capabilities) skipping any firmware communication means the OS has no way to know what happened before it started running even if it has a mechanism to communicate with Pluton without firmware assistance. In that scenario it'd be viable to write a bootloader shim that just faked up the firmware measurements before handing control to the OS. The bit 36 disabling mechanism seems more solid? Again, it should be possible to analyse the Pluton firmware to determine whether it actually pays attention to a disable command being sent. But even if it chooses to ignore that, if the PSP is in a position to just cut the clock to Pluton, it's not going to be able to do a lot. At that point we're trusting AMD rather than trusting Microsoft, but given that you're also trusting AMD to execute the code you're giving them to execute, it's hard to avoid placing trust in them. Overall: I'm reasonably confident that systems that ship with Pluton disabled via setting bit 36 in the soft fuses are going to disable it sufficiently hard that the OS can't do anything about it. Systems that give the user an option to enable or disable it are a little less clear in that respect, and it's possible (but not yet demonstrated) that an OS could communicate with Pluton anyway. However, if that's true, and if the firmware never communicates with Pluton itself, the user could install a stub loader in UEFI that mimicks the firmware behaviour and leaves the OS thinking everything was good when it absolutely is not. So, assuming that Pluton in its current form on AMD has no capabilities outside those we know about, the disabling mechanisms are probably good enough. It's tough to make a firm statement on this before I have access to a system that doesn't just disable it immediately, so stay tuned for updates.