[GSOC] Panic Room, week #4/5/6

What have you been up to these past few weeks?

My facetious alter ego, I must admit I sinned, I strayed oh so much from my original timeline. (Quite obvious if you read the previous posts)

Since my last post I mainly kept focusing on the region_device API, I attempted to go through with my week #3 plan to continue the phasing out of rdev_mmap_full() from the selfload code (described in previous post).

I think I overestimated what I could accomplish regarding that project, I tried to modify the current lzma API used inside to coreboot to make it possible to easily load each chunk of data from memory/chip and decompress it gradually.
Theoretically it shouldn’t be too difficult, in practice the decompression code is particularly nasty and it seems it is a customised version of the LZMA SDK code.
I, not so quickly, realised that I didn’t have the time to delve into that, so I momentarily dropped the idea. (It could be a nice project to do after the GSOC ends.)

I also spent a chunk of my time in porting the time stamps reading functionality from the cbmem utility into the coreinfo payload (commit).
It is now possible to read how much time the coreboot boot sequence takes without having a working OS environment.
I needed to measure if there had been any boot time improvements after the rdev_mmap_full commit. (Still haven’t got around to doing that though…)

What are you working on now then?

I’m currently working on porting the region_device API from coreboot to libpayload, meanwhile replacing all the functionality that was provided by its predecessor: cbfs_media.

Today I finished (in before my code has to be reworked completely ) replacing the old api inside libpayload and started converting the only payload that used cbfs_media: depthcharge, the payload used to boot ChromeOS.
Hopefully it will be a painless process.

What’s next?

So, after I finish these current patch (possibly for the weekend), I need to re-focus my attention on my actual timeline.
I plan to test the current FILO master branch to check for possible problems and show-stoppers before an eventual new release in the (near?) future.
It would be quite helpful in that the current stable is not stable at all, it doesn’t even compile and neither does the master tag (unless you apply this patch which is still pending approval).
It would also be the first release that includes the new flashupdate command and would allow for some interesting things to be done (i.e. path to the recovery from a bad coreboot flash/configuration).

Furthermore I would like to get the SerialICE firmware-side code to be merged into the main coreboot codebase (commit), so I plan to work out the current problems with the time that I still have at my disposal. (Uh, sounds a bit cliche’ by now)

See you next week time! (Just to be on the safe side 🙂 )

[GSoC] Better RISC-V support, week #4/5

Week 4

In week 4, I tracked down why coreboot halted after about one line of output. It turned out to be a spike bug, that I wrote up in this bug report, and affect any program that doesn’t have a tohost symbol. As a workaround, I extended my script that turns coreboot.rom into a ELF to also include this symbol.

After some more patches I could run coreboot in spike and get the familiar “Payload not loaded” line.

Week 5

I was now clearly moving towards being able to run linux on spike/coreboot. But there was a problem: The RISC-V linux port requires a working implementation of the Supervisor Binary Interface (SBI), which is a collection of functions that the supervisor (i.e. the linux kernel) can call in the system firmware.

Coreboot has an implementation of the SBI, but it’s probably outdated by now. To get an up-to-date SBI implementation, I decided to use bbl as a payload. When I built bbl with coreboot’s RISC-V toolchain, I noticed that it depends on a libc being installed in two ways:

  • The autoconf-generated configure script checks that the C compiler can compile and link a program, which only succeeds if it finds a linker script (riscv.ld) and a crt0.o in the right place.
  • bbl relies on the libc headers to declare some common functions and types (it doesn’t use any of the implementations in the libc, though).

The coreboot toolchain script doesn’t, however, install a libc, because coreboot doesn’t need one.

I tweaked the bbl source code until it didn’t need the libc headers, changed the implementation of mcall_console_putchar to use my 8250 UART, got the payload section of bbl (where linux is stored before it’s loaded) out of the way of the CBFS by moving it to 0x81000000 (bbl/bbl.lds is the relevant file for this change), and could finally observe Linux booting in spike, on top of coreboot and bbl. It stops with a kernel panic, though, because it doesn’t have a root filesystem.

Plans for this week

This week I will document my work on the Spike wiki page in the coreboot wiki, so others can run coreboot on spike, too.

[GSoC] Better RISC-V support, week #3

Last week, after updating GCC (by applying Iru Cai’s patch) and commenting out uses of outdated instructions and CSRs (most notably eret and the HTIF CSRs), I noticed that coreboot crashed when it tried to access any global variables. This was because the coreboot build system thought coreboot would live near the start of the address space.

I found spike-riscv/memlayout.ld, and adjusted the starting offset. But then I got a linker error:

build/bootblock/arch/riscv/rom_media.o: In function `boot_device_ro': [...]/src/arch/riscv/rom_media.c:26:(.text.boot_device_ro+0x0): relocation truncated to fit: _RISCV_HI20 against `.LANCHOR0'

I played around with the start address and noticed that addresses below 0x78000000 worked, but if I chose an address that was too close to 0x80000000, it broke. This is, in fact, because pointers to global variables were determined with an instruction sequence like lui a0, 0xNNNNN; addi a0, a0, 0xNNN. On 32-bit RISC-V, the LUI instruction loads its argument into the upper 20 bits of a register, and ADDI adds a 12-bit number. On a 64-bit RISC-V system, lui a0, 0x80000 loads 0xffffffff80000000 into a0, because the number is sign extended.

After disassembling some .o files of coreboot and the RISC-V proxy kernel, I finally noticed that I had to use the -mcmodel=medany compiler option, which makes data accesses pc-relative.

Now that coreboot finally ran and could access its data section, I finished debugging the UART block that I promised last week. Coreboot can now print stuff, but it stops running pretty soon.

Plans for this week

This week I will debug why coreboot hangs, and will hopefully get it to boot until the “Payload not found” line again, which worked with an older version of Spike.

Also, Ron Minnich will be giving a talk about coreboot on RISC-V at the coreboot convention in San Francisco, in a few hours.

[GSOC] Panic Room, week #2

How was your last week?

Let’s say that it was a bit unexpected.

I spent the majority of it trying to wrap my head around the ELF (Executable Linkable Format) specification.
I used this new acquired knowledge to improve the utility cbfstool and allow it to extract payloads contained inside a CBFS directly into ELF instead of SELF (commit).

In order to achieve this cbfstool has to do a few things:

  • Extract the payload from the coreboot image
  • Parse the segment table contained inside the SELF payload in order to find out how many and which segments are present.
  • Using the elf_writer API generate a compliant ELF header
  • Take the content from each segment and copy it to the correspondent ELF section header and configure it accordingly
  • Once the section table is filled out, use elf_writer to generate the program header table and write out the final ELF

The final results would allow to, for example, easily move payloads from a CBFS to another one without having to re-build the payload, coreboot rom or mess with the build system configuration.
Right now the implementation it’s not complete yet but it works quite well with a good chunk of the payloads commonly used with coreboot such as SeaBIOS, coreinfo, nvramcui and others.
The major hurdles right now are to get the GRUB payload to work and add a way to handle the extraction of a compressed payload.

Wait a minute! Weren’t you working on SerialICE?

You are quite the inquisitive type, aren’t you?

Yes, my main goal is still to continue integrating SerialICE and coreboot.
Unfortunately there have been a few showstoppers this week, first my only test clip broke and now my target, Lenovo x60, stopped working and I am no longer able to flash its BIOS chip.
I already ordered a replacement but it’ll probably take a bit more than a week to arrive.

In the meantime my mentor (adurbin) kindly pointed out the task above to keep me busy while waiting.

What are your plans for the next week?

I plan to finish implementing the functionality described above and test all the remaining payloads.
Hopefully I will also be able to start looking at some of the other tasks that have been suggested to me by my mentors.

That’s it for today, see you next week!

[GSoC] Multiple status registers, block protection and OTP support, week #1 and #2

Hi, I am Hatim Kanchwala (hatim on IRC) from India. I am the GSoC student working with flashrom this year. Stefan Tauner (stefanct) and David Hendricks (dhendrix) will be mentoring me (thanks a lot for the opportunity). The pre-midterm phase of my project comprises three sub-projects – multiple status registers, block protection and OTP support. Each of these projects deals with SPI flashchips.

As of writing this post, flashrom supports over 300 SPI flashchips. Around 10% have multiple status registers (most have two but there is one with three). Almost all have some sort of block protection in place. Around 40% have some variation of OTP or security registers. A combination of BP (Block Protect, first status register) and SRP bits (usually first, but sometimes second status register as well) in the status register determine the range and type of protection in effect. A few have a TB bit (Top/Bottom) in addition to BP bits. Some also have a CMP bit (Complement Protect, second or third status register) to add more flexibility to range available. Few chips have a WPS bit (Write Protect Scheme, second or third status register) that define which scheme of access protection is in use. Chips with security registers have corresponding LB bits (Lock Bits, second status register) which are one-time programmable and, when set, render the corresponding security register read-only. Chips with a separate OTP sector(s) have opcodes to enter/exit OTP mode and, within OTP mode usual read, page program and sector erase opcodes can be used.

Previously, flashrom could only read/write the first status register. For writes, all block protect bits were unset (this configuration corresponds to block protection), if the type of protection allowed it. Once unset, flashrom couldn’t revert the BP bit configuration. The ChromiumOS fork of flashrom has some support for locking/unlocking block access protection in place. A lot of the work is done around specific families of chips, but they are moving towards generalising it. For chips with OTP support, flashrom simply printed a warning.

In these two weeks I sifted through around 5-6 dozen datasheets and developed models for multiple status registers, block protection and OTP/security registers. I discussed with mentors and the community over mailing list (link to thread) the infrastructural changes and use cases corresponding to the models. To substantiate these ideas, I wrote separate prototype code. In the process, Stefan introduced me to a powerful tool, Coccinelle. This tool will make applying changes to the large struct flashchips easier while being safe. As a byproduct of studying existing flashrom infrastructure, I had the opportunity to explore the history of flashrom through git log – evolution of flashrom from its humble beginnings in coreboot/util to flash_and_burn to flash_rom to finally flashrom today!

My broad targets for the following few weeks will be to finish up with the pending dozen or two datasheets, polish the models and start transforming the prototype code into merge-worthy code. Following the infrastructure changes, I will update existing chips to make use of the new infrastructure, add support for a bunch of new chips and finally test on actual hardware.

Thanks. See you later!

[GSoC] Better RISC-V support, week #2

Last week, I updated my copy of spike (to commit 2fe8a17a), and familiarized myself with the differences between the old and the new version:

  • The Host-Target Interface (HTIF) isn’t accessed through the mtohost and mfromhost CSRs anymore. Instead, you have to define two ELF symbols (tohost and fromhost). Usually this is done by declaring two global variables with these names, but since the coreboot build system doesn’t natively produce an ELF file, it would get a little tricky.
  • Spike doesn’t implement a classic UART.
  • The memory layout is different. The default entry point is now at 0x1000, where spike puts a small ROM, which jumps to the start of the emulated RAM, at 0x80000000. One way to run coreboot is to load it at 0x80000000, but then it can’t catch exceptions: The exception vector is at 0x1010.
  • Within spike’s boot ROM, there’s also a text-based “platform tree”, which describes the installed peripherals.

“Why does coreboot need a serial console?”, you may ask. Coreboot uses it to log everything it does (at a configurable level of detail), and that’s quite useful for debugging and development.

Instead of working around the problems with HTIF, I decided to implement a minimal, 8250-compatible UART. I’m not done yet, but the goal is to use coreboot’s existing 8250 driver.

Plans for this week

This week, I will rewrite the bootblock and CBFS code to work with RISC-V’s new memory layout, and make sure that the spike UART works with coreboot’s 8250 UART driver. Booting Linux probably still takes some time.

[GSOC] Panic Room, week #1

Who are you?

Hello everyone, I’m Antonello Dettori (avengerf12 on IRC) and I’m the student currently working on improving SerialICE.

What are you working on?

I’m glad you asked.

As I said just a bunch of lines before I’m working on SerialICE, which is one of the main tools used in reverse engineering an OEM BIOS and therefore in understanding the initialisation process that coreboot will have to perform in order to properly run on a target.

The original idea of my proposal was to work towards:

  • Incorporating the functionality of SerialICE into coreboot.
  • Allowing for a way to flash a coreboot-running target without a working OS environment.

The situation has changed a bit in the few months after the proposal was written and part of the goals have already been worked on by some of the wonderful contributors in the coreboot community.
I still have plenty of work to do and my mentors already pointed out some of the areas of the project with which I could spend my time.

How was your first week?

Oh boy, you had to go there, didn’t you?

I’ve been kind of a late bloomer regarding this project since only from this week I came to truly appreciate all of the work that goes into making coreboot and SerialICE tick.
I’m therefore still knee-deep in the learning process, but don’t worry, progress is being made on this front.
Unfortunately, this also means that I don’t have any actual code to reach my goals yet.

What will you do during the next week?

I will, hopefully, manage to wrap up my learning “session” with SerialICE and get to finally write some actual (possibly useful) code.
In particular I hope to fix the problem regarding the conflicts in managing the cache and its related registers that occur when coreboot initialises the target but SerialICE is used as the romstage.

That’s pretty much it  for now, see you next week!

[GSoC] coreboot for ARM64 Qemu – Week #9 #10

In the last post I talked about using aarch64-linux-gnu-gdb and debugging in qemu. In these two weeks I was intensely involved in stepping through gdb, disassembly and in-turn debugging the qemu port. I summarise the major highlights below.

Firstly, the correct instruction to invoke qemu is as follows

./aarch64-softmmu/qemu-system-aarch64 -machine virt -cpu cortex-a57 -machine type=virt -nographic -smp 1 -m 2048 -bios ~/coreboot/build coreboot.rom -s -S

After invoking gdb, I moved onto tracing the execution of the instructions step by step to determine where and how the code fails. A compendium of the code execution is as follows

gdb) target remote :1234
Remote debugging using :1234
(gdb) set disassemble-next-line on
(gdb) stepi
0x0000000000000980 in ?? ()
=> 0x0000000000000980: 02 00 00 14 b 0x988
(gdb)
0x0000000000000988 in ?? ()
=> 0x0000000000000988: 1a 00 80 d2 mov x26, #0x0                    // #0
(gdb)
0x000000000000098c in ?? ()
=> 0x000000000000098c: 02 00 00 14 b 0x994
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x0000000000000750 in ?? ()
=> 0x0000000000000750: 3f 08 00 71 cmp w1, #0x2

The detailed version can be seen here.

The first sign of error can be seen here, where the instruction is 0 and the address is way off.

0x64672d3337303031 in ?? ()
=> 0x64672d3337303031: 00 00 00 00 .inst 0x00000000 ; undefined

To find insights as to why this is happening, I resorted to tracing in gdb. This can be done by adding the following in the qemu invoke command. This creates a log file in /tmp which can be read to determine suitable information.

-d out_asm,in_asm,exec,cpu,int,guest_errors -D /tmp/qemu.log

Looking at the disassembly, it can be seen that execution of instructions till 0x784 is correct and it goes bonkers immediately after it. Looking at the trace, this is where the code hangs

IN:
0x0000000000000784:  d65f03c0      ret
The ret goes to somewhere bad. So the stack has been blown or it has executed into an area it should have prior to this. Next, I did a objdump on the bootblock.debug file. Relating to the code at this address, it could be determined that the code fails at “ret in 0000000000010758 <raw_write_sctlr>:”
Next up was determining where the stack gets blown or corrupt. For this, while stepping through each instruction, I looked at the stack pointer. The output here shows the details. Everything seems to function correctly till 0x00000000000007a0 (0x00000000000007a0: f3 7b 40 a9 ldp x19, x30, [sp] ), then next is 0x00000000000007a4: ff 43 00 91 add sp, sp, #0x10 . This is where saved pc goes corrupt. This code gets called in the “raw_write_sctlr_current” (using objdump)
From the trace, we have the following information : The ret goes bad at 0000000000010758 <raw_write_sctlr>:
0x0000000000000908:  97fffe06      bl #-0x7e8 (addr 0x120)
0x0000000000000120:  3800a017      sturb w23, [x0, #10]
0x0000000000000124:  001c00d5      unallocated (Unallocated)
Taking exception 1 [Undefined Instruction]
…from EL1
…with ESR 0x2000000
Which is here:
0000000000010908 <arm64_c_environment>:
   10908: 97fffe06  bl 10120 <loop3_csw+0x1b>
   1090c: aa0003f8  mov x24, x0
This finally gave some leads in the qemu debug. There seems be some misalignment in smp_processor_id.
While tracing in gdb, we have
0x0000000000000908 in ?? ()
=> 0x0000000000000908: 06 fe ff 97   bl  0x120
(which is actually bl smp_processor_id (from src/arch/arm64/stage_entry.S))
Under arm64_c_environment (in objdump) we have;
10908:       97fffe06        bl      10120 <loop3_csw+0x1b>
Also in the trace we have
IN:
0x0000000000000908:  97fffe06      bl #-0x7e8 (addr 0x120)

Now loop3_csw is defined at (from objdump)
0000000000010105 <loop3_csw>:

So this + 0x1b = 10120

Thus it wants to branch and link to 0x120 but smp_processor_id is at 121.

smp_processor_id is at (from objdump)
0000000000010121 <smp_processor_id>:

This gives us where the code is failing. Next up is finding out the reason for this misalignment and rectifying it.

 

 

 

[GSoC] coreboot for ARM64 Qemu – Week #8

As I had discussed in my last blog post, currently I am onto the debug of the qemu boot. I was intending to use Valgrind tools to detect various memory managements bugs and use that information for my debug. But sadly the information provided by Valgrind was not of much use since it didn’t deal with the execution stream of the coreboot code in qemu. I ultimately had to turn to gdb and use it for further debug.

This was an initial hiccup, since, as in my last post, building aarch64-linux-gnu-gdb on MacOSX was not straightforward, since there was no direct replacement for the “gdb-multiarch”. I was able to get this done. I discuss some of the basic steps of how to set it up below.

First, we need a couple to packages to build gdb. They are listed below:

expat guile texinfo

Next, download the aarch64-gdb from here. Now, you need to configure CC to gcc (GNU gcc and not the innate symlink to clang). Then proceed to,

$ ./configure --target=aarch64-linux-gnu
$ make
$ make install

If this completes successfully, you would have aarch64-gdb installed on your system correctly. The important thing to remember is to use GNU gcc (>=4.9) and not the innate MacOS gcc.

To run gdb you must

$ aarch64-linux-gnu-gdb

The output looks like this :

GNU gdb (GDB) 7.9
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-apple-darwin13.3.0 --target=aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".

Now I had gdb working. Then I did started the debug by giving a “-s -S” while invoking qemu. After this, I need to connect to gdb remotely using

(gdb) target remote : 1234

Some of the debug information I received was this :

(gdb) target remote :1234
Remote debugging using :1234
0x0000000000000200 in ?? ()
(gdb) run
The “remote” target does not support “run”.  Try “help target” or “continue”.
(gdb) continue
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x0000000000000200 in ?? ()

On trying single-step execution on gdb, I received :

(gdb) step
Cannot find bounds of current function
An error like this usually seen when we overflow a buffer and corrupt the stack, the proper return address is destroyed. When the debugger tries to figure out which function the address is in, it fails, because the address is not in any of the functions in the program.
On running the simple where on gdb I get [where displays the current line and function and the stack of calls that got you there]
(gdb) where
#0  0x0000000000000200 in ?? ()

After some unscrambling of the source code using information from gdb, we were pointed to some issues under the stage_entry in src/arch/arm64/stage_entry.S. I am onto re-setting those and continuing the debug further now.  

 

 

[GSoC] EC/H8S firmware week #7|#8

Week #7 was is little bit frustrating, because of no real progress, only more unfinished things which aren’t working. Week #8 was a lot better.

1. Sniffing the communication between the 2 embedded controllers H8S and PMH4.

I’ve tried to build an protocol analyser with the msp430, but the data output was somehow strange. For testing purpose I used my H8S firmware to produce testing data. But the msp430 decoded only wrong data. I’m using IRQs on the clock to do the magic and writing it to a buffer before transmitting it via UART. Maybe the msp430 is too slow for that? Possible. Set a GPIO to high when the IRQ routing start and to low when it ends. Visualize the clock signal and connect the  IRQ measure pin to an oscilloscope. The msp430 is far too slow. I’m using memory dereference in the IRQ routine, which takes a lot of time. Maybe the msp430 is fast enough, when using asm routine and registers to buffer the 3 byte transmission. But a logic analyser would definitely work. So I borrowed two logic analyser. An OLS (Openbench Logic Sniffer) and a Saleae Logic16.

There isn’t so much data on the lines. Every 50 ms there is a short transmission of 3 byte. But I don’t want to decode the data by hand. So it needs a decoder for the logic analyser. sigrok looks like the best start point and both analyser are supported.

I’ve started with the Openbench Logic Sniffer, but unfortunately it doesn’t have enough RAM to buffer the input long enough. Maybe the external trigger input can be used. But before doing additional things I would like to test with the Logic16.

The Logic16 doesn’t support any triggers but it can stream all data over USB even with multiple MHz. Good enough to capture all data. I found out that the best samplerate is 2 MHz. Otherwise the LE signal isn’t captured, because it’s a lot shorter than a clock change. In the end I created a decoder with libsigrokdecode.

sigrok-cli -i boots_and_shutdown_later_because_too_hot.sr –channels 0-3 -P ec_xp:clk=2:data=3:le=1:oe=0 | uniq -c 

67 0x01 0x07 0xc8
3 0x01 0x04 0xc8 
4 0x01 0x10 0x48
1120 0x01 0x17 0x48
67 0x01 0x07 0xc8

0x01 0x07 0xc8 is called when only power is plugged in, like a watchdog(every 500ms)
0x01 0x17 0x48 is called when the device is powered on, like a watchdog (every 50ms)
0x01 0x04 0xc8 around the time power button pressed
0x01 0x10 0x48 around the time power button pressed

2. Flash back the OEM H8S firmare

The OEM H8S firmware is included in the bios updates. cabextract and strings is enough for extracting it out of the update. Look for SREC lines. Put the SREC lines into a separate file and flash them back via UART bootloader and the renesas flash tool. The display powers up and it’s booting again with OEM BIOS.
I could imagine they are using a similar update method like the UART bootloader. First transfer a flasher application into RAM and afterwards communicate with the flasher to transfer the new firmware, but the communication works over LPC instead of UART.

3. Progress on the bootloader

I’ve implemented the ADC converter to enable the speaker amp and the display backlight brightness.

Written down LPC registers and just enable the Interface in order to get GateA20 working. Still unclear how far this works.

4. How to break into the bootloader?

The idea of the bootloader is providing a brick free environment for further development. The bootloader loads the application which adds full support for everything. It should be possible to stop the loading application and flash a new application into the EC flash. When starting development on the x60 or x201 I want to use I2C line as debug interface. I2C chips have a big footstep and are easy to access. But there must be a way to abort the loading. I will use the function key in combination with the leds.

  1. Remove the battery and power plug.
  2. Press the function key
  3. Put the power plug in
  4. Wait until leds blinking
  5. release the function key within 5 seconds after the leds starting to blink to enter the bootloader.

The H8S will become I2C slave on a specific address.

What next?

  • Add new PMH4 commands to the H8S
  • solder additional pins to MAINOFF PWRSW_H8 A20 KBRC
  • use the logic analyser to put the communication in relation with these signals
  • UART shell
  • I2C master & client
  • solder LPC pins to analyse firmware update process
  • test T40 board with new PMH4 commands and look if all power rails are on