[GSoC] EC/H8S firmware week #7|#8

Week #7 was is little bit frustrating, because of no real progress, only more unfinished things which aren’t working. Week #8 was a lot better.

1. Sniffing the communication between the 2 embedded controllers H8S and PMH4.

I’ve tried to build an protocol analyser with the msp430, but the data output was somehow strange. For testing purpose I used my H8S firmware to produce testing data. But the msp430 decoded only wrong data. I’m using IRQs on the clock to do the magic and writing it to a buffer before transmitting it via UART. Maybe the msp430 is too slow for that? Possible. Set a GPIO to high when the IRQ routing start and to low when it ends. Visualize the clock signal and connect the IRQ measure pin to an oscilloscope. The msp430 is far too slow. I’m using memory dereference in the IRQ routine, which takes a lot of time. Maybe the msp430 is fast enough, when using asm routine and registers to buffer the 3 byte transmission. But a logic analyser would definitely work. So I borrowed two logic analyser. An OLS (Openbench Logic Sniffer) and a Saleae Logic16.

There isn’t so much data on the lines. Every 50 ms there is a short transmission of 3 byte. But I don’t want to decode the data by hand. So it needs a decoder for the logic analyser. sigrok looks like the best start point and both analyser are supported.

I’ve started with the Openbench Logic Sniffer, but unfortunately it doesn’t have enough RAM to buffer the input long enough. Maybe the external trigger input can be used. But before doing additional things I would like to test with the Logic16.

The Logic16 doesn’t support any triggers but it can stream all data over USB even with multiple MHz. Good enough to capture all data. I found out that the best samplerate is 2 MHz. Otherwise the LE signal isn’t captured, because it’s a lot shorter than a clock change. In the end I created a decoder with libsigrokdecode.

sigrok-cli -i boots_and_shutdown_later_because_too_hot.sr –channels 0-3 -P ec_xp:clk=2:data=3:le=1:oe=0 | uniq -c

67 0x01 0x07 0xc8
3 0x01 0x04 0xc8
4 0x01 0x10 0x48
1120 0x01 0x17 0x48
67 0x01 0x07 0xc8

0x01 0x07 0xc8 is called when only power is plugged in, like a watchdog(every 500ms)
0x01 0x17 0x48 is called when the device is powered on, like a watchdog (every 50ms)
0x01 0x04 0xc8 around the time power button pressed
0x01 0x10 0x48 around the time power button pressed

2. Flash back the OEM H8S firmare

The OEM H8S firmware is included in the bios updates. cabextract and strings is enough for extracting it out of the update. Look for SREC lines. Put the SREC lines into a separate file and flash them back via UART bootloader and the renesas flash tool. The display powers up and it’s booting again with OEM BIOS.
I could imagine they are using a similar update method like the UART bootloader. First transfer a flasher application into RAM and afterwards communicate with the flasher to transfer the new firmware, but the communication works over LPC instead of UART.

3. Progress on the bootloader

I’ve implemented the ADC converter to enable the speaker amp and the display backlight brightness.

Written down LPC registers and just enable the Interface in order to get GateA20 working. Still unclear how far this works.

4. How to break into the bootloader?

The idea of the bootloader is providing a brick free environment for further development. The bootloader loads the application which adds full support for everything. It should be possible to stop the loading application and flash a new application into the EC flash. When starting development on the x60 or x201 I want to use I2C line as debug interface. I2C chips have a big footstep and are easy to access. But there must be a way to abort the loading. I will use the function key in combination with the leds.

Remove the battery and power plug.
Press the function key
Put the power plug in
Wait until leds blinking
release the function key within 5 seconds after the leds starting to blink to enter the bootloader.

The H8S will become I2C slave on a specific address.

What next?

Add new PMH4 commands to the H8S
solder additional pins to MAINOFF PWRSW_H8 A20 KBRC
use the logic analyser to put the communication in relation with these signals
UART shell
I2C master & client
solder LPC pins to analyse firmware update process
test T40 board with new PMH4 commands and look if all power rails are on

Experiments of mind

The time for writing code is over. The time to design hardware is over. After seven weeks, the beginning has come to an abrupt end. I am severely behind schedule. In week seven I was supposed to implement erase functionality — tell the programmer how to erase the chip. This is not done. On the other hand, I have had code for weeks 8 and 9 almost ready, and just merged most of it last week. So, where am I? Am I ahead or behind schedule?

The fallacy of preemption

One of the requirements for applying as a GSoC coreboot student was to have a fully established, schedule from day[-1]. Establishing this schedule was a great experience, and it allowed me to think in depth about the problem and possible solutions — to a certain degree. I picked the steps I considered logical, in the order which I saw logical. Development is never about writing code in the order in which it will be executed. In this particular case, it was much easier to implement bulk writing without a predefined erase/write strategy, opting instead for a default just-do-it approach.

Why is this approach better than following the schedule, from a development point of view? We have had bulk read partially working for a while now. From the host point of view, reading and writing are symmetrical operations. The bulk of the code (pun definitely intended) is shared between the read and write operation. They both juggle data on the same endpoint. The only difference is the endpoint direction bit. It therefore made sense, once bulk reading was fixed for corner cases, to uses the same code to send data to the programmer. Making the programmer write that data was a matter of a couple of hours. There was no sensible reason to wait an additional two weeks before implementing this last bit.

Software development work is as much about making things work, as it is about the application of programming principles with unquestionable moral authority and correctness. In this case, implementing a trivial extension reusing code fresh in my mind was the preferred approach. Not only did it save me time by not having to re-examine the situation a few weeks from now, it also allows me to have a working program/verify scenario when implementing the erase strategies. As one might imagine, this makes the problem a lot easier. Attempting to preempt and enforce a schedule before the problem is thoroughly explored, occasionally conflicts with best practices of development. With this in mind, I am neither behind, nor ahead of schedule. I am precisely where I need to be.

A matter of experimentation

Most of the infrastructure and code is already in place. Bringing QiProg to completion is no longer an issue of adding functionality through code, but rather completing functionality by connecting the existing code. One issue I discovered after testing the bulk program code was a terrible race condition between read prefetching and the write loop. The prefetch logic incremented the internal address before data arrived. As a result, the new data would get written at the wrong address. Choosing the best solution to the problem is a matter of experimentation.

The “this won’t work because of that” and “what if this” turned into a series of exhausting thought experiments. I have been bugging Peter a lot in the past few days about a series of potential issues. Through tiring thought experimentation, we eventually agreed that the best way to proceed was to abstract a lot more through the API. This is a non-exhaustive list of the decisions we’ve made in the past week:

set_address() is hidden from the API
the internal address range is not exhausted once read or written
read and write operations must not be interdependent, the internal read and write pointers will be distinct (as a side effect, this change also eliminates the race condition depicted above)
set_address() + readn() turns into read(dev, where, n)
All API addresses begin at 0. The programmer translates that into an absolute address
new API call set_chip_size()
new API call to explicitly erase blocks or sectors (to be defined)
implicit erase on write can be enabled or disabled (to be defined)
implicit erase will erase the sector/block right before the first byte of the sector is written
exposing any USB specific dependencies in the API is strictly forbidden

My focus for the remainder of this week will be to shorten this list as much as possible. Once the dependency between read and write is unshackled, I will be able to erase/program/verify my faithful SST 49LF080A. From here, it will be a matter of finalizing and implementing the last obscure bits of the specification.

The state of QiProg for flashrom

As QiProg is still being finalized, implementing it as a flashrom programmer is still a long ways ahead. I do estimate that weeks 11 and 12 will provide ample time to integrate everything into flashrom, hopefully, in time for the 0.9.8 release.

Cooking with thin spaghetti: The hard side of Vultureprog

One of the reasons I fell in love with the Stellaris Launchpad boards is that they are modularly expandable. This notion is difficult to explain without comparison to STM Discovery boards, which have a row or two of pins on each side. The idea is simple: you hook one end of your wire to the right pin, and the other end to your breadboard, or you design a custom baseboard specific to the Discovery model. Stellaris takes this idea a little further. The layout of the pins is standardized, not just for the Stellaris, but across the family of TI development boards. Enter the Booster Packs: standardized add-on modules for TI boards. These modules are stackable, so it is possible to connect more than one to a single Stellaris board. This is why I wanted to use the Stellaris for this project. It’s much easier to build a booster pack than to tell people how to connect 32 wires; most people have problems connecting four of them to a buspirate. Let’s look at some of the design choices.

Constraints, constraints, constraints

It’s easy to imagine connecting a LPC chip: six wires and power. In reality, the situation is nowhere near as bright. Four ID pins need to be pulled low, reset pins (yes, there is more than one) need to be pulled high, and some pins simply cannot be left floating. Thus, even a simple bus like LPC becomes a nightmare. Without a logic analyzer to tell what works and what does not, the result is frustration and even self-inflicted injuries. Consequently, I wanted to do a few things right from the beginning(TM).

The most important point was to have all pins properly connected with zero wires. Users should not have to worry about what connects to where. Remember, these chips have 32 pins.

I also wanted to support all possible bus types. LPC and FWH are identical hardware-wise, and are not a problem to support concurrently. SPI is also just a few extra traces that lead to a header. On the other hand, having a programmer that also supports parallel mode is a much harder problem. It turns out there are really two “parallel” modes. The first one is ISA, where the chip is accessed via a linear address space. You put the address you want to access on the address pins, handle a couple of handshake lines to tell the chip if you want to read or write, and move the data over a separate 8-bit data bus.

On the other hand, the second “parallel” mode is a real pain. It uses a 2-dimensional address space, where you need to drive a row address, then a column address, and only then access the data. It’s called PP or “parallel programming” mode. Luckily we get a break: PP mode is an auxiliary programming mode specific to some LPC chips. If we support LPC, we don’t need PP. PP goes in the garbage bin (for now).

Now we need an efficient way to connect the GPIOs to the chip. By “efficient” I mean minimizing the number of GPIO accesses, and the number of bitshifts we need to do in firmware. A poorly chosen pinout will result in abysmal performance, as the 80MHz core struggles to shift the correct bit to the correct GPIO. My choice here was limited, as the best I could do was assign successive GPIOs to successive address pins. I spent the entire Sunday looking over chip datasheets and deciding on this “spaghetti recipe”.

Flexibility – a big issue

I also wanted to have the option between a normal PLCC32 socket, or a ZIF socket (AKA clamshell). I was really an idiot for thinking I would have both on the same board. On paper, it looks very straightforward. In reality, adjacent pins are on different hemispheres of the globe, and routing them is well, the tastiest spaghetti you have ever eaten. There was no way I could fit both a clamshell, and a PLCC32 socket. There was no way to route the 32 or so tracks on just 2 layers. So I killed the clamshell, the SPI header, and the LPC header. After a couple of hours of messing with the routing, I always had one or two pins that got cornered.

Even routing a simple PLCC socket proved difficult.

What coffee can do to you

I decided to start over, with all the components in place. Once I reduced the track size to 8 mils, and spacing to 6 mils, I was able to route two tracks between a set of pins. This time, I placed the socket inside the clamshell, and managed to connect the two using just the top layer. I then worked from the booster pack connection to the DIP pins on the same side of the board, again, using only the top layer. Then I started using the bottom layer for DIP pins on the opposite side. After a few hours, Chuck Norris warped space and time to make room for all the tracks:

From here, it was a matter of optimizing the routing, taking care of ground planes and other finishing touches. In the end, we get VultureProg hardware version 0.1:

Don’t let the PRELIMINARY DESIGN warning fool you. There is an infinitesimal possibility I will ever want to go back and revise the design. We have 35 GPIOs. accessible on the Stellaris. Five of them are connected to the on-board LEDs and buttons. The remaining 30 are all used up.

Conclusion

If you are a Kicad user, you can head over to yet another one of my GitHub repositories. If you do not have a way to consume Kicad files, you can look in the doc and gerbers directories. Feel free to feed the gerbers to Mayhew Labs’ 3D Gerber Viewer (hint: you can rotate the board in 3D). With all that being done I ordered the first batch of PCBs from Seed Studio’s Fusion PCB service. Routing is definitely too crammed and painful, but I really wanted something versatile and flexible. Whether it lives up to its design goals in REV 0.1 or REV 0.2 remains to be seen. My money is on REV 0.1 — quite literally.