[GSoC] EC/H8S firmware week #7|#8

Week #7 was a little bit frustrating because there was no real progress, only more unfinished things which aren’t working. Week #8 was a lot better.

1. Sniffing the communication between the 2 embedded controllers H8S and PMH4.

I’ve tried to build a protocol analyser with the msp430, but the data output was somehow strange. For testing purposes I used my H8S firmware to produce test data, but the msp430 decoded only wrong data. I’m using IRQs on the clock to do the magic, writing the bits to a buffer before transmitting them via UART. Maybe the msp430 is too slow for that? Possible. To check, set a GPIO high when the IRQ routine starts and low when it ends, then look at the clock signal and this IRQ measurement pin together on an oscilloscope. Result: the msp430 is far too slow. I’m using memory dereferences in the IRQ routine, which take a lot of time. Maybe the msp430 would be fast enough with an asm routine that uses registers to buffer the 3-byte transmission. But a logic analyser would definitely work, so I borrowed two of them: an OLS (Openbench Logic Sniffer) and a Saleae Logic16.

There isn’t much data on the lines. Every 50 ms there is a short transmission of 3 bytes. But I don’t want to decode the data by hand, so I need a decoder for the logic analyser. sigrok looks like the best starting point, and both analysers are supported.

I’ve started with the Openbench Logic Sniffer, but unfortunately it doesn’t have enough RAM to buffer the input long enough. Maybe the external trigger input can be used. But before doing additional things I would like to test with the Logic16.

The Logic16 doesn’t support any triggers, but it can stream all data over USB even at several MHz. Good enough to capture everything. I found out that the best samplerate is 2 MHz. At lower rates the LE signal isn’t captured, because its pulse is a lot shorter than a clock change. In the end I created a decoder with libsigrokdecode.

sigrok-cli -i boots_and_shutdown_later_because_too_hot.sr --channels 0-3 -P ec_xp:clk=2:data=3:le=1:oe=0 | uniq -c

67 0x01 0x07 0xc8
3 0x01 0x04 0xc8 
4 0x01 0x10 0x48
1120 0x01 0x17 0x48
67 0x01 0x07 0xc8

0x01 0x07 0xc8 is called when only power is plugged in, like a watchdog (every 500 ms)
0x01 0x17 0x48 is called when the device is powered on, like a watchdog (every 50 ms)
0x01 0x04 0xc8 around the time power button pressed
0x01 0x10 0x48 around the time power button pressed
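
The decoding step itself can be sketched in a few lines of Python. This is not the actual ec_xp decoder. It assumes a shift-register style link where DATA is sampled on the rising CLK edge, bits arrive MSB first, and a rising edge on LE latches the bits collected so far. Edge polarity and bit order are assumptions that the captures above do not confirm, and make_samples() is only a helper to produce an idealised test stream.

```python
def decode_frames(samples):
    """Decode (clk, data, le) sample tuples into lists of bytes.

    Assumptions (not confirmed by the captures): DATA is valid on the
    rising CLK edge, bits are sent MSB first, and a rising LE edge
    latches everything shifted in since the previous latch.
    """
    frames = []
    bits = []
    prev_clk, prev_le = 0, 0
    for clk, data, le in samples:
        if clk and not prev_clk:          # rising clock edge: shift in one bit
            bits.append(data)
        if le and not prev_le:            # rising latch edge: frame complete
            usable = len(bits) - len(bits) % 8   # drop an incomplete byte
            frame = []
            for i in range(0, usable, 8):
                byte = 0
                for bit in bits[i:i + 8]:
                    byte = (byte << 1) | bit
                frame.append(byte)
            if frame:
                frames.append(frame)
            bits = []
        prev_clk, prev_le = clk, le
    return frames


def make_samples(frame):
    """Generate an idealised sample stream for one frame (test helper)."""
    samples = []
    for byte in frame:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            samples.append((0, bit, 0))   # data set up while clock is low
            samples.append((1, bit, 0))   # clock goes high
    samples.append((0, 0, 0))
    samples.append((0, 0, 1))             # short latch pulse
    samples.append((0, 0, 0))
    return samples


if __name__ == "__main__":
    stream = make_samples([0x01, 0x17, 0x48])
    for frame in decode_frames(stream):
        print(" ".join("0x%02x" % b for b in frame))
```

Because the latch pulse is only one sample wide in this model, it also shows why a samplerate that is too low loses LE: if that single sample is skipped, the frame is never emitted.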

2. Flash back the OEM H8S firmware

The OEM H8S firmware is included in the BIOS updates. cabextract and strings are enough to extract it from the update: look for SREC lines, put them into a separate file, and flash them back via the UART bootloader and the Renesas flash tool. The display powers up and the machine boots again with the OEM BIOS.
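
As an alternative to eyeballing the strings output, the SREC lines can be filtered by checksum. The following is a minimal sketch, not the tool I used: it scans a binary blob for text that looks like Motorola S-records and keeps only lines whose byte count and checksum are consistent. The function names are made up for this example.

```python
import re

# One S-record: 'S', a type digit, then at least three hex byte pairs
# (byte count, something, checksum).
SREC_RE = re.compile(rb"S[0-9](?:[0-9A-Fa-f]{2}){3,}")


def srec_is_valid(line):
    """Check the byte count and checksum of a single S-record line."""
    payload = bytes.fromhex(line[2:].decode("ascii"))
    count, body, checksum = payload[0], payload[1:-1], payload[-1]
    if count != len(payload) - 1:         # count covers address+data+checksum
        return False
    # Checksum is the ones' complement of the low byte of the sum.
    return (~(count + sum(body)) & 0xFF) == checksum


def extract_srecs(blob):
    """Return all valid S-record lines found in a binary blob."""
    return [m.group(0).decode("ascii")
            for m in SREC_RE.finditer(blob)
            if srec_is_valid(m.group(0))]


if __name__ == "__main__":
    demo = b"\x00junk\nS9030000FC\n\xff"
    print(extract_srecs(demo))
```

The checksum filter matters because a BIOS image contains plenty of random byte sequences that happen to start with "S" and a digit.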
I could imagine they are using an update method similar to the UART bootloader: first transfer a flasher application into RAM, then communicate with the flasher to transfer the new firmware, but with the communication running over LPC instead of UART.

3. Progress on the bootloader

I’ve implemented the ADC converter to enable the speaker amp and the display backlight brightness.

I wrote down the LPC registers and just enabled the interface in order to get GateA20 working. It is still unclear how far this works.

4. How to break into the bootloader?

The idea of the bootloader is to provide a brick-free environment for further development. The bootloader loads the application, which adds full support for everything. It should be possible to stop the application from loading and flash a new application into the EC flash. When starting development on the x60 or x201 I want to use an I2C line as the debug interface. I2C chips have a big footprint and are easy to access. But there must be a way to abort the loading. I will use the function key in combination with the LEDs.

  1. Remove the battery and the power plug.
  2. Press and hold the function key.
  3. Put the power plug in.
  4. Wait until the LEDs start blinking.
  5. Release the function key within 5 seconds after the LEDs start to blink to enter the bootloader.

The H8S will then become an I2C slave on a specific address.
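
The entry decision can be written down as a small model. This is only a sketch of the procedure above in Python, not firmware code: the function name and the timestamp interface are invented for illustration, and treating a key that is never released as "do not enter" is my assumption (it guards against a stuck key).

```python
RELEASE_WINDOW_S = 5.0   # release window from step 5 above


def should_enter_bootloader(key_held_at_power_on, blink_start, key_release):
    """Decide whether to stay in the bootloader.

    blink_start and key_release are timestamps in seconds. key_release
    is None if the key was still held when the window closed.
    """
    if not key_held_at_power_on:
        return False                      # normal boot: load the application
    if key_release is None:
        return False                      # assumption: treat as a stuck key
    if key_release < blink_start:
        return False                      # released before the LEDs blinked
    return key_release - blink_start <= RELEASE_WINDOW_S


if __name__ == "__main__":
    print(should_enter_bootloader(True, blink_start=2.0, key_release=4.5))
    print(should_enter_bootloader(True, blink_start=2.0, key_release=9.0))
```

Requiring both a held key at power-on and a timed release makes it unlikely that the bootloader is entered by accident.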

What next?

  • Add new PMH4 commands to the H8S
  • solder additional pins to MAINOFF PWRSW_H8 A20 KBRC
  • use the logic analyser to put the communication in relation with these signals
  • UART shell
  • I2C master & client
  • solder LPC pins to analyse firmware update process
  • test T40 board with new PMH4 commands and look if all power rails are on

Experiments of mind

The time for writing code is over. The time to design hardware is over. After seven weeks, the beginning has come to an abrupt end. I am severely behind schedule. In week seven I was supposed to implement erase functionality, that is, tell the programmer how to erase the chip. This is not done. On the other hand, I have had code for weeks 8 and 9 almost ready, and just merged most of it last week. So, where am I? Am I ahead or behind schedule?

The fallacy of preemption

One of the requirements for applying as a GSoC coreboot student was to have a fully established schedule from day[-1]. Establishing this schedule was a great experience, and it allowed me to think in depth about the problem and possible solutions, to a certain degree. I picked the steps I considered logical, in the order I considered logical. Development is never about writing code in the order in which it will be executed. In this particular case, it was much easier to implement bulk writing without a predefined erase/write strategy, opting instead for a default just-do-it approach.

Why is this approach better than following the schedule, from a development point of view? We have had bulk read partially working for a while now. From the host point of view, reading and writing are symmetrical operations. The bulk of the code (pun definitely intended) is shared between the read and write operation. They both juggle data on the same endpoint. The only difference is the endpoint direction bit. It therefore made sense, once bulk reading was fixed for corner cases, to use the same code to send data to the programmer. Making the programmer write that data was a matter of a couple of hours. There was no sensible reason to wait an additional two weeks before implementing this last bit.
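
To make the symmetry concrete, here is a minimal sketch of host-side transfer planning. In USB, bit 7 of the endpoint address selects the direction (set for IN, device to host). Everything else here is an assumption for illustration and not QiProg's actual code: the endpoint number 1, the 64-byte chunk size, and the function names.

```python
USB_DIR_OUT = 0x00   # host to device (write)
USB_DIR_IN = 0x80    # device to host (read): bit 7 of the endpoint address


def endpoint_address(number, direction):
    """Combine an endpoint number (0-15) with the direction bit."""
    return (number & 0x0F) | direction


def plan_bulk_transfer(number, direction, total_len, chunk=64):
    """Split a transfer into (endpoint, offset, length) steps.

    The same planning code serves reads and writes. Only the direction
    bit in the endpoint address changes.
    """
    ep = endpoint_address(number, direction)
    return [(ep, off, min(chunk, total_len - off))
            for off in range(0, total_len, chunk)]


if __name__ == "__main__":
    print(plan_bulk_transfer(1, USB_DIR_IN, 150))    # read path
    print(plan_bulk_transfer(1, USB_DIR_OUT, 150))   # write path
```

The chunking, offsets and corner cases (such as a short last packet) are identical in both directions, which is why fixing bulk read first made bulk write nearly free.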

Software development work is as much about making things work as it is about the application of programming principles with unquestionable moral authority and correctness. In this case, implementing a trivial extension reusing code fresh in my mind was the preferred approach. Not only did it save me time by not having to re-examine the situation a few weeks from now, it also allows me to have a working program/verify scenario when implementing the erase strategies. As one might imagine, this makes the problem a lot easier. Attempting to preempt and enforce a schedule before the problem is thoroughly explored occasionally conflicts with best practices of development. With this in mind, I am neither behind, nor ahead of schedule. I am precisely where I need to be.

A matter of experimentation

Most of the infrastructure and code is already in place. Bringing QiProg to completion is no longer an issue of adding functionality through code, but rather completing functionality by connecting the existing code. One issue I discovered after testing the bulk program code was a terrible race condition between read prefetching and the write loop. The prefetch logic incremented the internal address before data arrived. As a result, the new data would get written at the wrong address. Choosing the best solution to the problem is a matter of experimentation.

The “this won’t work because of that” and “what if this” turned into a series of exhausting thought experiments. I have been bugging Peter a lot in the past few days about a series of potential issues. Through tiring thought experimentation, we eventually agreed that the best way to proceed was to abstract a lot more through the API. This is a non-exhaustive list of the decisions we’ve made in the past week:

  • set_address() is hidden from the API
  • the internal address range is not exhausted once read or written
  • read and write operations must not be interdependent, the internal read and write pointers will be distinct (as a side effect, this change also eliminates the race condition described above)
  • set_address() + readn() turns into read(dev, where, n)
  • All API addresses begin at 0. The programmer translates that into an absolute address
  • new API call set_chip_size()
  • new API call to explicitly erase blocks or sectors (to be defined)
  • implicit erase on write can be enabled or disabled (to be defined)
  • implicit erase will erase the sector/block right before the first byte of the sector is written
  • exposing any USB specific dependencies in the API is strictly forbidden
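
A toy model helps to see how these decisions fit together. This is a sketch in Python, not the QiProg API (which is still being defined): the class and method names are hypothetical, and the 1 MiB size, 4 KiB sector size and base address in the example are illustrative values only. Each call carries its own chip-relative address, so reads and writes share no cursor, and the implicit erase fires right before the first byte of a sector is written.

```python
class FlashModel:
    """Toy model of a programmer following the API decisions above."""

    def __init__(self, chip_size, sector_size, base_address):
        self.chip_size = chip_size
        self.sector_size = sector_size
        self.base = base_address          # absolute address of chip offset 0
        self.mem = bytearray(b"\xff" * chip_size)
        self.implicit_erase = True
        self.erased = []                  # log of erased sector offsets

    def _check(self, where, n):
        if where < 0 or n < 0 or where + n > self.chip_size:
            raise ValueError("range outside chip")

    def absolute(self, where):
        """Translate a chip-relative address (starting at 0)."""
        self._check(where, 1)
        return self.base + where

    def erase_sector(self, where):
        start = where - where % self.sector_size
        self.mem[start:start + self.sector_size] = b"\xff" * self.sector_size
        self.erased.append(start)

    def read(self, where, n):
        self._check(where, n)
        return bytes(self.mem[where:where + n])

    def write(self, where, data):
        self._check(where, len(data))
        for i, byte in enumerate(data):
            addr = where + i
            if self.implicit_erase and addr % self.sector_size == 0:
                self.erase_sector(addr)   # first byte of a sector
            self.mem[addr] &= byte        # programming can only clear bits


if __name__ == "__main__":
    chip = FlashModel(0x100000, 0x1000, 0xFFF00000)
    chip.write(0x1000, b"\x12\x34")
    print(chip.read(0x1000, 2).hex(), [hex(s) for s in chip.erased])
```

Note that a write starting in the middle of a sector does not erase it in this model, which is exactly the case an explicit erase call has to cover.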

My focus for the remainder of this week will be to shorten this list as much as possible. Once the dependency between read and write is unshackled, I will be able to erase/program/verify my faithful SST 49LF080A. From here, it will be a matter of finalizing and implementing the last obscure bits of the specification.

The state of QiProg for flashrom

As QiProg is still being finalized, implementing it as a flashrom programmer is still a long way off. I do estimate that weeks 11 and 12 will provide ample time to integrate everything into flashrom, hopefully in time for the 0.9.8 release.

Cooking with thin spaghetti: The hard side of Vultureprog

One of the reasons I fell in love with the Stellaris Launchpad boards is that they are modularly expandable. This notion is difficult to explain without comparison to STM Discovery boards, which have a row or two of pins on each side. The idea is simple: you hook one end of your wire to the right pin, and the other end to your breadboard, or you design a custom baseboard specific to the Discovery model. Stellaris takes this idea a little further. The layout of the pins is standardized, not just for the Stellaris, but across the family of TI development boards. Enter the Booster Packs: standardized add-on modules for TI boards. These modules are stackable, so it is possible to connect more than one to a single Stellaris board. This is why I wanted to use the Stellaris for this project. It’s much easier to build a booster pack than to tell people how to connect 32 wires; most people have problems connecting four of them to a buspirate. Let’s look at some of the design choices.

Constraints, constraints, constraints

It’s easy to imagine connecting an LPC chip: six wires and power. In reality, the situation is nowhere near as bright. Four ID pins need to be pulled low, reset pins (yes, there is more than one) need to be pulled high, and some pins simply cannot be left floating. Thus, even a simple bus like LPC becomes a nightmare. Without a logic analyzer to tell what works and what does not, the result is frustration and even self-inflicted injuries. Consequently, I wanted to do a few things right from the beginning(TM).

The most important point was to have all pins properly connected with zero wires. Users should not have to worry about what connects to where. Remember, these chips have 32 pins.

I also wanted to support all possible bus types. LPC and FWH are identical hardware-wise, and are not a problem to support concurrently. SPI is also just a few extra traces that lead to a header. On the other hand, having a programmer that also supports parallel mode is a much harder problem. It turns out there are really two “parallel” modes. The first one is ISA, where the chip is accessed via a linear address space. You put the address you want to access on the address pins, handle a couple of handshake lines to tell the chip if you want to read or write, and move the data over a separate 8-bit data bus.

On the other hand, the second “parallel” mode is a real pain. It uses a 2-dimensional address space, where you need to drive a row address, then a column address, and only then access the data. It’s called PP or “parallel programming” mode. Luckily we get a break: PP mode is an auxiliary programming mode specific to some LPC chips. If we support LPC, we don’t need PP. PP goes in the garbage bin (for now).

Now we need an efficient way to connect the GPIOs to the chip. By “efficient” I mean minimizing the number of GPIO accesses, and the number of bitshifts we need to do in firmware. A poorly chosen pinout will result in abysmal performance, as the 80MHz core struggles to shift the correct bit to the correct GPIO. My choice here was limited, as the best I could do was assign successive GPIOs to successive address pins. I spent the entire Sunday looking over chip datasheets and deciding on this “spaghetti recipe”.
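
The cost difference is easy to see in a small model. The sketch below is Python rather than firmware C, and the function names and pin maps are invented for illustration. With successive address bits on successive GPIOs, one mask and one shift produce the whole port value. With a scattered pinout, every address bit needs its own test and shift.

```python
def port_value_contiguous(address, first_pin, width):
    """Address bits sit on `width` successive pins starting at first_pin.

    One mask and one shift, regardless of the width.
    """
    return (address & ((1 << width) - 1)) << first_pin


def port_value_scattered(address, pin_map):
    """pin_map[i] is the GPIO pin carrying address bit i.

    One test and one shift per address bit.
    """
    value = 0
    for bit, pin in enumerate(pin_map):
        if (address >> bit) & 1:
            value |= 1 << pin
    return value


if __name__ == "__main__":
    print(hex(port_value_contiguous(0xA5, 0, 8)))
    print(hex(port_value_scattered(0x01, [7, 6, 5, 4, 3, 2, 1, 0])))
```

On the real hardware the per-bit loop would run for every bus access, which is where a poorly chosen pinout eats the performance.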

Flexibility – a big issue

I also wanted to have the option between a normal PLCC32 socket, or a ZIF socket (AKA clamshell). I was really an idiot for thinking I would have both on the same board. On paper, it looks very straightforward. In reality, adjacent pins are on different hemispheres of the globe, and routing them is, well, the tastiest spaghetti you have ever eaten. There was no way I could fit both a clamshell and a PLCC32 socket. There was no way to route the 32 or so tracks on just 2 layers. So I killed the clamshell, the SPI header, and the LPC header. After a couple of hours of messing with the routing, I always had one or two pins that got cornered.

An epic fail

Even routing a simple PLCC socket proved difficult.

What coffee can do to you

I decided to start over, with all the components in place. Once I reduced the track size to 8 mils, and spacing to 6 mils, I was able to route two tracks between a set of pins. This time, I placed the socket inside the clamshell, and managed to connect the two using just the top layer. I then worked from the booster pack connection to the DIP pins on the same side of the board, again, using only the top layer. Then I started using the bottom layer for DIP pins on the opposite side. After a few hours, Chuck Norris warped space and time to make room for all the tracks:

A little less epic this time

From here, it was a matter of optimizing the routing, taking care of ground planes and other finishing touches. In the end, we get VultureProg hardware version 0.1:

[Image: VultureProg board]

Don’t let the PRELIMINARY DESIGN warning fool you. There is only an infinitesimal possibility that I will ever want to go back and revise the design. We have 35 GPIOs accessible on the Stellaris. Five of them are connected to the on-board LEDs and buttons. The remaining 30 are all used up.

Conclusion

If you are a KiCad user, you can head over to yet another one of my GitHub repositories. If you do not have a way to consume KiCad files, you can look in the doc and gerbers directories. Feel free to feed the gerbers to Mayhew Labs’ 3D Gerber Viewer (hint: you can rotate the board in 3D). With all that being done I ordered the first batch of PCBs from Seeed Studio’s Fusion PCB service. Routing is definitely too crammed and painful, but I really wanted something versatile and flexible. Whether it lives up to its design goals in REV 0.1 or REV 0.2 remains to be seen. My money is on REV 0.1, quite literally.

Spartan-3E logic analyzer

I got the sump.org Logic Analyzer running on a Xilinx Spartan-3E FPGA starter kit board. This wasn't too difficult, but there were some annoying problems that I'll share with you.

Buying the board

The particular board is not sold by Xilinx anymore, but the company that actually designed the board for Xilinx, Digilent Inc., still sells it.

Installing and setting up the tools

Once again I downloaded and installed the 32-bit ISE WebPack 11.1 software on 32-bit Linux. It's a 2.8GB download which installs to roughly 5GB. (After deleting 1GB of .xinstall folders left from the installation.. wtf..) It seems that version 12.1 was released just recently and if you're using a version other than 11.1, or another operating system, then you may of course run into fewer problems, more problems, or just different problems.

Installation ran fine as a regular user. At the end of the installation I tried to do the upgrade to version 11.5 but didn't have enough disk space left. Now I can source the settings32.sh file in the directory I installed to and then run ise, impact, and the other programs.

I struggled for a while with iMPACT not finding the built-in Xilinx programmer cable on the starter kit board. There were two issues:

  1. Wrong firmware downloaded to the USB controller
  2. iMPACT stubbornly trying to use kernel windrvr instead of libusb

USB controller firmware

Xilinx provides a driver setup script which puts files in /etc/udev/rules.d and /usr/share, but I preferred to make an ebuild for the Xilinx firmware package and have added it to my Portage overlay "stuge", which was included in the official layman list on 2010-06-14. If you're not using Gentoo you could most likely run the setup_pcusb script included in the tarball without issues, but note that newer versions of udev require some changes in the supplied rules.

iMPACT insists on ignoring libusb

Older versions of iMPACT used a really nasty kernel driver for accessing USB devices, but now it instead prefers to use libusb for the programmer access. Yay! But not so fast...

On my system iMPACT kept trying to use the old drivers, which I did not have installed and do not want to pollute my system with. According to Xilinx this means that "libusb is not installed", which is absolutely useless of course. It's very frustrating, and typical for closed source software, that information is not sufficiently specific. They need to document the actual assumptions made by their software, instead of publishing some nonsense high level description for what is a very low level system issue. Of course libusb was installed! Stupid.

Eventually I found the problem. It seems that iMPACT uses dlopen() to load "libusb.so", which is arguably a mistake by Xilinx developers. I believe that they should specify a filename with an explicit version instead, such as "libusb-0.1.so.4". The libusb-0.1.12-r5 package that I had installed unfortunately did not allow the unversioned dlopen() to work. The file found by dlopen("libusb.so") was /usr/lib/libusb.so, which was a small text file (a GNU ld script) used to redirect the linker to /lib/libusb.so when I compiled programs that used libusb. Similar files exist for other libraries. dlopen() needs a binary file, however; it doesn't understand the linker script. So iMPACT was unable to load libusb, quietly resorted to requiring windrvr, and complained loudly when it wasn't found. There were no error messages about the failure to load libusb. I can't fix iMPACT, so I solved the problem by doing what I should have done long ago anyway: I removed libusb-0.1 and installed the libusb-compat-0.1.3 package instead, which provides backwards compatibility for libusb-0.1 applications and uses the much improved libusb-1.0 for communication. libusb-compat installs the binary library file in /usr/lib, and /usr/lib/libusb.so is a symlink to it. Once it was installed, iMPACT could find the programming cable via libusb without problems.
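
If you suspect the same problem on your system, a quick check is whether the file that dlopen() would find is a real shared object or a text file. The sketch below is a hypothetical helper in Python, not something iMPACT or libusb provides. It only looks at the first four bytes: ELF binaries start with 0x7f followed by "ELF", while an ld script is plain text. The script text in the example is illustrative, not copied from my system.

```python
ELF_MAGIC = b"\x7fELF"


def classify_header(first_bytes):
    """Classify a library file by its first bytes."""
    if first_bytes[:4] == ELF_MAGIC:
        return "ELF binary"
    return "not ELF (possibly a GNU ld script)"


def classify_file(path):
    """Read the first four bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(4))


if __name__ == "__main__":
    print(classify_header(b"\x7fELF\x02\x01\x01\x00"))
    print(classify_header(b"/* GNU ld script */\nGROUP ( /lib/libusb.so )\n"))
```

Running classify_file() on /usr/lib/libusb.so would have pointed at the linker script right away, something iMPACT's own error message did not do.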

Downloading a logic analyzer

There's not much documentation for the sump.org logic analyzer package, which is too bad since it's a really useful design. There's even a minimal and pretty open piece of hardware made explicitly for the sump.org design, and it costs only $45: the Open Logic Sniffer. If you have use for a LA and don't already have a suitable FPGA board around, then I think the OLS is amazing value for money.

On the sump.org project page there are a few different versions available for download. The experimental Spartan 3E source is for the board that I used. The zip includes source for the PC Client (Java) and for the logic analyzer hardware design (VHDL). I added a sump-analyzer-0.8.1.ebuild to my overlay that builds the client and installs a launcher script called sump-analyzer. Non-Gentooers can download the slightly older v0.8 binary package to get a pre-built analyzer.jar, or compile their own with the following commands:

$ cd client/
$ javac -encoding iso-8859-1 $(find -name '*.java')
$ jar cfm analyzer.jar Manifest.txt $(find -name '*.class' -o -name '*.png')

Another great option is the highly portable sigrok open source software, which supports the sump hardware design as well as several other logic analyzer products.

From files to hardware

The fpga directory has the VHDL code that makes up the actual logic analyzer, and the UCF (User Constraint File) that specifies how the FPGA chip is connected. Now, let's make hardware from them:

Open ISE and create a new project:

  1. Select Top-level source type HDL and click Next.
  2. Set Family Spartan3E, Device XC3S500E, Package FG320, Speed -4 and Preferred Language VHDL.
  3. Click Next twice (skip the "Create New Source" step).
  4. In the "Add Existing Sources" step, add all files in the fpga subdirectory except for la.ucf, la.vhd and license.txt. In particular, la-S3ESK.ucf and la-S3ESK.vhd must be included.
  5. OK the "Project Summary". ISE should then add all 23 files without problems.
  6. Double-click "Configure Target Device" under "Processes: la - Behavioral" in the lower half of the Design tab in the Design panel in ISE to run through Synthesis, Place & Route, and bitstream generation.

You may get a message about no iMPACT project file, just OK that so that iMPACT starts. Double-click "Boundary Scan" in the iMPACT Flows panel, then press Ctrl-I or select File->Initialize Chain in the menu. A dialog with settings for the three discovered devices appears, just OK that too. Double click the xc3s500e icon and assign the la.bit file created by ISE. Say No to the SPI or BPI PROM question. Right-click the xc3s500e icon and select Program. OK the same properties dialog again to send the bitstream to the FPGA.

When programming has succeeded the board is a logic analyzer, and you can start capturing signals! Don't forget a level shifter or series resistors if you want to capture 5V signals.

Notes
  • SW1:0 sets baud rate: 00=115k2 01=57k6 10=38k4 11=19k2
  • LED4:3 shows the baud rate set by SW1:0.
  • LED1:0 shows the (active low) TX and RX signals.
  • BTN_SOUTH is reset. LED6 shows the reset signal.
  • LED7 shows the external clock signal. (DIP-8 socket)
  • If capturing without an actual signal connected then remember to disable Noise Filter, or the logic analyzer doesn't see any data, and the capture seems to hang. Either apply an actual signal or close the capture, press BTN_SOUTH and then try again, making sure to disable the filter.
  • iMPACT uses a SysV IPC semaphore to "lock" the download cable. If it isn't closed cleanly or if it crashes, it may start saying "Cable is LOCKED. Retrying..." in the log, which is of course wrong. Remove the semaphore with semctl(semget(0x240157b1,1,0),0,IPC_RMID,NULL) in a C program, or reboot, to get iMPACT to find the cable again.
  • Look at la-S3ESK.ucf for the connections. All input channels are in the FX2 connector, but the first twelve are also on the more convenient J1, J2 and J4 headers.