GSoC (coreboot): Week 3 and 4

In the past two weeks I was on vacation, and I have been working on what I call the “test interface board”. Before I elaborate on this, I feel there’s a need to discuss the big picture of this project, because a lot of things have changed for good reasons and the old terminology no longer makes sense.

As a reminder, my project is centered on building an inexpensive and flexible test rig for the Automated Distributed Firmware Test System described in the Quality Assurance Talk by Stefan Reinauer.

A centralized Test Management Server generates test sequences for remotely located systems under test (SUTs). This includes controlling and monitoring the SUTs and flashing different firmware builds on them. The Test Management Server coordinates with a repository for accessing test builds and for storing test reports. Test reports are the final, useful output of the whole system, and clients may access them over the internet using a browser.

A Test Supervision Server is a low-power computer that acts as a local housekeeper of SUTs for a given physical location. It connects to the Test Management Server using SSH over the internet and executes the given test sequences by coordinating closely with the SUTs through a Test Interface Board. Programmable power strips let the Test Supervision Server control the power supply to the SUTs.

My work will be confined to the distributed components for now. I have completed the programmable power-strip block. A future add-on to this block could be integrating active power & energy measurement of an SUT for energy efficiency benchmarking. If this is really desirable it could be done after I finish doing the other parts. Right now I’m working on the Test Interface Board.

The Test Interface Board provides the necessary hardware interface for connecting the Test Supervision Server to an SUT. This is necessary to flash firmware to the ROM, to control the power/reset switches, to measure PSU voltages and the surface temperature of ICs, and to take POST feedback if available. Let’s dive into more details to see how this can be done.

The FT232H has a multipurpose serial engine that can be configured as an SPI master. It has additional pins that may be used as GPIOs, so a GPIO expander may not be needed. The FT232H datasheet states that it offers up to 30 Mbps throughput in synchronous serial mode, which makes it a fast flashing solution for its price point (about $3). Slave Select (SS) pins can be used to switch between other devices, such as an ADC that gives voltage and temperature measurements and an optional Feedback microcontroller configured as an SPI slave that gives more information about the SUT. A few GPIO pins can be used to configure a Logic Level Translator to ensure compatibility with serial flash chips of different voltages ranging from 1.8 V to 5 V. Another GPIO pin will be used to control a FET toggle switch (MUX) that electrically detaches the serial flash for programming and connects it back to the motherboard when programming is done.
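To put the 30 Mbps figure in perspective, here is a rough back-of-the-envelope estimate of how long the raw data transfer for a flash image would take. The function name and the example image sizes are assumptions made for illustration. The estimate also ignores SPI command overhead, USB latency, and the erase and page-program times of the flash chip, which usually dominate a real write.

```python
def raw_transfer_seconds(image_bytes: int, bits_per_second: float = 30e6) -> float:
    """Lower bound on the time needed to clock image_bytes over the SPI link.

    Only the payload bits are counted; command bytes, erase cycles and
    page-program delays are deliberately left out.
    """
    if bits_per_second <= 0:
        raise ValueError("bits_per_second must be positive")
    return image_bytes * 8 / bits_per_second


if __name__ == "__main__":
    # Example image sizes are illustrative, not taken from the project.
    for mib in (1, 4, 8, 16):
        seconds = raw_transfer_seconds(mib * 1024 * 1024)
        print(f"{mib:2d} MiB image: about {seconds:.2f} s of raw transfer time")
```

Even for a 16 MiB image the raw transfer stays within a few seconds at this rate, so in practice the flash chip itself is more likely to be the bottleneck than the link.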

Notice that I’ve got rid of the microcontroller this time. This is because a new microcontroller chip doesn’t necessarily have a bootloader, so it needs to be programmed using a dedicated programmer. This adds considerable cost and inconvenience for someone who needs to build only a few of these boards. So unless you’re using the optional Feedback module, nothing needs to be programmed. Just ordering the board and components and soldering everything up with a 15 W iron should be enough to make one of these.

I’m also going to ensure modularity by having small PCBs for each functionality connected to a main-board using headers so that they can be developed independently and used as required. Also, there’s flexibility of choosing temperature probes because it is possible that someone already has good quality probes (that come with professional DMMs).

And a few comments about the ADC I’ve chosen. The ideal choice of ADC for voltage and temperature measurements where the sampling period is long is an integrating ADC. An integrating ADC charges a capacitor from the input signal for a known period of time using an op-amp integrator, then discharges that capacitor using a known negative reference voltage. The time it takes to discharge the capacitor is proportional to the average value (area under the curve) of the input signal over the sampling period. It’s theoretically simple, but it needs precision external components and a microcontroller program to work. This is the technique used in professional DMMs (True-RMS) and bench power supplies (for feedback). Delta-sigma and successive-approximation (SAR) ADCs are common and cheap these days, but they don’t average the values over time like integrating ADCs. However, they can provide acceptable accuracy for our application, and the MCP3208, a 12-bit SAR ADC with an SPI interface, is a good candidate.
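The charge and discharge relationship of an integrating ADC can be sketched numerically. This is an idealized model with made-up sample values, not a description of any particular chip: it assumes a perfect integrator, and the reference voltage is given as a magnitude. Because the capacitor is charged in proportion to the input and discharged in proportion to the reference, the RC constant cancels and v_in * t_charge = v_ref * t_discharge.

```python
def simulate_discharge_time(samples, dt, v_ref):
    """Integrate the input, then return how long the reference needs to undo it.

    samples: input voltages in volts, spaced dt seconds apart.
    v_ref:   magnitude of the (opposite-polarity) reference voltage in volts.
    """
    if v_ref <= 0:
        raise ValueError("v_ref must be positive")
    volt_seconds = sum(v * dt for v in samples)  # area under the input curve
    return volt_seconds / v_ref


def dual_slope_voltage(v_ref, t_charge, t_discharge):
    """Average input voltage recovered from the two measured times."""
    if t_charge <= 0:
        raise ValueError("t_charge must be positive")
    return v_ref * t_discharge / t_charge


if __name__ == "__main__":
    # A 2.0 V input with +/-0.5 V of ripple, sampled every 0.1 ms for 0.1 s.
    samples = [1.5, 2.5] * 500
    dt = 1e-4
    t_charge = len(samples) * dt
    t_discharge = simulate_discharge_time(samples, dt, v_ref=2.5)
    print(dual_slope_voltage(2.5, t_charge, t_discharge))
```

The ripple drops out of the result because only the area under the curve matters, which is the averaging property described above.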

Please see the figure for more details and let me know if there are concerns or suggestions. I’ll post more stuff and schematics in a couple of days.

GSoC [early debugging] Art of refactor

Your branch is ahead of origin/master by 48 commits.

Yes, I knew this would happen. It has become increasingly difficult to push new work for review on gerrit, as I have dependencies on existing work waiting for merge. As the pile of unmerged patches grows, so does the time I spend with git rebase, so I am hoping for some progress on that side.

My eyes in the local working directory have turned towards SerialICE integration inside coreboot tree. The benefits of this approach are better tree structure, wider hardware support, cache-as-ram and usbdebug.

There are several use-cases to consider:

Compile classic stand-alone SerialICE ROM image with ROMCC, using super-IO and chipset initialisation from coreboot tree.
Compile SerialICE as an alternative romstage with ROMCC, using existing coreboot bootblock added with serial port initialisation.
Compile SerialICE as romstage with GCC and cache-as-ram to use existing usbdebug code and possibly better execution performance.
Add ability to jump out of SerialICE to regular romstage.

Also for the SerialICE session on debug host we have alternatives:

Execute vendor BIOS image under QEMU.
Execute coreboot image under QEMU.
Execute coreboot image in user-mode under GDB without QEMU.
Execute utils like nvramtool, msrtool, inteltool, superiotool, lspci and setpci remotely.

Now all of the above has been demonstrated before but not adopted. Adopting these widely for all mainboards may not happen during my GSoC, as there is no common function to call to enable a serial-port from romstage. At the minimum I will make some simple example one can follow to get SerialICE running on boards with existing coreboot support.

Cooking with thin spaghetti: The hard side of Vultureprog

One of the reasons I fell in love with the Stellaris Launchpad boards is that they are modularly expandable. This notion is difficult to explain without comparison to STM Discovery boards, which have a row or two of pins on each side. The idea is simple: you hook one end of your wire to the right pin, and the other end to your breadboard, or you design a custom baseboard specific to the Discovery model. Stellaris takes this idea a little further. The layout of the pins is standardized, not just for the Stellaris, but across the family of TI development boards. Enter the Booster Packs: standardized add-on modules for TI boards. These modules are stackable, so it is possible to connect more than one to a single Stellaris board. This is why I wanted to use the Stellaris for this project. It’s much easier to build a booster pack than to tell people how to connect 32 wires; most people have problems connecting four of them to a buspirate. Let’s look at some of the design choices.

Constraints, constraints, constraints

It’s easy to imagine connecting an LPC chip: six wires and power. In reality, the situation is nowhere near as bright. Four ID pins need to be pulled low, reset pins (yes, there is more than one) need to be pulled high, and some pins simply cannot be left floating. Thus, even a simple bus like LPC becomes a nightmare. Without a logic analyzer to tell what works and what does not, the result is frustration and even self-inflicted injuries. Consequently, I wanted to do a few things right from the beginning(TM).

The most important point was to have all pins properly connected with zero wires. Users should not have to worry about what connects to where. Remember, these chips have 32 pins.

I also wanted to support all possible bus types. LPC and FWH are identical hardware-wise, and are not a problem to support concurrently. SPI is also just a few extra traces that lead to a header. On the other hand, having a programmer that also supports parallel mode is a much harder problem. It turns out there are really two “parallel” modes. The first one is ISA, where the chip is accessed via a linear address space. You put the address you want to access on the address pins, handle a couple of handshake lines to tell the chip if you want to read or write, and move the data over a separate 8-bit data bus.

On the other hand, the second “parallel” mode is a real pain. It uses a 2-dimensional address space, where you need to drive a row address, then a column address, and only then access the data. It’s called PP or “parallel programming” mode. Luckily we get a break: PP mode is an auxiliary programming mode specific to some LPC chips. If we support LPC, we don’t need PP. PP goes in the garbage bin (for now).

Now we need an efficient way to connect the GPIOs to the chip. By “efficient” I mean minimizing the number of GPIO accesses, and the number of bitshifts we need to do in firmware. A poorly chosen pinout will result in abysmal performance, as the 80MHz core struggles to shift the correct bit to the correct GPIO. My choice here was limited, as the best I could do was assign successive GPIOs to successive address pins. I spent the entire Sunday looking over chip datasheets and deciding on this “spaghetti recipe”.
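The cost difference between a good and a bad pinout can be illustrated with a small model. Python stands in for the firmware here, and both pin assignments are hypothetical examples, not the actual VultureProg mapping. When successive address bits sit on successive GPIOs, the port value is one mask and one shift. With a scattered assignment, every bit has to be tested and placed individually.

```python
def port_value_contiguous(address: int, first_pin: int = 0, width: int = 8) -> int:
    """Address bit i drives GPIO (first_pin + i): a single mask and shift."""
    mask = (1 << width) - 1
    return (address & mask) << first_pin


def port_value_scattered(address: int, pin_for_bit) -> int:
    """Address bit i drives GPIO pin_for_bit[i]: one test-and-set per bit."""
    value = 0
    for bit, pin in enumerate(pin_for_bit):
        if address & (1 << bit):
            value |= 1 << pin
    return value


if __name__ == "__main__":
    address = 0xA5
    # Contiguous: address bits 0..7 on GPIO 2..9 (hypothetical).
    print(hex(port_value_contiguous(address, first_pin=2)))
    # Scattered: address bits wired to the port in reverse order (hypothetical).
    print(hex(port_value_scattered(address, [7, 6, 5, 4, 3, 2, 1, 0])))
```

Both functions produce the same result when the scattered table happens to be contiguous, but the second one does roughly one operation per address bit on every access, which adds up quickly in a tight read loop.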

Flexibility – a big issue

I also wanted to have the option between a normal PLCC32 socket and a ZIF socket (AKA clamshell). I was really an idiot for thinking I would have both on the same board. On paper, it looks very straightforward. In reality, adjacent pins are on different hemispheres of the globe, and routing them is, well, the tastiest spaghetti you have ever eaten. There was no way I could fit both a clamshell and a PLCC32 socket. There was no way to route the 32 or so tracks on just 2 layers. So I killed the clamshell, the SPI header, and the LPC header. After a couple of hours of messing with the routing, I always had one or two pins that got cornered.

Even routing a simple PLCC socket proved difficult.

What coffee can do to you

I decided to start over, with all the components in place. Once I reduced the track size to 8 mils, and spacing to 6 mils, I was able to route two tracks between a set of pins. This time, I placed the socket inside the clamshell, and managed to connect the two using just the top layer. I then worked from the booster pack connection to the DIP pins on the same side of the board, again, using only the top layer. Then I started using the bottom layer for DIP pins on the opposite side. After a few hours, Chuck Norris warped space and time to make room for all the tracks:

From here, it was a matter of optimizing the routing, taking care of ground planes and other finishing touches. In the end, we get VultureProg hardware version 0.1:

Don’t let the PRELIMINARY DESIGN warning fool you. There is only an infinitesimal possibility that I will ever want to go back and revise the design. We have 35 GPIOs accessible on the Stellaris. Five of them are connected to the on-board LEDs and buttons. The remaining 30 are all used up.

Conclusion

If you are a KiCad user, you can head over to yet another one of my GitHub repositories. If you do not have a way to consume KiCad files, you can look in the doc and gerbers directories. Feel free to feed the gerbers to Mayhew Labs’ 3D Gerber Viewer (hint: you can rotate the board in 3D). With all that done, I ordered the first batch of PCBs from Seeed Studio’s Fusion PCB service. Routing is definitely too cramped and painful, but I really wanted something versatile and flexible. Whether it lives up to its design goals in REV 0.1 or REV 0.2 remains to be seen. My money is on REV 0.1, quite literally.

GSoC [coreboot debugging] Now it is broken, now it is not

I feel I did not make much progress last week. I realised I had wasted two days looking for an error in my code, only to finally find the error elsewhere. In preparation for pushing my developments to review, I had rebased my tree. That is, I had picked up the developments done by other people into my setup. My mistake. While the error still persists in the master tree, in the process of recovering my platform I learned that there are two types of SOIC-8 SPI flash chips: ones that fit in the miniature socket I have and ones that are physically too large. The spare chips I had were of the second type, and that slowed down my system recovery procedure radically. New flash chips are waiting for pickup in the store now.

This is actually just the situation I want coreboot to deal with better in the future: doing a firmware upgrade without the risk of bricking the device to the point where you need to use an external programmer device to recover. Problem is specifically with laptops, which may take a good hour or so to disassemble and put back together, and with every disassembly the risk of breaking some of those miniature connectors increases.

My plan of having two copies of firmware in the same flash chip image just got a bit more complicated. I learned that on recent platforms that use a so-called binary blob for raminit, a.k.a. the system-agent binary, it is not possible to do the type of dual-boot-prefix setup I had planned, since one cannot put two system-agent binaries in the same CBFS image. I hope the system-agent build and release process is seriously improved to overcome this issue, as a binary blob upgrade procedure gone (or done) badly was the root cause of my troubles this past week.

I have not really had a chance to test pre-OS flashing with FILO (actually the code might not yet be available for me to download). Instead I have attacked the low-level PCI and IO sources to reduce a good two or three copies from coreboot tree, this will help my efforts in the long run with SerialICE integration work.

GSoC (coreboot) Progress till week 2

As you might know my GSoC project is about making a test rig that can make coreboot test systems more accessible to a coreboot test server. This test rig enables coreboot test server to interact with the systems under test (which may be remotely located) in the following ways:

Power supply control (discussed in this post)
power/reset switch control, voltage and temperature readouts, firmware flashing on serial flash (to be done next)
provision for POST feedback (later)

With this project I’m hoping to create an environment where developers will be able to conveniently connect their systems to the coreboot test server for testing at their own place. This is why I like to call it a distributed test environment as it facilitates mass testing without the need to maintain a dedicated testing facility.

So this week I will present a nice and easy solution for power control of the coreboot test systems. I would call this device a ‘programmable power strip’. Before going to the final solution let me first walk you through all the routes that I’ve taken in order to answer some potential questions that may arise. Read on…

QiProg: The soft side of VultureProg

Another week bites the dust, and coffee supplies are running low. The swarm of zombies is restless, but they seem to be active mostly at night. The barricaded windows are holding up well for the time being. I can venture small distances during the day, but not far enough to reach other survivors. I was able to recover a package from the mailbox this week. Its unannounced appearance is still a mystery, but its contents are most enthralling in this forlorn aeon. I was able to use the waterblocks in the package to reduce my heat signature. There are fewer of THEM trying to break through the barricades. Last night, the leader of the pack did not disturb my hiding place. I don’t know if help will ever come, but I owe it. I owe it to myself, and to the flight engineers. I do not know if they survived, but I owe it to them to finish the flight plans. We must leave this planet.

If you are listening to this transmission, you are a survivor. I have spent this last week in improving the flight plans; I have annotated and documented them in glorious detail. Get them here:

$ git clone git://git.qiprog.org/qiprog.git
$ git clone git://git.qiprog.org/vultureprog.git

How QiProg works

QiProg was originally designed to be a pure USB protocol, specialized in driving flash chip programmers. Peter Stuge’s original QiProg specification is just that: a USB protocol. But as Peter suggested, once that protocol is converted into API calls, it stops being a protocol, and becomes a full-featured API. It’s amazing how each USB control request can be mapped to one and just one API call. Most USB dependencies become invisible in the API, with a very limited number of exceptions, where the dependence on USB could be inferred from the size of the data structures. Even in these limited cases (hint: there are exactly three), the dependency on the USB bus can trivially be abstracted away. My original reaction was to modify the spec to remove them, but that patch now sits lonely on a forgotten github branch.

QiProg initialization

Initializing the QiProg logic is a very boring boilerplate operation. Luckily, this can be done with just one or two API calls. In fact, I have only included three functions to take care of this. They can create a context, free a context, or set the verbosity of debug messages. That’s it: the very standard boilerplate.

QiProg device discovery

The discovery phase is, yet again, boilerplate, albeit following a very smart suggestion from Peter. With a single, lightweight call, qiprog_get_device_list(), QiProg scans all devices and presents them in a flat list. This gives us a bunch of qiprog_device pointers. The qiprog_device pointers are at the heart of QiProg. The public API only presents them as pointers, as opaque as the dictionary allows.

Once we have a device pointer, we can try to open the device, ask the device what it can do, and decide whether or not to hire it. Once we hire a device, the real fun begins.

The QiProg core

Remember how I said QiProg devices are presented as opaque pointers? This makes them full-fledged objects. Anytime we want to do anything with the device, we have to perform a device operation. This is exactly where the core comes in. It makes sure that the operation is dispatched to the correct handler (more on that later), and makes sure that we don’t crash because of programming mistakes. If I had written QiProg in C++, I would have made the qiprog_device an abstract class, and would have hidden the derived classes and their constructors away from the API.

So, what’s in the core?

qiprog_do_action_x(device, action_parameters...);

That’s about it. action_x can be any of the actions in the original QiProg specification. While it might seem that a _lot_ of logic is needed to make this happen, the core is actually ludicrously lightweight.

Inside the core

QiProg is designed to handle more than just USB programmers. This brings the need for different code paths for each class of devices. Internally, QiProg implements a “driver” for each class. This driver is a structure with function pointers. QiProg asks each of these drivers to scan for available devices, and append them to a context-global device list. This is the exact list we get with qiprog_get_device_list().

So, back to the core. Since the drivers are invisible to the outside world, we can’t get those function pointers. This is the job of the core. The core dereferences the device pointer, and sanity-checks it. This sanity checking removes most boilerplate from the application. And now the magic: each device stores a pointer to its associated driver. All the core has to do is dereference the driver and call the appropriate function with the device as the parameter.

Each device gets a void pointer to store private data. The driver decides what to store there and how to use it. That is sufficient to carry all necessary context information, and it is why the device pointer is passed to each member of the driver. Since there is no need to look up context information, the core is essentially an O(1) operation. This is the reason we can run the core on the embedded QiProg device.
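The dispatch pattern described in this section can be modelled in a few lines. This is a Python sketch of the idea only, not QiProg's actual C code: the class names, the "read8" action, the fake driver, and the example address and value are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Driver:
    """Stand-in for a driver structure: a scan hook plus a table of handlers."""
    name: str
    scan: Callable[[], List["Device"]]
    ops: Dict[str, Callable[..., Any]]


@dataclass
class Device:
    """Stand-in for the opaque device handle."""
    driver: Driver   # each device stores a reference to its driver
    priv: Any = None  # private data; only the driver knows what is in here


def get_device_list(drivers):
    """Ask every driver to scan, and collect the results in one flat list."""
    devices = []
    for driver in drivers:
        devices.extend(driver.scan())
    return devices


def core_dispatch(device, action, *args):
    """The 'core': sanity-check the handle, then call the driver's handler."""
    if not isinstance(device, Device) or device.driver is None:
        raise ValueError("invalid device handle")
    handler = device.driver.ops.get(action)
    if handler is None:
        raise NotImplementedError(f"{device.driver.name} does not support {action}")
    return handler(device, *args)  # constant-time: no context lookup needed


# A hypothetical driver whose private data is a dict acting as fake flash.
def _fake_read8(device, address):
    return device.priv[address]


def _fake_scan():
    return [Device(driver=FAKE_DRIVER, priv={0xFFBC0000: 0xBF})]


FAKE_DRIVER = Driver(name="fake", scan=_fake_scan, ops={"read8": _fake_read8})

if __name__ == "__main__":
    dev = get_device_list([FAKE_DRIVER])[0]
    print(hex(core_dispatch(dev, "read8", 0xFFBC0000)))
```

The application only ever touches get_device_list() and core_dispatch(); everything driver-specific stays behind the handler table and the private data.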

The hidden QiProg core

Yes, QiProg is running on the VultureProg device as well, not just the host. We don’t care about discovery, or any function that does not need a qiprog_device. Those steps can be handled by standard USB requests; all information is in the USB descriptors. The situation, once again, turns interesting when we have a qiprog_device. VultureProg has a qiprog_device as well (and it can have several).

From USB to the core

Any USB transaction will come in through some sort of hardware-specific channel. It’s the nature of the beast. So, the first thought is: “OK, let’s write a bus IO, hook it into the USB handler, and be done”. However, we can make our USB dispatcher forward control requests to QiProg. And this is where a little file that never seems to be included in the build comes in. qiprog_usb_device.c is never compiled in host code, but is our bridge to QiProg on VultureProg devices. It takes USB requests, and forwards them to a real QiProg driver.

Ok, let’s pause for a second:

Yes, we run QiProg drivers inside the little Cortex-M processor, and with QiProg drivers comes the slick QiProg core. There are a few more tricks we use for making several drivers use the same hardware, but they are far too technical. For the curious, I have two words: “doxygen documented”.

Kick-starting with some maintenance

EHCI, USB, LOL, OTG, CBMEM, OMG, CAR. Those have been the topics of my first week of GSoC on the coreboot tree. Dozen or so patches in, same amount waiting on approvals or further actions from me. I was glad to find my mentors with many ideas for refactoring and working actively on reviews. Nice start I would say!

It turned out usbdebug support in coreboot may not be very widely tested, since hardware has typically had serial ports available for the same task. With some required bugfixes on cache-as-ram and CBMEM, I now have identical output on usbdebug when compared for CBMEM console and serial console. For my setup, that is. More needs to be done to get AMD boards supported once again. I also get to fix usbdebug receive side to make it a usable pipe for GDB and SerialICE, and I want it to handle USB errors and disconnects gracefully.

On the debugging hardware side things have brightened up quite a bit. Since the original Net20DC product is discontinued, I was concerned that the only solution would be the DIY version pictured on the right. I have since received positive feedback and testing from the community (thanks Denis and Aaron) on using some inexpensive ARM boards as USB debug gadgets. To make them work flawlessly, some modifications need to be made to the USB gadget framework drivers on the kernel side. I should try to find someone already familiar with the gadgets to take on this development task, as I believe it is of interest to kernel developers too.

Some principal decisions on payloads have been made. I would first add usbdebug support for FILO. I am eagerly waiting for the FILO payload with flashrom to be released; this would give us a way to program the system flash chip from USB storage in a pre-OS environment.

VultureProg: Meet the contenders

Wow! It’s the weekend already. Where was this last week? With the first week of the 12 weeks of GSoC already gone, I think the time to panic has arrived. Looking back on my original proposal, I realized I have given anyone interested definitive proof I am indeed and indubitably insane. I have offered to design and build hardware, gift it with a fully functional and versatile firmware, write the software to control it, and integrate the functionality into flashrom. To the untrained eye, the last statement resembles a description of four separate GSoC projects. Here at coreboot, that’s all in a day’s work, with enough time for lunch.

Hello :)

I’m Ayush Sagar from India and I will be working with coreboot this summer on the project “Test set-up for the coreboot distributed firmware test environment featuring greater extensibility, enhanced automation, concurrent high speed firmware flashing and decentralized operation“ under Google Summer of Code 2013.

I’ve almost completed my degree in Electrical & Electronics Engineering, and by training I’m skilled at developing SCADA applications and ladder logic programs, which are used for power system and factory automation. However, my interests are widely scattered around physics, electrical engineering and computer science. I have been repairing consumer electronics and computer hardware at the component level for a very long time as an earning hobby. It’s quite profitable here even today, as most people are reluctant to throw away their belongings. I’m also passionate about programming, but I’m new to free and open source software development.

A biology laboratory: Dissecting the LPC bus

I feel nostalgic over PLCC ROMs. They are by far the most practical way to store firmware. They replaced the huge 32-pin DIP chips whose pins were very likely to bend or break off as one tried to remove them from the socket. The first board I ever tried to hack on had a PLCC32. After I bricked the board, a friend was able to track down Cristi Măgherușan in Cluj. When I went to meet Cristi in the hope that he might be able to salvage the board, I carried the chip in a little plastic bag in my pocket. He was never able to flash it, but this little detail is of less importance. What matters is that LPC interfaces were most popular in PLCC32 chips, and this is the interface I will be primarily focusing on.

Why bother with LPC in the first place?

Parallel buses were the norm in the 80’s and 90’s. The idea is simple. Have a number of data lines, usually 8, 16, or 32, a clock, and maybe handshake signals. When I say the idea is simple, I mean it literally: the hardware is very simple. The clock is fed straight to the flip-flops which latch the data on the bus. The handshake signals also feed straight into logic gates that control the state of the pins (inputs, or outputs), etc. There is no extra logic needed to figure out when to do what on the bus. Having simple buses saved a lot of silicon real estate for the intended purpose of an integrated circuit.

As transistors got smaller, they got faster, but the laws of physics did not change. Electromagnetic waves in a metal travel at the speed of light (well, a little slower when you consider electrodynamics effects). Since the information is carried on the wavefronts, we really don’t care how fast electrons are moving. If a signal trace is 10 centimeters longer, the signal will arrive 0.33 nanoseconds later. On an 8MHz ISA bus (125 ns cycle time), this delay is insignificant. The little physics policemen do not stop here, however. Wires are nothing but little series inductors radiating energy to ground planes and other wires like little parallel capacitors. It takes tiny amounts of energy to energize a wire to the right voltage. This process takes a tiny amount of time, usually a few nanoseconds. If we’re lenient and allow 12 ns, then add the travel time of our sloppily routed signal, it takes 12.33 ns from the time we drive the signal to the time it reaches its destination at the correct voltage. That’s 1/10th of our ISA cycle time. To get around this, simple buses drive the data on one edge of the clock, and sample it on the other edge.
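The figures in this paragraph are easy to reproduce. The snippet below is a quick check that uses the rounded vacuum speed of light as the propagation velocity, which is an optimistic assumption since signals on real traces travel somewhat slower. The function and variable names are made up for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 3.0e8  # rounded; real PCB traces are slower than this


def extra_delay_ns(extra_length_m: float,
                   velocity_m_per_s: float = SPEED_OF_LIGHT_M_PER_S) -> float:
    """Additional arrival delay caused by a longer signal path."""
    return extra_length_m / velocity_m_per_s * 1e9


def cycle_time_ns(clock_hz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


delay_ns = extra_delay_ns(0.10)       # trace that is 10 cm longer
settle_ns = 12.0                      # the lenient settling allowance
isa_cycle_ns = cycle_time_ns(8e6)     # 8 MHz ISA clock
fraction_of_cycle = (delay_ns + settle_ns) / isa_cycle_ns

if __name__ == "__main__":
    print(f"extra delay:       {delay_ns:.2f} ns")
    print(f"ISA cycle time:    {isa_cycle_ns:.0f} ns")
    print(f"fraction of cycle: {fraction_of_cycle:.3f}")
```

The total of about 12.33 ns comes out just under a tenth of the 125 ns ISA cycle, which matches the estimate above.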

Fast-forward to the 90’s and the 33MHz PCI bus. If we allowed 12 ns for any signal to rise, and 12 ns for it to fall, we would already waste 24 ns of the roughly 30 ns allotted for each cycle. We get a tiny 6 ns window to latch the correct data. All of a sudden, routing signals to reduce inductance and capacitance, and match their travel times becomes a lot more important… and expensive. ISA is downgraded to a second-class citizen: It doesn’t make sense to route a high number of additional signals for a low-speed bus, and it doesn’t make sense for low-speed components to move to more expensive PCI silicon. This is the gap the LPC bus fills.
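The same timing budget can be worked out for any clock. The sketch below takes the clock frequencies literally (8 MHz and 33 MHz) and uses the 12 ns rise and fall allowances from the text; the function name is an assumption made for illustration. With a 33 MHz clock the period is a little over 30 ns, so the remaining window is only about 6 ns.

```python
def latch_window_ns(clock_hz: float, rise_ns: float = 12.0, fall_ns: float = 12.0) -> float:
    """Time left in one clock cycle after the rise and fall allowances."""
    period_ns = 1e9 / clock_hz
    return period_ns - rise_ns - fall_ns


if __name__ == "__main__":
    for name, clock_hz in (("ISA, 8 MHz", 8e6), ("PCI, 33 MHz", 33e6)):
        print(f"{name}: {latch_window_ns(clock_hz):.1f} ns left to latch data")
```

On ISA the same allowances leave about 101 ns, which shows why sloppy routing was tolerable there and is not on PCI.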

The LPC

LPC replaces ISA with a 4-bit, 33MHz bus, a clock, and a handshake signal. The 33MHz clock is not an accident; rather, it is meant to be derived directly from the PCI clock. The signaling bandwidth of 16.5 MB/s (33MHz * 0.5 Bytes) is very close to ISA’s 16MB/s (8MHz * 2 Bytes). Handshakes are handled through the data pins, though they require very little silicon logic.

Let’s have a look at what LPC brings us:

LAD[3:0] – 4-bit data
LCLK – 33MHz clock
#LFRAME – Start of frame handshake

Signaling and handshakes happen within a frame. A frame is a predefined sequence of handshakes and data transfers. Somewhere within the frame we transfer the actual data byte. Let’s take a look at what a typical frame looks like:

A bit cryptic until we put some meaning into the signaling:

Aaah! Much better. Here’s what happens:

#LFRAME starts a new frame.
LAD is at 0x0, which indicates the start of a cycle for a target.
In the next cycle, we put 0x4 on the LAD pins. This indicates we want to start a memory read.
The next 8 cycles are the memory address: 0xFFBC0000. This is the JEDEC address for reading the manufacturer ID.
Next, we start a turnaround cycle (TAR) which transfers the control of the bus to the target.
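The host-driven part of the frame listed above can be written down as a sequence of LAD[3:0] values, one per clock. This is a sketch under stated assumptions: the function and constant names are made up, and sending the address most significant nibble first follows my reading of the LPC specification rather than anything stated in this post. Only the first TAR clock appears in the list, because in the second one the pins are floated rather than driven.

```python
START = 0x0             # start of a cycle for a target
CYCTYPE_MEM_READ = 0x4  # cycle type and direction: memory read
TAR_DRIVE_HIGH = 0xF    # first turnaround clock: LAD driven high


def lpc_memory_read_nibbles(address: int) -> list:
    """LAD[3:0] values the host drives, one per LCLK, for a memory read."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    # Eight address nibbles, most significant first (assumption, see above).
    address_nibbles = [(address >> shift) & 0xF for shift in range(28, -4, -4)]
    return [START, CYCTYPE_MEM_READ, *address_nibbles, TAR_DRIVE_HIGH]


if __name__ == "__main__":
    frame = lpc_memory_read_nibbles(0xFFBC0000)
    print(" ".join(f"{nibble:X}" for nibble in frame))
```

After these eleven clocks the host stops driving the bus and waits for the target to respond.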

The TAR cycle is supposed to last two clocks. We drive the pins high during the first clock, then float them with a pull-up in the next clock. The chip is supposed to wait until the third clock to send us a SYNC (LAD[3:0] driven low). And the first problem: the SST 49LF080A is guilty of violating the LPC spec. It does not wait for the third clock to send the SYNC. It most likely implements an internal delay, and completely ignores the TAR cycle. It’s this sort of unexpected problem that has prompted me to allocate a lot of time in my GSoC project to figuring out how to master the LPC bus. This problem will most likely require ARM assembly black magic to more closely match the timing to that of a 33MHz clock.

After the skipped clock, everything looks fine though (If one clock early can really be considered “fine”). We get 0xBF in the data cycle, which is SST’s manufacturer ID. Success!