Last time I talked about the benefits of using sed to transform repetitive low-level patterns into meaninful function calls. And still, doing all that regex magic did not get us a fully working replay. A great portion of the hardware initialization flow is based on situational awareness. What hardware is connected? What are our capabilities? What if …?
That means a simple sequence of writes is, in most cases, not sufficient. We may need to modify registers, wait on other hardware, or respond differently to hardware states. While that seems daunting and tedious, it gives us an unexpected advantage: that every execution of the blob produces a different trace.
This is where diff comes in. By getting a bunch of traces and diffing them, we can see the points where the firmware takes different decisions, and the states which determine those decisions. It won’t tell us what condition triggers path A or path B, but it allows us to infer that by comparing the hardware states. Let’s have a look:
@@ -305,28 +305,28 @@ void run_replay(void) radeon_read_sync(0x6430); /* 04040101 */ radeon_write_sync(0x6430, 0x04000101); radeon_write_sync(0x3f50, 0x00000000); - radeon_read(0x3f54); /* 000dda12 */ - inl(0x2004); /* 000dda8a */ - inl(0x2004); /* 000ddb1a */ - inl(0x2004); /* 000ddb9c */ - ... - inl(0x2004); /* 000de3a0 */ - inl(0x2004); /* 000de41a */ + radeon_read(0x3f54); /* 000d9efb */ + inl(0x2004); /* 000d9f73 */ + inl(0x2004); /* 000da003 */ + inl(0x2004); /* 000da081 */ + ... + inl(0x2004); /* 000da883 */ + inl(0x2004); /* 000da8fd */ radeon_read_sync(0x611c); /* 00000000 */ radeon_write_sync(0x611c, 0x00000002); radeon_write_sync(0x6ccc, 0x00007fff);
I’ve chosen an example that shows similarities rather than differences, as I find this to be a more interesting case. Since we’ve already established that 0x2004 is our data port, as long as we don’t touch the index port, we’ll be reading from the same register, in this case 0x3f54.
Now the values returned by this register are completely different in every trace, yet the behavior of the blob is strikingly similar every time. The first key observation is that this register increase monotonically in every trace. It also increases by roughly the same amount on successive reads. The differences between the last read and first read of the register are also strikingly similar in both traces: 0xa08 and 0xa02 respectively.
This register looks to be a monotonic timer, and the loop has all the elements of a delay loop. To determine the actual delay, we could try to extract absolute timing information when collecting the trace; however, in this specific case, I had the AtomBIOS tables handy. By comparing register accesses around this loop, I was able to figure out where in the tables this delay is occuring:
0200: 0d250c1901 OR reg[190c] [...X] <- 01 0205: 54300c19 CLEAR reg[190c] [.X..] 0209: 5132 DELAY_MicroSec 32
The ’32’ in the delay is a hex number. Doing a bit of hex math we see we’re waiting about 51 ticks per microsecond. Comparing more loops, we get between 50 to 52 ticks per microsecond. Since a delay loop normally waits until the minimum time has elapsed, we now have a very convincing case that register 0x3f54 implements a 50MHz monotonic timer. Every time before accessing this register, we also poke register 0x3f50. That looks very much like the timer control register.
We now extend our sed script with:
timerctl=0x3f50 timer=0x3f54 ... sed "s/radeon_read($timer);[^$hex\r]*\([$hex]*\)[^\r]*\(\r\tinl($dport);[^$hex\r]*\([$hex]*\)[^\r]*\)*/radeon_delay(0x\3 - 0x\1);/g" | sed "s/radeon_write_sync($timerctl, 0x0\{1,8\});[^\r]*\r\tradeon_delay(/radeon_delay(/g"
Now when we rerun our logs through the script, the results decrease in size from 20K lines to 13K lines. The diffs between processed logs also decrease in size significantly. All the more proof we were right!
There’s another way in which diff is excellent for our purpose. We can implement our helpers to generate the same output as the processed logs. That allows us to poke the replay from userspace, yet get the same output format. Now we can diff the replay and original log, and observe how the hardware state changes. We can even go as far as implementing our delay with usleep() instead of the timer at 0x3f54. When the diff is independent of the delay method we use, we have another strong proof our assumptions are true. This is the case here.
‘diff’ is an extremely powerful tool. Despite its name, it can show similarities just as well as differences. While regular expressions exaust their usability with simple patterns, diff can take us a lot further. Now that we’ve cracked the delay implementation of the blob, we can more easily see delay and wait loops — again, using diff. Complex, multivariable patterns are too awkward to handle with sed. I’ll go over those some other time. However, once such patterns are simplified to a function call, diff can once again show the story. Different GPU model? diff. New display? diff. HDMI connected? diff. It’s almost as versatile as det cord .