In the last post I talked about using aarch64-linux-gnu-gdb and debugging in qemu. Over the past two weeks I have been stepping through the code in gdb, reading the disassembly and, in turn, debugging the qemu port. I summarise the major highlights below.
First, the correct command to invoke qemu is as follows:
./aarch64-softmmu/qemu-system-aarch64 -machine virt -cpu cortex-a57 -nographic -smp 1 -m 2048 -bios ~/coreboot/build/coreboot.rom -s -S
After invoking gdb, I moved on to tracing the execution instruction by instruction to determine where and how the code fails. An excerpt of the session is as follows:
(gdb) target remote :1234
Remote debugging using :1234
(gdb) set disassemble-next-line on
(gdb) stepi
0x0000000000000980 in ?? ()
=> 0x0000000000000980: 02 00 00 14 b 0x988
(gdb)
0x0000000000000988 in ?? ()
=> 0x0000000000000988: 1a 00 80 d2 mov x26, #0x0 // #0
(gdb)
0x000000000000098c in ?? ()
=> 0x000000000000098c: 02 00 00 14 b 0x994
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x0000000000000750 in ?? ()
=> 0x0000000000000750: 3f 08 00 71 cmp w1, #0x2
The detailed version can be seen here.
The first sign of error can be seen here, where the instruction is 0 and the address is way off.
0x64672d3337303031 in ?? ()
=> 0x64672d3337303031: 00 00 00 00 .inst 0x00000000 ; undefined
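A quick way to judge an address that is this far off is to check whether its bytes are printable text, which would suggest the program counter was loaded from data rather than from a valid return address. The sketch below assumes little-endian memory; `pc_as_ascii` is a hypothetical helper written for this post, not part of gdb or qemu.

```python
def pc_as_ascii(pc: int, width: int = 8) -> str:
    """Decode an integer as little-endian bytes, showing printable ASCII only."""
    raw = pc.to_bytes(width, byteorder="little")
    # Replace anything outside the printable ASCII range with a dot.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# The bogus program counter reported by gdb above.
print(pc_as_ascii(0x64672D3337303031))  # prints: 10073-gd
```

All eight bytes are printable characters, so the value reads like a fragment of text and not like a code address.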
To find out why this is happening, I resorted to tracing in qemu. This can be done by adding the following options to the qemu command line. They write a log file to /tmp, which can then be read to see what was actually executed.
-d out_asm,in_asm,exec,cpu,int,guest_errors -D /tmp/qemu.log
Looking at the disassembly, execution is correct up to 0x784 and goes bonkers immediately after it. In the trace, this is where the code hangs:
IN:
0x0000000000000784: d65f03c0 ret
0x0000000000000908: 97fffe06 bl #-0x7e8 (addr 0x120)
…
0x0000000000000120: 3800a017 sturb w23, [x0, #10]
0x0000000000000124: 001c00d5 unallocated (Unallocated)
…
Taking exception 1 [Undefined Instruction]
…from EL1
…with ESR 0x2000000
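The ESR value in that log line can be split into its fields to confirm what kind of exception was taken. This is a minimal sketch using the ESR_ELx layout from the Arm architecture (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]); `decode_esr` is a hypothetical helper name.

```python
def decode_esr(esr: int) -> dict:
    """Split an AArch64 ESR_ELx value into its three top-level fields."""
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class, bits [31:26]
        "il": (esr >> 25) & 0x1,    # instruction length, bit 25 (1 = 32-bit)
        "iss": esr & 0x1FFFFFF,     # instruction specific syndrome, bits [24:0]
    }


print(decode_esr(0x2000000))  # prints: {'ec': 0, 'il': 1, 'iss': 0}
```

An exception class of 0 is the "unknown reason" class, which is what an unallocated instruction encoding produces, so the ESR agrees with the "Undefined Instruction" message in the log.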
0000000000010908 <arm64_c_environment>:
10908: 97fffe06 bl 10120 <loop3_csw+0x1b>
1090c: aa0003f8 mov x24, x0
10908: 97fffe06 bl 10120 <loop3_csw+0x1b>
0x0000000000000908: 97fffe06 bl #-0x7e8 (addr 0x120)
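Both listings show the same instruction word, 0x97fffe06, so the branch target can be checked by hand. The sketch below decodes the A64 BL encoding (opcode 100101 in the top six bits, followed by a signed 26-bit word offset); `decode_bl` is a hypothetical helper, not something taken from qemu or objdump.

```python
def decode_bl(insn: int, pc: int) -> int:
    """Return the target address of an A64 BL instruction located at pc."""
    if (insn >> 26) != 0b100101:
        raise ValueError("not a BL instruction")
    imm26 = insn & 0x03FFFFFF
    # Sign-extend the 26-bit immediate.
    if imm26 & (1 << 25):
        imm26 -= 1 << 26
    # The immediate counts 4-byte instructions, not bytes.
    return pc + imm26 * 4


print(hex(decode_bl(0x97FFFE06, 0x908)))    # prints: 0x120
print(hex(decode_bl(0x97FFFE06, 0x10908)))  # prints: 0x10120
```

The offset works out to -0x7e8 in both cases, so qemu and objdump agree on where the branch goes. Because the immediate is scaled by 4, a BL can only ever land on a 4-byte aligned address.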
Now loop3_csw is defined at (from objdump)
Thus it wants to branch and link to 0x120, but smp_processor_id is at 0x121.
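The one-byte difference can be checked against the qemu trace itself. The sketch below rebuilds the eight bytes at 0x120 from the two words in the trace (assuming little-endian memory) and reads a 32-bit word starting one byte later, at 0x121; `word_at` is a hypothetical helper.

```python
import struct

# Bytes at 0x120..0x127, rebuilt from the two words shown in the qemu trace.
mem = struct.pack("<II", 0x3800A017, 0x001C00D5)


def word_at(offset: int) -> int:
    """Read a little-endian 32-bit word at a byte offset from 0x120."""
    return struct.unpack_from("<I", mem, offset)[0]


print(hex(word_at(0)))  # prints: 0x3800a017 (what the CPU fetched at 0x120)
print(hex(word_at(1)))  # prints: 0xd53800a0 (the word that starts at 0x121)
```

By my reading of the A64 system register encoding, 0xd53800a0 is mrs x0, mpidr_el1, which is a plausible first instruction for smp_processor_id. That would be consistent with the function sitting one byte off alignment while the branch lands on the aligned address just before it.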
smp_processor_id is at (from objdump)