[UTCTF 2023 - reverse] Welcome
UTCTF 2023: Welcome
Welcome
1000
Note: while this challenge is nominally RE, there is some crypto-level/crypto-style math involved too.
Welcome to UTCTF! I made a special last minute program just for you to display a wonderful welcome message (+ flag!) I may have accidentally (okay… purposely) made a small bug in my math > that makes this unsolvable(TM). Can you figure it out and fix it for me please?
By Jeriah (@jyu on discord)
Round 1: analysis of the program structure
In this challenge, we are given a “main.elf” binary which is a AVR-8 ELF executable that opens nice in Ghidra. In the “Language” dropdown, I selected “AVR8 atmega256” variant. Given the binary has debug symbols, the “main” function can be spotted easily, and looks like this:
code:000c5b cf 93 push Ylo
code:000c5c df 93 push Yhi
code:000c5d 00 d0 rcall
code:000c5e 00 d0 rcall
code:000c5f 00 d0 rcall
code:000c60 cd b7 in Ylo,SPL
code:000c61 de b7 in Yhi,SPH
code:000c62 78 94 bset Iflg
code:000c63 84 b5 in R24,DAT_mem_0044 = ??
code:000c64 82 60 ori R24,0x2
code:000c65 84 bd out DAT_mem_0044,R24 = ??
code:000c66 84 b5 in R24,DAT_mem_0044 = ??
code:000c67 81 60 ori R24,0x1
To ease analysis a bit, let’s use the decompiler, to have a somewhat “easy-to-understand” pseudo-C code. After some global variable initializations, we get huge part of the main function that initializes devices, and according to debug symbols it was a “LiquidCrystal” LCD display.
After looking at the Adafruit_LiquidCrystal source code on Github, I figured out that that huge block of code was just the class constructor and “begin” method that got inlined into the main() function. The usage of those libs also give us the hint that an Arduino board, which doesn’t have “main()” function, but have “init()” and “loop()” functions, that get called by a SDK-generated main function. So, the real fun would happen in the “loop” function, which can be easily spotted thanks to the disassembly graph.
Then, in the main loop, we get decompiled code that looks like this:
do {
auStack_d = (undefined [3])0xd70;
Adafruit_LiquidCrystal::setCursor('\0',(byte)R23R22);
R3 = R1;
R2 = R1;
while( true ) {
*(byte *)(Y + 2) = R3;
*(byte *)(Y + 1) = R2;
R25R24._0_1_ = 1;
*(undefined *)(Y + 6) = 0;
*(byte *)(Y + 5) = (byte)R25R24;
*(undefined *)(Y + 3) = 0xb;
R5 = R1;
R7R6 = CONCAT11(R1,R1);
R9R8 = CONCAT11(R1,R1);
*(byte *)(Y + 4) = R1;
while (Z = *(Adafruit_I2CDevice **)(Y + 1), Z != (Adafruit_I2CDevice *)0x0) {
if (((uint)Z & 1) != 0) {
R11R10._0_1_ = *(undefined *)(Y + 3);
R11R10 = CONCAT11(R5,(byte)R11R10);
R13R12 = R7R6;
R15R14 = R9R8;
auStack_d = (undefined [3])0xd9a;
__muldi3();
Z = (Adafruit_I2CDevice *)CONCAT11(28,(byte)Z);
R11R10 = CONCAT11(R1,28);
R13R12 = CONCAT11(R1,R1);
R15R14 = CONCAT11(R1,R1);
auStack_d = (undefined [3])0xda5;
__moddi3();
*(undefined *)(Y + 5) = R18;
*(byte *)(Y + 6) = R19;
}
R25R24._0_1_ = *(byte *)(Y + 1);
R25R24._1_1_ = *(byte *)(Y + 2);
tempvar = R25R24._1_1_;
R25R24._1_1_ = R25R24._1_1_ >> 1;
R25R24._0_1_ = tempvar << 7 | (byte)R25R24 >> 1;
*(byte *)(Y + 2) = R25R24._1_1_;
*(byte *)(Y + 1) = (byte)R25R24;
R11R10._0_1_ = *(undefined *)(Y + 3);
R11R10 = CONCAT11(R5,(byte)R11R10);
R13R12 = R7R6;
R15R14 = R9R8;
auStack_d = (undefined [3])0xdbb;
__muldi3();
Z = (Adafruit_I2CDevice *)CONCAT11(Z._1_1_,28);
R11R10 = CONCAT11(R1,28);
R13R12 = CONCAT11(R1,R1);
R15R14 = CONCAT11(R1,R1);
auStack_d = (undefined [3])0xdc6;
__moddi3();
*(undefined *)(Y + 3) = R18;
R5 = R19;
R7R6 = R21R20;
R9R8 = R23R22;
*(byte *)(Y + 4) = (byte)R25R24;
}
Z._0_1_ = *(byte *)(Y + 5);
Z._1_1_ = *(char *)(Y + 6);
Z = (Adafruit_I2CDevice *)
CONCAT11((Z._1_1_ * '\x02' + CARRY1((byte)Z,(byte)Z)) -
(((byte)((byte)Z * '\x02') < 242) + -2),(byte)Z * '\x02' + 14);
R23R22._0_1_ = *(byte *)Z;
auStack_d = (undefined [3])0xe1b;
Adafruit_LiquidCrystal::send(&out,(byte)R23R22,true);
auStack_d = (undefined [3])0xe1d;
After trying to understand this non-sense code the first day, I gave up and fled into the wonderful world of anime. Then, I decided to give this challenge another try, and decided to focus on the actual assembler instructions rather than relying on a broken pseudo-C.
Round 2: AVR assembly and some math go brrrr
Inside the “loop” function, we get this initialization block:
mfw_loopped XREF[3]: code:000ce0(j), code:000e53(j),
code:000e56(j)
code:000d6d 80 e0 ldi R24,0x0
code:000d6e 0e 94 87 0a call Adafruit_LiquidCrystal::setCursor undefined setCursor(uchar param_
code:000d70 31 2c mov R3,R1
code:000d71 21 2c mov R2,R1
init_sparta XREF[2]: code:000e0a(j), code:000e0e(j)
code:000d72 3a 82 std Y+0x2,R3 Y + 2 = R3 (0 ?)
code:000d73 29 82 std Y+0x1,R2 Y + 1 = R2 (0 ?)
code:000d74 81 e0 ldi R24,0x1
code:000d75 90 e0 ldi R25,0x0
code:000d76 9e 83 std Y+0x6,R25 Y + 6 = 0
code:000d77 8d 83 std Y+0x5,R24 Y + 5 = 1
code:000d78 9b e0 ldi R25,0xb
code:000d79 9b 83 std Y+0x3,R25 Y + 3 = 0xb (11)
code:000d7a 51 2c mov R5,R1
code:000d7b 61 2c mov R6,R1
code:000d7c 71 2c mov R7,R1
code:000d7d 81 2c mov R8,R1
code:000d7e 91 2c mov R9,R1
code:000d7f 1c 82 std Y+0x4,R1 Y + 4 = 0
code:000d80 41 2c mov R4,R1 R4 - R9 = 0
Followed by:
where_the_fun_begins XREF[1]: code:000dcc(j)
code:000d81 e9 81 ldd Zlo,Y+0x1
code:000d82 fa 81 ldd Zhi,Y+0x2
code:000d83 30 97 sbiw Z,0x0
code:000d84 09 f4 brbc multiply_exponent,Zflg
code:000d85 89 c0 rjmp sendMessage
multiply_exponent XREF[1]: code:000d84(j)
code:000d86 e0 ff sbrs Zlo,0x0
code:000d87 1f c0 rjmp bit0_empty
code:000d88 9e 81 ldd R25,Y+0x6 R25 = Y + 6
code:000d89 99 0f add R25,R25
code:000d8a 99 0b sbc R25,R25
code:000d8b ab 80 ldd R10,Y+0x3 R10 = (Y + 3) | (Y + 4) << 48 |
code:000d8c b5 2c mov R11,R5
code:000d8d 63 01 movw R13R12,R7R6
code:000d8e 74 01 movw R15R14,R9R8
code:000d8f 0c 81 ldd R16,Y+0x4
code:000d90 14 2d mov R17,R4
code:000d91 2d 81 ldd R18,Y+0x5 R18 = ((Y + 5),(Y + 6))
code:000d92 3e 81 ldd R19,Y+0x6
code:000d93 49 2f mov R20,R25
code:000d94 59 2f mov R21,R25
code:000d95 69 2f mov R22,R25
code:000d96 79 2f mov R23,R25
code:000d97 89 2f mov R24,R25
code:000d98 0e 94 79 0e call __muldi3 undefined __muldi3(void)
code:000d9a fc e1 ldi Zhi,28
code:000d9b af 2e mov R10,Zhi
code:000d9c b1 2c mov R11,R1
code:000d9d c1 2c mov R12,R1
code:000d9e d1 2c mov R13,R1
code:000d9f e1 2c mov R14,R1
code:000da0 f1 2c mov R15,R1
code:000da1 00 e0 ldi R16,0x0
code:000da2 10 e0 ldi R17,0x0
code:000da3 0e 94 d1 0e call __moddi3 undefined __moddi3(void)
code:000da5 2d 83 std Y+0x5,R18 (Y + 5), (Y + 6) = _muldi3 % 28
code:000da6 3e 83 std Y+0x6,R19
bit0_empty XREF[1]: code:000d87(j)
code:000da7 89 81 ldd R24,Y+0x1
code:000da8 9a 81 ldd R25,Y+0x2
code:000da9 96 95 lsr R25
code:000daa 87 95 ror R24
code:000dab 9a 83 std Y+0x2,R25
code:000dac 89 83 std Y+0x1,R24 (Y+1) = (Y+1) / 2
code:000dad ab 80 ldd R10,Y+0x3
code:000dae b5 2c mov R11,R5
code:000daf 63 01 movw R13R12,R7R6
code:000db0 74 01 movw R15R14,R9R8
code:000db1 0c 81 ldd R16,Y+0x4
code:000db2 14 2d mov R17,R4
code:000db3 2a 2d mov R18,R10
code:000db4 35 2d mov R19,R5
code:000db5 a3 01 movw R21R20,R7R6
code:000db6 b4 01 movw R23R22,R9R8
code:000db7 80 2f mov R24,R16
code:000db8 94 2d mov R25,R4
code:000db9 0e 94 79 0e call __muldi3 Y3Y4 = Y3Y4 ** 2
code:000dbb ec e1 ldi Zlo,28
code:000dbc ae 2e mov R10,Zlo
code:000dbd b1 2c mov R11,R1
code:000dbe c1 2c mov R12,R1
code:000dbf d1 2c mov R13,R1
code:000dc0 e1 2c mov R14,R1
code:000dc1 f1 2c mov R15,R1
code:000dc2 00 e0 ldi R16,0x0
code:000dc3 10 e0 ldi R17,0x0
code:000dc4 0e 94 d1 0e call __moddi3 undefined __moddi3(void)
code:000dc6 2b 83 std Y+0x3,R18 Y + 3 gets the modulus
code:000dc7 53 2e mov R5,R19
code:000dc8 3a 01 movw R7R6,R21R20
code:000dc9 4b 01 movw R9R8,R23R22
code:000dca 8c 83 std Y+0x4,R24
code:000dcb 49 2e mov R4,R25
code:000dcc b4 cf rjmp where_the_fun_begins
The “init_sparta” seems to initialize some stack variables:
*(short*)(Y+5) = 1
*(short*)(Y+3) = 0xb
*(short*)(Y+1) = 0
Then, the label “where_the_fun_begins” check if the Z = *(short*)(Y+1)
register is equals to zero, and jumps magic stuff that print a letter on the LCD screen.
Otherwise, the code checks the first bit of Y register, and jumps to “bit0_empty” if the bit is 0.
Let’s focus on the case when bit0_empty is set to 0: The first instructions:
code:000da7 89 81 ldd R24,Y+0x1
code:000da8 9a 81 ldd R25,Y+0x2
code:000da9 96 95 lsr R25
code:000daa 87 95 ror R24
is just a division by two optimized by GCC (it’s even stated on AVR’s instruction reference). For people used to crypto, an algorithm that checks the lowest bit of a variable and then divides this variable by two (and call to multiplication and modulo functions), smells like modular exponentiation but let’s continue our journey.
There is a lot of stuff moved into registers before a call to a “__muldi3” call: this call is a compier-provided function that get called on architectures that don’t have opcodes to multiply 64-bit integers. And it seems that the first 64-bit argument is passed through registers R18-R25, and the second 64-bit argument through R10-R17. And since registers from R5 to R9 have been set to 0, we end up with this pseudo-code:
{R18-R25} = _muldi3(*(short*)(Y+3),*(short*)(Y+3))
which is basically a power of two, reinforcing the hypothesis of a modular exponentiation.
The next call follows the same logic, except since the return value of _muldi3 is already stored into the registers used to pass the first argument, only the second argument is loaded into {R10-R17}, which is our modulus, 28
.
This means, that if the first bit of our “Y+1” variable is set to 0 (which means it’s a multiple of 2), we are basically doing:
Y3 = (Y3*Y3) % 28
If the first bit of “Y+1” variable is set to 1, then we enter into the “multiply_exponent” label, where funny GCC optimizations happen (for the record, I understood some parts of those optimizations when writing this writeup, I flagged by making an educated guess on modular exponentiation):
code:000d88 9e 81 ldd R25,Y+0x6 R25 = Y + 6
code:000d89 99 0f add R25,R25
code:000d8a 99 0b sbc R25,R25
code:000d8b ab 80 ldd R10,Y+0x3 R10 = (Y + 3) | (Y + 4) << 48
code:000d8c b5 2c mov R11,R5
code:000d8d 63 01 movw R13R12,R7R6
code:000d8e 74 01 movw R15R14,R9R8
code:000d8f 0c 81 ldd R16,Y+0x4
code:000d90 14 2d mov R17,R4
In our case, R25 will be always 0 (since R24R25 only contain an integer modulo 28) and the same happens for the byte in Y+4
.
Basically, the code does:
*(short*)Y5 = (*(short*)Y5) * (*(short*)Y3) % 28
This can be translated by:
while(Y1) {
if(Y1 & 1) {
Y5 = (Y5) * (Y3) % 28
}
Y1 = Y1 / 2;
Y3 = (Y3*Y3) % 28
}
And bingo, the educated guess was correct :รพ
Now let’s make another educated guess and considering we can get the flag with
lol = b'\x00uewefi_}{_tbmgleophcopb_aleo'
exp = 11
l = []
for i in range(28):
l.append(chr(lol[pow(exp, i, 28)]))
print("".join(l))
And… ut{labut{labut{labut{l
, which obviously does not look like a flag.
The problem here is that the modulus 28 is not a prime number, which can lead to funny things, like making impossible to find a generator of the set of numbers from 1 to n-1.
But fortunately, 28 = 29-1, and 29 is a prime number, so let’s change 28 by 29:
lol = b'\x00uewefi_}{_tbmgleophcopb_aleo'
exp = 11
l = []
for i in range(29):
l.append(chr(lol[pow(exp, i, 29)]))
print("".join(l))
… and utflag{beep_boop_welcome_hi}u