[Write-up] Encrypted Box - Barbhack 2023

Aug 29, 2023

logobb

Encrypted Box is a challenge from Barbhack 2023. It requires automating the decryption of code through 1000 stages and reversing the aesenc/aesdec CPU instructions.

Each stage is nested within another. To understand the challenge, we only need to examine a few ASM instructions. Here is what a single stage looks like:

.text:0000000000400078 start:
[...syscall setup]
.text:0000000000400088                 syscall                 ; LINUX - sys_read
[...Setup some data in xmm0, and key in xmm1]
.text:4001B8                 pinsrq  xmm2, qword ptr [rsp+8], 1 ; User input it written to xmm2
.text:4001C1                 pinsrq  xmm2, qword ptr [rsp], 0
.text:4001C9                 aesdec  xmm2, xmm1
.text:4001CE                 comiss  xmm2, xmm0
.text:4001D1                 jnz     short loc_40020E
.text:4001D3                 mov     rcx, 66B80h
.text:4001DA                 pinsrq  xmm2, qword ptr [rsp+8], 1
.text:4001E3                 pinsrq  xmm2, qword ptr [rsp], 0
.text:4001EB
.text:4001EB dec_loop:                               ; CODE XREF: .text:000000000040020Aj
.text:4001EB                 sub     rcx, 10h
.text:4001EF                 lea     rax, next_stage
.text:4001F6                 add     rax, rcx
.text:4001F9                 movdqu  xmm0, xmmword ptr [rax]    ; User input it written to xmm2
.text:4001FD                 aesdec  xmm0, xmm2
.text:400202                 movdqu  xmmword ptr [rax], xmm0
.text:400206                 cmp     rcx, 0
.text:40020A                 jnz     short dec_loop
.text:40020C                 jmp     short next_stage
.text:40020E ; ---------------------------------------------------------------------------
.text:40020E
.text:40020E loc_40020E:                             ; CODE XREF: .text:00000000004001D1j
.text:40020E                                         ; .text:0000000000400255j
.text:40020E                 mov     rax, 3Ch ; '<'
.text:400215                 syscall                 ; LINUX - sys_exit
.text:400217
.text:400217 next_stage:                             ; CODE XREF: .text:000000000040020Cj
[...junk code]

Upon analyzing this assembly, we quickly notice that the only impact we have on the executable is setting up xmm2 at addresses 0x4001DA and 0x4001E3.

This register will be :

  • used in aesdec instruction as the state ;
  • compared after decryption to data stored in xmm0.

The encryption key is stored in xmm1, and some data is stored in xmm0. Both values are fixed.

Our goal is to find an input that, when used in aesdec, will result in the same value as in xmm0.

We need to encrypt xmm0 in a way that: decrypt(data=xmm2, key=xmm1) == xmm0 <=> encrypt(data=xmm0, key=xmm1) == xmm2.

This implies that we need to “reverse” the following instruction: aesenc xmm0, xmm1.

aesenc performs a single round of AES encryption, while aesdec performs a single round of AES decryption.

It is important to note that aesenc is not a direct symmetric operation of aesdec. In other words, A != aesdec(aesenc(A)).

This distinction becomes clear when reading Intel’s Manual, which states that aesenc performs the following operations:

STATE := SRC1;
RoundKey := SRC2;
STATE := ShiftRows( STATE );
STATE := SubBytes( STATE );
STATE := MixColumns( STATE );
DEST[127:0] := STATE XOR RoundKey;
DEST[MAXVL-1:128] (Unmodified)

While aesdec performs the following operations:

STATE := SRC1;
RoundKey := SRC2;
STATE := InvShiftRows( STATE );
STATE := InvSubBytes( STATE );
STATE := InvMixColumns( STATE );
DEST[127:0] := STATE XOR RoundKey;
DEST[MAXVL-1:128] (Unmodified)

To “revert” an aesenc operation, one would need to reverse the order or operations, as follows:

DEST[127:0] := STATE XOR RoundKey;
STATE := InvMixColumns( STATE );
STATE := InvSubBytes( STATE );
STATE := ShiftRows( STATE );

which are essentially aesdec operations in reverse order.

The same principle applies to “reverting” an aesdec instruction:

DEST[127:0] := STATE XOR RoundKey;
STATE := MixColumns( STATE );
STATE := SubBytes( STATE );
STATE := ShiftRows( STATE );

Reversing a single aesenc / aesdec instruction is something I encountered in 2020 at FCSC (French Cybersecurity Challenge) with the keykoolol challenge. Two write-ups are available online (1, 2), altough they do not explain how to or provide code to revert both aesdec and aesenc (I noticed after the CTF that the first writeup has a Rust routine that inverts aesenc).

Mathis Hammel provided me with the following Python script, which implements AESDECINV and AESENCINV, to respectively invert AESDEC and AESENC instructions:

def xor(a, b):
    return ''.join(chr(ord(ac) ^ ord(bc)) for ac, bc in zip(a, b))


Sbox = (
    0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
    0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0,
    0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC, 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15,
    0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75,
    0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84,
    0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B, 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF,
    0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8,
    0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2,
    0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73,
    0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88, 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB,
    0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79,
    0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08,
    0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6, 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A,
    0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E,
    0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF,
    0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16,
)

InvSbox = (
    0x52, 0x09, 0x6A, 0xD5, 0x30, 0x36, 0xA5, 0x38, 0xBF, 0x40, 0xA3, 0x9E, 0x81, 0xF3, 0xD7, 0xFB,
    0x7C, 0xE3, 0x39, 0x82, 0x9B, 0x2F, 0xFF, 0x87, 0x34, 0x8E, 0x43, 0x44, 0xC4, 0xDE, 0xE9, 0xCB,
    0x54, 0x7B, 0x94, 0x32, 0xA6, 0xC2, 0x23, 0x3D, 0xEE, 0x4C, 0x95, 0x0B, 0x42, 0xFA, 0xC3, 0x4E,
    0x08, 0x2E, 0xA1, 0x66, 0x28, 0xD9, 0x24, 0xB2, 0x76, 0x5B, 0xA2, 0x49, 0x6D, 0x8B, 0xD1, 0x25,
    0x72, 0xF8, 0xF6, 0x64, 0x86, 0x68, 0x98, 0x16, 0xD4, 0xA4, 0x5C, 0xCC, 0x5D, 0x65, 0xB6, 0x92,
    0x6C, 0x70, 0x48, 0x50, 0xFD, 0xED, 0xB9, 0xDA, 0x5E, 0x15, 0x46, 0x57, 0xA7, 0x8D, 0x9D, 0x84,
    0x90, 0xD8, 0xAB, 0x00, 0x8C, 0xBC, 0xD3, 0x0A, 0xF7, 0xE4, 0x58, 0x05, 0xB8, 0xB3, 0x45, 0x06,
    0xD0, 0x2C, 0x1E, 0x8F, 0xCA, 0x3F, 0x0F, 0x02, 0xC1, 0xAF, 0xBD, 0x03, 0x01, 0x13, 0x8A, 0x6B,
    0x3A, 0x91, 0x11, 0x41, 0x4F, 0x67, 0xDC, 0xEA, 0x97, 0xF2, 0xCF, 0xCE, 0xF0, 0xB4, 0xE6, 0x73,
    0x96, 0xAC, 0x74, 0x22, 0xE7, 0xAD, 0x35, 0x85, 0xE2, 0xF9, 0x37, 0xE8, 0x1C, 0x75, 0xDF, 0x6E,
    0x47, 0xF1, 0x1A, 0x71, 0x1D, 0x29, 0xC5, 0x89, 0x6F, 0xB7, 0x62, 0x0E, 0xAA, 0x18, 0xBE, 0x1B,
    0xFC, 0x56, 0x3E, 0x4B, 0xC6, 0xD2, 0x79, 0x20, 0x9A, 0xDB, 0xC0, 0xFE, 0x78, 0xCD, 0x5A, 0xF4,
    0x1F, 0xDD, 0xA8, 0x33, 0x88, 0x07, 0xC7, 0x31, 0xB1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xEC, 0x5F,
    0x60, 0x51, 0x7F, 0xA9, 0x19, 0xB5, 0x4A, 0x0D, 0x2D, 0xE5, 0x7A, 0x9F, 0x93, 0xC9, 0x9C, 0xEF,
    0xA0, 0xE0, 0x3B, 0x4D, 0xAE, 0x2A, 0xF5, 0xB0, 0xC8, 0xEB, 0xBB, 0x3C, 0x83, 0x53, 0x99, 0x61,
    0x17, 0x2B, 0x04, 0x7E, 0xBA, 0x77, 0xD6, 0x26, 0xE1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0C, 0x7D,
)

xtime = lambda a: (((a << 1) ^ 0x1B) & 0xFF) if (a & 0x80) else (a << 1)

Rcon = (
    0x00, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40,
    0x80, 0x1B, 0x36, 0x6C, 0xD8, 0xAB, 0x4D, 0x9A,
    0x2F, 0x5E, 0xBC, 0x63, 0xC6, 0x97, 0x35, 0x6A,
    0xD4, 0xB3, 0x7D, 0xFA, 0xEF, 0xC5, 0x91, 0x39,
)


def text2matrix(text):
    matrix = []
    for i in range(16):
        byte = text[i]
        if i % 4 == 0:
            matrix.append([byte])
        else:
            matrix[i // 4].append(byte)
    return matrix


def matrix2text(matrix):
    text = []
    for i in range(4):
        for j in range(4):
            text.append(matrix[i][j])
    return bytes(text)


class AES:
    def init(self, master_key=None):
        if master_key:
            self.change_key(master_key)

    def change_key(self, master_key):
        self.round_keys = text2matrix(master_key)

        for i in range(4, 4 * 11):
            self.round_keys.append([])
            if i % 4 == 0:
                byte = self.round_keys[i - 4][0] \
                       ^ Sbox[self.round_keys[i - 1][1]] \
                       ^ Rcon[i // 4]
                self.round_keys[i].append(byte)

                for j in range(1, 4):
                    byte = self.round_keys[i - 4][j] \
                           ^ Sbox[self.round_keys[i - 1][(j + 1) % 4]]
                    self.round_keys[i].append(byte)
            else:
                for j in range(4):
                    byte = self.round_keys[i - 4][j] \
                           ^ self.round_keys[i - 1][j]
                    self.round_keys[i].append(byte)

    def encrypt(self, plaintext):
        self.plain_state = text2matrix(plaintext)
        self.add_round_key(self.plain_state, self.round_keys[:4])
        for i in range(1, 10):
            self.round_encrypt(self.plain_state, self.round_keys[4 * i: 4 * (i + 1)])
        self.sub_bytes(self.plain_state)
        self.shift_rows(self.plain_state)
        self.add_round_key(self.plain_state, self.round_keys[40:])

        return matrix2text(self.plain_state)


    def AESENCLAST(self, plaintext, round_key):
        state = text2matrix(plaintext)
        rkey = text2matrix(round_key)
        
        self.sub_bytes(state)
        self.shift_rows(state)
        self.add_round_key(state, rkey)

        return matrix2text(state)

    def AESENC(self, plaintext, round_key):
        state = text2matrix(plaintext)
        rkey = text2matrix(round_key)

        self.shift_rows(state)
        self.sub_bytes(state)
        self.mix_columns(state)
        self.add_round_key(state, rkey)

        return matrix2text(state)

    def AESDECINV(self, plaintext, round_key):
        state = text2matrix(plaintext)
        rkey = text2matrix(round_key)

        self.add_round_key(state, rkey)
        self.mix_columns(state)
        self.sub_bytes(state)
        self.shift_rows(state)

        return matrix2text(state)

    def AESDECLAST(self, ciphertext, round_key):
        state = text2matrix(ciphertext)
        rkey = text2matrix(round_key)

        self.add_round_key(state, rkey)
        self.inv_sub_bytes(state)
        self.inv_shift_rows(state)

        return matrix2text(state)

    def AESDEC(self, ciphertext, round_key):
        state = text2matrix(ciphertext)
        rkey = text2matrix(round_key)
        
        self.inv_shift_rows(state)
        self.inv_sub_bytes(state)
        self.inv_mix_columns(state)
        self.add_round_key(state, rkey)

        return matrix2text(state)

    def AESENCINV(self, ciphertext, round_key):
        state = text2matrix(ciphertext)
        rkey = text2matrix(round_key)

        self.add_round_key(state, rkey)
        self.inv_mix_columns(state)
        self.inv_sub_bytes(state)
        self.inv_shift_rows(state)
        

        return matrix2text(state)

    def decrypt(self, ciphertext):
        self.cipher_state = text2matrix(ciphertext)
        self.add_round_key(self.cipher_state, self.round_keys[40:])
        self.inv_shift_rows(self.cipher_state)
        self.inv_sub_bytes(self.cipher_state)
        for i in range(9, 0, -1):
            self.round_decrypt(self.cipher_state, self.round_keys[4 * i: 4 * (i + 1)])
        self.add_round_key(self.cipher_state, self.round_keys[:4])
        return matrix2text(self.cipher_state)

    def add_round_key(self, s, k):
        for i in range(4):
            for j in range(4):
                s[i][j] ^= k[i][j]

    def sr_encrypt(self, plaintext, key):
        state_matrix = text2matrix(plaintext)
        key_matrix = text2matrix(key)

        self.round_encrypt(state_matrix, key_matrix)

        return matrix2text(state_matrix)

    def round_encrypt(self, state_matrix, key_matrix):
        self.sub_bytes(state_matrix)
        self.shift_rows(state_matrix)
        self.mix_columns(state_matrix)
        self.add_round_key(state_matrix, key_matrix)

    def sr_decryptlast(self, plaintext, key):
        state_matrix = text2matrix(plaintext)
        key_matrix = text2matrix(key)
        self.add_round_key(state_matrix, key_matrix)
        self.inv_shift_rows(state_matrix)
        self.inv_sub_bytes(state_matrix)
        return matrix2text(state_matrix)

    def sr_decrypt(self, plaintext, key):
        state_matrix = text2matrix(plaintext)
        key_matrix = text2matrix(key)

        self.round_decrypt(state_matrix, key_matrix)

        return matrix2text(state_matrix)

    def round_decrypt(self, state_matrix, key_matrix):
        self.add_round_key(state_matrix, key_matrix)
        self.inv_shift_rows(state_matrix)
        self.inv_sub_bytes(state_matrix)

    def sub_bytes(self, s):
        for i in range(4):
            for j in range(4):
                s[i][j] = Sbox[s[i][j]]

    def x_sub_bytes(self, s):
        s = text2matrix(s)
        self.sub_bytes(s)
        return matrix2text(s)

    def x_mix_columns(self, s):
        s = text2matrix(s)
        self.mix_columns(s)
        return matrix2text(s)

    def x_inv_sub_bytes(self, s):
        s = text2matrix(s)
        self.inv_sub_bytes(s)
        return matrix2text(s)

    def x_inv_mix_columns(self, s):
        s = text2matrix(s)
        self.inv_mix_columns(s)
        return matrix2text(s)

    def x_inv_shift_rows(self, s):
        s = text2matrix(s)
        self.inv_shift_rows(s)
        return matrix2text(s)

    def x_shift_rows(self, s):
        s = text2matrix(s)
        self.shift_rows(s)
        return matrix2text(s)

    def inv_sub_bytes(self, s):
        for i in range(4):
            for j in range(4):
                s[i][j] = InvSbox[s[i][j]]

    def shift_rows(self, s):
        s[0][1], s[1][1], s[2][1], s[3][1] = s[1][1], s[2][1], s[3][1], s[0][1]
        s[0][2], s[1][2], s[2][2], s[3][2] = s[2][2], s[3][2], s[0][2], s[1][2]
        s[0][3], s[1][3], s[2][3], s[3][3] = s[3][3], s[0][3], s[1][3], s[2][3]

    def inv_shift_rows(self, s):
        s[0][1], s[1][1], s[2][1], s[3][1] = s[3][1], s[0][1], s[1][1], s[2][1]
        s[0][2], s[1][2], s[2][2], s[3][2] = s[2][2], s[3][2], s[0][2], s[1][2]
        s[0][3], s[1][3], s[2][3], s[3][3] = s[1][3], s[2][3], s[3][3], s[0][3]

    def mix_single_column(self, a):
        # please see Sec 4.1.2 in The Design of Rijndael
        t = a[0] ^ a[1] ^ a[2] ^ a[3]
        u = a[0]
        a[0] ^= t ^ xtime(a[0] ^ a[1])
        a[1] ^= t ^ xtime(a[1] ^ a[2])
        a[2] ^= t ^ xtime(a[2] ^ a[3])
        a[3] ^= t ^ xtime(a[3] ^ u)

    def mix_columns(self, s):
        for i in range(4):
            self.mix_single_column(s[i])

    def inv_mix_columns(self, s):
        # see Sec 4.1.3 in The Design of Rijndael
        for i in range(4):
            u = xtime(xtime(s[i][0] ^ s[i][2]))
            v = xtime(xtime(s[i][1] ^ s[i][3]))
            s[i][0] ^= u
            s[i][1] ^= v
            s[i][2] ^= u
            s[i][3] ^= v

        self.mix_columns(s)

if __name__ == "__main__":
    A = AES()

    txt = b'azertyuiopqsdfgh'
    key = b'0123456789abcdef'

    enc = A.AESENC(txt, key)
    txt2 = A.AESENCINV(enc, key)
    print('Original: ', txt)
    print('Recovered:', txt2)

    dec = A.AESENC(txt, key)
    txt2 = A.AESENCINV(dec, key)
    print('Original: ', txt)
    print('Recovered:', txt2)

The second step of this challenge involves automating the task, since this challenge consists of 1000 stages.

There are several ways to automate this :

  • static analysis ;
  • emulation ;
  • instrumentation ;
  • debugging.

I attempted to emulate the code using Unicorn, but I encountered an issue where Unicorn would not properly handle XMM registers. Plus, it also turned out that a valid unicorn script would take up too long to complete, since decryption process goes through quite a lot of instructions.

I gave up unicorn and tried miasm, but miasm cannot handle aesenc/aesdec instructions.

I ended up using gdb/peda to automate the execution. Here is the resulting script:

def parse_instruction(line) -> Tuple[str, str, str]:
    line = line.split()
    addr, instr, args = line[0][:-1], line[1], line[2] if len(line) > 2 else None
    return addr, instr, args

def get_current_ins() -> str:
    return peda.current_inst(peda.getreg("rip"))[1].split()[0]

def get_xmm(reg) -> int:
    out = peda.execute_redirect(f'p ${reg}.uint128')
    val = int(out.split()[2], 16)
    if val == 0:
        return 0
    hexval = hex(val)[2:].zfill(32)
    return int.from_bytes(bytes.fromhex(hexval), 'little')

def set_xmm(reg, bytes):
    peda.execute(f'set ${reg}.v={hex(int.from_bytes(bytes,"big"))}')

def setup_bp_from_disassembly(relevant_instructions: list, ret_on_first_match = False):
    rip = peda.getreg("rip")
    out = peda.disassemble(hex(rip), hex(rip+0x200))
    lines = out.splitlines()
    for i, line in enumerate(lines):
        addr, instr, args = parse_instruction(line)
        if instr in relevant_instructions:
            peda.execute(f'break *{addr}')
            if ret_on_first_match:
                return

def setup():
    peda.execute('peda set option context ""')
    peda.execute('peda set option clearscr off')
    try: # I also had pwndbg installed
        pwndbg.gdb.execute('set context-output /dev/null')
        pwndbg.gdb.execute('set context-sections regs')
    except:
        pass

setup()
peda.execute('file /home/nofix/encrypted_box2')
peda.execute('starti < /dev/urandom')
setup_bp_from_disassembly(['syscall', 'aesenc', 'aesdec'])

i = 0
while True:
    print(f'{i}/1001 {hex(peda.getreg("rip"))}')
    peda.execute('continue')
    curr_ins = get_current_ins()

    if curr_ins == 'syscall':
        peda.execute('set $rip=$rip+2') # ignore syscalls
    
    elif curr_ins in ['aesdec', 'aesenc']:
        xmm0 = get_xmm("xmm0")
        xmm1 = get_xmm("xmm1")
        # print(f'DATA : {hex(xmm0)}')
        # print(f'KEY  : {hex(xmm1)}')
        aes = AES()
        # We "reverse" the value from XMM0, using XMM1 as key
        if curr_ins == 'aesdec':
            value = aes.AESDECINV(bytes.fromhex(hex(xmm0)[2:].zfill(32)), bytes.fromhex(hex(xmm1)[2:].zfill(32)))
        elif curr_ins == 'aesenc':
            value = aes.AESENCINV(bytes.fromhex(hex(xmm0)[2:].zfill(32)), bytes.fromhex(hex(xmm1)[2:].zfill(32)))
        else:
            assert False

        upper, lower = struct.pack('<Q', int(value.hex()[0:16], 16)), struct.pack('<Q', int(value.hex()[16:32], 16))
        
        # It seems gdb wont allow one to use xmmX.int128 directly, so I have to split the value in two parts
        peda.execute(f'set $xmm2.v2_int64 = {{0x{upper.hex()},0x{lower.hex()}}}')
        
        # Value was supposed to come from the stack, and it is also used in the decryption algorithm to re-setup XMM2,
        # so I set the proper value @RSP, see :
        # 4001DA pinsrq  xmm2, qword ptr [rsp+8], 1
        # 4001E3 pinsrq  xmm2, qword ptr [rsp], 0
        peda.writemem(peda.getreg("rsp"), value)

        # For speed purposes, we remove breakpoints, since we don't need them anymore
        peda.execute('delete breakpoints')

        # We still need to break before next stage
        setup_bp_from_disassembly(['jmp'], ret_on_first_match=True)

    elif curr_ins == 'jmp': # We are sitting right before next stage
        peda.execute(f'si')
        i += 1
        peda.execute('delete breakpoints')
        # Set a bp to valid instruction not far from aesenc/aesdec
        setup_bp_from_disassembly(['pinsrq'], ret_on_first_match=True)
    
    elif curr_ins == 'pinsrq':
        # We should be near aesenc/aesdec
        setup_bp_from_disassembly(['syscall', 'aesenc', 'aesdec'])

For each of the 1000 stages, the script:

  • sets a breakpoint on the first aesdec/aesenc instruction;
  • skips the read from stdin by reading garbage from /dev/urandom (achieved automatically with starti < /dev/urandom);
  • reaches the aesdec/aesenc breakpoint, calculate the expected value of xmm2 by encrypting or decrypting xmm0 using xmm1 as a key;
  • writes the result in xmm2 and the stack, where our read input should have been located after the read syscall;
  • removes breakpoints and allows the encryption/decryption routine to proceed.
BRB{as_deep_as_OceanGate}

Bonus solution

The challenge creator, s4r, used DynamoRIO to instrument and automate the decryption process. This is a clever approach I did not think of during the CTF. While I tend to think that instrumentation requires coding effort, the APIs are actually rather simple, and you can achieve a lot with minimal code. The main drawback is that you need to recompile your code after each modification.

Here is the code provided by s4r:

/// ~/Softs/DynamoRIO-Linux-10.0.0/bin64/drrun -c trace_encryped_box.so -- ~/barbhack-ctf/2023/reverse/encrypted_box/encrypted_box < /dev/urandom
// This sample logs the basic blocks executed in the main module

#include "dr_api.h"
#include "dr_ir_instr.h"
#include "dr_ir_utils.h"
#include "drmgr.h"
#include "drwrap.h"
#include <string.h>

static app_pc exe_start;
static int bb;

static void set_input(int is_dec);
static void event_exit(void);
static dr_emit_flags_t event_app_instruction_blocks(void *drcontext, void *tag, instrlist_t *bb, instr_t *instr, bool for_trace, bool translating, void *user_data);
static dr_emit_flags_t event_app_instruction_calls(void *drcontext, void *tag, instrlist_t *bb, instr_t *instr, bool for_trace, bool translating, void *user_data);

void sub_bytes(unsigned char* x0)
{
    unsigned char sbox[] = {
        0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
        0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0,
        0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC, 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15,
        0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75,
        0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84,
        0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B, 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF,
        0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8,
        0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2,
        0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73,
        0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88, 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB,
        0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79,
        0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08,
        0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6, 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A,
        0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E,
        0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF,
        0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16
    };

    for (int i = 0; i < 16; i++)
        x0[i] = sbox[x0[i]];
}

void inv_sub_bytes(unsigned char* x0)
{
    unsigned char inv_sbox[] = {
        0x52, 0x09, 0x6A, 0xD5, 0x30, 0x36, 0xA5, 0x38, 0xBF, 0x40, 0xA3, 0x9E, 0x81, 0xF3, 0xD7, 0xFB,
        0x7C, 0xE3, 0x39, 0x82, 0x9B, 0x2F, 0xFF, 0x87, 0x34, 0x8E, 0x43, 0x44, 0xC4, 0xDE, 0xE9, 0xCB,
        0x54, 0x7B, 0x94, 0x32, 0xA6, 0xC2, 0x23, 0x3D, 0xEE, 0x4C, 0x95, 0x0B, 0x42, 0xFA, 0xC3, 0x4E,
        0x08, 0x2E, 0xA1, 0x66, 0x28, 0xD9, 0x24, 0xB2, 0x76, 0x5B, 0xA2, 0x49, 0x6D, 0x8B, 0xD1, 0x25,
        0x72, 0xF8, 0xF6, 0x64, 0x86, 0x68, 0x98, 0x16, 0xD4, 0xA4, 0x5C, 0xCC, 0x5D, 0x65, 0xB6, 0x92,
        0x6C, 0x70, 0x48, 0x50, 0xFD, 0xED, 0xB9, 0xDA, 0x5E, 0x15, 0x46, 0x57, 0xA7, 0x8D, 0x9D, 0x84,
        0x90, 0xD8, 0xAB, 0x00, 0x8C, 0xBC, 0xD3, 0x0A, 0xF7, 0xE4, 0x58, 0x05, 0xB8, 0xB3, 0x45, 0x06,
        0xD0, 0x2C, 0x1E, 0x8F, 0xCA, 0x3F, 0x0F, 0x02, 0xC1, 0xAF, 0xBD, 0x03, 0x01, 0x13, 0x8A, 0x6B,
        0x3A, 0x91, 0x11, 0x41, 0x4F, 0x67, 0xDC, 0xEA, 0x97, 0xF2, 0xCF, 0xCE, 0xF0, 0xB4, 0xE6, 0x73,
        0x96, 0xAC, 0x74, 0x22, 0xE7, 0xAD, 0x35, 0x85, 0xE2, 0xF9, 0x37, 0xE8, 0x1C, 0x75, 0xDF, 0x6E,
        0x47, 0xF1, 0x1A, 0x71, 0x1D, 0x29, 0xC5, 0x89, 0x6F, 0xB7, 0x62, 0x0E, 0xAA, 0x18, 0xBE, 0x1B,
        0xFC, 0x56, 0x3E, 0x4B, 0xC6, 0xD2, 0x79, 0x20, 0x9A, 0xDB, 0xC0, 0xFE, 0x78, 0xCD, 0x5A, 0xF4,
        0x1F, 0xDD, 0xA8, 0x33, 0x88, 0x07, 0xC7, 0x31, 0xB1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xEC, 0x5F,
        0x60, 0x51, 0x7F, 0xA9, 0x19, 0xB5, 0x4A, 0x0D, 0x2D, 0xE5, 0x7A, 0x9F, 0x93, 0xC9, 0x9C, 0xEF,
        0xA0, 0xE0, 0x3B, 0x4D, 0xAE, 0x2A, 0xF5, 0xB0, 0xC8, 0xEB, 0xBB, 0x3C, 0x83, 0x53, 0x99, 0x61,
        0x17, 0x2B, 0x04, 0x7E, 0xBA, 0x77, 0xD6, 0x26, 0xE1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0C, 0x7D
	};

    for (int i = 0; i < 16; i++)
        x0[i] = inv_sbox[x0[i]];
}

void shiftrow(unsigned char* x0)
{
	unsigned char tmp[16];

    tmp[0] = x0[0];
    tmp[1] = x0[5];
    tmp[2] = x0[10];
    tmp[3] = x0[15];

    tmp[4] = x0[4];
    tmp[5] = x0[9];
    tmp[6] = x0[14];
    tmp[7] = x0[3];

    tmp[8] = x0[8];
    tmp[9] = x0[13];
    tmp[10] = x0[2];
    tmp[11] = x0[7];

    tmp[12] = x0[12];
    tmp[13] = x0[1];
    tmp[14] = x0[6];
    tmp[15] = x0[11];

    
	for (int i = 0; i < 16 ; i++)
	{
        x0[i] = tmp[i];
	}
}

void inv_shiftrow(unsigned char* v0)
{
    unsigned char tmp[16];
    tmp[0] = v0[0];
    tmp[1] = v0[13];
    tmp[2] = v0[10];
    tmp[3] = v0[7];
    tmp[4] = v0[4];
    tmp[5] = v0[1];
    tmp[6] = v0[14];
    tmp[7] = v0[11];
    tmp[8] = v0[8];
    tmp[9] = v0[5];
    tmp[10] = v0[2];
    tmp[11] = v0[15];
    tmp[12] = v0[12];
    tmp[13] = v0[9];
    tmp[14] = v0[6];
    tmp[15] = v0[3];

	for (int i = 0; i < 16 ; i++)
	{
        v0[i] = tmp[i];
	}
}

unsigned char xtime(unsigned char x)
{
      return ((x<<1) ^ (((x>>7) & 1) * 0x1b)) & 0xff;
}

unsigned char multiply(unsigned char x, unsigned char y)
{
   return (((y & 1) * x) ^
   ((y>>1 & 1) * xtime(x)) ^
   ((y>>2 & 1) * xtime(xtime(x))) ^
   ((y>>3 & 1) * xtime(xtime(xtime(x)))) ^
   ((y>>4 & 1) * xtime(xtime(xtime(xtime(x)))))) & 0xff;
}

void mix_columns(unsigned char* v0)
{
    for (int i = 0 ; i < 4 ; i++)
	{
        unsigned char t = v0[4*i + 0];
        unsigned char Tmp = v0[4*i + 0] ^ v0[4*i + 1] ^ v0[4*i + 2] ^ v0[4*i + 3];
        unsigned char Tm  = v0[4*i + 0] ^ v0[4*i + 1];
        Tm = xtime(Tm);
        v0[4*i + 0] ^= Tm ^ Tmp;
        Tm  = v0[4*i + 1] ^ v0[4*i + 2];
        Tm = xtime(Tm);
        v0[4*i + 1] ^= Tm ^ Tmp;
        Tm  = v0[4*i + 2] ^ v0[4*i + 3];
        Tm = xtime(Tm);
        v0[4*i + 2] ^= Tm ^ Tmp;
        Tm  = v0[4*i + 3] ^ t;
        Tm = xtime(Tm);
        v0[4*i + 3] ^= Tm ^ Tmp;
    }
}

void inv_mixcolumns(unsigned char* v0)
{
    for (int i = 0 ; i < 4 ; i++)
	{
        unsigned char a = v0[4*i + 0];
        unsigned char b = v0[4*i + 1];
        unsigned char c = v0[4*i + 2];
        unsigned char d = v0[4*i + 3];

        v0[4*i + 0] = multiply(a, 0x0e) ^ multiply(b, 0x0b) ^ multiply(c, 0x0d) ^ multiply(d, 0x09);
        v0[4*i + 1] = multiply(a, 0x09) ^ multiply(b, 0x0e) ^ multiply(c, 0x0b) ^ multiply(d, 0x0d);
        v0[4*i + 2] = multiply(a, 0x0d) ^ multiply(b, 0x09) ^ multiply(c, 0x0e) ^ multiply(d, 0x0b);
        v0[4*i + 3] = multiply(a, 0x0b) ^ multiply(b, 0x0d) ^ multiply(c, 0x09) ^ multiply(d, 0x0e);
    }
}

void xorv(unsigned char* x1, unsigned char* x2)
{
    for (int i = 0; i < 16 ; i++)
    {
        x1[i] ^= x2[i];
    }
}

void aesenc(unsigned char* x1, unsigned char* x2)
{
    shiftrow(x1);
    sub_bytes(x1);
    mix_columns(x1);
    return xorv(x1, x2);
}

void inv_aesenc(unsigned char* x1, unsigned char* x2)
{
    xorv(x1, x2);
    inv_mixcolumns(x1);
    inv_sub_bytes(x1);
    inv_shiftrow(x1);
}

void inv_aesdec(unsigned char* x1, unsigned char* x2)
{
    xorv(x1, x2);
    mix_columns(x1);
    sub_bytes(x1);
    shiftrow(x1);
}

void aesdec(unsigned char* x1, unsigned char* x2)
{
    inv_shiftrow(x1);
    inv_sub_bytes(x1);
    inv_mixcolumns(x1);
    xorv(x1, x2);
}

static void set_input(int is_dec){
    dr_mcontext_t mc = {sizeof(mc),DR_MC_ALL,};

    if (!dr_mcontext_xmm_fields_valid())
    {
        printf("Wrong values of XMM\n");
        exit(0);
    }

    dr_get_mcontext(dr_get_current_drcontext(), &mc);
    unsigned char xmm1[16];
    unsigned char xmm0[16];
    reg_get_value_ex(DR_REG_XMM1, &mc, xmm1);
    reg_get_value_ex(DR_REG_XMM0, &mc, xmm0);

    if (is_dec)
    {
        inv_aesdec(xmm1, xmm0);
    }
    else
    {
        inv_aesenc(xmm1, xmm0);
    }

    // To pass the comiss
    reg_set_value_ex(DR_REG_XMM2, &mc, xmm1);

    // To pass the decryption because XMM2 is fetch again from stack
    for (int i = 0; i < 16 ; i++)
    {
        *((unsigned char*)mc.rsp + i) = xmm1[i];
    }

    dr_set_mcontext(dr_get_current_drcontext(), &mc);
    return;
}

DR_EXPORT void dr_client_main(client_id_t id, int argc, const char** argv){
	module_data_t* exe = dr_get_main_module();

	if (!drmgr_init()){
        DR_ASSERT(false);
    }

	if (!drwrap_init()){
        DR_ASSERT(false);
    }

	dr_set_client_name("Stan tracer", "crackmes.one");
	if (exe != NULL) {
		exe_start = exe->start;
	}

    drmgr_register_bb_instrumentation_event(NULL, event_app_instruction_blocks, NULL);

	dr_free_module_data(exe);
	dr_register_exit_event(event_exit);
}

static dr_emit_flags_t
event_app_instruction_blocks(void *drcontext, void *tag, instrlist_t *bb, instr_t *instr, bool for_trace, bool translating, void *user_data)
{
    app_pc addr_instr = instr_get_app_pc(instr);
    instr_t* first_instr_block = instrlist_first_app(bb);
    app_pc first_addr_block = instr_get_app_pc(first_instr_block);

    module_data_t *mod = dr_lookup_module(dr_fragment_app_pc(tag));
    if (mod != NULL) {
        bool from_exe = (mod->start == exe_start);
        dr_free_module_data(mod);
        if (!from_exe) {
            return DR_EMIT_DEFAULT;
        }
    }

    // find aesdec; comiss
    if ((*(unsigned char*)(addr_instr) == 0x66) &&
        (*(unsigned char*)(addr_instr + 1) == 0x0f) &&
        (*(unsigned char*)(addr_instr + 2) == 0x38) &&
        (*(unsigned char*)(addr_instr + 3) == 0xde) &&
        (*(unsigned char*)(addr_instr + 4) == 0xd1) &&
        (*(unsigned char*)(addr_instr + 5) == 0x0f))
    {
        int dec = 0x1;
        dr_insert_clean_call(drcontext, bb, instr, set_input, false, 1, OPND_CREATE_INTPTR(dec));
    }
    // find aesenc; comiss
    else if ((*(unsigned char*)(addr_instr) == 0x66) &&
        (*(unsigned char*)(addr_instr + 1) == 0x0f) &&
        (*(unsigned char*)(addr_instr + 2) == 0x38) &&
        (*(unsigned char*)(addr_instr + 3) == 0xdc) &&
        (*(unsigned char*)(addr_instr + 4) == 0xd1) &&
        (*(unsigned char*)(addr_instr + 5) == 0x0f))
    {
        int dec = 0x0;
        dr_insert_clean_call(drcontext, bb, instr, set_input, false, 1, OPND_CREATE_INTPTR(dec));
    }

	return DR_EMIT_DEFAULT;
}

static void event_exit(void) {
	drmgr_exit();
}

And the associated makefile:

DR_PATH=/home/user/Softs/DynamoRIO-Linux-10.0.0
CC=gcc
CFLAGS=-fPIC -shared -lgcc -DLINUX -DX86_64 -I$(DR_PATH)/include -I$(DR_PATH)/ext/include
LFLAGS=-L$(DR_PATH)/lib64/debug -L$(DR_PATH)/ext/lib64/debug
LLIBS=-ldrmgr -ldrwrap

trace_encryped_box.so: trace_encryped_box.c
	$(CC) $(CFLAGS) $^ -o $@ $(LFLAGS) $(LLIBS)