
<!DOCTYPE html>
<html xmlns='http://www.w3.org/1999/xhtml' lang='' xml:lang=''>
<head>
  <meta charset='utf-8'></meta>
  <meta name='generator' content='pandoc'></meta>
  <meta name='viewport' content='width=device-width, initial-scale=1.0, user-scalable=yes'></meta>
  <title>ICS: Programming Homework: Buffer Overflow</title>
  <style>
    code{white-space: pre-wrap;}
    span.smallcaps{font-variant: small-caps;}
    span.underline{text-decoration: underline;}
    div.column{display: inline-block; vertical-align: top; width: 50%;}
    div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
    ul.task-list{list-style: none;}
    .display.math{display: block; text-align: center; margin: 0.5rem auto;}
  </style>
  <link rel='stylesheet' href='../markdown.css'></link>
  <script>
  function openTab(evt, tabName) {
    // Declare all variables
    var i, tabcontent, tablinks;

    // Get all elements with class="tabcontent" and hide them
    tabcontent = document.getElementsByClassName("tabcontent");
    for (i = 0; i < tabcontent.length; i++) {
      tabcontent[i].style.display = "none";
    }

    // Get all elements with class="tablinks" and remove the class "active"
    tablinks = document.getElementsByClassName("tablinks");
    for (i = 0; i < tablinks.length; i++) {
      tablinks[i].className = tablinks[i].className.replace(" active", "");
    }

    // Show the current tab, and add an "active" class to the button that opened the tab
    document.getElementById(tabName).style.display = "block";
    evt.currentTarget.className += " active";
  }
  </script>
  
</head>
<body>
<h1 id='ics-programming-homework-buffer-overflow'>ICS: Programming Homework: Buffer Overflow</h1>
<p><a href='index.html'>Go up to the ICS HW page</a> (<a href='index.md'>md</a>) | <a href='hw-buffer.html'>view one-page version</a></p><div class='tab'>
<button class='tablinks' onclick="openTab(event,'tpurpose')" id='defaultOpen'>Purpose</button>
<button class='tablinks' onclick="openTab(event,'tchangelog')">Changelog</button>
<button class='tablinks' onclick="openTab(event,'tplatform')">Platform</button>
<button class='tablinks' onclick="openTab(event,'texecutable')">Executable</button>
<button class='tablinks' onclick="openTab(event,'taslr')">ASLR</button>
<button class='tablinks' onclick="openTab(event,'tbuffer-address')">Buffer Address</button>
<button class='tablinks' onclick="openTab(event,'tstep-1-makefile')">Makefile</button>
<button class='tablinks' onclick="openTab(event,'tstep-2-shellcode')">Shellcode</button>
<button class='tablinks' onclick="openTab(event,'ttask-3-overflow')">Overflow</button>
<button class='tablinks' onclick="openTab(event,'tdebugger')">Debugger</button>
<button class='tablinks' onclick="openTab(event,'ttask-4-newline')">Newline</button>
<button class='tablinks' onclick="openTab(event,'ttask-5-write-up')">Write Up</button>
<button class='tablinks' onclick="openTab(event,'ttroubleshooting')">Troubleshooting</button>
<button class='tablinks' onclick="openTab(event,'tsubmission')">Submission</button>
</div>
<div id='tpurpose' class='tabcontent'><h3 id='purpose'>Purpose</h3>
<p>This assignment will have you implement a shellcode-based buffer overflow attack against a program executable. You will need to be familiar with the content in the <a href='../slides/buffer-overflows.html#/'>buffer overflow slide set</a>, specifically the <a href='../slides/buffer-overflows.html#/how2doit'>how to do it</a> column of slides. That and the lecture recordings will guide you through many of the steps of this assignment.</p>
<p>This assignment has five parts (or tasks), which are meant to be done in order, as that will help guide you through the assignment. In any case, be sure to do task 5 – a brief write-up – as that will help us tell how far you got in the assignment.</p>
<h4 id='getting-stuck'>Getting stuck?</h4>
<p>There are many parts of this assignment where it is easy to get stuck. You can look at the troubleshooting section, at the end of this page, for some hints. We provide a <em>lot</em> of details in each section – try reading through those again. We know it’s a lot to read (and to read repeatedly), but one can easily miss something the first time (or two) through it.</p>
<h4 id='reference-platform'>Reference platform</h4>
<p>This program must run on the Cyber Range account. You are welcome to develop it on your own machine, but make sure it works on the Cyber Range before you submit it. And if it isn’t working on your machine, try it on the Cyber Range to see if that allows it to work.</p>
</div><div id='tchangelog' class='tabcontent'><h3 id='changelog'>Changelog</h3>
<p>Any changes to this page will be put here for easy reference. Typo fixes and minor clarifications are not listed here. </p>
<ul>
<li>Mon, 11/13: updated the code to read in address.txt at the end of the ‘buffer address’ section</li>
</ul>
</div><div id='tplatform' class='tabcontent'><h3 id='platform'>Platform</h3>
<p>We will be using the Virginia Cyber Range at <a href='https://www.virginiacyberrange.org/'>https://www.virginiacyberrange.org/</a>. This site allows you to run a remote virtual environment; we are using Ubuntu 22.04.</p>
<p>Go to <a href='https://www.virginiacyberrange.org/'>https://www.virginiacyberrange.org/</a>. You should have already signed up for the last assignment; if not, see that one for how to do so.</p>
<p>Once you log in, click on the course. There will then be a list of exercise environments at the bottom of the page – click on the one called “Buffer Overflow HW Environment”, then click on the power button on the page that appears (it’s in the lower left of the page). It will take a minute or so to start (boot). Once it does, hit the play button (the right-arrow that replaced the power button) to attach to a screen in the virtual environment.</p>
<p>You can load up a terminal via one of the icons on the bottom of the screen. You will then have to install a bit of the software. To do so, enter the following two commands:</p>
<pre><code>sudo apt update
sudo apt install -y make g++ emacs nasm</code></pre>
<p>This image is <em>persistent</em>, which means that any changes you make will be preserved in your virtual environment the next time you log in – so you should only have to do that once. You are welcome to install any other software that you would like to use. gedit is already installed, as is python3.</p>
<p>When you are done, you should close that window, and you can stop the virtual machine (the stop button, which is next to the play button that started this whole party).</p>
<p><strong>WARNING:</strong> This site is free to use for <em>class purposes</em> for all students in the course. Using it for non-class purposes is an honor violation, and will be dealt with as such. Anything outside the reasonable bounds of an assignment in this course is considered a non-class purpose.</p>
<p>This is a great resource, but it is a <em>finite</em> resource. If you decide to wait to the last minute to start the assignment, and the rest of your class-mates do so as well, it’s going to be slooooooow. You cannot get an extension because you waited until the last minute along with everybody else, and the system was slow as a result.</p>
<h4 id='working-as-root'>Working as root</h4>
<p>Some of these commands will require you to execute them as the root user. You can do this by prepending <code>sudo</code> to the command. For example, to insert a module called <code>root.ko</code>, you would call <code>sudo insmod root.ko</code> – this executes the <code>insmod root.ko</code> command as root.</p>
</div><div id='texecutable' class='tabcontent'><h3 id='executable'>Executable</h3>
<h4 id='grade.c'>grade.c</h4>
<p>The executable we are attacking is called <code>grade</code>. When run, you enter your name, and it will tell you your grade. An example run of the program:</p>
<pre><code>$ ./grade
Please enter your name:
Albert Einstein

Albert Einstein, your grade on this assignment is a F
$</code></pre>
<p>Note that the first instance of the name in the output above is what the user input, and the second is what the program printed back.</p>
<p>The grade reported by this program is the grade you will receive on (part of) this assignment.</p>
<p>The source code for this program is available: <a href='buffer/grade.c.html'>grade.c</a> (<a href='buffer/grade.c'>src</a>). It MUST be compiled with the following command:</p>
<pre><code>gcc -g -fno-stack-protector -m64 -fomit-frame-pointer -o grade grade.c</code></pre>
<p>Your job is to create a shellcode-based buffer overflow attack that will assign you a different grade. For example:</p>
<pre><code>$ ./grade &lt; input.bin
Please enter your name:
Albert Einstein, your grade on this assignment is an A
$</code></pre>
<p>The contents of input.bin are output by a program that you will write in tasks 3 & 4.</p>
<p><strong>NOTE:</strong> Your output on this must be EXACTLY: <code>&lt;name&gt;, your grade on this assignment is an A</code>, as the Gradescope submission will check this. It’s fine if <code>&lt;name&gt;</code> is a nickname or only your first name. Specifically, it has to match the name you put in the buffer.py file that you will submit.</p>

</div><div id='taslr' class='tabcontent'><h3 id='aslr'>ASLR</h3>
<p>Address Space Layout Randomization (ASLR) is an operating system defense against buffer overflows. Consider the following program, which is available as <a href='buffer/stack_addr.c.html'>stack_addr.c</a> (<a href='buffer/stack_addr.c'>src</a>):</p>
<pre><code>#include &lt;stdio.h&gt;
int main() {
  char buffer[256];
  printf ("%.8lx\n",(unsigned long)buffer);
}</code></pre>
<p>All it does is print out the address of a scoped variable, which means the variable is on the stack. We would expect that the variable is at the same position on the stack, relative to the top, for each execution. We compile it (via <code>gcc -o stack_addr stack_addr.c</code>) and run it a number of times:</p>
<pre><code>$ ./stack_addr 
9dbfc92c
$ ./stack_addr 
773ee00c
$ ./stack_addr 
b4f4c21c
$ ./stack_addr 
e811414c
$ ./stack_addr 
0f0e077c
$ ./stack_addr 
be185c6c
$ </code></pre>
<p>The stack position keeps changing. Actually what is happening is the address where the stack <em>starts</em> keeps changing – <code>buffer</code> is still in the same spot relative to that end of the stack, but the address of that end is what varies each time. This is an operating system defense to make buffer overflows quite difficult, since the address of a buffer keeps changing.</p>
<p>For this assignment, we need to turn that defense off.</p>
<p>We are going to run it differently. Try: <code>setarch $(uname -m) -L -R ./stack_addr</code>. The <code>-R</code> option tells it to turn off ASLR for the program being run. The <code>-L</code> flag tells it to use an older memory model that is more appropriate for our first buffer overflow exploit. You can execute this command many times, and you will get the exact same stack address each time.</p>
<p>However, it’s annoying to do that each time, so we can just do it once: <code>setarch $(uname -m) -L -R /bin/bash</code>. This runs bash – the shell – and any commands run in that shell will not have ASLR turned on. You can now run <code>./stack_addr</code> many times, and get the same address each time.</p>
<p>We will use the <code>setarch</code> command when we run grade. So your buffer overflow will run via:</p>
<p><code>setarch $(uname -m) -L -R ./grade &lt; input.bin</code></p>
<p>The <code>input.bin</code> part is explained below.</p>
</div><div id='tbuffer-address' class='tabcontent'><h3 id='buffer-address'>Buffer Address</h3>
<p>As this is a first take at a stack buffer overflow, we are making the project more viable. Specifically, we will provide the address of the buffer that your program can use when determining the buffer contents.</p>
<p>If you run the <code>grade</code> executable with the <code>--print-buffer-address</code> flag, it will show the address of the buffer:</p>
<pre><code>$ ./grade --print-buffer-address
7ffca71f59f0
$ ./grade --print-buffer-address
7fff6deb9320
$ ./grade --print-buffer-address
7ffe5a3aefb0
$ ./grade --print-buffer-address
7ffd3328b150
$</code></pre>
<p>Again, we see that the address changes each time. So we can run it with <code>setarch</code> in one of two ways. First, each time we call the program:</p>
<pre><code>$ setarch $(uname -m) -L -R ./grade --print-buffer-address
7fffffffe050
$ setarch $(uname -m) -L -R ./grade --print-buffer-address
7fffffffe050
$ setarch $(uname -m) -L -R ./grade --print-buffer-address
7fffffffe050
$ setarch $(uname -m) -L -R ./grade --print-buffer-address
7fffffffe050
$</code></pre>
<p>Or we can run <code>setarch</code> just once to run <code>bash</code>, and then run the program:</p>
<pre><code>$ setarch $(uname -m) -L -R /bin/bash
$ ./grade --print-buffer-address
7fffffffe030
$ ./grade --print-buffer-address
7fffffffe030
$ ./grade --print-buffer-address
7fffffffe030
$ ./grade --print-buffer-address
7fffffffe030
$ </code></pre>
<p>Prior to running your program, there will be an <code>address.txt</code> file that will contain the (hex) address of the buffer. We will generate it as such:</p>
<pre><code>$ setarch $(uname -m) -L -R ./grade --print-buffer-address &gt; address.txt
$ cat address.txt
7fffffffd410
$</code></pre>
<p>You can then read in that file to determine the address of the buffer. The following C code will do that for you:</p>
<pre><code>#include &lt;stdio.h&gt;
int main(void) {
  unsigned char bytes[8];
  unsigned long n;

  // read the hex address that was written to address.txt
  FILE *fp = fopen("address.txt","r");
  fscanf(fp, "%lx", &amp;n);
  fclose(fp);
  printf("Buffer address: %lx\n",n);

  // bytes[0] is the most significant byte, bytes[7] the least significant
  for (int i = 0; i &lt; 8; i ++)
      bytes[i] = (n &gt;&gt; (64 - 8 * (i + 1))) &amp; 0xFF;
 
  // print the bytes least significant first, two hex digits per byte
  printf("Reverse address: %02x%02x%02x%02x%02x%02x%02x%02x\n", bytes[7], bytes[6], bytes[5], bytes[4], bytes[3], bytes[2], bytes[1], bytes[0]); 
  return 0;
}</code></pre>
<p>You may assume that the <code>address.txt</code> file is present, properly formed, and has the correct buffer address value.</p>
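<p>The byte reversal above matters because x86-64 is little-endian: an address is stored in memory with its least significant byte first. The following is a minimal sketch of producing those 8 bytes in memory order. The helper name <code>pack_le64()</code> is hypothetical and is not part of the provided files.</p>
<pre><code>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Store a 64-bit address as 8 bytes, least significant byte first
   (the order in which x86-64 keeps it in memory). */
void pack_le64(uint64_t addr, unsigned char out[8]) {
  for (size_t i = 0; i &lt; 8; i++)
    out[i] = (unsigned char)((addr &gt;&gt; (8 * i)) &amp; 0xFF);
}</code></pre>
<p>For the example address <code>7fffffffd410</code>, this yields the bytes <code>10 d4 ff ff ff 7f 00 00</code>.</p>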
</div><div id='tstep-1-makefile' class='tabcontent'><h3 id='step-1-makefile'>Step 1: Makefile</h3>
<p>Not much to do here yet, but we are going to call <code>make</code> to compile your code. All of your compilation lines must be in that Makefile, and all under one target. Most of the following steps will have you add some lines to the Makefile. A collection of all the compilation commands is at the end of this document (in the Submission section).</p>
<p>If you forget how to write a Makefile, you can see <a href='https://uva-cs.github.io/pdr/tutorials/05-make/index.html'>this Makefile tutorial</a>, and <a href='https://uva-cs.github.io/pdr/labs/lab08-64bit/index.html'>this C/assembly Makefile example</a>, or any other reference online.</p>
<p>You may want to put the grade.c compilation line, from above, into your Makefile. So far, your Makefile should look like this:</p>
<pre><code>all:
    gcc -g -fno-stack-protector -m64 -fomit-frame-pointer -o grade grade.c</code></pre>
<p>Note that the indentation must be a tab, not spaces!</p>
</div><div id='tstep-2-shellcode' class='tabcontent'><h3 id='step-2-shellcode'>Step 2: Shellcode</h3>
<p>This step is to create shellcode that will print out the grade you want to receive, and then gracefully exit. This should be in a file called <code>shellcode.s</code>. Recall that you will have to remove all end-of-string characters from the machine code generated. While this will mostly be 0x00 bytes, you will also have to check for newlines (0x0a) and carriage returns (0x0d). The <a href='../slides/buffer-overflows.html#/'>buffer overflow slide set</a> goes over how to do this.</p>
<h4 id='goal'>Goal</h4>
<p>The goal of this part is to create a stand-alone assembly program that will print out, <code>Albert Einstein, your grade on this assignment is an A</code>. You will code that in an assembly routine, and use a C program to call that routine. <strong>The intent is that you start with the shellcode provided in the course slides, specifically <a href='../slides/buffer-overflows.html#/how2doit'>here</a>, and adapt that code.</strong></p>
<h4 id='c-code'>C Code</h4>
<p>To test that your code works, you will need a <code>main()</code> function to call your shellcode; this will go into a <code>shellcode_test.c</code> file. Here is an example such file (you are welcome to use this as-is or modify it):</p>
<pre><code>#include &lt;stdio.h&gt;
#include &lt;sys/mman.h&gt;

extern void shellcode();

void main() {
  int on_stack;
  mprotect((char *)((long)&on_stack & -0x1000), 1, PROT_READ | PROT_WRITE | PROT_EXEC);
  mprotect((char *)((long)&main & -0x1000), 1, PROT_READ | PROT_WRITE | PROT_EXEC);
  printf ("Before shell code is executed.\n");
  shellcode();
  printf ("After shell code is executed.\n");
}</code></pre>
<p>The <code>mprotect()</code> calls work the same as described in the grade.c program (<a href='buffer/grade.c.html'>HTML</a>, <a href='buffer/grade.c'>C</a>) – they allow execution on the stack (the first call) and modifying the machine code (the second call). Needless to say, this requires that your assembly routine be called <code>shellcode</code>.</p>
<p>This file is provided in <a href='buffer/shellcode_test.c.html'>shellcode_test.c</a> (<a href='buffer/shellcode_test.c'>src</a>).</p>
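<p>The expression <code>&amp; -0x1000</code> in the <code>mprotect()</code> calls rounds an address down to the start of its page, since <code>mprotect()</code> requires a page-aligned address. The sketch below isolates that arithmetic. It assumes a 4 KiB (0x1000-byte) page size, as the mask in the code above does, and <code>page_base()</code> is a hypothetical helper name used only for illustration.</p>
<pre><code>#include &lt;stdint.h&gt;

/* Round an address down to the start of its 4 KiB page.
   -0x1000 is ...fffff000 in two's complement, so the AND
   clears the low 12 bits of the address. */
uintptr_t page_base(uintptr_t addr) {
  return addr &amp; (uintptr_t)(-0x1000);
}</code></pre>
<p>For example, an address of <code>0x7fffffffe050</code> maps to the page starting at <code>0x7fffffffe000</code>.</p>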
<h4 id='assembly-code'>Assembly code</h4>
<p>Your assembly code will go in a <code>shellcode.s</code> file; here is the start of that file:</p>
<pre><code>; shellcode.s

  global shellcode

  section .text

shellcode:

  ; your code here

  ; for now, a ret, but you will want to eventually remove it
  ret</code></pre>
<p>This file is provided in <a href='buffer/shellcode.s.html'>shellcode.s</a> (<a href='buffer/shellcode.s'>src</a>).</p>
<p>Forget assembly? Here are some references from CS 2150: <a href='https://uva-cs.github.io/pdr/slides/08-assembly-64bit.html#/'>slides</a>, labs (<a href='https://uva-cs.github.io/pdr/labs/lab08-64bit/index.html'>1</a> & <a href='https://uva-cs.github.io/pdr/labs/lab09-64bit/index.html'>2</a>), book chapters (<a href='https://uva-cs.github.io/pdr/book/x86-64bit-asm-chapter.pdf'>1</a> & <a href='https://uva-cs.github.io/pdr/book/x86-64bit-ccc-chapter.pdf'>2</a>), and/or <a href='https://www.cs.cmu.edu/~fp/courses/15213-s07/misc/asm64-handout.pdf'>recommended online document</a>. If you took CSO1, that course will have some references as well.</p>
<h4 id='makefile'>Makefile</h4>
<p>Your Makefile will need to have three lines to compile this program: one to call nasm to compile shellcode.s, one to compile shellcode_test.c, and one to link them together.</p>
<p>You will add the following lines to your Makefile (remember that the indents are a single tab, not spaces):</p>
<pre><code>nasm -g -f elf64 -o shellcode.o shellcode.s
gcc -g -m64 -c shellcode_test.c
gcc -g -no-pie -m64 -o shellcode_test shellcode.o shellcode_test.o</code></pre>
<p>This goes after the line to compile grade.c from the previous step.</p>
<p>Some VERY important notes for compilation:</p>
<ul>
<li>We are using C, not C++, so you will have to use the <code>gcc</code> (not <code>g++</code>) compiler.</li>
<li>We are using the GNU C compiler, so we are using <code>gcc</code> (not <code>clang</code>)</li>
<li>We are doing this as a 64-bit program, so you will have to pass <code>-f elf64</code> to nasm, and <code>-m64</code> to BOTH of the gcc compilation commands.</li>
<li>You will have to add the <code>-no-pie</code> flag to the final link line. (If you are interested, the default for the current version of gcc is to create a position independent executable (PIE), which means the addresses for everything are relative, not absolute. This does not work with what we are compiling, so we turn it off with that flag.)</li>
<li>Your compiled shellcode.s file should be named <code>shellcode.o</code> – we are going to run a disassembler on that file, so please name it correctly.</li>
<li>Your final binary should be called <code>shellcode_test</code> – we are going to execute it, so please name it correctly.</li>
</ul>
<h4 id='notes-and-hints'>Notes and hints</h4>
<p>Some notes and hints for developing this shellcode:</p>
<ul>
<li>If you forget how to write assembly, see the appropriate <a href='https://uva-cs.github.io/pdr/slides/08-assembly-64bit.html#/'>slides</a>, labs (<a href='https://uva-cs.github.io/pdr/labs/lab08-64bit/index.html'>1</a> & <a href='https://uva-cs.github.io/pdr/labs/lab09-64bit/index.html'>2</a>), book chapters (<a href='https://uva-cs.github.io/pdr/book/x86-64bit-asm-chapter.pdf'>1</a> & <a href='https://uva-cs.github.io/pdr/book/x86-64bit-ccc-chapter.pdf'>2</a>), and/or <a href='https://www.cs.cmu.edu/~fp/courses/15213-s07/misc/asm64-handout.pdf'>recommended online document</a> from CS 2150.</li>
<li>You can use <code>objdump -d shellcode.o</code> to see the machine language that your assembly opcodes compiled into.</li>
<li>The system calls for 32-bit x86 are DIFFERENT than those for 64-bit x86. In particular, syscall 1 is exit in 32-bit but is sys_write in 64-bit. We want this to be a 64-bit program, so be sure to use the syscalls that are listed <a href='http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/'>here</a>. You are welcome to use any other online table for your syscall reference, but be SURE it’s the 64-bit version.</li>
<li>When invoking <code>syscall</code> in assembly, it looks at the <em>entire</em> register rax for the syscall number. So if you only set the value in the lowest 8 bits (e.g., <code>mov al, 1</code>), then the overall value in <code>rax</code> may not be 1 if any of the upper 56 bits were set previously (which is likely), and you will not get the syscall you expect. (Writing to a 32-bit register such as <code>eax</code> does clear the upper 32 bits, but <code>mov eax, 1</code> assembles to machine code containing 0x00 bytes.) So be sure to zero out the 64-bit versions of all the registers that syscall is using.</li>
<li>Note that <code>syscall</code> can modify the registers, so if the value in rax was 1 before syscall was invoked, it may be quite different afterward – so zero out your registers again for your second syscall invocation.</li>
<li>As mentioned in the slides, you will want your string to be in your .text section.</li>
<li>To save yourself a headache, don’t bother putting in a newline at the end of the string at this time.</li>
</ul>
<p>Where to start?</p>
<ul>
<li>First, get the C code (from above) properly compiling with some assembly code – you can adapt the vecsum example from the <a href='https://uva-cs.github.io/pdr/labs/lab08-64bit/index.html'>first x86 lab in CS 2150</a> for this.</li>
<li>Make sure your Makefile can compile the code</li>
<li>Review the <a href='../slides/buffer-overflows.html#/'>buffer overflow slides</a>, specifically the <a href='../slides/buffer-overflows.html#/how2doit'>how to do it</a> column</li>
<li>Change the assembly code to print out a string (via a <code>syscall</code>)</li>
<li>Add more assembly code to gracefully exit (also via a <code>syscall</code>)</li>
<li>View the machine code (via <code>objdump -d</code>), and work on removing all the invalid characters (mostly 0x00 bytes).</li>
</ul>
<h4 id='removing-invalid-characters'>Removing invalid characters</h4>
<p>As discussed in <a href='../slides/buffer-overflows.html#/how2doit'>the slides</a>, we are going to have to remove all end-of-string characters. A number of characters can cause problems: newlines (0x0a), carriage returns (0x0d), tabs (0x09), vertical tabs (0x0b), and spaces (0x20). The real issue, however, is removing null bytes (0x00). The slides discuss how to do this. You can check your assembly subroutine with <code>objdump -d</code> to verify that these bytes do not occur. Once you make these modifications, recompile <code>shellcode_test</code> to make sure it still works as intended after those changes.</p>
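<p>Scanning <code>objdump -d</code> output by eye is error-prone, so a small check can help. The following is a minimal sketch that looks for the byte values listed above in an array of machine-code bytes. The helper name <code>first_bad_byte()</code> is hypothetical, and it assumes you have copied the bytes into an array yourself; which of these values actually end the input depends on how <code>grade.c</code> reads it.</p>
<pre><code>#include &lt;stddef.h&gt;

/* Return the index of the first byte in buf that is one of the
   problem values from the text (0x00, 0x09, 0x0a, 0x0b, 0x0d, 0x20),
   or -1 if none of them occur. */
long first_bad_byte(const unsigned char *buf, size_t len) {
  static const unsigned char bad[] = {0x00, 0x09, 0x0a, 0x0b, 0x0d, 0x20};
  for (size_t i = 0; i &lt; len; i++)
    for (size_t j = 0; j &lt; sizeof bad; j++)
      if (buf[i] == bad[j])
        return (long)i;
  return -1;
}</code></pre>
<p>For instance, the bytes <code>b8 01 00 00 00</code> (the encoding of <code>mov eax, 1</code>) are flagged at index 2, while <code>48 31 c0</code> (<code>xor rax, rax</code>) passes.</p>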
<h4 id='execution'>Execution</h4>
<p>After compiling with <code>make</code>, we will execute your code via <code>./shellcode_test</code> as follows.</p>
<pre><code>$ ./shellcode_test
Before shell code is executed.
Albert Einstein, your grade on this assignment is an A
$</code></pre>
<p>Note that we expect it to say <em>your</em> name, not Albert Einstein. Any reasonable form of your name is fine.</p>
</div><div id='ttask-3-overflow' class='tabcontent'><h3 id='task-3-overflow'>Task 3: Overflow</h3>
<p>This is what we are all here for: the buffer overflow itself. Presumably, from task 2, you have shell code that does not contain any 0x00 bytes (or any of the other token-ending characters), prints out the grade you want, and then exits.</p>
<p>In this part, you are going to create a C program called <code>attack_shellcode.c</code> that will write the carefully constructed data to stdout. You can run your program as follows (this is how we will do it to test it):</p>
<pre><code>$ ./attack_shellcode &gt; input.bin</code></pre>
<p>As your program will be writing binary data, the input.bin will be binary. You can use <code>hexdump -C</code> to view the contents of that file – but make sure you put the <code>-C</code> parameter in there!</p>
<p>Needless to say, be sure to name your executable <code>attack_shellcode</code>. You will then be able to run the buffer overflow via <code>./grade &lt; input.bin</code>, as shown above. Recall that the <code>grade</code> executable was compiled from the grade.c source code, as shown above.</p>
<p>The <a href='../slides/buffer-overflows.html#/'>buffer overflow slide set</a> will be a useful reference for this. The output of your <code>attack_shellcode</code> executable, which is what is stored in input.bin, will likely have three parts:</p>
<ul>
<li>The NOP sled</li>
<li>The shellcode</li>
<li>The return address</li>
</ul>
<h4 id='debugging'>Debugging</h4>
<p>See the next section for a quick tutorial on how to use the debugger to get your program working.</p>
<h4 id='notes-and-hints-1'>Notes and hints</h4>
<p>Some other notes and tips:</p>
<ul>
<li>We are not, at this time, expecting a newline to be printed at the end of your desired grade – that’s the next task.</li>
<li>Nothing here will work unless you have run <code>setarch x86_64 -v -LR bash</code> first, as described in the <a href='../slides/buffer-overflows.html#/'>buffer overflow slide set</a>.</li>
<li>The address of the <code>name</code> buffer in <code>vulnerable()</code> MUST be specified as described below, otherwise your program will not work and you will not receive credit for that part</li>
<li>When you are <em>not</em> running it in gdb, the buffer address will change slightly. It’s up to you to figure out what that is (you can modify grade.c to find out; just be sure to change it back).</li>
<li>You can look at your created file via <code>hexdump -C input.bin</code>. With the <code>-C</code> parameter, hexdump will display the bytes one at a time, in the order they appear in the file; without it, hexdump groups the bytes into two-byte words, which appear byte-swapped on a little-endian machine.</li>
<li>In your <code>attack_shellcode</code> program, you may be tempted to use <a href='http://www.cplusplus.com/reference/cstdio/puts/'>puts()</a> to write your data to standard output. However, <code>puts()</code> will write a newline character (‘\n’, hex value 0x0a) after the data it prints. You can use <a href='http://www.cplusplus.com/reference/cstdio/fputs/'>fputs()</a> instead (use <code>stdout</code> as the stream) – this will print to standard output without the trailing newline.</li>
</ul>
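<p>As a hedged aside on the last hint: <code>fputs()</code> treats its argument as a C string, so it stops at the first 0x00 byte, whereas <code>fwrite()</code> takes an explicit length. The sketch below shows one way to write an exact number of raw bytes to a stream. The name <code>write_bytes()</code> is hypothetical, and whether you need it depends on whether your output contains 0x00 bytes.</p>
<pre><code>#include &lt;stdio.h&gt;

/* Write exactly len bytes of buf to out. Nothing is appended, and
   0x00 bytes in buf are written like any other byte.
   Returns 0 on success, -1 on a short write or flush error. */
int write_bytes(FILE *out, const unsigned char *buf, size_t len) {
  if (fwrite(buf, 1, len, out) != len)
    return -1;
  return fflush(out) == 0 ? 0 : -1;
}</code></pre>
<p>Calling it with <code>stdout</code> as the stream sends the bytes to standard output, where they can be redirected into <code>input.bin</code>.</p>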
<h4 id='makefile-1'>Makefile</h4>
<p>Your Makefile will need to compile your program into an executable named <code>attack_shellcode</code>. A possible compilation line might look like the following – this would also go into your Makefile.</p>
<pre><code>gcc -o attack_shellcode attack_shellcode.c</code></pre>
<h4 id='execution-1'>Execution</h4>
<p>After compiling with <code>make</code>, we will execute your code via <code>./attack_shellcode</code> as follows.</p>
<pre><code>$ setarch $(uname -m) -R -L ./grade --print-buffer-address
7fffffffd410
$ setarch $(uname -m) -R -L ./grade --print-buffer-address &gt; address.txt
$ cat address.txt
7fffffffd410
$ ./attack_shellcode &gt; input.bin
$ setarch $(uname -m) -R -L ./grade &lt; input.bin
Albert Einstein, your grade on this assignment is an A$</code></pre>
<p>The specific value of your buffer address may vary. Notice that the end prompt is on the <em>same</em> line as the output – this is fixed next.</p>
</div><div id='tdebugger' class='tabcontent'><h3 id='debugger'>Debugger</h3>
<p>You are going to have to become friendly with gdb – the GNU Debugger. There is no way around that. Here are a bunch of commands that will be of help; you can also see the <a href='https://uva-cs.github.io/pdr/tutorials/02-gdb/index.html'>CS 2150 GDB tutorial</a> (although some of the commands below are not listed there).</p>
<p>To perform any of these commands, you need to add the <code>-g</code> flag to <em>all</em> of your compilation lines. This was included above, but be sure that <em>each</em> compilation line (both gcc and nasm) has that flag present.</p>
<p>To start the debugger, run <code>gdb</code> with the <em>compiled</em> program you want to debug, such as: <code>gdb shellcode_test</code>.</p>
<h4 id='breakpoints-and-examining-data'>Breakpoints and examining data</h4>
<p>A breakpoint is a spot in the file where the program execution will pause so that you can examine the state of things. To set a breakpoint, enter <code>break shellcode_test.c:11</code>. This will set a breakpoint at line 11 of the shellcode_test.c file, which is the line that calls <code>shellcode();</code>. A breakpoint will pause the program just <em>before</em> that line is executed.</p>
<p>To start the program executing, you enter <code>run</code>. You can enter any command line parameters after the <code>run</code>, or pipes (i.e., <code>run &lt; input.bin</code>).</p>
<p>Once you have reached the breakpoint, you can examine the state of the program:</p>
<ul>
<li><code>info locals</code> prints the values of the local variables in that subroutine</li>
<li><code>print</code> or <code>p</code> will show the value of a specific variable. If it’s an array or buffer, you can prefix it by the ampersand (<code>&</code>) to get the address of that array or buffer.</li>
<li>You can get the value of a specific register via: <code>p $rax</code>; this will be quite useful later on</li>
<li><code>bt</code> (for backtrace) shows the set of function calls that got to that point</li>
<li><code>list</code> shows the line of source code that the program is paused on, and the few lines before and after</li>
</ul>
<p>Below is an example of running the debugger using those commands. This is using the shellcode_test.c, from above, and there is only one local variable. There are a number of lines of information printed when the debugger starts; these have been omitted below and replaced with an ellipsis (…).</p>
<pre><code>$ gdb shellcode_test 
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
...
(gdb) break shellcode_test.c:11
Breakpoint 1 at 0x40122a: file shellcode_test.c, line 11.
(gdb) run
Starting program: /home/aaron/Dropbox/git/ics-staff/hws/buffer/shellcode_test 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Before shell code is executed.

Breakpoint 1, main () at shellcode_test.c:11
11    shellcode();
(gdb) info locals
on_stack = 0
(gdb) p on_stack
$1 = 0
(gdb) p $rax
$2 = 31
(gdb) bt
#0  main () at shellcode_test.c:11
(gdb) list
6 void main() {
7   int on_stack;
8   mprotect((char *)((long)&on_stack & -0x1000), 1, PROT_READ | PROT_WRITE | PROT_EXEC);
9   mprotect((char *)((long)&main & -0x1000), 1, PROT_READ | PROT_WRITE | PROT_EXEC);
10    printf ("Before shell code is executed.\n");
11    shellcode();
12    printf ("After shell code is executed.\n");
13  }
(gdb) exit
A debugging session is active.

  Inferior 1 [process 1162358] will be killed.

Quit anyway? (y or n) y
$ </code></pre>
<h4 id='managing-execution'>Managing execution</h4>
<p>Debuggers can allow you to execute lines of code one at a time; this is called single-stepping. Presumably we would want to examine the data after each single step, as above – but for this part we are just going to look at how to control the execution of the program.</p>
<p>The relevant commands are:</p>
<ul>
<li><code>next</code> or <code>n</code> will execute the current line of C code and stop at the next line. In particular, if that line is a subroutine call, it will execute the entire subroutine and break at the line <em>after</em> the subroutine call. It will also step over conditionals (<code>if</code>) and loops (<code>for</code> and <code>while</code>).</li>
<li><code>step</code> or <code>s</code> will step <em>into</em> the subroutine, and break at the very start of the subroutine. This is useful for subroutines that we write, but less useful for a subroutine we are calling from a library.</li>
<li><code>continue</code> or <code>c</code> will resume execution until either the program ends or another breakpoint is encountered.</li>
</ul>
<p>If you want to enter the same command a second time, you can just press the Enter key. You will see this below, where it looks like nothing was entered at the gdb prompt – gdb simply repeated the previous command.</p>
<p>For this example, we are going to use the grade.c program from above, and break at the first line of the <code>main()</code> function (line 45). We want to step over the <code>mprotect()</code> calls, but step into the <code>vulnerable()</code> call. We are also going to print the address of the buffer once in <code>vulnerable()</code>.</p>
<pre><code>$ gdb grade
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
...
(gdb) break grade.c:45
Breakpoint 1 at 0x12cf: file grade.c, line 45.
(gdb) run
Starting program: /home/aaron/Dropbox/git/ics-staff/hws/buffer/grade 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, main (argc=1, argv=0x7fffffffe3e8) at grade.c:45
45    if ( (argc == 2) && (!strcmp(argv[1],"--print-buffer-address")) )
(gdb) n
53    mprotect((char *)((long)&on_stack & -0x1000), 1, PROT_READ | PROT_WRITE | PROT_EXEC);
(gdb) 
57    mprotect((char *)((long)&main & -0x1000), 1, PROT_READ | PROT_WRITE | PROT_EXEC);
(gdb) 
60    vulnerable();
(gdb) s
vulnerable () at grade.c:28
28  void vulnerable() {
(gdb) print &buffer
$1 = (char (*)[30]) 0x7ffff7f96e30 &lt;buffer&gt;
(gdb) n
30    if ( print_buffer_address ) {
(gdb) n
35      printf ("Please enter your name:\n");
(gdb) c
Continuing.
Please enter your name:
Albert Einstein

Albert Einstein, your grade on this assignment is a F
[Inferior 1 (process 1162585) exited with code 067]
(gdb) exit</code></pre>
<h4 id='assembly-in-a-debugger'>Assembly in a debugger</h4>
<p>The commands seen so far work for C programs, but we want to also be able to examine assembly execution. Here are a few new commands:</p>
<ul>
<li><code>stepi</code> steps through the assembly being executed – this will be necessary to trace your shellcode. This is like the <code>step</code> (or <code>s</code>) command from above, but it executes one machine instruction at a time rather than one line of C.</li>
<li><code>layout asm</code> will split the screen so that the assembly being executed is shown at the top (with an arrow at the current instruction pointer), and a place to enter commands below. This can be <em>very</em> useful for tracing your assembly, especially once it is on the stack.</li>
<li>To print the value in a register, use <code>p/x $rsi</code> – this prints it in hexadecimal.</li>
<li>When paused in a part of the executable code, you can see the assembly listing via <code>disassemble</code> (or just <code>disas</code>). However, when you are executing on the stack, gdb can’t figure out what to disassemble, so it won’t show anything via a <code>disas</code> command. Instead, you can give it a range, such as: <code>disassemble 0x7fffffffdcb0,0x7fffffffdd30</code>. As x86 instructions are variable length, if your start address is not on an instruction boundary, you will not get the output you intend.</li>
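<li>To print a number of values on the stack, try <code>x/40xw $rsp</code>. Note that this displays each 4-byte word as a little-endian integer, so the bytes within a word appear in the reverse of their order in memory, whereas <code>hexdump -C</code> (see below) displays the bytes one at a time in memory order.</li>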
</ul>
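<p>The difference between the <code>x/40xw</code> display and a byte-by-byte dump can be illustrated with a short sketch. It assumes an x86-64 (little-endian) target; <code>word_as_gdb_shows</code> is a hypothetical helper name, not a gdb or library function.</p>

```c
#include <stdint.h>

/* Combine 4 consecutive bytes of memory into the 32-bit value that a
   little-endian machine reads from them. This is the number gdb prints
   for one word of x/xw output on x86-64.
   word_as_gdb_shows is a hypothetical name used only for illustration. */
uint32_t word_as_gdb_shows(const unsigned char *bytes) {
    return (uint32_t)bytes[0]            /* lowest address: least significant byte */
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);   /* highest address: most significant byte */
}
```

<p>If memory holds the four characters <code>Albe</code> (bytes <code>41 6c 62 65</code> in a byte-by-byte dump), the helper returns <code>0x65626c41</code>, which is how that word would appear in the <code>x/xw</code> output.</p>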
<p>There is no example for this because the only relevant example is the homework solution.</p>
</div><div id='ttask-4-newline' class='tabcontent'><h3 id='task-4-newline'>Task 4: Newline</h3>
<p><strong><em>BACK UP YOUR CODE FIRST!!!</em></strong> You are about to modify your program, and if things go poorly, you will want to be able to revert to the code that successfully completes the previous tasks.</p>
<p>If you have made it this far, then your code, when run, looks something like:</p>
<pre><code>$ ./grade &lt; input.bin
Please enter your name:
Albert Einstein, your grade on this assignment is an A$</code></pre>
<p>You will notice that the prompt (the ‘$’) is on the same line as the desired grade because of the lack of a newline character in the string. You obviously cannot put a newline in the string itself, as that would indicate end-of-string, and the rest of your carefully constructed data in input.bin would never make it onto the stack – so your buffer overflow attack would never occur.</p>
<p>For this task, you are to modify your shellcode (in shellcode.s), and your attack_shellcode.c file, to put a newline at the end of the string. You will likely want to compute that value and store it in the string, since you can’t include a newline in the shellcode itself. You should test this out via your <code>shellcode_test</code> program first. Note that one of the <code>mprotect()</code> calls in the provided shellcode_test.c allows for modification of the .text section, so such a modification will work in <code>shellcode_test</code>.</p>
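<p>The constraint described above can be checked mechanically. The sketch below is an illustration only: it scans a byte array for a newline (<code>0x0a</code>) and shows that the value can be derived at run time from a byte that is not itself a newline. Both function names are hypothetical, and this is C rather than the assembly you will need in shellcode.s.</p>

```c
#include <stddef.h>

/* Return the index of the first newline byte (0x0a) in buf, or -1 if
   there is none. A payload that is read line by line should give -1.
   first_newline is a hypothetical helper name. */
long first_newline(const unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x0a) {
            return (long)i;
        }
    }
    return -1;
}

/* Derive 0x0a at run time from a stored byte that is not a newline.
   Only 0x0b appears as data; the subtraction produces the newline.
   computed_newline is a hypothetical helper name. */
unsigned char computed_newline(void) {
    unsigned char almost = 0x0b;
    return (unsigned char)(almost - 1);
}
```

<p>Running a check like <code>first_newline()</code> over the generated payload bytes is one way to confirm that no newline slipped in before the end of the input.</p>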
<p>This is just a modification of the attack_shellcode.c file, so your modifications for this task can overwrite parts from the last two tasks. For grading purposes, if you did this task, then you obviously did the previous two tasks.</p>
<h4 id='execution-2'>Execution</h4>
<p>After compiling with <code>make</code>, we will execute your code via <code>./attack_shellcode</code> as follows.</p>
<pre><code>$ setarch $(uname -m) -R -L ./grade --print-buffer-address
7fffffffd410
$ setarch $(uname -m) -R -L ./grade --print-buffer-address &gt; address.txt
$ ./attack_shellcode &gt; input.bin
$ setarch $(uname -m) -R -L ./grade &lt; input.bin
Albert Einstein, your grade on this assignment is an A
$</code></pre>
<p>Notice that the end prompt is on the next line <em>after</em> the output.</p>
</div><div id='ttask-5-write-up' class='tabcontent'><h3 id='task-5-write-up'>Task 5: Write-up</h3>
<p>This is a <em>brief</em> write-up, and should be in the <a href='buffer/buffer.py.html'>buffer.py</a> (<a href='buffer/buffer.py'>src</a>) file – just fill in the blanks as necessary.</p>
<p>Please include the following information:</p>
<ul>
<li>Your name and userid
<ul>
<li><strong>NOTE:</strong> The output of your shellcode must be EXACTLY: <code>&lt;name&gt;, your grade on this assignment is an A</code>, as the Gradescope submission will check this. The <code>&lt;name&gt;</code> part must match the <code>name</code> variable in this file. It’s fine if you are only using a first name for the shellcode – if so, please put your full name in a comment in the file above the <code>name</code> variable.</li>
</ul></li>
<li>How far did you get (i.e., which tasks did you complete)?</li>
<li>What thing(s) did you get stuck on for this assignment?</li>
<li>Compared to some of the harder assignments you’ve done, how hard was this assignment?</li>
<li>Other than just giving the answers, what could we do to make this assignment run more smoothly in the future?</li>
<li>Any additional information that you would like to include (how far you got on a given task, etc.)</li>
</ul>
<p>We are not looking for any significant length here, just candid answers.</p>
</div><div id='ttroubleshooting' class='tabcontent'><h3 id='troubleshooting'>Troubleshooting</h3>
<p>You are likely to run into problems, which will probably be segmentation faults. Here are some ideas about how to solve them.</p>
<p>Recall that gcc often leaves some extra space between the buffer and the return address. How much? You have to figure that out yourself. You can look at the assembly generated for grade.c (run <code>gcc -S -o grade.s grade.c</code>) to start.</p>
<p>One issue is to figure out whether you are setting the return address correctly – if not, that would be the cause of the segfault. First, we need to find the original return address from <code>vulnerable()</code> back to <code>main()</code>. Run <code>objdump -d grade</code>, and find the line <em>after</em> the call (in <code>main()</code>) to <code>vulnerable()</code> – the address of that line is the return address from <code>vulnerable()</code>.</p>
<p>Next, load the executable in gdb, and set a breakpoint right where you enter the <code>vulnerable()</code> function. You can list the stack with a command such as <code>x/100xw $rsp</code>. This will print out 100 hex values, each one being 4 bytes. A typical gdb display will list four 32-bit (4 byte) values per line, which is 16 bytes per line. Since you know the size of the buffer (200 bytes), the return address is likely to be about 200/16 lines down, or about a dozen lines down. Keep in mind that a return address is 64 bits, which will be <em>two</em> of the 32-bit values listed in this display. That return address should (more or less) match the value you gleaned from the objdump command. Once you have located the return address, figure out the command to print just that return address: <code>x/2xw 0x7fffffffdcc0</code> might work (if the return address were located at 0x7fffffffdcc0).</p>
<p>Next, set a breakpoint at the <em>end</em> of the <code>vulnerable()</code> function. Check that return address value again – it should look similar to (but not the exact same value as) the value in the rsp register (<code>print $rsp</code>). When split between two values, it will look like <code>0xffffde00 0x00007fff</code>. That value should be the exact address of the <code>name</code> buffer. If not, then your program is going to segfault.</p>
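<p>Reading a 64-bit return address out of two 32-bit words is easy to get backwards. As a small sketch, assuming the little-endian layout of x86-64, the first word shown is the low half and the second is the high half; <code>join_words</code> is a hypothetical helper name.</p>

```c
#include <stdint.h>

/* Rebuild a 64-bit value from two adjacent 32-bit words as displayed
   by x/2xw on a little-endian machine: the word at the lower address
   (shown first) holds the low 32 bits, and the next word holds the
   high 32 bits.
   join_words is a hypothetical name used only for illustration. */
uint64_t join_words(uint32_t first_shown, uint32_t second_shown) {
    return ((uint64_t)second_shown << 32) | (uint64_t)first_shown;
}
```

<p>With the example values from the text, <code>join_words(0xffffde00, 0x00007fff)</code> gives <code>0x00007fffffffde00</code>. Running <code>x/1xg</code> on the same address in gdb prints the 64-bit value directly.</p>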
<p>If you are sure the return address is set correctly, the next step is to trace the injected assembly. Set a breakpoint for the end of the <code>vulnerable()</code> function. If you set it <em>before</em> the last function call (the <code>strncpy()</code> call), hit <code>n</code> to step over that function. You will then start using <code>stepi</code> to single-step through the assembly instructions. Running <code>layout asm</code> will help the trace. You should step through your assembly code to make sure it works – presumably there will be the nop sled, then your injected code. Note that x86 instructions are of different lengths, so if your jump is off, it may land in the middle of an instruction, and you will get an illegal instruction error. You can disassemble a region of memory with a command such as <code>disassemble 0x7fffffffdcb0,0x7fffffffdd30</code>. Also note that when you are disassembling the contents of the stack, gdb will attempt to map the string (the text saying that you should receive, presumably, an A) to x86 instructions, many of which will be invalid. But if you do a disassembly from the start of your shellcode (which is the target of your initial jump), you should see the assembly that you wrote.</p>
</div><div id='tsubmission' class='tabcontent'><h3 id='submission'>Submission</h3>
<p>You are going to submit five files:</p>
<ul>
<li><code>Makefile</code>: this calls four compilation lines (three from task 2, and one from task 3; the line for task 3 will also apply to tasks 4 and 5).
<ul>
<li>Make sure that things are compiled to the right names! shellcode.s should compile to shellcode.o, all of task 2 should compile into the <code>shellcode_test</code> executable, and the code from task 3 should compile into the <code>attack_shellcode</code> executable.</li>
<li>You can also include the compilation line for grade.c or not – we will provide it during testing regardless.</li>
</ul></li>
<li><code>shellcode.s</code> and <code>shellcode_test.c</code> from task 2</li>
<li><code>attack_shellcode.c</code> from tasks 3 and 4 (if you completed task 4, just submit that version, as that shows you also completed task 3)</li>
<li><code>buffer.py</code> from task 5
<ul>
<li>Make sure the <code>name</code> variable matches the name printed out by the shellcode, as Gradescope will check for this</li>
</ul></li>
</ul>
<p>If you did not get to a section, you will still have to submit the required files, as the submission will check that all 5 files are present. You can just create an empty file (or a file with <code>hello world</code> in it) if you did not get to that part.</p>
</div><script>document.getElementById('defaultOpen').click();</script></body>
</html>