all repos — site @ 9c1e6a8499a9fd1f2baf4c51767aa8efa82e8416

source for my site, found at icyphox.sh

build/blog/python-for-re-1/index.html (view raw)

  1<!DOCTYPE html>
  2<html lang=en>
  3<link rel="stylesheet" href="/static/style.css" type="text/css">
  4<link rel="shortcut icon" type="image/x-icon" href="/static/favicon.ico">
  5<meta content="Memeing security since forever." name=description>
  6<meta name="viewport" content="initial-scale=1">
  7<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
  8<meta content="#021012" name="theme-color">
  9<meta name="HandheldFriendly" content="true">
 10<meta name="twitter:card" content="summary_large_image">
 11<meta name="twitter:site" content="@icyphox">
 12<meta name="twitter:title" content="Anirudh">
 13<meta name="twitter:description" content="Memeing security since forever.">
 14<meta name="twitter:image" content="/static/icyphox.png">
 15<meta property="og:title" content="Anirudh">
 16<meta property="og:type" content="website">
 17<meta property="og:description" content="Memeing security since forever.">
 18<meta property="og:url" content="https://icyphox.sh">
 19<meta property="og:image" content="/static/icyphox.png">
 20<html>
 21  <title>
 22    Anirudh
 23  </title>
 24<script src="//instant.page/1.1.0" type="module" integrity="sha384-EwBObn5QAxP8f09iemwAJljc+sU+eUXeL9vSBw1eNmVarwhKk2F9vBEpaN9rsrtp"></script>
 25<div class="container-text">
 26  <header class="header">
 27     <a href="../">‹ back</a>
 28  </header>
 29<body> 
 30   <div class="content">
 31    <div align="left">
 32      <h1>Python for Reverse Engineering 1: ELF Binaries</h1>
 33
 34<h2>Building your own disassembly tooling for — that’s right — fun and profit</h2>
 35
 36<p>While solving complex reversing challenges, we often use established tools like radare2 or IDA for disassembling and debugging. But there are times when you need to dig in a little deeper and understand how things work under the hood.</p>
 37
 38<p>Rolling your own disassembly scripts can be immensely helpful when it comes to automating certain processes, and eventually building your own homebrew reversing toolchain of sorts. At least, that’s what I’m attempting anyway.</p>
 39
 40<h3>Setup</h3>
 41
 42<p>As the title suggests, you’re going to need a Python 3 interpreter before
 43anything else. Once you’ve confirmed beyond reasonable doubt that you do,
 44in fact, have a Python 3 interpreter installed on your system, run</p>
 45
 46<div class="codehilite"><pre><span></span><code><span class="gp">$</span> pip install capstone pyelftools
 47</code></pre></div>
 48
 49<p>where <code>capstone</code> is the disassembly engine we’ll be scripting with and <code>pyelftools</code> is the library that will help us parse ELF files.</p>
 50
 51<p>With that out of the way, let’s start with an example of a basic reversing
 52challenge.</p>
 53
 54<div class="codehilite"><pre><span></span><code><span class="cm">/* chall.c */</span>
 55
 56<span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp"></span>
 57<span class="cp">#include</span> <span class="cpf">&lt;stdlib.h&gt;</span><span class="cp"></span>
 58<span class="cp">#include</span> <span class="cpf">&lt;string.h&gt;</span><span class="cp"></span>
 59
 60<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
 61   <span class="kt">char</span> <span class="o">*</span><span class="n">pw</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="mi">10</span><span class="p">);</span>
 62   <span class="n">pw</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="sc">&#39;a&#39;</span><span class="p">;</span>
 63   <span class="k">for</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;=</span> <span class="mi">8</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">){</span>
 64       <span class="n">pw</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">pw</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
 65   <span class="p">}</span>
 66   <span class="n">pw</span><span class="p">[</span><span class="mi">9</span><span class="p">]</span> <span class="o">=</span> <span class="sc">&#39;\0&#39;</span><span class="p">;</span>
 67   <span class="kt">char</span> <span class="o">*</span><span class="n">in</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="mi">10</span><span class="p">);</span>
 68   <span class="n">printf</span><span class="p">(</span><span class="s">&quot;password: &quot;</span><span class="p">);</span>
 69   <span class="n">fgets</span><span class="p">(</span><span class="n">in</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="n">stdin</span><span class="p">);</span>        <span class="c1">// &#39;abcdefghi&#39;</span>
 70   <span class="k">if</span><span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">in</span><span class="p">,</span> <span class="n">pw</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
 71       <span class="n">printf</span><span class="p">(</span><span class="s">&quot;haha yes!</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span>
 72   <span class="p">}</span>
 73   <span class="k">else</span> <span class="p">{</span>
 74       <span class="n">printf</span><span class="p">(</span><span class="s">&quot;nah dude</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span>
 75   <span class="p">}</span>
 76<span class="p">}</span>
 77</code></pre></div>
 78
 79<p>Compile it with GCC/Clang:</p>
 80
 81<div class="codehilite"><pre><span></span><code><span class="gp">$</span> gcc chall.c -o chall.elf
 82</code></pre></div>
 83
 84<h3>Scripting</h3>
 85
 86<p>For starters, let’s look at the different sections present in the binary.</p>
 87
 88<div class="codehilite"><pre><span></span><code><span class="c1"># sections.py</span>
 89
 90<span class="kn">from</span> <span class="nn">elftools.elf.elffile</span> <span class="kn">import</span> <span class="n">ELFFile</span>
 91
 92<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">&#39;./chall.elf&#39;</span><span class="p">,</span> <span class="s1">&#39;rb&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
 93    <span class="n">e</span> <span class="o">=</span> <span class="n">ELFFile</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
 94    <span class="k">for</span> <span class="n">section</span> <span class="ow">in</span> <span class="n">e</span><span class="o">.</span><span class="n">iter_sections</span><span class="p">():</span>
 95        <span class="k">print</span><span class="p">(</span><span class="nb">hex</span><span class="p">(</span><span class="n">section</span><span class="p">[</span><span class="s1">&#39;sh_addr&#39;</span><span class="p">]),</span> <span class="n">section</span><span class="o">.</span><span class="n">name</span><span class="p">)</span>
 96</code></pre></div>
 97
 98<p>This script iterates through all the sections and shows us the address each one is loaded at. This will be pretty useful later. Running it gives us</p>
 99
100<pre><code>› python sections.py
1010x238 .interp
1020x254 .note.ABI-tag
1030x274 .note.gnu.build-id
1040x298 .gnu.hash
1050x2c0 .dynsym
1060x3e0 .dynstr
1070x484 .gnu.version
1080x4a0 .gnu.version_r
1090x4c0 .rela.dyn
1100x598 .rela.plt
1110x610 .init
1120x630 .plt
1130x690 .plt.got
1140x6a0 .text
1150x8f4 .fini
1160x900 .rodata
1170x924 .eh_frame_hdr
1180x960 .eh_frame
1190x200d98 .init_array
1200x200da0 .fini_array
1210x200da8 .dynamic
1220x200f98 .got
1230x201000 .data
1240x201010 .bss
1250x0 .comment
1260x0 .symtab
1270x0 .strtab
1280x0 .shstrtab
129</code></pre>
130
131<p>Most of these aren’t relevant to us, but a few sections are worth noting. The <code>.text</code> section contains the instructions (opcodes) that we’re after. The <code>.rodata</code> section holds read-only data such as string literals and constants, while <code>.data</code> holds initialized, writable globals. Finally, there’s the <code>.plt</code>, the Procedure Linkage Table, and the <code>.got</code>, the Global Offset Table. If you’re unsure about what these mean, read up on the ELF format and its internals.</p>
132
133<p>Since we know that the <code>.text</code> section has the opcodes, let’s disassemble the binary starting at that address.</p>
134
135<pre><code># disas1.py
136
137from elftools.elf.elffile import ELFFile
138from capstone import Cs, CS_ARCH_X86, CS_MODE_64
139
140with open('./chall.elf', 'rb') as f:
141    elf = ELFFile(f)
142    code = elf.get_section_by_name('.text')
143    ops = code.data()          # raw bytes of the section
144    addr = code['sh_addr']     # address the section is loaded at
145    md = Cs(CS_ARCH_X86, CS_MODE_64)
146    for i in md.disasm(ops, addr):
147        print(f'0x{i.address:x}:\t{i.mnemonic}\t{i.op_str}')
148</code></pre>
149
150<p>The code is fairly straightforward (I think). Running it, we should see</p>
151
152<pre><code>› python disas1.py | less      
1530x6a0: xor ebp, ebp
1540x6a2: mov r9, rdx
1550x6a5: pop rsi
1560x6a6: mov rdx, rsp
1570x6a9: and rsp, 0xfffffffffffffff0
1580x6ad: push rax
1590x6ae: push rsp
1600x6af: lea r8, [rip + 0x23a]
1610x6b6: lea rcx, [rip + 0x1c3]
1620x6bd: lea rdi, [rip + 0xe6]
163<strong>0x6c4: call qword ptr [rip + 0x200916]</strong>
1640x6ca: hlt
165... snip ...
166</code></pre>
167
168<p>The line in bold is fairly interesting to us. During execution, <code>rip</code> holds the address of the next instruction, so <code>[rip + 0x200916]</code> is equivalent to <code>[0x6ca + 0x200916]</code>, which in turn evaluates to <code>[0x200fe0]</code>. So the first <code>call</code> goes through a function pointer stored at <code>0x200fe0</code>. What could this function be?</p>
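<p>The address arithmetic above can be sketched in a few lines of Python. The helper name <code>rip_relative_target</code> is made up for illustration; the instruction size of 6 bytes is read off the listing (the next instruction, <code>hlt</code>, sits at <code>0x6ca</code>).</p>

```python
def rip_relative_target(insn_addr: int, insn_size: int, disp: int) -> int:
    """Return the address referenced by a RIP-relative operand.

    While an instruction executes, RIP already points at the *next*
    instruction, so the base is insn_addr + insn_size, not insn_addr.
    """
    return insn_addr + insn_size + disp


# call qword ptr [rip + 0x200916] is at 0x6c4 and is 6 bytes long
slot = rip_relative_target(0x6c4, 6, 0x200916)
print(hex(slot))
```

<p>With capstone, the same two inputs would come from <code>i.address</code> and <code>i.size</code> on each disassembled instruction.</p>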
169
170<p>For this, we will have to look at <strong>relocations</strong>. Quoting <a href="http://refspecs.linuxbase.org/elf/gabi4+/ch4.reloc.html">linuxbase.org</a></p>
171
172<blockquote>
173  <p>Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a program calls a function, the associated call instruction must transfer control to the proper destination address at execution. Relocatable files must have “relocation entries’’ which are necessary because they contain information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process’s program image.</p>
174</blockquote>
175
176<p>To try and find these relocation entries, we write a third script.</p>
177
178<div class="codehilite"><pre><span></span><code><span class="c1"># relocations.py</span>
179
180<span class="kn">import</span> <span class="nn">sys</span>
181<span class="kn">from</span> <span class="nn">elftools.elf.elffile</span> <span class="kn">import</span> <span class="n">ELFFile</span>
182<span class="kn">from</span> <span class="nn">elftools.elf.relocation</span> <span class="kn">import</span> <span class="n">RelocationSection</span>
183
184<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">&#39;./chall.elf&#39;</span><span class="p">,</span> <span class="s1">&#39;rb&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
185    <span class="n">e</span> <span class="o">=</span> <span class="n">ELFFile</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
186    <span class="k">for</span> <span class="n">section</span> <span class="ow">in</span> <span class="n">e</span><span class="o">.</span><span class="n">iter_sections</span><span class="p">():</span>
187        <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">section</span><span class="p">,</span> <span class="n">RelocationSection</span><span class="p">):</span>
188            <span class="k">print</span><span class="p">(</span><span class="n">f</span><span class="s1">&#39;{section.name}:&#39;</span><span class="p">)</span>
189            <span class="n">symbol_table</span> <span class="o">=</span> <span class="n">e</span><span class="o">.</span><span class="n">get_section</span><span class="p">(</span><span class="n">section</span><span class="p">[</span><span class="s1">&#39;sh_link&#39;</span><span class="p">])</span>
190            <span class="k">for</span> <span class="n">relocation</span> <span class="ow">in</span> <span class="n">section</span><span class="o">.</span><span class="n">iter_relocations</span><span class="p">():</span>
191                <span class="n">symbol</span> <span class="o">=</span> <span class="n">symbol_table</span><span class="o">.</span><span class="n">get_symbol</span><span class="p">(</span><span class="n">relocation</span><span class="p">[</span><span class="s1">&#39;r_info_sym&#39;</span><span class="p">])</span>
192                <span class="n">addr</span> <span class="o">=</span> <span class="nb">hex</span><span class="p">(</span><span class="n">relocation</span><span class="p">[</span><span class="s1">&#39;r_offset&#39;</span><span class="p">])</span>
193                <span class="k">print</span><span class="p">(</span><span class="n">f</span><span class="s1">&#39;{symbol.name} {addr}&#39;</span><span class="p">)</span>
194</code></pre></div>
195
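196<p>Let’s run through this code real quick. We first loop through the sections and check whether each one is of the type <code>RelocationSection</code>. For every such section, we fetch the symbol table it links to (via <code>sh_link</code>), then iterate through its relocations, looking up each relocation’s symbol and printing the symbol name along with the address (<code>r_offset</code>) that the relocation patches. Finally, running this gives us</p>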
197
198<pre><code>› python relocations.py
199.rela.dyn:
200 0x200d98
201 0x200da0
202 0x201008
203_ITM_deregisterTMCloneTable 0x200fd8
204<strong>__libc_start_main 0x200fe0</strong>
205__gmon_start__ 0x200fe8
206_ITM_registerTMCloneTable 0x200ff0
207__cxa_finalize 0x200ff8
208stdin 0x201010
209.rela.plt:
210puts 0x200fb0
211printf 0x200fb8
212fgets 0x200fc0
213strcmp 0x200fc8
214malloc 0x200fd0
215</code></pre>
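<p>For a feel of what pyelftools is decoding here, this is a rough sketch of unpacking a single <code>Elf64_Rela</code> entry by hand with <code>struct</code>. The helper <code>parse_rela</code> and the symbol index 3 are made up for illustration; the field layout and the split of <code>r_info</code> into symbol index and type follow the ELF64 format.</p>

```python
import struct

# Elf64_Rela on little-endian x86-64:
#   r_offset (u64), r_info (u64), r_addend (i64)
RELA = struct.Struct('<QQq')

R_X86_64_GLOB_DAT = 6    # relocation type typically seen in .rela.dyn
R_X86_64_JUMP_SLOT = 7   # relocation type typically seen in .rela.plt


def parse_rela(raw: bytes) -> dict:
    """Decode one 24-byte Elf64_Rela entry (hypothetical helper)."""
    r_offset, r_info, r_addend = RELA.unpack(raw)
    return {
        'r_offset': r_offset,                # address the relocation patches
        'r_info_sym': r_info >> 32,          # index into the linked symbol table
        'r_info_type': r_info & 0xffffffff,  # relocation type
        'r_addend': r_addend,
    }


# A synthetic entry: the offset matches the post, the symbol index is invented
raw = RELA.pack(0x200fe0, (3 << 32) | R_X86_64_GLOB_DAT, 0)
entry = parse_rela(raw)
print(hex(entry['r_offset']), entry['r_info_sym'], entry['r_info_type'])
```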
216
217<p>Remember the call through <code>0x200fe0</code> from earlier? Yep, so that was a call to the well known <code>__libc_start_main</code>, whose address the dynamic linker writes into that slot. Again, according to <a href="http://refspecs.linuxbase.org/LSB_3.1.0/LSB-generic/LSB-generic/baselib---libc-start-main-.html">linuxbase.org</a></p>
218
219<blockquote>
220  <p>The <code>__libc_start_main()</code> function shall perform any necessary initialization of the execution environment, call the <em>main</em> function with appropriate arguments, and handle the return from <code>main()</code>. If the <code>main()</code> function returns, the return value shall be passed to the <code>exit()</code> function.</p>
221</blockquote>
222
223<p>And its definition is like so</p>
224
225<pre><code>int __libc_start_main(int *(main) (int, char * *, char * *), 
226int argc, char * * ubp_av, 
227void (*init) (void), 
228void (*fini) (void), 
229void (*rtld_fini) (void), 
230void (* stack_end));
231</code></pre>
232
233<p>Looking back at our disassembly</p>
234
235<pre><code>0x6a0: xor ebp, ebp
2360x6a2: mov r9, rdx
2370x6a5: pop rsi
2380x6a6: mov rdx, rsp
2390x6a9: and rsp, 0xfffffffffffffff0
2400x6ad: push rax
2410x6ae: push rsp
2420x6af: lea r8, [rip + 0x23a]
2430x6b6: lea rcx, [rip + 0x1c3]
244<strong>0x6bd: lea rdi, [rip + 0xe6]</strong>
2450x6c4: call qword ptr [rip + 0x200916]
2460x6ca: hlt
247... snip ...
248</code></pre>
249
250<p>but this time, at the <code>lea</code> or Load Effective Address instruction, which loads the address <code>rip + 0xe6</code> into the <code>rdi</code> register. With <code>rip</code> pointing at the next instruction (<code>0x6c4</code>), that evaluates to <code>0x7aa</code>, which happens to be the address of our <code>main()</code> function! How do I know that? Because <code>rdi</code> carries the first argument under the x86-64 System V calling convention, and the first parameter of <code>__libc_start_main()</code> is <code>main</code>, which it eventually calls after doing its initialization. It looks something like this</p>
251
252<p><img src="https://cdn-images-1.medium.com/max/800/0*oQA2MwHjhzosF8ZH.png" alt="" /></p>
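<p>The same reasoning applies to the other <code>lea</code> instructions in the startup code. Here is a small sketch that pairs the System V AMD64 argument registers with the parameters of <code>__libc_start_main()</code> and resolves the three addresses from the listing. The instruction sizes (7 bytes each) are read off the gaps between addresses in the disassembly; the seventh parameter, <code>stack_end</code>, goes on the stack and is left out.</p>

```python
# First six integer arguments under the System V AMD64 calling convention
ARG_REGS = ['rdi', 'rsi', 'rdx', 'rcx', 'r8', 'r9']
# Parameter names from the __libc_start_main prototype quoted above
PARAMS = ['main', 'argc', 'ubp_av', 'init', 'fini', 'rtld_fini']

# (instruction address, instruction size, destination register, displacement)
LEAS = [
    (0x6af, 7, 'r8', 0x23a),
    (0x6b6, 7, 'rcx', 0x1c3),
    (0x6bd, 7, 'rdi', 0xe6),
]

# RIP-relative: base is the address of the *next* instruction
regs = {reg: addr + size + disp for addr, size, reg, disp in LEAS}

# Keep only the parameters whose register was loaded by one of the lea's
args = {param: regs[reg] for reg, param in zip(ARG_REGS, PARAMS) if reg in regs}

for name, value in args.items():
    print(f'{name:5} -> {value:#x}')
```

<p>So <code>rcx</code> and <code>r8</code> hold the <code>init</code> and <code>fini</code> function pointers, both of which land inside <code>.text</code> as well.</p>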
253
254<p>To see the disassembly of <code>main</code>, seek to <code>0x7aa</code> in the output of the script we’d written earlier (<code>disas1.py</code>).</p>
255
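256<p>From what we discovered earlier, each <code>call</code> instruction can be tied back to a name through the relocation entries. The calls in <code>main</code> target small stubs in the <code>.plt</code> section, and each stub jumps through a slot filled in by one of the <code>.rela.plt</code> relocations. Matching the call targets to those relocations gives us this</p>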
257
258<pre><code>printf 0x650
259fgets  0x660
260strcmp 0x670
261malloc 0x680
262</code></pre>
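<p>One way to arrive at this table without following each stub by hand is sketched below. It assumes the classic x86-64 lazy-binding PLT layout: 16-byte entries, with entry 0 reserved for the resolver and the remaining entries in the same order as the <code>.rela.plt</code> relocations. That layout is an assumption and won’t hold for every toolchain or set of hardening flags, but it matches the addresses seen in this binary.</p>

```python
PLT_BASE = 0x630       # address of .plt from the sections.py output
PLT_ENTRY_SIZE = 16    # assumption: classic x86-64 PLT entry size

# Symbol names in the order relocations.py printed them for .rela.plt
rela_plt_order = ['puts', 'printf', 'fgets', 'strcmp', 'malloc']

# Entry 0 is the resolver stub, so the first function stub is one entry in
plt_stubs = {
    PLT_BASE + PLT_ENTRY_SIZE * (index + 1): name
    for index, name in enumerate(rela_plt_order)
}

for address, name in plt_stubs.items():
    print(f'{name:6} {address:#x}')
```

<p>This also accounts for <code>puts</code> at <code>0x640</code>, which shows up in the disassembly further down.</p>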
263
264<p>Putting all this together, things start falling into place. Let me highlight the key sections of the disassembly here. It’s pretty self-explanatory.</p>
265
266<pre><code>0x7b2: mov edi, 0xa  ; 10
2670x7b7: call 0x680    ; malloc
268</code></pre>
269
270<p>The loop to populate the <code>*pw</code> string</p>
271
272<pre><code>0x7d0:  mov     eax, dword ptr [rbp - 0x14]
2730x7d3:  cdqe    
2740x7d5:  lea     rdx, [rax - 1]
2750x7d9:  mov     rax, qword ptr [rbp - 0x10]
2760x7dd:  add     rax, rdx
2770x7e0:  movzx   eax, byte ptr [rax]
2780x7e3:  lea     ecx, [rax + 1]
2790x7e6:  mov     eax, dword ptr [rbp - 0x14]
2800x7e9:  movsxd  rdx, eax
2810x7ec:  mov     rax, qword ptr [rbp - 0x10]
2820x7f0:  add     rax, rdx
2830x7f3:  mov     edx, ecx
2840x7f5:  mov     byte ptr [rax], dl
2850x7f7:  add     dword ptr [rbp - 0x14], 1
2860x7fb:  cmp     dword ptr [rbp - 0x14], 8
2870x7ff:  jle     0x7d0
288</code></pre>
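<p>To convince yourself of what this loop computes, here is a rough Python rendition of it. The variable names are mine; <code>i</code> stands in for the counter at <code>[rbp - 0x14]</code> and <code>pw</code> for the buffer pointed to by <code>[rbp - 0x10]</code>. The first byte is set to <code>'a'</code> before the loop, as in <code>chall.c</code>.</p>

```python
pw = bytearray(10)       # zero-filled, like a 10-byte buffer ending in '\0'
pw[0] = ord('a')         # pw[0] = 'a'

i = 1                    # dword ptr [rbp - 0x14]
while i <= 8:            # cmp dword ptr [rbp - 0x14], 8 ; jle 0x7d0
    ecx = pw[i - 1] + 1  # movzx eax, byte ptr [rax] ; lea ecx, [rax + 1]
    pw[i] = ecx & 0xff   # mov byte ptr [rax], dl (only the low byte is stored)
    i += 1               # add dword ptr [rbp - 0x14], 1

password = pw[:9].decode('ascii')
print(password)
```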
289
290<p>And this looks like our <code>strcmp()</code></p>
291
292<pre><code>0x843:  mov     rdx, qword ptr [rbp - 0x10] ; *pw
2930x847:  mov     rax, qword ptr [rbp - 8]    ; *in
2940x84b:  mov     rsi, rdx             
2950x84e:  mov     rdi, rax
2960x851:  call    0x670                       ; strcmp  
2970x856:  test    eax, eax                    ; is = 0? 
2980x858:  jne     0x868                       ; no? jump to 0x868
2990x85a:  lea     rdi, [rip + 0xae]           ; "haha yes!" 
3000x861:  call    0x640                       ; puts
3010x866:  jmp     0x874
3020x868:  lea     rdi, [rip + 0xaa]           ; "nah dude"
3030x86f:  call    0x640                       ; puts  
304</code></pre>
305
306<p>Why <code>puts</code> and not <code>printf</code>? GCC replaces a <code>printf</code> call whose format string has no format specifiers and ends in a newline with an equivalent <code>puts</code> call, dropping the trailing newline from the string. I also confirmed with radare2 that those locations are actually the strings “haha yes!” and “nah dude”.</p>
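<p>A quick sanity check that doesn’t need radare2: resolve the two <code>lea</code> operands and see which section they fall in. The sketch below reuses the section addresses from the <code>sections.py</code> output. The helper <code>section_for</code> is made up, and since the output has no section sizes, it simply treats each section as running up to the start of the next one, which is only an approximation.</p>

```python
import bisect

# (load address, name) for the loaded sections, from the sections.py output
SECTIONS = [
    (0x238, '.interp'), (0x254, '.note.ABI-tag'), (0x274, '.note.gnu.build-id'),
    (0x298, '.gnu.hash'), (0x2c0, '.dynsym'), (0x3e0, '.dynstr'),
    (0x484, '.gnu.version'), (0x4a0, '.gnu.version_r'), (0x4c0, '.rela.dyn'),
    (0x598, '.rela.plt'), (0x610, '.init'), (0x630, '.plt'), (0x690, '.plt.got'),
    (0x6a0, '.text'), (0x8f4, '.fini'), (0x900, '.rodata'),
    (0x924, '.eh_frame_hdr'), (0x960, '.eh_frame'), (0x200d98, '.init_array'),
    (0x200da0, '.fini_array'), (0x200da8, '.dynamic'), (0x200f98, '.got'),
    (0x201000, '.data'), (0x201010, '.bss'),
]
STARTS = [start for start, _ in SECTIONS]


def section_for(addr):
    """Name of the last section starting at or before addr, else None."""
    index = bisect.bisect_right(STARTS, addr) - 1
    return SECTIONS[index][1] if index >= 0 else None


yes_str = 0x861 + 0xae   # lea rdi, [rip + 0xae]; next instruction at 0x861
nah_str = 0x86f + 0xaa   # lea rdi, [rip + 0xaa]; next instruction at 0x86f

for addr in (yes_str, nah_str, 0x200fe0, 0x7aa):
    print(f'{addr:#x} -> {section_for(addr)}')
```

<p>Both string addresses land in <code>.rodata</code>, and they are 10 bytes apart, which is exactly the length of “haha yes!” plus its terminating null byte.</p>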
307
308<h3>Conclusion</h3>
309
310<p>Wew, that took quite some time. But we’re done. If you’re a beginner, you might find this extremely confusing, or probably didn’t even understand what was going on. And that’s okay. Building an intuition for reading and grokking disassembly comes with practice. I’m no good at it either.</p>
311
312<p>All the code used in this post is here: <a href="https://github.com/icyphox/asdf/tree/master/reversing-elf">https://github.com/icyphox/asdf/tree/master/reversing-elf</a></p>
313
314<p>Ciao for now, and I’ll see ya in #2 of this series — PE binaries. Whenever that is.</p>
315 
316    </div>
317   </body>
318   </div>
319</html>