<!DOCTYPE html>
<html lang=en>
<link rel="stylesheet" href="/static/style.css" type="text/css">
<link rel="stylesheet" href="/static/syntax.css" type="text/css">
<link rel="shortcut icon" type="images/x-icon" href="/static/favicon.ico">
<meta name="description" content="A blog where security is shilled, aggressively.">
<meta name="viewport" content="initial-scale=1">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta content="#021012" name="theme-color">
<meta name="HandheldFriendly" content="true">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:site" content="@icyphox">
<meta name="twitter:title" content="Blog">
<meta name="twitter:description" content="A blog where security is shilled, aggressively.">
<meta name="twitter:image" content="/static/icyphox.png">
<meta property="og:title" content="Blog">
<meta property="og:type" content="website">
<meta property="og:description" content="A blog where security is shilled, aggressively.">
<meta property="og:url" content="https://icyphox.sh">
<meta property="og:image" content="/static/icyphox.png">
<html>
 <title>
 Python for Reverse Engineering #1: ELF Binaries
 </title>
<script src="//instant.page/1.1.0" type="module" integrity="sha384-EwBObn5QAxP8f09iemwAJljc+sU+eUXeL9vSBw1eNmVarwhKk2F9vBEpaN9rsrtp"></script>
<div class="container-text">
 <header class="header">
 <a href="../">‹ back</a>
 </header>
<body>
 <div class="content">
 <div align="left">
 <h1>Python for Reverse Engineering #1: ELF Binaries</h1>

<h2>Building your own disassembly tooling for — that’s right — fun and profit</h2>

<p>While solving complex reversing challenges, we often use established tools like radare2 or IDA for disassembling and debugging. But there are times when you need to dig a little deeper and understand how things work under the hood.</p>

<p>Rolling your own disassembly scripts can be immensely helpful for automating certain processes, and over time those scripts can grow into a homebrew reversing toolchain of sorts. At least, that’s what I’m attempting anyway.</p>

<h3>Setup</h3>

<p>As the title suggests, you’re going to need a Python 3 interpreter before
anything else. Once you’ve confirmed beyond reasonable doubt that you do,
in fact, have a Python 3 interpreter installed on your system, run</p>

<div class="codehilite"><pre><span></span><code><span class="gp">$</span> pip install capstone pyelftools
</code></pre></div>

<p>where <code>capstone</code> is the disassembly engine we’ll be scripting with and <code>pyelftools</code> is the library we’ll use to parse ELF files.</p>

<p>With that out of the way, let’s start with an example of a basic reversing
challenge.</p>

<div class="codehilite"><pre><span></span><code><span class="cm">/* chall.c */</span>

<span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf">&lt;stdlib.h&gt;</span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf">&lt;string.h&gt;</span><span class="cp"></span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">pw</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="mi">10</span><span class="p">);</span> <span class="c1">// 9 characters plus the terminating NUL</span>
    <span class="n">pw</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'a'</span><span class="p">;</span>
    <span class="k">for</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;=</span> <span class="mi">8</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">){</span>
        <span class="n">pw</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">pw</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="n">pw</span><span class="p">[</span><span class="mi">9</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'\0'</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">in</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="mi">10</span><span class="p">);</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"password: "</span><span class="p">);</span>
    <span class="n">fgets</span><span class="p">(</span><span class="n">in</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="n">stdin</span><span class="p">);</span> <span class="c1">// 'abcdefghi'</span>
    <span class="k">if</span><span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">in</span><span class="p">,</span> <span class="n">pw</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"haha yes!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="k">else</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"nah dude</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
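
<p>Before touching the binary, it helps to know what a successful run should look like. The comment on the <code>fgets</code> line says the password is <code>abcdefghi</code>, and <code>fgets(in, 10, stdin)</code> reads at most 9 characters, keeping the newline only if it fits. The sketch below models that in plain Python; <code>fgets_model</code> is a hypothetical helper, and it simplifies the real <code>fgets</code> by working on one already-typed line instead of a stream.</p>

```python
PASSWORD = "abcdefghi"  # what the loop in chall.c builds (per the comment on fgets)


def fgets_model(line: str, size: int) -> str:
    """Simplified fgets(buf, size, stdin) applied to a single typed line.

    Reads at most size - 1 characters and stops after a newline,
    keeping that newline if it fits.
    """
    newline = line.find("\n")
    if newline != -1:
        line = line[: newline + 1]
    return line[: size - 1]


for typed in ("abcdefghi\n", "abc\n", "abcdefghijk\n"):
    got = fgets_model(typed, 10)
    verdict = "haha yes!" if got == PASSWORD else "nah dude"
    print(repr(typed), "->", repr(got), verdict)
```

<p>Per this model, typing exactly <code>abcdefghi</code> works because the newline never makes it into the buffer, a wrong guess such as <code>abc</code> is compared with its newline still attached, and anything that merely starts with <code>abcdefghi</code> also passes, since <code>fgets</code> stops reading after the ninth character.</p>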

<p>Compile it with GCC/Clang:</p>

<div class="codehilite"><pre><span></span><code><span class="gp">$</span> gcc chall.c -o chall.elf
</code></pre></div>

<h3>Scripting</h3>

<p>For starters, let’s look at the different sections present in the binary.</p>

<div class="codehilite"><pre><span></span><code><span class="c1"># sections.py</span>

<span class="kn">from</span> <span class="nn">elftools.elf.elffile</span> <span class="kn">import</span> <span class="n">ELFFile</span>

<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'./chall.elf'</span><span class="p">,</span> <span class="s1">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">e</span> <span class="o">=</span> <span class="n">ELFFile</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">section</span> <span class="ow">in</span> <span class="n">e</span><span class="o">.</span><span class="n">iter_sections</span><span class="p">():</span>
        <span class="c1"># sh_addr is the section's virtual address (0 if it isn't loaded)</span>
        <span class="k">print</span><span class="p">(</span><span class="nb">hex</span><span class="p">(</span><span class="n">section</span><span class="p">[</span><span class="s1">'sh_addr'</span><span class="p">]),</span> <span class="n">section</span><span class="o">.</span><span class="n">name</span><span class="p">)</span>
</code></pre></div>

<p>This script iterates through all the sections and prints each one’s name along with the address it gets loaded at. Sections that aren’t mapped into memory at runtime, like <code>.symtab</code>, show up as <code>0x0</code>. These addresses will come in handy later. Running it gives us</p>

<div class="codehilite"><pre><span></span><code><span class="go">› python sections.py</span>
<span class="go">0x238 .interp</span>
<span class="go">0x254 .note.ABI-tag</span>
<span class="go">0x274 .note.gnu.build-id</span>
<span class="go">0x298 .gnu.hash</span>
<span class="go">0x2c0 .dynsym</span>
<span class="go">0x3e0 .dynstr</span>
<span class="go">0x484 .gnu.version</span>
<span class="go">0x4a0 .gnu.version_r</span>
<span class="go">0x4c0 .rela.dyn</span>
<span class="go">0x598 .rela.plt</span>
<span class="go">0x610 .init</span>
<span class="go">0x630 .plt</span>
<span class="go">0x690 .plt.got</span>
<span class="go">0x6a0 .text</span>
<span class="go">0x8f4 .fini</span>
<span class="go">0x900 .rodata</span>
<span class="go">0x924 .eh_frame_hdr</span>
<span class="go">0x960 .eh_frame</span>
<span class="go">0x200d98 .init_array</span>
<span class="go">0x200da0 .fini_array</span>
<span class="go">0x200da8 .dynamic</span>
<span class="go">0x200f98 .got</span>
<span class="go">0x201000 .data</span>
<span class="go">0x201010 .bss</span>
<span class="go">0x0 .comment</span>
<span class="go">0x0 .symtab</span>
<span class="go">0x0 .strtab</span>
<span class="go">0x0 .shstrtab</span>
</code></pre></div>

<p>Most of these aren’t relevant to us, but a few are worth noting. The <code>.text</code> section contains the instructions (opcodes) that we’re after. The <code>.rodata</code> section holds read-only data such as string literals, while <code>.data</code> holds initialized, writable global variables. Finally, there’s <code>.plt</code>, the Procedure Linkage Table, and <code>.got</code>, the Global Offset Table, which together let the binary call functions from shared libraries such as libc. If you’re unsure about what these mean, read up on the ELF format and its internals.</p>
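
<p>Since we’ll keep asking “which section is this address in?”, here’s a small stdlib-only helper for that. It hard-codes the start addresses printed by <code>sections.py</code> above (only the loaded sections, i.e. those with a non-zero <code>sh_addr</code>) and uses <code>bisect</code> to find the last section starting at or before an address. It’s an approximation: section sizes are ignored, so an address in a gap between sections gets attributed to the section before the gap. On a real binary you would build the table from <code>iter_sections()</code> and also check <code>sh_size</code>. The addresses looked up at the end are ones that come up later in this post.</p>

```python
import bisect
from typing import Optional

# (start address, name) for every loaded section, copied from the sections.py
# output above; sections with sh_addr == 0 are left out since they aren't mapped.
SECTIONS = [
    (0x238, ".interp"), (0x254, ".note.ABI-tag"), (0x274, ".note.gnu.build-id"),
    (0x298, ".gnu.hash"), (0x2C0, ".dynsym"), (0x3E0, ".dynstr"),
    (0x484, ".gnu.version"), (0x4A0, ".gnu.version_r"), (0x4C0, ".rela.dyn"),
    (0x598, ".rela.plt"), (0x610, ".init"), (0x630, ".plt"),
    (0x690, ".plt.got"), (0x6A0, ".text"), (0x8F4, ".fini"),
    (0x900, ".rodata"), (0x924, ".eh_frame_hdr"), (0x960, ".eh_frame"),
    (0x200D98, ".init_array"), (0x200DA0, ".fini_array"), (0x200DA8, ".dynamic"),
    (0x200F98, ".got"), (0x201000, ".data"), (0x201010, ".bss"),
]
STARTS = [start for start, _ in SECTIONS]  # already sorted ascending


def section_for(addr: int) -> Optional[str]:
    """Name of the last section that starts at or before addr (sizes are ignored)."""
    idx = bisect.bisect_right(STARTS, addr) - 1
    return SECTIONS[idx][1] if idx >= 0 else None


for addr in (0x6A0, 0x7AA, 0x90F, 0x200FE0):
    print(hex(addr), section_for(addr))
```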

<p>Since we know that the <code>.text</code> section has the opcodes, let’s disassemble its contents, starting at the section’s address.</p>

<div class="codehilite"><pre><span></span><code><span class="c1"># disas1.py</span>

<span class="kn">from</span> <span class="nn">elftools.elf.elffile</span> <span class="kn">import</span> <span class="n">ELFFile</span>
<span class="kn">from</span> <span class="nn">capstone</span> <span class="kn">import</span> <span class="n">Cs</span><span class="p">,</span> <span class="n">CS_ARCH_X86</span><span class="p">,</span> <span class="n">CS_MODE_64</span>

<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'./chall.elf'</span><span class="p">,</span> <span class="s1">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">elf</span> <span class="o">=</span> <span class="n">ELFFile</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
    <span class="n">code</span> <span class="o">=</span> <span class="n">elf</span><span class="o">.</span><span class="n">get_section_by_name</span><span class="p">(</span><span class="s1">'.text'</span><span class="p">)</span>
    <span class="n">ops</span> <span class="o">=</span> <span class="n">code</span><span class="o">.</span><span class="n">data</span><span class="p">()</span>         <span class="c1"># raw bytes of .text</span>
    <span class="n">addr</span> <span class="o">=</span> <span class="n">code</span><span class="p">[</span><span class="s1">'sh_addr'</span><span class="p">]</span>  <span class="c1"># address .text is loaded at</span>
    <span class="n">md</span> <span class="o">=</span> <span class="n">Cs</span><span class="p">(</span><span class="n">CS_ARCH_X86</span><span class="p">,</span> <span class="n">CS_MODE_64</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">md</span><span class="o">.</span><span class="n">disasm</span><span class="p">(</span><span class="n">ops</span><span class="p">,</span> <span class="n">addr</span><span class="p">):</span>
        <span class="k">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'0x{i.address:x}:</span><span class="se">\t</span><span class="s1">{i.mnemonic}</span><span class="se">\t</span><span class="s1">{i.op_str}'</span><span class="p">)</span>
</code></pre></div>

<p>The code is fairly straightforward: grab the raw bytes of <code>.text</code>, then have Capstone disassemble them as 64-bit x86, passing the section’s address as the starting address so that the printed addresses match the ones in the binary. Running it, we should see something like this</p>

<div class="codehilite"><pre><span></span><code><span class="go">› python disas1.py | less</span>
<span class="go">0x6a0: xor ebp, ebp</span>
<span class="go">0x6a2: mov r9, rdx</span>
<span class="go">0x6a5: pop rsi</span>
<span class="go">0x6a6: mov rdx, rsp</span>
<span class="go">0x6a9: and rsp, 0xfffffffffffffff0</span>
<span class="go">0x6ad: push rax</span>
<span class="go">0x6ae: push rsp</span>
<span class="go">0x6af: lea r8, [rip + 0x23a]</span>
<span class="go">0x6b6: lea rcx, [rip + 0x1c3]</span>
<span class="go">0x6bd: lea rdi, [rip + 0xe6]</span>
<span class="go"><strong>0x6c4: call qword ptr [rip + 0x200916]</strong></span>
<span class="go">0x6ca: hlt</span>
<span class="go">... snip ...</span>
</code></pre></div>

<p>The line in bold is the interesting one. With RIP-relative addressing, <code>rip</code> holds the address of the <em>next</em> instruction, so <code>[rip + 0x200916]</code> is equivalent to <code>[0x6ca + 0x200916]</code>, which in turn evaluates to <code>0x200fe0</code>. So the very first <code>call</code> reads a function pointer stored at <code>0x200fe0</code> and calls whatever it points to. What could this function be?</p>
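
<p>That arithmetic is worth turning into a helper, since it comes up constantly: a RIP-relative operand is the address of the <em>next</em> instruction plus a signed 32-bit displacement. The sketch below also pulls the displacement out of the raw instruction bytes. Those bytes are an assumption, reconstructed from the disassembly using the usual encoding of this instruction (opcode <code>FF</code>, ModRM byte <code>15</code>, then a little-endian disp32, six bytes in all, which matches the gap between <code>0x6c4</code> and <code>0x6ca</code>).</p>

```python
def rip_relative(insn_addr: int, insn_len: int, disp: int) -> int:
    """Effective address of [rip + disp]; rip already points past the instruction."""
    return insn_addr + insn_len + disp


# Assumed bytes of `call qword ptr [rip + 0x200916]` at 0x6c4: FF 15, then disp32.
insn = bytes.fromhex("ff1516092000")
disp = int.from_bytes(insn[2:6], "little", signed=True)

print(hex(disp))                                  # 0x200916
print(hex(rip_relative(0x6C4, len(insn), disp)))  # 0x200fe0
```

<p>Capstone reports each instruction’s length as <code>i.size</code>, so inside the loop in <code>disas1.py</code> the next-instruction address is simply <code>i.address + i.size</code>.</p>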

<p>For this, we will have to look at <strong>relocations</strong>. Quoting <a href="http://refspecs.linuxbase.org/elf/gabi4+/ch4.reloc.html">linuxbase.org</a></p>

<blockquote>
 <p>Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a program calls a function, the associated call instruction must transfer control to the proper destination address at execution. Relocatable files must have “relocation entries” which are necessary because they contain information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process’s program image.</p>
</blockquote>

<p>To try and find these relocation entries, we write a third script.</p>

<div class="codehilite"><pre><span></span><code><span class="c1"># relocations.py</span>

<span class="kn">from</span> <span class="nn">elftools.elf.elffile</span> <span class="kn">import</span> <span class="n">ELFFile</span>
<span class="kn">from</span> <span class="nn">elftools.elf.relocation</span> <span class="kn">import</span> <span class="n">RelocationSection</span>

<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'./chall.elf'</span><span class="p">,</span> <span class="s1">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">e</span> <span class="o">=</span> <span class="n">ELFFile</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">section</span> <span class="ow">in</span> <span class="n">e</span><span class="o">.</span><span class="n">iter_sections</span><span class="p">():</span>
        <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">section</span><span class="p">,</span> <span class="n">RelocationSection</span><span class="p">):</span>
            <span class="k">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'{section.name}:'</span><span class="p">)</span>
            <span class="c1"># sh_link is the index of the symbol table these relocations refer to</span>
            <span class="n">symbol_table</span> <span class="o">=</span> <span class="n">e</span><span class="o">.</span><span class="n">get_section</span><span class="p">(</span><span class="n">section</span><span class="p">[</span><span class="s1">'sh_link'</span><span class="p">])</span>
            <span class="k">for</span> <span class="n">relocation</span> <span class="ow">in</span> <span class="n">section</span><span class="o">.</span><span class="n">iter_relocations</span><span class="p">():</span>
                <span class="n">symbol</span> <span class="o">=</span> <span class="n">symbol_table</span><span class="o">.</span><span class="n">get_symbol</span><span class="p">(</span><span class="n">relocation</span><span class="p">[</span><span class="s1">'r_info_sym'</span><span class="p">])</span>
                <span class="n">addr</span> <span class="o">=</span> <span class="nb">hex</span><span class="p">(</span><span class="n">relocation</span><span class="p">[</span><span class="s1">'r_offset'</span><span class="p">])</span>  <span class="c1"># address being patched</span>
                <span class="k">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'{symbol.name} {addr}'</span><span class="p">)</span>
</code></pre></div>
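
<p>Let’s run through this code real quick. We loop through the sections and pick out the ones that are a <code>RelocationSection</code>. Each of those has an <code>sh_link</code> field pointing at the symbol table its entries refer to, so for every relocation we look up its symbol by index (<code>r_info_sym</code>) and print the symbol’s name next to <code>r_offset</code>, the address that the relocation patches. Running this gives us</p>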
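
<p>pyelftools does the decoding for us, but it’s worth seeing what <code>r_info_sym</code> actually is. On 64-bit ELF, each <code>.rela.*</code> entry is an <code>Elf64_Rela</code>: three 8-byte fields (<code>r_offset</code>, <code>r_info</code>, <code>r_addend</code>), where the upper 32 bits of <code>r_info</code> hold the symbol index and the lower 32 bits hold the relocation type. The sketch below unpacks one little-endian entry by hand; the entry itself is made up for illustration. Symbol index 0 is the reserved null symbol, whose name is empty, which is most likely why a few <code>.rela.dyn</code> entries in the output print with no name (relative relocations typically don’t reference a symbol).</p>

```python
import struct

# x86-64 relocation type numbers from the System V psABI (a few common ones).
R_X86_64_GLOB_DAT = 6
R_X86_64_JUMP_SLOT = 7
R_X86_64_RELATIVE = 8


def parse_rela64(entry: bytes) -> dict:
    """Decode one little-endian Elf64_Rela entry (24 bytes)."""
    r_offset, r_info, r_addend = struct.unpack("<QQq", entry)
    return {
        "r_offset": r_offset,          # address the relocation patches
        "sym": r_info >> 32,           # pyelftools calls this r_info_sym
        "type": r_info & 0xFFFFFFFF,   # pyelftools calls this r_info_type
        "r_addend": r_addend,
    }


# Hypothetical entry: type JUMP_SLOT, invented symbol index 5, patching 0x200fd0.
raw = struct.pack("<QQq", 0x200FD0, (5 << 32) | R_X86_64_JUMP_SLOT, 0)
print(parse_rela64(raw))
```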

<div class="codehilite"><pre><span></span><code><span class="go">› python relocations.py</span>
<span class="go">.rela.dyn:</span>
<span class="go"> 0x200d98</span>
<span class="go"> 0x200da0</span>
<span class="go"> 0x201008</span>
<span class="go">_ITM_deregisterTMCloneTable 0x200fd8</span>
<span class="go"><strong>__libc_start_main 0x200fe0</strong></span>
<span class="go">__gmon_start__ 0x200fe8</span>
<span class="go">_ITM_registerTMCloneTable 0x200ff0</span>
<span class="go">__cxa_finalize 0x200ff8</span>
<span class="go">stdin 0x201010</span>
<span class="go">.rela.plt:</span>
<span class="go">puts 0x200fb0</span>
<span class="go">printf 0x200fb8</span>
<span class="go">fgets 0x200fc0</span>
<span class="go">strcmp 0x200fc8</span>
<span class="go">malloc 0x200fd0</span>
</code></pre></div>

<p>Remember the call through <code>0x200fe0</code> from earlier? That address is the GOT slot the dynamic linker fills in for <code>__libc_start_main</code>, so that was a call to the well-known <code>__libc_start_main</code>. Again, according to <a href="http://refspecs.linuxbase.org/LSB_3.1.0/LSB-generic/LSB-generic/baselib---libc-start-main-.html">linuxbase.org</a></p>

<blockquote>
 <p>The <code>__libc_start_main()</code> function shall perform any necessary initialization of the execution environment, call the <em>main</em> function with appropriate arguments, and handle the return from <code>main()</code>. If the <code>main()</code> function returns, the return value shall be passed to the <code>exit()</code> function.</p>
</blockquote>

<p>And its prototype looks like this</p>

<div class="codehilite"><pre><span></span><code><span class="kt">int</span> <span class="nf">__libc_start_main</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">(</span><span class="n">main</span><span class="p">)</span> <span class="p">(</span><span class="kt">int</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span> <span class="o">*</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span> <span class="o">*</span><span class="p">),</span>
    <span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span> <span class="o">*</span> <span class="n">ubp_av</span><span class="p">,</span>
    <span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="n">init</span><span class="p">)</span> <span class="p">(</span><span class="kt">void</span><span class="p">),</span>
    <span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="n">fini</span><span class="p">)</span> <span class="p">(</span><span class="kt">void</span><span class="p">),</span>
    <span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="n">rtld_fini</span><span class="p">)</span> <span class="p">(</span><span class="kt">void</span><span class="p">),</span>
    <span class="kt">void</span> <span class="p">(</span><span class="o">*</span> <span class="n">stack_end</span><span class="p">));</span>
</code></pre></div>

<p>Looking back at our disassembly</p>

<pre><code>0x6a0: xor ebp, ebp
0x6a2: mov r9, rdx
0x6a5: pop rsi
0x6a6: mov rdx, rsp
0x6a9: and rsp, 0xfffffffffffffff0
0x6ad: push rax
0x6ae: push rsp
0x6af: lea r8, [rip + 0x23a]
0x6b6: lea rcx, [rip + 0x1c3]
<strong>0x6bd: lea rdi, [rip + 0xe6]</strong>
0x6c4: call qword ptr [rip + 0x200916]
0x6ca: hlt
... snip ...
</code></pre>

<p>but this time, focus on the <code>lea</code> (Load Effective Address) instruction, which computes the address <code>[rip + 0xe6]</code> and loads it into the <code>rdi</code> register. Since <code>rip</code> points at the next instruction, <code>[rip + 0xe6]</code> evaluates to <code>0x6c4 + 0xe6 = 0x7aa</code>, which happens to be the address of our <code>main()</code> function! How do I know that? On x86-64 Linux, the first argument to a function is passed in <code>rdi</code>, and the first argument of <code>__libc_start_main()</code> is a pointer to <code>main()</code>, which it calls once it’s done setting things up. It looks something like this</p>

<p><img src="https://cdn-images-1.medium.com/max/800/0*oQA2MwHjhzosF8ZH.png" alt="" /></p>
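
<p>The same arithmetic explains the other <code>lea</code> instructions. Under the System V x86-64 calling convention, the first six integer or pointer arguments go in <code>rdi</code>, <code>rsi</code>, <code>rdx</code>, <code>rcx</code>, <code>r8</code> and <code>r9</code>, in that order, so lining those registers up with the <code>__libc_start_main</code> prototype tells us what each <code>lea</code> is loading. The sketch below recomputes the three targets and labels them with the parameter names from the prototype; the table of instructions is transcribed by hand from the listing above.</p>

```python
# System V AMD64 ABI: registers carrying the first six integer/pointer arguments.
ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
# The first six parameters of __libc_start_main, in the order of the prototype above.
PARAMS = ["main", "argc", "ubp_av", "init", "fini", "rtld_fini"]
REG_TO_PARAM = dict(zip(ARG_REGS, PARAMS))

# (address, length in bytes, destination register, displacement) for each
# `lea reg, [rip + disp]` in the _start listing; lengths follow from the addresses.
LEAS = [
    (0x6AF, 7, "r8", 0x23A),
    (0x6B6, 7, "rcx", 0x1C3),
    (0x6BD, 7, "rdi", 0xE6),
]

resolved = {}
for addr, length, reg, disp in LEAS:
    target = addr + length + disp  # rip points at the next instruction
    resolved[REG_TO_PARAM[reg]] = target
    print(f"{reg:3} = {target:#x} -> {REG_TO_PARAM[reg]}")
```

<p>This puts <code>main</code> at <code>0x7aa</code>, matching the value above, with <code>init</code> at <code>0x880</code> and <code>fini</code> at <code>0x8f0</code>. The remaining registers come from <code>_start</code> itself: <code>pop rsi</code> takes <code>argc</code> off the stack, <code>mov rdx, rsp</code> leaves the argument vector in <code>rdx</code>, and the earlier <code>mov r9, rdx</code> passes along the function pointer that the ABI says <code>rdx</code> holds at process entry, which becomes <code>rtld_fini</code>.</p>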

<p>To see the disassembly of <code>main</code>, seek to <code>0x7aa</code> in the output of the script we’d written earlier (<code>disas1.py</code>).</p>
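
<p>Rather than scrolling, you can also carve out just <code>main</code>’s bytes and hand only those to Capstone. The key step is converting a virtual address into an index into the section’s data by subtracting the section’s <code>sh_addr</code>. The helper below is a hypothetical stdlib-only sketch of that step; the function’s size has to come from elsewhere, for instance the <code>st_size</code> of the <code>main</code> symbol in <code>.symtab</code>, which this binary still has since it wasn’t stripped.</p>

```python
def function_bytes(section_data: bytes, section_addr: int,
                   func_addr: int, func_size: int) -> bytes:
    """Slice one function's machine code out of the section that contains it."""
    start = func_addr - section_addr  # virtual address -> index into the section
    end = start + func_size
    if start < 0 or end > len(section_data):
        raise ValueError("function does not lie entirely inside this section")
    return section_data[start:end]


# Toy stand-in for .text: 32 bytes starting at 0x6a0 (not the real contents).
fake_text = bytes(range(32))
print(function_bytes(fake_text, 0x6A0, 0x6A4, 3).hex())  # 040506
```

<p>In <code>disas1.py</code> terms, you would pass <code>ops</code> and <code>addr</code> along with <code>0x7aa</code> and the function’s size, then call <code>md.disasm(result, 0x7aa)</code> so that the printed addresses still line up with the binary.</p>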
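
<p>From what we discovered earlier, the relocation entries tell us which library function sits behind each GOT slot. Inside <code>main</code>, though, the calls don’t read the GOT directly: each <code>call</code> targets a small stub in <code>.plt</code> (which starts at <code>0x630</code>), and that stub jumps through the GOT slot named by the matching <code>.rela.plt</code> entry. Following each <code>call</code> to its stub gives us this</p>

<pre><code>puts 0x640
printf 0x650
fgets 0x660
strcmp 0x670
malloc 0x680
</code></pre>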
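
<p>Those stub addresses aren’t arbitrary. In the classic lazy-binding x86-64 PLT layout, <code>.plt</code> begins with a 16-byte header stub, followed by one 16-byte stub per <code>.rela.plt</code> entry, in the same order. That fits this binary: <code>.plt</code> runs from <code>0x630</code> up to <code>.plt.got</code> at <code>0x690</code>, exactly six 16-byte entries (the header plus the five functions in <code>.rela.plt</code>). The sketch below recomputes the stubs from those two facts. Treat the 16-byte layout as an assumption: binaries built with other options, for example with a separate <code>.plt.sec</code>, are laid out differently.</p>

```python
PLT_BASE = 0x630     # .plt start address, from sections.py
PLT_ENTRY_SIZE = 16  # assumed classic x86-64 PLT entry size

# .rela.plt entries in the order relocations.py printed them.
RELA_PLT = ["puts", "printf", "fgets", "strcmp", "malloc"]

# Entry 0 is the PLT header, so the i-th function's stub is entry i + 1.
stubs = {name: PLT_BASE + PLT_ENTRY_SIZE * (i + 1) for i, name in enumerate(RELA_PLT)}
for name, addr in stubs.items():
    print(f"{name:7} {addr:#x}")
```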

<p>Putting all this together, things start falling into place. Let me highlight the key sections of the disassembly here; they’re pretty self-explanatory. First, <code>pw</code> gets allocated, with the size passed in <code>edi</code>, the first argument register:</p>

<pre><code>0x7b2: mov edi, 0xa ; 10
0x7b7: call 0x680 ; malloc
</code></pre>

<p>The loop that populates the <code>pw</code> string, with <code>i</code> kept at <code>[rbp - 0x14]</code> and the <code>pw</code> pointer at <code>[rbp - 0x10]</code></p>

<pre><code>0x7d0: mov eax, dword ptr [rbp - 0x14] ; eax = i
0x7d3: cdqe                            ; sign-extend i to 64 bits
0x7d5: lea rdx, [rax - 1]              ; rdx = i - 1
0x7d9: mov rax, qword ptr [rbp - 0x10] ; rax = pw
0x7dd: add rax, rdx                    ; rax = pw + (i - 1)
0x7e0: movzx eax, byte ptr [rax]       ; eax = pw[i - 1]
0x7e3: lea ecx, [rax + 1]              ; ecx = pw[i - 1] + 1
0x7e6: mov eax, dword ptr [rbp - 0x14] ; eax = i
0x7e9: movsxd rdx, eax                 ; rdx = i, sign-extended
0x7ec: mov rax, qword ptr [rbp - 0x10] ; rax = pw
0x7f0: add rax, rdx                    ; rax = pw + i
0x7f3: mov edx, ecx
0x7f5: mov byte ptr [rax], dl          ; pw[i] = low byte of ecx
0x7f7: add dword ptr [rbp - 0x14], 1   ; i++
0x7fb: cmp dword ptr [rbp - 0x14], 8
0x7ff: jle 0x7d0                       ; loop while i &lt;= 8
</code></pre>
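
<p>One way to check a reading of the loop is to translate it back into code, roughly one line per group of instructions. The sketch below does that in Python, modelling <code>[rbp - 0x14]</code> as <code>i</code> and the buffer that <code>[rbp - 0x10]</code> points to as <code>pw</code>. The instructions before <code>0x7d0</code> aren’t shown above, so the initial <code>pw[0] = 'a'</code> and <code>i = 1</code> are taken from <code>chall.c</code> rather than from the disassembly.</p>

```python
pw = bytearray(10)  # the malloc(10) buffer whose address lives at [rbp - 0x10]
pw[0] = ord("a")    # set before the loop (not part of the snippet; from chall.c)
i = 1               # [rbp - 0x14], also initialised before the loop

while True:
    rdx = i - 1          # mov eax, [rbp - 0x14]; cdqe; lea rdx, [rax - 1]
    eax = pw[rdx]        # mov rax, [rbp - 0x10]; add rax, rdx; movzx eax, byte ptr [rax]
    ecx = eax + 1        # lea ecx, [rax + 1]
    pw[i] = ecx % 256    # movsxd rdx, eax; add rax, rdx; mov byte ptr [rax], dl
    i += 1               # add dword ptr [rbp - 0x14], 1
    if i > 8:            # cmp dword ptr [rbp - 0x14], 8; jle 0x7d0 is not taken
        break

print(pw[:9].decode())  # abcdefghi
```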

<p>And this is our <code>strcmp()</code> call</p>

<pre><code>0x843: mov rdx, qword ptr [rbp - 0x10] ; pw
0x847: mov rax, qword ptr [rbp - 8] ; in
0x84b: mov rsi, rdx ; 2nd argument: pw
0x84e: mov rdi, rax ; 1st argument: in
0x851: call 0x670 ; strcmp
0x856: test eax, eax ; returned 0?
0x858: jne 0x868 ; no? jump to 0x868
0x85a: lea rdi, [rip + 0xae] ; "haha yes!"
0x861: call 0x640 ; puts
0x866: jmp 0x874 ; skip the else branch
0x868: lea rdi, [rip + 0xaa] ; "nah dude"
0x86f: call 0x640 ; puts
</code></pre>
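
<p>Why <code>puts</code> rather than <code>printf</code>? GCC rewrites a <code>printf()</code> call whose format string has no conversion specifiers and ends in a newline into an equivalent <code>puts()</code> call, dropping the newline since <code>puts()</code> appends one itself. That’s also why <code>puts</code> shows up in <code>.rela.plt</code> even though <code>chall.c</code> never calls it, while <code>printf("password: ")</code>, which has no trailing newline, stays a <code>printf</code>. I also confirmed with radare2 that those locations are actually the strings “haha yes!” and “nah dude”.</p>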
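
<p>That check can also be scripted. Each <code>lea rdi, [rip + disp]</code> resolves to an address inside <code>.rodata</code>, which starts at <code>0x900</code>, and a C string there is just the bytes up to the first NUL. The sketch below computes both targets and reads the strings out of a stand-in for <code>.rodata</code>. The stand-in is a hypothetical reconstruction, not a dump: four placeholder bytes followed by the string literals from <code>chall.c</code>, laid out so that it agrees with the addresses above. On the real binary you would pass <code>elf.get_section_by_name('.rodata').data()</code> and that section’s <code>sh_addr</code> instead.</p>

```python
def read_cstring(section_data: bytes, section_addr: int, addr: int) -> str:
    """Read the NUL-terminated string at virtual address addr inside a section."""
    start = addr - section_addr
    end = section_data.index(b"\x00", start)  # first NUL at or after start
    return section_data[start:end].decode()


RODATA_ADDR = 0x900  # .rodata start, from sections.py
# Hypothetical .rodata contents: four placeholder bytes, then chall.c's literals.
# The strings passed to puts() are stored without their trailing newline.
rodata = b"\x00" * 4 + b"password: \x00" + b"haha yes!\x00" + b"nah dude\x00"

# The two `lea rdi, [rip + disp]` instructions (7 bytes each) before the puts calls.
for insn_addr, disp in ((0x85A, 0xAE), (0x868, 0xAA)):
    target = insn_addr + 7 + disp
    print(hex(target), repr(read_cstring(rodata, RODATA_ADDR, target)))
```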

<h3>Conclusion</h3>

<p>Wew, that took quite some time. But we’re done. If you’re a beginner, you might find this extremely confusing, or might not have followed what was going on at all. And that’s okay. Building an intuition for reading and grokking disassembly comes with practice. I’m no good at it either.</p>

<p>All the code used in this post is here: <a href="https://github.com/icyphox/asdf/tree/master/reversing-elf">https://github.com/icyphox/asdf/tree/master/reversing-elf</a></p>

<p>Ciao for now, and I’ll see ya in #2 of this series: PE binaries. Whenever that is.</p>

 </div>
 </body>
 </div>
</html>