Shellcode used to mean a protective shell for a bad program.
Then it came to mean an assembly payload that spawns a shell on a box.
It has since evolved to mean any assembly payload.
Shellcode is commonly used in code injection, so it has no idea where in memory it will land.
It cannot use hard-coded addresses.
It must resolve EIP, imports, etc. at runtime.
On 32-bit Windows, a pointer to the PEB is stored at fs:[0x30].
Kernel32.dll exports GetProcAddress and LoadLibraryA.
Kernel32.dll is typically the 3rd entry in the InMemoryOrderModuleList (after the executable and ntdll.dll).
Parse the PEB to find the Ldr (PEB_LDR_DATA) lists, walk the _LIST_ENTRY links to the proper entry, and read the base address of kernel32.dll (the start of its PE header).
Resolve functions from kernel32.dll using the previously mentioned API calls.
PE Header -> Export Table -> NumberOfNames -> AddressOfNames -> Hash a name and compare it to our value -> AddressOfNameOrdinals -> AddressOfFunctions -> Add the RVA to the DLL base to get the function's absolute address
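The export-table walk above can be modeled in Python. This is a sketch, not shellcode: it assumes the three export arrays have already been read out of the mapped DLL (with AddressOfNames dereferenced to strings), uses the ROR13 hash discussed in this section, and ignores forwarded exports. `resolve_by_hash` is a hypothetical helper name, and the base address and RVAs in the usage are made up.

```python
def ror13_hash(name):
    """ROR13: for each character, rotate the 32-bit hash right by 13, then add the character."""
    h = 0
    for c in name:
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF
        h = (h + ord(c)) & 0xFFFFFFFF
    return h


def resolve_by_hash(dll_base, names, name_ordinals, function_rvas, wanted_hash):
    """Return the absolute address of the export whose name hashes to wanted_hash.

    names:         AddressOfNames entries, already dereferenced to strings
    name_ordinals: AddressOfNameOrdinals, an index into function_rvas for each name
    function_rvas: AddressOfFunctions, RVAs relative to dll_base
    Forwarded exports are not handled in this sketch.
    """
    for i, name in enumerate(names):           # NumberOfNames iterations
        if ror13_hash(name) == wanted_hash:
            ordinal = name_ordinals[i]          # names and ordinals are parallel arrays
            return dll_base + function_rvas[ordinal]
    return None


# Hypothetical export table values, for illustration only
base = 0x7C800000
names = ["GetProcAddress", "LoadLibraryA"]
ordinals = [1, 0]
rvas = [0x1D7B, 0xAC28]
print(hex(resolve_by_hash(base, names, ordinals, rvas, 0xEC0E4E8E)))  # 0x7c801d7b
```

The name index is not the function index: the lookup goes through AddressOfNameOrdinals first, which is why `ordinals` is `[1, 0]` in this made-up table.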
Shellcode avoids storing function names as strings.
Instead, it uses a hash of each function name.
ROR13 is quite common (described by Skape).
Some shellcode uses a substitution cipher instead.
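def ror(dword, bits):
    return ((dword >> bits) | (dword << (32 - bits))) & 0xFFFFFFFF  # 32-bit rotate right

def hash_function_name(function, bits=13):
    function_hash = 0
    for c in str(function):
        function_hash = ror(function_hash, bits)
        function_hash = (function_hash + ord(c)) & 0xFFFFFFFF  # keep 32 bits
    return function_hash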
Shellcode that uses hashes hashes the exported function names of a DLL one by one and compares each result with its stored value. When they match, it resolves the address of the matching function.
C:\Python27\Scripts>hashing.py kernel32.dll LoadLibraryA
[+] Ran on Thu Apr 16 12:17:45 2015
[+] 0xEC0E4E8E = kernel32.dll!LoadLibraryA
C:\Python27\Scripts>hashing.py /mod c:\windows\system32 kernel32.dll | less
[+] Ran on Thu Apr 16 12:22:00 2015
[+] Scanning module 'kernel32.dll' in directory 'c:\windows\system32'.
IN SCAN
[+] 0xA77D8D5A = kernel32.dll!AcquireSRWLockExclusive
[+] 0xE2385C49 = kernel32.dll!AcquireSRWLockShared
[+] 0x2FA60624 = kernel32.dll!ActivateActCtx
[+] 0xECFC3453 = kernel32.dll!AddAtomA
[+] 0xECFC3469 = kernel32.dll!AddAtomW
[+] 0x99161276 = kernel32.dll!AddConsoleAliasA
[+] 0x9916128C = kernel32.dll!AddConsoleAliasW
[+] 0xE3EAC3E7 = kernel32.dll!AddDllDirectory
[+] 0x730CEFAE = kernel32.dll!AddIntegrityLabelToBoundaryDescriptor
[+] 0x33675025 = kernel32.dll!AddLocalAlternateComputerNameA
[+] 0x3367503B = kernel32.dll!AddLocalAlternateComputerNameW
[+] 0xBFC36E12 = kernel32.dll!AddRefActCtx
[+] 0xCDC729AB = kernel32.dll!AddSIDToBoundaryDescriptor
[+] 0x6B8B8FD9 = kernel32.dll!AddSecureMemoryCacheCallback
[+] 0x1A945C3B = kernel32.dll!AddVectoredContinueHandler
[+] 0x159B3EA0 = kernel32.dll!AddVectoredExceptionHandler
[+] 0x2CA68404 = kernel32.dll!AdjustCalendarDate
[+] 0xD9F868D8 = kernel32.dll!AllocConsole
[+] 0x6493DFD5 = kernel32.dll!AllocateUserPhysicalPages
[+] 0xDD6573EA = kernel32.dll!AllocateUserPhysicalPagesNuma
[+] 0xCEB4FB95 = kernel32.dll!ApplicationRecoveryFinished
Use hashes to identify which functions the shellcode imports.
Identify which ROR variant or cipher the shellcode uses and reverse the algorithm.
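The analyst side of this can be sketched in Python, assuming the hash constants have already been pulled out of the shellcode and that the candidate names come from the DLL the shellcode targets. It tries every rotation count, so a ROR variant other than 13 is detected too; the helper names are hypothetical, and a substitution cipher or other scheme would need its own hash function plugged in.

```python
def ror(dword, bits):
    """Rotate a 32-bit value right by `bits`."""
    return ((dword >> bits) | (dword << (32 - bits))) & 0xFFFFFFFF


def ror_hash(name, bits=13):
    """Rotate-then-add hash over the characters of an export name."""
    h = 0
    for c in name:
        h = (ror(h, bits) + ord(c)) & 0xFFFFFFFF
    return h


def identify_hashes(found_hashes, export_names, rotations=range(1, 32)):
    """Map each constant found in shellcode to candidate (rotation, export name) pairs.

    found_hashes: 32-bit constants pulled from the shellcode
    export_names: candidate names, e.g. the exports of kernel32.dll
    """
    results = {h: [] for h in found_hashes}
    for bits in rotations:
        # One lookup table per rotation count: hash -> name
        table = {ror_hash(name, bits): name for name in export_names}
        for h in found_hashes:
            if h in table:
                results[h].append((bits, table[h]))
    return results


constants = [0xEC0E4E8E, 0xECFC3469]  # values from the hashing.py output above
names = ["LoadLibraryA", "GetProcAddress", "AddAtomA", "AddAtomW"]
for value, matches in identify_hashes(constants, names).items():
    print(hex(value), matches)
```

A match at rotation 13 for every constant indicates ROR13; a consistent match at some other count points to a different variant.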
The IDA Pro Book 2nd Edition – Chris Eagle
Practical Malware Analysis – Michael Sikorski and Andrew Honig
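We can extract binaries and DLLs from memory.
The Import Address Table is damaged when extracting with Volatility: in memory it holds the loader-resolved addresses rather than the original import entries.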
Understanding the
Algorithm Discovery
Understanding Purpose
Source → Compiler → Assembler → Linker → Binary on Disk
Because the source went through the compilation process,
we are left with machine code!
01111111 01000101 01001100 01000110 (0x7F 'E' 'L' 'F', the ELF magic number)
Disassembly generates assembly from machine code:
Assembly Disasm(machine[])
objdump, dumpbin, IDA, OllyDbg...
Any debugger working without source code (outside an IDE) performs disassembly.
Each architecture has its own mapping between assembly and machine code.
Malware commonly targets Intel architectures; some targets ARM.
The algorithm needs to know how to read the file format to discover what is executable code and what is data.
A PE file contains a lot of data as well as executable sections.
The IA-32 instruction 'ret'
Returns from a function by popping the address on top of the stack (which should be the saved EIP) into EIP.
Its machine code is 0xC3.
A disassembler that sees 0xC3 emits 'ret'.
Disassembly algorithms: Linear Sweep and Recursive Descent
Both take in machine code and output assembly code
Both fail on obfuscated code
Linear sweep parses the .text section linearly.
Iterates over a block of code and disassembles one instruction at a time.
Determines the size of each instruction, then starts decoding at the next one.
Pays no regard to flow control.
Data embedded within code chokes this algorithm.
Used by debuggers.
Provides complete coverage of the program's code sections.
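The linear sweep described above can be sketched over a toy instruction set. The opcode table is an assumption: a handful of real IA-32 opcodes with fixed lengths (0x90 nop, 0xC3 ret, 0xB8 mov eax, imm32, 0x74 jz rel8, 0xEB jmp rel8, 0xE8 call rel32), not a real decoder. The sample bytes hide one data byte behind a jmp, a classic trick against this algorithm.

```python
# Toy subset of real IA-32 opcodes (an assumption, not a full decoder):
# opcode -> (mnemonic, total instruction length in bytes)
OPCODES = {
    0x90: ("nop", 1),
    0xC3: ("ret", 1),
    0xB8: ("mov eax, imm32", 5),
    0x74: ("jz rel8", 2),
    0xEB: ("jmp rel8", 2),
    0xE8: ("call rel32", 5),
}


def linear_sweep(code):
    """Decode instructions back to back from offset 0, ignoring control flow."""
    listing = []
    offset = 0
    while offset < len(code):
        opcode = code[offset]
        if opcode in OPCODES:
            mnemonic, length = OPCODES[opcode]
        else:
            mnemonic, length = "db 0x%02x" % opcode, 1  # unknown byte: show as data
        listing.append((offset, mnemonic))
        offset += length  # next instruction starts right after this one
    return listing


# jmp +1 skips one data byte (0xB8); the real code is nop; ret at offsets 3 and 4
sample = bytes([0xEB, 0x01, 0xB8, 0x90, 0xC3])
for offset, mnemonic in linear_sweep(sample):
    print(offset, mnemonic)
```

The sweep decodes the data byte 0xB8 as the start of a five-byte mov eax, imm32, so the nop and ret never appear in the listing.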
Recursive descent reads instructions in order while building a list of further locations to disassemble.
For each call, jump, etc., it adds the destination to that list.
Stops following a path at a 'ret' or an unconditional branch.
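Recursive descent can be sketched over the same kind of toy opcode table (again an assumption: a few real IA-32 opcodes, not a full decoder). It also assumes every call returns, so decoding falls through after a call; real disassemblers have to guess about that too.

```python
import struct

# Toy subset of real IA-32 opcodes (an assumption, not a full decoder):
# opcode -> (mnemonic, length, kind), where kind describes control flow
OPCODES = {
    0x90: ("nop", 1, "next"),
    0xC3: ("ret", 1, "stop"),
    0xB8: ("mov eax, imm32", 5, "next"),
    0x74: ("jz rel8", 2, "cond"),
    0xEB: ("jmp rel8", 2, "jump"),
    0xE8: ("call rel32", 5, "call"),
}


def branch_target(code, offset, length):
    """Relative branch target: address of the next instruction plus a signed displacement."""
    fmt = "<b" if length == 2 else "<i"  # rel8 or rel32
    (disp,) = struct.unpack_from(fmt, code, offset + 1)
    return offset + length + disp


def recursive_descent(code, entry=0):
    """Follow control flow from `entry`, queueing branch and call targets."""
    listing = {}
    worklist = [entry]
    while worklist:
        offset = worklist.pop()
        while 0 <= offset < len(code) and offset not in listing:
            opcode = code[offset]
            if opcode not in OPCODES:
                break  # undecodable byte: abandon this path
            mnemonic, length, kind = OPCODES[opcode]
            if offset + length > len(code):
                break  # instruction runs past the end of the buffer
            listing[offset] = mnemonic
            if kind in ("cond", "jump", "call"):
                worklist.append(branch_target(code, offset, length))
            if kind in ("stop", "jump"):
                break  # ret or unconditional jmp: no fall-through
            offset += length
    return dict(sorted(listing.items()))


sample = bytes([0xEB, 0x01, 0xB8, 0x90, 0xC3])  # data byte hidden behind a jmp
print(recursive_descent(sample))
```

Because it follows the jmp to offset 3, this recovers nop and ret and never decodes the data byte at offset 2; a conditional-jump pair that always jumps would still fool it.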
Static and Dynamic approaches
Static
Dynamic
Does not require running the malware
Analyze the code and structure of a program
Antivirus / YARA
Hashing
Strings
Determining if a file is packed
Viewing imports / exports
Disassemble
Some of these approaches are useful for writing YARA signatures.
Do not rely solely on the output of strings, dependency listings, etc.
Disassembly listings are quite good, unless the code is heavily obfuscated.
Recall that packers alter the structure of a binary.
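Three of the static checks listed above (hashing, strings, spotting packing) fit in a few lines of Python. This is a sketch: the entropy test is a common heuristic for packed or encrypted data that this section does not itself name, and the 7.0 bits-per-byte cutoff is an assumption rather than a standard.

```python
import hashlib
import math
import re
from collections import Counter


def file_hashes(data):
    """MD5 and SHA-256 digests, e.g. for lookups against known-malware databases."""
    return {"md5": hashlib.md5(data).hexdigest(),
            "sha256": hashlib.sha256(data).hexdigest()}


def extract_strings(data, min_len=4):
    """Printable ASCII runs of at least min_len bytes, similar to the `strings` tool."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def shannon_entropy(data):
    """Bits per byte, from 0.0 to 8.0; packed or encrypted data tends toward 8."""
    if not data:
        return 0.0
    n = len(data)
    entropy = 0.0
    for count in Counter(data).values():
        p = count / n
        entropy -= p * math.log2(p)
    return entropy


def looks_packed(data, threshold=7.0):
    # Rough heuristic only; the 7.0 cutoff is an assumption, not a standard.
    return shannon_entropy(data) > threshold


sample = b"MZ\x90\x00" + b"\x00" * 60 + b"kernel32.dll\x00LoadLibraryA\x00"
print(file_hashes(sample)["sha256"])
print(extract_strings(sample))
print(round(shannon_entropy(sample), 2), looks_packed(sample))
```

As the notes above warn, none of these is conclusive alone: a packer can leave low-entropy stubs, and strings can be planted.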
Run the malware!
Requires a secure environment
Observe malware functionality
Run through a sandbox
Process Monitor (Procmon)
Regshot
Fake a network (INetSim / ApateDNS)
Wireshark
IA32 / x86
Registers: a small amount of data storage on the CPU with quick access
8 general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP)
Segment registers
Status register (EFLAGS)
Instruction pointer (EIP)
Instructions are the building blocks of assembly.
Each consists of a mnemonic and zero or more operands.
Mnemonic
Operands
mov eax, 0x42
Instructions are turned into opcodes at assembly time.
Multi-byte values are stored little-endian.
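The mov eax, 0x42 example can be assembled by hand to show both points above. A sketch covering only the one encoding it needs (opcode 0xB8, the register form for EAX, followed by a 32-bit immediate); the helper names are hypothetical, and a real assembler handles many more forms.

```python
import struct


def encode_mov_eax_imm32(value):
    """Encode `mov eax, imm32`: opcode 0xB8, then the immediate in little-endian order."""
    return struct.pack("<BI", 0xB8, value & 0xFFFFFFFF)


def decode_mov_eax_imm32(code):
    """Turn the 5 bytes back into assembly text (only this one instruction form)."""
    opcode, value = struct.unpack_from("<BI", code)
    if opcode != 0xB8:
        raise ValueError("not mov eax, imm32")
    return "mov eax, 0x%x" % value


print(encode_mov_eax_imm32(0x42).hex())  # b842000000
```

The immediate 0x42 comes out as 42 00 00 00: the least significant byte is stored first.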
Operand types:
Immediate
Register
Memory address
Intel syntax for IA-32 orders operands as dst, src (AT&T syntax reverses this).
The stack is a LIFO data structure.
It grows from high memory toward low memory.
Elements are pushed onto the top of the stack and popped off the top of the stack.
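A toy model of the stack just described: 4-byte slots, with ESP moving toward lower addresses on push and back up on pop. The class and its starting ESP value are hypothetical, for illustration only.

```python
class Stack:
    """Toy IA-32 stack: 4-byte slots, ESP decreases on push and increases on pop."""

    def __init__(self, esp=0x0012FF80):  # hypothetical starting ESP
        self.esp = esp
        self.memory = {}  # address -> 32-bit value

    def push(self, value):
        self.esp -= 4                       # make room: the stack grows downward
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory.pop(self.esp)   # read the current top of stack
        self.esp += 4                       # ESP moves back up
        return value


stack = Stack()
stack.push(0x41)
stack.push(0x42)
print(hex(stack.esp), hex(stack.pop()))  # last in, first out
```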
Most functions consist of a prologue and an epilogue.
The prologue sets up the stack frame for the current function:
push ebp
mov ebp, esp
sub esp, sizeOfLocalVars
The epilogue restores the caller's stack and frame pointers, then returns to the saved EIP:
mov esp, ebp
pop ebp
ret
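There are several calling conventions.
The Win32 API uses stdcall: arguments are pushed right to left and the callee removes them from the stack.
For example, a call to a function foo with the arguments int x and int y,
foo(int x, int y)
pushes its arguments onto the stack in this order:
push y
push x
The call instruction then pushes the address of the instruction following the call (the return address, saved as the old EIP).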
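The call to foo(x, y) can be traced through the prologue and epilogue with a small stack model. This is a sketch under assumptions: memory is a dict of 4-byte slots, the addresses and the caller's saved EBP are hypothetical, the frame reserves 8 bytes of locals, and the callee ends with `ret 8`, which is how a stdcall function with two 4-byte arguments removes them. It shows why, inside the function, the arguments sit at [ebp+8] and [ebp+12].

```python
def simulate_stdcall(x, y, return_address, esp=0x0012FF80, local_size=8):
    """Model the stack for foo(x, y) under stdcall, through prologue and epilogue."""
    mem = {}

    def push(value):
        nonlocal esp
        esp -= 4
        mem[esp] = value

    def pop():
        nonlocal esp
        value = mem[esp]
        esp += 4
        return value

    old_ebp = 0xDEADBEEF  # caller's frame pointer (hypothetical)
    esp_before = esp
    # Caller: arguments right to left, then `call` pushes the return address
    push(y)
    push(x)
    push(return_address)
    # Callee prologue: push ebp / mov ebp, esp / sub esp, local_size
    push(old_ebp)
    ebp = esp
    esp -= local_size
    frame = {
        "ebp+4": mem[ebp + 4],    # saved return address (old EIP)
        "ebp+8": mem[ebp + 8],    # first argument, x
        "ebp+12": mem[ebp + 12],  # second argument, y
    }
    # Epilogue: mov esp, ebp / pop ebp / ret 8 (callee removes the two arguments)
    esp = ebp
    restored_ebp = pop()
    eip = pop()
    esp += 8
    return frame, eip, restored_ebp, esp == esp_before


frame, eip, ebp, balanced = simulate_stdcall(1, 2, return_address=0x401005)
print(frame, hex(eip), balanced)
```

`balanced` reports whether ESP is back where the caller left it, which is the stdcall guarantee: the caller does not adjust ESP after the call.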
Conditionals make decisions based on comparisons.
Assembly has instructions to make comparisons (e.g., cmp, test)
and instructions to branch.
Conditional jumps decide which code executes next.
jmp (unconditional), jne, jz, je, ...
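How a comparison drives a conditional jump can be modeled in a few lines. A sketch: `cmp` subtracts its operands and discards the result, keeping only flags; this model tracks just ZF, SF and CF, so signed jumps such as jl and jg (which also need the overflow flag) are left out. The function and table names are hypothetical.

```python
def cmp_flags(a, b):
    """Flags that `cmp a, b` sets for 32-bit operands (subtract, discard the result)."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    result = (a - b) & 0xFFFFFFFF
    return {
        "ZF": result == 0,          # operands equal
        "SF": bool(result >> 31),   # sign bit of the result
        "CF": a < b,                # unsigned borrow
    }


# Branch predicates for a few jumps (subset)
JUMPS = {
    "je": lambda f: f["ZF"],
    "jz": lambda f: f["ZF"],        # je and jz are the same instruction
    "jne": lambda f: not f["ZF"],
    "jb": lambda f: f["CF"],        # unsigned "below"
    "jmp": lambda f: True,          # unconditional
}


def branch_taken(mnemonic, a, b):
    """Would `cmp a, b` followed by `mnemonic` take the branch?"""
    return JUMPS[mnemonic](cmp_flags(a, b))


print(branch_taken("jz", 3, 3), branch_taken("jne", 3, 3))
```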
IDA: the Interactive Disassembler from Hex-Rays
A recursive descent disassembler,
plus heuristics to find additional code the algorithm misses.
Works on a database built from the binary:
changes you make modify the database, not the executable.
F.L.I.R.T.
Fast Library Identification and Recognition Technology
IDA recognizes standard library functions generated by certain compilers.
After parsing a binary with recursive descent, IDA compares the functions it found against library signatures and applies the matching names and key structures.
IDA matched two WinAPI functions for manipulating the registry.
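The core idea behind FLIRT, matching the first bytes of each function against library patterns in which address-dependent bytes are wildcards, can be sketched as below. This is a simplification: the ".." wildcard borrows the notation of IDA's .pat files, but real FLIRT signatures carry more (for example checksums over additional bytes), and the signature name and bytes here are hypothetical.

```python
def parse_pattern(pattern):
    """Convert 'hex hex .. hex' text into byte values, using None for wildcard bytes."""
    return [None if token == ".." else int(token, 16) for token in pattern.split()]


def matches(code, offset, pattern):
    """True if every non-wildcard byte of the pattern equals the code at offset."""
    if offset + len(pattern) > len(code):
        return False
    return all(p is None or code[offset + i] == p for i, p in enumerate(pattern))


def identify(function_starts, code, signatures):
    """Name each function (found e.g. by recursive descent) whose bytes match a signature."""
    compiled = {name: parse_pattern(text) for name, text in signatures.items()}
    names = {}
    for start in function_starts:
        for name, pattern in compiled.items():
            if matches(code, start, pattern):
                names[start] = name
                break
    return names


# Hypothetical signature: push ebp; mov ebp, esp; sub esp, <any>; ret
signatures = {"hypothetical_lib_func": "55 8B EC 83 EC .. C3"}
code = bytes.fromhex("558BEC83EC10C3" "9090C3")
print(identify([0, 7], code, signatures))
```

The wildcard lets one signature match regardless of the local-variable size, the same way relocated addresses are masked out in real library patterns.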
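sub_xxx: function (subroutine) at address xxx
loc_xxx: code location, such as a jump target
byte_xxx, word_xxx, dword_xxx, etc.: data of that size
arg_xx: function argument (positive offset from EBP)
var_xx: local variable (negative offset from EBP)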
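IDA's default names for these cases can be sketched as below, assuming a standard 32-bit EBP-based frame where [ebp+0] holds the saved EBP and [ebp+4] the return address. The helpers are hypothetical and cover only the common cases; IDA's real naming has more rules (for example for ESP-based frames).

```python
def dummy_name(prefix, address):
    """IDA-style auto name, e.g. sub_401000 for a function, loc_40100A for a jump target."""
    return "%s_%X" % (prefix, address)


def ida_stack_name(ebp_offset):
    """Default name for an EBP-relative stack slot in a standard 32-bit frame."""
    if ebp_offset >= 8:
        return "arg_%X" % (ebp_offset - 8)   # [ebp+8] is the first argument, arg_0
    if ebp_offset < 0:
        return "var_%X" % (-ebp_offset)      # locals sit below the saved EBP
    return None  # [ebp+0..7]: saved EBP and return address


print(dummy_name("sub", 0x401000), ida_stack_name(8), ida_stack_name(-4))
```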
Graph and text modes
Functions window
Color band / navigation band
Green = branch taken / TRUE
Red = branch not taken / FALSE
Blue = unconditional
Imports / Exports windows
Strings window
Names window
Xrefs (cross-references)
Renaming functions
Comments
Edit function “Alt+P”
Assembly is hard to learn
Enable line prefixes and the stack pointer display
There is NO UNDO IN IDA (before version 7.3)!