deresz edited this page Mar 19, 2013 · 4 revisions

avwhy: reversing anti-virus detection signatures

Do you want to know why your anti-virus detects something as malicious? If so, this script is for you!

Many antivirus products seem to use a variant of simple file hashing as a detection signature. A test I ran on a sample (which I write about later in this post) showed that, for a file detected by 20 AV products, appending a single byte to the end of the file eliminates 8 of those detections. The extra byte changes the file's hash but not its functionality or semantics. Here are the MD5 hashes of the files (both can be found on VirusTotal):

6ea6487c68dbfcca7ab8b0c2a406295b 1d32a9c05a4324ad107a6cf0a3703d26
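As a minimal sketch of the append-one-byte test described above: the snippet below pads a buffer with a single byte and compares the MD5 digests. The buffer is a made-up stand-in for a real executable, and the choice of a zero byte as padding is an assumption; the post does not say which byte was appended.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def append_byte(data: bytes, pad: bytes = b"\x00") -> bytes:
    """Return a copy of data with one padding byte appended (pad value is an assumption)."""
    return data + pad


# Stand-in for the contents of a real executable; not the sample from the post.
original = b"MZ" + bytes(range(64))
modified = append_byte(original)

print(md5_hex(original))
print(md5_hex(modified))
```

The two digests differ even though every original byte is unchanged, which is enough to defeat a signature that is only a hash of the whole file.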

This time, though, I decided to go a bit further than that. But first, how did it start? I had some very irritating AV detections coming up at a customer site; the customer kept asking why they occurred and wanted to confirm that the detection was a false positive. The file was part of a wireless keyboard driver, and although the code was really badly written and used some old third-party libraries, I could not find anything malicious in it per se. Yes, it was hooking the keyboard, but it was neither saving keystrokes to a file nor sending them to a remote server. Yes, it contained code to contact some remote servers, but only benign sites like iTunes. And for the particular anti-virus product I was targeting (MS Security Essentials), simply changing the hash of the file did not change the detection.

So I decided to figure out which signatures were responsible for the detection, hoping they would point me to a place in the code where I could confirm whether it really is malicious. At first, I thought about doing hardcore reverse engineering with a debugger attached to the scanner, but I would have had to reverse and understand at least part of the scanning engine before I could even start looking at why it triggers on the sample. As this seemed like loads of work, a simpler idea came to mind. Why not do a "behavioral" analysis: change a single byte at a time in the sample and, after each change, run the anti-virus scanner to check whether the sample is still detected as malicious?
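The idea can be sketched in a few lines. This is an illustration only, not the code of avwhy.py: `find_signature_offsets` and `toy_scanner` are hypothetical names, the scanner is replaced by a toy function that "detects" a fixed string, and flipping all bits of the byte (XOR with 0xFF) is an assumed mutation, since the post does not say how the byte is changed.

```python
from typing import Callable, List


def find_signature_offsets(sample: bytes,
                           is_detected: Callable[[bytes], bool]) -> List[int]:
    """Return offsets where changing a single byte makes the detection disappear."""
    if not is_detected(sample):
        raise ValueError("the unmodified sample is not detected")
    hits = []
    for offset in range(len(sample)):
        mutated = bytearray(sample)
        mutated[offset] ^= 0xFF  # flip every bit so the byte always changes
        if not is_detected(bytes(mutated)):
            hits.append(offset)
    return hits


def toy_scanner(data: bytes) -> bool:
    """Toy stand-in for a real AV scanner: flags any buffer containing this string."""
    return b"apple.com/itunes" in data


sample = b"\x00" * 8 + b"apple.com/itunes" + b"\x00" * 8
offsets = find_signature_offsets(sample, toy_scanner)
print(offsets)
```

Offsets that end up in the result are the bytes the detection depends on; bytes outside any signature can be changed without affecting the verdict.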

This is what the script does. It splits the fuzzed files into batches and scans each batch. Currently it supports McAfee uvscan for Linux and MS Security Essentials for Windows, but adding other scanners is trivial. For more information, please look at the script's code and its command-line help.
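Batching could look roughly like the sketch below, which writes one mutated copy per offset into a temporary directory and runs the scanner once per directory. Everything here is an assumption for illustration: the function names, the file naming scheme, the batch size, and the `scan_dir` callback (a hypothetical plug-in point that returns the names of detected files). A real back end would wrap the scanner's command line; that part is deliberately left out, and the toy callback simply looks for a string.

```python
import os
import tempfile
from typing import Callable, Iterator, List, Set


def batches(total: int, size: int) -> Iterator[range]:
    """Yield consecutive offset ranges holding at most `size` offsets each."""
    for start in range(0, total, size):
        yield range(start, min(start + size, total))


def scan_in_batches(sample: bytes,
                    scan_dir: Callable[[str], Set[str]],
                    batch_size: int = 256) -> List[int]:
    """Return offsets whose mutated file was NOT reported by scan_dir."""
    undetected: List[int] = []
    for batch in batches(len(sample), batch_size):
        with tempfile.TemporaryDirectory() as tmp:
            for offset in batch:
                mutated = bytearray(sample)
                mutated[offset] ^= 0xFF  # assumed mutation: flip all bits
                with open(os.path.join(tmp, f"{offset:08X}.bin"), "wb") as fh:
                    fh.write(mutated)
            detected = scan_dir(tmp)  # one scanner run per batch
            undetected.extend(o for o in batch
                              if f"{o:08X}.bin" not in detected)
    return undetected


def toy_scan_dir(path: str) -> Set[str]:
    """Hypothetical scanner back end: reports files containing a marker string."""
    found = set()
    for name in os.listdir(path):
        with open(os.path.join(path, name), "rb") as fh:
            if b"musicmatch.com" in fh.read():
                found.add(name)
    return found


sample = b"\x90" * 10 + b"musicmatch.com" + b"\x90" * 10
result = scan_in_batches(sample, toy_scan_dir, batch_size=8)
print(result)
```

Scanning a directory of files at once amortizes the scanner's start-up cost, which matters when there is one file per byte of the sample.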

Here is example output from the script. For the file I was investigating, I obtained the following results:

    $ python avwhy.py ms suspicious.exe 2>/dev/null
    len: 17
    00B047  61 70 70 6C 65 2E 63 6F 6D 2F 69 74 75 6E 65 73  apple.com/itunes
    00B057  2F                                               /

    len: 37
    00B05C  43 61 6E 27 74 20 66 6F 75 6E 64 20 74 68 65 20  Can't found the
    00B06C  69 54 75 6E 65 73 20 6F 6E 20 79 6F 75 72 20 73  iTunes on your s
    00B07C  79 73 74 65 6D                                   ystem

    len: 37
    00B084  41 72 65 20 79 6F 75 20 77 61 6E 74 20 74 6F 20  Are you want to
    00B094  64 6F 77 6E 6C 6F 61 64 20 61 20 69 54 75 6E 65  download a iTune
    00B0A4  73 20 6E 6F 77                                   s now

    len: 14
    00B0C3  6D 75 73 69 63 6D 61 74 63 68 2E 63 6F 6D        musicmatch.com

    len: 49
    00B0DC  43 61 6E 27 74 20 66 6F 75 6E 64 20 74 68 65 20  Can't found the
    00B0EC  4D 75 73 69 63 4D 61 74 63 68 20 4A 75 6B 65 62  MusicMatch Jukeb
    00B0FC  6F 78 20 6F 6E 20 79 6F 75 72 20 73 79 73 74 65  ox on your syste
    00B10C  6D                                               m

    len: 49
    00B110  41 72 65 20 79 6F 75 20 77 61 6E 74 20 74 6F 20  Are you want to
    00B120  64 6F 77 6E 6C 6F 61 64 20 61 20 4D 55 53 49 43  download a MUSIC
    00B130  4D 41 54 43 48 20 4A 75 6B 65 62 6F 78 20 6E 6F  MATCH Jukebox no
    00B140  77                                               w

Funny enough? :-) So I could conclude that the scanner detects certain strings in the binary, but these are not malicious code fragments. And this was all I wanted to know. If you also want to know why your antivirus is yelling about something, I hope you will find this tool useful.
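For readers who want to produce a similar report from their own list of offsets, here is a rough sketch that groups consecutive offsets into runs and prints each run as a hex dump with a `len:` header, in the spirit of the output shown earlier. The helper names `group_runs` and `hexdump` are hypothetical, and this is not taken from avwhy.py.

```python
from typing import Iterable, List, Tuple


def group_runs(offsets: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse offsets into (start, length) runs of consecutive values."""
    runs: List[Tuple[int, int]] = []
    for off in sorted(offsets):
        if runs and off == runs[-1][0] + runs[-1][1]:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((off, 1))  # start a new run
    return runs


def hexdump(data: bytes, start: int, length: int, width: int = 16) -> str:
    """Format data[start:start+length] as offset, hex bytes, and printable text."""
    end = start + length
    lines = [f"len: {length}"]
    for pos in range(start, end, width):
        chunk = data[pos:min(pos + width, end)]
        hexpart = " ".join(f"{b:02X}" for b in chunk).ljust(width * 3 - 1)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{pos:06X} {hexpart} {text}")
    return "\n".join(lines)


# Made-up buffer; offsets 4..20 play the role of the bytes the scanner cares about.
data = b"\x00" * 4 + b"apple.com/itunes/" + b"\x00" * 4
for start, length in group_runs(range(4, 21)):
    print(hexdump(data, start, length))
```

Grouping matters because a string signature shows up as a contiguous run of sensitive offsets, and the run's text column is usually what tells you what the scanner matched.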

This description is also available here.
