Identifying unknown files by using fuzzy hashing July 25, 2011
Posted by lvdeijk in Uncategorized.4 comments
Over the last couple of years I have captured about 2 gigabytes of malware using the Dionaea honeypot. Analysing and identifying those files can mostly be done by sites such as Virustotal, Anubis or CWsandbox. By modifying the ihandler section in dionaea.conf, this can be fully automated.
Every now and then even these excellent analysis sites come up with nothing. No result whatsoever. This could be because it’s a brand new sample of malware which simply isn’t recognised yet, or because it is a morphed sample of a known one.
There is still a method to determine what kind of malware the file represents. This method is called fuzzy hashing. The technique finds its origin in spam filtering (spamsum).
From the README file:
“spamsum is a tool for generating and testing signatures on files. The signature is designed to be particularly suitable for producing a result that can be used to compare two emails and see if they are ‘similar’. This can provide the core of a SPAM detection system.
The algorithms in spamsum are in two parts. The first part generates a signature which is encoded as a string of ascii characters less than 72 characters long. The second part takes a new signature and a database of existing signatures (actually just a text file with one signature per line) and finds the existing signature that best matches the new signature. A match result in the range of 0 to 100 is generated, where 100 is a perfect match and 0 is a complete mismatch.”
A similar tool based on spamsum is ssdeep, maintained by Jesse Kornblum. (If you google for it, a link to a SourceForge page shows up. That site is down at the time of writing, but there are Ubuntu packages available in the Ubuntu package tree, so an apt-get install ssdeep should do the trick.)
The same can be done for unrecognized malware. By generating a hash from the alleged malware, we can compare it against the 2 gigabyte collection of malware already caught and identified.
By using ./ssdeep -lr 11a1f1acc4ed824dc1e332ce8c2fd50e > testhash
you generate a file that looks like this:
ssdeep,1.0--blocksize:hash:hash,filename
3072:GiSkUYBQgZ+z1vezLPVr7Qe4lAtWhazqiatiPiHpOKeXmPFYZK/z:Gi3BBZ+5v0LtQx+tQauieHAXCFycz,"11a1f1acc4ed824dc1e332ce8c2fd50e"
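To make the layout of that file a bit more tangible, here is a minimal Python sketch that splits such a line into its parts. It only assumes the format shown in the header line (blocksize:hash:hash,filename); the names Signature, parse_signature and load_signatures are made up for this illustration and are not part of ssdeep.

```python
from typing import Iterable, List, NamedTuple


class Signature(NamedTuple):
    block_size: int
    hash1: str
    hash2: str
    filename: str


def parse_signature(line: str) -> Signature:
    """Split one 'blocksize:hash:hash,"filename"' line into its fields."""
    # The hashes use a base64-style alphabet, so the first comma
    # separates the signature from the file name.
    sig, _, name = line.strip().partition(",")
    block_size, hash1, hash2 = sig.split(":", 2)
    return Signature(int(block_size), hash1, hash2, name.strip('"'))


def load_signatures(lines: Iterable[str]) -> List[Signature]:
    """Parse all signature lines, skipping the header and blank lines."""
    return [
        parse_signature(line)
        for line in lines
        if line.strip() and not line.startswith("ssdeep,")
    ]
```

Reading testhash then comes down to load_signatures(open("testhash")), which returns one Signature for our unknown sample.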
So if we do: ./ssdeep -lrm testhash .
snip
./3a74bc105edfe54445d1fca28cc4f542 matches testhash:11a1f1acc4ed824dc1e332ce8c2fd50e (99)
./556b6807d33ebfe2ec95f3598e168f62 matches testhash:11a1f1acc4ed824dc1e332ce8c2fd50e (85)
./daf46feccab82f6c86daae4f366bfbe1 matches testhash:11a1f1acc4ed824dc1e332ce8c2fd50e (75)
./3bcd999965892aea89be5606f6811bfa matches testhash:11a1f1acc4ed824dc1e332ce8c2fd50e (69)
./33a91a9ed61fe8f59190f4d73791bf06 matches testhash:11a1f1acc4ed824dc1e332ce8c2fd50e (82)
./525fc4565d588c11a5b56aaf4f3c7a12 matches testhash:11a1f1acc4ed824dc1e332ce8c2fd50e (99)
./fead84c5df2e585749a8da2ce583c926 matches testhash:11a1f1acc4ed824dc1e332ce8c2fd50e (99)
/snip
So for example, if we take out the last result “fead84c5df2e585749a8da2ce583c926” and run a clamscan against it, we come up with the following result:
fead84c5df2e585749a8da2ce583c926: Worm.Kido-175 FOUND
daf46feccab82f6c86daae4f366bfbe1, on the other hand, is reported as Worm.Kido-268 FOUND: another variant from the same malware family.
We can safely assume that this file is almost identical to “11a1f1acc4ed824dc1e332ce8c2fd50e” (a match score of 99) and that our unknown sample is a variant of Kido-175.
Probably the same malware has been identified under different names. So, to be sure we have identified it correctly, we can also check the other matches with a score of 99 in the list, e.g. “3a74bc105edfe54445d1fca28cc4f542”.
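Picking the best candidates out of a long list of matches by hand gets tedious. The sketch below ranks the lines of the ssdeep output shown earlier by score. It assumes every match line has exactly the shape "file matches known (score)" and that file names contain no spaces; rank_matches is a name invented for this example.

```python
import re
from typing import List, Tuple

# One line of matching output: "<file> matches <known> (<score>)"
MATCH_RE = re.compile(r"^(?P<file>\S+) matches (?P<known>\S+) \((?P<score>\d+)\)$")


def rank_matches(output: str, min_score: int = 0) -> List[Tuple[int, str]]:
    """Return (score, file) pairs at or above min_score, best match first."""
    results = []
    for line in output.splitlines():
        m = MATCH_RE.match(line.strip())
        if m and int(m.group("score")) >= min_score:
            results.append((int(m.group("score")), m.group("file")))
    # sorted() is stable, so equal scores keep their original order.
    return sorted(results, key=lambda pair: -pair[0])
```

Calling rank_matches(output, min_score=90) on the list above would leave only the three files with a score of 99, which are then the obvious ones to run through clamscan first.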
To sum up: All matches seem to indicate that this particular piece of malware is _some_ variant of Kido. Possibly a new incarnation. Even if we can’t pinpoint which type it is exactly, we still can make some educated guesses as to the family and its dangers. Knowing what a certain malware tends to do (e.g. it tries to find a C&C server for further instructions) we can assess the potential threat this piece of malware poses. If all connections to C&C servers are blocked (because all known C&C are filtered and the usual IRC traffic blocked) an infection with this type of malware doesn’t immediately mean a widespread breakout or data-leakage.
So, even if the md5 checksums don’t match, fuzzy hashing can come in handy to identify unknown and suspicious files.
Thanks Dennis Lemckert (@dlemckert) for helping me out on some grammar issues
Closing the loop February 21, 2011
Posted by lvdeijk in Uncategorized.3 comments
I have been working with honeypots for some time now, and every now and then I get questions like “How are honeypots going to protect my network?” At first I would say “They won’t”.
So, then what’s the use of installing and maintaining one? Two major reasons:
- They can help you to understand how malware works and, by understanding that, you can justify investments in defensive measures to your management.
- They do deliver more results than knowledge alone. They actually catch stuff and stuff caught by them can be used as input for the systems which are installed as the defensive line.
An example
I’ve written before about the Dionaea honeypot (made by Markus Koetter). I’ve also talked about it in various podcasts. It emulates known Microsoft OS weaknesses. By “playing along” with the offered malware, one can actually obtain a copy of the attacking malware, because Dionaea logs and saves everything that gets chucked at it.
So, you have a live sample caught from the wires using the dionaea honeypot. Now what? First of all, be careful with it. It is live malware after all. So do take protective measures.
- NEVER work in the production environment.
- Use Airgaps whenever possible.
- Use virtual environments whenever possible. (yes, I know there’s malware out there which specifically looks if it’s running in a virtual environment to prevent detection. I’m still trying to figure out how to get at those without running on bare iron.)
By editing the dionaea.conf file (in /opt/dionaea/etc/dionaea/), Dionaea can be directed to submit the malware automatically to different analysis sites such as Anubis or CWsandbox. And when you sign up for an account at Virustotal, you get your own identifying key. Putting that key in the dionaea.conf file will upload the malware files to Virustotal for antivirus vendor detection. There are also some other well-documented APIs that can be used for automated uploads and analysis.
A realization
The files saved by the Dionaea honeypot are malware. They don’t hit the honeypot by accident; one can be sure about that because the Dionaea system doesn’t do anything useful for any person. So, as stated above: any and all bitstreams picked up by Dionaea are either malware or junk. Junk gets chucked out, malware gets saved. But with all the variants of malware (usually pe32 files) and the measures taken to obfuscate them (e.g. packing), it could be that only one or two AV vendors registered at Virustotal actually recognise the files as potential malware.
| To get an idea, a real live VirusTotal example: | And after two days: |
| --- | --- |
| File name: app.exe; Submission date: 2011-02-15 13:42:07 (UTC); Current status: finished; Result: 1/43 (2.3%) | File name: winfixer.exe; Submission date: 2011-02-17 16:02:45 (UTC); Current status: finished; Result: 25/43 (58.1%) |
As you can see it took two whole days for the malware to be detected by 25 out of 43 AV vendors. The MD5 checksum for the malware is ca86f875c2a85f72a315e61bb784a91c so you can look it up.
It is good practice to use more than just one AV product. But even if you use more than one, there’s always a timegap in which malware stays under the radar, leaving your infrastructure vulnerable to that particular nasty piece of code. Let’s call this code “0day Malware”: malware which is out there, but isn’t recognized (yet).
One of the very first CERT teams (CERT/CC) has documented all its experiences in so-called “best practices”. They have defined 17 services, or processes, a CERT team could (or perhaps even should) have in place to become successful. A few of those services are Protect Infrastructure, Detection of Events, Vulnerability Management, and Artifact Analysis.
Here, Dionaea is used for both the catching of the malware and the analysis (Artifact Analysis). Having the malware at hand doesn’t help you with Vulnerability Management, yet. It also doesn’t help you with the Protect Infrastructure bit, or the Detection of Events. That’s what the AV is for, right?
But, what if one not only has a piece of malware at hand, but actually feeds it back into the defensive mechanisms already in place? What if that nasty code can be fed to the AV?
Closing the loop
While an interesting question in and of itself, one usually hears a standard response: “That’s why you feed it to VirusTotal. That’s where the AV vendors get their signal from that something new has been detected”. True though it might be, we still see a timegap between the moment a piece of malware is found and the time it’s recognized by AV products. Then there’s a timegap between the actual recognition and the update, delivered by said AV vendor.
When one looks at aforementioned processes, described by CERT/CC, one can see it’s possible to create a feedback-loop. The outcome of the process can be fed back into itself.
In this particular case, the malware found by Dionaea (in the Artifact Analysis stage) can be fed back at two stages:
- By updating the AV products already in place. This should be done even without a security organization in place. Just common knowledge and regular management. This has a potential timegap though.
- By declaring the unknowns to be malware (which they have to be, considering they were dropped on a honeypot, remember?) and feeding said malware into the AV products by a custom process.
This effectively closes the loop between signalling the malware and the AV vendor’s update.
A Possibility
The free AV ClamAV comes with an interesting tool called “sigtool” which allows you to generate your own signatures. The easiest way to create signatures for ClamAV is to use MD5 checksums. However, this method can only be used against static malware, since any change to the file changes its checksum. To create a signature for test.exe use the --md5 option of sigtool:
sigtool --md5 test.exe > test.hdb
cat test.hdb
48c4533230e1ae1c118c741c0db19dfb:17387:test.exe
Of course, if you’re adding extra signatures to your .hdb file, use >> instead of >.
That’s it! The signature is ready for use. Copy the .hdb file to the location where main.cvd and daily.cvd live (in my case /var/lib/clamav/), and test.exe will be recognised and blocked.
Clamscan will produce output like this:
clamscan test.exe
test.exe: test.exe FOUND
----------- SCAN SUMMARY -----------
Known viruses: 1
Scanned directories: 0
Engine version: 0.92.1
Scanned files: 1
Infected files: 1
Data scanned: 0.02 MB
Time: 0.024 sec (0 m 0 s)
ClamAV now detects unknown malware!
ClamAV is also known for its ability to scan email for malware. By using this tool you can effectively protect your infrastructure from 0day Malware. Of course, there are more ways and methods to generate your own signatures. These methods are well described in the Sigtool documentation.
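The .hdb line shown above has the layout md5:size:name, so for a whole directory of captured samples the same lines can be produced with a few lines of Python. This is only a sketch of what sigtool does for the MD5 case, not a replacement for it; hdb_signature and hdb_lines are hypothetical helper names, and the directory you feed it (for instance wherever your honeypot stores its binaries) is up to you.

```python
import hashlib
import os
from typing import Iterator


def hdb_signature(path: str, name: str = "") -> str:
    """Build one 'md5:size:name' line for the file at path."""
    md5 = hashlib.md5()
    size = 0
    with open(path, "rb") as fh:
        # Read in chunks so large samples do not have to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            md5.update(chunk)
            size += len(chunk)
    return f"{md5.hexdigest()}:{size}:{name or os.path.basename(path)}"


def hdb_lines(directory: str) -> Iterator[str]:
    """Yield a signature line for every regular file in directory."""
    for entry in sorted(os.listdir(directory)):
        full = os.path.join(directory, entry)
        if os.path.isfile(full):
            yield hdb_signature(full)
```

Writing "\n".join(hdb_lines(some_dir)) to a .hdb file gives the same kind of file as the sigtool call above, with one line per sample.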
Summary
So, in this case I used Dionaea results, combined with the sigtool from ClamAV to take extra protective measures. Even if major AV vendors don’t detect the malware (yet), my system gets nervous anyway and raises the alarm. I’ve closed the loop in my processes and improved the defense of my infrastructure.
2010 in review January 2, 2011
Posted by lvdeijk in Uncategorized.add a comment
The stats helper monkeys at WordPress.com mulled over how this blog did in 2010, and here’s a high level summary of its overall blog health:

The Blog-Health-o-Meter™ reads This blog is doing awesome!.
Crunchy numbers
A Boeing 747-400 passenger jet can hold 416 passengers. This blog was viewed about 1,500 times in 2010. That’s about 4 full 747s.
In 2010, there were 4 new posts, growing the total archive of this blog to 6 posts.
The busiest day of the year was January 29th with 63 views. The most popular post that day was Using metasploit to gain access and migrate to another process.
Where did they come from?
The top referring sites in 2010 were twitter.com, blog.infosanity.co.uk, bigextracash.com, infosanity.wordpress.com, and secuobs.com.
Some visitors came searching, mostly for lvdeijk, memory carving, foremost memory image, upload file to anubis sandbox commandline, and cipsonel.
Attractions in 2010
These are the posts and pages that got the most views in 2010.
Using metasploit to gain access and migrate to another process January 2010
Carving malware from live memory November 2009
About November 2009
Some kippo results September 2010
E-Mail chainletters & Hoaxes February 2010
Some kippo results September 28, 2010
Posted by lvdeijk in Uncategorized.1 comment so far
On the 23rd of July I started with the SSH honeypot kippo. After a good two months I decided to collect all the URLs/locations those “1337 h4x0rs” are wgetting their files from (rootkits, IRC bots, scanners).
I came up with the following list:
- http://arhive.xp3.biz/.x/ (multiple times)
- http://r.o.o.t.hi2.ro/
- pibo.com/.x/
- http://smithboy.webs.com/scan/
- http://smithboy.webs.com/emech/
- http://y2khom3.evonet.ro/
- http://eyesz.is-the-boss.com/
- iuliseverin.go.ro/ (multiple times)
- http://linuxhk.webs.com/xxplex/
- webmail.planetarium.com.br/~clayton/iadus/hide
- http://mdtorrent.hi2.ro/upload/
- blackdj.110mb.com/ (multiple times)
- austryaku.110mb.com/
- http://www.freewebs.com/iulianshooter/
- http://pinky.clan.su/flood/ (multiple times)
- freefun.do.am/ (multiple times)
- http://teste.meister.tripod.com/
- http://cake.do.am/ (multiple times)
- www.iadus.hi2.ro/
- http://clubhack.ucoz.org/ (multiple times)
- freewebtown.com/baietzas/Arhive/
- hurricane.home.ro
- http://LinuxSyS.Webs.Com/ (multiple times)
- http://www.packetstormsecurity.org/Crackers/ (legitimate site)
- keylogger123.home.ro/
- http://rohacker.ucoz.ru/ (multiple times)
- kok.ucoz.de/ (multiple times)
- http://freedphoto.com/~test/ (multiple times)
- http://vladutz.110mb.com/trades/
- chicktool.com/.x/others/
- www.freewebtown.com/hotzu/altele/
- freewebtown.com/codz/py/
- http://freewebtown.com/tarxvfz/
- http://freewebtown.com/evilish12/
- www.freewebtown.com/hotzu/xp/
- freewebtown.com/gigel/ (multiple times)
- http://aditzu.ucoz.net/
- http://blackenergy.110mb.com/Emech/
- http://iReaL-Clan.Webs.Com/Arhive/
- http://N-A-S-A.tk/Stifler/mech/
- http://eyesz.is-the-boss.com/
- bezbol.go.ro/ (multiple times)
- http://blackenergy.110mb.com/PsyBNC/
- http://blackenergy.110mb.com/Flood/
- http://blackenergy.110mb.com/Scanner/
- http://solid.go.ro/
- http://pokolake.is-the-boss.com/tgz/ (multiple times)
- cipsonel.com/lipi/ (multiple times)
- http://webfun.evonet.ro/tcl/
- web.clicknet.ro/mirel19/
- adelinuangell.lx.ro/cote/
- http://www.lourdesabarbosa.com/null/
- http://67.227.209.217/~admin/xd/
- http://thecooters.com/
- nasa.tradelinux.org/flood/
- http://mirc.go.ro/
- friguros.com/
- http://sipvicious.googlecode.com/files/ (legitimate site)
- http://tbdev.hi2.ro/
- http://geox.at.ua/
- http://csmioveni.tripod.com/Hack/
- http://208.75.230.43/drugsloco/
Now, I am not saying that these sites are “evil”. Most likely they have been compromised themselves. So simply putting them on a blacklist isn’t a good idea.
Some of these links contain open directories, including all sorts of files, while other sites may simply have disappeared into thin air. It’s purely a list I extracted from the database my kippo is writing its results to.
As kippo also stores the obtained files, I have a copy of every single one of them for further analysis.
Use this information and/or the files it points to at your own risk.
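The list above mixes entries with and without a scheme, with and without “www.” and with notes such as “(multiple times)”. A small Python sketch like the following can reduce it to a count per host, which makes it easier to see which hosting providers keep coming back. The function names are made up for this example, and treating “www.example.com” and “example.com” as the same host is a simplifying assumption.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit


def host_of(entry: str) -> str:
    """Return the lower-cased host name of one list entry."""
    parts = entry.split()
    if not parts:
        return ""
    url = parts[0]  # drop trailing notes like "(multiple times)"
    if "://" not in url:
        url = "http://" + url  # entries without a scheme
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host


def count_hosts(entries: Iterable[str]) -> Counter:
    """Count how often each host appears in the list."""
    return Counter(host_of(e) for e in entries if e.strip())
```

count_hosts(entries).most_common(5) then shows the five hosts that appear most often.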
Kippo also keeps track of every typed command in every “session”
I found one particular session too funny not to share:
Thanks to Justin Elze, for helping me out with the video.
Dissectingthehack June 24, 2010
Posted by lvdeijk in Uncategorized.add a comment
Apart from this little weblog of mine I’m also involved in a cool project run by Jayson Street called Dissectingthehack.
Dissectingthehack is exactly what it says it is. It is a community of people who study vulnerabilities and the techniques people use to exploit them. The aim is to get a better understanding of how to secure systems on the Internet. We talk about it, share our views and often joke about it. Don’t let the word “hack” make you think we are criminals! The founder, Jayson Street, is a well-respected security specialist who, among others, consults the FBI on cybercrime issues and is the author of the book “Dissecting the Hack, The Forbidden Network”.
So, why do I contribute on www.dissectingthehack.com ? Well, I am a strong believer in sharing knowledge. It is my opinion that sharing your experience/knowledge works both ways. When you share, people are willing to share back, so to speak. Sharing my knowledge puts me in a position where I can learn as well.
Everybody wins.
Everybody profits.
Like-minded people stimulate each other. It’s not a bragging contest where everyone’s ego is blocking the will to learn from others.
Videos withdrawn June 24, 2010
Posted by lvdeijk in Uncategorized.2 comments
It seems that there are people out there on the Internet who actually read my blog postings and watch my videos. That’s nice
but:
It also seems a large part of the audience is misinterpreting the contents of the videos. Come to think of it, that might be explained because the vids were intended as mostly self-explanatory and aren’t placed in their correct context. I failed big-time there.
That shook me a bit, so I temporarily blocked access to them to figure out a way to put them in the right perspective. It is in no way my intention to “train” wannabe hackers. Neither is it my intention to encourage people to engage in criminal activities. The vids might be seen as manuals to abuse systems, but the actual content can’t be used in that way. Trouble is, people still think it can.
I’ll reopen access when I’ve figured out a good way and add more content when I’m satisfied with a method.
Hints and tips would be welcome though.
E-Mail chainletters & Hoaxes February 2, 2010
Posted by Dennis Lemckert in Uncategorized.1 comment so far
Today I had a funny one at my work.
For some reason, people still keep falling for e-mail chainletters and hoaxes. The question just asks itself: why?
First of all, for those who don’t know what either is:
chainletters & hoaxes
An e-mail chainletter is an e-mail message like the Bill Gates Fortune. It is a hoax as well, although people usually reserve the word hoax for false virus-notifications like the Olympic Torch Hoax. As said, both are hoaxes.
Hoaxes can be categorized into three rather distinct types:
- A Virus-notification
- Missing persons
- Free money
All have some characteristics by which one can recognize a hoax:
- Usually it claims to originate from a well-known source or authority, Big companies and lawyer-firms and such.
- It sports large amounts of exclamation marks.
- It wants you to perform some weird action like delete a specific file or be on the lookout for a certain person, or don’t open a certain e-mail/file/attachment/website.
- It asks to be forwarded to as many people in one’s address book as possible.
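The four characteristics above are simple enough to check mechanically. The toy Python sketch below does just that; the word lists and the threshold of five exclamation marks are arbitrary assumptions picked for illustration, so treat it as a way to make the list concrete rather than as a usable filter.

```python
from typing import Dict

# Hypothetical example phrases; a real list would need far more care.
AUTHORITY = ("microsoft", "mcafee", "aol", "cnn")
ACTIONS = ("delete", "do not open", "don't open", "be on the lookout")
FORWARD = ("forward this", "send this to everyone", "pass this on")


def hoax_indicators(text: str, min_exclamations: int = 5) -> Dict[str, bool]:
    """Report which of the four hoax characteristics appear in text."""
    lowered = text.lower()
    return {
        "claims_authority": any(word in lowered for word in AUTHORITY),
        "many_exclamation_marks": text.count("!") >= min_exclamations,
        "demands_action": any(word in lowered for word in ACTIONS),
        "asks_to_forward": any(word in lowered for word in FORWARD),
    }
```

A message that triggers all four indicators is a strong candidate for a hoax; one that triggers none may still be one, since plain substring checks are easy to miss.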
The ones claiming to be a virus-notification fooling unsuspecting users I can understand. The hoaxes about ‘missing’ persons as well because of the perceived tragedy, but especially the ones claiming to grant some form of free money, like the Bill Gates Fortune hoax, should be an obvious fake to most sensible people. Still, that doesn’t seem to be so.
why they work
Personally I think the reason people fall for this kind of message is threefold:
- People want to be nice and be liked
- People are greedy; they want to get more money and/or they’re afraid to lose what they have
- People are afraid of what they don’t understand
The first is blatantly obvious and blatantly simple: People are social creatures, want to be liked and therefore tend to act nicely towards others. Some explanation can be found at the Social Engineering Framework. These hoaxes capitalize on this phenomenon: They ask the reader to do something, which they simply are inclined to do. The two major reasons for this reaction are: The sender of the message is most likely someone the reader knows and likes, and the message appears to come from some authority, be it with many steps between, but still.
The second seems obvious as well: Most hoaxes are either based on the threat of a non-working PC, which in the mind of the reader equates to losing money, or they’re based on receiving free money by massively forwarding said message. The original ‘analog’ chainletter did the same: Forward the letter to as many people one could think of, put one’s name on the bottom of the list, remove the topmost name and send money to that person. The whole system just screams ‘Pyramid Scheme‘.
The last is a bit harder. This one has more to do with the fact that a reader doesn’t see he is reading a hoax than with anything else. Just imagine: Joe Average User just gets a panicky message from his aunt: Some eeevil virus is on the loose, eating PCs. No one can detect it, but some high-up tech-savvy company found it anyway and offers a simple solution:
- step 1: Forward the message to everyone but the devil and
- step 2: Delete file foo. (Seeing that some of these files can be rather critical, their deletion having a crash as result, this order is important)
How, not being tech-savvy, does Joe A. User know it’s just a hoax? Seeing it’s from a techy company and his aunt sent it to him, reason 1 and 2 kick in. He makes a Pascal’s Wager and chooses the safest option for him: Forward, Delete the file and hope for the best.
the effects
Being at work in a large organization, in my experience these kinds of hoaxes have a two-way effect: Huge bandwidth usage on the mail-backbone and address-harvesting for spamming.
bandwidth
Just one incoming hoax-message can have serious effects on a corporate mail-system. People who react with a CTRL-A, CTRL-C, CTRL-V on the Global Addresslist multiply the original message by each and every e-mail address in that list. Knowing most companies have many distribution lists as well, each with a number of recipients in it, one can imagine the amount of messages it will create when one unsuspecting user wants to be helpful and hits that [SEND] button. Next comes the fact that some users aren’t online or are connected through small datalinks because they’re at some remote location. These messages need to be stored.
The fun starts when some _other_ user, being slightly more tech-savvy than the original sender sees the message, knows it’s a hoax and wants to be helpful by notifying the organization of the sender’s error by, oh the irony, hitting [REPLY TO ALL].
Then there’s the funny functions of ‘notification of receipt’ and ‘notification of read’ added to the mix and the surefire effect of the first [REPLY TO ALL] inciting the massive amount of reactions of others telling it’s wrong to use [REPLY TO ALL].
Result: Mailserver goes

Sometimes I think this was the original reason the phenomenon of the hoax exists in the first place.
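To get a feeling for the numbers involved, the multiplication described above can be written down as a tiny Python sketch. The model is deliberately crude: it assumes the original message and every [REPLY TO ALL] each reach the full list, and it ignores receipts and distribution-list expansion. mail_storm and its parameters are invented for this example.

```python
from typing import Tuple


def mail_storm(recipients: int, reply_alls: int, message_kb: float) -> Tuple[int, float]:
    """Return (deliveries, storage_kb) for one hoax plus its reply-to-alls."""
    # The original hoax and every reply-to-all each reach the whole list.
    deliveries = recipients * (1 + reply_alls)
    # Every delivered copy has to be stored somewhere until it is read.
    storage_kb = deliveries * message_kb
    return deliveries, storage_kb
```

With purely hypothetical figures of 5,000 addresses, 10 helpful reply-to-alls and 50 KB per message, mail_storm(5000, 10, 50) gives 55,000 deliveries and 2,750,000 KB, or roughly 2.75 GB, of mail to store.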
address harvesting
People who have ever seen a fully matured hoax-message know what I’m talking about when I say it’s easily possible to gather about 3000-10000 e-mail addresses from one stream of ‘FW: FW: FW: FW: your average hoax’. Everyone makes the same mistake: Hit Forward, Copy all addresses from one’s address book and paste them into the TO: field. Somewhere down the line, someone’s bound to pick up the mail, gather all addresses from it and feed them into his spam-server.
Hasn’t anyone ever heard of using Blind Carbon Copy? If not for the safekeeping of one’s address book, then at least to keep the size of the forwarded message within reasonable limits!
These days, with organizations having quite a bit more bandwidth and storage capacity than, say, 10 years ago, this problem is even bigger than the first one. No matter how well an organization tries to keep its employee database to itself, there’s always someone who just dumps the entire addresslist in some chainletter and sends it to the outside world.
The only way to minimize the damage of this effect is to implement a strict policy of who may use how many addresses of the global address list at once. It still doesn’t prevent Joe A. User from using his own address list at home of course. For that, more awareness will be needed. Awareness on all levels of the internet-population.
To wrap it all up, some useful links to pass on to whoever is interested or needs the awareness:
http://vil.mcafee.com./hoax.asp
http://en.wikipedia.org/wiki/Virus_hoax
and one more to learn about social engineering which, after all, is the basis of the effectiveness of a hoax:
http://www.social-engineer.org/
RAM carving malware December 11, 2009
Posted by lvdeijk in Uncategorized.add a comment
Well, it seems there are other purposes for RAM carving according to this post on securityfocus
As the article mentions it is mostly used in targeted attacks…for now. If this type of attack becomes mainstream in malware behavior it could develop into a really nasty attack vector.
Carving malware from live memory November 17, 2009
Posted by lvdeijk in Uncategorized.6 comments
Introduction
After spending some time in our laboratory, experimenting with some ruby scripts for the metasploit framework, I conducted a small experiment. I was wondering whether I could carve files out of memory-dump files. If so, it should be possible to carve out portable executables/malware as well. This write-up demonstrates what I did.
How to get malware
Getting infected with malware these days is simple. Just put an unpatched home PC online and you are bound to get infected with one or more pieces of nasty code.
As easy as this is, collecting malware for further analysis in a laboratory environment requires some other type of machine.
Collecting malware, trojans, irc-bots, worms and other types of nasties to study their behavior in a safe and controlled environment requires computer systems called honeypots.
A honeypot designed for collecting malware is a machine that emulates the weaknesses of different operating systems to deliberately convince malware that it has found a potential target. So, without getting infected, you can “catch” the offered malware for further analysis.
Honeypots are divided into low or high interaction honeypots.
High-Interaction
- Real services, OS/s, or applications
- Higher Risk
- Hard to deploy and maintain
- Capture extensive amount of information
Low-Interaction
- Emulation of TCP/IP stack, vulnerabilities and so on
- Lower risk
- Easy to deploy and maintain
- Capture quantitative information about attacks
There are several honeypots available for free on the Internet.
I have had good results with the Nepenthes honeypot (http://nepenthes.carnivore.it/) which is a low interaction solution.
A great resource about honeypots is “Virtual Honeypots” by Niels Provos and Thorsten Holz (ISBN 978-0-321-33632-3)
As described in this book both types of honeypots have advantages and disadvantages.
Malware, what does it do ?
A quick and simple way to determine what type of malware is caught by the honeypot is to run an antivirus scanner against the captured files. Signature based scanning, however, doesn’t show what the malware exactly does (or wants to do).
To get a view of what actions the executed malware would have performed, one can use a sandbox. A sandbox is best classified as a sort of high interaction honeypot, as it does not just emulate a vulnerable service but executes the malware and tracks what it does as well.
One publicly available sandbox is Anubis that is maintained by the university of Vienna (http://anubis.iseclab.org).
The analysis eventually results in a downloadable report with an in-depth analysis of the uploaded files, which presumably are malware.
Another way to demonstrate the working of malware is to visualize it. The people who designed Anubis also make a pcap file available from the complete communication of the malware during the analysis. This file can be loaded into protocol analyzers such as Wireshark (figure 1) or Etherape (figure 2).
Figure 2
Most malware comes in the form of a PE file (Portable Executable). This type of file has everything on board to do what it is designed to do. So, if such a file is executed on a vulnerable machine, you are basically infecting that machine with live malware.
It is advisable to take some protective measures. An infected machine should never be connected outside the lab environment !
Carving malware out of live memory.
This is precisely what I did:
I picked a file from my honeypot which ClamAV (an open source antivirus solution) had identified as Blaster-A. Nepenthes stores the found malware using the md5-hash as filename. Once I had renamed it to msblast.exe, I ran this file to infect an XP machine in my lab. The task manager clearly showed that msblast.exe was indeed running in RAM (figure 3).
The blaster worm outbreak started in 2003 and was based on the vulnerability that was patched by the MS03-026 patch from Microsoft. So, running this on a fully patched system doesn’t work anymore.
Getting the memory image
Next thing I did was to fire up msfconsole from the Metasploit framework (www.metasploit.com) on an Ubuntu machine that was wired to the infected XP machine using a UTP cross cable.
I used the vulnerability that was exploited in the Conficker outbreak in 2008 (MS08-067) to remotely take over the (already) infected machine. (figure 4)
figure 4
Using the meterpreter as the payload and a ruby-script (memdump.rb / http://www.darkoperator.com/meterpreter/) for this payload, I was able to upload mdd.exe (figure 5). This little program basically dumps the entire RAM and its contents into a file (for the POSIX people: it basically makes a coredump). This file is subsequently downloaded to my Ubuntu machine.
figure 5
Analyzing the memory image
Next I used a file carving utility called foremost (http://foremost.sourceforge.net/).
The following description is from their project page:
“Foremost is a console program to recover files based on their headers, footers, and internal data structures. This process is commonly referred to as data carving. Foremost can work on image files, such as those generated by dd, Safeback, Encase, etc, or directly on a drive. The headers and footers can be specified by a configuration file or you can use command line switches to specify built-in file types. These built-in types look at the data structures of a given file format allowing for a more reliable and faster recovery.“
Foremost automatically carves, identifies and stores files separately into the appropriate folders. Foremost also carves .exe files which is useful for this exercise.
Running ClamAV against these files I bumped into my good old msblast.exe, proving that it is possible to carve executables, including malware, out of a snapshot of live memory.
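To give an idea of what header-based carving looks like for executables, here is a heavily simplified Python sketch. It is not how foremost is implemented; it only looks for the “MZ” marker at the start of a DOS/PE header, reads the 4-byte offset stored at position 0x3C and checks that the “PE\0\0” signature sits there. The function name find_pe_offsets is invented for this example, and it assumes the headers lie contiguously in the dump, which is not guaranteed for a raw memory image.

```python
import struct
from typing import Iterator


def find_pe_offsets(data: bytes) -> Iterator[int]:
    """Yield offsets in data where a plausible PE header starts."""
    pos = data.find(b"MZ")
    while pos != -1:
        # The DOS header is 0x40 bytes; its last field (at 0x3C) holds
        # the offset of the PE signature, relative to the MZ marker.
        if pos + 0x40 <= len(data):
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            sig_at = pos + e_lfanew
            if data[sig_at:sig_at + 4] == b"PE\x00\x00":
                yield pos
        pos = data.find(b"MZ", pos + 1)
```

Running list(find_pe_offsets(open(dumpfile, "rb").read())) on a small dump gives the candidate start offsets; deciding where each executable ends requires parsing the section table, which this sketch leaves out.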
Wrap-up
So why use all these different techniques to obtain malware I already had in my possession in the first place, one might ask?
Well, that’s the entire premise of having a lab to conduct experiments. By deliberately infecting a machine with some malware I have knowledge of, I am able to validate that the results match my expectations. Therefore I have evidence that my method is a sound one.
Conclusions
This procedure could come in handy in some forensic information gathering situations. The footprint in RAM of mdd.exe is very small. One thing to keep in mind however, is that collecting a memory image this way uses an amount of disk space equal to the amount of internal RAM in the targeted machine.
For this reason, the value of this technique depends on the kind of investigation you are running. In any case, it’s a nice exercise on forensic information gathering. In a penetration test it could come in handy to determine if a machine is infected with a known piece of malware.
In whatever case this technique is used, playing/studying with malware can be fun and highly interesting.
But do realize however that this IS real-live, working, potentially dangerous malware that can do a lot of damage !
Some great sources on related techniques:
Collecting memory images by using cold boot attacks
http://www.mcgrewsecurity.com/tools/msramdmp/
http://citp.princeton.edu/memory/
Thanks
I would like to thank the following people for their advice and positive criticism:
Robert Wesley McGrew (Mississippi State University, USA)
Mikael Keri (Handelsbanke CERT, Finland)
Tiel Notenboom (MoD CERT, The Netherlands)
Dennis Lemckert (MoD CERT, The Netherlands)
Andrew Waite (www.infosanity.co.uk UK)
GodertJan van Manen (NorthWave, The Netherlands)







