OP Tanjawi: Forensic Techniques on Fire - Forensic Analysis of VirtualBox

Hi, minions:

This is another one of those works that sat in a corner awaiting publication... Its beginning goes back to the first months of 2018. In fact, the original name of the project was 'OP Carrot', in honor of the family of 'Follow the White Rabbit', a band of 'dangerous hackers' to whom I owe a lot. A big hug from this side of the screen, friends!

I tried to find the right moment to present it, but circumstances delayed it for a long time, until one day I decided to submit it to the CFP of the first C1b3rwall Congress, together with Eduardo Sánchez Toril, in a format that combined a 'redteam' scenario with a 'blueteam' one. We were selected and had the opportunity to present this work on stage.


A good friend says, "If you can write about something, you have understood it. And, if you've understood it, you're able to tell it. Therefore, if you are able to tell it, you can make others understand it. That's how you make the chain of knowledge."

This paper is about the forensic analysis of virtual machines running on VirtualBox.

Increasingly, those of us in law enforcement are using virtualization. It is a technology that has many advantages when it comes to conducting investigations. There is no doubt about it. But these advantages can also become major drawbacks when we are confronted with them, because the 'bad guys' are also increasingly using virtualization to carry out their misdeeds.

First of all, and I promise not to dwell on this explanation, we must know what virtualization is, what it is for and what it does, along with the definitions needed to follow the rest. There is a lot of published material on the subject, including great articles. I have chosen to quote a couple of short articles published by Microsoft for this purpose:

What is virtualization?

"Virtualization creates a simulated, or virtual, computing environment instead of a physical one. It often includes computer-generated versions of hardware, operating systems, storage devices, etc."

What is a virtual machine?

"A virtual machine is a PC file, usually called an image, that behaves just like a real computer. In other words, it's creating a computer within a computer."

Getting back to the work in question... I originally worked on a Windows 7 host, on which I ran, with VirtualBox, a series of virtual machines, also with Windows 7. When I took the project up again, I worked on a Windows 10 host, on which I ran, with VirtualBox, a series of Windows 7 virtual machines. The tests on the two systems produced results with some slight differences. All this on a mechanical 80 GB SATA hard disk. The objective I set when I started this project was to extract, or locate, the content of the virtual machines under a series of scenarios, which I will describe below.

I have to clarify that the following story is purely fictional. It is just a short piece of literature.

In this story, a police operation, which I have called 'OP Tanjawi', is carried out against crimes of glorification of Salafist Jihadist terrorism, indoctrination and self-indoctrination, resulting in the arrest of an individual and the seizure of various digital materials, including multiple hard disks with virtual machines and a wide variety of USB storage devices.

A word of advice: Never, ever, ever discard any object, however ridiculous it may seem. (You'd be surprised).


We are proceeding with the execution of an operation called 'OP Tanjawi'. The main person under investigation is glorifying Salafist Jihadist terrorism, indoctrinating himself and indoctrinating third persons. We know, through trusted external collaborators (yes, a snitch), that he does this by means of USB devices, which he provides to these third persons and inserts in his computer, inside virtual machines, to view their content, both alone and in the company of others. We know that the person under investigation has computer security knowledge, so we must act quickly and cautiously (as always).

We gained access to the house by breaking down the door with the appropriate operational team... and we observed that the person under investigation was using his computer...

- "Boss, the computer's on, what do we do?" -
- "Assess the situation!!" -

Recreated scenarios

Several possible scenarios have been recreated using a virtualized Windows 7 system under VirtualBox, on a host computer with a Windows 10 system. As mentioned above, the computer has a mechanical 80 GB SATA hard disk.

1.   Virtual machine started
     a.   Not encrypted
     b.   Encrypted
2.   Virtual machine stopped
     a.   Not encrypted
     b.   Encrypted
3.   Virtual machine deleted
     a.   Not encrypted
     b.   Encrypted
4.   Virtual machine deleted and fragmented
     a.   Not encrypted
     b.   Encrypted

As if that were not enough, within these scenarios I proposed others related to the way to proceed with the computer:

1.   Proceed with the collection of the volatile evidence, with the physical computer on.
2.   Carry out the normal shutdown of the physical computer.
3.   Disconnect the physical computer from the electrical grid.

(For each scenario, not counting the original, a corresponding forensic image of the hard drive was created. If you do the math, you can imagine the number of hours that hard drive had to suffer... and so did I.)

With regard to the actions carried out in the system, we performed the following:

1.   Inserting a USB device into the virtual machine
2.   Copying two files from the USB device to a folder in the virtualized system
3.   Removing the USB device from the system
4.   Viewing an image file named 'Crushing the enemy.jpg'
5.   Playing a video file named 'Inside Khilfah.mp4'
6.   The other content available on the USB device was not explored.

Given the volume of data I have worked with, I am not going to present all the results obtained. I'll only present those that I think are of interest. Otherwise this article would be much longer than usual.

To know...

Windows has anti-forensic sides. Windows constantly writes files to the hard drive. On Windows 7 systems this happens less than on Windows 10 systems, because Windows 10 runs more than twice as many processes as Windows 7.

By default, Windows has a number of automatic maintenance tasks scheduled. Two are especially dangerous: 'ScheduledDefrag', which defragments the hard disk (mechanical disks only), and 'SilentCleanup', which silently cleans up the disk when free space is scarce. If the computer has a battery, automatic maintenance can be avoided by disconnecting it from mains power, since by default these tasks are not started while the computer runs on battery.

If the computer is off, it should not be turned on. There should be no doubt in that case. If the computer is on, you must proceed to capture the RAM, search for encrypted drives, ... Always? Memory is a mine of information, very volatile, but always?

I'll leave that question in the air for now.

In this case, we are going to use the Volatility tool to analyze the RAM. As always, each case will require its own set of plugins. Here we are going to use kdbgscan, mftparser, filescan, dumpfiles, pslist and vaddump.


We are also going to perform a file recovery over the RAM, using the Foremost tool.

Once the RAM has been dumped, it is necessary to create forensic images of the hard disk (images, in the plural: one image is not enough).

With those forensic images of the hard disk we are going to extract files of interest with The Sleuth Kit, a framework made up of a set of small utilities that can process disk images. It can extract both existing and deleted files.

Since we are going to encounter encrypted virtual machines, we are going to apply brute force to them, thanks to the invaluable help of Guillermo Román Ferrero, in order to recover their encryption key.

We're going to be looking for virtual hard drives hosted within the hard drive itself, within the forensic image itself.

We're going to search for specific, raw content by running that search through the entire forensic image of the disk for a string of text that we're going to specify.

The purpose of this whole procedure is to detect illicit content, traces of it, or signs that evidence has been destroyed.

Memory analysis with Volatility, (mftparser)

Before starting the actual analysis of the RAM dump, it must be correctly identified. This process is carried out by means of the kdbgscan plugin, paying attention to the values returned by the 'Build string (NtBuildLab)' and 'KdCopyDataBlock' fields, and the 'PsActiveProcessHead' field must contain a non-zero value.

We could use the timeliner plugin to make a timeline, where we could see the execution of applications, the hours of device connections to the system and some other data. But timeliner only shows information about the host computer.

Consider that everything in the NTFS file system is a file, and that every file in NTFS is recorded in the MFT. If we add that the MFTs of all the volumes mounted in the system are loaded in memory, we find that we can extract content from any connected device, even if it has not been explored.

In the following image we can see three different MFTs: those corresponding to volume 1 and volume 2, and one that corresponds to 'HarddiskVolume209' and belongs to a virtual machine.

Therefore we are going to build a timeline from the MFT(s).

What is the MFT? The MFT is the Master File Table. Very briefly, it is an index that indicates in which part, or parts, of the disk a given file is stored and that shows us, among many other data, all the metadata of that file. All this information is stored in the MFT records, one record per file.

If we execute:

     mftparser --output=body --output-file MFTParser.txt -D MFT
     mactime -b MFTParser.txt >> MFTParser.csv

We will get a timeline of the MFT(s) as a tab-separated '.csv' file, and we will also dump all the content resident in them, because it is possible to find resident data in the MFT: small files whose data is stored inside the MFT record itself and which therefore do not occupy clusters of their own.
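
Before running mactime, it can be handy to check the body file for keywords. The following is a minimal sketch, assuming the body file follows the pipe-delimited Sleuth Kit body layout (name in the second field, timestamps as epoch seconds in the last four fields). The function name `body_grep` is made up for illustration.

```shell
# Hypothetical helper: list body-file entries whose name matches a keyword.
# Assumes the Sleuth Kit 3.x body layout:
# MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime
body_grep() {
    # $1 = case-insensitive pattern, $2 = body file
    awk -F'|' -v pat="$1" '
        tolower($2) ~ tolower(pat) {
            # print the name plus the four timestamps (epoch seconds)
            printf "%s|%s|%s|%s|%s\n", $2, $8, $9, $10, $11
        }' "$2"
}
```

For example, `body_grep 'khilfah' MFTParser.txt` would print only the entries whose name contains that word, which narrows down the timeline before opening the full '.csv' file.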

If we execute:

     mftparser --output-file=/mnt/c/Results/mftparser.txt

We will get a detailed view of all the content in all the MFT, running applications, user names, paths of the different virtual machines, the name of the virtual machines themselves, the time stamps of the virtual machines, content stored in the virtual machines, ...

In this way, as an example, we can see that the user 'MC' has opened an image file, because we can retrieve the recent documents.

Or we can see that a video named 'Inside Khilfah.mp4' is stored inside the 'Downloads' folder of the user 'MC'.

We can even get the unexplored content of the USB device that we used to recreate the scenarios, obtaining folder names and file names with their corresponding timestamps.

All this content belongs to a virtual machine. In short, we can see any content of any device, of any volume, that is or has been present in the system, whether it has been opened and explored or not.

Memory analysis with Volatility, (dumpfiles)

RAM caches the files that the user opens, but it also caches files that the user has not opened. This content is stored temporarily, so we can search for the files that are loaded in memory using 'filescan'.

Just because a file is loaded into memory does not mean that it can be extracted. Remember that memory is very volatile and is constantly changing.

We can use 'dumpfiles' to extract files that are cached in memory, for example virtual machine logs and virtual machine configuration files.

If we execute:

     dumpfiles -n -S Summary.txt -D Dumpfiles

This will dump all the files that are cached in memory. In this case, we can find, for example, the configuration file of the virtual machine's hard disk, 'Win7x64.vmdk'.

Memory analysis with Volatility, (vaddump)

We can list the system processes using 'pslist' to see a list of running applications, with their position in memory, their name, their ID, their parent process, ...

And with that list of processes we also get the start and exit date and time of each process, in case they are not active at the time of acquisition.

If the processes related to VirtualBox are running, we dump the memory pages of those processes using 'vaddump':

     vaddump -p (ID Process) -D Vaddump/
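
To avoid picking the process IDs by hand, the pslist output can be filtered first. This is a minimal sketch, assuming the default text layout of pslist (offset, name, PID, PPID, ...) has been saved to a file. The function name `vbox_pids` and the file name `pslist.txt` are hypothetical.

```shell
# Hypothetical helper: print the PID of every VirtualBox-related process
# found in saved Volatility pslist text output.
# Assumes the columns are: Offset(V) Name PID PPID ...
vbox_pids() {
    # $1 = file containing the pslist output
    awk 'tolower($2) ~ /^(virtualbox|vbox)/ && $3 ~ /^[0-9]+$/ { print $3 }' "$1"
}
```

Each PID printed by `vbox_pids pslist.txt` can then be passed to vaddump with the -p option.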

And if the processes related to VirtualBox are not running, we are going to dump the memory pages of the other processes as well. Why?

Because processes are assigned a private memory space, but they are also assigned a memory space that is shared with other processes, and thus the same content can be found in either of those two memory workspaces.

In this case we will extract, through the memory pages, the content of the virtual machine, indicating its position in memory and the process to which it corresponds.

In this way, for example, we can see several times the two files hosted in the virtual machine and we can see that they have been opened, even with their display date.

We can see the contents of a folder on the USB device.

We can see that a video has been played, being able to determine the system name of the virtual machine and the user who has played it.

We can even see and extract the configuration file '.vbox' from a virtual machine, with its corresponding encryption key. 

This file can be reconstructed and can be decrypted.

Recovery of files in memory, (Foremost)

File recovery is something "easy" to do and does not require "too much time". With carving we recover files through their headers, footers and data structures.

Both complete files and file fragments can be recovered and extracted. We can obtain videos, or frames from videos, images, documents, ... any extension we indicate.

As a general rule, carving is usually done on the created forensic images themselves, whether they are from memory or from disk. But this time we are going to do it on the directory of the memory pages that we have downloaded before. Why? Two reasons.

Firstly, because if we carve the image of the memory dump, we can obtain results that disconcert us, so to speak. For example, if we have inserted a USB device in the computer to proceed with the corresponding RAM memory capture, and we have, besides the memory capture tool itself, other tools, we are going to obtain, for example, images or fragments of those other tools. In other words, we are contaminating the evidence.

If we were 'twisted', we could have stored other illicit content on the USB device we use for the memory capture so that, when the carving is carried out on the memory dump, that content appears in the results. That is, we could implicate a person without that person even knowing it. And we are not twisted. We seek the truth.

Secondly, we are going to carve the memory pages because, if we carve the memory dump, we may not be able to determine where a file comes from, while if we carve the memory pages we can determine that it belongs to a certain process, a virtual machine, and that it was viewed and opened within a time window, between the start and the exit of the process related to the virtual machine.

That is, if we proceed with the carving on the directory of the memory pages, we can determine the source of that file.

In this case, we could compare the results obtained with a database, if available, with other original content. Or we could pay special attention to symbolism. A little OSINT never hurt anyone.

Hard disk image analysis, (Virtual machine not deleted)

It has already been mentioned above that RAM is a gold mine, as far as the information that can be found in it is concerned. And it has also been mentioned that Windows has anti-forensic sides, such as maintenance tasks or the very fact that Windows constantly writes files to the hard disk. The latter is very harmful for forensic analysis, in terms of recovering and extracting deleted content from the system.

You should know that a virtual machine has logs, has configuration files, has at least one hard disk file, and can be encrypted.

You should also know that a virtual hard disk file is a file: it has a header and it can be fragmented. At the same time, it behaves like a hard disk: it has its own header and contains information.

For all these reasons, time is a very critical factor and we must know how to act at all times.

This way, if the virtual machine has not been removed and is not encrypted, we can mount the forensic image of the hard disk, then mount the virtual machine's hard disk and we can scan and extract the content hosted in that virtual machine.

If the virtual machine has not been deleted and is encrypted, we can mount the forensic image of the hard disk, extract the configuration files and apply brute force, with the tool 'vboxDieCracker-py', on the file '.vbox', which is the one that stores the encryption key. Once the encryption key has been extracted from the virtual machine, we can make a copy of the virtual machine, configure it in our laboratory and remove its encryption to analyze it like any other system.

Hard disk image analysis, (Virtual Machine deleted)

If the virtual machine has been deleted, the record for that file is marked as 'Deleted' in the MFT, but its contents, its clusters, remain intact until its information is overwritten.

When Windows creates a file, it assigns it the lowest available record number in the MFT, as well as the first clusters that are not associated with any MFT record.

When does Windows create a file? Simply placing the mouse cursor over an icon is enough. For this reason, you should touch the system only for basic and essential tasks.

The virtual machine's hard disk file may remain intact in the forensic image, and could be seen when scanning that image, provided some conditions are met: that the system has not started maintenance tasks, that the system was disconnected quickly after the virtual machine was deleted, and that no clusters assigned to that virtual hard disk file have been overwritten.

There are several options for recovering that content, that virtual machine. We're going to use a very complete and powerful framework, The Sleuth Kit.

We can use 'fls' to list the MFT entries and then, using 'icat', we can extract the content we are interested in.

Hard disk image analysis, (Virtual Machine deleted and fragmented)

If the virtual machine has been removed and the clusters have been overwritten... and the virtual machine was not encrypted, we can search for hard drives within the forensic image of the hard drive, using:

     xxd -c 256 imagen.dd | grep -i -P '00(([0-9A-F]8)|([1-9A-F]0))\s0000\s([0-9A-F]{4}\s){26}55aa\s\s'

That line shows us the boot offset of each of the hard disks, both physical and virtual, that are in the forensic image of a hard disk. In this case, we have the physical disk in first position, (at offset 100), and 3 virtual disks.

With this data we can manually extract the content from the offset that interests us, from a position on the hard disk, and proceed to its subsequent study.
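
The manual extraction can be sketched as follows. Because xxd prints 256 bytes per line and the 55aa signature closes a 512-byte boot sector, a matching line starts 256 bytes (0x100) after the beginning of the disk it belongs to, which is why the physical disk shows up at offset 100. The function names, the output file name and the example offset below are hypothetical, and the number of sectors to copy depends on the size of the virtual disk, which is not known in advance.

```shell
# Hypothetical helpers for copying a disk out of a raw image.
# xxd offsets are hexadecimal; with -c 256 the matching line begins
# 0x100 bytes after the start of the boot sector that ends in 55aa.

disk_start() {
    # $1 = hex offset printed by xxd (without the trailing colon)
    echo $(( 0x$1 - 0x100 ))
}

extract_from() {
    # $1 = image, $2 = start offset in bytes (multiple of 512),
    # $3 = number of 512-byte sectors to copy, $4 = output file
    dd if="$1" of="$4" bs=512 skip=$(( $2 / 512 )) count="$3" 2>/dev/null
}
```

For example, `extract_from imagen.dd "$(disk_start 2f400100)" 2048 vdisk_head.raw` would copy the first 2048 sectors of a disk whose signature line was reported at that (made-up) offset. This only yields usable data if the clusters of the virtual disk are contiguous in the image.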

Similarly, if the virtual machine has been removed and the clusters have been overwritten... and the machine was not encrypted, we can search for the content we want and sometimes extract it.

I like to think that we all have a list of key words for our investigations.

     blkls -e -f ntfs -o 1126400 imagen.dd | egrep -abi 'Aplastando|Khilfah|Enemigo'

That line searches the forensic image of the hard disk for the words 'Aplastando', 'Khilfah' and 'Enemigo' ('Crushing' and 'Enemy' in English). What we see in this case are file names, paths, and computer and user names that contain the words I used in the search.

The search can be carried out both for file names and for the content itself.

In any case, the result will take us to a location, which consists of the exact byte where the content we have searched for is located. What do we do now?

We take paper, pencil and calculator... because first we are going to determine the start offset of the partitions in the forensic image of the hard disk.

To do this we use 'mmls' on the image of the hard disk.

Then we have to determine the sector size and the cluster size of the file system. We can do this with 'fsstat'.

In this case, we get that the sector size is 512 bytes and the cluster size is 4096 bytes, that is, 8 sectors. We write down all this data, because we have a content of interest in byte number 764,020,177.

If a cluster is equivalent to 8 sectors and if a sector is 512 bytes, we can look for the cluster corresponding to that content with a small formula, which consists of dividing the number of bytes by 4096. (512 bytes per sector and 8 sectors per cluster).

     echo "764020177/(8*512)" | bc

This small formula shows us that the cluster containing the information we are interested in is number 186528.
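
The same arithmetic can be written with shell integer division, which also gives the position of the content inside the cluster and, using the partition offset of 1126400 sectors passed to blkls, its absolute position in the image. This is a minimal sketch: the variable names are illustrative, and it assumes the byte offset reported by grep counts from the start of the file system.

```shell
SECTOR=512            # bytes per sector (from fsstat)
CLUSTER=4096          # bytes per cluster (8 sectors)
PART_START=1126400    # partition start in sectors (the -o value)
HIT=764020177         # byte offset reported by grep -b

cluster=$(( HIT / CLUSTER ))               # cluster that contains the byte
inside=$(( HIT % CLUSTER ))                # position of the byte inside that cluster
absolute=$(( PART_START * SECTOR + HIT ))  # byte offset from the start of the image

echo "cluster=$cluster inside=$inside absolute=$absolute"
# prints: cluster=186528 inside=1489 absolute=1340736977
```

Cluster 186528 covers bytes 764,018,688 through 764,022,783 of the file system, and the content starts 1,489 bytes into it.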

Note that this operation returns only the integer part of the division, and that is exactly what we need.

On a calculator, 764,020,177 divided by 4096 equals 186,528.36. The decimal part must be discarded, never rounded: cluster 186,528 covers bytes 764,018,688 through 764,022,783, so even if the result were 186,528.56 the content would still be in cluster 186,528 and not in 186,529. The decimal part only tells us how far into the cluster the content starts.

Through 'blkstat', we can determine if the content of that cluster is associated to any file or is not associated to any file, as it returns the values of 'Not allocated' or 'Allocated'.

If the content of a cluster is associated with any file, we can search for that file with the value of the cluster.

We can use 'ifind' to see if that content is related to the corresponding metadata in the MFT. 'ifind' can produce two results: 'Inode not found', which means that the cluster is not associated with any record in the MFT, or a number, which corresponds to a file record in the MFT.

If we get that record number from the MFT, we can use 'ffind' to determine the name of that file.

And with the MFT record number, we can use 'icat' to extract that file.
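
The chain blkstat, ifind, ffind and icat can be sketched as a small function. It is only an outline under assumptions: the function name and the output file name are hypothetical, and it stops when ifind reports that no MFT record owns the cluster.

```shell
# Hypothetical wrapper around the Sleuth Kit tools named above.
recover_by_cluster() {
    # $1 = image, $2 = partition offset in sectors, $3 = cluster number,
    # $4 = output file
    blkstat -o "$2" "$1" "$3"              # shows whether the cluster is allocated

    inode=$(ifind -o "$2" -d "$3" "$1")    # MFT record that owns the cluster
    case "$inode" in
        ''|*'not found'*) echo "cluster $3: no MFT record" >&2; return 1 ;;
    esac

    ffind -o "$2" "$1" "$inode"            # name of the file
    icat -o "$2" "$1" "$inode" > "$4"      # contents of the file
}
```

With the values used in this example, the call would be `recover_by_cluster imagen.dd 1126400 186528 recovered.bin`.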


  1. Think fast and act smart!!
  2. Time is a critical factor. We should not act without a plan. We have to work with foresight, anticipating the scenarios we may encounter.
  3. Nothing must be connected that is not absolutely necessary for the correct extraction of evidence.
  4. A tool for every device. We don't want to contaminate anything that could mess up all the work of an investigation.
  5. If we come across a computer that has a battery, we can disconnect it from mains power to prevent the system from entering the maintenance phase.
  6. The acquisition of RAM memory is vital. Always?
  7. Any action leaves a trace and the absence of evidence is evidence that something has happened, so we should look for what has happened.
  8. In any case, it is essential to know what to look for, because if we don't know what we are looking for, we can't know where to look or how to look.
  9. Use your brain and reason.

Tools used


That's all.

