Hey! How are you? Today I'm bringing a challenge for you guys, and for me too. I got pretty interested in computer forensics thanks to a work colleague whose main tasks are forensics related. I had no idea about this topic, so I started an online course to learn about it, and I wanted to share some basic concepts with you.
First of all, computer forensics is the set of methods for handling computer evidence to obtain information for a criminal investigation. It basically has three parts:
- Data acquisition: the secure process of obtaining data from the original source without damaging or modifying it. There are several tools to do this, depending on the OS and other details (such as the state of the evidence), but here we will cover Linux-based tools.
- Data preservation: the acquired digital evidence must be preserved in its original state, and cryptographic hash algorithms are used to prove that it has not changed.
- Data analysis: making sense of the acquired data by analysing it and extracting information from it. It has several steps itself, such as identifying partitions, looking at MAC times and others.
Once the analysis is done, the person responsible is supposed to write a report.
Data acquisition
We can separate the data we need to obtain into two kinds, volatile and non-volatile, and depending on which one we are focusing on we will use different kinds of tools. If we turn off the suspect's machine, we will lose the volatile data, so, if possible, this data should be collected before turning off the machine. If the suspect has installed rootkits that destroy evidence upon receiving a graceful shutdown command, there's a possibility of losing important content. Ah! I forgot to mention the chain of custody. The chain of custody is a record of how the evidence has been handled, kept so it can be reported later. The chain of custody starts as soon as evidence handling starts.
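To make the chain of custody idea a bit more concrete, here is a minimal sketch of a log that gets one line per handling step. The function name log_custody, the field layout and the handler and item names are all made up for illustration; real investigations follow whatever format their organisation or jurisdiction requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one chain-of-custody entry to a log file.
# Fields (tab-separated): UTC timestamp, handler, action, evidence item.
log_custody() {
    local log_file="$1" handler="$2" action="$3" item="$4"
    printf '%s\t%s\t%s\t%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$handler" "$action" "$item" \
        >> "$log_file"
}

# Example run against a scratch log (all names are illustrative only).
custody_log="$(mktemp)"
log_custody "$custody_log" "investigator_1" "seized" "laptop_disk_01"
log_custody "$custody_log" "investigator_1" "imaged with dd" "laptop_disk_01"
cat "$custody_log"
```

Appending with >> instead of rewriting the file keeps earlier entries untouched, which fits the idea of a record that only grows.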
Before we start we should prepare the tools:
- A bootable live CD, as the suspect's tools are not to be trusted.
- A powerful machine for the investigator
We should first acquire the data that is most volatile, as it's constantly changing. We are actually changing it while collecting it, so we should try to leave as small a footprint as possible. It is a difficult task, as the normal tools we would use are not an option; for example, using cp will modify the original file's access time. So, what can we use? Let's take a look at some options:
$ lsof
This tool lists all open files that belong to active processes. We can also use it with different options, such as lsof -i @IP_address, which lists the Internet connections associated with the given IP address. But, ha! You can also use lsof to find malicious processes that use hidden disk space, such as files that have been deleted but are still held open (lsof +L1 lists those)!
$ nc
Netcat /ᐠ。_。ᐟ\ reads from and writes to network connections using TCP or UDP. It can be used to transfer the collected data to a forensic workstation.
Other interesting things to look at: uname -a, ifconfig, date and uptime. The output of all of these should be redirected to a USB drive to collect it as evidence. Oh! We can also check whether the attacker is still connected using w.
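As a rough sketch of the "redirect everything to the USB" step, the snippet below saves the output of each command in its own file, together with the command line and the collection time. EVIDENCE_DIR and the collect function are assumptions for illustration; on a real case EVIDENCE_DIR would point at the mounted USB drive and the binaries would come from the trusted live CD.

```shell
#!/usr/bin/env bash
# Sketch: save the output of a few read-only commands, one file per command.
# EVIDENCE_DIR is an assumed variable; it falls back to a scratch directory.
EVIDENCE_DIR="${EVIDENCE_DIR:-$(mktemp -d)}"

collect() {
    # $1 = label used for the output file; remaining args = command to run
    local label="$1"; shift
    {
        echo "# command: $*"
        echo "# collected: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
        "$@" 2>&1 || echo "# command failed or is not installed"
    } > "$EVIDENCE_DIR/$label.txt"
}

collect date     date
collect uname    uname -a
collect uptime   uptime
collect ifconfig ifconfig
collect w        w
ls "$EVIDENCE_DIR"
```

Recording the date first gives a reference point for how far the machine's clock is off from the investigator's.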
For memory acquisition we can use LiME (Linux Memory Extractor), an open-source Loadable Kernel Module (LKM) that allows volatile memory acquisition from Linux.
Now, moving to non-volatile data, we are going to use dd. This tool comes with most Linux systems. For example, to prepare the forensic disk by wiping the drive /dev/hda with all zeros, we could use dd if=/dev/zero of=/dev/hda. Let's get evidence. Remember we said using cp is not an option? We can copy the evidence using dd, as follows: dd if=stuff of=evidence/stuff.dd. If we use dd with netcat, first we set up the listener and then send it the data.
Here, listener terminal:
$ nc -l 8888 > nc_info
Here, transmitter terminal:
$ dd if=stuff | nc localhost 8888
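Whichever way the copy travels, it is worth checking that what arrived matches what was read. Below is a small local sketch of that idea: it images a source with dd and compares the SHA-256 digests of source and image. A scratch file of random data stands in for the real device, and the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: image a source with dd, then confirm the copy by comparing hashes.
# A scratch file stands in for the real evidence source (e.g. a disk device).
work_dir="$(mktemp -d)"
source_dev="$work_dir/source.bin"   # placeholder for the evidence source
image="$work_dir/evidence.dd"

# Create 1 MiB of sample data to play the role of the evidence.
dd if=/dev/urandom of="$source_dev" bs=1024 count=1024 2>/dev/null

# Take the image; a larger block size just reduces the number of reads.
dd if="$source_dev" of="$image" bs=4096 2>/dev/null

# Hash both sides and compare the digests.
source_hash="$(sha256sum "$source_dev" | cut -d' ' -f1)"
image_hash="$(sha256sum "$image" | cut -d' ' -f1)"

if [ "$source_hash" = "$image_hash" ]; then
    echo "image verified: $image_hash"
else
    echo "HASH MISMATCH" >&2
fi
```

With the netcat transfer, the same comparison would be done by hashing on the sending side and again on the forensic workstation.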
Data preservation
Forensics uses cryptographic hash algorithms to preserve evidence. A forensic investigator should prove that the evidence is the same as the original source, and for that they only need to calculate the hashes of both. If the hashes are the same, the two images can be treated as identical.
For this we can use MD5 or SHA. Let's explore the md5sum command. We create a file called ex_file containing the string "hello there", cat it to see the original content, and then calculate the hash.
$ echo "hello there" > ex_file
$ cat ex_file
hello there
$ md5sum ex_file
2d01d5d9c24034d54fe4fba0ede5182d ex_file
As you guys can see, it's pretty simple. If we modify the file, say by adding "hiya" to it, the hash will change.
$ echo "hiya" >> ex_file
$ cat ex_file
hello there
hiya
$ md5sum ex_file
ddfdaf6c131be9a522038488f6823537 ex_file
See? Different. On the other hand, if we have two files with the same content BUT different names...
$ echo "hello world" > file1
$ echo "hello world" > file2
$ md5sum file1
6f5902ac237024bdd0c176cb93063dc4 file1
$ md5sum file2
6f5902ac237024bdd0c176cb93063dc4 file2
Same hash. This is because the hash is calculated from the file's content, so changing metadata such as the name won't change it. Another example of this is changing permissions.
$ chmod g-r file2
$ md5sum file2
6f5902ac237024bdd0c176cb93063dc4 file2
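Since MD5 is known to be vulnerable to collisions, a cautious variation is to record a SHA-256 digest as well and let the tool do the comparison later. The sketch below assumes GNU coreutils (md5sum, sha256sum) and uses made-up file names in a scratch directory.

```shell
#!/usr/bin/env bash
# Sketch: record digests in manifest files, then verify against them later.
work_dir="$(mktemp -d)"
cd "$work_dir" || exit 1

echo "hello world" > file1

# Record both digests at acquisition time.
md5sum file1    > file1.md5
sha256sum file1 > file1.sha256

# Later: re-check the file against the recorded digest.
sha256sum -c file1.sha256      # prints "file1: OK"

# Any change to the content makes the check fail.
echo "tampered" >> file1
if sha256sum -c file1.sha256 >/dev/null 2>&1; then
    check_result="unchanged"
else
    check_result="modified"
fi
echo "file1 is $check_result"
```

Keeping the manifest files alongside the chain of custody record means anyone can repeat the check without trusting the person who made the image.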
Data analysis
Now, now, getting useful information out of the data is an important task. We are working on the copy, of course. A nice way to start is identifying partitions using fdisk -l our_device. Afterwards we can use dd to carve out interesting partitions. We can use mmls from The Sleuth Kit to get partition information once we have our dd file, for example using mmls our_file.dd. We can also specify the media type using the -t option, as in mmls -t dos our_file.dd. Mounting is important too, and for that we will use mount, of course. Now, be careful, because for the sake of the investigation it should be a read-only mount. The investigator has to be able to prove at all times that the data has not changed.
$ mount -o ro,loop /my_file.dd /mnt/example
Now, MAC times. These are useful, as they give us the timing of different kinds of actions, such as creating or modifying files. This should be run before anything else, as this information is very sensitive to change. We use a combination of the fls and mactime commands for this.
$ fls -f ext3 -m "/" -r images/root.dd > data/body
$ mactime -b filename [time]
M indicates that the displayed date and time is the last modification time; A indicates the last access time; C indicates the last inode change time.
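To get a feel for the difference between M and C on a single file, here is a small sketch using stat from GNU coreutils (the -c format option is GNU-specific; BSD and macOS stat use different flags). It changes only the permissions of a scratch file and shows that the inode change time moves while the modification time stays put.

```shell
#!/usr/bin/env bash
# Sketch with GNU stat: %Y = modification time, %Z = inode change time
# (both printed as seconds since the epoch).
work_dir="$(mktemp -d)"
sample="$work_dir/sample.txt"
echo "hello world" > "$sample"
chmod 644 "$sample"

mtime_before="$(stat -c %Y "$sample")"
ctime_before="$(stat -c %Z "$sample")"

sleep 1
chmod 600 "$sample"     # metadata-only change, the content is untouched

mtime_after="$(stat -c %Y "$sample")"
ctime_after="$(stat -c %Z "$sample")"

echo "M before/after: $mtime_before / $mtime_after"
echo "C before/after: $ctime_before / $ctime_after"
```

This is also why careless handling is a problem: even a harmless-looking command can overwrite a timestamp the timeline depends on.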
There are many other things to look at in an investigation, logs for example. But let's leave it here!
I hope you enjoyed it! It's so interesting to learn how computer forensic investigations are carried out... Now you can go and spot the lies in CSI series. :p
Top comments (11)
Hi, this is really interesting, I had never thought of such a use for netcat. Anyway, I just want to point out that MD5 hashes are no longer safe. You can make two totally different files have the same MD5 hash in a pretty trivial way. You can see more here if you wish.
exploit-db.com/docs/english/46047-...
It is wise nowadays to use 2 or more hash algorithms. Although SHA256 is strong today, no one knows whether it will be in a few years, as shattered.io/ demonstrated for SHA1.
Using 2 or more makes it way more troublesome to generate the same hashes, even with 2 algorithms that are no longer safe.
thank you for the advice!
Thanks Paula, nicely written and a good top-down start into forensics!
For those interested in learning more, I recommend the Forensics Wiki: forensicswiki.org/wiki/Main_Page which covers more interesting ways of imaging both volatile and persistent storage :)
yay! thanks
For data acquisition I recommend a forensics-specific Linux live CD, like caine-live.net/ or deftlinux.net/
Because with default settings Linux distros usually don't mount storage as read-only, which is a must for data acquisition.
Yep good start to this topic. This is what I studied in school. Great read!
your work about digital forensics is really good and very clear
thank you!
Thanks Paula
how secure is it to transfer the forensic data via network? couldn’t the network stack of the attacked machine be compromised, too? (e.g. send a copy of the data to the attacker?)