What is it?
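Hashing is a method of checking whether two chunks of data are identical. A cryptographic hash function turns any input into a short, fixed-length string called a hash (or digest). The same input always produces the same hash, and for a secure algorithm it is computationally infeasible to find two different inputs that produce the same one. The input can be a file, a string, a stream, or anything else that can be represented in binary format.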
You've probably seen a hash string on the downloads page of some of your favorite tools, packages, or libraries. For example, Kali Linux has one for each of its releases. But why is that?
This is to ensure that the original file on their server is the same as the one that you've downloaded. For example, the SHA-256 hash of the Kali ISO is below.
If you download the file, you should hash your local copy. If the resulting hash matches the one found on their website, you can be confident that the file was not corrupted or tampered with during the download and that you have the same file they published.
Wait...but how do you hash stuff?
Excellent question. Let's get technical! I'm assuming you have Python 2 installed, by the way.
1- Let's import the library we need.
import hashlib as hash
2- Now let's choose our hashing algorithm. For more information on their differences, check this out.
sha = hash.sha256()
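To get a rough feel for how the algorithms differ, here's a small sketch that hashes the same input with a few of hashlib's algorithms and records the digest size of each. The `sample` and `digest_sizes` names are just for illustration, and the `b'...'` literal and `print(...)` call keep the snippet working on Python 3 as well as Python 2.

```python
import hashlib

sample = b'Hello World!'
digest_sizes = {}

for name in ('md5', 'sha1', 'sha256', 'sha512'):
    # hashlib.new() builds a hash object from the algorithm's name
    hasher = hashlib.new(name, sample)
    # digest_size is in bytes, so multiply by 8 to get bits
    digest_sizes[name] = hasher.digest_size * 8
    print('%-6s %3d bits  %s' % (name, digest_sizes[name], hasher.hexdigest()))
```

Longer digests mean a larger space of possible hashes. MD5 and SHA-1 also have known collision attacks, so they're a poor choice when you care about tampering.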
3- We're basically set up, so now we'll go ahead and test the function on a string.
# Insert the string we want to hash
sha.update('Hello World!')
# Print the hexadecimal format of the binary hash we just created
print sha.hexdigest()
""" 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069 """
Awesome, there's a SHA-256 hash of the string "Hello World!". Now we'll prove that the hash is different for similar data.
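# Start with a fresh hash object: calling update() again on the old one
# would hash 'Hello World!' and 'Hello World' together as one input.
sha = hash.sha256()
# Note the missing '!'
sha.update('Hello World')
print sha.hexdigest()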
""" a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e """
It's totally different.
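One thing worth knowing when you run steps like these: calling `update()` repeatedly on the same hash object feeds it more data, so the digest covers everything passed in so far. Here's a minimal sketch of that behavior, plus a rough count of how many bits differ between the two digests. The helper name `differing_bits` is made up for this example, and the code is written to run on both Python 2 and Python 3.

```python
import hashlib

# update() appends: two calls hash the concatenation of both inputs.
running = hashlib.sha256()
running.update(b'Hello ')
running.update(b'World')
assert running.hexdigest() == hashlib.sha256(b'Hello World').hexdigest()

def differing_bits(hex_a, hex_b):
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count('1')

with_bang = hashlib.sha256(b'Hello World!').hexdigest()
without_bang = hashlib.sha256(b'Hello World').hexdigest()

# Number of differing bits, out of 256
print(differing_bits(with_bang, without_bang))
```

For a well-designed hash function you'd expect roughly half of the bits to flip, even though the inputs differ by a single character.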
4- Now that we know that our function works, let's try it on a file.
# WARNING: Do NOT do this with large files.
# For large files, see the snippet here -> https://gist.github.com/aunyks/042c2798383f016939c40aa1be4f4aaf
# Use a fresh hash object so the strings hashed earlier aren't included.
sha = hash.sha256()
with open('kali.iso', 'rb') as kali_file:
    file_buffer = kali_file.read()
    sha.update(file_buffer)
print sha.hexdigest()
""" 1d90432e6d5c6f40dfe9589d9d0450a53b0add9a55f71371d601a5d454fa0431 """
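For files too large to read into memory at once, one common approach is to feed the hash object fixed-size chunks. The sketch below isn't the linked gist, just one way to do it. The function names `sha256_of_file` and `matches_published` and the 64 KiB chunk size are assumptions for illustration, and the code runs on both Python 2 and Python 3.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so memory use stays flat regardless of file size."""
    sha = hashlib.sha256()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read() until it returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            sha.update(chunk)
    return sha.hexdigest()

def matches_published(path, published_hex):
    """Compare a file's digest with the hex string copied from a download page."""
    # Normalize whitespace and case, since sites format digests differently
    return sha256_of_file(path) == published_hex.strip().lower()
```

You'd call it as `matches_published('kali.iso', published)`, where `published` is the hash string copied from the downloads page.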
There we go. You've got some pretty good knowledge of hashing now. So, go. Go on! Secure the integrity of your data and hash all the things!
Also, follow me on Twitter and Github, please.
Top comments (8)
Just a quick point, but it's important. As the hash can't be 100% guaranteed to be unique (it's just highly likely to be unique), it can only be used to determine if something is different, not to see if two things are the same (although the typical misuse is to compare for similarities). Given the hash space of SHA-1, it's fairly unlikely there will be a hash collision, but not impossible (just look at the recent issues in WebKit's SVN repository caused by hash collisions). Like I said, it's a small point, but an important one nonetheless.
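The point about collisions can be seen on a small scale. A full SHA-256 collision is far out of reach, but if you keep only the first 16 bits of each digest (an artificial truncation chosen purely for this demo), the pigeonhole principle guarantees that two different inputs share a value within 65,537 tries. Here's a minimal sketch, with `find_truncated_collision` as a made-up helper name:

```python
import hashlib

def find_truncated_collision(prefix_bytes=2):
    """Find two different inputs whose SHA-256 digests share the same leading bytes."""
    seen = {}  # maps truncated digest -> the first message that produced it
    counter = 0
    while True:
        message = str(counter).encode('ascii')
        prefix = hashlib.sha256(message).digest()[:prefix_bytes]
        if prefix in seen:
            return seen[prefix], message
        seen[prefix] = message
        counter += 1

first, second = find_truncated_collision()
print(first, second)
```

The two inputs are different, yet their truncated hashes are equal. With the full 256 bits the same thing is possible in principle, just astronomically harder to find.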
Hashes are also used for passwords, which are the epitome of "hey, make sure that the hash is unique to only one password!"
It's important to know that there are special hash algorithms for passwords that are specifically made for shorter strings (rather than files), and take a relatively long time to compute (so that it's harder to brute-force them).
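As an illustration of the slow, password-oriented hashing described here, Python's standard library exposes PBKDF2 through `hashlib.pbkdf2_hmac` (available in Python 2.7.8+ and 3.4+). This is a sketch only: the helper names and the iteration count are assumptions, not recommendations. It also mixes in a random salt so that identical passwords don't produce identical hashes.

```python
import hashlib
import hmac
import os

ITERATIONS = 200000  # assumed value for this sketch; tune it for your hardware

def hash_password(password, salt=None):
    """Return (salt, digest) for a password given as bytes."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each password
    digest = hashlib.pbkdf2_hmac('sha256', password, salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)

salt, stored = hash_password(b'correct horse battery staple')
print(verify_password(b'correct horse battery staple', salt, stored))
print(verify_password(b'wrong guess', salt, stored))
```

You store the salt next to the digest. It doesn't need to be secret; its job is to make precomputed lookup tables useless.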
The first two lines mislead more than they explain, and as Ashley Sheridan pointed out, they are not completely correct.
Thought it was an article about data structures. Nice one. But try to write something more.
Sweet and Simple.
This is a great, simple article!
What about salting?