Hashing is a technique used to transform an input of variable length into a fixed-size output, known as a message digest or hash value, from which the original input cannot be recovered. Storing passwords in software systems, ensuring the integrity of messages during communication and creating indexes in databases are some examples of where hashing is used. In this blog post, I discuss what hashing is and how it is used in real-world applications. In addition, I provide a sample Java application for hashing, which you can use to learn the concept and also refer to when applying hashing techniques in the software applications you develop.
The following diagram gives a simple explanation of hashing. We have some input (text) value, and using a hash function (e.g. MD5, SHA-1, SHA-256) we transform this input into a fixed-length output, which we call the hash value. Given only this hash value, it is not feasible to obtain the original input. That is why hashing is called irreversible.
If we need to preserve the data in its original form for later use, we cannot use hashing. In such cases, we can encrypt the data and store it, so that we can later decrypt it and obtain the original value.
Some of the examples discussed in this post are taken from the book Official (ISC)2 Guide to the CSSLP [1].
Hashing for Integrity Verification
Let’s say John wants to send a message to Jessie. Once Jessie receives the message, how can she verify that it is the original message sent by John and that it was not altered by some other party during the communication?
Here, John can write the message and calculate its hash value (H1) using a hash function (e.g. SHA-1). He then sends Jessie the message along with the hash value H1, which we call the message digest [2]. When Jessie receives the message and H1, she calculates the hash value of the received message again (H2) using the same hash function John used. If H2 matches H1, the message was not altered; if it had been altered, H2 would not match H1. Note that a plain hash only detects tampering if H1 itself cannot be replaced: an attacker who can modify the message can also recompute its hash, so in practice the digest is protected with a shared secret key (as in HMAC) or a digital signature.
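The integrity check above can be sketched with the JDK's java.security.MessageDigest class. This is a minimal illustration, not the post's sample application: it uses SHA-256 instead of SHA-1 (SHA-1 is no longer considered collision resistant), and the class name IntegrityCheck and the message text are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class IntegrityCheck {

    // Computes the hash of a message with the named algorithm, e.g. "SHA-256".
    static byte[] digest(String message, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            return md.digest(message.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
    }

    // Jessie's side: recompute H2 over the received message and compare it with H1.
    static boolean verify(String receivedMessage, byte[] h1, String algorithm) {
        byte[] h2 = digest(receivedMessage, algorithm);
        // isEqual examines every byte instead of stopping at the first mismatch.
        return MessageDigest.isEqual(h2, h1);
    }

    // Hex-encodes a digest so it can be printed or sent as text.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String message = "Meet me at 10 AM";     // hypothetical message from John
        byte[] h1 = digest(message, "SHA-256");   // John sends H1 along with the message
        System.out.println("H1 = " + toHex(h1));
        System.out.println("original verifies: " + verify(message, h1, "SHA-256"));
        System.out.println("altered verifies:  " + verify("Meet me at 11 AM", h1, "SHA-256"));
    }
}
```

Running main prints true for the original message and false for the altered one, since changing a single character produces a completely different SHA-256 value.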
Hashing for Password Storage and Authentication
If you develop a software application that manages user accounts, how are you going to store the users’ passwords? If you store the passwords in plain text, anyone who has access to the userstore (database, LDAP or Active Directory) where the users are stored can see all the passwords, and that person can even log in to the system using the credentials of any user, which must be prevented at all costs. We can use hashing to overcome this risk.
Instead of storing the plain text password as it is, we can hash the password using a hash function and store the hashed value. When a user tries to log in to the system by providing their plain text password, the system hashes the password using the same hash function and compares the result with the value stored in the userstore (e.g. a database). If they match, the user is successfully authenticated.
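A minimal sketch of this store-and-compare flow follows, assuming SHA-256 as the hash function and an in-memory HashMap as a stand-in for the userstore. The class and method names are hypothetical, and the sketch deliberately omits salting so that the weakness discussed next is visible.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class UnsaltedPasswordStore {

    // Hypothetical userstore: username -> hex-encoded SHA-256 of the password.
    private final Map<String, String> userstore = new HashMap<>();

    static String sha256Hex(String input) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Registration: only the hash is stored, never the plain text password.
    void register(String username, String password) {
        userstore.put(username, sha256Hex(password));
    }

    // Login: hash the supplied password the same way and compare with the stored value.
    boolean authenticate(String username, String password) {
        String stored = userstore.get(username);
        if (stored == null) {
            return false;
        }
        return MessageDigest.isEqual(
                stored.getBytes(StandardCharsets.US_ASCII),
                sha256Hex(password).getBytes(StandardCharsets.US_ASCII));
    }

    String storedHash(String username) {
        return userstore.get(username);
    }

    public static void main(String[] args) {
        UnsaltedPasswordStore store = new UnsaltedPasswordStore();
        store.register("john", "tiger123");
        System.out.println("correct password: " + store.authenticate("john", "tiger123"));
        System.out.println("wrong password:   " + store.authenticate("john", "tiger124"));
    }
}
```

Because no salt is used, two users with the same password end up with identical stored values.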
In the above diagram, John and Jessie both have the same password ‘tiger123’. When we hash this password using a particular hashing algorithm, both get the same hashed value. So even with hashing, the security of the passwords cannot be fully guaranteed. Let’s say John uses a dictionary word as his password. An attacker can prepare a list of hash values for all the words in the dictionary. If he then gains access to the userstore where the hashed passwords are stored, he can compare the hashed dictionary words with the hashed user passwords and look for matching records. That way, the attacker can find out John’s password. To protect hashed passwords against such dictionary attacks, we can use a technique called salting.
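The dictionary attack described above can be sketched as a precomputed lookup table. This is an illustrative sketch only: it assumes the stolen values are unsalted SHA-256 hashes, and the tiny word list, user names and class name are hypothetical ("tiger123" is treated as a dictionary word for the example).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DictionaryAttackDemo {

    static String sha256Hex(String input) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // The attacker hashes every candidate word once, building a hash -> word table.
    static Map<String, String> buildLookupTable(List<String> dictionary) {
        Map<String, String> table = new HashMap<>();
        for (String word : dictionary) {
            table.put(sha256Hex(word), word);
        }
        return table;
    }

    // Each stolen unsalted hash is then "reversed" with a single table lookup.
    static Map<String, String> crack(Map<String, String> stolenHashes, Map<String, String> table) {
        Map<String, String> recovered = new HashMap<>();
        for (Map.Entry<String, String> entry : stolenHashes.entrySet()) {
            String word = table.get(entry.getValue());
            if (word != null) {
                recovered.put(entry.getKey(), word);
            }
        }
        return recovered;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("apple", "dragon", "tiger123", "sunshine");
        Map<String, String> stolen = new HashMap<>();
        stolen.put("john", sha256Hex("tiger123"));
        stolen.put("jessie", sha256Hex("tiger123"));
        stolen.put("alice", sha256Hex("q7#Vx!2pLm"));  // not a dictionary word
        // Recovers john's and jessie's passwords, but not alice's.
        System.out.println(crack(stolen, buildLookupTable(dictionary)));
    }
}
```

The table is built once and reused against every user, which is exactly what a per-user salt prevents.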
In salting, the system generates a random set of characters, which we call the salt. The user’s plain text password is then appended with this salt value, and the combined value is sent to the hash function to obtain the hash value. This value is called the salted hash. The system has to remember the salt value generated for each user. When a user tries to log in, the system retrieves the user’s salt value, appends it to the plain text password, hashes the result and compares this salted hash with the value stored in the system. If they match, the user is successfully authenticated.
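The salting procedure above can be sketched as follows. Assumptions: the salt is 16 random bytes from java.security.SecureRandom, Base64-encoded so it can be appended as text; the hash is a single SHA-256 over password + salt, mirroring the description rather than production practice (real systems usually prefer a deliberately slow function such as PBKDF2, bcrypt or Argon2). Class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

class SaltedPasswordStore {

    // Per-user record: the salt must be remembered alongside the salted hash.
    static final class Record {
        final String salt;
        final byte[] saltedHash;

        Record(String salt, byte[] saltedHash) {
            this.salt = salt;
            this.saltedHash = saltedHash;
        }
    }

    private static final SecureRandom RANDOM = new SecureRandom();
    private final Map<String, Record> userstore = new HashMap<>();

    // Generates a fresh random salt: 16 bytes, Base64-encoded as text.
    static String generateSalt() {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Salted hash = SHA-256(password + salt), in the order described above.
    static byte[] saltedHash(String password, String salt) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest((password + salt).getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    void register(String username, String password) {
        String salt = generateSalt();
        userstore.put(username, new Record(salt, saltedHash(password, salt)));
    }

    // Login: fetch the user's salt, rehash the supplied password with it, compare.
    boolean authenticate(String username, String password) {
        Record record = userstore.get(username);
        if (record == null) {
            return false;
        }
        return MessageDigest.isEqual(record.saltedHash, saltedHash(password, record.salt));
    }

    Record recordFor(String username) {
        return userstore.get(username);
    }

    public static void main(String[] args) {
        SaltedPasswordStore store = new SaltedPasswordStore();
        store.register("john", "tiger123");
        store.register("jessie", "tiger123");
        System.out.println("john logs in: " + store.authenticate("john", "tiger123"));
        System.out.println("same stored hash: " + MessageDigest.isEqual(
                store.recordFor("john").saltedHash, store.recordFor("jessie").saltedHash));
    }
}
```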
In the following example, John and Jessie both have the same password ‘tiger123’. The system generates the salt ‘1234ABC’ for John and ‘9876XYZ’ for Jessie.
After adding the salt to John’s password, we get “tiger1231234ABC”, and for Jessie we get “tiger1239876XYZ”. When these values are sent to the hash function, John and Jessie get different hash values. With salting, a list of precomputed dictionary hashes no longer matches the stored values, and identical passwords no longer produce identical hashes, so an attacker has to repeat the dictionary attack separately for each user’s salt.
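The worked example can be reproduced directly. This sketch assumes SHA-256 as the hash function (the post does not name one for this example) and uses the salts from the text; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class SaltExample {

    static String sha256Hex(String input) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Combined input = password followed by salt.
    static String salted(String password, String salt) {
        return password + salt;
    }

    public static void main(String[] args) {
        String password = "tiger123";
        String john = salted(password, "1234ABC");    // "tiger1231234ABC"
        String jessie = salted(password, "9876XYZ");  // "tiger1239876XYZ"
        System.out.println("unsalted: " + sha256Hex(password));
        System.out.println("John:     " + sha256Hex(john));
        System.out.println("Jessie:   " + sha256Hex(jessie));
    }
}
```

All three printed values differ, even though both users chose the same password.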
Hashing for Indexing in Databases
When hashing is used for database indexing, a set of records is divided into groups called buckets. The bucket for each record is chosen by applying a hash function to the record’s key. To perform a search, the search key is sent to the same hash function, which returns a hash value pointing to a particular bucket, so only the records in that bucket need to be examined. For more information, see [3].
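The bucket idea can be sketched in memory. This is not how any particular database implements its hash indexes (those use their own hash functions and on-disk pages); the sketch assumes String.hashCode() as the hash function and a fixed number of buckets, and the HashIndex class and sample keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class HashIndex {

    // One row of a hypothetical table: a key and its associated value.
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<List<Row>> buckets = new ArrayList<>();

    HashIndex(int bucketCount) {
        if (bucketCount <= 0) {
            throw new IllegalArgumentException("bucketCount must be positive");
        }
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Hashes the key and maps the result to a bucket number in [0, bucketCount).
    int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    void insert(String key, String value) {
        buckets.get(bucketFor(key)).add(new Row(key, value));
    }

    // A search hashes the key and scans only the matching bucket, not the whole table.
    String find(String key) {
        for (Row row : buckets.get(bucketFor(key))) {
            if (row.key.equals(key)) {
                return row.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        HashIndex index = new HashIndex(4);
        index.insert("E1001", "John");
        index.insert("E1002", "Jessie");
        index.insert("E1003", "Alice");
        System.out.println("E1002 -> bucket " + index.bucketFor("E1002") + ", " + index.find("E1002"));
    }
}
```

Math.floorMod keeps the bucket number non-negative even when hashCode() returns a negative value.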
Hashing Functions
Some of the most common hash functions are MD2, MD4 and MD5, which were all designed by Ronald Rivest; the Secure Hash Algorithm family (SHA-0, SHA-1 and SHA-2), designed by the NSA and published by NIST to complement digital signatures; and HAVAL. Rivest’s MD series of algorithms generate a fixed 128-bit output and have been shown not to be collision resistant. SHA-0 and SHA-1 generate a fixed 160-bit output. The SHA-2 family includes SHA-224 and SHA-256, which generate 224-bit and 256-bit outputs respectively, and SHA-384 and SHA-512, which generate 384-bit and 512-bit outputs respectively. HAVAL is distinct in being able to produce hashes of variable length (128 to 256 bits). HAVAL also lets users choose the number of rounds (3 to 5) used to generate the hash, for increased security. As a general rule of thumb, the greater the bit length of the hash value, the greater the protection it provides, because the work factor for cryptanalysis grows significantly. So when designing software, it is important to consider the bit length of the hash value that is supported [1].
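The output sizes listed above can be checked against the JDK's built-in providers using MessageDigest.getDigestLength(), which returns the digest length in bytes. The sketch only covers algorithm names the standard JDK provides; HAVAL and MD4 are not available there without a third-party provider. The class name is hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestSizes {

    // Returns the output size of the named algorithm in bits.
    static int digestBits(String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm).getDigestLength() * 8;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException(algorithm + " is not available", e);
        }
    }

    public static void main(String[] args) {
        String[] algorithms = {"MD5", "SHA-1", "SHA-224", "SHA-256", "SHA-384", "SHA-512"};
        for (String algorithm : algorithms) {
            System.out.println(algorithm + ": " + digestBits(algorithm) + " bits");
        }
    }
}
```

This prints 128, 160, 224, 256, 384 and 512 bits respectively.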
When selecting a hash function for your systems, it is important to check the strength of the function. For example, practical collisions have been demonstrated for MD5 [4]. A collision occurs when two different inputs produce the same hash value.
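Finding real MD5 collisions requires specially constructed inputs, but the idea of a collision, and why short outputs are weak, can be shown on a deliberately truncated hash. This sketch keeps only the first 2 bytes (16 bits) of SHA-256 and hashes "msg0", "msg1", … until two inputs share the same truncated value. By the birthday bound, an n-bit hash needs roughly 2^(n/2) attempts, so a few hundred tries usually suffice here; by the pigeonhole principle the search always ends within 2^16 + 1 inputs. The input naming scheme and class name are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class TruncatedCollision {

    static byte[] sha256(String input) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Keeps only the first `bytes` bytes (1 to 3) of SHA-256: a deliberately tiny hash.
    static int truncatedHash(String input, int bytes) {
        if (bytes < 1 || bytes > 3) {
            throw new IllegalArgumentException("bytes must be between 1 and 3");
        }
        byte[] full = sha256(input);
        int value = 0;
        for (int i = 0; i < bytes; i++) {
            value = (value << 8) | (full[i] & 0xff);
        }
        return value;
    }

    // Hashes "msg0", "msg1", ... until two distinct inputs share a truncated hash.
    static String[] findCollision(int bytes) {
        Map<Integer, String> seen = new HashMap<>();
        for (long i = 0; ; i++) {
            String input = "msg" + i;
            String previous = seen.putIfAbsent(truncatedHash(input, bytes), input);
            if (previous != null) {
                return new String[] {previous, input};
            }
        }
    }

    public static void main(String[] args) {
        String[] pair = findCollision(2);  // 16-bit truncated hash
        System.out.println(pair[0] + " and " + pair[1] + " both hash to "
                + Integer.toHexString(truncatedHash(pair[0], 2)));
    }
}
```

The full 256-bit hashes of the two inputs still differ; only the truncated versions collide, which illustrates why longer outputs make collisions much harder to find.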
Sample Java Application with Hashing Implementation
To demonstrate hashing, I have developed a sample Java Swing GUI application. The source code of the application is available at [5] as a Maven project. If you want to try it out without building it from source, you can download the jar file from [6] and run it with the “java -jar <jar file name>” command.
Here you can type a password in plain text, select a hashing algorithm and obtain the hash value of the password. Without salting, a given password always produces the same output for a given hashing algorithm.
In the application, you can use salting by checking the “Salted Password” checkbox. If you select “Auto Generate Salt” option, the application itself will generate a salt value for you.
Alternatively, you can uncheck the “Auto Generate Salt” option and provide your own salt value for hashing the password.
You can verify the result obtained from this application using the online tool [7].
References
Tharindu Edirisinghe
Platform Security Team
WSO2