Intro
A hash function is a function that maps an input of any size to a fixed-size hash output. The output is unique for the input and is also called a hash value, digest, or simply hash. Hash functions have the following properties:
- Deterministic
- Arbitrary length of input
- Fixed-sized output
- Efficiently computable (A small change in plaintext leads to the big change in digest.)
Cryptographic hash functions
Hash functions with special properties are called cryptographic hash functions. These special characteristics include:
- Pre-image resistance Pre-image resistance means that given a hash value, it is computationally infeasible to find any input that produces that specific hash value. Essentially, it should be very hard to find such that
- Second pre-image resistance Second pre-image resistance means that given an input and its hash value, it is computationally infeasible to find a different input that produces the same hash value. Essentially, it should be very hard to find different from given such that .
- Collision resistance Collision resistance means that it is computationally infeasible to find two different inputs that produce the same hash value. Essentially, it should be very hard to find any two different values and that . To satisfy this property, hash value should be twice as long as that required for preimage resistance. Also, cryptographic hash function can be considered as a one-way function.
Funny Facts About Collision-Resistance
Actually, collision-resistance does not mean that “there is no collision.” It means that collisions cannot be found in polynomial time. For example, collisions for previously popular hash functions like MD5 and SHA-1 were found in 2007 and 2017, respectively. The image below shows PDF files that have different contents but the same hash values. Therefore, hash functions such as SHA-2 or SHA-3, popular these days, are said to be secure because no collisions have been found yet. There is also a bounty offered for finding hash collisions of SHA-1, SHA-256, etc., in bitcoin script.
Different Types of Hash Functions
- SHA-256 SHA-256 is one of implementations in SHA-2. SHA-256 is used heavily Bitcoin and others.
- KECCAK-256 KECCAK-256 is an implementation of SHA-3. It is used in Ethereum.
Use cases
Data structure
Hash table
A hash function is usually used for a quick search of data from a large database (e.g., Hash table). Hash functions used in this case are not required to be cryptographically secure. However, for efficiency, it is recommended to use hash functions with few collisions.
Merkle tree
A Merkle Tree is a binary tree where each leaf node is a hash of an original value, and each non-leaf node is a hash of its child nodes. The root hash (Merkle Root) represents all transactions in the block. This structure allows inclusion proof and consistency proof.
Network
Message integrity
Message integrity can be verified by comparing hash values before and after the transmission. A cryptographic hash function is needed here because attackers can spoof the original message. In PKI(Public key infrastructure), the sender signs the message and the digest of the message to ensure integrity (Digital signature).
Blockchain
PoW(Proof of Work)
In PoW(Proof of Work),miners compete to solve a cryptographic puzzle, which involves finding a Nonce. The nonce is combined with the block data and passed through a hash function to produce a hash value that meets certain conditions.
Block header
Each block in the blockchain contains a block header and a list of transactions. The block header includes the previous block’s hash, a timestamp, and a nonce. This chaining of blocks ensures data integrity and security.
Transaction ID
A transaction ID in the blockchain is generated by hashing the transaction’s data. This unique hash is used as the identifier for the transaction. It ensures each transaction can be uniquely referenced and verified.