Hashing is a technique in Competitive Programming which is mostly used with string. In many cases, using hash can relieve you from doing other complex data structures or complicated dynamic programming optimization.
A drawback of hashing is that: you are not completely sure that your solution will work on all cases. If you are really really unlucky, your solution may fail one test-case and receive Wrong Answer. But the chance of failing is very low, and you can even lower that chance by using multiple hashes. So, in the scope of a programming contest, hashing is acceptable and can give you generous advantages. When applicable, hashing often helps you solve problems with a shorter time compared to the official solution.
It’s about 90% when we use Hash that we only care about lower-case English letters (not upper-case letters, digits nor other symbols). So in this post, I will assume that the string we hash contains only lower-case letters.
Hash function
There are several useful hash-functions. I will introduce in this post the most common one. It is common because it is simple and versatile. Let’s see:
Given a string s contains only lower-case English letters. As the length of s is s.length, we have:
s = s[0..(s.length-1)]
What we will do is to calculate an array, let’s name it ‘hash’, which has the same size as s, and hash[i] is a number that represents the substring s[0..i]. What I mean by “represents” is: when we compare 2 strings (or 2 substrings), we don’t need to loop through every character of the 2 strings (or substrings) to check if they are equal. Instead, we can compare their hash values. If their hash values are different, we are 100% sure that the 2 strings (or substrings) are different. If their hash values are equal, then the 2 strings or (substrings) are probably equal. 2 identical strings will have the same hash value. 2 different strings usually have different hash values, but there is very little chance that they can have the same hash value.
Back to the case of string s. Because s contains only lower-case letters (from a-z), so it can contain only 26 different characters. Let’s encode a equals to 1, b equals to 2, etc , z equals to 26. We will obtain array hash as follow:
hash[0] = s[0]
hash[1] = hash[0] +
hash[2] = hash[1] +
hash[3] = hash[2] +
…
hash[s.length – 1] = hash[s.length – 2] + s[s.length – 1]
For example, let s = “apple”. We have the array hash as:
(as ‘a’ = 1)
Finally, hash = [1, 417, 11233, 222145, 2597025].
We can model this process on a table:
Index | 0 | 1 | 2 | 3 | 4 |
Character | a | p | p | l | e |
Encode char to int | 1 | 16 | 16 | 12 | 5 |
Multiplier at index | 1 | 26 | |||
Hash | 1 | 417 | 11233 | 222145 | 2597025 |
As we said above, hash[i] is a representation of s[0..i]. So for example, the hash value of ‘app’ is 11233, the hash value of ‘appl’ is 222145, etc.
But what is the hash value of ‘ppl’ (‘ppl’ is s[1..3])? It turns out that we can easily compute it using the hash array.
Let’s recall, normally, we will compute the hash value of ‘ppl’ as:
Hash value of ‘ppl’ . Or:
Index | 0 | 1 | 2 |
Character | p | p | l |
Encode char to int | 16 | 16 | 12 |
Multiplier at index | 1 | 26 | |
Hash | 16 | 432 | 8544 |
But wait, we already had the hash array of string ‘apple’. And the table of ‘apple’ and ‘ppl’ have something very similar. This is your exercise: let’s compare the 2 tables and try to obtain the hash value of ‘ppl’ using the table of ‘apple’!
…
…
And here is the result: On the table of ‘apple’, first we will remove the index 4 because the character ‘e’ has nothing to do with ‘ppl’. Then, we also remove the index 0, because ‘a’ is also not in ‘ppl’. Then the table becomes:
Index | 0 | 1 | 2 | 3 | 4 |
Character | p | p | l | ||
Encode char to int | 16 | 16 | 12 | ||
Multiplier at index | 1 | 26 | |||
Hash | 416 | 11232 | 222144 |
Ok, we have just the string ‘ppl’ on the table now. But we have to do one more thing: shift the string ‘ppl’ to the left. Because as in the table of ‘ppl’, its characters start from index position 0.
By shifting to the left 1 unit, we just have to divide the hash value for 26. That is divide 222144 by 26, and we will get 8544, which is the hash value of ‘ppl’.
Formally, we have:
hash value of
This computation is done in just O(1), given we pre-calculated array hash and for all i from 0 to s.length – 1.
MOD value
As you might have noticed, when we do as above, if the length of string s is longer, values in our array hash will be overflowed, even if we use long long type. That’s why we need a MOD value. That is, in each computation above, we always take the result mod the MOD value.
E.g.:
, for
hash value of
modular-inverse()) .
(If you don’t know about modular-inverse, just search for “Fermat’s Little Theorem and Modular Inverse” on the internet.)
This will prevent our values from overflowing.
My code for Hash function is below.
struct HashFunction { string s; ll *hash; ll *modular_inverse; ll *pexp; ll p, HMOD; ll mul_mod(ll x, ll y, ll MOD) { if (MOD < (1<<31)) { if (x > MOD) x %= MOD; if (y > MOD) y %= MOD; return (x * y) % MOD; } else { ll res = 0, tmp = x%MOD; while (y > 0) { if (y&1) res = (res + tmp) % MOD; y >>= 1; tmp = (tmp + tmp) % MOD; } return res; } } ll pow_mod(ll x, ll exp, ll MOD) { if (exp == 0) return 1; if (x > MOD) x %= MOD; ll half = pow_mod(x, exp >> 1, MOD); if (exp & 1) { return mul_mod(mul_mod(half, half, MOD), x, MOD); } else { return mul_mod(half, half, MOD); } } HashFunction(string s_, ll p_ = 26, ll HMOD_ = 1000000007) : s(s_), p(p_), HMOD(HMOD_) { // compute hashing array and modular-inverse hash = new ll[s.size()]; modular_inverse = new ll[s.size()]; ll seed = 1; hash[0] = s[0] - '`'; modular_inverse[0] = 1; pexp[0] = 1; for (int i = 1; i < s.size(); ++i) { seed = (seed * p) % HMOD; hash[i] = (hash[i-1] + (s[i] - '`')*seed) % HMOD; modular_inverse[i] = pow_mod(seed, HMOD-2, HMOD); pexp[i] = pexp[i-1] * p % MOD; } } ll getHash(int i, int j) { // get hash of s[i..j] if (i == 0) return hash[j]; else return ((hash[j] - hash[i-1] + HMOD) * modular_inverse[i]) % HMOD; } ll isEqualSubstring(int i, int j, int u, int v) { // compare s[i..j] and s[u..v] if (i > u) { swap(i, u); swap(j, v); } ll hashFirst, hashSecond; if (i == 0) hashFirst = hash[j]; else hashFirst = ((hash[j] - hash[i-1] + HMOD) % HMOD); if (u == 0) hashSecond = hash[v]; else hashSecond = (hash[v] - hash[u-1] + HMOD) % HMOD; return (hashFirst * pexp[u-i] % HMOD) == hashSecond ; } }; // Example: // string s = "onetwothreetwoone"; // HashFunction hash(s); // cout << hash.isEqualSubstring(0, 2, 14, 16) << endl; // cout << hash.isEqualSubstring(0, 2, 13, 15) << endl;
Practice problems