String Hashing¶
Polynomial rolling hash maps a string to an integer so you can compare substrings in \(O(1)\) after \(O(n)\) preprocessing—subject to collision risk, so use two moduli or compare with a direct check when hashes match.
Polynomial hash¶
For base \(B\) and modulus \(M\), map characters to \([1, \sigma]\) and define:
\[
H(s_0,\ldots,s_{k-1}) = \left(\sum_{i=0}^{k-1} s_i \cdot B^{k-1-i}\right) \bmod M
\]
Rolling: remove left character, add right character, using precomputed powers of \(B\).
#include <string>
#include <vector>
using namespace std;
// pref[i+1] = hash of s[0..i]; powB[i] = B^i mod mod
void buildRollingHash(const string& s, long long B, long long mod,
vector<long long>& powB, vector<long long>& pref) {
int n = (int)s.size();
powB.assign(n + 1, 1);
for (int i = 1; i <= n; i++) powB[i] = powB[i - 1] * B % mod;
pref.assign(n + 1, 0);
for (int i = 0; i < n; i++)
pref[i + 1] = (pref[i] * B + (s[i] - 'a' + 1)) % mod;
}
// inclusive 0-based [l, r]
long long substringHash(const vector<long long>& pref, const vector<long long>& powB,
long long mod, int l, int r) {
return (pref[r + 1] - pref[l] * powB[r - l + 1] % mod + mod) % mod;
}
Use 64-bit unsigned long long with large random \(B\) for speed (still probabilistic), or two long long moduli for stronger guarantees.
Collisions¶
Different strings can share a hash. Mitigations: double hashing (two \((B, M)\) pairs), verify with memcmp when hashes match, or use KMP/Z for certainty on a single pattern.
Related¶
- KMP / Z Algorithm — exact linear matching without hashing
- Bitmask Tricks — when alphabet is tiny (bitmasks per character)