Skip to content

String Hashing

Polynomial rolling hash maps a string to an integer so you can compare substrings in \(O(1)\) after \(O(n)\) preprocessing—subject to collision risk, so use two moduli or compare with a direct check when hashes match.

Polynomial hash

For base \(B\) and modulus \(M\), map characters to \([1, \sigma]\) and define:

\[ H(s_0,\ldots,s_{k-1}) = \left(\sum_{i=0}^{k-1} s_i \cdot B^{k-1-i}\right) \bmod M \]

Rolling: remove left character, add right character, using precomputed powers of \(B\).

#include <string>
#include <vector>
using namespace std;

// pref[i+1] = hash of s[0..i]; powB[i] = B^i mod mod
void buildRollingHash(const string& s, long long B, long long mod,
                      vector<long long>& powB, vector<long long>& pref) {
    int n = (int)s.size();
    powB.assign(n + 1, 1);
    for (int i = 1; i <= n; i++) powB[i] = powB[i - 1] * B % mod;
    pref.assign(n + 1, 0);
    for (int i = 0; i < n; i++)
        pref[i + 1] = (pref[i] * B + (s[i] - 'a' + 1)) % mod;
}

// inclusive 0-based [l, r]
long long substringHash(const vector<long long>& pref, const vector<long long>& powB,
                        long long mod, int l, int r) {
    return (pref[r + 1] - pref[l] * powB[r - l + 1] % mod + mod) % mod;
}

Use 64-bit unsigned long long with large random \(B\) for speed (still probabilistic), or two long long moduli for stronger guarantees.

Collisions

Different strings can share a hash. Mitigations: double hashing (two \((B, M)\) pairs), verify with memcmp when hashes match, or use KMP/Z for certainty on a single pattern.