每日一题 2024_2_18_problem100208_周赛T4

算法

Publish Date: 2024-02-18

今天周赛第四题遇到一个双模 Hash，上代码：


const int N = 100100;
typedef unsigned long long ULL;

namespace std {
    template <>
    struct hash<pair<ULL, ULL>> {
        size_t operator()(const pair<ULL, ULL>& p) const {
            return hash<ULL>()(p.first) ^ hash<ULL>()(p.second);
        }
    };
}

class Solution {
    ULL h[N], p[N], h2[N], p2[N], P = 13331, P2 = 131, mod = 1e9 + 7, mod2 = 1e9 + 9;
    unordered_map<pair<ULL, ULL>, int> mp; // 不要用 pair
public:
    ULL get1(int l, int r) {
        return (h[r] - (h[l - 1] * p[r - l + 1]) % mod + mod) % mod;
    }
    ULL get2(int l, int r) {
        return (h2[r] - (h2[l - 1] * p2[r - l + 1]) % mod2 + mod2) % mod2;
    }
    long long countPrefixSuffixPairs(vector<string>& words) {
        long long ans = 0;
        mp.clear();
        p[0] = p2[0] = 1;
        for (int j = 1; j < N; ++j) {
            p[j] = p[j - 1] * P % mod;
            p2[j] = p2[j - 1] * P2 % mod2;
        }
        for (int i = words.size() - 1; i >= 0; --i) {
            h[0] = 0, h2[0] = 0;
            int n = words[i].length();
            for (int j = 1; j <= n; ++j) {
                h[j] = (h[j - 1] * P % mod + words[i][j - 1] - '0' + 1) % mod;
                h2[j] = (h2[j - 1] * P2 % mod2 + words[i][j - 1] - '0' + 1) % mod2;
            }
            if (mp.count({get1(1, n), get2(1, n)})) ans += mp[{get1(1, n), get2(1, n)}];
            for (int j = 1; j <= n; ++j) {
                if (get1(1, j) == get1(n - j + 1, n) && get2(1, j) == get2(n - j + 1, n)) mp[{get1(1, j), get2(1, j)}]++;
            }
        } // 关键是要理解字符串哈希
        return ans;
    }
};

也可以直接用 set，效果差不多。

双模 Hash 本质上和单模 Hash + 自然溢出差不多，只是分别选取进制位和模数，进一步降低了冲突的概率。

常用的模数 1e9 + 7, 1e9 + 9。

常用的进制数 131, 13331。

其原理是取一固定值 P, 把字符串看作是 P 进制数，并分配一个大于 0 的数值，代表每种字符。一般来说，分配的数值都远小于 P。对于小写字母构成的字符串，可以令 a = 1, b = 2, … z = 26。
取一个固定值 M，求出该 P 进制数对 M 的余数，作为该字符串的 Hash 值。可以取多个这样的数，组成结构体（或元组），当结构体中的数依次对应相等时，认为字符串相等。这样就更难构出让这个 Hash 产生错误的数据。

对字符串的各种操作，都可以直接对 P 进制数进行算术运算反映到 Hash 值上。
转换公式：$$H(S+T) = (H(S) * P^{length(T)}+ H(T)) mod M$$

所以可以采取减去前缀倍数的方法求 [L, R] 字串的哈希值。

Yixiang Zhang

http://zxdfcv.github.io/2024/02/18/%E6%AF%8F%E6%97%A5%E4%B8%80%E9%A2%98-2024-2-18-problem100208-%E5%91%A8%E8%B5%9BT4/

All articles in this blog are used except for special statements CC BY 4.0 reprint policy. If reproduced, please indicate source Yixiang Zhang !

C++ 每日一题算法哈希字符串

Algorithm Trie 及其变形

Trie 及其变形

2024-02-18 算法

C++ 算法字典树

ICT 实验室信息

2024-02-17 体系结构

ICT