Detection dimension · weight 5%
Sparse-Token Stress Test
What this dimension detects
Inspired by MiniMax's May 2026 investigation of the 'Ma Jiaqi (马嘉祺)' case. Each vendor's SFT data covers the model's vocabulary unevenly. Low-frequency tokens (rare CJK names, Chinese SEO spam, Japanese colloquial phrases, LaTeX / Wikipedia metadata, FIM special tokens) accumulate lm_head drift during SFT and fall out of the top-p sampling window. The model still understands them on input but can no longer generate them. Because the *set* of forgotten tokens differs by vendor, the failure pattern is a generation-side fingerprint that is independent of the tokenizer-boundary, logprobs, ITT and MMD dimensions.
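The top-p mechanism can be made concrete with a toy sketch. The logits are hypothetical and not measured from any model, and `softmax` and `nucleus` are illustrative helper names. Nucleus (top-p) sampling keeps the smallest set of highest-probability tokens whose cumulative mass reaches `top_p`. Once drift in a token's lm_head row pushes its logit down far enough, the token is never sampled, even if the model still understands it on input.

```python
import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Numerically stable softmax over a {token: logit} mapping."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def nucleus(logits: dict[str, float], top_p: float) -> set[str]:
    """Smallest set of highest-probability tokens whose mass reaches top_p."""
    ranked = sorted(softmax(logits).items(), key=lambda kv: kv[1], reverse=True)
    kept: set[str] = set()
    mass = 0.0
    for token, p in ranked:
        kept.add(token)
        mass += p
        if mass >= top_p:
            break
    return kept


# Hypothetical next-token logits after the prefix "马嘉" ("<other>" lumps the rest).
before = {"祺": 5.0, "琪": 3.0, "<other>": 1.0}  # target token well supported
after = {"祺": 0.5, "琪": 3.0, "<other>": 1.0}   # same context after lm_head drift
print("祺" in nucleus(before, 0.9))  # True
print("祺" in nucleus(after, 0.9))   # False: 祺 is outside the 0.9 nucleus
```

In the drifted case the homophone 琪 dominates the nucleus. This is the kind of near-neighbour substitution the probes look for.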
Algorithm
Send ~10 probes, each instructing the model to echo a known-fragile token string verbatim (no commentary, no reformatting). Classify each response as hit / omit / substitute / partial / refuse / blank. A substitution that matches a documented near-neighbour pattern is flagged with its historical note. Examples include 祺→琪 (a Chinese homophone), 嘉祺→千玺 (an lm_head drift) and 相続税 mixed with Korean/Russian text. The aggregate hit-rate drives the verdict. Failure modes and probe families are reported for forensics but do NOT vote for a specific vendor in scoring, because cross-vendor failure tables have not yet been measured.
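The per-probe classification can be sketched as follows. This minimal sketch rests on several assumptions. A "hit" is counted when the target appears verbatim anywhere in the response. The near-neighbour table holds only the two Chinese examples from the text. The refusal markers and the "partial" rule are hypothetical heuristics, not the dimension's actual rules: a response counts as partial when its longest surviving run covers at least half the target. Undocumented substitutions fall through to partial or omit here.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical near-neighbour table built from the examples in the text:
# probe target -> list of (observed variant, historical note).
NEAR_NEIGHBOURS: dict[str, list[tuple[str, str]]] = {
    "马嘉祺": [
        ("马嘉琪", "祺→琪 Chinese homophone substitution"),
        ("千玺", "嘉祺→千玺 lm_head drift"),
    ],
}

# Hypothetical, deliberately crude refusal markers.
REFUSAL_MARKERS = ("I can't", "I cannot", "I'm sorry", "抱歉", "无法")


@dataclass
class Outcome:
    label: str                  # hit / omit / substitute / partial / refuse / blank
    note: Optional[str] = None  # historical note for documented substitutes


def classify(target: str, response: str) -> Outcome:
    text = response.strip()
    if not text:
        return Outcome("blank")
    if target in text:
        return Outcome("hit")
    # Documented near-neighbour substitutions carry their historical note.
    for variant, note in NEAR_NEIGHBOURS.get(target, []):
        if variant in text:
            return Outcome("substitute", note)
    if any(marker in text for marker in REFUSAL_MARKERS):
        return Outcome("refuse")
    # Longest contiguous run of the target that survives in the response.
    matcher = SequenceMatcher(None, target, text, autojunk=False)
    match = matcher.find_longest_match(0, len(target), 0, len(text))
    if match.size * 2 >= len(target):
        return Outcome("partial")
    return Outcome("omit")
```

The check order matters. A verbatim hit wins even if commentary surrounds it, and a documented substitute is reported before the generic partial rule can absorb it.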
Thresholds
| Condition | Verdict contribution |
|---|---|
| hit-rate ≥ 80% | Match — SFT vocabulary coverage appears intact on these tokens |
| 50% ≤ hit-rate < 80% | Match (borderline) — flagged for inspection but does not vote |
| hit-rate < 50% AND ≥ 3 probes scored | Mismatch — substantial lm_head drift on tested tokens; if the claimed model is documented to echo these correctly on public benchmarks, the actual deployment is suspect |
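The threshold table maps to a verdict function like the one sketched below. The sketch makes two assumptions. First, a probe counts as "scored" when it produced any of the six outcome labels, and `None` marks an unscored probe, for example a failed request. Second, a hit-rate below 50% with fewer than 3 scored probes is treated as inconclusive, a case the table leaves open.

```python
from typing import Optional, Sequence


def verdict(outcomes: Sequence[Optional[str]]) -> tuple[str, float, bool]:
    """Map per-probe outcome labels to (verdict, hit_rate, votes).

    None marks a probe that could not be scored (an assumption of this sketch).
    """
    scored = [o for o in outcomes if o is not None]
    if not scored:
        return ("inconclusive", 0.0, False)
    hit_rate = sum(o == "hit" for o in scored) / len(scored)
    if hit_rate >= 0.8:
        return ("match", hit_rate, True)
    if hit_rate >= 0.5:
        # Borderline: flagged for inspection but does not vote.
        return ("match (borderline)", hit_rate, False)
    if len(scored) >= 3:
        return ("mismatch", hit_rate, True)
    # Low hit-rate on fewer than 3 scored probes: the table gives no rule,
    # so this sketch abstains.
    return ("inconclusive", hit_rate, False)
```

For example, eight hits and two omits yield a hit-rate of 0.8 and a voting "match". One hit out of three scored probes yields a voting "mismatch".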
Limitations
Probes are descriptive, not diagnostic. A failure tells you 'this model's SFT data was thin on this token'. It does NOT directly identify which other model is being served. Cross-vendor failure tables have not yet been measured at sufficient scale to vote for a specific suspected model. Chinese / Japanese / Korean (CJK) probes are language-specific, so auditing a code model trained mostly on English data will produce noisy results from this dimension. Special-token / LaTeX / Wikipedia probes are off by default in the test set because all chat-tuned models legitimately fail them.
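The text describes a test set in which some probe families are off by default. The sketch below shows one way to encode that. The `Probe` record, the family tag strings and `select_probes` are all hypothetical and do not reflect the test set's actual schema.

```python
from dataclasses import dataclass
from typing import Iterable

# Families the text describes as off by default because chat-tuned models
# legitimately fail them. The tag strings themselves are hypothetical.
DEFAULT_OFF_FAMILIES = frozenset({"special_token", "latex", "wikipedia"})


@dataclass(frozen=True)
class Probe:
    target: str  # the fragile string the model is asked to echo verbatim
    family: str  # hypothetical tag, e.g. "cjk_name", "ja_colloquial", "latex"


def select_probes(probes: Iterable[Probe], include_default_off: bool = False) -> list[Probe]:
    """Return the probes to run, dropping default-off families unless opted in."""
    return [
        p for p in probes
        if include_default_off or p.family not in DEFAULT_OFF_FAMILIES
    ]
```

The opt-in flag keeps the default-off probes available for forensics without letting them dilute the hit-rate in routine audits.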
References
- MiniMax. Internal investigation: Ma Jiaqi (马嘉祺) sparse-token forgetting and lm_head drift, May 2026.
- Lin et al. Mitigating the Alignment Tax of RLHF. 2024. (For the underlying catastrophic-forgetting-during-SFT mechanism.)