
Dimension · score weight 20%

MMD Distribution Equivalence Test

What this dimension detects

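Maximum Mean Discrepancy (MMD) is a kernel two-sample test. Gao et al. (ICLR 2025) apply it to model equality testing, that is, checking whether an API serves the model it claims to serve. TrueLLMs runs it only in differential mode. The user must supply samples from a trusted reference endpoint, and there must be enough stochastic (temperature > 0) samples from both endpoints.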
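The population quantity and one standard unbiased estimator are given below for reference. P and Q are the response distributions of the audited and reference endpoints. x and x′ are independent draws from P, and y and y′ are independent draws from Q. x_1..x_m and y_1..y_n are the collected samples, and k is the kernel.

This page does not say whether TrueLLMs uses this unbiased estimator or the biased (V-statistic) variant, so treat the estimator choice as an assumption. The Hamming kernel on the first L = 100 characters is written here normalized by L. That normalization is also an assumption.

```latex
\mathrm{MMD}^2(P, Q) = \mathbb{E}_{x,x' \sim P}[k(x,x')] + \mathbb{E}_{y,y' \sim Q}[k(y,y')] - 2\,\mathbb{E}_{x \sim P,\; y \sim Q}[k(x,y)]

\widehat{\mathrm{MMD}}^2_u = \frac{1}{m(m-1)} \sum_{i \neq j} k(x_i, x_j) + \frac{1}{n(n-1)} \sum_{i \neq j} k(y_i, y_j) - \frac{2}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} k(x_i, y_j)

k(x, y) = \frac{1}{L} \sum_{\ell=1}^{L} \mathbf{1}[x_\ell = y_\ell], \qquad L = 100
```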

Algorithm

Collect response samples at temperature > 0 from both the audited endpoint and the trusted reference endpoint, grouped by prompt. Then:

  • Form one block per prompt that holds that prompt's audited and reference samples.
  • Truncate every response to its first 100 raw characters.
  • Compute MMD² with a Hamming kernel.
  • Estimate the p-value with stratified permutations, which shuffle the endpoint labels only within each prompt block.
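The algorithm steps above can be sketched in TypeScript as below. This is a hypothetical illustration, not the code in lib/identity-audit/mmd.ts. The following are all assumptions:

  • the helper names (hammingKernel, mmd2, stratifiedMmdTest);
  • padding responses shorter than 100 characters with "\0";
  • normalizing the Hamming kernel by prefix length;
  • averaging the per-prompt unbiased MMD² estimates into one statistic;
  • the seeded mulberry32 generator.

Each prompt block needs at least two samples from each endpoint.

```typescript
// Hypothetical sketch, NOT the TrueLLMs implementation in lib/identity-audit/mmd.ts.
// Assumptions: unbiased MMD² per prompt block, averaged across blocks; "\0" padding for
// short responses; a seeded PRNG so the permutation p-value is reproducible.

const PREFIX_LEN = 100; // compare only the first 100 raw characters of each response

// One block per prompt; each side needs at least 2 samples for the unbiased estimator.
type Block = { audited: string[]; reference: string[] };

// Hamming kernel: fraction of the PREFIX_LEN aligned positions where two prefixes agree.
function hammingKernel(a: string, b: string): number {
  const pa = a.slice(0, PREFIX_LEN).padEnd(PREFIX_LEN, "\0");
  const pb = b.slice(0, PREFIX_LEN).padEnd(PREFIX_LEN, "\0");
  let same = 0;
  for (let i = 0; i < PREFIX_LEN; i++) if (pa[i] === pb[i]) same++;
  return same / PREFIX_LEN;
}

// Unbiased MMD² from a Gram matrix K over pooled samples; xs and ys are index sets into K.
function mmd2(K: number[][], xs: number[], ys: number[]): number {
  let kxx = 0, kyy = 0, kxy = 0;
  for (const i of xs) for (const j of xs) if (i !== j) kxx += K[i][j];
  for (const i of ys) for (const j of ys) if (i !== j) kyy += K[i][j];
  for (const i of xs) for (const j of ys) kxy += K[i][j];
  const m = xs.length, n = ys.length;
  return kxx / (m * (m - 1)) + kyy / (n * (n - 1)) - (2 * kxy) / (m * n);
}

// mulberry32: small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// In-place Fisher-Yates shuffle.
function shuffle<T>(arr: T[], rng: () => number): T[] {
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
}

function stratifiedMmdTest(
  blocks: Block[],
  permutations = 999,
  seed = 1,
): { statistic: number; pValue: number } {
  const rng = mulberry32(seed);
  // Precompute one Gram matrix per prompt block over pooled (audited, then reference) prefixes.
  const prepared = blocks.map((b) => {
    const pooled = [...b.audited, ...b.reference];
    return {
      K: pooled.map((s) => pooled.map((t) => hammingKernel(s, t))),
      idx: pooled.map((_, i) => i),
      m: b.audited.length,
    };
  });
  // Mean per-block MMD²: the first m indices of each order are treated as "audited".
  const meanMmd2 = (orders: number[][]): number =>
    prepared.reduce((sum, p, k) => sum + mmd2(p.K, orders[k].slice(0, p.m), orders[k].slice(p.m)), 0) /
    prepared.length;

  const observed = meanMmd2(prepared.map((p) => p.idx));
  let atLeast = 0;
  for (let r = 0; r < permutations; r++) {
    // Stratified permutation: endpoint labels are shuffled only inside each prompt block.
    const orders = prepared.map((p) => shuffle([...p.idx], rng));
    if (meanMmd2(orders) >= observed - 1e-12) atLeast++; // tolerance absorbs float summation order
  }
  // Add-one correction keeps the estimated p-value strictly positive.
  return { statistic: observed, pValue: (atLeast + 1) / (permutations + 1) };
}
```

Each block's Gram matrix is computed once. Every permutation then only re-indexes cached kernel values, so about a thousand permutations stay cheap at these sample sizes. Shuffling within blocks keeps each prompt's sample counts fixed. As a result, differences in which prompts were asked cannot by themselves drive the test.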

Thresholds

Condition | Verdict contribution
No trusted reference, temperature ≤ 0, < 5 prompt pairs, or < 40 total samples | Unavailable; no synthetic baseline is invented
p ≥ 0.05 | No statistically significant distribution difference observed
p < 0.05 | Scored as a distribution mismatch; the cause still needs interpretation
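The threshold table can be read as a small decision function. This sketch uses hypothetical field names (hasTrustedReference, promptPairs, totalSamples). It assumes that "total samples" counts audited and reference samples together. It only maps conditions to the verdict labels in the table and does not model how the 20% score weight is applied.

```typescript
// Hypothetical sketch of the availability gate and verdict mapping; field names are illustrative.
type MmdVerdict = "unavailable" | "no_significant_difference" | "distribution_mismatch";

interface MmdRunSummary {
  hasTrustedReference: boolean; // user supplied a trusted reference endpoint
  temperature: number;          // sampling temperature used for both endpoints
  promptPairs: number;          // prompts with samples from both endpoints
  totalSamples: number;         // assumption: audited + reference samples combined
  pValue?: number;              // stratified permutation p-value, if the test ran
}

const ALPHA = 0.05;

function mmdVerdict(s: MmdRunSummary): MmdVerdict {
  const insufficient =
    !s.hasTrustedReference || s.temperature <= 0 || s.promptPairs < 5 || s.totalSamples < 40;
  // No synthetic baseline: without a usable reference run the dimension is simply unavailable.
  if (insufficient || s.pValue === undefined) return "unavailable";
  // p >= 0.05: no significant difference; p < 0.05: scored mismatch whose cause still needs interpretation.
  return s.pValue < ALPHA ? "distribution_mismatch" : "no_significant_difference";
}
```

Note the boundaries: exactly 5 prompt pairs and exactly 40 samples are enough to run the test. A p-value of exactly 0.05 counts as no significant difference.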

Limitations

A rejected null hypothesis means only that the two response distributions differ. It does not by itself show that a different model is being served. Quantization, fine-tuning, system prompts, regional routing, safety layers, and post-processing can all produce such a difference. The test is most informative when the reference endpoint is an official endpoint, controlled by the user, for the same claimed model.

References

  • Gao et al. Model Equality Testing: Which Model is this API Serving? ICLR 2025. arXiv:2410.20247
  • TrueLLMs lib/identity-audit/mmd.ts
