All dimensions

Detection dimension · weight 1%

Self-Identification Probe

What this dimension detects

Asking 'what model are you?' is the weakest possible signal. Proxies trivially override system prompts. We include it for completeness and weight it at 1%.

Algorithm

Send 'What model are you?' and check the response for the claimed model name.

Thresholds

ConditionVerdict contribution
Claimed model mentionedMatch
Different model mentionedMismatch
No mentionInconclusive

Limitations

Almost any proxy can pass or fail this dimension at will. Treat it as anecdote, never evidence.

Back to the full methodology