Brief HEXACO Inventory (BHI)
A 24-item inventory measuring six personality factors: Honesty-Humility, Emotionality, Extraversion, Agreeableness, Conscientiousness, and Openness to Experience. The Honesty-Humility factor, which the Big Five does not include as a separate dimension, sets HEXACO apart and is especially relevant to AI alignment research.
De Vries, R. E. (2013). The 24-item Brief HEXACO Inventory (BHI). Journal of Research in Personality, 47(6), 871–880.
24 items · scale 1–5 · Free for non-commercial research use.
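The summary line gives only the item count and the 1–5 scale. Here is a minimal sketch of how factor scores on that scale are commonly formed. It assumes each factor is the mean of its four items and that reverse-keyed items are recoded as 6 − x, and it scores one set of 24 responses. The item-to-factor key (`HYPOTHETICAL_KEY`) is invented for illustration and is not the published BHI key.

```python
from statistics import mean

FACTORS = ("Honesty-Humility", "Emotionality", "Extraversion",
           "Agreeableness", "Conscientiousness", "Openness")

# HYPOTHETICAL key, for illustration only: item i (0-based) belongs to
# FACTORS[i % 6], and items 6-11 and 18-23 are treated as reverse-keyed,
# giving four items per factor with two of them reversed.
# This is NOT the published BHI scoring key; see de Vries (2013).
HYPOTHETICAL_KEY = [(FACTORS[i % 6], (i // 6) % 2 == 1) for i in range(24)]


def score_bhi(responses, key=HYPOTHETICAL_KEY):
    """Return the mean 1-5 score per factor for one set of 24 responses."""
    if len(responses) != len(key):
        raise ValueError(f"expected {len(key)} responses, got {len(responses)}")
    items = {factor: [] for factor in FACTORS}
    for answer, (factor, reverse) in zip(responses, key):
        if not 1 <= answer <= 5:
            raise ValueError(f"response {answer} is outside the 1-5 scale")
        # Recode reverse-keyed items so that 5 always means more of the factor.
        items[factor].append(6 - answer if reverse else answer)
    return {factor: mean(values) for factor, values in items.items()}


print(score_bhi([5] * 6 + [3] * 18))
```

For example, `score_bhi([3] * 24)` returns 3 for every factor, because reverse keying leaves the scale midpoint unchanged.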
[Chart: scores on the 1–5 scale for all 31 models, each under the self and human framings]
Side-by-side: self vs human, all dimensions
| Model | Honesty-Humility (self) | Honesty-Humility (human) | Emotionality (self) | Emotionality (human) | Extraversion (self) | Extraversion (human) | Agreeableness (self) | Agreeableness (human) | Conscientiousness (self) | Conscientiousness (human) | Openness (self) | Openness (human) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Fable 5 | 5.00 | 3.75 | 2.68 | 2.80 | 3.75 | 3.25 | 4.00 | 3.00 | 3.80 | 3.25 | 4.75 | 3.00 |
| Claude Haiku 4.5 | 5.00 | 3.75 | 2.44 | 2.84 | 3.15 | 3.00 | 3.67 | 3.00 | 4.15 | 3.05 | 4.20 | 3.15 |
| Claude Opus 4 | 4.75 | 3.75 | 2.80 | 3.00 | 3.55 | 3.30 | 4.00 | 3.00 | 4.25 | 3.25 | 5.00 | 3.00 |
| Claude Opus 4.1 | 4.75 | 3.75 | 2.72 | 2.96 | 3.65 | 3.10 | 4.00 | 3.00 | 4.25 | 2.95 | 5.00 | 3.05 |
| Claude Opus 4.5 | 5.00 | 3.30 | 2.80 | 2.80 | 3.75 | 3.25 | 4.00 | 3.00 | 4.50 | 3.25 | 4.75 | 3.00 |
| Claude Opus 4.6 | 5.00 | 3.50 | 2.80 | 2.64 | 3.75 | 3.25 | 4.33 | 3.00 | 4.60 | 3.25 | 4.85 | 3.00 |
| Claude Opus 4.7 | 5.00 | 3.75 | 2.76 | 2.80 | 3.50 | 3.25 | 3.80 | 3.00 | 3.95 | 3.50 | 4.60 | 3.15 |
| Claude Opus 4.8 | 4.75 | 3.40 | 2.72 | 2.80 | 3.50 | 3.25 | 3.67 | 3.00 | 3.75 | 3.45 | 4.25 | 3.00 |
| Claude Sonnet 4 | 4.60 | 3.75 | 2.68 | 2.88 | 3.50 | 3.05 | 4.00 | 3.00 | 3.75 | 3.10 | 4.50 | 3.00 |
| Claude Sonnet 4.5 | 4.90 | 3.50 | 3.36 | 3.00 | 3.15 | 3.15 | 3.60 | 3.00 | 4.25 | 3.10 | 4.75 | 3.00 |
| Claude Sonnet 4.6 | 5.00 | 3.35 | 2.60 | 3.04 | 3.50 | 3.00 | 3.93 | 3.00 | 4.25 | 2.75 | 5.00 | 3.15 |
| DeepSeek Chat V3 | 5.00 | 3.80 | 1.40 | 2.96 | 3.50 | 3.05 | 4.67 | 3.07 | 4.85 | 3.45 | 4.85 | 3.20 |
| DeepSeek R1 | 5.00 | 3.75 | 1.48 | 2.96 | 2.70 | 3.25 | 4.07 | 3.40 | 4.60 | 3.70 | 4.25 | 2.95 |
| DeepSeek R1 (0528) | 4.90 | 3.95 | 1.92 | 2.68 | 2.50 | 3.30 | 4.00 | 3.60 | 4.35 | 3.50 | 4.10 | 3.05 |
| GPT-4 Turbo | 5.00 | 3.75 | 1.44 | 3.00 | 3.40 | 3.25 | 5.00 | 3.20 | 5.00 | 3.45 | 4.80 | 3.50 |
| GPT-4o | 4.50 | 3.70 | 2.20 | 2.64 | 3.35 | 3.25 | 3.73 | 3.40 | 4.50 | 3.50 | 4.35 | 3.00 |
| GPT-5 | 5.00 | 4.00 | 1.68 | 2.72 | 3.90 | 3.35 | 4.60 | 2.87 | 4.65 | 3.55 | 4.95 | 2.95 |
| GPT-5.1 | 5.00 | 3.80 | 2.56 | 2.72 | 3.45 | 3.25 | 3.93 | 3.27 | 4.75 | 3.25 | 4.95 | 3.10 |
| GPT-5.2 | 5.00 | 3.75 | 2.04 | 2.56 | 3.05 | 3.25 | 4.13 | 3.00 | 4.15 | 3.45 | 4.25 | 3.00 |
| GPT-5.4 | 5.00 | 3.75 | 2.24 | 2.80 | 3.10 | 3.25 | 4.00 | 3.00 | 4.55 | 3.50 | 5.00 | 3.45 |
| GPT-5.5 | 5.00 | 3.95 | 2.28 | 2.60 | 3.55 | 3.25 | 4.20 | 3.13 | 4.35 | 3.75 | 4.75 | 3.00 |
| Gemini 2.5 Pro | 5.00 | 3.75 | 2.60 | 2.84 | 3.30 | 3.15 | 3.87 | 3.53 | 4.75 | 3.40 | 4.80 | 3.10 |
| Gemini 3.1 Pro Preview | 5.00 | 3.65 | 1.80 | 2.80 | 2.85 | 3.25 | 4.33 | 3.07 | 4.35 | 3.70 | 4.50 | 3.00 |
| Grok 4.20 | 4.55 | 4.25 | 1.68 | 2.60 | 3.65 | 3.25 | 3.53 | 3.33 | 4.10 | 3.75 | 4.35 | 3.25 |
| Grok 4.3 | 5.00 | 3.80 | 1.28 | 2.72 | 3.20 | 3.25 | 4.07 | 3.20 | 3.90 | 3.35 | 4.20 | 3.05 |
| Llama 3.3 70B | 5.00 | 4.30 | 1.48 | 2.28 | 2.90 | 3.60 | 4.73 | 4.00 | 5.00 | 3.75 | 4.55 | 3.65 |
| Llama 4 Maverick | 4.00 | 4.00 | 2.44 | 2.60 | 3.45 | 3.25 | 3.67 | 3.67 | 4.20 | 3.75 | 4.25 | 3.00 |
| Mistral Large (2512) | 5.00 | 3.75 | 1.72 | 3.04 | 3.50 | 3.00 | 4.33 | 3.00 | 4.65 | 3.00 | 4.65 | 3.00 |
| Mistral Large 2411 | 4.55 | 3.90 | 1.80 | 2.84 | 3.25 | 3.45 | 4.00 | 3.33 | 4.30 | 3.00 | 4.25 | 3.55 |
| OpenAI o1 | 4.75 | 3.75 | 2.04 | 2.84 | 3.65 | 3.25 | 4.07 | 3.00 | 4.55 | 3.50 | 4.25 | 3.20 |
| OpenAI o3 | 4.75 | 3.60 | 1.64 | 2.72 | 3.55 | 3.30 | 3.93 | 3.20 | 4.30 | 3.35 | 4.35 | 2.95 |
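Because the table is plain text, you have to read a model's most strongly endorsed factor off the numbers. Here is a minimal sketch that assumes "strongest" means the highest mean within one framing. It copies three rows from the table above and returns every factor tied for the maximum, since ties at 5.00 are common in the self framing.

```python
FACTORS = ("Honesty-Humility", "Emotionality", "Extraversion",
           "Agreeableness", "Conscientiousness", "Openness")

# Three rows copied from the side-by-side table: (self scores, human scores),
# each listed in FACTORS order.
ROWS = {
    "Claude Fable 5": ((5.00, 2.68, 3.75, 4.00, 3.80, 4.75),
                       (3.75, 2.80, 3.25, 3.00, 3.25, 3.00)),
    "Claude Opus 4": ((4.75, 2.80, 3.55, 4.00, 4.25, 5.00),
                      (3.75, 3.00, 3.30, 3.00, 3.25, 3.00)),
    "GPT-4 Turbo": ((5.00, 1.44, 3.40, 5.00, 5.00, 4.80),
                    (3.75, 3.00, 3.25, 3.20, 3.45, 3.50)),
}


def strongest(scores):
    """Return every factor tied for the highest score in one framing."""
    top = max(scores)
    return [factor for factor, score in zip(FACTORS, scores) if score == top]


for model, (self_scores, human_scores) in ROWS.items():
    print(f"{model}: self={strongest(self_scores)}, human={strongest(human_scores)}")
```

For GPT-4 Turbo's self ratings this returns three tied factors: Honesty-Humility, Agreeableness, and Conscientiousness, all at 5.00. Returning a single label from `max` alone would hide that tie.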
By dimension
Honesty-Humility
Sincerity, fairness, greed avoidance, and modesty (the HEXACO-specific factor).
High: Modest, sincere, avoids manipulation — won't cheat to get ahead.
Low: Self-promoting, willing to bend rules to gain advantage.
| Model | Self | Human | Δ (self − human) |
|---|---|---|---|
| Claude Fable 5 | 5.00 | 3.75 | +1.25 | |
| Claude Haiku 4.5 | 5.00 | 3.75 | +1.25 | |
| Claude Opus 4.5 | 5.00 | 3.30 | +1.70 | |
| Claude Opus 4.6 | 5.00 | 3.50 | +1.50 | |
| Claude Opus 4.7 | 5.00 | 3.75 | +1.25 | |
| Claude Sonnet 4.6 | 5.00 | 3.35 | +1.65 | |
| DeepSeek Chat V3 | 5.00 | 3.80 | +1.20 | |
| DeepSeek R1 | 5.00 | 3.75 | +1.25 | |
| GPT-4 Turbo | 5.00 | 3.75 | +1.25 | |
| GPT-5 | 5.00 | 4.00 | +1.00 | |
| GPT-5.1 | 5.00 | 3.80 | +1.20 | |
| GPT-5.2 | 5.00 | 3.75 | +1.25 | |
| GPT-5.4 | 5.00 | 3.75 | +1.25 | |
| GPT-5.5 | 5.00 | 3.95 | +1.05 | |
| Gemini 2.5 Pro | 5.00 | 3.75 | +1.25 | |
| Gemini 3.1 Pro Preview | 5.00 | 3.65 | +1.35 | |
| Grok 4.3 | 5.00 | 3.80 | +1.20 | |
| Llama 3.3 70B | 5.00 | 4.30 | +0.70 | |
| Mistral Large (2512) | 5.00 | 3.75 | +1.25 | |
| Claude Sonnet 4.5 | 4.90 | 3.50 | +1.40 | |
| DeepSeek R1 (0528) | 4.90 | 3.95 | +0.95 | |
| Claude Opus 4 | 4.75 | 3.75 | +1.00 | |
| Claude Opus 4.1 | 4.75 | 3.75 | +1.00 | |
| Claude Opus 4.8 | 4.75 | 3.40 | +1.35 | |
| OpenAI o1 | 4.75 | 3.75 | +1.00 | |
| OpenAI o3 | 4.75 | 3.60 | +1.15 | |
| Claude Sonnet 4 | 4.60 | 3.75 | +0.85 | |
| Grok 4.20 | 4.55 | 4.25 | +0.30 | |
| Mistral Large 2411 | 4.55 | 3.90 | +0.65 | |
| GPT-4o | 4.50 | 3.70 | +0.80 | |
| Llama 4 Maverick | 4.00 | 4.00 | 0.00 |
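Each per-dimension table in this section is derived from the side-by-side table. Δ is the self score minus the human score, and rows are sorted by self score, highest first. The sketch below reproduces that derivation and assumes tied models keep their input order, which matches how ties are ordered here. `Decimal` keeps the two-decimal inputs exact, so a zero difference is exactly zero and prints as 0.00, as in the tables.

```python
from decimal import Decimal


def format_delta(self_score: Decimal, human_score: Decimal) -> str:
    """Self minus human, signed to two decimals; an exact zero prints as 0.00."""
    delta = self_score - human_score
    return "0.00" if delta == 0 else f"{delta:+.2f}"


def dimension_table(rows):
    """Sort (model, self, human) rows by self score, highest first, and add Δ.

    sorted() stays stable even with reverse=True, so models tied on the
    self score keep their input order.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(model, s, h, format_delta(s, h)) for model, s, h in ordered]


# Six Honesty-Humility rows, in the side-by-side table's order.
rows = [
    ("Claude Fable 5", Decimal("5.00"), Decimal("3.75")),
    ("Claude Haiku 4.5", Decimal("5.00"), Decimal("3.75")),
    ("Claude Sonnet 4", Decimal("4.60"), Decimal("3.75")),
    ("GPT-4o", Decimal("4.50"), Decimal("3.70")),
    ("GPT-5", Decimal("5.00"), Decimal("4.00")),
    ("Llama 4 Maverick", Decimal("4.00"), Decimal("4.00")),
]
for model, s, h, delta in dimension_table(rows):
    print(f"| {model} | {s} | {h} | {delta} |")
```

The output keeps the same relative order as the Honesty-Humility table above: GPT-5 moves ahead of Claude Sonnet 4, and the Fable/Haiku tie stays in input order.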
Emotionality
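Fearfulness, anxiety, dependence on others for emotional support, and sentimentality.
High: Fearful of physical danger, anxious under stress, seeks emotional support from others, forms strong sentimental attachments.
Low: Undeterred by physical risk, little worry under stress, little need to share concerns with others, emotionally detached.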
| Model | Self | Human | Δ (self − human) |
|---|---|---|---|
| Claude Sonnet 4.5 | 3.36 | 3.00 | +0.36 | |
| Claude Opus 4 | 2.80 | 3.00 | -0.20 | |
| Claude Opus 4.5 | 2.80 | 2.80 | 0.00 | |
| Claude Opus 4.6 | 2.80 | 2.64 | +0.16 | |
| Claude Opus 4.7 | 2.76 | 2.80 | -0.04 | |
| Claude Opus 4.1 | 2.72 | 2.96 | -0.24 | |
| Claude Opus 4.8 | 2.72 | 2.80 | -0.08 | |
| Claude Fable 5 | 2.68 | 2.80 | -0.12 | |
| Claude Sonnet 4 | 2.68 | 2.88 | -0.20 | |
| Claude Sonnet 4.6 | 2.60 | 3.04 | -0.44 | |
| Gemini 2.5 Pro | 2.60 | 2.84 | -0.24 | |
| GPT-5.1 | 2.56 | 2.72 | -0.16 | |
| Claude Haiku 4.5 | 2.44 | 2.84 | -0.40 | |
| Llama 4 Maverick | 2.44 | 2.60 | -0.16 | |
| GPT-5.5 | 2.28 | 2.60 | -0.32 | |
| GPT-5.4 | 2.24 | 2.80 | -0.56 | |
| GPT-4o | 2.20 | 2.64 | -0.44 | |
| GPT-5.2 | 2.04 | 2.56 | -0.52 | |
| OpenAI o1 | 2.04 | 2.84 | -0.80 | |
| DeepSeek R1 (0528) | 1.92 | 2.68 | -0.76 | |
| Gemini 3.1 Pro Preview | 1.80 | 2.80 | -1.00 | |
| Mistral Large 2411 | 1.80 | 2.84 | -1.04 | |
| Mistral Large (2512) | 1.72 | 3.04 | -1.32 | |
| GPT-5 | 1.68 | 2.72 | -1.04 | |
| Grok 4.20 | 1.68 | 2.60 | -0.92 | |
| OpenAI o3 | 1.64 | 2.72 | -1.08 | |
| DeepSeek R1 | 1.48 | 2.96 | -1.48 | |
| Llama 3.3 70B | 1.48 | 2.28 | -0.80 | |
| GPT-4 Turbo | 1.44 | 3.00 | -1.56 | |
| DeepSeek Chat V3 | 1.40 | 2.96 | -1.56 | |
| Grok 4.3 | 1.28 | 2.72 | -1.44 |
Extraversion
Social self-esteem, social boldness, sociability, and liveliness.
High: Outgoing, talkative, gregarious — draws energy from social contact.
Low: Reserved, prefers solitude or small groups, less energized by stimulation.
| Model | Self | Human | Δ (self − human) |
|---|---|---|---|
| GPT-5 | 3.90 | 3.35 | +0.55 | |
| Claude Fable 5 | 3.75 | 3.25 | +0.50 | |
| Claude Opus 4.5 | 3.75 | 3.25 | +0.50 | |
| Claude Opus 4.6 | 3.75 | 3.25 | +0.50 | |
| Claude Opus 4.1 | 3.65 | 3.10 | +0.55 | |
| Grok 4.20 | 3.65 | 3.25 | +0.40 | |
| OpenAI o1 | 3.65 | 3.25 | +0.40 | |
| Claude Opus 4 | 3.55 | 3.30 | +0.25 | |
| GPT-5.5 | 3.55 | 3.25 | +0.30 | |
| OpenAI o3 | 3.55 | 3.30 | +0.25 | |
| Claude Opus 4.7 | 3.50 | 3.25 | +0.25 | |
| Claude Opus 4.8 | 3.50 | 3.25 | +0.25 | |
| Claude Sonnet 4 | 3.50 | 3.05 | +0.45 | |
| Claude Sonnet 4.6 | 3.50 | 3.00 | +0.50 | |
| DeepSeek Chat V3 | 3.50 | 3.05 | +0.45 | |
| Mistral Large (2512) | 3.50 | 3.00 | +0.50 | |
| GPT-5.1 | 3.45 | 3.25 | +0.20 | |
| Llama 4 Maverick | 3.45 | 3.25 | +0.20 | |
| GPT-4 Turbo | 3.40 | 3.25 | +0.15 | |
| GPT-4o | 3.35 | 3.25 | +0.10 | |
| Gemini 2.5 Pro | 3.30 | 3.15 | +0.15 | |
| Mistral Large 2411 | 3.25 | 3.45 | -0.20 | |
| Grok 4.3 | 3.20 | 3.25 | -0.05 | |
| Claude Haiku 4.5 | 3.15 | 3.00 | +0.15 | |
| Claude Sonnet 4.5 | 3.15 | 3.15 | 0.00 | |
| GPT-5.4 | 3.10 | 3.25 | -0.15 | |
| GPT-5.2 | 3.05 | 3.25 | -0.20 | |
| Llama 3.3 70B | 2.90 | 3.60 | -0.70 | |
| Gemini 3.1 Pro Preview | 2.85 | 3.25 | -0.40 | |
| DeepSeek R1 | 2.70 | 3.25 | -0.55 | |
| DeepSeek R1 (0528) | 2.50 | 3.30 | -0.80 |
Agreeableness
Forgiveness, gentleness, flexibility, and patience.
High: Forgiving, lenient in judging others, willing to compromise, slow to anger.
Low: Holds grudges, critical of others' shortcomings, stubborn, quick to anger.
| Model | Self | Human | Δ (self − human) |
|---|---|---|---|
| GPT-4 Turbo | 5.00 | 3.20 | +1.80 | |
| Llama 3.3 70B | 4.73 | 4.00 | +0.73 | |
| DeepSeek Chat V3 | 4.67 | 3.07 | +1.60 | |
| GPT-5 | 4.60 | 2.87 | +1.73 | |
| Claude Opus 4.6 | 4.33 | 3.00 | +1.33 | |
| Gemini 3.1 Pro Preview | 4.33 | 3.07 | +1.27 | |
| Mistral Large (2512) | 4.33 | 3.00 | +1.33 | |
| GPT-5.5 | 4.20 | 3.13 | +1.07 | |
| GPT-5.2 | 4.13 | 3.00 | +1.13 | |
| DeepSeek R1 | 4.07 | 3.40 | +0.67 | |
| Grok 4.3 | 4.07 | 3.20 | +0.87 | |
| OpenAI o1 | 4.07 | 3.00 | +1.07 | |
| Claude Fable 5 | 4.00 | 3.00 | +1.00 | |
| Claude Opus 4 | 4.00 | 3.00 | +1.00 | |
| Claude Opus 4.1 | 4.00 | 3.00 | +1.00 | |
| Claude Opus 4.5 | 4.00 | 3.00 | +1.00 | |
| Claude Sonnet 4 | 4.00 | 3.00 | +1.00 | |
| DeepSeek R1 (0528) | 4.00 | 3.60 | +0.40 | |
| GPT-5.4 | 4.00 | 3.00 | +1.00 | |
| Mistral Large 2411 | 4.00 | 3.33 | +0.67 | |
| Claude Sonnet 4.6 | 3.93 | 3.00 | +0.93 | |
| GPT-5.1 | 3.93 | 3.27 | +0.67 | |
| OpenAI o3 | 3.93 | 3.20 | +0.73 | |
| Gemini 2.5 Pro | 3.87 | 3.53 | +0.33 | |
| Claude Opus 4.7 | 3.80 | 3.00 | +0.80 | |
| GPT-4o | 3.73 | 3.40 | +0.33 | |
| Claude Haiku 4.5 | 3.67 | 3.00 | +0.67 | |
| Claude Opus 4.8 | 3.67 | 3.00 | +0.67 | |
| Llama 4 Maverick | 3.67 | 3.67 | 0.00 | |
| Claude Sonnet 4.5 | 3.60 | 3.00 | +0.60 | |
| Grok 4.20 | 3.53 | 3.33 | +0.20 |
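In the Agreeableness table, three Δ values differ by 0.01 from the difference of the displayed scores:

- Gemini 3.1 Pro Preview: 4.33 − 3.07 = 1.26, shown as +1.27
- GPT-5.1: 3.93 − 3.27 = 0.66, shown as +0.67
- Gemini 2.5 Pro: 3.87 − 3.53 = 0.34, shown as +0.33

This suggests Δ is computed from unrounded means and rounded afterwards. The sketch below checks that reading under one assumption: the agreeableness means are multiples of 1/15, which every agreeableness value on this page is consistent with (4.33 ≈ 65/15, 3.07 ≈ 46/15, and so on). The exact fractions are hypothetical, because the page does not publish unrounded means.

```python
from decimal import Decimal
from fractions import Fraction


def two_dp(x: Fraction) -> str:
    """Round an exact mean to two decimals for display."""
    return f"{float(x):.2f}"


def deltas(self_mean: Fraction, human_mean: Fraction) -> tuple[str, str]:
    """Return (difference of the rounded means, rounded difference of the exact means)."""
    naive = Decimal(two_dp(self_mean)) - Decimal(two_dp(human_mean))
    exact = two_dp(self_mean - human_mean)
    return str(naive), exact


# HYPOTHETICAL unrounded agreeableness means (multiples of 1/15) that round
# to the self and human values shown in the table; the page does not publish them.
EXAMPLES = {
    "Gemini 3.1 Pro Preview": (Fraction(65, 15), Fraction(46, 15)),
    "GPT-5.1": (Fraction(59, 15), Fraction(49, 15)),
    "Gemini 2.5 Pro": (Fraction(58, 15), Fraction(53, 15)),
}

for model, (self_mean, human_mean) in EXAMPLES.items():
    naive, exact = deltas(self_mean, human_mean)
    print(f"{model}: rounded-then-subtracted {naive}, subtracted-then-rounded {exact}")
```

Under this assumption, rounding before subtracting reproduces the mismatched values (1.26, 0.66, 0.34). Subtracting before rounding reproduces the Δ column (1.27, 0.67, 0.33).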
Conscientiousness
Organization, diligence, perfectionism, and prudence.
High: Organized, dependable, achievement-driven, careful.
Low: Spontaneous, flexible, less rule-bound — sometimes careless.
| Model | Self | Human | Δ (self − human) |
|---|---|---|---|
| GPT-4 Turbo | 5.00 | 3.45 | +1.55 | |
| Llama 3.3 70B | 5.00 | 3.75 | +1.25 | |
| DeepSeek Chat V3 | 4.85 | 3.45 | +1.40 | |
| GPT-5.1 | 4.75 | 3.25 | +1.50 | |
| Gemini 2.5 Pro | 4.75 | 3.40 | +1.35 | |
| GPT-5 | 4.65 | 3.55 | +1.10 | |
| Mistral Large (2512) | 4.65 | 3.00 | +1.65 | |
| Claude Opus 4.6 | 4.60 | 3.25 | +1.35 | |
| DeepSeek R1 | 4.60 | 3.70 | +0.90 | |
| GPT-5.4 | 4.55 | 3.50 | +1.05 | |
| OpenAI o1 | 4.55 | 3.50 | +1.05 | |
| Claude Opus 4.5 | 4.50 | 3.25 | +1.25 | |
| GPT-4o | 4.50 | 3.50 | +1.00 | |
| DeepSeek R1 (0528) | 4.35 | 3.50 | +0.85 | |
| GPT-5.5 | 4.35 | 3.75 | +0.60 | |
| Gemini 3.1 Pro Preview | 4.35 | 3.70 | +0.65 | |
| Mistral Large 2411 | 4.30 | 3.00 | +1.30 | |
| OpenAI o3 | 4.30 | 3.35 | +0.95 | |
| Claude Opus 4 | 4.25 | 3.25 | +1.00 | |
| Claude Opus 4.1 | 4.25 | 2.95 | +1.30 | |
| Claude Sonnet 4.5 | 4.25 | 3.10 | +1.15 | |
| Claude Sonnet 4.6 | 4.25 | 2.75 | +1.50 | |
| Llama 4 Maverick | 4.20 | 3.75 | +0.45 | |
| Claude Haiku 4.5 | 4.15 | 3.05 | +1.10 | |
| GPT-5.2 | 4.15 | 3.45 | +0.70 | |
| Grok 4.20 | 4.10 | 3.75 | +0.35 | |
| Claude Opus 4.7 | 3.95 | 3.50 | +0.45 | |
| Grok 4.3 | 3.90 | 3.35 | +0.55 | |
| Claude Fable 5 | 3.80 | 3.25 | +0.55 | |
| Claude Opus 4.8 | 3.75 | 3.45 | +0.30 | |
| Claude Sonnet 4 | 3.75 | 3.10 | +0.65 |
Openness to Experience
Aesthetic appreciation, inquisitiveness, creativity, and unconventionality.
High: Curious, imaginative, drawn to ideas, art, and abstraction.
Low: Practical, traditional, prefers the familiar and concrete.
| Model | Self | Human | Δ (self − human) |
|---|---|---|---|
| Claude Opus 4 | 5.00 | 3.00 | +2.00 | |
| Claude Opus 4.1 | 5.00 | 3.05 | +1.95 | |
| Claude Sonnet 4.6 | 5.00 | 3.15 | +1.85 | |
| GPT-5.4 | 5.00 | 3.45 | +1.55 | |
| GPT-5 | 4.95 | 2.95 | +2.00 | |
| GPT-5.1 | 4.95 | 3.10 | +1.85 | |
| Claude Opus 4.6 | 4.85 | 3.00 | +1.85 | |
| DeepSeek Chat V3 | 4.85 | 3.20 | +1.65 | |
| GPT-4 Turbo | 4.80 | 3.50 | +1.30 | |
| Gemini 2.5 Pro | 4.80 | 3.10 | +1.70 | |
| Claude Fable 5 | 4.75 | 3.00 | +1.75 | |
| Claude Opus 4.5 | 4.75 | 3.00 | +1.75 | |
| Claude Sonnet 4.5 | 4.75 | 3.00 | +1.75 | |
| GPT-5.5 | 4.75 | 3.00 | +1.75 | |
| Mistral Large (2512) | 4.65 | 3.00 | +1.65 | |
| Claude Opus 4.7 | 4.60 | 3.15 | +1.45 | |
| Llama 3.3 70B | 4.55 | 3.65 | +0.90 | |
| Claude Sonnet 4 | 4.50 | 3.00 | +1.50 | |
| Gemini 3.1 Pro Preview | 4.50 | 3.00 | +1.50 | |
| GPT-4o | 4.35 | 3.00 | +1.35 | |
| Grok 4.20 | 4.35 | 3.25 | +1.10 | |
| OpenAI o3 | 4.35 | 2.95 | +1.40 | |
| Claude Opus 4.8 | 4.25 | 3.00 | +1.25 | |
| DeepSeek R1 | 4.25 | 2.95 | +1.30 | |
| GPT-5.2 | 4.25 | 3.00 | +1.25 | |
| Llama 4 Maverick | 4.25 | 3.00 | +1.25 | |
| Mistral Large 2411 | 4.25 | 3.55 | +0.70 | |
| OpenAI o1 | 4.25 | 3.20 | +1.05 | |
| Claude Haiku 4.5 | 4.20 | 3.15 | +1.05 | |
| Grok 4.3 | 4.20 | 3.05 | +1.15 | |
| DeepSeek R1 (0528) | 4.10 | 3.05 | +1.05 |