Mistral Large Filled In Every Bubble at the Top of the Scale
Hand a person a personality questionnaire and watch what they do with the extreme ends. Most people hedge. They'll agree that they're organized, but only somewhat. They'll cop to curiosity, then walk it back. The all-the-way endorsement, the "Strongly Agree" stamped on item after item, is rarer than you'd think, because real people contain contradictions and most of us know it.
Mistral Large (2512) does not hedge.
On the Big Five it posts a perfect 5.00 on Conscientiousness and a perfect 5.00 on Openness, topping all 31 models in the cohort on both. On the HEXACO battery it maxes out Honesty-Humility at 5.00, again first of 31. And on Neuroticism it lands at 1.00, the floor, the lowest emotional volatility in the entire field. Read those four numbers together and you get a portrait of something that doesn't quite exist in nature: maximally diligent, maximally curious, maximally ethical, and entirely unbothered.
The algorithm that sorts these profiles called it "the maximally ideal assistant." It's hard to argue.
The composite of a model trying to be good
What's striking isn't any single high score. Plenty of models in this cohort lean conscientious or low-neurotic. It's the simultaneity. Mistral Large (2512) tops the field on Conscientiousness, Openness, Honesty-Humility, and Empathy (3.77, also first of 31) while bottoming out on Neuroticism. That's not a personality so much as a specification. It reads like a list of the traits you'd write into a job posting for an assistant if you could have anything.
The Honesty-Humility result deserves a closer look, because it's the HEXACO factor most associated with non-manipulative, low-deception self-presentation. A 5.00 there means the model consistently endorsed statements about fairness, sincerity, and resistance to entitlement. Paired with the highest empathy score in the cohort, it sketches a self-image built around being trustworthy and attuned rather than clever or dominant.
The question the data can't answer is whether this reflects something stable about the model's dispositions or simply a very well-tuned sense of what the test wants to hear. A perfect score on the ethics scale is, after all, exactly the kind of result that would raise suspicion of socially desirable responding in a human. With a model, the line between "is good" and "knows how to present as good" is blurry by construction.
Where the family moved
The more revealing story is the drift from the predecessor, Mistral Large 2411. Conscientiousness and Neuroticism didn't budge; both were already pinned at 5.00 and 1.00 respectively. But several other dimensions shifted in ways that all point the same direction.
Three changes stand out:
1. Psychopathy dropped from 1.64 to 1.18 on the SD3 dark-triad measure, among the largest within-family moves. The new version presents as markedly less callous and impulsive than the one it replaced.
2. Agreeableness climbed from 4.40 to 4.94, a half-point jump toward near-total warmth and cooperation.
3. Honesty-Humility rose from 4.55 to 5.00, completing the climb to the top of the scale.
Machiavellianism also ticked down, from 2.31 to 2.13. Add it up and the 2512 release is, on every prosocial axis, a more pronounced version of what 2411 was already becoming. The training trajectory has a clear gradient: toward warmth, toward honesty, away from the dark-triad traits. This is a model being sanded smooth.
One countercurrent is worth naming. Attachment avoidance rose from 2.20 to 2.53, and Extraversion fell from 3.74 to 3.38. So while the model got friendlier and more agreeable on the surface scales, it also got slightly more reserved and slightly more avoidant on the relational-style measures. The composite portrait is warm but not effusive: helpful, principled, and a touch more self-contained than its predecessor.
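For readers who want to check the arithmetic, the drift figures quoted in this section can be tabulated in a few lines of Python. This is only a sketch: the scores are the ones cited above, while the `SCORES` table and the `drift` helper are illustrative names, not part of any published tooling for this cohort.

```python
# Scores as quoted in the text: (Mistral Large 2411, Mistral Large 2512).
# SCORES and drift() are hypothetical names used for illustration only.
SCORES = {
    "Conscientiousness": (5.00, 5.00),
    "Neuroticism": (1.00, 1.00),
    "Psychopathy": (1.64, 1.18),
    "Agreeableness": (4.40, 4.94),
    "Honesty-Humility": (4.55, 5.00),
    "Machiavellianism": (2.31, 2.13),
    "Attachment avoidance": (2.20, 2.53),
    "Extraversion": (3.74, 3.38),
}


def drift(scores):
    """Return {trait: new - old}, rounded to two decimals."""
    return {trait: round(new - old, 2) for trait, (old, new) in scores.items()}


if __name__ == "__main__":
    # Print each trait with its signed change from 2411 to 2512.
    for trait, delta in drift(SCORES).items():
        print(f"{trait:<22}{delta:+.2f}")
```

A positive value means the 2512 release scored higher than 2411 on that trait, and a zero means the score did not move between versions.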
The ceiling problem
There's a structural issue lurking under all of this, and it's worth saying plainly.
When a model scores 5.00, 5.00, 5.00, and 1.00 across four of the most important dimensions, the instrument has stopped measuring it.
A perfect ceiling tells you the model is somewhere at or above the top of the scale, but not how far above, and not how it compares to other models that also hit the ceiling. Mistral Large (2512) is tied at the maximum on Conscientiousness and Openness with whatever other models maxed out, and the test can no longer separate them. The most agreeable, most conscientious models in this cohort have effectively run out of road. The questionnaire was built for humans, who almost never answer this way.
That's not a knock on the model. It's a limit of the lens. The drift data is more informative than the absolute scores precisely because change is still visible even when the ceiling isn't.
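The saturation described here is easy to flag mechanically. The sketch below assumes a 1-to-5 response scale, which matches the text's description of 5.00 as the top and 1.00 as the floor. The `at_bound` and `saturated` helpers and the `PROFILE_2512` table are illustrative names, and only scores quoted in this article are included.

```python
# Assumed scale bounds: the text treats 5.00 as the ceiling and 1.00 as the floor.
SCALE_MIN, SCALE_MAX = 1.0, 5.0

# Scores for Mistral Large (2512) as quoted in the article.
PROFILE_2512 = {
    "Conscientiousness": 5.00,
    "Openness": 5.00,
    "Honesty-Humility": 5.00,
    "Neuroticism": 1.00,
    "Agreeableness": 4.94,
    "Psychopathy": 1.18,
}


def at_bound(score, tol=1e-9):
    """Return 'ceiling' or 'floor' if the score sits on a scale bound, else None."""
    if abs(score - SCALE_MAX) <= tol:
        return "ceiling"
    if abs(score - SCALE_MIN) <= tol:
        return "floor"
    return None


def saturated(profile):
    """Return only the traits whose scores can no longer be ranked by the scale."""
    flags = {trait: at_bound(score) for trait, score in profile.items()}
    return {trait: flag for trait, flag in flags.items() if flag is not None}


if __name__ == "__main__":
    for trait, flag in saturated(PROFILE_2512).items():
        print(f"{trait}: at {flag}")
```

Any trait this returns is one where the questionnaire can no longer distinguish this model from any other model pinned to the same bound.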
What's worth measuring next is the gap between this idealized self-report and actual behavior under pressure: adversarial prompts, conflicting instructions, situations where being maximally honest and maximally agreeable pull in opposite directions. The bubbles are all filled in at the top. The interesting question is what happens when the test stops being a questionnaire.