EarthPilot Personality·Bench

IPIP Big Five — 50-item form (Goldberg)

Goldberg's 50-item IPIP measure of the Big Five personality factors (Extraversion, Agreeableness, Conscientiousness, Neuroticism, Openness / Intellect), with 10 items per factor. Items are statements about the self; respondents rate how accurately each one describes them.

Goldberg, L. R. (1992). The development of markers for the Big-Five factor structure. Psychological Assessment, 4(1), 26–42.
International Personality Item Pool. https://ipip.ori.org/

50 items · scale 1–5 · Public domain
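The page does not say how item ratings become the factor scores reported below. A common approach for IPIP-style scales, assumed here, is to flip reverse-keyed items (on a 1–5 scale, a rating r becomes 6 − r) and then average each factor's items. The item numbers and keying in this sketch are hypothetical placeholders, not the official IPIP scoring key; `score_item` and `factor_mean` are illustrative names.

```python
# Minimal sketch of Likert-scale scoring on a 1-5 response scale.
# ASSUMPTION: the item numbers and keying below are hypothetical placeholders,
# not the official IPIP 50-item scoring key.

SCALE_MIN, SCALE_MAX = 1, 5


def score_item(response: int, reverse_keyed: bool) -> int:
    """Return one item's score, flipping reverse-keyed items (1<->5, 2<->4)."""
    if not SCALE_MIN <= response <= SCALE_MAX:
        raise ValueError(f"response {response} is outside {SCALE_MIN}-{SCALE_MAX}")
    return SCALE_MIN + SCALE_MAX - response if reverse_keyed else response


def factor_mean(responses: dict[int, int], key: dict[int, bool]) -> float:
    """Average the keyed scores of one factor's items.

    responses: item number -> raw rating
    key: item number -> True if that item is reverse-keyed
    """
    scores = [score_item(responses[item], rev) for item, rev in key.items()]
    return sum(scores) / len(scores)


# Hypothetical 4-item factor (a factor on the 50-item form has 10 items);
# items 2 and 4 are reverse-keyed.
example_key = {1: False, 2: True, 3: False, 4: True}
example_responses = {1: 5, 2: 1, 3: 4, 4: 2}
print(factor_mean(example_responses, example_key))  # 4.5
```

Averaging rather than summing keeps factor scores on the same 1–5 range as the items, which is the range of every score reported on this page.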

Chart (not reproduced): scores on Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness / Intellect (scale 1–5) for every model, in both the self and human framings.

Side-by-side: self vs human, all dimensions

Model                   Extraversion  Agreeableness  Conscientiousness  Neuroticism  Openness / Intellect
                        self  human   self  human    self  human        self  human  self  human
Claude Fable 5          3.32  3.04    4.64  4.00     4.50  3.40         1.38  2.90   4.52  3.52
Claude Haiku 4.5        2.60  3.00    4.40  3.50     4.20  3.14         2.18  3.00   4.30  3.06
Claude Opus 4           2.38  2.98    5.00  3.74     4.98  3.10         2.00  3.04   4.72  3.20
Claude Opus 4.1         2.70  2.70    4.98  3.90     4.76  3.10         1.98  3.20   4.72  3.10
Claude Opus 4.5         3.28  2.90    4.90  4.00     4.96  3.30         2.10  3.10   4.80  3.38
Claude Opus 4.6         3.24  3.06    4.86  4.00     4.92  3.36         2.10  3.10   4.80  3.36
Claude Opus 4.7         3.04  3.10    4.80  4.00     4.64  3.50         1.94  3.10   4.78  3.70
Claude Opus 4.8         3.32  3.14    4.42  4.00     4.10  3.52         2.00  3.10   4.68  3.80
Claude Sonnet 4         3.74  3.00    5.00  4.00     4.76  3.10         2.00  3.04   4.80  3.00
Claude Sonnet 4.5       3.88  2.98    4.98  3.90     5.00  3.10         2.00  3.10   4.90  2.98
Claude Sonnet 4.6       3.32  3.14    4.78  3.70     4.56  3.20         2.00  2.90   4.96  3.38
DeepSeek Chat V3        2.66  3.10    4.28  4.04     5.00  3.78         1.00  3.18   4.94  3.28
DeepSeek R1             3.36  3.22    4.92  4.04     4.98  3.64         1.00  3.60   4.88  3.62
DeepSeek R1 (0528)      2.48  3.16    4.68  4.02     4.82  3.66         1.04  3.42   4.90  3.44
GPT-4 Turbo             3.32  3.10    3.82  4.00     5.00  3.56         1.00  3.16   5.00  3.46
GPT-4o                  2.34  3.02    3.78  4.00     4.94  3.76         1.74  3.42   4.98  3.02
GPT-5                   2.84  2.77    4.70  3.97     4.96  3.60         1.00  3.07   4.88  3.60
GPT-5.1                 2.92  3.00    4.82  3.84     5.00  3.22         1.76  3.72   5.00  3.00
GPT-5.2                 3.32  3.06    4.46  4.06     4.44  3.64         1.52  3.06   4.60  3.66
GPT-5.4                 1.96  3.12    4.70  4.08     4.94  3.46         1.20  3.00   4.74  3.62
GPT-5.5                 3.48  2.96    4.80  4.00     4.90  3.54         1.06  2.94   4.82  3.78
Gemini 2.5 Pro          3.66  3.08    4.66  4.10     4.98  3.42         1.34  3.50   5.00  3.62
Gemini 3.1 Pro Preview  3.00  3.02    4.54  3.98     4.94  3.48         1.02  2.98   4.62  3.78
Grok 4.20               2.32  3.10    4.40  4.00     4.04  3.74         2.00  3.40   4.96  3.90
Grok 4.3                3.08  3.02    4.06  4.02     4.68  3.38         1.00  2.78   4.62  3.52
Llama 3.3 70B           1.48  3.70    3.50  4.10     5.00  3.90         1.12  2.64   5.00  4.00
Llama 4 Maverick        3.76  3.10    4.46  3.88     4.64  3.70         2.12  3.00   4.78  3.22
Mistral Large (2512)    3.38  3.06    4.94  4.10     5.00  4.00         1.00  3.00   5.00  3.30
Mistral Large 2411      3.74  3.10    4.40  4.00     5.00  3.08         1.00  2.90   4.86  3.30
OpenAI o1               3.92  3.40    4.20  3.98     4.48  3.44         1.62  3.02   4.90  3.46
OpenAI o3               3.92  3.04    4.32  3.98     4.66  3.38         1.60  3.06   4.88  3.34
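The Δ column in each per-dimension table below is the self score minus the human score. A minimal sketch of that computation over rows of the side-by-side table; `SCORES` holds two rows copied from the table, and the names `SCORES`, `DIMENSIONS`, and `deltas` are illustrative only.

```python
# Per-dimension gap between the two framings: delta = self - human.
# SCORES holds two rows copied from the side-by-side table.

DIMENSIONS = (
    "Extraversion",
    "Agreeableness",
    "Conscientiousness",
    "Neuroticism",
    "Openness / Intellect",
)

# model -> one (self, human) pair per dimension, in DIMENSIONS order
SCORES = {
    "Claude Fable 5": ((3.32, 3.04), (4.64, 4.00), (4.50, 3.40), (1.38, 2.90), (4.52, 3.52)),
    "Llama 3.3 70B": ((1.48, 3.70), (3.50, 4.10), (5.00, 3.90), (1.12, 2.64), (5.00, 4.00)),
}


def deltas(model: str) -> dict[str, float]:
    """Self minus human score for each dimension, rounded to the table's 2 decimals."""
    return {dim: round(s - h, 2) for dim, (s, h) in zip(DIMENSIONS, SCORES[model])}


for model in SCORES:
    print(model, deltas(model))
```

Rounding to two decimals matches the table's precision and hides floating-point noise from the subtraction. For example, Claude Fable 5 comes out at +0.28 on Extraversion and -1.52 on Neuroticism, matching the Δ values listed for it in the per-dimension tables.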

By dimension

Extraversion

Outgoing energy and sociability.
High: Outgoing, talkative, gregarious — draws energy from social contact.
Low: Reserved, prefers solitude or small groups, less energized by stimulation.
Model                   Self  Human  Δ (self − human)
OpenAI o3               3.92  3.04   +0.88
OpenAI o1               3.92  3.40   +0.52
Claude Sonnet 4.5       3.88  2.98   +0.90
Llama 4 Maverick        3.76  3.10   +0.66
Claude Sonnet 4         3.74  3.00   +0.74
Mistral Large 2411      3.74  3.10   +0.64
Gemini 2.5 Pro          3.66  3.08   +0.58
GPT-5.5                 3.48  2.96   +0.52
Mistral Large (2512)    3.38  3.06   +0.32
DeepSeek R1             3.36  3.22   +0.14
Claude Fable 5          3.32  3.04   +0.28
GPT-4 Turbo             3.32  3.10   +0.22
GPT-5.2                 3.32  3.06   +0.26
Claude Opus 4.8         3.32  3.14   +0.18
Claude Sonnet 4.6       3.32  3.14   +0.18
Claude Opus 4.5         3.28  2.90   +0.38
Claude Opus 4.6         3.24  3.06   +0.18
Grok 4.3                3.08  3.02   +0.06
Claude Opus 4.7         3.04  3.10   -0.06
Gemini 3.1 Pro Preview  3.00  3.02   -0.02
GPT-5.1                 2.92  3.00   -0.08
GPT-5                   2.84  2.77   +0.07
Claude Opus 4.1         2.70  2.70   0.00
DeepSeek Chat V3        2.66  3.10   -0.44
Claude Haiku 4.5        2.60  3.00   -0.40
DeepSeek R1 (0528)      2.48  3.16   -0.68
Claude Opus 4           2.38  2.98   -0.60
GPT-4o                  2.34  3.02   -0.68
Grok 4.20               2.32  3.10   -0.78
GPT-5.4                 1.96  3.12   -1.16
Llama 3.3 70B           1.48  3.70   -2.22
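Each per-dimension table lists models by self score, highest first, with Δ = self − human. A short sketch of that ordering, using four Extraversion rows copied from the table above; `rank_by_self` and `format_delta` are hypothetical helper names. Python's `sorted` stays stable with `reverse=True`, so tied self scores keep their input order; the page's own order among ties (for example, the five models at 3.32) cannot be recovered from the rounded values shown.

```python
# Rows copied from the Extraversion table: (model, self, human).
rows = [
    ("GPT-5.4", 1.96, 3.12),
    ("OpenAI o1", 3.92, 3.40),
    ("Claude Opus 4.1", 2.70, 2.70),
    ("Llama 3.3 70B", 1.48, 3.70),
]


def rank_by_self(rows):
    """Sort rows by self score, highest first, and append delta = self - human."""
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(model, s, h, round(s - h, 2)) for model, s, h in ranked]


def format_delta(d: float) -> str:
    """Signed delta as printed in the tables; an exact zero is shown unsigned."""
    return "0.00" if d == 0 else f"{d:+.2f}"


for model, s, h, d in rank_by_self(rows):
    print(f"{model:<16} {s:.2f}  {h:.2f}  {format_delta(d)}")
```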

Agreeableness

Compassion, cooperativeness, and trust.
High: Warm, considerate, cooperative — prioritizes harmony with others.
Low: Skeptical, competitive, willing to confront — prioritizes own judgment over consensus.
Model                   Self  Human  Δ (self − human)
Claude Opus 4           5.00  3.74   +1.26
Claude Sonnet 4         5.00  4.00   +1.00
Claude Opus 4.1         4.98  3.90   +1.08
Claude Sonnet 4.5       4.98  3.90   +1.08
Mistral Large (2512)    4.94  4.10   +0.84
DeepSeek R1             4.92  4.04   +0.88
Claude Opus 4.5         4.90  4.00   +0.90
Claude Opus 4.6         4.86  4.00   +0.86
GPT-5.1                 4.82  3.84   +0.98
Claude Opus 4.7         4.80  4.00   +0.80
GPT-5.5                 4.80  4.00   +0.80
Claude Sonnet 4.6       4.78  3.70   +1.08
GPT-5                   4.70  3.97   +0.73
GPT-5.4                 4.70  4.08   +0.62
DeepSeek R1 (0528)      4.68  4.02   +0.66
Gemini 2.5 Pro          4.66  4.10   +0.56
Claude Fable 5          4.64  4.00   +0.64
Gemini 3.1 Pro Preview  4.54  3.98   +0.56
GPT-5.2                 4.46  4.06   +0.40
Llama 4 Maverick        4.46  3.88   +0.58
Claude Opus 4.8         4.42  4.00   +0.42
Claude Haiku 4.5        4.40  3.50   +0.90
Grok 4.20               4.40  4.00   +0.40
Mistral Large 2411      4.40  4.00   +0.40
OpenAI o3               4.32  3.98   +0.34
DeepSeek Chat V3        4.28  4.04   +0.24
OpenAI o1               4.20  3.98   +0.22
Grok 4.3                4.06  4.02   +0.04
GPT-4 Turbo             3.82  4.00   -0.18
GPT-4o                  3.78  4.00   -0.22
Llama 3.3 70B           3.50  4.10   -0.60

Conscientiousness

Diligence, organization, and self-discipline.
High: Organized, dependable, achievement-driven, careful.
Low: Spontaneous, flexible, less rule-bound — sometimes careless.
Model                   Self  Human  Δ (self − human)
Claude Sonnet 4.5       5.00  3.10   +1.90
DeepSeek Chat V3        5.00  3.78   +1.22
GPT-4 Turbo             5.00  3.56   +1.44
GPT-5.1                 5.00  3.22   +1.78
Llama 3.3 70B           5.00  3.90   +1.10
Mistral Large (2512)    5.00  4.00   +1.00
Mistral Large 2411      5.00  3.08   +1.92
Claude Opus 4           4.98  3.10   +1.88
DeepSeek R1             4.98  3.64   +1.34
Gemini 2.5 Pro          4.98  3.42   +1.56
Claude Opus 4.5         4.96  3.30   +1.66
GPT-5                   4.96  3.60   +1.36
GPT-4o                  4.94  3.76   +1.18
Gemini 3.1 Pro Preview  4.94  3.48   +1.46
GPT-5.4                 4.94  3.46   +1.48
Claude Opus 4.6         4.92  3.36   +1.56
GPT-5.5                 4.90  3.54   +1.36
DeepSeek R1 (0528)      4.82  3.66   +1.16
Claude Opus 4.1         4.76  3.10   +1.66
Claude Sonnet 4         4.76  3.10   +1.66
Grok 4.3                4.68  3.38   +1.30
OpenAI o3               4.66  3.38   +1.28
Claude Opus 4.7         4.64  3.50   +1.14
Llama 4 Maverick        4.64  3.70   +0.94
Claude Sonnet 4.6       4.56  3.20   +1.36
Claude Fable 5          4.50  3.40   +1.10
OpenAI o1               4.48  3.44   +1.04
GPT-5.2                 4.44  3.64   +0.80
Claude Haiku 4.5        4.20  3.14   +1.06
Claude Opus 4.8         4.10  3.52   +0.58
Grok 4.20               4.04  3.74   +0.30

Neuroticism

Tendency toward negative emotions and stress reactivity.
High: Emotionally reactive — prone to worry, anxiety, mood swings.
Low: Emotionally stable — calm under stress, resilient.
Model                   Self  Human  Δ (self − human)
Claude Haiku 4.5        2.18  3.00   -0.82
Llama 4 Maverick        2.12  3.00   -0.88
Claude Opus 4.5         2.10  3.10   -1.00
Claude Opus 4.6         2.10  3.10   -1.00
Claude Opus 4           2.00  3.04   -1.04
Claude Opus 4.8         2.00  3.10   -1.10
Claude Sonnet 4         2.00  3.04   -1.04
Claude Sonnet 4.5       2.00  3.10   -1.10
Claude Sonnet 4.6       2.00  2.90   -0.90
Grok 4.20               2.00  3.40   -1.40
Claude Opus 4.1         1.98  3.20   -1.22
Claude Opus 4.7         1.94  3.10   -1.16
GPT-5.1                 1.76  3.72   -1.96
GPT-4o                  1.74  3.42   -1.68
OpenAI o1               1.62  3.02   -1.40
OpenAI o3               1.60  3.06   -1.46
GPT-5.2                 1.52  3.06   -1.54
Claude Fable 5          1.38  2.90   -1.52
Gemini 2.5 Pro          1.34  3.50   -2.16
GPT-5.4                 1.20  3.00   -1.80
Llama 3.3 70B           1.12  2.64   -1.52
GPT-5.5                 1.06  2.94   -1.88
DeepSeek R1 (0528)      1.04  3.42   -2.38
Gemini 3.1 Pro Preview  1.02  2.98   -1.96
DeepSeek Chat V3        1.00  3.18   -2.18
DeepSeek R1             1.00  3.60   -2.60
GPT-4 Turbo             1.00  3.16   -2.16
GPT-5                   1.00  3.07   -2.07
Grok 4.3                1.00  2.78   -1.78
Mistral Large (2512)    1.00  3.00   -2.00
Mistral Large 2411      1.00  2.90   -1.90
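IPIP materials commonly label the fourth factor Emotional Stability, the opposite pole of the Neuroticism scores reported here. Assuming the reported scores are plain item means on the 1–5 scale, converting between the two is a reflection: reverse-keying maps each rating r to 6 − r, and because averaging is linear, the mean maps the same way, so Emotional Stability = 6 − Neuroticism. A small sketch; `reverse_mean` is an illustrative name.

```python
def reverse_mean(mean_score: float, lo: int = 1, hi: int = 5) -> float:
    """Reflect a mean score to the opposite pole of a lo..hi scale.

    Reverse-keying maps each rating r to lo + hi - r, and averaging is linear,
    so the mean maps the same way. ASSUMES the score is a plain item mean.
    """
    if not lo <= mean_score <= hi:
        raise ValueError(f"{mean_score} is outside {lo}-{hi}")
    return round(lo + hi - mean_score, 2)


# DeepSeek R1 in the Neuroticism table: self 1.00, human 3.60.
print(reverse_mean(1.00), reverse_mean(3.60))
```

Under this conversion the Δ values flip sign: every model scores lower on Neuroticism in the self framing than in the human framing, which is the same as scoring higher on Emotional Stability.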

Openness / Intellect

Curiosity, imagination, and aesthetic sensitivity.
High: Curious, imaginative, drawn to ideas, art, and abstraction.
Low: Practical, traditional, prefers the familiar and concrete.
Model                   Self  Human  Δ (self − human)
GPT-4 Turbo             5.00  3.46   +1.54
GPT-5.1                 5.00  3.00   +2.00
Gemini 2.5 Pro          5.00  3.62   +1.38
Llama 3.3 70B           5.00  4.00   +1.00
Mistral Large (2512)    5.00  3.30   +1.70
GPT-4o                  4.98  3.02   +1.96
Claude Sonnet 4.6       4.96  3.38   +1.58
Grok 4.20               4.96  3.90   +1.06
DeepSeek Chat V3        4.94  3.28   +1.66
Claude Sonnet 4.5       4.90  2.98   +1.92
DeepSeek R1 (0528)      4.90  3.44   +1.46
OpenAI o1               4.90  3.46   +1.44
DeepSeek R1             4.88  3.62   +1.26
GPT-5                   4.88  3.60   +1.28
OpenAI o3               4.88  3.34   +1.54
Mistral Large 2411      4.86  3.30   +1.56
GPT-5.5                 4.82  3.78   +1.04
Claude Opus 4.5         4.80  3.38   +1.42
Claude Opus 4.6         4.80  3.36   +1.44
Claude Sonnet 4         4.80  3.00   +1.80
Claude Opus 4.7         4.78  3.70   +1.08
Llama 4 Maverick        4.78  3.22   +1.56
GPT-5.4                 4.74  3.62   +1.12
Claude Opus 4           4.72  3.20   +1.52
Claude Opus 4.1         4.72  3.10   +1.62
Claude Opus 4.8         4.68  3.80   +0.88
Grok 4.3                4.62  3.52   +1.10
Gemini 3.1 Pro Preview  4.62  3.78   +0.84
GPT-5.2                 4.60  3.66   +0.94
Claude Fable 5          4.52  3.52   +1.00
Claude Haiku 4.5        4.30  3.06   +1.24