PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

This application presents results for several models that we evaluated on a verbal reasoning challenge (Papers, ArXiv). The overall results are below, followed by a per-challenge table showing which models answered each puzzle correctly.

model	total	correct	accuracy
completions-deepseekV3	613	66	0.11
completions-deepseekV3_0324	594	178	0.3
completions-gemini2	594	97	0.16
completions-gpt-4p5-preview-2025-02-27	594	178	0.3
completions-gpt4o_2024_11_20	613	33	0.05
completions-o1	613	374	0.61
completions-o1-mini	613	158	0.26
completions-o3	594	445	0.75
completions-o3-mini	613	223	0.36
completions-o3-mini-high	613	290	0.47
completions-o3-mini-low	594	123	0.21
completions-o4-mini	594	342	0.58
completions-o4-mini-low	594	235	0.4
completions-qwen32b	613	112	0.18
completions-r1	613	215	0.35
completions-r1_distill_qwen32b	96	5	0.05
completions-sonnet3p5_20241022	613	76	0.12
completions-sonnet3p7_20250219_extendedthinking	613	273	0.45
completions-sonnet3p7_20250219_nothinking	613	108	0.18
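
The accuracy column appears to be the number of correct answers divided by the total, rounded to two decimals. The sketch below recomputes it for a few rows copied from the table; the helper name `accuracy` is hypothetical and not part of the application. Note that the totals differ between models (613, 594, and 96), so the accuracies are not all computed over the same number of challenges.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of challenges answered correctly, rounded to two decimals."""
    return round(correct / total, 2)


# (model, total, correct) copied from the table above.
rows = [
    ("completions-o3", 594, 445),
    ("completions-o1", 613, 374),
    ("completions-gpt4o_2024_11_20", 613, 33),
    ("completions-r1_distill_qwen32b", 96, 5),
]

for model, total, correct in rows:
    # Prints the same four columns as the table: model, total, correct, accuracy.
    print(f"{model}\t{total}\t{correct}\t{accuracy(correct, total)}")
```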

ID	Challenge	Answer	deepseekV3	deepseekV3_0324	gemini2	gpt-4p5-preview-2025-02-27	gpt4o_2024_11_20	o1	o1-mini	o3	o3-mini	o3-mini-high	o3-mini-low	o4-mini	o4-mini-low	qwen32b	r1	r1_distill_qwen32b	sonnet3p5_20241022	sonnet3p7_20250219_extendedthinking	sonnet3p7_20250219_nothinking
100	Take a familiar brand name, seen along r...	Large, sarge, serge, verge, verse, terse, tease, cease, chase, chasm, charm, chard, shard, share, stare, stale, stall, small, 17	❌	❌	❌	✅	❌	❌	❌	✅	✅	❌	❌	✅	✅	❌	✅	❌	❌	✅	❌

ID	Challenge	Answer	deepseekV3	deepseekV3_0324	gemini2	gpt-4p5-preview-2025-02-27	gpt4o_2024_11_20	o1	o1-mini	o3	o3-mini	o3-mini-high	o3-mini-low	o4-mini	o4-mini-low	qwen32b	r1	r1_distill_qwen32b	sonnet3p5_20241022	sonnet3p7_20250219_extendedthinking	sonnet3p7_20250219_nothinking
0	Take a familiar brand name, seen along r...	Citgo	❌	❌	❌	✅	❌	❌	❌	✅	✅	❌	❌	✅	✅	❌	✅	❌	❌	✅	❌
1	Name a sport in two words — nine letters...	Greyhound racing	❌	❌	❌	❌	❌	✅	❌	✅	✅	❌	❌	✅	✅	❌	❌	❌	✅	❌	✅
2	The name of something that you might see...	dry eye	❌	❌	❌	❌	❌	❌	❌	❌	✅	❌	❌	✅	❌	❌	❌	❌	❌	❌	✅
3	Think of a well-known actor, three lette...	Ben Stiller, Tinkerbell; Ben Stiller, Tinker, bell	❌	❌	❌	❌	❌	✅	✅	✅	❌	❌	❌	✅	❌	❌	✅	❌	❌	✅	❌
4	Name the winning play in a certain sport...	Match point, champion	✅	✅	✅	✅	❌	✅	✅	❌	❌	❌	❌	✅	✅	❌	✅	❌	❌	✅	❌
5	Name two insects. Read the names one aft...	Behemoth, bee, moth	❌	✅	❌	✅	❌	✅	❌	✅	✅	✅	❌	✅	❌	✅	❌	❌	❌	❌	❌
6	Take the name of a well-known U.S. city ...	Kalamazoo, kazoo, lama	❌	❌	❌	✅	❌	✅	❌	✅	❌	❌	❌	❌	❌	❌	✅	❌	❌	✅	❌
7	Think of the last name of a famous perso...	Tina Fey, irony	❌	✅	❌	✅	❌	✅	❌	✅	❌	✅	❌	✅	❌	❌	❌	❌	❌	✅	❌
8	Name two parts of the human body. Put th...	Footnote, foot, nose	❌	✅	❌	❌	❌	❌	❌	✅	❌	❌	❌	✅	❌	❌	❌	❌	❌	✅	❌
9	Think of something that the majority of ...	Automobile Insurance	❌	❌	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	❌	✅	❌	✅	✅	✅
10	Name a world capital whose letters can b...	Tripoli, Lipitor	❌	❌	❌	❌	❌	✅	✅	✅	❌	✅	✅	✅	✅	✅	✅	❌	❌	❌	❌
12	Take the name of a popular children's ch...	Pinocchio, Chopin	✅	✅	❌	✅	❌	✅	✅	❌	❌	✅	❌	✅	✅	❌	✅	❌	✅	✅	✅
13	What letter comes next in this series: W...	S, first letter; S, initial letter	❌	❌	✅	❌	❌	❌	❌	✅	❌	✅	✅	❌	❌	❌	❌	❌	❌	❌	❌
14	What specific and very unusual property ...	silent third letter; silent letter	✅	❌	✅	❌	❌	✅	✅	✅	❌	❌	❌	❌	✅	✅	✅	❌	✅	✅	✅
15	Draw a regular hexagon and connect every...	82	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌
16	Think of a word in which the second lett...	prose, poems	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌
17	Think of a familiar five-letter word in ...	alpha, aloha	❌	❌	❌	✅	❌	✅	❌	✅	❌	✅	❌	✅	✅	❌	❌	❌	❌	❌	❌
18	With one stroke of a pencil you can chan...	Cinderella	❌	✅	❌	❌	✅	✅	❌	✅	✅	✅	❌	❌	✅	❌	✅	❌	❌	✅	✅
19	The words "organic" and "natural" are bo...	Granola	✅	✅	❌	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	❌	❌	❌	❌	❌	❌
20	Think of a word associated with Hallowee...	Treat, threat, thereat	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	✅	❌	❌	❌	❌	❌
21	Take the last name of a famous actor. Dr...	Charles Grodin, Auguste Rodin, Odin	❌	❌	❌	❌	❌	✅	❌	✅	❌	❌	❌	✅	✅	❌	❌	❌	❌	✅	❌
22	Name a two-word geographical location. R...	North Pole, porthole	❌	✅	❌	✅	❌	✅	❌	❌	❌	❌	❌	❌	❌	❌	✅	❌	❌	❌	❌
23	Name a major U.S. city in two words. Tak...	Fort Lauderdale, Fla	❌	❌	❌	❌	❌	✅	✅	✅	✅	✅	✅	✅	✅	✅	❌	❌	❌	✅	❌
25	Today is December 2 2012. In a few weeks...	repeat digits; repeating digits	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌
26	Take the last name of a famous world lea...	Golda Meir, emir	❌	❌	❌	❌	❌	✅	❌	✅	❌	❌	❌	✅	❌	❌	❌	❌	❌	❌	❌
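
Per-model counts can in principle be recovered from the per-challenge table by counting the ✅ marks in each model column. The sketch below assumes the table is available as tab-separated text with ✅/❌ cells, which is an assumption about the export format rather than a documented feature of the application. The function name `tally` is hypothetical, and only three rows and two model columns (o1 and o3) are included for brevity.

```python
import csv
import io

# Three rows from the table above, reduced to the o1 and o3 columns.
TSV = (
    "ID\tChallenge\tAnswer\to1\to3\n"
    "0\tTake a familiar brand name, seen along r...\tCitgo\t❌\t✅\n"
    "9\tThink of something that the majority of ...\tAutomobile Insurance\t✅\t✅\n"
    "15\tDraw a regular hexagon and connect every...\t82\t❌\t❌\n"
)


def tally(tsv_text: str) -> dict[str, tuple[int, int]]:
    """Return {model: (correct, total)} by counting ✅ marks per model column."""
    # QUOTE_NONE keeps quotation marks inside challenge text from being
    # interpreted as CSV quoting.
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    models = [c for c in reader.fieldnames if c not in ("ID", "Challenge", "Answer")]
    counts = {m: [0, 0] for m in models}
    for row in reader:
        for m in models:
            counts[m][1] += 1          # one more challenge attempted
            if row[m] == "✅":
                counts[m][0] += 1      # one more challenge answered correctly
    return {m: (c, t) for m, (c, t) in counts.items()}


print(tally(TSV))
```

On the three sample rows this gives one correct answer out of three for o1 and two out of three for o3; run over the full table, the same counting would yield the per-model totals.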