  The most prestigious law school admissions discussion board in the world.

GPT 5.2 is again so far ahead of everything that it's a joke


Date: January 12th, 2026 2:22 PM
Author: Bipolar lilac business firm

I was honestly wondering if Claude was going to jump ahead with 5.1 being kind of lackluster. but then 5.2 made a huge leap ahead of opus. and obviously gemini is still way behind both of those

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49583695)

Date: January 12th, 2026 2:23 PM
Author: magenta excitant roommate

how can you tell?

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49583697)

Date: January 12th, 2026 2:24 PM
Author: Purple faggotry plaza

I'm just guessing

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49583701)

Date: January 12th, 2026 3:10 PM
Author: Light Galvanic Juggernaut



(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49583849)

Date: January 12th, 2026 4:32 PM
Author: Bipolar lilac business firm



(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584035)

Date: January 16th, 2026 2:27 AM
Author: ,,,;,,,,,;:;,,,,;::::;,,,;;:::;:;:?:::::;;;;;;




(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49593262)

Date: January 12th, 2026 2:27 PM
Author: razzle white international law enforcement agency rigor

It's a hilarious/brutal market. There's effectively no lock-in, so I just switch between whoever is good right now. I think they're the current winner; I never need to switch bots because it can't do something. In 4 the Python became passable, now in 5.2 it's really good. I never bother coding "by hand" anymore.

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49583708)

Date: January 12th, 2026 2:31 PM
Author: walnut tantric nursing home people who are hurt

I've used ChatGPT a few times to make scripts for Windows, and it always fucks up. Is it better at real coding?

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49583728)

Date: January 12th, 2026 2:32 PM
Author: razzle white international law enforcement agency rigor

Windows scripts? You mean PowerShell? I haven't tried that. It does just fine with Python, which isn't surprising. Python is one of the most documented languages on the planet.

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49583733)

Date: January 12th, 2026 2:37 PM
Author: walnut tantric nursing home people who are hurt

PowerShell and just good old batch files for cmd.

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49583753)

Date: January 12th, 2026 4:42 PM
Author: Bipolar lilac business firm

i dont believe for a second that it is messing up powershell scripts. maybe a script 3K lines long you glued together to train a model might not work the first go around.

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584053)

Date: January 12th, 2026 4:58 PM
Author: Purple faggotry plaza



(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584090)

Date: January 12th, 2026 6:07 PM
Author: walnut tantric nursing home people who are hurt

Nothing nearly that long. And it has been things like CLI syntax errors and mistakes in handling errors.

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584273)

Date: January 12th, 2026 6:15 PM
Author: Bipolar lilac business firm

well idk. i dont use powershell I use ubuntu and mac os only. it never fucks up bash and python scripts for me. the older versions used to fuck them up a lot. I haven't seen gpt make a syntax error since like gpt 4.1. maybe once in a while on o3 and o4. but not since it switched to 5.

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584292)

Date: January 16th, 2026 2:22 AM
Author: Quality Lawing Center

I've also noticed it's lackluster at powershell

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49593260)

Date: January 12th, 2026 2:34 PM
Author: cracking at-the-ready brunch therapy

No it's horrible

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49583738)

Date: January 12th, 2026 2:41 PM
Author: walnut tantric nursing home people who are hurt

One of the things I've found most frustrating is that when you're telling it to fix an error (which is annoying enough), it will often change (and break) other things that aren't at issue and were working just fine. It will even do that sometimes when you specifically tell it not to touch anything else.

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49583773)

Date: January 12th, 2026 2:45 PM
Author: razzle white international law enforcement agency rigor

What are you trying to do and what prompt did you give the bot

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49583786)

Date: January 12th, 2026 2:55 PM
Author: cracking at-the-ready brunch therapy

I am coding a discord replacement in assembly language

But it can't even give the right Proxmox commands. It always says pct restart instead of pct reboot, for instance. It's bad

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49583813)

Date: January 12th, 2026 4:34 PM
Author: Bipolar lilac business firm

lol yeah no wonder. you are using it to code assembly language. its really good at the popular languages. python is probably its best. also you can't always just give it some abstract prompt and expect it to guess the "right" implementation you have in mind. you need to be feeding it real concrete evidence from outputs etc.

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584036)

Date: January 12th, 2026 5:19 PM
Author: razzle white international law enforcement agency rigor



(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584139)

Date: January 12th, 2026 5:20 PM
Author: indecent voyeur useless brakes

were u possessed by NSAM

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584143)

Date: January 12th, 2026 5:50 PM
Author: cracking at-the-ready brunch therapy

I'm flaming online

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584236)

Date: January 12th, 2026 6:09 PM
Author: wonderful tripping temple chad



(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584278)

Date: January 12th, 2026 3:01 PM
Author: Purple faggotry plaza

Scripts are coding and it's good also try codex

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49583832)

Date: January 12th, 2026 3:10 PM
Author: fluffy mind-boggling parlor

(bizarro world where opus 4.5 doesn't exist)

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49583847)

Date: January 12th, 2026 4:34 PM
Author: harsh center



(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584037)

Date: January 12th, 2026 4:35 PM
Author: Bipolar lilac business firm

its much better than opus 4.5. opus 4.5 just comes up with all of this stuff that literally doesn't matter. like give it something thats broken and it will give you 100 edge cases that matter in 1 in a billion situations and tell you to fix them. gpt 5.2 gives you useful stuff every time. opus 4.5 can be ok though if you have gpt 5.2 to keep it in check but still nowhere even close to as good at coding and math as gpt 5.2. also gpt 5.2 is much more conservative about conclusions. opus 4.5 will be super super certain about an answer when its just guessing. the guess might be correct like 1 in 5 times but the times its hallucinating add up. there are probably things opus is better at like writing and business intelligence. but im 100% sure gpt 5.2 is substantially better at coding and math because the gap is big.

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584039)

Date: January 12th, 2026 4:54 PM
Author: elite property deer antler

ChatGPT is smarter like xo autist

Claude is more polished like legally blonde or whatever

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584080)

Date: January 12th, 2026 5:46 PM
Author: Bipolar lilac business firm

yeah i agree with that

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584221)

Date: January 12th, 2026 8:03 PM
Author: harsh center



(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584733)

Date: January 12th, 2026 4:43 PM
Author: Fragrant bbw



(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584058)

Date: January 12th, 2026 5:04 PM
Author: brass private investor bawdyhouse

What a Joke

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584107)

Date: January 12th, 2026 5:46 PM
Author: Bipolar lilac business firm



(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49584223)

Date: January 15th, 2026 10:37 PM
Author: chopped unc

yeah turns out my intuition was correct. even though claude slightly edges out gpt 5.2 on SWE-bench, i think by .8%, GPT scores SIGNIFICANTLY HIGHER on abstract reasoning benchmarks (54.2% vs. 37.6% for Opus 4.5), making it better at generalizing to solve problems outside its training data. and also Blocker-Severity Vulnerabilities: GPT 5.2 High achieved a best-in-class security posture with only 16 blocker vulnerabilities per million lines of code (MLOC).

Claude Comparison: By contrast, Claude Opus 4.5 Thinking generated 44 blockers per MLOC—nearly 3x as many—while Claude Sonnet 4.5 registered a high of 198 blockers. In deep-reasoning evaluations, GPT 5.2 has demonstrated a significant lead in identifying "blocker-severity" issues:

GPT 5.2: Identified 13 out of 15 critical system-level errors (such as subtle race conditions and complex handler registration bugs).

Claude Opus 4.5: Identified only 5 out of 15 of these same high-level architectural errors, often missing the deeper root causes despite fixing the surface-level bugs.

Also GPT 5.2 scores significantly higher on a newer version of SWE-bench called SWE-Bench Pro, which is less Python-centric: 55.6% for 5.2 versus 43.3% for Claude.
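
For anyone who wants to sanity-check the ratios, here is a minimal Python sketch that only redoes the arithmetic on the figures quoted in this post. The numbers are taken as posted and are not independently verified; the function and variable names are made up for illustration.

```python
# Figures as quoted in the post above (not independently verified).
blockers_per_mloc = {
    "GPT 5.2 High": 16,
    "Claude Opus 4.5 Thinking": 44,
    "Claude Sonnet 4.5": 198,
}
critical_found = {"GPT 5.2": 13, "Claude Opus 4.5": 5}
CRITICAL_TOTAL = 15


def ratio_to_baseline(counts, baseline):
    """Return each model's count divided by the baseline model's count."""
    base = counts[baseline]
    return {name: value / base for name, value in counts.items()}


def detection_rate(found, total):
    """Return the fraction of the critical errors each model identified."""
    return {name: n / total for name, n in found.items()}


ratios = ratio_to_baseline(blockers_per_mloc, "GPT 5.2 High")
rates = detection_rate(critical_found, CRITICAL_TOTAL)

for name, r in ratios.items():
    print(f"{name}: {r:.2f}x the GPT 5.2 High blocker rate")
for name, r in rates.items():
    print(f"{name}: {r:.0%} of critical errors identified")
```

On these figures, 44 / 16 = 2.75, which is where "nearly 3x" comes from.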

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49593063)

Date: January 15th, 2026 10:47 PM
Author: .,.,...,..,.,.,:,,:,.,.,:::,...,:,...:..:.,:.::,.


With "the best LLM," use case is everything.

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49593074)

Date: January 15th, 2026 10:50 PM
Author: chopped unc

to me the generalized reasoning capability is the most important part. in terms of more specific "use case" stuff its just about being clever enough to make the model give you useful outputs and clever ways to cross-check and verify without putting in too much effort

I want an AGI level superhuman general reasoning machine. Not a leading expert in x gay field

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49593078)

Date: January 15th, 2026 10:55 PM
Author: Patel Philippe

It changes on a weekly basis lately

Opus 4.5 was kicking everyone's ass since December to the point it became personally ranked #1 for the first time ever, but only for a few weeks. Anthropic must be selectively throttling max subs because some days it's noticeably weaker. On Sunday nights it is so much smarter and better at outputs and handling large contexts that it feels like a different AI

Gpt 5.2 needs more conscious prompting to extract the desired output and its responses are dense and less legible; but there is no question that it bitch slaps opus 4.5 on more complicated reasoning tasks as of THIS WEEK

Next week might be a different story

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49593089)

Date: January 15th, 2026 11:00 PM
Author: chopped unc

this makes it sound like you think reasoning tasks themselves change in difficulty from week to week, but maybe you just were lazy about what you wrote? The thing that changes is that the next model claude releases could be better than 5.2. I guess what you mean is that Opus 5 might be better than gpt 5.2 which I agree with.

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49593093)

Date: January 15th, 2026 11:13 PM
Author: Patel Philippe

I have a multi-model validation process that's a ghetto version of mainlining's mahchine. It's the same exact prompts that I'm comparing outputs from. There was a brief period when 4.5 still outperformed 5.2 (and on Sunday nights)
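
A rough idea of what a same-prompt comparison like this could look like, as a minimal Python sketch. This is not the poster's actual process: the two "models" are hypothetical stand-in functions (no real API is called), and the prompts, answers, and the 0.7 success rate are made up for illustration. Each prompt is run several times per model so one lucky or unlucky output does not decide the ranking.

```python
import random

# Hypothetical test cases: prompt -> expected answer.
ANSWERS = {"2 + 2": "4", "3 * 3": "9"}


def steady_model(prompt, rng):
    """Stand-in model that always returns the expected answer."""
    return ANSWERS[prompt]


def flaky_model(prompt, rng):
    """Stand-in model that answers correctly about 70% of the time."""
    return ANSWERS[prompt] if rng.random() < 0.7 else "unsure"


def compare_models(models, cases, trials, seed=0):
    """Run every prompt `trials` times per model and return pass rates."""
    rng = random.Random(seed)
    scores = {}
    for name, model in models.items():
        passed = 0
        for prompt, expected in cases.items():
            for _ in range(trials):
                if model(prompt, rng).strip() == expected:
                    passed += 1
        scores[name] = passed / (len(cases) * trials)
    return scores


scores = compare_models(
    {"steady": steady_model, "flaky": flaky_model}, ANSWERS, trials=50
)
print(scores)
```

Exact string matching is the simplest possible checker; anything open-ended would need a different way to score outputs.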

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49593110)

Date: January 15th, 2026 11:18 PM
Author: chopped unc

You are confusing different outputs from the same prompt with the model changing from week to week. The outputs are not going to be the same every time. Outputs are stochastic. The model itself is the same until they release an update.
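
To illustrate the stochastic point, a minimal Python sketch, assuming a toy "model" that is just a fixed table of token scores sampled with temperature-scaled softmax. The token names, scores, and temperature are hypothetical; a real LLM is far more complex, but the same idea applies: the weights never change between calls, yet repeated sampling gives different outputs.

```python
import math
import random

# Hypothetical frozen "model": these scores never change between calls.
LOGITS = {"yes": 2.0, "no": 1.5, "maybe": 0.5}


def sample_token(logits, temperature, rng):
    """Sample one token from fixed logits using temperature-scaled softmax."""
    scaled = [value / temperature for value in logits.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(list(logits), weights=weights, k=1)[0]


rng = random.Random(0)
samples = [sample_token(LOGITS, temperature=1.0, rng=rng) for _ in range(1000)]
counts = {token: samples.count(token) for token in LOGITS}
print(counts)
```

Same prompt, same weights, different answers across runs; only the sampling differs.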

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49593113)

Date: January 15th, 2026 11:42 PM
Author: Patel Philippe

I would say it's more fluid than week to week

And I never said that models change in between releases

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49593152)

Date: January 16th, 2026 6:58 AM
Author: computer online (🧐)

I rike a GRM

(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2&mark_id=5310908#49593313)