GPT 5.2 is again so far ahead of everything that it's a joke
Date: January 16th, 2026 2:27 AM
Author: ,,,;,,,,,;:;,,,,;::::;,,,;;:::;:;:?:::::;;;;;;
(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2#49593262) |
Date: January 12th, 2026 2:27 PM
Author: .,.,.,.,.,.,..,.,.,,.,.,..,>,... ( )
It's a hilarious/brutal market. There's effectively no lock-in, so I just switch to whoever is good right now. I think they're the current winner; I never need to switch bots because it can't do something. In 4 the Python became passable; now in 5.2 it's really good. I never bother coding "by hand" anymore.
(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2#49583708) |
Date: January 12th, 2026 2:32 PM
Author: .,.,.,.,.,.,..,.,.,,.,.,..,>,... ( )
Windows Scripts? You mean PowerShell? I haven't tried that. It does just fine with Python, which isn't surprising. Python is one of the most documented languages on the planet.
(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2#49583733) |
Date: January 12th, 2026 2:45 PM
Author: .,.,.,.,.,.,..,.,.,,.,.,..,>,... ( )
What are you trying to do, and what prompt did you give the bot?
(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2#49583786) |
Date: January 12th, 2026 5:19 PM
Author: .,.,.,.,.,.,..,.,.,,.,.,..,>,... ( )
(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2#49584139) |
Date: January 15th, 2026 10:37 PM
Author: chopped unc
Yeah, turns out my intuition was correct. Even though Claude slightly edges out GPT 5.2 on SWE-bench (by 0.8%, I think), GPT scores SIGNIFICANTLY HIGHER on abstract reasoning benchmarks (54.2% vs. 37.6% for Opus 4.5), making it better at generalizing to problems outside its training data.
Blocker-Severity Vulnerabilities: GPT 5.2 High achieved a best-in-class security posture with only 16 blocker vulnerabilities per million lines of code (MLOC).
Claude Comparison: By contrast, Claude Opus 4.5 Thinking generated 44 blockers per MLOC (nearly 3x as many), while Claude Sonnet 4.5 registered a high of 198 blockers.
In deep-reasoning evaluations, GPT 5.2 has demonstrated a significant lead in identifying "blocker-severity" issues:
GPT 5.2: Identified 13 out of 15 critical system-level errors (such as subtle race conditions and complex handler registration bugs).
Claude Opus 4.5: Identified only 5 out of 15 of these same high-level architectural errors, often missing the deeper root causes despite fixing the surface-level bugs.
Also, GPT 5.2 scores significantly higher on a newer version of SWE-bench called SWE-bench Pro, which is less Python-centric: 55.6% for 5.2 vs. 43.3% for Claude.
(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2#49593063) |
Date: January 15th, 2026 10:47 PM
Author: .,.,...,..,.,.,:,,:,.,.,:::,...,:,...:..:.,:.::,.
With "the best LLM," use case is everything.
(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2#49593074) |
Date: January 15th, 2026 10:55 PM
Author: Patel Philippe
It changes on a weekly basis lately.
Opus 4.5 had been kicking everyone's ass since December, to the point that I personally ranked it #1 for the first time ever, but only for a few weeks. Anthropic must be selectively throttling max subs, because some days it's noticeably weaker. On Sunday nights it is so much smarter and better at outputs and handling large contexts that it feels like a different AI.
GPT 5.2 needs more conscious prompting to extract the desired output, and its responses are dense and less legible, but there is no question that it bitch slaps Opus 4.5 on more complicated reasoning tasks as of THIS WEEK.
Next week might be a different story.
(http://www.autoadmit.com/thread.php?thread_id=5821130&forum_id=2#49593089) |