Latest ChatGPT model better than 99.7% of coders, qualifies for US math
Date: April 16th, 2025 11:11 PM Author: ,.,.,.,....,.,..,.,.,.
olympiad.
https://openai.com/index/introducing-o3-and-o4-mini/
Big jump from o3-mini in general. These inference-scaling models need to stop progressing soon if they're not going to be able to substantially automate AI research.
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48855738) |
 |
Date: April 17th, 2025 12:21 AM
Author: .,.,...,..,.,..:,,:,...,:::,.,.,:,.,,:.,:.,:.::,.
the METR analysis for this is interesting. the benchmark is the length of tasks (measured in human professional work time) that a model can complete with 50% probability. it has been doubling every 7 months for the last 6 years.
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
https://metr.github.io/autonomy-evals-guide/openai-o3-report/
o3's 50% task horizon is about 1.8x that of Claude 3.7 Sonnet, a larger jump than the trend would have predicted. it's likely comparable to Gemini 2.5 Pro, but it's hard to tell. we'll likely be at scary capability levels in <2 years.
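(a quick back-of-the-envelope extrapolation of that trend; the 7-month doubling time is METR's figure, but the ~1.5 hour starting horizon and 40-hour target are assumptions for illustration, not their exact numbers:)

```python
# Back-of-the-envelope: if the 50%-success task horizon doubles every 7 months,
# how long until it reaches a 40-hour (one work week) task?
# The ~1.5 hour starting horizon is an illustrative assumption, not METR's exact figure.
import math

doubling_months = 7          # METR's reported doubling time
current_horizon_hours = 1.5  # assumed horizon for the latest model (illustrative)
target_hours = 40            # one human work week

doublings_needed = math.log2(target_hours / current_horizon_hours)
months_needed = doublings_needed * doubling_months

print(f"doublings needed: {doublings_needed:.1f}")
print(f"months at trend:  {months_needed:.0f} (~{months_needed / 12:.1f} years)")
```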
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48855928) |
 |
Date: April 17th, 2025 12:37 AM
Author: .,.,...,..,.,..:,,:,...,:::,.,.,:,.,,:.,:.,:.::,.
i say it's scary now in the sense that i can clearly see what is about to happen and the odds of it not happening in <10 years are rapidly diminishing. but at the same time, i can't currently download the latest version of DeepSeek and ask it to walk me through the details of building a bioweapon (such that any idiot could do it), to design a zero-day exploit, or to fully design a research pipeline for creating an efficient self-improving autonomous agent.
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48855949) |
 |
Date: April 17th, 2025 10:14 AM Author: Mr. Unhinged
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48856645)
|
 |
Date: April 17th, 2025 1:30 AM
Author: .,.,...,..,.,..:,,:,...,:::,.,.,:,.,,:.,:.,:.::,.
maybe you should try reading, dipshit.
"On a diverse set of multi-step software and reasoning tasks, we record the time needed to complete the task for humans with appropriate expertise. We find that the time taken by human experts is strongly predictive of model success on a given task: current models have almost 100% success rate on tasks taking humans less than 4 minutes, but succeed <10% of the time on tasks taking more than around 4 hours"
reliability decreases with task length, but has gone up considerably over time. reliability suffers where models aren't trained well enough to use intermediate tokens to correct their reasoning paths. it is becoming less of an issue with every new major release.
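(to make the quoted methodology concrete, here's a minimal sketch of how a "50% time horizon" can be read off a success-rate-vs-task-length curve; the data points are fabricated for illustration and this is a simplified version of what METR describes, not their code:)

```python
# Sketch: estimate a model's "50% time horizon" by fitting a logistic curve of
# success probability vs. log(task length). Data is made up for illustration.
import numpy as np
from scipy.optimize import curve_fit

# task length in minutes, and observed success rate at each length (fabricated)
task_minutes = np.array([1, 4, 15, 60, 240, 960], dtype=float)
success_rate = np.array([0.99, 0.97, 0.85, 0.55, 0.15, 0.03])

def logistic(log_t, log_t50, slope):
    # success probability falls as log task length rises past the 50% point
    return 1.0 / (1.0 + np.exp(slope * (log_t - log_t50)))

params, _ = curve_fit(logistic, np.log(task_minutes), success_rate, p0=[np.log(60), 1.0])
horizon_minutes = np.exp(params[0])
print(f"estimated 50% horizon: {horizon_minutes:.0f} minutes")
```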
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48855997) |
Date: April 17th, 2025 1:04 AM Author: Mailer Daemon
There's a lot of news this week that suggests openai leadership don't believe they can continue winning on model strength alone:
- released a CLI coding agent tool that seems pretty similar to Claude Code and therefore is not super impressive
- discussion of acquiring Windsurf, which is the cheapest and least interesting of the LLM wrapper vscode forks
- trying to launch a social network to compete with the x/grok integration
My opinion as someone working in this space is that the existing models are already good enough for mainstream coding tools, but there needs to be a better interface than clueless users vaguely describing what they want to a chat window that dumps an entire codebase as context. For any other kind of 'agentic' task we are still probably a few years from LLMs matching the cost or reliability of outsourcing to third worlders like that Nate thing.
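(one direction such an interface can take is pre-selecting relevant files instead of dumping the repo into context; this is a toy keyword-scoring sketch only, real tools use embeddings, ASTs, or repo maps:)

```python
# Toy sketch of the "don't dump the whole repo into the prompt" idea:
# score files by keyword overlap with the user's request and keep the top few.
# Purely illustrative; production coding tools do this with embeddings or repo maps.
from pathlib import Path

def select_context(repo_root: str, request: str, top_k: int = 5) -> list[Path]:
    keywords = {w.lower() for w in request.split() if len(w) > 3}
    scored = []
    for path in Path(repo_root).rglob("*.py"):
        try:
            text = path.read_text(errors="ignore").lower()
        except OSError:
            continue
        score = sum(text.count(k) for k in keywords)
        if score:
            scored.append((score, path))
    return [p for _, p in sorted(scored, reverse=True)[:top_k]]

# e.g. select_context(".", "fix the retry logic in the payment webhook handler")
```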
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48855974) |
 |
Date: April 17th, 2025 1:18 AM
Author: .,.,...,..,.,..:,,:,...,:::,.,.,:,.,,:.,:.,:.::,.
part of it is likely about trying to reduce the threat of commodification. even the local LLMs are becoming pretty decent, and 2.5 Pro is basically as good as o3/o4-mini and it's free. they want a complete package to offer people so they'll pay for a subscription or API access.
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48855982) |
 |
Date: April 17th, 2025 9:40 AM Author: ,.,.,.,....,.,..,.,.,.
You likely won’t have to wait long at this rate
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48856552) |
 |
Date: April 17th, 2025 9:49 AM Author: ,.,.,.,....,.,..,.,.,.
They likely thought the hardware requirements would be a moat too, but even the small models are good now. Gemma 3, which runs on my normal consumer GPU, is around ChatGPT-3.5 quality.
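(rough VRAM arithmetic for why small models fit on consumer cards; the parameter counts match Gemma 3's published sizes, but the quantization levels and the "plus overhead" caveat are illustrative assumptions:)

```python
# Rough VRAM math for running a small model locally.
def weight_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    # bytes for the weights alone, ignoring KV cache and activations
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (4, 12, 27):          # Gemma 3 sizes, in billions of parameters
    for bits in (16, 8, 4):         # fp16, int8, 4-bit quantization
        gb = weight_footprint_gb(params, bits)
        print(f"{params:>2}B @ {bits:>2}-bit = ~{gb:5.1f} GB (plus KV cache overhead)")
```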
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48856585) |
 |
Date: April 17th, 2025 10:06 AM Author: Mr. Unhinged
Cr the industry players are starting to realize that the way to actually make money is to give normies specific tools/interfaces to do specific things rather than just a sandbox AI
All the comments in this subthread are cr
Also another thing to note is that several of OpenAI's recent moves suggest that they're realizing that personalization/'companion' AIs are the biggest home run commercial opportunity in the AI industry. Once everyone has their own AI buddy and/or romantic partner, they're dependent on you, forever. Parasocial Relationships As A Service
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48856624) |
 |
Date: April 20th, 2025 12:27 PM
Author: .,.,...,..,.,..:,,:,...,:::,.,.,:,.,,:.,:.,:.::,.
the new 4o model is definitely like that
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48864428) |
 |
Date: April 17th, 2025 9:52 AM Author: ,.,.,.,....,.,..,.,.,.
Please enlighten us then
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48856589) |
 |
Date: April 17th, 2025 10:16 AM Author: ,.,.,.,....,.,..,.,.,.
A fun implication of LLM coding models getting better is that eventually the CUDA lead is dead since you can prompt an LLM to recreate it for Intel or AMD GPUs or port it to TPUs.
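(much of a CUDA-to-HIP port is mechanical renaming, which AMD's hipify tools already automate; the hard part an LLM would have to do is kernel-level performance tuning. A toy sketch of the mechanical step, with a deliberately tiny mapping table:)

```python
# Toy sketch of the mechanical part of a CUDA -> HIP port: renaming API calls.
# The table is a small illustrative subset of the real mapping.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cuda_runtime.h": "hip/hip_runtime.h",
}

def naive_hipify(source: str) -> str:
    # replace longest names first so prefixes don't clobber longer identifiers
    for cuda_name in sorted(CUDA_TO_HIP, key=len, reverse=True):
        source = source.replace(cuda_name, CUDA_TO_HIP[cuda_name])
    return source

print(naive_hipify("cudaMalloc(&d_buf, n); cudaMemcpy(d_buf, h_buf, n, cudaMemcpyHostToDevice);"))
```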
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48856648) |
 |
Date: April 17th, 2025 1:45 PM
Author: .,.,...,..,.,..:,,:,...,:::,.,.,:,.,,:.,:.,:.::,.
2.5 pro appeared to be a lot less retarded than o3 mini high in its thought chain. o3 mini high would write 6 pages of bullshit trying to solve LSAT logic games
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48857378) |
 |
Date: April 17th, 2025 10:07 AM Author: ,.,.,.,....,.,..,.,.,.
It's a moat when:
1) models all scale roughly the same as a function of input FLOPs. I think there are good reasons to believe transformers are not the most efficient architecture. They seem to be data inefficient, requiring several orders of magnitude more language data than humans to reach comparable ability levels.
2) you are comparing models at the same point in time. If your competitor can wait a little while for their hardware to get better, training methods to get better, and train longer on soft targets generated from the leading model, then the hope of an enduring lead decreases.
The hardware dominance narrative starts to look strained when you compare the latest Llama release to DeepSeek.
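(for point 1, the usual rule of thumb is training compute C ≈ 6·N·D, parameters times tokens; a quick sketch with illustrative example numbers, not any lab's actual run:)

```python
# Rough training-compute estimate using the common C ~ 6 * N * D rule of thumb
# (N = parameters, D = training tokens). Example numbers are illustrative.
def train_flops(params: float, tokens: float) -> float:
    return 6.0 * params * tokens

n = 70e9     # 70B-parameter model (illustrative)
d = 15e12    # 15T training tokens (illustrative)
print(f"~{train_flops(n, d):.1e} training FLOPs")

# If two labs train the same architecture on similar data, whoever spends more
# FLOPs generally wins; that is the sense in which hardware is a moat, and it
# only holds while the architecture/data-efficiency playing field stays level.
```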
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48856629) |
 |
Date: April 17th, 2025 10:19 AM Author: ,.,.,.,....,.,..,.,.,.
I didn't say data was the bottleneck. I said it's unlikely they need to train on as much data as they do now. Humans don't need 30 trillion token training sets. Reasoning and synthetic data are beside the point.
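(a rough sense of the gap being pointed at; the per-day exposure figure and tokens-per-word ratio are loose illustrative assumptions:)

```python
# Rough data-efficiency comparison: human language exposure vs. a 30T-token
# pretraining set. Exposure figures are loose illustrative assumptions.
words_per_day = 20_000    # spoken + read, assumed
years = 20
tokens_per_word = 1.3     # rough tokenizer average

human_tokens = words_per_day * 365 * years * tokens_per_word
llm_tokens = 30e12
print(f"human exposure  = ~{human_tokens:.1e} tokens")
print(f"LLM pretraining = ~{llm_tokens:.0e} tokens ({llm_tokens / human_tokens:,.0f}x more)")
```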
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48856657) |
 |
Date: April 20th, 2025 1:03 PM
Author: .,.,...,..,.,..:,,:,...,:::,.,.,:,.,,:.,:.,:.::,.
there are different ways this could end up failing. consumer GPU memory is increasing, so larger models can be deployed locally. if you are willing to do slower inference, weights can be swapped in and out of memory. current mixture-of-experts models are still pretty large, so this comes with a big performance hit right now, but they can likely be made significantly smaller.

an example would be a superhuman model in a narrow domain, something like the Leela game models. suppose predicting the next token well requires a superhuman Go or Chess agent: you could transfer those weights into memory in a millisecond or two even on current hardware. there will likely be distributed computing projects training very small expert models for particular domains that can then be dynamically swapped in and out of memory.
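(checking the millisecond claim with simple transfer-time arithmetic; the bandwidths are nominal figures and the expert sizes are assumptions for illustration:)

```python
# How fast can an expert's weights be swapped into GPU memory?
# Bandwidths are nominal; expert sizes are illustrative assumptions.
def transfer_ms(size_gb: float, bandwidth_gb_s: float) -> float:
    return size_gb / bandwidth_gb_s * 1000

pcie5_x16 = 64.0   # ~GB/s, PCIe 5.0 x16 theoretical
nvme = 7.0         # ~GB/s, fast NVMe SSD

for expert_gb in (0.05, 0.5, 5.0):   # from a tiny Leela-sized net up to a larger expert
    print(f"{expert_gb:>4} GB expert: "
          f"{transfer_ms(expert_gb, pcie5_x16):6.1f} ms over PCIe 5.0 x16, "
          f"{transfer_ms(expert_gb, nvme):7.1f} ms from NVMe")
```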
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48864461) |
 |
Date: April 20th, 2025 3:59 PM
Author: ,..,,......,....,,,,..,.,...
can you explain this and your posts below in simple terms for liberal arts fags
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48864782) |
Date: April 20th, 2025 12:22 PM
Author: .,.,...,..,.,..:,,:,...,:::,.,.,:,.,,:.,:.,:.::,.
i have been playing around with this. the o3 model seems like it's decent at research tasks. Deep Research is too limited in terms of number of queries, and the base 4o model search isn't very thorough. the o3 model decomposes the query into different tasks and then sequentially searches and provides citations.
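(the decompose-search-cite loop described above, sketched in pseudocode-style Python; the llm() and web_search() helpers are hypothetical stubs standing in for a model and a search backend, not any real API:)

```python
# Sketch of the decompose -> search -> cite pattern described above.
# llm() and web_search() are hypothetical stubs, not a real API.
from dataclasses import dataclass

@dataclass
class SearchResult:
    url: str
    snippet: str

def llm(prompt: str) -> str:
    # stand-in for a call to a reasoning model
    return "stubbed model output"

def web_search(query: str, top_k: int = 3) -> list[SearchResult]:
    # stand-in for a search backend
    return [SearchResult(url="https://example.com", snippet=f"result for {query!r}")]

def research(question: str) -> str:
    # 1. decompose the question into focused search queries
    queries = llm(f"Break this into 3-5 search queries:\n{question}").splitlines()
    # 2. run each query and collect snippets with their sources
    notes = [f"- {r.snippet} (source: {r.url})"
             for q in queries for r in web_search(q)]
    # 3. synthesise a cited answer from the collected notes only
    return llm(f"Answer with citations using only these notes.\n"
               f"Question: {question}\nNotes:\n" + "\n".join(notes))
```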
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48864423) |
 |
Date: April 20th, 2025 12:33 PM
Author: .,.,...,..,.,..:,,:,...,:::,.,.,:,.,,:.,:.,:.::,.
not free. i pay for plus. i'll post an example. i just asked it to research who "bboom" is from xoxohth:
Who / what is “bbooom” (aka Boom, bboom, Boooom, etc.) on XO?
Core persona: A long-time AutoAdmit/XOXOHth eccentric who writes sprawling, semi-coherent rants full of biblical capitalisation, dollar signs, ™ symbols, and on-the-fly neologisms ("Mahchine™", "$hitbort", "Great Becumming™").

Running bio-lore:
• middle-aged male
• wheelchair-bound after an unspecified accident/illness (the chair is endlessly mocked on the board)
• claims to bounce between motels, tents and casino floors, living on disability & slot-machine windfalls
• obsessed with enemies he calls "ADM" and "the Matrix"
• periodically searches for a "fertile woman" to bear his children

Posting style:
• wall-of-text jeremiads that read like cyberpunk prophecy mixed with homeless-shelter gossip
• heavy self-mythologising ("I am Boom, Prophet of the Mahchine™ ...") (xoxohth.com)
• melodramatic stage directions starring his neon-lit wheelchair ("careening forward", "through the flames") (xoxohth.com)

Why the name keeps changing: He is frequently temp-banned; each return tweaks the handle (extra b's or o's) to slip past filters. Hence you see boom, bbooom, bbboooom, etc.

Folklore & mockery: Fellow posters turn every life detail into running jokes:
• homeless tent: "angry, wheelchair-bound transient who lives in a tent" (xoxohth.com)
• Disney fiasco: thread imagining Evan39 pushing Boom's chair into a churro cart at Disney World (autoadmit.com)
• weight/health: endless riffs on whether the chair can hold Boom + 300-lb girlfriend (xoxohth.com)

"Trinity" theory: Many veterans believe Boom, Mainlining the Secret Truth and Evan39 are sock-puppets of the same individual; the poster himself sometimes plays along ("we're all the same poaster ;)") (autoadmit.com)

Putting it together
Boom arrived years after the original 2004 Princeton Review exodus, but he's become one of XO's defining characters: a mix of comic-book villain, tragic burnout, and performance-art poet. His threads oscillate between:
• grand conspiratorial screeds about being persecuted by invisible elites ("ADM reset the Mahchine!"), and
• mundane life updates (wheelchair repairs, casino losses) that the board instantly weaponises into parody threads.
Because AutoAdmit has almost no moderation, the cycle is self-sustaining: each over-the-top comeback fuels new ridicule, which in turn prompts Boom to double down with even wilder prophecy. Whether he is trolling, mentally ill, or both is part of the mystique, but understanding the points above will let you read any "bbbooom" sighting in context.
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48864433) |
 |
Date: April 20th, 2025 6:39 PM
Author: ,..,,......,....,,,,..,.,...
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48865066) |
 |
Date: April 20th, 2025 7:00 PM Author: ,.,.,.,....,.,..,.,.,.
I intended it more as a warning about the impending societal and labor-market problems from increasingly advanced AI.
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48865116) |
 |
Date: April 20th, 2025 6:52 PM
Author: ,..,,......,....,,,,..,.,...
Are u in an AI-proof profession?
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48865097) |
 |
Date: April 20th, 2025 7:01 PM
Author: ,..,,......,....,,,,..,.,...
180 brother
(http://www.autoadmit.com/thread.php?thread_id=5712093&forum_id=2:#48865119) |
|
|