Two studies show AI benchmarks vastly overstate AI abilities
Date: March 16th, 2026 6:08 PM
Author: LathamTouchedMe
No doubt AI is groundbreaking. But maybe a little grounding is in order.
Carnegie Mellon study: AI benchmarks are so narrowly defined that they represent only 7.6% of all occupational tasks. The benchmarks are disconnected from high-value labor tasks.
https://x.com/rohanpaul_ai/status/2033450821850222811?s=46
Alibaba study: tested code over the course of 8 months. The vast majority broke down over time despite initially passing quality checks.
(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2...#49749191)
Date: March 16th, 2026 6:11 PM
Author: ....;..;...;;;.....;;......;;
AI is going to be regarded as a joke pretty soon.
It basically has the same value as Excel.
(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2...#49749196)
Date: March 16th, 2026 11:20 PM
Author: ,.,,.,.,,,,,,.....................
A joke that, in 30 seconds, spit out results on a legal research test I gave it that were much better than anything I'd get from a junior associate after days of research.
(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2...#49749973)
Date: March 16th, 2026 11:30 PM
Author: ,.,,.,.,,,,,,.....................
The latest paid version of ChatGPT; I forget what it's called.
(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2...#49749997)
Date: March 16th, 2026 11:21 PM
Author: computer_smasher420
all the models are trained to game the benchmark tests
they're completely meaningless
(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2...#49749980)