Two studies show AI benchmarks vastly overstate AI abilities
Date: March 16th, 2026 6:08 PM Author: Marvelous Lettuce
No doubt AI is groundbreaking. But maybe a little grounding is in order.
Carnegie Mellon study: AI benchmarks are so narrowly defined that they represent only 7.6% of all occupational tasks. The benchmarks are disconnected from high-value labor tasks.
https://x.com/rohanpaul_ai/status/2033450821850222811?s=46
Alibaba study: tested code over the course of 8 months. The vast majority broke down over time despite initially passing quality checks.
(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=5310529#49749191)
Date: March 16th, 2026 6:11 PM Author: sepia arrogant candlestick maker love of her life
AI is going to be regarded as a joke pretty soon.
It basically has the same value as Excel
(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=5310529#49749196)
Date: March 17th, 2026 12:29 AM Author: bull headed shrine toaster
This just isn't true, though. You'd fire an associate that gave you the equivalent of a hallucination on 2 occasions (assuming one prior discovery and warning). If the circumstances were unlucky for the associate, you might fire without warning. Whatever's going on with this -- it was asserted to me in 2024 or so that this was a trivially easy thing to fix, and that is obviously just not the case -- yes it gets blown out of proportion sometimes ("haha AI is useless/worthless"), but it is a huge deal practically.
Hallucinations aside, you sometimes just get point-missing or wrong analyses. This is something you also sometimes see from flesh-and-blood associates (particularly summer associates, which I no-joke stopped hiring because of AI), but it's not good.
The reality is that AI is currently a very real competitor of SAs on legal issues. The integration isn't there for facts yet, and it doesn't compete with midlevels even on pure-law yet. Now who knows what the future holds....
(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=5310529#49750100)
Date: March 16th, 2026 11:21 PM Author: Aromatic Casino
all the models are trained to game the benchmark tests
they're completely meaningless
(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=5310529#49749980)