  The most prestigious law school admissions discussion board in the world.

Two studies show AI benchmarks vastly overstate AI abilities







Date: March 16th, 2026 6:08 PM
Author: Tantric ruby stead

No doubt AI is groundbreaking. But maybe a little grounding is in order.

Carnegie Mellon study. AI benchmarks so narrowly defined that they only represent 7.6% of all occupational tasks. Benchmarks are disconnected from high-value labor tasks.

https://x.com/rohanpaul_ai/status/2033450821850222811?s=46

Alibaba study. Tested code over course of 8 months. Vast majority broke down over time despite initially passing quality.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49749191)




Date: March 16th, 2026 11:23 PM
Author: peach crackhouse ratface

Surely it will stay this way.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49749985)




Date: March 16th, 2026 6:11 PM
Author: Tan corner mood

AI is going to be regarded as a joke pretty soon.

It basically has the same value as Excel

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49749196)




Date: March 16th, 2026 11:17 PM
Author: Tantric ruby stead

when do you think that moment will come?

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49749968)




Date: March 16th, 2026 11:20 PM
Author: useless mildly autistic point

A joke that spit out the results of a legal research test I gave it in 30 seconds that was much better than anything I'd get from a junior associate after days of research.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49749973)




Date: March 16th, 2026 11:26 PM
Author: Tantric ruby stead

what were you using? I use Protégé from LexisNexis. Sometimes it's very solid and other times not so much. I wouldn't say it's anywhere near as game-changing as AI has been for programmers.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49749989)




Date: March 16th, 2026 11:30 PM
Author: useless mildly autistic point

Latest paid version of ChatGPT, forget what it's called.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49749997)




Date: March 16th, 2026 11:36 PM
Author: dead cracking legend

Lmao if you’re using that Lexis or Westlaw built-in AI bullshit. ChatGPT can dominate that.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49750004)




Date: March 17th, 2026 10:57 AM
Author: laughsome wagecucks

(Liberal)

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49750662)




Date: March 17th, 2026 12:29 AM
Author: Jade heaven

This just isn't true, though. You'd fire an associate that gave you the equivalent of a hallucination on 2 occasions (assuming one prior discovery and warning). If the circumstances were unlucky for the associate, you might fire without warning. Whatever's going on with this -- it was asserted to me in 2024 or so that this was a trivially easy thing to fix, and that is obviously just not the case -- yes it gets blown out of proportion sometimes ("haha AI is useless/worthless"), but it is a huge deal practically.

Hallucinations aside, you sometimes just get point-missing or wrong analyses. This is something you also sometimes see from flesh-and-blood associates (particularly summer associates, which I no-joke stopped hiring because of AI), but it's not good.

The reality is that AI is currently a very real competitor of SAs on legal issues. The integration isn't there for facts yet, and it doesn't compete with midlevels even on pure-law yet. Now who knows what the future holds....

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49750100)




Date: March 17th, 2026 10:25 AM
Author: useless mildly autistic point

A lawyer certainly shouldn't use AI as the final draft in an area of the law he doesn't know about, and of course you should double-check its work. It can be hit and miss. But the hits are a lot more common than the misses after the ChatGPT paid version release last month. In my research, it actually said at times "I know this isn't exactly the kind of case you were looking for, but ..." and didn't hallucinate at all.

The work product it gave me was simply way too good to not at least allow it to try to take a crack at any legal project within its areas of competence. What do you have to lose? It's freaking 20 bucks a month and spits out great work in 30 seconds.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49750574)




Date: March 17th, 2026 11:57 AM
Author: Jade heaven

Yeah I obviously use it daily; it's an invaluable tool. We're talking about the incredibly high bar of meaningful labor replacement.

On that front, I have way, way more doubts about integration than I do about raw intelligence. Why the fuck will it not take my site logins and do database scrapes?

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49750873)




Date: March 17th, 2026 10:30 AM
Author: blathering big cumskin internal respiration

"and it doesn't compete with midlevels even on pure-law yet."

Two years ago it couldn't compete with a reasonably smart college student with access to a library. Four years ago it was dumber than sealclubber.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49750590)




Date: March 17th, 2026 11:54 AM
Author: Jade heaven

Tamagotchis from 1997 were smarter than sealclubber.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49750863)




Date: March 17th, 2026 10:27 AM
Author: Razzmatazz swollen codepig



(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49750581)




Date: March 16th, 2026 11:21 PM
Author: offensive business firm jewess

all the models are trained to game the benchmark tests

they're completely meaningless

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49749980)




Date: March 16th, 2026 11:38 PM
Author: floppy kitty cat

i asked AI to build a mobile app and it did. that's pretty incredible imo. when it made mistakes it fixed them on its own.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49750013)




Date: March 17th, 2026 12:23 AM
Author: lake curious indian lodge background story

One of the major reasons why labs are prioritizing coding/SWE (in addition to being a relatively easy revenue source) is that they intend to use the models for AI research. Deep learning is almost entirely an empirical field with a thin amount of theoretical justification for architecture and training regimes, so the ability to rapidly test new systems is essential. If SWE agents can provide plausible architecture ideas and implement them (or test out a variety of ideas specified by human programmers), the model iteration loop becomes much faster. Not to mention the total training compute deployment in a few years will be orders of magnitude what it is currently, which will decrease large-scale training run time substantially. The point isn’t that the models are getting substantially better on all tasks currently. It’s that they are improving extremely rapidly on the tasks needed for model self-improvement.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49750077)




Date: March 17th, 2026 10:45 AM
Author: offensive business firm jewess

There's also another reason why they focus on coding ability. No bonus points for guessing that one too.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49750623)




Date: March 17th, 2026 10:48 AM
Author: blathering big cumskin internal respiration

A secret international cabal trying to make one of the few remaining things the US is legitimately better at than the rest of the world useless?

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49750630)




Date: March 17th, 2026 11:01 AM
Author: offensive business firm jewess

Because it's a digital language machine that has no memory of its past actions and no world-modeling capabilities, and there are only so many things that this kind of "machine" can do. Coding is one of the only monetizable tasks in this category.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49750682)




Date: March 17th, 2026 10:56 AM
Author: laughsome wagecucks

Bc they are Engineers and just see everything as Engineering. When you’ve got a hammer, the world looks like nails.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2&mark_id=3986969#49750656)