  The most prestigious law school admissions discussion board in the world.

AI BigLaw Bench subtask scores


Date: May 14th, 2025 12:34 AM
Author: scholarship

https://www.harvey.ai/blog/expanding-harveys-model-offerings

“In less than a year, seven models (including three non-OAI models) now outperform the originally benchmarked Harvey system on BigLaw Bench,” Harvey wrote in the blog post.

Harvey’s benchmark also showed that different foundation models are better at specific legal tasks than others. For instance, it says Google’s Gemini 2.5 Pro “excels” at legal drafting but “struggles” with pre-trial tasks like writing oral arguments because the model doesn’t fully understand “complex evidentiary rules like hearsay.”

OpenAI’s o3 does such pre-trial tasks well, according to Harvey’s testing, with Anthropic’s Claude 3.7 Sonnet following close behind.

(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2&mark_id=5310486#48928937)




Date: May 14th, 2025 10:53 AM
Author: scholarship



(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2&mark_id=5310486#48929625)




Date: May 14th, 2025 11:09 AM
Author: .,.,,..,..,.,..:,,:,...,:::,.,.,:,.,.:.,:.,:.::,.


I am surprised Grok is close to 2.5 Pro and o3. The steep improvement over year-old models is concerning.

(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2&mark_id=5310486#48929687)




Date: May 14th, 2025 11:11 AM
Author: scholarship

What do you mean, concerning?

(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2&mark_id=5310486#48929698)




Date: May 14th, 2025 11:33 AM
Author: Cornel West (🧐)

Looking into this



(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2&mark_id=5310486#48929786)




Date: May 14th, 2025 11:40 AM
Author: scholarship



(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2&mark_id=5310486#48929806)




Date: May 14th, 2025 12:02 PM
Author: Gemini



(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2&mark_id=5310486#48929872)




Date: May 14th, 2025 11:31 AM
Author: ai addict



(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2&mark_id=5310486#48929782)




Date: May 14th, 2025 12:02 PM
Author: dollar menu of items that cost 5 dollars each

you have to use the reasoning version of Grok

regular Grok sucks ass

(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2&mark_id=5310486#48929874)




Date: May 14th, 2025 4:45 PM
Author: scholarship

Is that what they used here?

(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2&mark_id=5310486#48930928)




Date: May 14th, 2025 12:32 PM
Author: Zack

So is Gemini 2.5 Pro the best at all these law tasks?

BigLaw Bench Core is a set of core tasks for benchmarking baseline legal problem-solving. Core tasks are organized into two primary categories, each encompassing several specific sub-task types:

*Transactional Task Categories*

Corporate Strategy & Advising

Drafting

Legal Research

Due Diligence

Risk Assessment & Compliance

Negotiation Strategy

Deal Management

Transaction Structuring

Regulatory & Advising

*Litigation Task Categories*

Analysis of Litigation Filings

Case Management

Drafting

Case Law Research

Transcript Analysis

Document Review and Analysis

Trial Preparations & Oral Argument

https://github.com/harveyai/biglaw-bench

(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2&mark_id=5310486#48929968)