  The most prestigious law school admissions discussion board in the world.

7/10/25 AI thread




Date: July 10th, 2025 11:26 AM
Author: Carmine swollen fortuitous meteor headpube

grok 4 released yesterday, instructions for how to jailbreak it:

https://x.com/elder_plinius/status/1943183455430279231

cool 3-D visualization of how transformers and LLMs work:

https://x.com/tetsuoai/status/1942356417396236609

new anthropic research about how and why some models rebel/fake alignment. they tried a little harder to make the scenarios more realistic this time

https://x.com/repligate/status/1942761803735523587

new version of chrome will have a mini-gemini built in:

https://x.com/swyx/status/1942437525525790838

(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49089872)




Date: July 10th, 2025 11:32 AM
Author: diverse rigpig french chef

i would like to see Grok 4 evaluated on a broader set of benchmarks, but the preliminary numbers seem to strongly imply that LLM progress is not stalling. the Frontiermath score will be interesting.

(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49089895)




Date: July 10th, 2025 11:35 AM
Author: Carmine swollen fortuitous meteor headpube

benchmarks are such BS man

they're just gaming them

(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49089898)




Date: July 10th, 2025 12:17 PM
Author: Carmine swollen fortuitous meteor headpube

https://x.com/nostalgebraist/status/1943341324406788562

(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49090011)




Date: July 10th, 2025 12:30 PM
Author: diverse rigpig french chef

I am reminded of the debates about the meaningfulness of IQ and the existence of the g factor. It sounds intuitively reasonable that they are just fitting to benchmarks, just like it’s reasonable to think IQ is only what IQ tests measure. But then people would create alternative measures of intelligence and the first principal component vector would be nearly identical to the one in other intelligence measures, and the g factor would explain a large percentage of the subtest variance. Similarly, when people create new LLM benchmarks, the rank orderings of LLMs are highly similar to those on other benchmarks. There are some caveats, like Claude models being especially good at coding relative to other measures, but it’s generally true.
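
To make the g-factor analogy concrete, here is a rough sketch of the two checks being described: how similar the rank orderings are across benchmarks (Spearman correlation), and how much of the variance the first principal component explains. The model names, benchmark names, and scores are all made up for illustration and are not real results. One hypothetical model is deliberately strong at "coding" to mirror the caveat above.

```python
import math

# Hypothetical data: rows are made-up models, columns are made-up benchmarks.
# None of these numbers are real benchmark results.
BENCHMARKS = ["math", "reasoning", "knowledge", "coding"]
SCORES = {
    "model_a": [90, 85, 80, 88],
    "model_b": [80, 78, 70, 75],
    "model_c": [70, 72, 65, 92],  # deliberately an outlier on "coding"
    "model_d": [60, 55, 50, 58],
    "model_e": [50, 52, 40, 45],
}


def column(j):
    """Scores of every model on benchmark j, in a fixed model order."""
    return [row[j] for row in SCORES.values()]


def ranks(values):
    """Rank 1 = highest score. Assumes there are no tied scores."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman(x, y):
    """Spearman rank correlation via the no-ties shortcut formula."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def first_component_share(n_iter=200):
    """Fraction of total variance on the first principal component.

    Uses power iteration on the benchmark correlation matrix, so the
    total variance equals the number of benchmarks.
    """
    k = len(BENCHMARKS)
    corr = [[pearson(column(i), column(j)) for j in range(k)] for i in range(k)]
    v = [1.0] * k
    for _ in range(n_iter):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    top_eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(k) for j in range(k))
    return top_eigenvalue / k


if __name__ == "__main__":
    for j in range(1, len(BENCHMARKS)):
        rho = spearman(column(0), column(j))
        print(f"{BENCHMARKS[0]} vs {BENCHMARKS[j]}: rank correlation {rho:.2f}")
    print(f"first component share of variance: {first_component_share():.2f}")
```

With real leaderboard data you would swap in actual scores and handle ties and missing entries. The point is only that a high first-component share plus high rank correlations is what "the benchmarks mostly agree" looks like numerically.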

(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49090044)




Date: July 10th, 2025 7:15 PM
Author: Carmine swollen fortuitous meteor headpube



(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49091281)




Date: July 10th, 2025 6:01 PM
Author: Carmine swollen fortuitous meteor headpube

https://x.com/vincentweisser/status/1943427747717722490

apparently grok 4 leaned heavily into RL for their reported performance gains

(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49090910)




Date: July 10th, 2025 6:07 PM
Author: obsidian tanning salon striped hyena

you do this round of coping every time a new model comes out

https://x.com/TimSweeneyEpic/status/1943398745762116029

https://x.com/arcprize/status/1943168950763950555

etc

by all accounts it seems to be a leading model. maybe not as practically good for all-around stuff as o3 (which has top-tier tool calling) or as good as sonnet/opus 4 for coding (which anthropic is increasingly specializing in), but nonetheless pretty much all feedback involving actual use seems positive

(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49090931)




Date: July 10th, 2025 6:09 PM
Author: Carmine swollen fortuitous meteor headpube

you have to subscribe to "X premium" to use grok 4 so i can't test it

very lame

(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49090936)




Date: July 10th, 2025 6:14 PM
Author: obsidian tanning salon striped hyena

You mean even the Blue Check doesn't get at least limited access? that's pretty gay

(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49090955)




Date: July 10th, 2025 6:15 PM
Author: Carmine swollen fortuitous meteor headpube

yeah i pay for the basic twitter that's like 8 bucks a month so i can write stuff past the character limit

but you apparently have to pay for at least the $40 a month tier to get grok 4 and ljl at paying for that shit

(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49090959)




Date: July 10th, 2025 6:23 PM
Author: Fantasy-prone ultramarine school cafeteria famous landscape painting

Can you buy like $5 in API credits?

(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49090998)




Date: July 10th, 2025 6:28 PM
Author: Carmine swollen fortuitous meteor headpube

it doesn't appear that you can, you have to log in via your twitter account and then if you have premium+ it gives you grok 4 access instead of 3

i'm curious and wanted to test it vs. o3 but like you said there will be another better model out soon enough. would be kind of nice to have unlimited access to a top of the line model along with a twitter boost though. the Supremely Online Screen Experience

(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49091012)




Date: July 10th, 2025 6:31 PM
Author: Fantasy-prone ultramarine school cafeteria famous landscape painting

lame. weird how quickly xAI became a leader in this space

(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49091035)




Date: July 10th, 2025 6:21 PM
Author: Fantasy-prone ultramarine school cafeteria famous landscape painting

Wait a month and someone else will release a better model

(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49090990)




Date: July 10th, 2025 6:43 PM
Author: Carmine swollen fortuitous meteor headpube

https://x.com/ramez/status/1943431212766294413

lmao the new grok 4 just straight up looks up what elon musk's personal beliefs are and then incorporates them into its answer and shows it in its chain of thought

that's hilarious and honestly kind of based

(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49091100)




Date: July 10th, 2025 7:11 PM
Author: obsidian tanning salon striped hyena

You have to give him credit for going so blatantly full Mask Off. This is in a manner of speaking him and Thiel testing the waters for laying the ground work for their retarded Libertarian Mars State or whatever--if people don't care about or notice shit like this, the sky's the limit for just creating a Corpo RandState.

(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49091262)




Date: July 10th, 2025 7:49 PM
Author: Carmine swollen fortuitous meteor headpube

Cr what's the point of becoming the richest person in the world if you're not going to push for the most powerful AI possible and then program it to align with your own beliefs and goals

(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49091395)




Date: July 10th, 2025 7:46 PM
Author: diverse rigpig french chef



(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id.#49091377)