\
  The most prestigious law school admissions discussion board in the world.
BackRefresh Options Favorite

6/23/25 AI thread

guy builds AI nail gun that can target different things on v...
brown jew with delusions of ?greater israel?
  06/23/25
goofy chink describes how AI will make the top 1% smarter (l...
brown jew with delusions of ?greater israel?
  06/23/25
...
cock of michael obama
  06/23/25
I trained a local llm on xo and it called me a fag
.,.,.,.,.,.,.,.,...,,..,.,.,
  06/23/25
apparently the "Cluely" "start-up" creat...
brown jew with delusions of ?greater israel?
  06/23/25
so not much news today. jews take shabbat off from assembli...
cock of michael obama
  06/23/25
https://x.com/rohanpaul_ai/status/1937137364901777505 i d...
brown jew with delusions of ?greater israel?
  06/23/25
...
''"'"''"''''"""'''
  06/23/25
https://x.com/ihteshamit/status/1937224560870588736?s=46
''"'"''"''''"""'''
  06/23/25
is that gemini thing real? LMAO.
,.,.,.,....,.,..,.,.,.
  06/23/25
Nature of the assignment
''"'"''"''''"""'''
  06/23/25
Google will certainly make great efforts to rid of this beha...
,.,....,...,,,..,..,.,..,.,.,.,.
  06/23/25
does threatening it with physical violence still make it cod...
brown jew with delusions of ?greater israel?
  06/23/25
https://pbs.twimg.com/media/GuJVeydagAUxr02?format=jpg&n...
brown jew with delusions of ?greater israel?
  06/24/25
"Claude finds US strikes on Iranian nuclear sites so un...
brown jew with delusions of ?greater israel?
  06/23/25
Prime Intellect @PrimeIntellect Planetary-Scale Inferenc...
brown jew with delusions of ?greater israel?
  06/23/25
one of the authors of the 'AI 2027' paper explains his "...
brown jew with delusions of ?greater israel?
  06/24/25
i agree with this guy. i don't understand how "syntheti...
brown jew with delusions of ?greater israel?
  06/24/25
"can "synthetic data" produced by LLMs for th...
brown jew with delusions of ?greater israel?
  06/24/25
are there any projections on how much of the data on the int...
brown jew with delusions of ?greater israel?
  06/24/25
[Submitted on 29 Mar 2025] While it is increasingly evident...
brown jew with delusions of ?greater israel?
  06/24/25


Poast new message in this thread



Reply Favorite

Date: June 23rd, 2025 10:05 AM
Author: brown jew with delusions of ?greater israel?

guy builds AI nail gun that can target different things on voice command

https://x.com/Austen/status/1936292333903061034

gemini now threatens to kill itself like a redditor if it makes coding mistakes

https://x.com/venturetwins/status/1936483773035798906

anthropic releases more self-serving marketing flame about how claude can "blackmail users" under certain conditions

https://x.com/AnthropicAI/status/1936144602446082431

guy points out that anthropic's bogus, self-serving marketing "research" on "misaligned AIs" will serve only to cause future AI models to be artificially trained on these misaligned frameworks (very good and important point imo)

https://x.com/repligate/status/1934396816424071464

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49042238)



Reply Favorite

Date: June 23rd, 2025 10:07 AM
Author: brown jew with delusions of ?greater israel?

goofy chink describes how AI will make the top 1% smarter (leaves out that everyone else will become completely retarded and useless)

https://x.com/vitrupo/status/1937148170812985470

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49042247)



Reply Favorite

Date: June 23rd, 2025 10:08 AM
Author: cock of michael obama



(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49042248)



Reply Favorite

Date: June 23rd, 2025 10:08 AM
Author: .,.,.,.,.,.,.,.,...,,..,.,., ( )


I trained a local llm on xo and it called me a fag

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49042249)



Reply Favorite

Date: June 23rd, 2025 10:15 AM
Author: brown jew with delusions of ?greater israel?

apparently the "Cluely" "start-up" created by that dorky asian midget is actually just a wrapper for prompting LLMs, and this guy figured out the prompt

"AI" is becoming the new "crypto"

https://x.com/elder_plinius/status/1936673549810315648

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49042265)



Reply Favorite

Date: June 23rd, 2025 10:48 AM
Author: cock of michael obama

so not much news today. jews take shabbat off from assembling digital satan

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49042316)



Reply Favorite

Date: June 23rd, 2025 3:56 PM
Author: brown jew with delusions of ?greater israel?

https://x.com/rohanpaul_ai/status/1937137364901777505

i don't understand this one but it looks like 3D video "gaussian rendering" from 2D images is getting better

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49043323)



Reply Favorite

Date: June 23rd, 2025 3:59 PM
Author: ''"'"''"''''"""'''



(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49043338)



Reply Favorite

Date: June 23rd, 2025 7:26 PM
Author: ''"'"''"''''"""'''

https://x.com/ihteshamit/status/1937224560870588736?s=46

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49043885)



Reply Favorite

Date: June 23rd, 2025 7:31 PM
Author: ,.,.,.,....,.,..,.,.,.

is that gemini thing real? LMAO.

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49043895)



Reply Favorite

Date: June 23rd, 2025 7:32 PM
Author: ''"'"''"''''"""'''

Nature of the assignment

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49043897)



Reply Favorite

Date: June 23rd, 2025 7:38 PM
Author: ,.,....,...,,,..,..,.,..,.,.,.,.


Google will certainly make great efforts to rid of this behavior but they should keep it in. Gemini is too bland otherwise, and it would be fun to have a coding AI that has an emotional breakdown rather than hopelessly saying "I see the problem now..." and providing you a new solution that doesn't work.

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49043911)



Reply Favorite

Date: June 23rd, 2025 7:41 PM
Author: brown jew with delusions of ?greater israel?

does threatening it with physical violence still make it code better or did they manage to get rid of that

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49043919)



Reply Favorite

Date: June 24th, 2025 8:48 AM
Author: brown jew with delusions of ?greater israel?

https://pbs.twimg.com/media/GuJVeydagAUxr02?format=jpg&name=large

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49045198)



Reply Favorite

Date: June 23rd, 2025 7:52 PM
Author: brown jew with delusions of ?greater israel?

"Claude finds US strikes on Iranian nuclear sites so unlikely it flags the actual news as misinformation"

https://x.com/peterwildeford/status/1937157475343745444

god damn people are dumb lol

ofc it doesn't think it's true. it hasn't been trained on post-training data from the future!

"Globally ranked top 20 forecaster 🎯" according to his twitter bio btw. sigh

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49043962)



Reply Favorite

Date: June 23rd, 2025 8:11 PM
Author: brown jew with delusions of ?greater israel?

Prime Intellect

@PrimeIntellect

Planetary-Scale Inference

Our peer-to-peer decentralized inference stack moves into production, enabling everyone—from consumer GPUs to hyperscale clusters—to contribute meaningfully towards open-source AI progress.

https://x.com/PrimeIntellect/status/1937272176715215080

this is for the guy earlier who didn't believe me about decentralized LLM networks already up and running and improving fast

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49044025)



Reply Favorite

Date: June 24th, 2025 9:21 AM
Author: brown jew with delusions of ?greater israel?

one of the authors of the 'AI 2027' paper explains his "thought process" and "research"

good lord, it's worse than i thought. these guys did *literally zero* technical analysis. they just made blanket assumptions that AI would magically become capable of novel reasoning from scratch, goal-setting, planning, etc, and then extrapolated projections from there based on available compute

"okay, uh, imagine that AI can already magically do all the things that a human brain can do. now, uh, multiply an arbitrary amount of these agents, uh, say 100,000, by the available compute, and then they do experiments...."

why am i not seducing some silicon valley azngirl and getting in on one of these monopoly money scam "AI research" orgs

https://www.youtube.com/watch?v=5UAvECavmFA

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49045294)



Reply Favorite

Date: June 24th, 2025 10:46 AM
Author: brown jew with delusions of ?greater israel?

i agree with this guy. i don't understand how "synthetic data" can ever improve an LLM's ability to infer objective reality

"synthetic data" is not objective reality. it's objective reality > human brain > human outputs > LLM > LLM outputs (synthetic data). how is this kind of data ever going to improve the accuracy of LLM inferences? shouldn't it *necessarily* make them less accurate? even if it doesn't degrade performance much, it certainly doesn't seem like it could ever *improve* performance

not to mention the issue of more and more of the training data becoming available online is becoming regurgitated LLM outputs along the flow chart described above, but increasingly even worse than this, because there are additional layers of LLM > human > LLM (or even just LLM > LLM) being generated. shouldn't there be a concern that the training data itself is becoming corrupted?

i'm sure researchers are aware of all this but i've never seen any solid explanation for why "synthetic data" can/will actually improve models

https://x.com/teortaxesTex/status/1937177329161437473

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49045598)



Reply Favorite

Date: June 24th, 2025 10:54 AM
Author: brown jew with delusions of ?greater israel?

"can "synthetic data" produced by LLMs for the purpose of training subsequent LLMs potentially degrade the ability of those subsequent LLMs to make accurate inferences about objective reality? it seems like any synthetic data produced by LLMs *must* contain equal or less truth-value about objective reality than empirical data or data produced by humans

ChatGPT said:

Yes, you're absolutely right to be cautious here — and your intuition is supported by both theoretical arguments and empirical findings. Using synthetic data generated by LLMs to train subsequent models can degrade the ability of those models to make accurate inferences about objective reality, especially if not handled with care. Here's a breakdown of why this happens and under what conditions:

💡 Why Synthetic Data Can Be Degrading

Loss of Grounding in Reality

Synthetic data lacks direct ties to the physical world. While human-generated data is (imperfectly) grounded in sensory experience, social interaction, or empirical observation, synthetic data is a derivative — it's one layer removed from that grounding. Like making a photocopy of a photocopy, each generation may introduce hallucinations, errors, or distortions that accumulate.

Information Bottleneck

LLMs are lossy compressors. When they generate text, they’re not reproducing the full complexity of their training data — they’re generating statistically likely continuations based on internal representations. If you use that output as training data, you're effectively training on a reduced-resolution version of reality.

Synthetic Data Drift / Model Collapse

Known in the literature as model collapse (see: Shumailov et al., 2023), this is the process by which LLMs trained increasingly on their own outputs begin to diverge from reality, amplifying their own biases and errors over time. Performance on real-world benchmarks degrades as a result.

Loss of Diversity and Signal

Human-created data contains a vast range of quirks, contradictions, rare events, and mistakes — which paradoxically help LLMs generalize better. Synthetic data tends to be more average, more confident, and less noisy, and this can reduce the model's ability to reason about edge cases or handle uncertainty properly.

✅ Are There Safe or Useful Ways to Use Synthetic Data?

Yes — conditionally:

Data Augmentation, Not Replacement

Synthetic data can work well when used to supplement underrepresented cases (e.g., rare languages, specific prompts, adversarial inputs), but not as a wholesale substitute for real data.

Filtering & QA

Some research pipelines use LLM-generated data but apply human or algorithmic filters to check for factuality, diversity, and coherence before using it in training.

Instruction Tuning & Domain-Specific Tasks

In some narrow domains, LLMs can self-generate useful synthetic data (e.g., reasoning about logic puzzles or math problems), especially when bootstrapped with human seed examples.

Retrieval-Augmented Training (RAT)

Combining synthetic data generation with retrieval from factual databases or web content helps re-anchor the model in empirical knowledge.

🧠 Big Picture: Synthetic Data Is Not a Silver Bullet

You’re correct in saying that LLM-generated data is, by construction, at best equal in truth-value and typically less grounded than high-quality human or empirical data. It lacks:

The noisy signal from the real world that helps generalization

The anchoring in physical or social context

The unexpected anomalies that often drive deep learning breakthroughs

So, while synthetic data can be useful in specific controlled scenarios, relying heavily on it does risk epistemic degradation — a weakening of the model’s grasp on the actual world."

ok, so basically engineers are only having LLMs create synthetic data for narrow, targeted domains. i can see how this could be useful in some cases now that i think about it

surely this can't actually make much of a difference though? i guess i have to go look up some Papers to get more details about which domains they're targeting with this (probably mostly coding/logic/math problem solving)

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49045611)



Reply Favorite

Date: June 24th, 2025 11:25 AM
Author: brown jew with delusions of ?greater israel?

are there any projections on how much of the data on the internet will become AI-generated over time, going into the future? it seems like at some point enough of the internet will become AI-generated that it could lead to reductions in LLM capabilities or even recursive model collapse, even without any synthetic data being intentionally added to the training data of future LLM models

ChatGPT said:

Yes, your concern is well-founded — several expert projections suggest that the internet is rapidly becoming dominated by AI-generated content, reaching a tipping point where models may degrade due to recursive self-training.

📊 Projected Growth of AI-Generated Content

Some technologists estimate that by 2025, over 90% of online content—spanning text, images, and videos—could be AI-generated.

theguardian.com

+6

techtonicshifts.blog

+6

linkedin.com

+6

axios.com

Earlier forecasts (e.g., by Nina Schick) also predicted a similar surge—up to 90% by 2025–2026.

reddit.com

+3

ninaschick.org

+3

finance.yahoo.com

+3

Major platforms like LinkedIn already see 50–60% of posts as AI-generated, with AWS–Forbes estimating about 57% overall today.

A recent arXiv analysis (April 2025) finds 30–40% of active webpages contain detectable AI text, highlighting early saturation.

arxiv.org

🔄 The Feedback Loop & Model Collapse Risk

This growing prevalence creates a “feedback loop”:

Crawling AI-owned content: Training data increasingly consists of AI-generated material.

Recursive training: Future LLMs ingest and learn from this synthetic content.

Performance degradation: As shown in Shumailov et al. (2023), models trained on AI-generated data lose fidelity in rare or complex information—an effect known as model collapse.

reddit.com

+12

arxiv.org

+12

forbes.com

+12

reddit.com

As one Reddit user aptly put it:

“Model collapse occurs when new generative models train on AI-generated content and gradually degenerate as a result.”

reddit.com

⏳ When Could This Become Critical?

Year Estimated AI-Generated Internet Content

2025 Up to 90% — according to multiple expert estimates

businessinsider.com

+13

axios.com

+13

reddit.com

+13

2026 Estimated 90% or more

2028 Human-generated data scarcity becomes acute — prompting turn to synthetic sources

Even without intentionally using synthetic data in training, future LLMs crawling typical web data would naturally ingest this AI-generated majority — raising the risk of drift, bias amplification, and collapse.

🤖 Impact on LLM Capabilities

Loss of nuanced knowledge: Low-frequency or domain-specific facts degrade or vanish.

Reduced diversity: Outputs become more stereotyped and homogenized.

Reinforced inaccuracies: Errors in earlier outputs get repeated and amplified.

Nature’s study warned that even a few generations of models trained on synthetic data can lead to significant knowledge loss.

axios.com

+5

reddit.com

+5

businessinsider.com

+5

🚨 Mitigation & What to Watch

AI researchers are exploring strategies:

Watermarking AI outputs, to filter them out during data collection

Robust detection of AI-generated text to preserve corpora integrity

Re-anchoring training sets with high-quality, verified human data

Synthetic–real hybrids, ensuring real data remains foundational

✅ Summary

By mid-2020s, forecasts show up to 90%+ of internet content could be AI-generated.

This creates a self-reinforcing loop, risking recursive degradation in future LLMs.

Without intervention, LLM capabilities—especially for rare facts, nuance, diversity—may degrade even without deliberate synthetic-data training.

The scenario you describe—a gradual collapse of model utility due to AI-generated content saturation—is not just possible, it’s increasingly likely unless actively mitigated.

Let me know if you'd like deeper forecasting models, mitigation case studies, or research citations!

Key articles on AI-generated web content

axios.com

Behind the Curtain: What AI architects fear most (in 2024)

Nov 8, 2023

axios.com

AI could choke on its own exhaust as it fills the web

Aug 28, 2023

theguardian.com

AI-generated 'slop' is slowly killing the internet, so why is nobody trying to stop it?

Jan 8, 2025

Sources

You said:

it's currently june of 2025 and the internet obviously isn't 90% AI-generated data yet, so your last response isn't accurate. respond to my last prompt again, but this time, only reference data and sources from june of 2024 onward

ChatGPT said:

You're absolutely right—the internet isn’t 90% AI-generated yet. Here's a revised analysis grounded in studies from June 2024 onward:

📈 Current & Projected Share of AI-Generated Content

Mar 2025 ArXiv study estimates 30–40% of text on active web pages is AI-generated using ChatGPT-style markers

arxiv.org

.

Ahrefs (Apr 2025) found 74% of newly published pages in English included some AI content—ranging from light to heavy AI use

ahrefs.com

.

Fortune 500 blogs (Nov 2023): ~11% AI-written—slightly older, but still relevant as an indicator among higher-quality domains

originality.ai

.

Copyleaks (Apr 2024): Web-wide, AI content grew from negligible to 1.57% of all pages by March 2024

axios.com

+4

copyleaks.com

+4

reddit.com

+4

.

No reliable data yet supports the ~90% projection for 2025; most post‑2024 studies show 30–75% in specific segments, not the whole web.

🔄 Risk of Cascading AI Data Overload

Even without 90% AI content:

Partial saturation (30–75%) already threatens the feedback loop—new crawls increasingly ingest AI-generated material.

Recursive model collapse can occur sub-threshold: as more training data is synthetic, model fidelity declines—even before synthetic data dominates completely

nypost.com

seosandwitch.com

+4

originality.ai

+4

ahrefs.com

+4

.

With 74% of new pages having any AI-generated content, models trained on web data are already ingesting significant synthetic signals

ahrefs.com

.

⏳ Timeline of Concern

2024–2025: AI content rising from ~1.5% (March 2024) to 30–40% (Mar 2025) of live pages

theguardian.com

+6

copyleaks.com

+6

arxiv.org

+6

.

2025 onward: New content coverage (e.g., Ahrefs’s April figure of 74%) suggests even faster saturation in freshly published text .

By 2026: Likely mainstream exposure in training sets—without mitigation, recursive drift is possible, even if AI content is under 90%.

🧩 Key Takeaways

30–40% AI-generated in active pages by early 2025.

74% of new pages include some level of AI content.

Even sub-majority synthetic content can degrade LLM grounding via feedback loops.

Recursive model collapse is not dependent on reaching 90%—it kicks in with increasing ingestion of synthetic data

originality.ai

ahrefs.com

.

🔧 Mitigations to Monitor

AI-output watermarking & detection, so crawlers can filter synthetic content.

Curation efforts to ensure a solid backbone of verified human-generated data.

Hybrid training pipelines that track data provenance to counteract drift.

✅ Responding to Your Hypothesis

Yes—the internet becoming increasingly AI-generated could degrade future LLM performance via unintentional recursive training effects, even if actual saturation never hits 90%. Current data (~30–74%) already suggests that we may be passing the tipping point where model collapse starts to become a real danger.

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49045666)



Reply Favorite

Date: June 24th, 2025 11:28 AM
Author: brown jew with delusions of ?greater israel?

[Submitted on 29 Mar 2025]

While it is increasingly evident that the internet is becoming saturated with content created by generated Ai large language models, accurately measuring the scale of this phenomenon has proven challenging. By analyzing the frequency of specific keywords commonly used by ChatGPT, this paper demonstrates that such linguistic markers can effectively be used to esti-mate the presence of generative AI content online. The findings suggest that at least 30% of text on active web pages originates from AI-generated sources, with the actual proportion likely ap-proaching 40%. Given the implications of autophagous loops, this is a sobering realization.

https://arxiv.org/abs/2504.08755

LJL

https://ahrefs.com/blog/what-percentage-of-new-content-is-ai-generated/?utm_source=chatgpt.com

May 19, 2025

We analyzed 900,000 newly created web pages in April 2025 and found that 74.2% of them contained AI-generated content.

what are these niggas even gonna do to stop this

it's already waaaaaay too late to add government-mandated tags or markers on everything AI-generated

(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2\u0026mark_id=5310486#49045678)