Experiment: Best AI fails to produce acceptable work on 97.5% of freelance tasks
Date: February 6th, 2026 12:55 PM Author: LathamTouchedMe
The study assigned a variety of freelance tasks (programming, research, architectural design, etc.) to different AIs. The best AI produced adequate results (work someone would be willing to pay for) on just 2.5% of tasks. Basically, no better than a non-professional clown using Google to figure something out. The bar for "adequate" was also very low. Current AI is maybe a tiny productivity boost (if that). It won't be able to replace professionals like myself.
https://www.semafor.com/article/10/31/2025/ai-still-fails-at-completing-real-life-work-study-finds
(http://www.autoadmit.com/thread.php?thread_id=5831741&forum_id=2Reputation#49650972)
Date: February 6th, 2026 1:18 PM Author: To be fair (Semi-Retarded)
To be fair,
Pretty sure the difference is that in another 10 years, a hammer will still just be a tool that is totally unable to autonomously guide itself into building anything useful without active human use. Just like it has been for the last 5,000+ years and counting.
Wanna make the same bet about AI?
(http://www.autoadmit.com/thread.php?thread_id=5831741&forum_id=2Reputation#49651043)
Date: February 6th, 2026 10:16 PM Author: .,.,...,..,.,.,:,,:,...,:::,...,:,.,.:..:.
Likely a lot of those tasks aren’t the current focus of model designers. Meanwhile, for coding/SWE tasks, model competency is quite high because that is where the designers are spending their training resources:
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
The implication of AI coding models that can complete long, complicated tasks is that they can substantially automate AI design experimentation. I will eat a dick if that benchmark is less than 20% in a year
(http://www.autoadmit.com/thread.php?thread_id=5831741&forum_id=2Reputation#49652671)