Opus 4.5 is disgustingly good
Date: December 21st, 2025 12:59 PM Author: Yapping area
Opus 4.5 is up to 4 hours 49 minutes on METR's time horizon benchmark. The benchmark measures the length of tasks (in human working time) that models can complete on SWE and AI-research-type projects. That's a big jump over 5.1 Max at 2 hours 53 minutes, and faster than the overall trend of the horizon doubling every 7 months. With any luck, models will be able to substantially automate AI research before 2030 and set off an intelligence explosion.
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
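For a rough sense of scale, here's a minimal sketch that converts the jump into "months of on-trend progress." It assumes the trend is a clean exponential with a 7-month doubling time and compares only the two point estimates quoted above, ignoring the uncertainty around each measurement. The function name `trend_months_equivalent` is just illustrative.

```python
import math

# Point estimates quoted in the post, converted to minutes.
OPUS_4_5_MIN = 4 * 60 + 49    # 289 minutes
PRIOR_BEST_MIN = 2 * 60 + 53  # 173 minutes ("5.1 Max")

# Assumption: the time horizon doubles every 7 months on trend.
DOUBLING_MONTHS = 7.0


def trend_months_equivalent(new_min: float, old_min: float, doubling_months: float) -> float:
    """Months of on-trend progress needed to go from old_min to new_min,
    assuming a clean exponential with the given doubling time."""
    doublings = math.log2(new_min / old_min)
    return doublings * doubling_months


ratio = OPUS_4_5_MIN / PRIOR_BEST_MIN
months = trend_months_equivalent(OPUS_4_5_MIN, PRIOR_BEST_MIN, DOUBLING_MONTHS)

print(f"ratio: {ratio:.2f}x")                  # ratio: 1.67x
print(f"doublings: {math.log2(ratio):.2f}")    # doublings: 0.74
print(f"trend-equivalent: {months:.1f} months")  # trend-equivalent: 5.2 months
```

So the jump is worth roughly 5 months of trend progress. Whether that actually beats the trend depends on how far apart the two measured models were released, which this sketch doesn't account for.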