Date: April 4th, 2026 9:17 PM Author: jet-lagged place of business temple
Turns out I only need 48 GB of VRAM to run agentic shit on a server with 96 GB spread across 5 GPUs. That leaves me 48 GB of spare VRAM to run whatever random shit I want, like nomic-embed-text or GLM-4.7. If I use all five GPUs I get about 66% of the effective memory bandwidth of an RTX 6000, with some added latency from the PCIe interconnect, but right now I'm getting nearly twice the performance of an RTX 6000 because 90% of my work runs on a 5090. An RTX 6000 would feel slow as fuck for anything I'd actually do with it
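Back-of-the-envelope version of that 66% figure. This is a sketch, not a measurement: the per-GPU bandwidth numbers below are hypothetical placeholders (only the 5090 is known to be in the box, and the slower-card figure is picked to roughly reproduce the 66%), and the reference number assumes the 96 GB Blackwell-generation RTX 6000.

```python
# Back-of-the-envelope effective-bandwidth estimate for a multi-GPU
# tensor-parallel split. All per-GPU numbers are assumptions, not the
# actual cards in the box (only the 5090 is known).

RTX_6000_BW = 1792  # GB/s, assumed single-card reference (96 GB Blackwell card)

# Hypothetical mixed 5-GPU box: one 5090 plus four slower cards.
# 236 GB/s is chosen only to land near the 66% figure from the post.
gpu_bandwidths = [1792, 236, 236, 236, 236]  # GB/s each, assumed

def effective_tp_bandwidth(bandwidths):
    """With an even tensor-parallel split, every card reads the same
    fraction of the weights per token, so step time is set by the
    slowest card and effective bandwidth is n * min(bandwidths)."""
    return len(bandwidths) * min(bandwidths)

eff = effective_tp_bandwidth(gpu_bandwidths)
print(f"effective: {eff} GB/s ({eff / RTX_6000_BW:.0%} of an RTX 6000)")
```

The same formula also shows why the single 5090 feels faster day to day: anything that fits on the one fast card runs at its full 1792 GB/s with zero PCIe hops, while the 5-way split only wins when the model is too big to fit on it.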