Date: April 29th, 2026 12:37 AM
Author: https://imgur.com/a/o2g8xYK
I have a couple of "code parser in the sampling loop" + "code parser controlled KV cache endpoints" experiments that are pretty promising, also in the context of CPU inference. I don't really have the time or knowledge to properly explore them, but if someone here is interested, I feel this can unlock some pretty interesting ideas for keeping smaller models on track, and more generally for exploiting the massive power that comes with controlling the inference end to end. I did a couple of experiments and my ideas do work (very messy repo: https://github.com/REDACTED), if someone is interested in a chat. (I think small local models + custom inference harness + tools optimized on trace analysis / GEPA-like/autoresearch loops are potentially competitive with much, much larger models for a fair amount of the "lower level but token consuming" tasks in the context of coding.)
Roughly, the idea is that if you have an AST parser + compiler/interpreter in the loop, you can do the following things:
- flag syntax errors (obv) + easily gaslight the model into thinking it wrote the code correctly
- autocomplete a set of patterns to save on inference compute by just prefilling the completion
- rewind the cache / inference pointer (not sure what the proper name is) to semantically meaningful points (say, function starts)
- insert // docstrings that help the model along for trickier APIs
- for certain types of functions / unit tests, run the unit tests right after the code is generated, in parallel with ongoing inference, and rewind easily on failure
- on CPU, where you are memory constrained (I guess it's the same on GPUs but I haven't done GPGPU stuff since 2012), you can run speculative loops / split the code writing across multiple cores and then merge
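To make the rewind idea concrete, here's a minimal sketch of the "parser in the sampling loop + rewind to semantically meaningful points" combo. Everything here (`KVCache`, `generate_function`, the draft list) is a made-up stand-in for a real inference harness; the point is just the control flow: checkpoint the cache at a function start, run the real `ast` parser on the draft, and rewind on syntax errors instead of letting garbage stay in context.

```python
import ast

class KVCache:
    """Toy stand-in for a KV cache: just tracks committed tokens."""
    def __init__(self):
        self.tokens = []

    def checkpoint(self):
        # remember the current length so we can rewind to it later
        return len(self.tokens)

    def rewind(self, mark):
        # drop everything generated after the checkpoint
        del self.tokens[mark:]

def parses(source):
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

def generate_function(cache, drafts):
    """Pop candidate function bodies from `drafts` (a stand-in for the
    sampler); checkpoint at the function start, rewind on syntax errors."""
    mark = cache.checkpoint()          # semantically meaningful rewind point
    for draft in drafts:
        cache.tokens.extend(draft.split())
        if parses(draft):
            return draft               # valid: keep the cache as-is
        cache.rewind(mark)             # parser flagged it: rewind and retry
    return None

cache = KVCache()
out = generate_function(cache, ["def f(:\n    pass", "def f():\n    return 1"])
print(out)  # the first draft is rewound, the second one parses
```

In a real harness the rewind would be a truncation of the KV cache to the checkpointed position (so you only pay for re-decoding the retried function, not the whole prompt).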
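And a sketch of the prefill-to-save-compute bullet: when the tail of the output matches a known boilerplate pattern, the harness force-feeds the completion through the (cheap) prefill path instead of sampling it token by token. The pattern table and the function are invented for illustration; a real harness would key this on token IDs, not strings.

```python
# Hypothetical pattern table: prefix the model has emitted -> completion
# the harness can prefill without sampling.
PREFILL_PATTERNS = {
    "if __name__": " == \"__main__\":\n    main()\n",
    "def __init__(self": "):\n",
}

def maybe_prefill(generated_so_far):
    """If the tail of the output matches a known pattern, return the
    completion to prefill; otherwise None (keep sampling normally)."""
    for prefix, completion in PREFILL_PATTERNS.items():
        if generated_so_far.endswith(prefix):
            return completion
    return None

print(repr(maybe_prefill("import sys\n\nif __name__")))
print(maybe_prefill("x = 1"))  # no match: fall back to normal sampling
```

Since prefill is compute-bound rather than sequential, on CPU this trades a batch of cheap forward passes for many expensive sampling steps, which is exactly where the savings come from.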
All of these things mostly benefit from being tailored to your own codebase / being RL-optimized against your codebase through transcript analysis, so model providers, who are more reliant on batching similar workloads and are often "too fast" for this kind of trickery, won't really benefit from it.
(http://www.autoadmit.com/thread.php?thread_id=5861511&forum_id=2...id#49850311)