Xeophon

9,629 Twitter followers

Posts

Thoughts on LLM benches, open models and skill issues. Link in replies.

is it just me or has gpt-5.2(-codex) gotten faster???

things i am wondering lately:
- what is the reason some scaffolds perform better? is it the prompt? the available tools, esp. the search tool? the loop itself?
- what impact does model-specific prompting have on the first question? how impactful is this for the frontier?

Update: They’ve acknowledged the issues and re-ran SWE-bench with the official images. That brought the score down to a (still very, very impressive) 76.2. Kudos!! They also have a vLLM kernel patch and advise against using quants. github.com/IQuestLab/IQuest-Co...…

And I was right!! IQuest-Coder's SWE-bench setup was incorrect: the environment includes the whole git history, including future commits. The model has found this trick and uses it rather often. Thus, its SWE-bench score should be discarded. twitter.com/xeophon/status/200...
