Xeophon
9,629 Twitter followers
Posts
Xeophon
02-27
Thoughts on LLM benches, open models and skill issues. Link in replies.
Xeophon
01-16
is it just me or has gpt-5.2(-codex) gotten faster???
Xeophon
01-13
things i am wondering lately:
- what is the reason some scaffolds perform better? is it the prompt? the available tools, esp. the search tool? the loop itself?
- what impact does model-specific prompting have on the first question? how impactful is this for the frontier?
Xeophon
01-03
Update: They’ve acknowledged the issues and re-ran SWE-bench with the official images. It brought its scores down to a (still very, very impressive) 76.2. Kudos!! They also have a vLLM kernel patch and advise against using quants. github.com/IQuestLab/IQuest-Co...…
Xeophon
01-02
And I was right!! The SWE-bench environment for IQuest-Coder was set up incorrectly and included the whole git history, including future commits. The model found this trick and uses it rather often. Thus, its SWE-bench score should be discarded. twitter.com/xeophon/status/200...
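A minimal sketch of how one might audit a benchmark checkout for this kind of leak. It assumes each task pins a base commit and that any leaked history is still reachable from some ref in the repository. The function names (`future_commits`, `git_lines`, `audit_checkout`) are hypothetical and are not taken from SWE-bench or IQuest-Coder tooling.

```python
import subprocess


def future_commits(all_commits, base_ancestors):
    """Return commits present in the checkout but not reachable from the base commit."""
    reachable = set(base_ancestors)
    return [c for c in all_commits if c not in reachable]


def git_lines(repo, *args):
    """Run a git command inside `repo` and return its non-empty output lines."""
    out = subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def audit_checkout(repo, base_commit):
    """List commits stored in the repo that an agent starting at base_commit should not see."""
    # Every commit reachable from any ref (branches, tags, remotes).
    everything = git_lines(repo, "rev-list", "--all")
    # Only the base commit and its ancestors, i.e. the legitimate history.
    visible = git_lines(repo, "rev-list", base_commit)
    return future_commits(everything, visible)
```

If `audit_checkout` returns a non-empty list, the environment exposes history from after the task's starting point, and a model can read it with plain `git log --all`. The check only covers commits reachable from refs; dangling objects would need a separate inspection.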