And I was right!!
IQuest-Coder was set up incorrectly and includes the whole git history, including future commits. The model has found this trick and uses it rather often.
Thus, its SWE-bench score should be discarded.
twitter.com/xeophon/status/200...