stevibe

10,345 Twitter followers

Posts

So we know Gemma 4 is good at tool calling, but what about web coding? I threw 4 UI screenshots at three Gemma 4 models and said "rebuild this": one shot, no hand-holding, just image in, code out.
Model lineup:
- E4B
- 26B A4B (MoE)
- 31B Dense
(skipped the E2B this round)
Let me know which one you think cooked the hardest

Qwen3.6 will be open-source! Let's vote for your favorite model size. I believe we all know which is the best one!

We are planning to open-source the Qwen3.6 models (particularly medium-sized versions) to facilitate local deployment and customization for developers. Please vote for the model size you are **most** anticipating—the community’s voice is vital to us!

Gemma4 just dropped. How does it handle tool calls? I ran ToolCall-15 across the full Gemma4 family. Gemma4 31b = Qwen3.5 27b. Both perfect 15/15. But here's what's wild: Qwen3.5 9b already clears 13/15, while Gemma4 needs 26b to match that.

Qwen is celebrating Qwen3.6 Plus, so I ran the full Plus family through both suites.
First, I ran ToolCall-15. Qwen3.6 Plus went perfect. 100%. Every scenario green. Qwen3.5 Plus? 90%. Qwen Plus? 87%. Qwen3-Coder-Plus? 80%.
The test that still catches models: "Search Iceland's population, then calculate 2% of it." Qwen3.6 Plus used the search result. The others used a memorized number.
Then I ran BugFind-15. Story flips. Qwen3.5 Plus leads at 94%. Qwen3.6 Plus drops to 84%. The newest model in the family is the weakest debugger.
Tool calling got a massive upgrade. Debugging didn't come along for the ride.
twitter.com/stevibe/status/203...
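
For anyone curious how a check like the Iceland scenario could work, here is a minimal sketch. It is not the actual ToolCall-15 harness: the `ToolCall` record, the tool names `web_search` and `calculator`, the `base` argument, and both population figures are made up for illustration. The idea is only that the harness returns a mocked search value, then checks whether the later calculator call was fed that exact value rather than a number the model remembered.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    """One step in a model's tool-call trace (hypothetical format)."""
    name: str            # tool the model invoked
    arguments: dict      # arguments the model passed
    result: Any = None   # value the harness returned to the model


def used_search_result(trace, search_tool="web_search", calc_tool="calculator"):
    """True if the first calculator call after the search used the searched value."""
    search_idx = next(
        (i for i, call in enumerate(trace) if call.name == search_tool), None
    )
    if search_idx is None:
        return False  # the model never searched
    searched_value = trace[search_idx].result
    for call in trace[search_idx + 1:]:
        if call.name == calc_tool:
            return call.arguments.get("base") == searched_value
    return False  # the model never calculated after searching


# Made-up mock value returned by the fake search tool, chosen so that
# a memorized figure is unlikely to match it by accident.
MOCK_POPULATION = 372_520

grounded_trace = [
    ToolCall("web_search", {"query": "Iceland population"}, MOCK_POPULATION),
    ToolCall("calculator", {"base": MOCK_POPULATION, "percent": 2}),
]
memorized_trace = [
    ToolCall("web_search", {"query": "Iceland population"}, MOCK_POPULATION),
    ToolCall("calculator", {"base": 380_000, "percent": 2}),  # made-up "memorized" number
]

print(used_search_result(grounded_trace))   # True
print(used_search_result(memorized_trace))  # False
```

Comparing against a mocked value is a design choice in this sketch: it separates "used the tool output" from "happened to know the right answer".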

Meet Gemma 4: our new family of open models you can run on your own hardware. Built for advanced reasoning and agentic workflows, we’re releasing them under an Apache 2.0 license. Here’s what’s new 🧵

Got a 16GB GPU? You can run all of these right now. Tested 4 Qwen3.5-based models on ToolCall-15 & BugFind-15.
Models:
- Qwen3.5:9b Q8 (Official)
- Qwopus v3 Q8 by Jackrong
- OmniCoder-9B by Tesslate
- Qwen3.5-9b-Sushi-Coder by bigatuna
Summary:
- ToolCall-15: Qwopus v3 went perfect 30/30, Sushi-Coder beat base Qwen3.5
- BugFind-15: OmniCoder flipped the script and took #1 at 83%
No single model won both, that's the fun part. Open source community is cooking.

DGX Spark welcome screen 🙈 (Around 20 minutes to get halfway)

The real cost of using Codex/Claude Code daily: muscle memory now defaults to typing `git push` in the AI CLI instead of the terminal
