stevibe
10,345 Twitter followers
stevibe
Thread
#Thread#
So we know Gemma 4 is good at tool calling, but what about web coding? I threw 4 UI screenshots at three Gemma 4 models and said rebuild this, one shot, no hand-holding, just image in, code out.

Model lineup:
- E4B
- 26B A4B (MoE)
- 31B Dense
(skipped the E2B this round)

Let me know which one you think cooked the hardest
stevibe
Thread
#Thread#
Qwen3.6 will be open-source! Let's vote for your favorite model size. I believe we all know which is the best one!
Chujie Zheng
@ChujieZheng
We are planning to open-source the Qwen3.6 models (particularly medium-sized versions) to facilitate local deployment and customization for developers. Please vote for the model size you are **most** anticipating—the community’s voice is vital to us!
stevibe
Thread
#Thread#
Gemma 4 just dropped. How does it handle tool calls? I ran ToolCall-15 across the full Gemma 4 family. Gemma 4 31B = Qwen3.5 27B. Both perfect 15/15. But here's what's wild: Qwen3.5 9B already clears 13/15, while Gemma 4 needs 26B to match that.
stevibe
Thread
#Thread#
Qwen is celebrating Qwen3.6 Plus, so I ran the full Plus family through both suites.

First, I ran ToolCall-15. Qwen3.6 Plus went perfect. 100%. Every scenario green.
Qwen3.5 Plus? 90%.
Qwen Plus? 87%.
Qwen3-Coder-Plus? 80%.

The test that still catches models: "Search Iceland's population, then calculate 2% of it." Qwen3.6 Plus used the search result. The others used a memorized number.

Then I ran BugFind-15. Story flips. Qwen3.5 Plus leads at 94%. Qwen3.6 Plus drops to 84%. The newest model in the family is the weakest debugger.

Tool calling got a massive upgrade. Debugging didn't come along for the ride.

twitter.com/stevibe/status/203...
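The post doesn't show how ToolCall-15 scores the Iceland scenario, so here is only a rough sketch of what such a check could look like. Everything in it is an assumption made for illustration: the `ToolCall` record, the tool names `search` and `calculate`, the `value` argument, and the placeholder number, which is a made-up fixture and not Iceland's real population.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical record of one tool call made by a model."""
    name: str
    args: dict


def uses_search_result(calls, search_value):
    """Return True if the first `calculate` call made after a `search`
    call is fed the exact value that the search returned."""
    seen_search = False
    for call in calls:
        if call.name == "search":
            seen_search = True
        elif call.name == "calculate" and seen_search:
            return call.args.get("value") == search_value
    return False


# Placeholder fixture: what the fake search tool "returned".
SEARCH_VALUE = 123_456

# A model that passes the searched number on to the calculator.
grounded = [
    ToolCall("search", {"query": "Iceland population"}),
    ToolCall("calculate", {"value": SEARCH_VALUE, "percent": 2}),
]

# A model that searches, then ignores the result and uses a remembered number.
memorized = [
    ToolCall("search", {"query": "Iceland population"}),
    ToolCall("calculate", {"value": 100_000, "percent": 2}),
]

print(uses_search_result(grounded, SEARCH_VALUE))
print(uses_search_result(memorized, SEARCH_VALUE))
```

Returning a deliberately odd number from the fake search is what makes the check work: a model relying on memory is very unlikely to produce it by accident.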
stevibe
Meet Gemma 4: our new family of open models you can run on your own hardware. Built for advanced reasoning and agentic workflows, we’re releasing them under an Apache 2.0 license. Here’s what’s new 🧵
stevibe
04-01
Thread
#Thread#
Got a 16GB GPU? You can run all of these right now. Tested 4 Qwen3.5-based models on ToolCall-15 & BugFind-15:

Models:
- Qwen3.5:9b Q8 (Official)
- Qwopus v3 Q8 by Jackrong
- OmniCoder-9B by Tesslate
- Qwen3.5-9b-Sushi-Coder by bigatuna

Summary:
- ToolCall-15: Qwopus v3 went perfect 30/30, Sushi-Coder beat base Qwen3.5
- BugFind-15: OmniCoder flipped the script and took #1 at 83%

No single model won both, that's the fun part. The open source community is cooking.
stevibe
04-01
DGX Spark welcome screen 🙈 (Around 20 minutes to get halfway)
stevibe
04-01
The real cost of using Codex/Claude Code daily: muscle memory now defaults to typing `git push` in the AI CLI instead of the terminal