The Qwen team at Alibaba has unveiled QwQ-32B, a 32-billion-parameter AI model whose performance rivals that of the much larger <a href="https://www.artificialintelligence-news.com/news/deepseek-r1-reasoning-models-rival-openai-in-performance/" rel="nofollow">DeepSeek-R1</a>. The result highlights the potential of scaling Reinforcement Learning (RL) on robust foundation models.
The Qwen team has integrated <a href="https://www.artificialintelligence-news.com/news/opera-introduces-browser-integrated-ai-agent/" rel="nofollow">agent capabilities</a> into the reasoning model, enabling it to think critically, utilise tools, and adapt its reasoning based on environmental feedback.
“Scaling RL has the potential to enhance model performance beyond conventional pretraining and post-training methods,” the team stated. “Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models.”
QwQ-32B achieves performance comparable to DeepSeek-R1, which has 671 billion parameters (with 37 billion activated), a testament to the effectiveness of RL when applied to robust foundation models pretrained on extensive world knowledge.
This outcome underscores the potential of RL to bridge the gap between model size and performance.
The model has been evaluated across a range of benchmarks, including AIME24, LiveCodeBench, LiveBench, IFEval, and BFCL, designed to assess its mathematical reasoning, coding proficiency, and general problem-solving capabilities.
The results compare QwQ-32B with other leading models, including DeepSeek-R1-Distilled-Qwen-32B, DeepSeek-R1-Distilled-Llama-70B, o1-mini, and the original DeepSeek-R1.
Benchmark results:
<ul>
<li>AIME24: QwQ-32B achieved 79.5, slightly behind DeepSeek-R1-671B’s 79.8, but significantly ahead of OpenAI-o1-mini’s 63.6 and the distilled models.</li>
<li>LiveCodeBench: QwQ-32B scored 63.4, close behind DeepSeek-R1-671B’s 65.9, and ahead of the distilled models and OpenAI-o1-mini’s 53.8.</li>
<li>LiveBench: QwQ-32B achieved 73.1, ahead of DeepSeek-R1-671B’s 71.6, and outperformed the distilled models and OpenAI-o1-mini’s 57.5.</li>
<li>IFEval: QwQ-32B scored 83.9, narrowly ahead of DeepSeek-R1-671B’s 83.3, and led the distilled models and OpenAI-o1-mini’s 59.1.</li>
<li>BFCL: QwQ-32B achieved 66.4, ahead of DeepSeek-R1-671B’s 62.8, and led the distilled models and OpenAI-o1-mini’s 49.3.</li>
</ul>
The Qwen team’s approach involved a cold-start checkpoint and a multi-stage RL process driven by outcome-based rewards. The initial stage focused on scaling RL for math and coding tasks, utilising accuracy verifiers and code execution servers.
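The article does not say how the first-stage verifiers are implemented, so the following is only a minimal sketch of what an outcome-based reward could look like. It assumes a binary reward: a math answer earns 1.0 only if it matches the reference, and a generated program earns 1.0 only if it passes its tests. The function names `math_reward` and `code_reward`, the numeric tolerance, and the timeout are all hypothetical and are not taken from the Qwen team's system.

```python
import subprocess
import sys


def math_reward(model_answer: str, reference: str) -> float:
    """Hypothetical accuracy verifier: reward depends only on the final answer."""
    try:
        # Compare numerically when both strings parse as numbers.
        matches = abs(float(model_answer) - float(reference)) < 1e-9
    except ValueError:
        # Otherwise fall back to an exact match on the trimmed strings.
        matches = model_answer.strip() == reference.strip()
    return 1.0 if matches else 0.0


def code_reward(program: str, test_code: str, timeout_s: float = 5.0) -> float:
    """Hypothetical execution check: run program plus tests in a fresh interpreter."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", program + "\n" + test_code],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # A program that does not finish in time earns no reward.
        return 0.0
    # Exit code 0 means every assertion in test_code passed.
    return 1.0 if result.returncode == 0 else 0.0


if __name__ == "__main__":
    print(math_reward("79.50", "79.5"))
    print(code_reward("def add(a, b):\n    return a + b", "assert add(2, 3) == 5"))
```

Both functions score only the outcome, not the intermediate reasoning. A production setup would run untrusted code in a sandboxed execution service rather than a local subprocess.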
The second stage expanded to general capabilities, incorporating rewards from general reward models and rule-based verifiers.
“We find that this stage of RL training with a small amount of steps can increase the performance of other general capabilities, such as instruction following, alignment with human preference, and agent performance, without significant performance drop in math and coding,” the team explained.
QwQ-32B is open-weight and available on <a href="https://huggingface.co/" rel="nofollow">Hugging Face</a> and <a href="https://www.modelscope.cn/" rel="nofollow">ModelScope</a> under the Apache 2.0 license, and is also accessible via Qwen Chat. The Qwen team views this as an initial step in scaling RL to enhance reasoning capabilities and aims to further explore the integration of agents with RL for long-horizon reasoning.
“As we work towards developing the next generation of Qwen, we are confident that combining stronger foundation models with RL powered by scaled computational resources will propel us closer to achieving Artificial General Intelligence (AGI),” the team stated.
The post <a href="https://www.artificialintelligence-news.com/news/alibaba-qwen-qwq-32b-scaled-reinforcement-learning-showcase/" rel="nofollow">Alibaba Qwen QwQ-32B: Scaled reinforcement learning showcase</a> appeared first on <a href="https://www.artificialintelligence-news.com/" rel="nofollow">AI News</a>.

Alibaba Qwen QwQ-32B: Scaled reinforcement learning showcase

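The US Bureau of Labor Statistics (BLS) is about to release its January nonfarm payrolls report, and the market expects the accompanying revision to erase about 1 million jobs.
Author: Zhao Ying
Source: Wallstreetcn (华尔街见闻)
The US Bureau of Labor Statistics (BLS) will release the delayed January nonfarm payrolls report tonight, together with its annual benchmark revision and methodology updates. The market expects the revision to erase about 1 million jobs, one of the largest downward revisions in the history of US employment statistics.
According to the BLS's preliminary estimate, employment between April 2024 and March 2025...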

Tonight, US nonfarm payrolls may see a "million-level" downward revision

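BlackRock, the world's largest asset manager, has disclosed that it will buy UNI, the native token of the decentralized exchange Uniswap. The move not only shows traditional finance […]
The article "BlackRock announces purchase of Uniswap platform token UNI! $UNI jumps 23%" was first published on BlockTempo (动区动趋), "the most influential blockchain news media".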

BlackRock announces purchase of Uniswap platform token UNI! $UNI jumps 23%

One Polymarket player made 61,793 trades in a single year and earned 106,000 US dollars.