Under Scaling Law, Small Models Have Unique Advantages in Enterprise Applications
Fastino is not the only company to recognize that small models can offer low cost and low latency while matching large general-purpose models on specific tasks. Among model vendors, Cohere and Mistral both offer very strong small models, and in China, Alibaba Cloud's Qwen3 family also includes 4B, 1.7B, and even 0.6B models. Writer, the enterprise unicorn we covered previously, also has its Palmyra series of small models, which cost only $700,000 to train.
Why do enterprises and developers still need small models when large models have already reached a considerable level of intelligence? The reasons come down to cost, inference latency, and capability matching.
First, and most intuitively, there are deployment and inference costs. Enterprises with strict security requirements will inevitably deploy part of their business privately, and the commercial inference cost of a large model with tens of billions of parameters may exceed the cost of training a small model. Moreover, for applications like TikTok and WeChat with more than 1 billion users, high concurrency is crucial, and at that level of concurrency the gap in inference cost between small and large models can reach orders of magnitude.
Second, there is inference latency. In large consumer-facing applications, large models have much higher inference latency than small models. Small models can even achieve microsecond-level latency, while large models often lag noticeably, a difference that users feel directly.
Third, there is capability matching. For high-volume but narrow use cases that do not require general capabilities, the performance gap between large and small models is negligible, so the additional cost of a large model is unnecessary for the enterprise.
Together, these three factors give small models ample room to survive, even in the shadow of the Scaling Law. The same logic applies to AI application entrepreneurs in China. Fortunately, China's open-source model ecosystem is gradually maturing, and sufficiently strong small models are available. Entrepreneurs only need to post-train one on their specific requirements to obtain a suitable model.
This article is from the WeChat public account "Alpha Startups" (ID: alphastartups), authored by those who discover extraordinary entrepreneurs, published by 36Kr with authorization.




