A small model that costs $100,000 to train outperforms GPT-4o in specific tasks and has 99 times lower latency

36kr
05-14
Today's SOTA large language models are indeed highly capable, matching or even surpassing human performance on some tasks, but their parameter counts often reach hundreds of billions or even trillions, making training, deployment, and inference extremely expensive. For enterprises and developers running relatively simple tasks at large scale and high concurrency, these SOTA models may not offer the best overall balance of cost and performance.

An early-stage startup called Fastino has seized on this pain point. Using low-end gaming GPUs, it trained a series of small models it calls "Task-Specific Language Models" (TLMs) at an average cost of less than $100,000 each; on their target tasks they perform on par with large language models while running inference 99 times faster.

Fastino recently closed a $17.5 million seed round led by Khosla Ventures, with participation from Insight Partners, Valor Equity Partners, and notable angel investors including former Docker CEO Scott Johnston and Weights & Biases CEO Lukas Biewald. In November 2024, Fastino had raised a $7 million pre-seed round led by M12 (Microsoft's venture arm) and Insight Partners, bringing its total funding to nearly $25 million.

Even Under the Scaling Law, Small Models Have Unique Advantages in Enterprise Applications

Fastino is not the only company to have recognized the value of small models that are cheap, low-latency, and no worse than large general-purpose models on specific tasks. Among model vendors, Cohere and Mistral both offer very capable small models; Chinese players such as Alibaba Cloud ship Qwen3 in 4B, 1.7B, and even 0.6B variants. Writer, the enterprise AI unicorn we covered previously, likewise has its Palmyra series of small models, which cost only $700,000 to train.

Why do enterprises and developers still need small models when large models have already reached a certain level of intelligence? The answer comes down to cost, inference latency, and capability matching.

First, and most obviously, there are deployment and inference costs. Enterprises with strict security requirements will inevitably deploy some workloads privately, and the ongoing inference cost of a model with tens of billions of parameters can exceed the entire training cost of a small model. Moreover, for consumer applications with over a billion users, such as TikTok and WeChat, high concurrency is essential, and at that scale the cost gap between serving a small model and a large model is enormous, as the rough estimate below illustrates.
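
To make the scale of that gap concrete, here is a back-of-envelope estimate in Python. Every traffic figure and per-token price below is a hypothetical placeholder chosen for illustration, not a quoted rate from any provider or from the article.

```python
# Rough comparison of annual inference spend vs. a one-off small-model training cost.
# All unit prices and traffic figures are hypothetical placeholders for illustration.
REQUESTS_PER_DAY = 50_000_000      # assumed traffic for a high-concurrency consumer app
TOKENS_PER_REQUEST = 600           # assumed prompt + completion length
LARGE_MODEL_PRICE = 5.00           # assumed $ per million tokens, frontier-model API
SMALL_MODEL_PRICE = 0.10           # assumed $ per million tokens, self-hosted small model

tokens_per_year = REQUESTS_PER_DAY * TOKENS_PER_REQUEST * 365
large_cost = tokens_per_year / 1e6 * LARGE_MODEL_PRICE
small_cost = tokens_per_year / 1e6 * SMALL_MODEL_PRICE

print(f"Large-model inference per year: ${large_cost:,.0f}")   # ~$54.8M under these assumptions
print(f"Small-model inference per year: ${small_cost:,.0f}")   # ~$1.1M under these assumptions
print("One-off small-model training: ~$100,000 (per the article)")
```

Under these assumed numbers, a single year of large-model inference would cost several hundred times more than training a task-specific small model outright; the exact ratio depends entirely on the prices and traffic you plug in.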

Second, take large consumer-facing applications as an example: large models have much higher inference latency than small ones. Small models can even reach microsecond-level latency, while large models often exhibit noticeable lag, a difference users can clearly feel.

Third, for many high-volume but narrow use cases that do not require general capabilities, the performance gap between large and small models is negligible, so the extra cost of a large model buys the enterprise nothing.

Together, these three factors leave plenty of room for small models to thrive even in the shadow of the Scaling Law. The same logic applies to AI application entrepreneurs in China. Fortunately, China's open-source model ecosystem is maturing, and sufficiently strong small models are already available; entrepreneurs only need to post-train them for their specific requirements to obtain a suitable model, as in the sketch below.
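
As an illustration of what that post-training step can look like, here is a minimal sketch of LoRA fine-tuning a small open-weight model with Hugging Face transformers and peft. The base model name, dataset file, and hyperparameters are illustrative assumptions, not Fastino's or any vendor's actual recipe.

```python
# Minimal sketch: task-specific post-training of a small open-weight model with LoRA.
# Model name, dataset file, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_name = "Qwen/Qwen3-0.6B"  # assumed small base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # ensure padding works for batching
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA keeps the trainable parameter count small enough for a single gaming GPU.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

# Hypothetical task-specific dataset: a JSONL file with a "text" column,
# e.g. labeled support tickets or extraction examples for the target task.
dataset = load_dataset("json", data_files="task_data.jsonl", split="train")
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                      batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tlm-checkpoint",
                           per_device_train_batch_size=4,
                           num_train_epochs=3,
                           learning_rate=2e-4,
                           fp16=True),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("tlm-checkpoint")  # only the small adapter weights are saved
```

The resulting adapter can be merged into the base model or loaded alongside it at inference time; the whole run fits on a single consumer GPU precisely because the base model is small and only the LoRA parameters are updated.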

This article is from the WeChat public account "Alpha Startups" (ID: alphastartups), authored by "those who discover extraordinary entrepreneurs," and is published by 36Kr with authorization.
