Tongyi Qianwen's GPT-4-class model has just smashed the industry's price floor!
Moments ago, Alibaba made a surprise move, officially announcing price cuts for nine Tongyi models.
Among them, the API input price of Qwen-Long, the flagship model whose performance is comparable to GPT-4's, has dropped from 0.02 yuan per thousand tokens to 0.0005 yuan per thousand tokens. In other words, 1 yuan now buys 2 million tokens, roughly the text volume of five Xinhua Dictionaries. That arguably makes it the most cost-effective large model in the world.
A more intuitive comparison: Qwen-Long supports long-text input of up to 10 million tokens, at only 1/400 of GPT-4's price.
The new flagship tier is also on the list: the API input price of Qwen-max, Tongyi Qianwen's recently released top-end model, has dropped 67%, to as low as 0.04 yuan per thousand tokens.
On the open-source side, the input prices of five open-source models, including Qwen1.5-72B and Qwen1.5-110B, have also dropped by more than 75%.
This move once again sets a new industry-low price; call it a 618 shopping-festival special for large-model companies and programmers.
1 yuan for 2 million tokens
Let's look at the specific cuts. This round covers nine Tongyi Qianwen models in total, spanning both commercial and open-source models:
Qwen-Long, with performance comparable to GPT-4's: the API input price has dropped from 0.02 yuan to 0.0005 yuan per thousand tokens, a 97% cut; the API output price has dropped from 0.02 yuan to 0.002 yuan per thousand tokens, a 90% cut.
Qwen-max, which matches GPT-4-turbo on the authoritative OpenCompass benchmark: the API input price has dropped from 0.12 yuan to 0.04 yuan per thousand tokens, a 67% cut.
Among the Qwen1.5-series open-source models ranked on the Chatbot Arena leaderboard, Qwen1.5-72B's API input price has dropped from 0.02 yuan to 0.005 yuan per thousand tokens, a 75% cut; its API output price has dropped from 0.02 yuan to 0.01 yuan per thousand tokens, a 50% cut.
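Put together, the new per-token prices make rough cost estimates easy to sanity-check. A minimal Python sketch using only figures quoted above (Qwen-max's output price is not given in the article, so it is omitted):

```python
# Post-cut API prices in yuan per 1,000 tokens, as quoted above.
PRICES = {
    "qwen-long":   {"input": 0.0005, "output": 0.002},
    "qwen1.5-72b": {"input": 0.005,  "output": 0.01},
}

def cost_yuan(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total API cost in yuan for a given token volume."""
    p = PRICES[model]
    return input_tokens / 1000 * p["input"] + output_tokens / 1000 * p["output"]

# "1 yuan buys 2 million tokens" of Qwen-Long input:
print(cost_yuan("qwen-long", 2_000_000, 0))  # -> 1.0
```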
After the cuts, the Tongyi Qianwen series effectively sells at around one-tenth the price of OpenAI's GPT series, an exceptionally high cost-performance ratio.
Take Qwen-Long, which saw the steepest cut: its price is just 1/400 of GPT-4's, yet its performance metrics are not inferior.
Long text is a particular strength: Qwen-Long supports ultra-long context of up to 10 million tokens, enough to handle documents totaling roughly 15 million characters, or about 15,000 pages. Paired with the document service launched alongside it, it can also parse and converse over multiple document formats, including Word, PDF, Markdown, EPUB, and MOBI.
Notably, unlike most domestic vendors, which price input and output tokens identically, Qwen-Long's input price was cut more deeply than its output price this time.
Alibaba offered an explanation:
Today, asking a large model questions about long texts (papers, documents, and so on) has become one of the most common use cases, so the volume of input tokens is typically much larger than the volume of output tokens.
Statistics show that actual input volume is generally about eight times output volume. We have sharply cut the price of the input tokens that users consume most, which is more cost-effective for enterprises and better delivers broad benefits.
We also hope everyone will make full use of long-text capabilities.
Alibaba's opening move is a big one
This is not the first time Alibaba Cloud has broken through the industry's price floor.
On February 29 this year, Alibaba Cloud staged a "Crazy Thursday" event for its cloud products: prices were cut by 20% on average across the board, with the steepest cut reaching 55%.
A ruthless move, even against its own margins.
The confidence behind such heavy investment comes from Alibaba Cloud's position as China's largest public cloud vendor: through long-term technology accumulation and economies of scale, it has built a complete AI infrastructure stack and deep infra expertise.
This earnest price cut signals that, in the era of large-model applications, such technological dividends are becoming one of the public cloud vendors' trump cards.
At the AI infrastructure level, from the chip layer to the platform layer, Alibaba Cloud has built a highly elastic AI compute scheduling system on self-developed core technologies and products such as heterogeneous chip interconnect, the high-performance network HPN7.0, the high-performance storage CPFS, and the artificial intelligence platform PAI.
For example, PAI scales to clusters of 100,000 accelerator cards, with linear scaling efficiency of 96% for ultra-large-scale training. In large-model training tasks, it can achieve the same results with over 50% fewer compute resources, performance that is world-leading.
For inference optimization, Alibaba Cloud mainly offers three capabilities:
First, high-performance optimization: system-level inference optimization techniques, plus high-performance operators, efficient inference frameworks, and compilation optimization.
Second, adaptive tuning. As AI applications diversify, no single model can stay optimal across all scenarios. Adaptive inference lets the model dynamically adjust which inference techniques it applies and which compute resources it uses, based on the characteristics of the input data and the constraints of the compute environment.
Third, scalable deployment. Elastic scaling of inference deployment resources absorbs the tidal traffic patterns that inference services see at certain times of day.
Liu Weiguang, senior vice president of Alibaba Cloud Intelligence Group and president of its Public Cloud Business Unit, has previously said that the public cloud's technological dividends and economies of scale bring huge cost and performance advantages.
This, he argued, will make "public cloud + API the mainstream way for enterprises to call large models."
The mainstream route in the era of large model applications: public cloud + API
That is also the core reason Alibaba Cloud has once again pushed the large-model "price war" to a climax.
Especially for small and medium-sized enterprises and startup teams, public cloud + API has long been regarded as the cost-effective route to large-model applications:
Open-source models are advancing fast, and the strongest of them, represented by Llama 3, are considered comparable to GPT-4 in performance, but private deployment still comes at a steep cost.
Take the open-source Qwen-72B model at a usage of 100 million tokens per month: calling the API directly on Alibaba Cloud's Bailian platform costs only about 600 yuan per month, while private deployment averages upwards of 10,000 yuan per month.
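The 600-yuan figure is easy to sanity-check from the prices quoted above. A back-of-the-envelope Python sketch, assuming the roughly 8:1 input-to-output token split Alibaba cited earlier (applying that split to this workload is our assumption; the article gives only totals):

```python
# Post-cut Qwen1.5-72B API prices, in yuan per 1,000 tokens (quoted above).
INPUT_PRICE = 0.005
OUTPUT_PRICE = 0.01

TOTAL_TOKENS = 100_000_000           # 100M tokens/month, as in the example
input_tokens = TOTAL_TOKENS * 8 / 9  # assumed ~8:1 input/output split
output_tokens = TOTAL_TOKENS / 9

api_cost = (input_tokens / 1000 * INPUT_PRICE
            + output_tokens / 1000 * OUTPUT_PRICE)
print(round(api_cost))               # ~556 yuan, in line with "about 600"

PRIVATE_COST = 10_000                # yuan/month lower bound cited above
print(round(PRIVATE_COST / api_cost))  # private deployment costs ~18x more
```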
Beyond that, the public cloud + API model also makes multi-model calls easy and provides enterprise-grade data security. Alibaba Cloud, for instance, can give enterprises a dedicated VPC environment with compute isolation, storage isolation, network isolation, and data encryption. To date, Alibaba Cloud has led or deeply participated in drafting more than ten international and domestic technical standards on large-model security.
Cloud vendors' openness also gives developers a richer selection of models and toolchains. Besides Tongyi Qianwen, the Alibaba Cloud Bailian platform supports hundreds of large models from home and abroad, such as the Llama series, Baichuan, and ChatGLM, and provides a one-stop development environment for large-model applications: a large-model application can be built in five minutes, and an enterprise-grade RAG application in five to ten lines of code.
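To make "a RAG application in a few lines" concrete, here is a self-contained toy of the RAG pattern: retrieve relevant documents, then generate an answer grounded in them. This is not the Bailian API — the retriever is naive keyword overlap and `generate()` is a stub where a hosted model call would go:

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    """Stub standing in for a hosted large-model call."""
    return f"[answer grounded in a prompt of {len(prompt)} chars]"

def rag_answer(query: str, docs: list[str]) -> str:
    """Retrieve context, then ask the model to answer from it."""
    context = "\n".join(retrieve(query, docs))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)

docs = [
    "Qwen-Long supports context up to 10 million tokens.",
    "The 618 festival is a June shopping event in China.",
    "Public cloud APIs let teams avoid managing GPU clusters.",
]
print(rag_answer("How long a context does Qwen-Long support?", docs))
```

On a real platform, `generate()` would call a hosted model endpoint and `retrieve()` would typically use vector embeddings rather than keyword overlap; the control flow stays the same.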
In its "China AIGC Application Panorama Report", the Quantum Bit Think Tank noted that among AIGC application products, those built on self-developed vertical large models or API access together account for nearly 70%.
That figure indirectly confirms the market potential of the "public cloud + API" model: in the application market, business understanding and data accumulation are the keys to breaking through, and building on public cloud + API is the more realistic choice in terms of cost and time to launch.
In fact, whether in the headline price competition or in the deeper AI infrastructure, the pattern is the same: as the focus of large-model development shifts from base models to practical applications, how platform vendors lower the barrier to using large models has become the crux of the competition.
Liu Weiguang pointed out:
As China's largest cloud computing company, Alibaba Cloud has cut the API input price of its mainstream large model by 97%, in the hope of accelerating an explosion of AI applications.
We expect calls to large-model APIs to grow by thousands of times in the future.
To sum up: for platform vendors, the "price war" is really a contest of infrastructure and technical capability; for the large-model industry as a whole, whether applications keep exploding in popularity now hinges on entry barriers and operating costs.
Seen in that light, this round of price cuts is undoubtedly good news for developers, and for everyone looking forward to more large-model applications.
What do you think?
This article comes from the WeChat public account "Quantum Bit" (ID: QbitAI), by Yuyang, published with authorization by 36Kr.