Claude's Chinese-language tax: the same content costs 65% more tokens than English, versus only 15% more on OpenAI.
According to AIMPACT on April 29 (UTC+8), citing Beating's monitoring, AI researcher Aran Komatsuzaki translated Rich Sutton's famous essay "The Bitter Lesson" into nine languages and ran the texts through the tokenizers of six model families: OpenAI, Gemini, Qwen, DeepSeek, Kimi, and Claude. Using the token count of the English original under OpenAI's tokenizer as the baseline, he compared each language's token consumption on each model.

The results: for the same content, a Chinese-language query to Claude consumed 1.65 times the baseline token count, versus only 1.15 times on OpenAI. Hindi inflated even more dramatically on Claude, at more than three times the baseline. Anthropic ranked last among the six models in the comparison.

Translation changes text length, so the cross-language comparison against English is not entirely rigorous. More convincing is how the same Chinese text fared on different models (against the same baseline): Kimi used only 0.81 times the baseline (fewer tokens than the English original), Qwen 0.85 times, and Claude 1.65 times. The text was identical; the difference is purely tokenizer efficiency. That the Chinese models process Chinese more efficiently than English shows the problem lies not in Chinese itself, but in whether the tokenizer has been optimized for the language.

For users, more tokens directly raise API costs, lengthen the wait before the model responds, and exhaust the context window faster. Tokenizer efficiency depends on each language's share of the training data: abundant English data lets the tokenizer compress English words efficiently, while scarce non-English data leads to highly fragmented segmentation. Aran's conclusion: whoever has the largest market saves the most tokens. (Source: ME)
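The measurement is straightforward to reproduce for tokenizers that are publicly available. Below is a minimal sketch assuming tiktoken for OpenAI's tokenizer and Hugging Face transformers for the open-weight models; the model IDs and sample sentences are illustrative stand-ins, not the exact materials used in the study. (Claude's tokenizer is not publicly distributed, so its counts would have to come from Anthropic's token-counting API instead.)

```python
# Minimal sketch of the token-inflation measurement. Assumptions: tiktoken
# for OpenAI's tokenizer, Hugging Face `transformers` for the open-weight
# models; checkpoints and sample sentences are illustrative stand-ins.
import tiktoken
from transformers import AutoTokenizer

# One sentence from "The Bitter Lesson", plus an illustrative Chinese rendering.
english = (
    "The biggest lesson that can be read from 70 years of AI research is "
    "that general methods that leverage computation are ultimately the most "
    "effective, and by a large margin."
)
chinese = "从70年人工智能研究中能读出的最大教训是：利用计算的通用方法最终是最有效的，而且优势巨大。"

# Baseline: token count of the English text under OpenAI's tokenizer.
openai_enc = tiktoken.get_encoding("o200k_base")
baseline = len(openai_enc.encode(english))

print(f"OpenAI: Chinese = {len(openai_enc.encode(chinese)) / baseline:.2f}x baseline")

# Open-weight tokenizers to compare (illustrative Hugging Face model IDs).
hf_models = {
    "Qwen": "Qwen/Qwen2.5-7B",
    "DeepSeek": "deepseek-ai/DeepSeek-V3",
}
for name, model_id in hf_models.items():
    tok = AutoTokenizer.from_pretrained(model_id)
    # Skip special tokens so every tokenizer counts only the raw text.
    ratio = len(tok.encode(chinese, add_special_tokens=False)) / baseline
    print(f"{name}: Chinese = {ratio:.2f}x baseline")
```

To make the practical stakes concrete: at the reported 1.65x ratio, the same document consumes 65% more of the context window and 65% more per-token API spend in Chinese than in English; equivalently, a 200K-token window effectively behaves like a roughly 121K window (200,000 / 1.65) for Chinese text.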