According to CryptoBriefing, StepFun's StepAudio 2.5 Realtime speech model ranked first in five major benchmarks in April 2026 and scored 80.41 in subjective human evaluation, a 10-point improvement over its predecessor. The model uses an end-to-end architecture, supports real-time interaction in both Chinese and English, and adds paralinguistic understanding, recognizing intonation, emotion, and speech rate. According to the technical report, it maintains role consistency through RLHF tuned specifically for role-playing, which sets it apart from traditional pipelines that chain speech recognition, a language model, and speech synthesis.
StepAudio 2.5 Tops Five Voice AI Benchmark Tests