Microsoft has just open-sourced a cutting-edge voice AI that can process 60 minutes of audio in a single pass. You upload your recording. It identifies each speaker, timestamps each word, and outputs complete structured text annotated with who said what and when. It also supports real-time TTS with a first-audio latency of only 300 milliseconds, and it supports over 50 languages. 100% open source. Link: github.com/microsoft/VibeVoice...…

From Twitter



