0xFunky's Insight

12-13

A week ago, Microsoft released the open-source VibeVoice model, and today Google updated Gemini Audio. In the days in between, I built MeetLingo: a real-time speech-to-speech translation tool focused on online meetings on PC. The motivation was simple: when VibeVoice announced a latency of 300ms, I realized that TTS latency had become low enough to be genuinely usable in time-sensitive scenarios like real-time meetings. So I spent a day vibe coding the MVP.

The entire system uses a streaming architecture: speech is recognized and translated at the same time, translation tokens are fed to the TTS system as they are generated, and the result is output directly as speech, rather than waiting for a sentence to finish before processing begins. As a result, running locally on open-source models, the TTFA (time to first audio) is currently around 1000-1500ms. Among open-source solutions, where the code can be customized, modified, and embedded, latency is generally over 2000ms, so our speed is already quite competitive (approaching the translation speed in Google Video).

Frankly, when I saw Google announce the Gemini Audio update today, I felt for a moment that this idea and its narrative had been completely overshadowed. When a big company makes a move, it can easily drown out a startup's creativity, timing, and even its presence. That is why, in this era, what truly matters is not hiding ideas and slowly refining them, but the ability to quickly build an MVP and validate it in the real world.

But then I thought: since we've already started, let's continue. MeetLingo was never limited to meetings; at its core it is a low-latency speech-to-speech pipeline. In the future it will support more languages and can be used in any real-time voice scenario. The difference is that I chose to make it open-source and local-first, not a feature locked inside a platform. Now that it's running, I'll keep optimizing it to be faster, more stable, and easier to use.
My GitHub and website are in the comments section. Feel free to check them out, and please give it a ⭐ if you'd like!
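
To make the streaming idea in the post concrete, here is a minimal sketch of why passing tokens between stages as they are produced lowers the time to first audio compared with waiting for a whole sentence. It is not MeetLingo's code: the stage functions `asr`, `mt` and `tts`, the per-item costs, and the timing model are all hypothetical stand-ins chosen for illustration, and the numbers are not measurements.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical per-item processing costs in milliseconds (illustrative only).
ASR_MS, MT_MS, TTS_MS = 200, 100, 300

Timed = Tuple[int, str]  # (time in ms at which the item is ready, payload)


def asr(frames: Iterable[str]) -> Iterator[Timed]:
    """Stand-in recognizer: one word becomes ready every ASR_MS."""
    t = 0
    for word in frames:
        t += ASR_MS
        yield t, word


def mt(words: Iterable[Timed]) -> Iterator[Timed]:
    """Stand-in translator: starts on a word once it has arrived and the stage is free."""
    free_at = 0
    for t, word in words:
        free_at = max(free_at, t) + MT_MS
        yield free_at, f"<{word}>"


def tts(tokens: Iterable[Timed]) -> Iterator[Timed]:
    """Stand-in synthesizer: turns each translated token into an audio chunk."""
    free_at = 0
    for t, token in tokens:
        free_at = max(free_at, t) + TTS_MS
        yield free_at, f"audio({token})"


def streaming(frames: List[str]) -> List[Timed]:
    """Chained generators: each item flows to the next stage as soon as it exists."""
    return list(tts(mt(asr(frames))))


def sentence_level(frames: List[str]) -> List[Timed]:
    """Baseline: every stage waits for the full sentence from the previous stage."""
    if not frames:
        return []
    words = list(asr(frames))
    sentence_end = words[-1][0]
    tokens = list(mt((sentence_end, w) for _, w in words))
    translation_end = tokens[-1][0]
    return list(tts((translation_end, tok) for _, tok in tokens))


if __name__ == "__main__":
    frames = ["hello", "everyone", "welcome"]
    print("streaming first audio at (ms):", streaming(frames)[0][0])
    print("sentence-level first audio at (ms):", sentence_level(frames)[0][0])
```

In this toy model the first audio chunk in the streaming version is ready after one pass through each stage, while the sentence-level baseline pays for the whole sentence in recognition and translation before synthesis can start. Real systems differ in many ways (partial hypotheses get revised, translation often needs some lookahead), so treat this only as an illustration of the principle.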

Google AI

@GoogleAI

12-13

Listen up 🔊 We’ve made some updates to our Gemini Audio models and capabilities: — Gemini’s live speech-to-speech translation capability is rolling out in a beta experience to the Google Translate app, bringing you real-time audio translation that captures the nuance of human

Github: github.com/0x0funky/MeetLingo… Website: meetlingo.vercel.app

From Twitter
