
Thinking Machines, the company founded by former OpenAI CTO Mira Murati, has released a highly innovative model it calls an interaction model. The model continuously takes in native multimodal input such as audio, video, and text, and it thinks, responds, and acts in real time.

Unlike earlier agent scaffolding that strung together multiple models and modalities, this model handles all modalities within a single model. Users can therefore interact with the AI in real time in any modality: you can interrupt it at any moment or add a comment, and the AI tracks your state and adjusts its output accordingly. Previously you had to wait for a sentence to finish before interacting with the model.

The core idea is to train the interaction mechanism into the model itself. Their interaction model, trained from scratch, consists of two main parts:

Front-end interaction model:
(a) Always on, continuously listening to, watching, and reading what the user provides.
(b) Processes input and produces a small chunk of output in 200-millisecond steps.
(c) Maintains presence with the user, supports interruptions and interjections, and reacts to screen and video content.

Back-end reasoning model:
(a) Handles tasks that need sustained reasoning, tool calls, long context, and planning.
(b) The interaction model weaves the reasoning model's results back into the dialogue at appropriate moments rather than inserting them abruptly.

What the user ends up with is an interface that can both interact in real time and handle heavy tasks.
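
To make the front-end/back-end split easier to picture, here is a toy simulation of the loop described above. It is only a sketch under stated assumptions, not Thinking Machines' implementation: the 200 ms tick comes from the post, while `BackgroundTask`, `InteractionLoop`, the `task:` prefix, and the three-tick job length are hypothetical names and values chosen for illustration. The real system is a trained model, not hand-written rules.

```python
from dataclasses import dataclass, field
from typing import List, Optional

TICK_MS = 200  # tick length mentioned in the post; everything else here is hypothetical


@dataclass
class BackgroundTask:
    """Stand-in for the back-end model: a job that finishes after a few ticks."""
    prompt: str
    ticks_left: int

    def step(self) -> Optional[str]:
        # Advance the job by one tick; return a result once it is done.
        self.ticks_left -= 1
        if self.ticks_left <= 0:
            return f"result for {self.prompt!r}"
        return None


@dataclass
class InteractionLoop:
    """Stand-in for the front-end model: emits one small output per tick."""
    pending: List[BackgroundTask] = field(default_factory=list)
    ready: List[str] = field(default_factory=list)  # finished results waiting for a pause

    def tick(self, user_input: Optional[str]) -> str:
        # 1. Let back-end work advance in the background.
        still_running = []
        for task in self.pending:
            result = task.step()
            if result is None:
                still_running.append(task)
            else:
                self.ready.append(result)
        self.pending = still_running

        # 2. While the user is active, yield the floor and hold finished results.
        if user_input is not None:
            if user_input.startswith("task:"):
                prompt = user_input[len("task:"):].strip()
                self.pending.append(BackgroundTask(prompt, ticks_left=3))
                return "ack: working on it"
            return "listening"

        # 3. The user is quiet: a suitable moment to fold a result into the dialogue.
        if self.ready:
            return self.ready.pop(0)
        return "idle"


def run(inputs: List[Optional[str]]) -> List[str]:
    """Feed one input (or None for silence) per 200 ms tick and collect outputs."""
    loop = InteractionLoop()
    return [loop.tick(x) for x in inputs]


if __name__ == "__main__":
    stream = ["task: summarize", None, "wait, also", "add charts", None, None]
    for i, out in enumerate(run(stream)):
        print(f"{i * TICK_MS:>5} ms  {out}")
```

In this run the background job finishes while the user is still talking, so the loop holds the result and delivers it on the next quiet tick. That is the "appropriate times, not abrupt" behavior the post describes, reduced here to a single hand-written rule.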

Thinking Machines
@thinkymachines
05-12
People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. https://thinkingmachines.ai/blog/interaction-models…
From Twitter