When GPT and MCU are deeply integrated...


The tinkering never stops: many engineers have tried to pair MCUs with OpenAI's ChatGPT to build chatbots, voice assistants, and natural-language interfaces.

A few days ago, alongside the official announcement of the o3 model, OpenAI also released a Realtime API SDK that runs on Linux and 32-bit MCUs, sparking heated discussion among engineers.

OpenAI has developed an SDK for 32-bit MCUs

Recently, OpenAI released an SDK for its Realtime API in its official GitHub repository that runs on microcontrollers such as the ESP32. The project has been developed and tested on the ESP32-S3 and on Linux, and developers can use it directly by following the instructions.

This SDK is aimed primarily at embedded hardware and has so far been verified only on the Espressif ESP32-S3. It is built on the WebRTC transport OpenAI recently added to the Realtime API and can deliver a voice-dialogue experience with extremely low latency.

At the release event, OpenAI demonstrated a Christmas-themed AI toy whose MCU was an ESP32. In the demo, the engineer held four or five rounds of dialogue with the toy; it was essentially a natural conversation with no noticeable latency, consistent with the earlier web-based demo.

What's on GitHub?

According to the GitHub page (https://github.com/openai/openai-realtime-embedded-sdk), the openai-realtime-embedded-sdk is an SDK customized for microcontrollers, allowing developers to implement Realtime API functionality on MCUs like the ESP32.

The SDK has been developed and tested primarily on the ESP32-S3 and on Linux, so developers can also try it directly on Linux without physical hardware.

To run this SDK on hardware, purchase one of the following boards. Other MCUs may also be compatible, but the SDK was developed against these devices:

Freenove ESP32-S3-WROOM;

Sonatino - ESP32-S3 Audio Development Board.

However, the examples folder also contains examples for a generic platform and for the Raspberry Pi; the Raspberry Pi example targets a Raspberry Pi 4B with a Camera Module, a ReSpeaker 2-Mics Pi HAT, and a speaker. So other embedded devices may gradually gain support as well.

After configuring the Wi-Fi SSID, password, and OpenAI API key, users can set up the device and run the program. The SDK's key advantage is that it gives microcontrollers the ability to talk to a powerful API, expanding their potential in real-time data processing and decision-making scenarios.
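As a rough illustration, in an esp-idf project such settings typically arrive through `idf.py menuconfig` and land in the generated sdkconfig.h as CONFIG_* macros. The three macro names below are assumptions that mirror the setup steps above, not the SDK's exact Kconfig symbols:

```c
/* Sketch: hard-coded stand-ins for values normally set via `idf.py
 * menuconfig`. The CONFIG_* names are assumptions, not the SDK's
 * actual Kconfig symbols. */
#define CONFIG_WIFI_SSID      "my-network"
#define CONFIG_WIFI_PASSWORD  "my-password"
#define CONFIG_OPENAI_API_KEY "sk-placeholder"  /* keep real keys out of source control */

static const char *wifi_ssid  = CONFIG_WIFI_SSID;
static const char *wifi_pass  = CONFIG_WIFI_PASSWORD;
static const char *openai_key = CONFIG_OPENAI_API_KEY;
```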

Target audience: embedded-system developers, IoT device manufacturers, and researchers who need to implement intelligent decision-making on microcontrollers. Because it is easy to integrate and use, the SDK particularly suits users who want advanced data-processing capabilities on resource-constrained devices.

Example use cases:

Smart home: Implementing voice control functionality on the ESP32 using the SDK;

Industrial automation: Using the SDK to enable real-time response to sensor data on microcontrollers;

Research: Utilizing the SDK for real-time inference of machine learning models.

Engineers who examined the demo note that it is essentially a practical implementation, and its biggest advantage is that the WebRTC-based API greatly simplifies the calling process for developers. As is well known, embedded development is mostly done in C/C++, a venerable but laborious toolchain, especially in real-world business scenarios where many cases must be handled by hand. With WebRTC, the demo fits in just a few hundred lines of C code.
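To give a feel for how little signaling code WebRTC needs here, below is a minimal sketch in C of the SDP offer/answer exchange, assuming esp-idf's esp_http_client and OpenAI's published Realtime WebRTC endpoint; the exact URL and model name should be checked against the current docs, and this is not the SDK's own code.

```c
/* Sketch: POST the local SDP offer over HTTPS, read back the remote
 * SDP answer. URL/model are assumptions, not the SDK's actual code. */
#include <stdio.h>
#include <string.h>
#include "esp_http_client.h"

#define REALTIME_URL "https://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

static int exchange_sdp(const char *api_key, const char *offer,
                        char *answer, size_t answer_len)
{
    esp_http_client_config_t cfg = {
        .url = REALTIME_URL,
        .method = HTTP_METHOD_POST,
    };
    esp_http_client_handle_t client = esp_http_client_init(&cfg);

    char auth[200];
    snprintf(auth, sizeof(auth), "Bearer %s", api_key);
    esp_http_client_set_header(client, "Authorization", auth);
    esp_http_client_set_header(client, "Content-Type", "application/sdp");

    if (esp_http_client_open(client, strlen(offer)) != ESP_OK) {
        esp_http_client_cleanup(client);
        return -1;
    }
    esp_http_client_write(client, offer, strlen(offer));
    esp_http_client_fetch_headers(client);
    int n = esp_http_client_read(client, answer, answer_len - 1);
    esp_http_client_cleanup(client);
    if (n < 0) return -1;
    answer[n] = '\0';   /* the answer SDP is then fed to the WebRTC stack */
    return 0;
}
```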

Looking at the repo structure, there is only a single commit, and the demo consists of just six source files. The project pulls in a few open-source libraries: libopus (audio encoding and decoding), esp-protocols (driving the hardware integrated in the ESP32: connecting to Wi-Fi, recording audio, and so on), and libpeer (WebRTC communication).
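As a taste of the libopus half, here is a minimal, self-contained sketch that encodes one frame of PCM with the standard libopus API; the 16 kHz sample rate and 20 ms frame size are illustrative choices, not necessarily what the SDK uses.

```c
/* Sketch: encode one 20 ms frame of 16 kHz mono PCM with libopus.
 * Build with: gcc opus_demo.c -lopus */
#include <stdio.h>
#include <opus.h>

int main(void)
{
    int err = 0;
    /* 16 kHz mono, tuned for speech (VoIP application profile). */
    OpusEncoder *enc = opus_encoder_create(16000, 1, OPUS_APPLICATION_VOIP, &err);
    if (err != OPUS_OK) return 1;

    opus_int16 pcm[320] = {0};   /* 20 ms at 16 kHz = 320 samples (silence here) */
    unsigned char packet[400];   /* generous budget for one encoded frame */

    opus_int32 n = opus_encode(enc, pcm, 320, packet, sizeof(packet));
    if (n < 0) { opus_encoder_destroy(enc); return 1; }
    printf("encoded %d bytes\n", (int)n);

    opus_encoder_destroy(enc);
    return 0;
}
```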

The main program holds little complexity: it calls into these libraries, connects to Wi-Fi, starts recording and playback, and then establishes the WebRTC connection to the OpenAI API. Each function runs to less than 100 lines, and excluding the PC-compatibility layer, the entire demo that the developer actually compiles and runs on the chip is only about 300 lines of code.
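That flow might look roughly like the following sketch, where the four helper names (wifi_connect, audio_start, webrtc_connect, webrtc_poll) are hypothetical stand-ins for the SDK's real functions:

```c
/* Sketch of the flow described above; helper names are hypothetical. */
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

void wifi_connect(void);    /* join the configured Wi-Fi network       */
void audio_start(void);     /* begin audio capture and playback        */
void webrtc_connect(void);  /* SDP/ICE handshake with the Realtime API */
void webrtc_poll(void);     /* pump audio frames in both directions    */

void app_main(void)
{
    wifi_connect();
    audio_start();
    webrtc_connect();

    for (;;) {
        webrtc_poll();
        vTaskDelay(pdMS_TO_TICKS(10));  /* yield to other FreeRTOS tasks */
    }
}
```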

Why did OpenAI choose the ESP32?

Engineers analyzing the choice note that, from the product requirements, the control chip for an AI voice-dialogue toy has two basic requirements:

Networking capability, whether Wi-Fi or Bluetooth;

Voice processing, supporting recording and playback.

These two are hard requirements; everything else is secondary. In particular, the video processing and large-screen display that larger Arm application processors excel at are things an AI toy simply does not need. The audio half of these requirements is sketched below.
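For a sense of what requirement 2 looks like in code, here is a hedged sketch of microphone capture using esp-idf's legacy I2S driver; the pin mapping (i2s_set_pin) is omitted because it is board-specific, and the rates and buffer sizes are illustrative.

```c
/* Sketch: the "recording" half of requirement 2 via the legacy I2S
 * driver. Pin setup omitted; values are illustrative. */
#include "freertos/FreeRTOS.h"
#include "driver/i2s.h"

#define SAMPLE_RATE_HZ 16000

static void mic_init(void)
{
    i2s_config_t cfg = {
        .mode = I2S_MODE_MASTER | I2S_MODE_RX,        /* MCU clocks the mic */
        .sample_rate = SAMPLE_RATE_HZ,
        .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
        .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,  /* mono microphone    */
        .communication_format = I2S_COMM_FORMAT_STAND_I2S,
        .dma_buf_count = 4,
        .dma_buf_len = 256,
    };
    i2s_driver_install(I2S_NUM_0, &cfg, 0, NULL);
}

/* Block until `samples` 16-bit samples have been captured. */
static size_t mic_read(int16_t *buf, size_t samples)
{
    size_t bytes_read = 0;
    i2s_read(I2S_NUM_0, buf, samples * sizeof(int16_t),
             &bytes_read, portMAX_DELAY);
    return bytes_read / sizeof(int16_t);
}
```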

Compared with traditional microcontrollers, the ESP32 is a newcomer that emerged in the smart-home era and meets these requirements perfectly.

First, the ESP32 is inexpensive and highly integrated, costing only a few dollars per chip;

Second, the ESP32 is designed specifically for low-power scenarios and, paired with a battery, can achieve runtimes of weeks or even months (see the sleep sketch after this list);

Third, the ESP32 integrates Wi-Fi, Bluetooth, and voice-processing functions on chip, eliminating external modules, which further cuts board complexity and cost and stretches battery life.
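The battery-life claim in point 2 usually comes down to duty-cycling. A minimal sketch, assuming esp-idf's sleep API and an illustrative timer wake-up (a real toy might wake on a button instead):

```c
/* Sketch: do a burst of work, then deep-sleep until the next wake-up.
 * The 30-second interval is illustrative only. */
#include <stdint.h>
#include "esp_sleep.h"

void sleep_until_next_session(void)
{
    const uint64_t interval_us = 30ULL * 1000 * 1000;
    esp_sleep_enable_timer_wakeup(interval_us);
    esp_deep_sleep_start();  /* everything but the RTC powers down;
                                the chip reboots when the timer fires */
}
```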

Among the common microcontroller options there are many ways to build such a product, but the simplest and least painful is the ESP32. What hardware engineer would turn down a design that needs only one chip?

More embedded SDKs are on the way

At the "2024 Volcano Engine Winter Force Originator Conference", multiple hardware vendors showcased product demos based on RTC technology. And at this conference, a product manager from Byte mentioned embedded SDKs, although they did not disclose the specific hardware models supported, it is undoubtedly that the SDKs are on the way.

Apex.AI is working along the same lines. According to Apex.AI, Apex.Grace enhances ROS 2 and Apex.Ida enhances Eclipse iceoryx: through its SDK for microprocessors, the company provides more features, improved functionality, and additional safety certifications on top of those open-source projects, and it plans to continue down that path as new Apex.AI SDKs for microcontrollers are released. Apex.AI has reportedly already brought up the Xilinx UltraScale+ MPSoC and the Infineon AURIX TC399 as internal projects on the new platform; in its experience, adding a new platform takes only a few weeks.

This article is from the WeChat public account "Electronic Engineering World" (author: EEWorld) and is published by 36Kr with authorization.
