A first in China! The "black box" of embodied data collection is officially open-sourced, ending the era of expensive embodied data.

XRZero-G0, from X-Square Robot, is China's first open-source system that opens the "black box" of robot-free data collection. The project covers the full chain of robot-free data collection, quality inspection, training, and real-robot evaluation, and ships with a multimodal dataset of more than 2,000 hours covering 3,000 tasks. The core idea is that operators wear a VR headset and multiple cameras for motion capture, so no robot is needed on site. Three quality checks (three camera views, IK verification with virtual joint limits, and replay on a real robot) bring the data validity rate above 85%. Experiments show that training on a 10:1 mix of robot-free to real-robot data matches training on 500 real-robot samples alone, cutting collection cost to one-twentieth of the original. The system also supports zero-shot cross-embodiment transfer, addressing the differences between robot bodies at deployment.

Article author and source: Leifeng.com

Recently, the embodied-AI industry has been buzzing about an open-source project.

It started as a rumor circulating in a small circle: "someone has open-sourced an entire embodied dataset." I checked it out of curiosity, but the more I looked, the less the rumor matched what I found. This wasn't just a dataset; it was an entire robot-free data collection system.

In other words, where others open-source "a piece of code," this project releases the complete chain of robot-free data collection, quality inspection, training, and real-robot evaluation, together with a multimodal robot-free dataset of over 2,000 hours covering 3,000 tasks, all packaged and published as a whole.

Paper link: https://arxiv.org/abs/2604.13001

This is a first in China, so I did some digging into the accompanying paper.

In short, the XRZero-G0 paper does two things. First, it opens the "black box" of robot data collection, showing step by step how to collect high-quality data at very low cost. Second, it shows step by step how to train on that data.

First, data collection. You have probably heard that "data collection in the embodied industry is difficult and expensive," and some people go so far as to claim that the slow progress of embodied AI comes down entirely to data collection.

Large language models feed on text, which is readily available online. Robots feed on physical data, and every sample takes real investment to collect. On top of that, data collection in the industry has long had three major pitfalls: high cost, dirty data, and no reusability. Together they form the "impossible trinity" of embodied data.

The XRZero-G0 paper offers a clever solution, the core of which can be summarized in one sentence: people wear the equipment to do the work, and robots are not needed on site.

This approach has been explored before (for example, the UMI paradigm), but it had a fatal flaw: the collected data was a "black box," meaning you did not know whether a real robot could actually execute it. XRZero-G0 instead puts the data through three checks, turning the black box into a transparent white box.

The first check: three camera views.

Earlier handheld collection devices offered only one or two views. The drawback: if the hands crossed or an arm blocked an object, the data was lost on the spot. XRZero-G0 takes a straightforward approach: the operator wears a PICO VR headset with a global-view camera on top of the head, plus one camera on each wrist.

With these three views plus six-degree-of-freedom pose information, and spatiotemporal alignment performed by an edge-computing unit carried in a backpack, accuracy reaches 4 millimeters or better. Whether the operator turns around, bends over, or walks, occlusion and drift are no longer a problem.
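
To make the idea of temporal alignment concrete, here is a minimal sketch that pairs frames from two streams by nearest timestamp. It is only an illustration: the article does not describe how XRZero-G0 aligns its streams, and the function names, the millisecond timestamps, and the `max_gap_ms` tolerance are all assumptions.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def nearest_index(sorted_ts: Sequence[int], t: int) -> int:
    """Index of the timestamp in sorted_ts closest to t (ties go to the earlier one)."""
    i = bisect_left(sorted_ts, t)
    if i == 0:
        return 0
    if i == len(sorted_ts):
        return len(sorted_ts) - 1
    before, after = sorted_ts[i - 1], sorted_ts[i]
    return i if (after - t) < (t - before) else i - 1


def align_streams(reference_ts: Sequence[int], other_ts: Sequence[int],
                  max_gap_ms: int) -> List[Tuple[int, int]]:
    """Pair each reference frame with the nearest frame of the other stream.

    Frames whose nearest neighbour is further away than max_gap_ms are dropped.
    Both inputs are assumed to be sorted timestamps in milliseconds.
    """
    pairs = []
    if not other_ts:
        return pairs
    for ref_idx, t in enumerate(reference_ts):
        other_idx = nearest_index(other_ts, t)
        if abs(other_ts[other_idx] - t) <= max_gap_ms:
            pairs.append((ref_idx, other_idx))
    return pairs


# Hypothetical timestamps (ms) for a head camera and one wrist camera.
head_ts = [0, 100, 200, 300]
wrist_ts = [10, 120, 290, 500]
print(align_streams(head_ts, wrist_ts, max_gap_ms=30))  # → [(0, 0), (1, 1), (3, 2)]
```

The head frame at 200 ms is dropped because no wrist frame lies within the 30 ms tolerance; a real pipeline might interpolate poses instead of discarding the frame.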

The second check: a virtual joint limiter.

Human joints are flexible enough for yoga; a robot's are not. In earlier teleoperation setups, an operator could make a movement the robot was unable to follow, and the motors would burn out. XRZero-G0 handles this neatly: it runs automatic inverse kinematics (IK) verification to filter out movements that exceed the robot's joint limits.
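
The filtering step can be pictured with a small sketch. It assumes the joint angles have already been produced by some IK solver (not shown), and the joint limits, function names, and three-joint arm are hypothetical; the article does not give the actual limits or implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical joint limits in radians for a 3-joint arm (illustrative values only).
JOINT_LIMITS: List[Tuple[float, float]] = [(-3.14, 3.14), (-2.0, 2.0), (-2.5, 2.5)]


def within_limits(joint_angles: Sequence[float],
                  limits: Sequence[Tuple[float, float]] = JOINT_LIMITS) -> bool:
    """True if the frame has one angle per joint and each angle is inside its range."""
    return len(joint_angles) == len(limits) and all(
        low <= q <= high for q, (low, high) in zip(joint_angles, limits)
    )


def trajectory_is_feasible(trajectory: Sequence[Sequence[float]],
                           limits: Sequence[Tuple[float, float]] = JOINT_LIMITS) -> bool:
    """A trajectory passes only if it is non-empty and every frame is within limits."""
    return bool(trajectory) and all(within_limits(frame, limits) for frame in trajectory)


def filter_trajectories(trajectories, limits=JOINT_LIMITS):
    """Keep only the trajectories the (hypothetical) robot could execute."""
    return [t for t in trajectories if trajectory_is_feasible(t, limits)]


ok_traj = [[0.0, 0.5, -1.0], [0.1, 0.6, -1.1]]
bad_traj = [[0.0, 2.5, 0.0]]  # second joint exceeds its 2.0 rad limit
print(len(filter_trajectories([ok_traj, bad_traj])))  # → 1
```

A production check would also have to cover cases where IK finds no solution at all, plus velocity and self-collision constraints; this sketch only covers position limits.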

The third check: replay on a real robot.

After the first two filters, the system randomly samples a portion of the data and sends it directly to a real dual-arm robot for "open-loop replay." Only when the robot completes the task successfully is that batch of data admitted to the database.

After these three layers of filtering, the validity rate of the data entering the database rises to over 85%, with the same usability as real-robot data, and collection is even faster.

According to the paper, collection time for simple tasks dropped from 35 seconds to 15 seconds, a 2.33x speedup; complex tasks were also 1.71x faster. Peak collection speed reached 93.2 trajectories per hour. Isn't that better than collecting on a real robot?
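
As a rough sketch of this spot-check gate: sample a fraction of a batch, replay each sampled episode, and admit the batch only if the replays succeed. The `replay_succeeds` callback stands in for the real robot, and the 10% sample fraction and the "all sampled replays must pass" rule are assumptions, since the article gives neither.

```python
import math
import random
from typing import Callable, Sequence, TypeVar

Episode = TypeVar("Episode")


def accept_batch(batch: Sequence[Episode],
                 replay_succeeds: Callable[[Episode], bool],
                 sample_fraction: float = 0.1,
                 seed: int = 0) -> bool:
    """Spot-check a batch by replaying a random sample of its episodes.

    replay_succeeds is a hypothetical callback that runs open-loop replay on a
    real robot and reports whether the task was completed.
    """
    if not batch:
        return False
    rng = random.Random(seed)
    # Always replay at least one episode, never more than the batch holds.
    k = min(len(batch), max(1, math.ceil(sample_fraction * len(batch))))
    sampled = rng.sample(list(batch), k)
    return all(replay_succeeds(episode) for episode in sampled)


# Stand-in for the robot: pretend every replay succeeds.
print(accept_batch(list(range(20)), lambda episode: True))  # → True
```

Fixing the seed makes the sampling reproducible for auditing; a real pipeline might prefer fresh randomness per batch.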
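
The 2.33x figure follows directly from the two times quoted above. A quick check (the helper name is just for illustration):

```python
def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """How many times faster the new collection time is than the baseline."""
    return baseline_seconds / new_seconds


simple_task = speedup(35, 15)
print(round(simple_task, 2))   # → 2.33
print(round(1 - 15 / 35, 2))   # fraction of time saved per simple task → 0.57
```

The per-task times behind the 1.71x figure for complex tasks are not given in the article, so they are not reproduced here.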

However, all of the above only covers "how to collect data better." The more crucial part of the XRZero-G0 paper is showing everyone "how to train" on the data.

In embodied training, everyone knows that cheap robot-free data and expensive real-robot data should be used together, but in what proportion? Until now, that was settled by trial and error, the so-called "alchemy."

The XRZero-G0 team did something particularly solid: they ran systematic, exhaustive experiments and ultimately found a "golden ratio."

To get there, they compared three setups:

▪ 500 real-robot samples only (baseline)

▪ 500 real-robot samples + 500 robot-free samples (1:1)

▪ 50 real-robot samples + 500 robot-free samples (10:1 robot-free to real)

The results were unexpected: the 10:1 setup achieved a success rate comparable to, or even higher than, the baseline of 500 real-robot samples. Put simply, you cut the amount of real-robot data by 90% and the total cost to one-twentieth of the traditional method, and still get a model that is just as capable. That is a 20-fold leap in cost-efficiency.

The paper explains the reason behind this, which it calls the "few-shot physical anchoring effect."
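
For readers who want to see what the 10:1 setup looks like in practice, here is a minimal sketch of assembling such a mix. The function name, the tuple-based stand-in datasets, and the uniform random sampling are assumptions for illustration; the article does not describe how the paper's training sets were built.

```python
import random
from typing import List, Sequence, TypeVar

Sample = TypeVar("Sample")


def mix_datasets(real: Sequence[Sample], robot_free: Sequence[Sample],
                 n_real: int, robot_free_per_real: int, seed: int = 0) -> List[Sample]:
    """Draw n_real real-robot samples plus robot_free_per_real times as many robot-free ones."""
    n_free = n_real * robot_free_per_real
    if n_real > len(real) or n_free > len(robot_free):
        raise ValueError("not enough samples for the requested mix")
    rng = random.Random(seed)
    mixed = rng.sample(list(real), n_real) + rng.sample(list(robot_free), n_free)
    rng.shuffle(mixed)  # interleave the two sources for training
    return mixed


# Stand-in datasets: tuples of (source, index) instead of real trajectories.
real_pool = [("real", i) for i in range(500)]
free_pool = [("robot_free", i) for i in range(500)]

mixed = mix_datasets(real_pool, free_pool, n_real=50, robot_free_per_real=10)
print(len(mixed))                                       # → 550
print(sum(1 for src, _ in mixed if src == "real"))      # → 50
print(round(1 - 50 / 500, 2))                           # real-data reduction → 0.9
```

The last line reproduces the 90% reduction in real-robot data; the one-twentieth cost figure depends on per-sample costs that the article does not give, so it is not computed here.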

That's not all. A model trained on this data can also achieve zero-shot cross-embodiment transfer.

The biggest weakness of traditional teleoperation setups is that they break as soon as anything changes. Raise the table by ten centimeters, or swap in a different robot, and the system fails completely. XRZero-G0, by contrast, is a backpack-style system: as the operator moves around, the viewpoint, height, and lighting change naturally during data collection. This rich "noise" is precisely what makes the model so robust.

The paper reports some truly impressive details: the model trained on this mixed data was deployed directly to two other robots, EX001 and CX001, without ever seeing any real-robot data from them, and it handled tasks such as flower arranging, towel folding, and sausage stuffing without any problems.

Let me briefly share my thoughts on XRZero-G0. The core of this paper is to break down and explain to practitioners, like an instruction manual, how to collect data at low cost and how to use data efficiently.

Everyone can sense that the embodied industry is shifting from "competing on demos" to "competing on data." However, the industry lacks consensus and direction on how exactly to compete on data. XRZero-G0 walks the industry through the entire process, from "easier data collection" and "finding the right data ratio" to ultimately achieving "zero-shot cross-embodiment transfer."

This kind of engineering work cannot be accomplished by a single university laboratory or a star scholar; it requires an industry team that understands both academia and industry.

The company behind XRZero-G0 is X-Square Robot.

To understand why X-Square Robot was able to build XRZero-G0, look at the path it chose. From day one, the company bet on end-to-end large models, exploring the VLA, WM, and WUM routes in parallel. People in the industry know that this approach simply does not work without solid infrastructure. That is why, from WALL-OSS to XRZero-G0, X-Square Robot has consistently been building the underlying infrastructure.

This path may be difficult, but it's the right one. Just look at the capital markets: in less than two years, the company has completed nine rounds of financing, achieving a valuation of over 10 billion, with four major companies—ByteDance, Meituan, Alibaba, and Xiaomi—on its shareholder list.

The reason why XRZero-G0 is fully open source is even simpler and more direct.

A true "ChatGPT moment" for embodied AI cannot be created by a single company. When universities, small and medium-sized teams, and individual developers can all use the standardized XRZero-G0 toolchain to generate data in bulk, the data flywheel of the entire industry will truly begin to turn, and by then X-Square Robot's moat will have been built.

The link to XRZero-G0's GitHub page is below. I recommend checking it out:

https://github.com/X-Square-Robot/XRZero-G0

Disclaimer: The content above is only the author's opinion which does not represent any position of Followin, and is not intended as, and shall not be understood or construed as, investment advice from Followin.