Galaxea, a company based in Beijing E-Town, recently open-sourced the Galaxea Open-World Dataset (GOD). This is the world's first high-quality real-device dataset for open scenarios, giving developers worldwide robust data to advance research and applications in humanoid intelligence more efficiently.
"Datasets are a crucial foundation for building multi-task, multi-skill, and multi-environment generalized agents. Real-device data is at the pinnacle of the humanoid intelligence data pyramid and is a key technology for breaking through the ceiling of humanoid intelligence capabilities," said Galaxea's chief scientist, Zhao Xing. "Therefore, we need to take robots into the real world to collect data."
Using a single unified robotic platform, the Galaxea R1 Lite, Galaxea collected data in real human living and working environments. The GOD now spans 50 different settings, including residential areas, kitchens, retail spaces, and offices. It comprises 500 hours of high-quality mobile manipulation data, covering over 234 tasks, more than 1,600 objects, and 58 operational skills.
The tasks in this dataset include both short-sequence actions, such as desktop tidying, object grasping, and appliance operation, and long-sequence tasks like bed-making, which require full-body coordination and multi-step reasoning. This range adds to the diversity and complexity of the task distribution within the dataset.
Because all data is collected on the same hardware, every episode shares a consistent action space and sensory input, so action parameters remain consistent across tasks and scenarios. The GOD also emphasizes multi-angle coverage and natural lighting conditions during collection, so that the sensory information closely resembles real deployment environments, thereby reducing domain adaptation costs.
Compared with most datasets, which are collected in simulated environments or controlled laboratory settings, the GOD offers significant advantages in scene realism, task diversity, and action complexity. Training on it turns a robot from a "greenhouse test-taker" into a "street-smart practitioner." The dataset directly reflects the real-world challenges robots face in unstructured environments, such as sensory noise, object occlusion, action redundancy, and task interference. As a result, it provides more valuable training signals for improving the generalization and stability of models.
Galaxea has now integrated the GOD with Galaxea G0, its end-to-end, dual-system, full-body vision-language-action (VLA) model, and made it available to developers worldwide. This effectively creates an online "real training base" for the industry, significantly lowering the barriers to embodied intelligence research and development and accelerating the transition of embodied intelligence from laboratory innovation to widespread societal value.