
MIT researchers create vision-based system that lets robots "see" and understand their own bodies


A groundbreaking vision-based system developed by MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) enables robots to autonomously understand their physical form and movements. In a lab setting, a soft robotic hand manipulates objects without embedded sensors or traditional programming, relying solely on a single camera to capture visual data and guide its actions. The innovation, named Neural Jacobian Fields (NJF), represents a shift from rigid, sensor-dependent robotics toward machines that learn control through observation, mimicking how humans develop body awareness.

The system addresses a critical challenge in robotics: the difficulty of creating accurate control models for soft, deformable, or irregularly shaped robots. Conventional approaches often require extensive sensor arrays or precise mathematical models, which limit design flexibility. NJF instead allows robots to infer their own internal mechanics by analyzing visual feedback from random movements. This eliminates the need for pre-programmed sensor data or hardware modifications, enabling more creative and adaptive robot designs.

At the core of NJF is a neural network that captures two key aspects of a robot’s “body”: its 3D structure and how that structure responds to control inputs. It builds on neural radiance fields (NeRF), a technique that reconstructs 3D scenes from images, by adding a Jacobian field, a function that predicts how each point on the robot moves in response to motor commands. During training, the robot performs unstructured motions while cameras record them. The system then deduces the relationship between control signals and physical responses on its own, building a self-model without human intervention or prior knowledge of the robot’s design.

Tests demonstrated the system’s versatility. Researchers applied NJF to a soft pneumatic hand, a rigid robotic hand, a 3D-printed arm, and a sensorless rotating platform. In each case, NJF accurately learned the robot’s geometry and control dynamics using only visual data. Once trained, the system requires just a single camera for real-time operation, running at about 12 Hertz, fast enough for closed-loop control of soft robots, for which physics-based simulators are often too computationally demanding.

The implications are significant. NJF could reduce reliance on costly sensors and complex programming, making robotics more accessible. Potential applications range from precision tasks in agriculture and construction to dynamic environments where traditional methods falter: drones could navigate indoor or underground spaces without maps, and mobile robots could handle cluttered settings such as homes or warehouses.

While the current setup requires multiple cameras during training and must be retrained for each robot, the team envisions a future in which users could capture a robot’s movements with a smartphone, similar to recording a rental car’s condition. This would democratize access, allowing hobbyists and developers to create control models without specialized equipment.

Challenges remain, however. NJF lacks tactile or force-sensing capabilities, limiting its effectiveness for tasks involving physical contact. The researchers are exploring ways to improve generalization across different robots, handle occlusions, and extend the model’s ability to reason over longer time horizons.

“Just as humans intuitively understand their bodies, NJF gives robots that self-awareness through vision,” says lead researcher Sizhe Lester Li, an MIT PhD student.
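To make the core idea more concrete, the sketch below (hypothetical PyTorch code, not the CSAIL implementation) shows the kind of mapping a Jacobian field represents: a small network takes a 3D point on the robot’s body and outputs a matrix predicting how that point moves for a given motor command. The network size, the number of actuators, and the example command are illustrative assumptions.

```python
# Hypothetical sketch of a "Jacobian field" (illustration only, not CSAIL's code):
# for any 3D point x on the robot, predict a 3 x m matrix J(x) such that the
# point's motion under an m-dimensional actuation command u is roughly J(x) @ u.
import torch
import torch.nn as nn

class JacobianField(nn.Module):
    def __init__(self, num_actuators: int, hidden: int = 128):
        super().__init__()
        self.num_actuators = num_actuators
        # A small MLP maps a 3D query point to a flattened 3 x m Jacobian.
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 * num_actuators),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (N, 3) query points -> (N, 3, m) per-point Jacobians
        return self.mlp(points).view(-1, 3, self.num_actuators)

field = JacobianField(num_actuators=4)         # assume 4 actuation channels
points = torch.rand(8, 3)                      # sampled points on the robot body
command = torch.tensor([0.2, 0.0, -0.1, 0.5])  # an example control signal
predicted_motion = field(points) @ command     # (8, 3): predicted per-point motion
# In a system like NJF, such predictions would be supervised indirectly, by
# projecting them into the camera and comparing against motion observed in video.
```

Given a field like this, control reduces to choosing commands whose predicted point motions move the observed body toward a goal, which is what allows the trained system to close the loop with a single camera at runtime.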
The work underscores a broader trend in robotics: moving away from manual modeling toward self-supervised learning. As the field advances, systems like NJF could bridge the gap between theoretical control and real-world adaptability, paving the way for more flexible, affordable, and autonomous machines.
