Physical intelligence

Robotics × AI

Teaching machines to understand and act in physical environments. Edge inference, real-time adaptation, safety-critical systems.

We're connecting LLMs and vision models to physical robots — running inference at the edge, fusing sensor data in real-time, and building systems that adapt when the environment changes.

The focus is practical: warehouse automation that handles unexpected obstacles, inspection systems that flag anomalies, and collaborative robots that respond to natural gestures and voice commands.

What we're exploring

01 Edge inference and sensor fusion
02 Vision-language control loops
03 Adaptive motion through LLM reasoning
04 Real-time safety monitoring
05 Sim-to-real transfer
06 Multi-modal perception

Experiments

What we're building, testing, and learning.

VLM-guided pick-and-place

Using Claude Sonnet vision to identify and describe objects, then translating that understanding into robotic arm movements. Testing on a 6-DOF arm with RGB-D camera.

Insight: Latency is the killer — current cloud VLMs add 2-3s per decision, which breaks any real-time illusion. Exploring local models.
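
A minimal sketch of the perception half of this loop, assuming the Anthropic Python SDK. The model string, the JSON prompt contract, and the downstream deproject()/move_to_grasp() calls are placeholders rather than our actual interface, and pixel-to-world projection from the depth frame is elided.

    import base64
    import json

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def describe_objects(jpeg_bytes: bytes) -> list[dict]:
        """Ask the VLM for graspable objects and their pixel centroids."""
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # assumption: whichever Sonnet vision model is current
            max_tokens=512,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image",
                     "source": {"type": "base64", "media_type": "image/jpeg",
                                "data": base64.b64encode(jpeg_bytes).decode()}},
                    {"type": "text",
                     "text": 'Return JSON only: [{"label": str, "cx": int, "cy": int}, ...] '
                             "listing every graspable object and its pixel centroid."},
                ],
            }],
        )
        return json.loads(response.content[0].text)

    # Downstream (hypothetical): project each centroid through the RGB-D
    # intrinsics into the arm frame and hand the pose to the 6-DOF controller.
    #   target = deproject(det["cx"], det["cy"], depth_frame, intrinsics)
    #   move_to_grasp(target)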

LLM motion planner

Feeding obstacle descriptions to an LLM and having it generate trajectory waypoints in natural language, then parsing to robot commands.

Insight: Works surprisingly well for high-level planning. Falls apart for precise manipulation. Best used as a 'strategy layer' above traditional planners.
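
A rough sketch of that strategy-layer split, with the LLM call abstracted behind a hypothetical ask_llm() helper and precise motion delegated to a conventional planner via a hypothetical refine_path(); the prompt wording and waypoint format are illustrative only.

    import json

    PROMPT = (
        "Workspace obstacles: {obstacles}\n"
        "Gripper start: {start}  Goal: {goal}\n"
        "Reply with JSON only: a list of [x, y, z] waypoints in metres that avoid the obstacles."
    )

    def plan(obstacles: str, start, goal, ask_llm, refine_path):
        """High-level route from the LLM, low-level trajectory from a classical planner."""
        raw = ask_llm(PROMPT.format(obstacles=obstacles, start=start, goal=goal))
        coarse = json.loads(raw)       # e.g. [[0.1, 0.0, 0.3], [0.4, 0.2, 0.3], ...]
        return refine_path(coarse)     # collision checking, IK and smoothing stay classical

    # plan("a box ~0.3 m tall centred at (0.4, 0.1)", [0.0, 0.0, 0.2], [0.8, 0.0, 0.2],
    #      ask_llm=..., refine_path=...)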

Safety boundary learning

Training a small model to predict 'this movement is probably unsafe' from camera + proprioception data. Goal: cheap safety net that catches edge cases.
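
A rough sketch of what such a safety net could look like in PyTorch, assuming a frozen MobileNetV3-Small backbone for the camera frame and a 7-value proprioception vector; every size here is a placeholder, not our actual architecture.

    import torch
    import torch.nn as nn
    from torchvision import models

    class SafetyNet(nn.Module):
        def __init__(self, proprio_dim: int = 7):
            super().__init__()
            backbone = models.mobilenet_v3_small(weights="DEFAULT")
            backbone.classifier = nn.Identity()   # keep the 576-d pooled features
            for p in backbone.parameters():
                p.requires_grad = False           # frozen: only the small head trains
            self.backbone = backbone
            self.head = nn.Sequential(
                nn.Linear(576 + proprio_dim, 64),
                nn.ReLU(),
                nn.Linear(64, 1),                 # logit for "probably unsafe"
            )

        def forward(self, image: torch.Tensor, proprio: torch.Tensor) -> torch.Tensor:
            feats = self.backbone(image)                          # (B, 576)
            return self.head(torch.cat([feats, proprio], dim=1))  # (B, 1)

    # Trained with BCEWithLogitsLoss on labelled safe/unsafe frames; at runtime a
    # sigmoid over the logit gates each command before it reaches the controller.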

Verbal instruction interface

Natural language commands → robot actions. 'Pick up the red thing next to the keyboard' style interaction.

Insight: Grounding is hard. The robot needs to know what 'next to' means in 3D space.
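
A toy illustration of that grounding step, resolving 'next to X' from 3D object centroids (camera-frame metres, e.g. from RGB-D deprojection); the 0.15 m threshold is an arbitrary assumption, and a real definition would need to be viewpoint- and size-aware.

    import numpy as np

    def next_to(target: str, anchor: str, centroids: dict, max_dist: float = 0.15) -> bool:
        """centroids maps object label -> 3D position as an np.ndarray."""
        return float(np.linalg.norm(centroids[target] - centroids[anchor])) < max_dist

    scene = {
        "red block": np.array([0.42, -0.08, 0.61]),
        "keyboard":  np.array([0.45,  0.02, 0.60]),
    }
    print(next_to("red block", "keyboard", scene))   # True: centroids ~10 cm apart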

Tech we're using

ROS2 · PyTorch · Isaac Sim · Claude Sonnet · Whisper · ONNX · Jetson Orin

Open questions

Things we're still figuring out.

How do you maintain safety guarantees when an LLM is in the control loop?

What's the right abstraction level for LLM→robot communication?

Can we get VLM inference fast enough for real-time adaptation (< 100ms)?

How much can sim-to-real transfer reduce the need for physical robot training?

Interested in this research? Have a related problem?

Let's talk → Reach out to us at info@deepklarity.com