VLM-guided pick-and-place
Teaching machines to understand and act in physical environments. Edge inference, real-time adaptation, safety-critical systems.
We're connecting LLMs and vision models to physical robots: running inference at the edge, fusing sensor data in real time, and building systems that adapt when the environment changes.
The focus is practical: warehouse automation that handles unexpected obstacles, inspection systems that flag anomalies, and collaborative robots that respond to natural gestures and voice commands.
What we're building, testing, and learning.
Using Claude Sonnet vision to identify and describe objects, then translating that understanding into robotic arm movements. Testing on a 6-DOF arm with an RGB-D camera.
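As a rough illustration, the loop we're prototyping looks something like the sketch below. It uses the Anthropic Python SDK for the vision call; the camera wrapper, arm driver, and the exact Sonnet model id are placeholder assumptions, not our actual interfaces.

```python
# Minimal sketch of the VLM-guided pick pipeline. Assumes the Anthropic Python SDK,
# a hypothetical RGB-D camera wrapper, and a hypothetical 6-DOF arm driver.
import base64
import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def describe_scene(rgb_png_bytes: bytes) -> list[dict]:
    """Ask the VLM for graspable objects and their approximate pixel centers as JSON."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumption: substitute the Sonnet model you use
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64", "media_type": "image/png",
                    "data": base64.b64encode(rgb_png_bytes).decode()}},
                {"type": "text", "text": (
                    "List graspable objects as JSON only: "
                    '[{"label": str, "u": int, "v": int}] where (u, v) is the pixel center.')},
            ],
        }],
    )
    return json.loads(response.content[0].text)


def pick(label: str, camera, arm) -> None:
    """Find the named object, deproject its pixel to 3D using depth, and command a pick."""
    rgb, depth = camera.capture()            # hypothetical RGB-D capture call
    objects = describe_scene(rgb)
    target = next(o for o in objects if o["label"] == label)
    xyz = camera.deproject(target["u"], target["v"], depth)  # pixel + depth -> camera frame
    arm.pick_at(camera.to_base_frame(xyz))   # hypothetical arm driver call
```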
Feeding obstacle descriptions to an LLM and having it generate trajectory waypoints in natural language, then parsing them into robot commands.
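A minimal sketch of that flow, with the usual caveats: the workspace limits, the waypoint JSON schema, and the model id are illustrative assumptions.

```python
# Sketch of the waypoint-generation experiment: describe obstacles, ask for waypoints,
# parse the JSON, and bounds-check everything before it becomes a command.
import json

import anthropic

client = anthropic.Anthropic()
WORKSPACE = {"x": (0.2, 0.7), "y": (-0.4, 0.4), "z": (0.05, 0.5)}  # metres, assumed limits


def plan_waypoints(obstacles: str, goal_xyz: tuple[float, float, float]) -> list[list[float]]:
    prompt = (
        f"Obstacles: {obstacles}\n"
        f"Move the gripper to {goal_xyz} (metres, robot base frame). "
        'Reply with JSON only: {"waypoints": [[x, y, z], ...]} avoiding the obstacles.'
    )
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumption
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    waypoints = json.loads(response.content[0].text)["waypoints"]
    # Reject anything outside the reachable workspace before it ever reaches the arm.
    for x, y, z in waypoints:
        for value, (lo, hi) in zip((x, y, z), (WORKSPACE["x"], WORKSPACE["y"], WORKSPACE["z"])):
            if not lo <= value <= hi:
                raise ValueError(f"waypoint {(x, y, z)} leaves the workspace")
    return waypoints
```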
Training a small model to predict 'this movement is probably unsafe' from camera + proprioception data. Goal: a cheap safety net that catches edge cases.
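For flavour, a toy version of such a classifier in PyTorch; the architecture, input sizes, and veto threshold are illustrative assumptions, not a description of the trained model.

```python
# Tiny safety classifier sketch: fuse a downsampled camera frame with joint state
# into a single "probably unsafe" score used to veto a commanded movement.
import torch
import torch.nn as nn


class SafetyNet(nn.Module):
    def __init__(self, n_joints: int = 6):
        super().__init__()
        self.vision = nn.Sequential(            # tiny CNN over a 64x64 grayscale crop
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(              # proprioception = joint angles + velocities
            nn.Linear(32 + 2 * n_joints, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, frame: torch.Tensor, joints: torch.Tensor) -> torch.Tensor:
        feats = self.vision(frame)
        return torch.sigmoid(self.head(torch.cat([feats, joints], dim=-1)))


# Usage: block the command if the predicted unsafe probability crosses a threshold.
model = SafetyNet()
frame = torch.randn(1, 1, 64, 64)      # stand-in camera crop
joints = torch.randn(1, 12)            # 6 angles + 6 velocities
if model(frame, joints).item() > 0.5:  # threshold is a tunable assumption
    print("veto: movement flagged as probably unsafe")
```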
Natural language commands → robot actions. 'Pick up the red thing next to the keyboard'-style interaction.
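One way to wire that up is with tool use, roughly as sketched below; the tool definition and the robot primitive it dispatches to are hypothetical.

```python
# Sketch of the natural-language command path using Anthropic tool use. The tool name,
# its schema, and robot.pick() are hypothetical placeholders for our robot primitives.
import anthropic

client = anthropic.Anthropic()
TOOLS = [{
    "name": "pick_object",
    "description": "Pick up an object identified by a short description of its appearance and location.",
    "input_schema": {
        "type": "object",
        "properties": {"description": {"type": "string"}},
        "required": ["description"],
    },
}]


def handle_command(utterance: str, robot) -> None:
    """Turn 'pick up the red thing next to the keyboard' into a call on the robot interface."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumption
        max_tokens=256,
        tools=TOOLS,
        messages=[{"role": "user", "content": utterance}],
    )
    for block in response.content:
        if block.type == "tool_use" and block.name == "pick_object":
            robot.pick(block.input["description"])   # hypothetical robot primitive
```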
Things we're still figuring out.
How do you maintain safety guarantees when an LLM is in the control loop?
What's the right abstraction level for LLM→robot communication?
Can we get VLM inference fast enough for real-time adaptation (under 100 ms)?
How much can sim-to-real transfer reduce the need for physical robot training?
Interested in this research? Have a related problem?
Let's talk: reach out to us at info@deepklarity.com.