Infrastructure for autonomous agents

Agent OS

Building the orchestration layer for AI agents — browser control, desktop automation, inter-agent communication, and reliable tool use.


Agents that can actually do things — browse the web, fill out forms, use desktop applications, coordinate with other agents — need infrastructure that doesn't exist yet.

Current agent frameworks are good at chains and RAG but break down when you need an agent to click a button on a website, read the result, decide what to do next, and maybe hand off to a specialized agent for part of the task.

We're building the plumbing: browser automation that works reliably, computer use APIs that feel natural, protocols for agents to talk to each other, and orchestration that doesn't fall over when one agent fails.

What we're exploring

01 Browser automation primitives
02 Computer use (keyboard, mouse, screen)
03 Inter-agent communication protocols
04 MCP server implementations
05 Fault-tolerant orchestration
06 Shared context and memory

Experiments

What we're building, testing, and learning.

Playwright + Claude integration

Connecting Claude's computer use capability to Playwright for reliable browser automation. Goal: agents that can navigate real websites without custom per-site code.

Insight: Taking a screenshot after every action adds latency but dramatically improves reliability. We're working on selective screenshot strategies.
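One selective strategy can be sketched as a small policy function: always verify after state-changing actions, and otherwise cap how many unverified steps pile up. This is a hypothetical sketch — the action labels, function name, and threshold are our illustrative assumptions, not the integration's actual API.

```python
# Illustrative selective-screenshot policy. Action names and the
# max_gap threshold are assumptions for the sketch.
STATE_CHANGING = {"click", "type", "navigate", "submit"}

def should_screenshot(action: str, steps_since_last: int, max_gap: int = 3) -> bool:
    """Screenshot after any state-changing action, or whenever too
    many read-only steps (scroll, hover, ...) have gone unverified."""
    if action in STATE_CHANGING:
        return True
    return steps_since_last >= max_gap
```

The point of the policy is that reliability comes from verifying state changes, while latency comes from verifying everything; a gap cap bounds how far the agent's mental model can drift during read-only actions.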

MCP server library

Building reusable MCP servers for common tools: file system, databases, APIs, calendar, email. Focus on making them production-ready, not just demos.

Insight: The MCP spec is clean, but real-world tools have messy edges. Handling auth, rate limits, and partial failures is 80% of the work.
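The rate-limit half of that 80% usually reduces to retry-with-backoff around each tool call. A minimal sketch, assuming a hypothetical `RateLimitError` standing in for whatever a real backend raises on HTTP 429 (all names here are illustrative, not part of the MCP SDK):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for whatever a real tool backend raises on HTTP 429."""

def call_with_backoff(fn, *, retries=4, base_delay=0.5, sleep=time.sleep):
    """Retry a tool call on rate limiting, with exponential backoff
    plus jitter. Re-raises once the retry budget is exhausted."""
    for attempt in range(retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == retries - 1:
                raise
            # 0.5s, 1s, 2s, ... scaled by random jitter to avoid
            # synchronized retry storms across agents
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Injecting `sleep` keeps the wrapper testable; partial failures (a tool that succeeded server-side but timed out client-side) need idempotency on top of this, which is where most of the remaining work lives.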

Multi-agent handoff protocol

Designing how agents pass tasks and context to each other. When should a 'researcher' agent hand off to a 'writer' agent? What context transfers?
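One way to make the "what context transfers?" question concrete is a handoff envelope where the sender trims context to what the receiver declared it needs. This is a sketch of the idea, not our actual protocol — every field and method name here is an illustrative assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """Illustrative handoff envelope between agents."""
    from_agent: str
    to_agent: str
    task: str
    context: dict = field(default_factory=dict)    # facts the next agent may need
    artifacts: list = field(default_factory=list)  # e.g. URLs, file paths

    def trimmed(self, wanted_keys):
        """Transfer only the context keys the receiving agent asked for,
        so a researcher's raw working notes don't flood the writer."""
        return Handoff(self.from_agent, self.to_agent, self.task,
                       {k: v for k, v in self.context.items() if k in wanted_keys},
                       list(self.artifacts))
```

Usage: a researcher hands "Draft a summary" to a writer that requested only `sources`, so `notes` stays behind:

```python
h = Handoff("researcher", "writer", "Draft a summary",
            context={"sources": ["https://example.com/report"],
                     "notes": "long raw scratchpad"})
assert "notes" not in h.trimmed({"sources"}).context
```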

Agent memory and context persistence

How do you give an agent memory that persists across sessions? Exploring embeddings, structured summaries, and explicit knowledge graphs.

Insight: A hybrid approach works best: structured facts plus semantic search over conversation history. Pure embedding search misses obvious things.
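The hybrid shape can be sketched in a few lines: an explicit key-value store for facts that must never be "approximately" recalled, alongside ranked retrieval over raw history. A minimal sketch with word-overlap standing in for embedding similarity; the class and method names are our illustrative assumptions.

```python
class HybridMemory:
    """Sketch of hybrid recall: exact structured facts + ranked search
    over history (word overlap here stands in for embeddings)."""

    def __init__(self):
        self.facts = {}    # explicit key -> value facts
        self.history = []  # raw conversation turns

    def remember(self, key, value):
        self.facts[key] = value

    def log(self, turn):
        self.history.append(turn)

    def recall(self, query, k=2):
        # 1) exact lookups for any fact key mentioned in the query
        facts = {key: v for key, v in self.facts.items()
                 if key.lower() in query.lower()}
        # 2) rank history turns by overlap with the query words
        q = set(query.lower().split())
        turns = sorted(self.history,
                       key=lambda t: len(q & set(t.lower().split())),
                       reverse=True)
        return facts, turns[:k]
```

The structured half is what rescues the "obvious things": a fact like a deadline is returned verbatim by key lookup even when it would rank poorly under pure similarity search.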

Tech we're using

Playwright · Puppeteer · Claude API · MCP SDK · Redis · PostgreSQL · FastAPI

Open questions

Things we're still figuring out.


How do you debug an agent that's autonomous? Traditional logging doesn't capture 'why it decided that'.


What's the right granularity for agent specialization? Lots of narrow agents vs. fewer capable ones?


How do you handle graceful degradation when an agent can't complete its task?


Can agents learn to use new tools from documentation alone?

Interested in this research? Have a related problem?

Let's talk → Reach out to us at info@deepklarity.com