
OpenHands: The Open-Source Alternative to Devin That Actually Works

Nina Kowalski

Data scientist exploring agents for data pipelines and analytics.

February 24, 2026 · 13 min read


A Deep Technical Review of OpenHands (Formerly OpenDevin)

When Cognition Labs announced Devin in early 2024 as the "first AI software engineer," the demo videos were impressive but the product was closed-source, expensive, and waitlisted. Within weeks, an open-source alternative emerged — initially called OpenDevin, now rebranded as OpenHands to establish its own identity.

The rebrand was smart. OpenHands isn't a Devin clone. It's a fundamentally different approach to autonomous coding agents — one that's modular, self-hosted, and transparent about what it can and can't do. After spending several weeks running it against real projects, here's what I found.


What OpenHands Actually Is

OpenHands is an open-source platform for building and running autonomous software development agents. It's not just a single agent — it's a framework that provides:

  • A runtime sandbox for safe code execution
  • An agent-controller architecture that manages agent behavior
  • Multiple interaction modalities: terminal, browser, file system, and code editor
  • A web-based UI for monitoring and interacting with agents
  • An API for programmatic control

The project lives at github.com/All-Hands-AI/OpenHands and has accumulated over 40,000 GitHub stars — a signal that the developer community sees real value here, not just hype.


Architecture: Where OpenHands Gets Interesting

OpenHands uses an event-driven microservices architecture that's more sophisticated than most AI coding tools. Understanding this architecture is key to understanding its capabilities and limitations.

Core Components

┌─────────────────────────────────────────────────┐
│                  Web UI / API                   │
├─────────────────────────────────────────────────┤
│                Agent Controller                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────────┐   │
│  │ Agent    │  │ Event    │  │ LLM          │   │
│  │ Runtime  │  │ Stream   │  │ Integration  │   │
│  └──────────┘  └──────────┘  └──────────────┘   │
├─────────────────────────────────────────────────┤
│             Docker Sandbox Runtime              │
│  ┌──────────┐  ┌────────────┐  ┌──────────────┐ │
│  │ Terminal │  │ Browser    │  │ File System  │ │
│  │ Execution│  │(Playwright)│  │ Operations   │ │
│  └──────────┘  └────────────┘  └──────────────┘ │
└─────────────────────────────────────────────────┘

The Event Stream is the backbone. Every action — a file edit, a terminal command, a browser navigation — generates an event. The agent observes these events and decides what to do next. This is fundamentally different from a simple prompt-response loop. It means the agent has persistent context and can reason about sequences of actions over time.
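
To make that concrete, here is a minimal sketch of an event-stream-driven loop. The names (run_agent, agent.step, runtime.execute) are illustrative, not the actual OpenHands API:

# Minimal sketch of an event-driven agent loop (hypothetical names,
# not the actual OpenHands API)
def run_agent(agent, runtime, stream, max_iterations=100):
    for _ in range(max_iterations):
        # The agent sees the full event history, not just the last message
        action = agent.step(stream.events)
        stream.add(action)
        if action.is_finish():
            return
        # Executing an action yields an observation (stdout, file contents,
        # page text, ...) appended to the same stream for the next step
        observation = runtime.execute(action)
        stream.add(observation)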

The Agent Controller manages the lifecycle of an agent session. It handles:

  • LLM call orchestration and retry logic
  • Token budget management
  • Action validation before execution
  • State persistence across interactions

The Runtime Sandbox is a Docker container that provides isolated execution. This is non-negotiable for autonomous agents — you don't want an LLM-generated rm -rf / hitting your host system. OpenHands spins up a fresh container per session with configurable resource limits.
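
For illustration, a per-session sandbox launch with resource limits might look like the following, using the Docker SDK for Python (docker-py). The limits and paths here are my assumptions, not OpenHands' actual defaults:

# Sketch of a per-session sandbox with resource limits (docker-py);
# the limits and paths are assumptions, not OpenHands' defaults
import docker

client = docker.from_env()
sandbox = client.containers.run(
    "ghcr.io/all-hands-ai/runtime:0.9-nikolaik",
    command="sleep infinity",   # keep the container alive for the session
    detach=True,
    mem_limit="4g",             # a runaway build can't exhaust host RAM
    nano_cpus=2_000_000_000,    # roughly two CPUs
    volumes={"/home/me/project": {"bind": "/workspace", "mode": "rw"}},
    working_dir="/workspace",
)
try:
    exit_code, output = sandbox.exec_run("pytest -q")
    print(output.decode())
finally:
    sandbox.remove(force=True)  # fresh container per session, nothing persists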

Agent Implementations

OpenHands ships with several agent implementations, and this is where the framework aspect shines:

Agent           Description                                          Best For
CodeActAgent    Primary agent that executes code actions directly    General-purpose coding tasks
BrowsingAgent   Specialized for web browsing and interaction         Research, API docs, web scraping
DummyAgent      Minimal agent for testing                            Development and debugging

The CodeActAgent is the workhorse. It uses a ReAct-style reasoning loop where the LLM observes the current state, reasons about what to do, and executes an action. The key insight is that "actions" in OpenHands aren't just text — they're structured operations:

# Example of how actions are represented internally (simplified;
# the real classes live in OpenHands' events module)
from dataclasses import dataclass

@dataclass
class Action:
    """Base class for all structured agent actions."""

@dataclass
class CmdRunAction(Action):
    command: str
    thought: str = ""
    blocking: bool = False
    hidden: bool = False

@dataclass
class FileWriteAction(Action):
    path: str
    content: str
    thought: str = ""

@dataclass
class BrowseURLAction(Action):
    url: str
    thought: str = ""

This structured action space is what makes OpenHands more reliable than a raw LLM-in-a-loop approach. The agent can't generate arbitrary code to execute — it must choose from a defined set of actions, each with validation.
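
As a rough illustration (my own sketch, not the project's source), validation can gate each action type before it reaches the runtime:

# Sketch of pre-execution action validation, building on the dataclasses
# above (my own illustration, not OpenHands source code)
BLOCKED_SUBSTRINGS = ("rm -rf /", "mkfs", "dd if=")

def validate(action: Action) -> None:
    if isinstance(action, CmdRunAction):
        if not action.command.strip():
            raise ValueError("empty command")
        if any(s in action.command for s in BLOCKED_SUBSTRINGS):
            raise PermissionError(f"blocked command: {action.command!r}")
    elif isinstance(action, FileWriteAction):
        if ".." in action.path or action.path.startswith("/etc"):
            raise PermissionError(f"suspicious write path: {action.path!r}")
    elif isinstance(action, BrowseURLAction):
        if not action.url.startswith(("http://", "https://")):
            raise ValueError(f"unsupported URL scheme: {action.url!r}")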


Capabilities: What It Can Actually Do

Let's be specific about what OpenHands handles well and where it struggles.

What Works Well

1. Repository-Level Understanding

OpenHands can clone a repo, explore its structure, read existing code, and make changes that respect the project's conventions. I pointed it at a medium-sized Python project (~15k lines) and asked it to add error handling to an API endpoint. It correctly identified the existing patterns, used the same exception classes, and added appropriate logging.

2. Multi-Step Task Execution

For tasks that require multiple steps — "find the bug in this test, fix it, run the test suite, and verify the fix" — OpenHands performs reasonably well. The event-driven architecture means it can chain actions together and adapt based on intermediate results.

3. Browser Interaction

The integrated Playwright-based browser is genuinely useful. The agent can:

  • Navigate documentation sites
  • Read API references
  • Interact with web-based tools
  • Debug frontend issues by observing rendered pages

This is a capability that pure terminal-based agents simply don't have.
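
Under the hood this is ordinary Playwright automation. A stripped-down version of what a browse action does might look like this (illustrative; OpenHands' actual browser integration is more involved):

# Stripped-down browse action using Playwright's sync API (illustrative)
from playwright.sync_api import sync_playwright

def browse(url: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        text = page.inner_text("body")  # the agent consumes text, not pixels
        browser.close()
        return text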

4. Environment Setup

OpenHands can install dependencies, configure environments, and handle the messy setup work that often precedes actual coding. It reads package.json, requirements.txt, pyproject.toml, etc., and takes appropriate action.
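
A toy version of that detection logic, to show the idea (not the project's implementation):

# Toy manifest detection: map well-known files to install commands
# (illustrates the idea; not OpenHands' implementation)
from pathlib import Path

INSTALL_COMMANDS = {
    "requirements.txt": "pip install -r requirements.txt",
    "pyproject.toml": "pip install -e .",
    "package.json": "npm install",
}

def setup_commands(repo: Path) -> list[str]:
    return [cmd for name, cmd in INSTALL_COMMANDS.items()
            if (repo / name).exists()]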

Where It Struggles

1. Complex Architectural Decisions

Don't expect OpenHands to refactor a monolith into microservices. It handles well-scoped, concrete tasks far better than open-ended architectural work. The LLM backbone (regardless of which one you use) doesn't maintain enough context or reasoning depth for large-scale design decisions.

2. Debugging Non-Obvious Issues

When a bug requires understanding subtle interactions between components — race conditions, memory management issues, complex state machines — OpenHands often enters loops of trying different fixes without converging on the root cause. It lacks the deep debugging intuition that experienced developers rely on.

3. Long-Running Sessions

Token limits and context window constraints become real problems on complex tasks. OpenHands has some context management strategies, but sessions that require reading and understanding many files will degrade in quality as the conversation grows.
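
One common coping strategy is to keep the original task plus the most recent events and drop the middle. The sketch below shows the idea; I'm not claiming it's OpenHands' exact algorithm:

# Naive context trimming: keep the task and the newest events, drop the
# middle (a common strategy, not necessarily OpenHands' exact algorithm)
def trim_history(events, budget, count_tokens):
    task = events[0]                     # the original task statement
    tail, used = [], count_tokens(task)
    for event in reversed(events[1:]):
        cost = count_tokens(event)
        if used + cost > budget:
            break
        tail.append(event)
        used += cost
    return [task, "[earlier events truncated]"] + list(reversed(tail))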

4. Test Generation Quality

While OpenHands can write tests, the quality is inconsistent. It tends toward testing the happy path and misses edge cases that a thoughtful developer would catch. Generated tests sometimes assert on implementation details rather than behavior.
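
A hypothetical example of the difference (hand-written here, not agent output):

# Hypothetical contrast between brittle and behavioral assertions
# (hand-written illustration, not actual agent output)

# Implementation-coupled: breaks as soon as the limit or header changes
def test_rate_limit_header(client):
    response = client.get("/users")
    assert response.headers["X-RateLimit-Remaining"] == "4"

# Behavioral: asserts what callers actually experience
def test_rate_limit_kicks_in(client):
    responses = [client.get("/users") for _ in range(20)]
    assert any(r.status_code == 429 for r in responses)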


Setting Up OpenHands

Prerequisites

  • Docker: OpenHands runs agents in Docker containers. Docker Desktop or Docker Engine is required.
  • Python 3.11+: For running the backend.
  • Node.js 18+: For the frontend (if building from source).
  • An LLM API key: Anthropic Claude, OpenAI GPT-4, or compatible API.

Quick Start with Docker

The fastest way to get running:

docker run -it --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.9-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands-state:/.openhands-state \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
    ghcr.io/all-hands-ai/openhands:0.9

This pulls the 0.9 image (--pull=always re-checks the registry for a newer build of that tag) and starts the web UI on http://localhost:3000.

From Source (Development Setup)

git clone https://github.com/All-Hands-AI/OpenHands.git
cd OpenHands

# Backend setup
make build

# Install frontend dependencies
cd frontend && npm install && cd ..

# Start the application
make run

Configuration

On first launch, you'll configure your LLM provider. OpenHands supports:

Provider        Recommended Model             Notes
Anthropic       claude-sonnet-4-20250514      Best overall performance for coding tasks
OpenAI          gpt-4o                        Strong, but sometimes verbose in reasoning
Azure OpenAI    Various                       For enterprise deployments
Local (Ollama)  deepseek-coder-v2, codellama  Free but significantly less capable

For the best experience, use Claude Sonnet or GPT-4o. I tested with several models and the quality difference is substantial — cheaper models like GPT-3.5-turbo produce agents that loop frequently and make more errors.

You can configure the LLM in the settings UI or via environment variables:

export LLM_MODEL="anthropic/claude-sonnet-4-20250514"
export LLM_API_KEY="your-api-key-here"

Runtime Configuration

The sandbox runtime can be customized:

# config.toml
[core]
workspace_base = "/path/to/your/projects"
max_iterations = 100
sandbox_timeout = 1200

[sandbox]
container_image = "ghcr.io/all-hands-ai/runtime:0.9-nikolaik"
enable_browser = true
user_id = 1000

Real-World Usage: A Practical Walkthrough

Here's what a typical session looks like. I asked OpenHands to add rate limiting to a FastAPI application:

Step 1: Agent explores the project

The agent started by reading the project structure:

Action: CmdRunAction("find . -type f -name '*.py' | head -20")
Action: FileReadAction("app/main.py")
Action: FileReadAction("app/routers/users.py")
Action: FileReadAction("requirements.txt")

Step 2: Agent researches the approach

Using the browser, it navigated to the FastAPI documentation to check rate limiting options, then read the slowapi library docs.

Step 3: Agent implements the solution

# Agent wrote this to app/middleware/rate_limit.py
from slowapi import Limiter
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded
from fastapi import Request
from fastapi.responses import JSONResponse

limiter = Limiter(key_func=get_remote_address)

async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
    return JSONResponse(
        status_code=429,
        content={"detail": "Rate limit exceeded. Try again later."}
    )

Step 4: Agent integrated and tested

It modified main.py, installed the dependency, and ran the test suite. One test failed — it hadn't accounted for the new middleware in a mock. It fixed the test and re-ran.
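
For reference, the resulting wiring in main.py looks roughly like this (my reconstruction, following slowapi's documented registration pattern rather than the agent's verbatim output):

# Roughly how main.py ends up wired (my reconstruction, following
# slowapi's documented registration pattern)
from fastapi import FastAPI, Request
from slowapi.errors import RateLimitExceeded

from app.middleware.rate_limit import limiter, rate_limit_handler

app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, rate_limit_handler)

@app.get("/users")
@limiter.limit("10/minute")  # slowapi keys on client IP via get_remote_address
async def list_users(request: Request):  # slowapi requires the Request argument
    return {"users": []}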

Total time: ~4 minutes. Total LLM cost: roughly $0.80 in API calls.

This is where OpenHands shines. The task was well-scoped, the implementation was standard, and the agent could verify its own work.


How OpenHands Compares to Commercial Alternatives

OpenHands vs. Cognition Devin

Dimension     OpenHands                                    Devin
Access        Free, open-source, self-hosted               Waitlist, $500/month
Architecture  Modular, extensible                          Proprietary, closed
LLM Choice    Bring your own (Claude, GPT-4, etc.)         Locked to their stack
Sandbox       Docker-based, fully configurable             Proprietary cloud environment
UI            Web-based, functional but basic              Polished, Slack integration
Performance   Variable (depends on LLM and config)         Consistently optimized
Privacy       Full control; code never leaves your infra   Code processed on their servers
Community     Active open-source community                 Corporate support

Honest assessment: Devin's demo videos are more impressive than what I've seen OpenHands produce on equivalent tasks. Cognition has likely invested heavily in prompt engineering, fine-tuning, and workflow optimization that OpenHands doesn't replicate. However, the gap is smaller than the marketing suggests, and for many practical tasks, OpenHands produces comparable results at a fraction of the cost.

OpenHands vs. GitHub Copilot Workspace

Copilot Workspace takes a different approach — it's more integrated into the GitHub workflow and less autonomous. It proposes changes based on issues and lets you review before execution. OpenHands is more "hands-off" — you give it a task and it runs.

For developers who want control, Copilot Workspace may be preferable. For tasks where you want to delegate entirely (especially repetitive ones), OpenHands is the better fit.

OpenHands vs. Cursor / Windsurf / Aider

These tools are AI-assisted coding — they work alongside you in your editor. OpenHands is autonomous coding — it works independently. Different use cases entirely.

I use Cursor daily for active development. I use OpenHands for tasks I want to hand off completely: writing boilerplate, fixing straightforward bugs, adding standard features to well-understood codebases.


The Economics of Self-Hosted Agents

One of OpenHands' biggest advantages is cost transparency. Here's a realistic cost breakdown for a team running OpenHands:

Monthly estimate (moderate usage):
├── LLM API costs: $50-200 (depends on task volume)
├── Infrastructure: $0 (runs on existing dev machines)
├── Licensing: $0 (MIT license)
└── Total: $50-200/month

Compare to Devin:
├── Per seat: $500/month
├── Team of 5: $2,500/month
└── Annual: $30,000

The cost advantage is significant, but it comes with a hidden cost: you're the support team. When OpenHands loops on a task, when the Docker runtime behaves unexpectedly, when the LLM provider has an outage — that's your problem to solve. With a commercial product, you file a support ticket.


Limitations and Honest Criticisms

1. Documentation is uneven. The getting-started docs are solid, but advanced configuration and agent customization documentation is sparse. You'll spend time reading source code.

2. Error handling in the agent loop is fragile. When the LLM produces malformed output (which happens), the agent sometimes enters retry loops that burn through API credits before timing out. Better guardrails would help.

3. The web UI is functional but not polished. Compared to Devin's slick interface, OpenHands feels like a developer tool (because it is). This matters if you're trying to get non-technical stakeholders to use it.

4. Multi-file changes across large codebases are unreliable. The agent works best when the scope of change is contained. Asking it to refactor a pattern across 30 files will likely produce inconsistent results.

5. No persistent memory across sessions. Each session starts fresh. The agent doesn't learn from previous interactions with your codebase. This is a fundamental limitation of the current architecture.

6. Model dependency. OpenHands' capabilities are bounded by the LLM you choose. When Claude or GPT-4 have bad days (it happens), OpenHands has a bad day. There's no fallback or graceful degradation strategy built in.
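
If you need resilience, you can bolt it on at the LLM layer yourself. Here's a minimal sketch using litellm (the routing library OpenHands builds on); this is my own workaround, not a built-in feature:

# Minimal model fallback at the LLM layer (my own workaround sketch;
# OpenHands does not ship this behavior)
import litellm

def complete_with_fallback(messages, primary, fallback):
    try:
        return litellm.completion(model=primary, messages=messages)
    except Exception:
        # Primary provider outage or hard error: retry on the fallback model
        return litellm.completion(model=fallback, messages=messages)

# Usage (model strings follow litellm's provider/model convention):
# complete_with_fallback(msgs, "anthropic/claude-sonnet-4-20250514", "openai/gpt-4o")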


Who Should Use OpenHands

Good fit for:

  • Developers who want autonomous coding without vendor lock-in
  • Teams with security requirements that prohibit sending code to third parties
  • Researchers studying autonomous agent architectures
  • Organizations that want to build custom agents on top of an existing framework
  • Budget-conscious teams who can trade polish for cost savings

Not ideal for:

  • Non-technical users who need a polished, guided experience
  • Teams that need guaranteed SLAs and commercial support
  • Developers working on highly complex, novel architectures
  • Anyone who expects it to replace a senior engineer (it won't)

The Bottom Line

OpenHands is the most capable open-source autonomous coding agent available today. It's not a toy — it's a legitimate tool that can handle real development tasks when given appropriate scope and good LLM backends.

The gap between OpenHands and commercial alternatives like Devin is real but narrower than the marketing suggests. For well-scoped tasks — bug fixes, feature additions, boilerplate generation, test writing — OpenHands delivers solid results at a fraction of the cost. For complex, creative software engineering work, no autonomous agent (open-source or commercial) is ready to replace human developers.

The project's trajectory is promising. The architecture is sound, the community is active, and the All-Hands-AI team ships frequently. If you're interested in autonomous coding agents, OpenHands is where you should start experimenting — not because it's free, but because it's transparent. You can see exactly how it works, modify it to your needs, and contribute improvements back.

That transparency is worth more than any polished demo video.


OpenHands v0.9 was used for this review. The project is under active development — capabilities and limitations may change with subsequent releases.
