AI agents have been the defining focus of 2025, enabled by giving models the ability to invoke tools. This has been a major transition point: moving from chatbots that are purely conversational to systems that can take actions while operating on a plan.
Computer use represents one application of agentic AI: enabling the AI to use a mouse and keyboard just like a human user would.
This reveals a distinction between "programmatic interfaces" and "graphical interfaces". The majority of human users interact with the latter, clicking buttons and navigating visually. Artificial intelligence, however, is particularly well suited to programmatic interfaces.
Consider how AI agents operate within development environments. When an agent deploys an application, runs tests, or manages version control, it interfaces through command-line tools, APIs, and file system operations. These programmatic interfaces are deterministic: a command either succeeds or fails with structured feedback. There's no ambiguity about where a button is located or whether a menu item has been redesigned.
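To make that contrast concrete, here is a minimal sketch of what "structured feedback" looks like on the programmatic side. The choice of Python and a pytest-based test suite is purely illustrative:

```python
import subprocess

# Minimal sketch of a programmatic interface: run the test suite and read
# structured feedback. The "pytest" command is only an illustrative example.
result = subprocess.run(["pytest", "--quiet"], capture_output=True, text=True)

if result.returncode == 0:
    print("Tests passed")
else:
    # Failure arrives as an exit code plus captured output,
    # not as pixels the agent has to interpret.
    print("Tests failed:")
    print(result.stdout)
    print(result.stderr)
```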
Computer-use agents introduce a fundamentally different paradigm. Rather than direct programmatic access, they interpret visual information through screenshots, use vision models to understand interface layouts, and simulate human interactions through mouse movements and keyboard inputs. This approach mirrors how humans interact with computers, but at a significant cost in efficiency and reliability.
The performance implications are substantial. Programmatic interfaces execute operations directly through API calls, shell commands, and file operations. Computer use requires a multi-step process: capture screen state, process visual information, interpret interface elements, plan actions, and execute simulated inputs. Each layer adds latency, computational overhead, and potential points of failure.
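As a rough sketch of those layers (the capture, vision, and input components below are hypothetical stand-ins, not any specific product's API), one step of a computer-use loop might look like this:

```python
from dataclasses import dataclass

@dataclass
class UIAction:
    kind: str       # e.g. "click" or "type"
    x: int = 0
    y: int = 0
    text: str = ""

def computer_use_step(goal: str, capture_screenshot, vision_model, input_driver) -> None:
    """One iteration of a hypothetical computer-use agent loop."""
    screenshot = capture_screenshot()                     # 1. capture screen state
    elements = vision_model.locate_elements(screenshot)   # 2. interpret the layout
    action = vision_model.plan_action(goal, elements)     # 3. plan the next action
    if action.kind == "click":                            # 4. simulate human input
        input_driver.move_and_click(action.x, action.y)
    elif action.kind == "type":
        input_driver.type_text(action.text)
```

Each of those four stages is a place where latency accumulates and where a mis-read screenshot or a relocated button can derail the run.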
This raises a fundamental question about the direction of agentic AI development. Are we optimizing for the right interface layer? For developers and technical workflows, programmatic access is already available and demonstrably more efficient. Computer use appears to solve for scenarios where programmatic interfaces don't exist, or for systems designed exclusively for human interaction.
The most practical architecture may be hybrid: programmatic interfaces as the primary mode, with computer-use as a fallback for exceptional cases. This mirrors how many developers already work, preferring command-line tools and APIs while occasionally resorting to GUI navigation when necessary.
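A sketch of that fallback logic, assuming hypothetical run_via_api and run_via_computer_use handlers:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Task:
    name: str
    has_programmatic_interface: bool

def execute_task(
    task: Task,
    run_via_api: Callable[[Task], Any],
    run_via_computer_use: Callable[[Task], Any],
) -> Any:
    # Prefer the deterministic programmatic path whenever one exists.
    if task.has_programmatic_interface:
        return run_via_api(task)
    # Fall back to visual GUI automation only for systems with no API or CLI.
    return run_via_computer_use(task)
```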
I've been building infrastructure that is designed to be usable by both humans and agents. A core component of this is ensuring the interface is programmatic. This doesn't eliminate the possibility of a graphical interface down the line, but the programmatic foundation means the agent can use the tool directly, without the overhead that comes with GUIs.
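As a minimal illustration of that "programmatic first" foundation (the function and CLI below are hypothetical, not the actual tool), the core operation is a plain function that returns structured data: an agent can import and call it directly, a human can reach it through a thin CLI, and a GUI could later sit on top of the same function:

```python
import argparse
import json

def create_project(name: str) -> dict:
    # Core operation: returns structured data instead of rendering a screen.
    return {"status": "created", "name": name}

def main() -> None:
    # Thin CLI wrapper for human use; agents can call create_project() directly.
    parser = argparse.ArgumentParser(description="Create a project")
    parser.add_argument("name")
    args = parser.parse_args()
    print(json.dumps(create_project(args.name)))

if __name__ == "__main__":
    main()
```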
