Core Concepts
A short tour of the ideas that make Vibium feel different from older browser automation tools.
The browser daemon
Vibium runs a long-lived daemon that owns the browser process. Each vibium
command is a small client that talks to that daemon over a local socket. Two
practical consequences:
- Commands are fast — there is no per-command startup cost.
- State persists between commands — cookies, the current page, the active tab, scroll position, and element references all carry over.
The daemon shuts down when you explicitly stop it or when the session ends.
From a script using a client library, always pair browser.start() with a
matching browserSession.stop() (or the language’s equivalent) so the daemon
doesn’t outlive the script. Without that call, each run of the script can leave
an orphaned browser process behind.
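A minimal sketch of that pairing in Python. The FakeSession class is a hypothetical stand-in, not Vibium's client API; only the try/finally shape matters here. Substitute the start and stop calls your client library actually provides.

```python
class FakeSession:
    """Hypothetical stand-in for a browser session; not Vibium's real API."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True
        return self

    def stop(self):
        self.running = False


def run_script(session, work):
    """Start the session, run `work`, and always stop the session afterwards."""
    session.start()
    try:
        return work(session)
    finally:
        # Runs on success and on exceptions, so nothing is left running.
        session.stop()


def failing_work(session):
    raise RuntimeError("page did not load")


session = FakeSession()
try:
    run_script(session, failing_work)
except RuntimeError:
    pass

print(session.running)  # False: stop() ran even though the work failed
```

Putting stop() in a finally block (or a context manager, if the library offers one) means the cleanup does not depend on the script reaching its last line.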
Element references (@eN)
Most UI automation tools want a CSS selector for every interaction. Vibium takes a different approach: it numbers the interactive elements on the current page and lets you refer to them by short, stable IDs.
    @e1 link "Sign in"
    @e2 input placeholder="Email"
    @e3 button "Continue"
You get these IDs by running vibium map, or by calling
vibium find ..., which returns a reference for each match.
References are stable across commands as long as the page does not change
substantially. Each map or find refreshes the current reference set, so
an @eN reference only means what it meant in the last result you saw. When
the DOM shifts, run map again (or diff map to see what moved) to refresh
them.
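The refresh behaviour can be pictured with a toy model. This is not Vibium's implementation: the RefTable class is hypothetical, and numbering elements in listing order from @e1 is an assumption made for illustration. It shows why a reference copied from an older result can point at a different element after the page changes.

```python
class RefTable:
    """Toy model of a reference set that is rebuilt on every map or find."""

    def __init__(self):
        self._refs = {}

    def refresh(self, elements):
        # Assumption: elements are renumbered from @e1 in listing order,
        # and the previous numbering is discarded.
        self._refs = {f"@e{i}": el for i, el in enumerate(elements, start=1)}
        return dict(self._refs)

    def resolve(self, ref):
        return self._refs[ref]


table = RefTable()
table.refresh(['link "Sign in"', 'input placeholder="Email"', 'button "Continue"'])
before = table.resolve("@e2")

# The page changes: a new button now appears ahead of the others.
table.refresh(['button "Accept cookies"', 'link "Sign in"', 'input placeholder="Email"'])
after = table.resolve("@e2")

print(before)  # input placeholder="Email"
print(after)   # link "Sign in"
```

In this model, a script that stored @e2 before the change would now act on the wrong element, which is why re-running map after a DOM shift is the safe habit.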
Semantic finding
Vibium’s find subcommands match elements the way a human would describe
them: visible text, form labels, placeholders, ARIA roles. CSS selectors are
intentionally not the primary interface — they are brittle and they don’t
match how an agent reads a page.
| Subcommand | Matches |
|---|---|
| vibium find text "Sign in" | Visible text content |
| vibium find label "Email" | Inputs whose label is “Email” |
| vibium find placeholder "Search" | Inputs with that placeholder |
| vibium find role button | Elements with that ARIA role |
Verbs and subverbs
A few Vibium commands are actually small command groups:
- vibium find has subcommands text, label, placeholder, role.
- vibium wait is overloaded: vibium wait "<selector>" waits for a CSS selector, while vibium wait text "<text>" and vibium wait url "<path>" use named subcommands.
- vibium record has start and stop.
That means vibium wait "h2" and vibium wait text "h2" do different
things: the first waits for any element matching the CSS selector h2, the
second waits for the literal string h2 to appear in the visible page.
When in doubt, the Command Reference shows the
exact synopsis for each command.
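The overload on wait can be sketched as a small dispatch rule. This is an illustrative model only, not Vibium's argument parser: the classify_wait function is hypothetical, and it assumes the first argument is treated as a subcommand only when it is text or url and is followed by a value.

```python
NAMED_SUBCOMMANDS = {"text", "url"}


def classify_wait(args):
    """Return (mode, value) for the arguments that follow `vibium wait`."""
    if len(args) == 2 and args[0] in NAMED_SUBCOMMANDS:
        # e.g. wait text "h2" or wait url "/dashboard"
        return args[0], args[1]
    if len(args) == 1:
        # A bare argument is read as a CSS selector.
        return "selector", args[0]
    raise ValueError(f"unrecognised wait arguments: {args!r}")


print(classify_wait(["h2"]))           # ('selector', 'h2')
print(classify_wait(["text", "h2"]))   # ('text', 'h2')
print(classify_wait(["url", "/dashboard"]))  # ('url', '/dashboard')
```

The same string, h2, ends up with two different meanings depending on whether a subcommand name precedes it.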
Standards-based protocol
Under the hood, Vibium speaks WebDriver BiDi, the W3C bidirectional WebDriver protocol. That means:
- It is a standard, not a vendor-specific debugging protocol.
- Future browser support comes “for free” as more browsers ship BiDi.
- You can mix Vibium with other BiDi-aware tools if you ever need to.
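For a sense of what the protocol looks like on the wire, here is a sketch that builds one BiDi command message. BiDi commands are JSON objects with an id, a method, and params, and browsingContext.navigate is a command defined by the specification. Whether Vibium sends exactly this message is not stated here; the bidi_command helper is hypothetical, and the context value is a placeholder for an ID the browser would supply.

```python
import json


def bidi_command(command_id, method, params):
    """Serialise a WebDriver BiDi command as a JSON text message."""
    return json.dumps({"id": command_id, "method": method, "params": params})


message = bidi_command(
    1,
    "browsingContext.navigate",
    {
        "context": "<context-id>",  # placeholder; real IDs come from the browser
        "url": "https://example.com",
        "wait": "complete",  # resolve once the page has finished loading
    },
)

print(message)
```

Because the message format is standardised, any BiDi-aware tool can produce or consume it, which is what makes mixing tools possible.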
Capture vs. interaction
Vibium splits into two clean halves:
- Interaction: go, click, fill, select, check, press, wait.
- Capture: text, screenshot, pdf, eval, record.
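This makes it easy to reason about side effects: capture commands are meant to read the page without changing it, while interaction commands drive it. The one caveat is eval, which runs whatever JavaScript you pass, so a script that mutates the DOM does change the page.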
MCP server mode
vibium mcp starts an MCP (Model Context Protocol) server that exposes the
same commands as MCP tools. Plug it into Codex, Claude Code, Cline, Cursor, or
another MCP-aware client and the browser becomes part of the agent’s tool
inventory.
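As a rough sketch, the snippet below prints a client configuration entry that launches vibium mcp. The mcpServers shape with command and args is a convention used by several MCP clients, not something defined by Vibium; the server name "vibium" is an arbitrary label, and the entry assumes the vibium binary is on your PATH. The file name, location, and even the format differ between clients, so check your client's documentation.

```python
import json

# Assumed convention: an "mcpServers" map from a label to a launch command.
config = {
    "mcpServers": {
        "vibium": {            # arbitrary label chosen for this example
            "command": "vibium",
            "args": ["mcp"],   # the subcommand that starts the MCP server
        }
    }
}

print(json.dumps(config, indent=2))
```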
See MCP Server Integration.