browser-pilot
Chrome DevTools Protocol (CDP) browser automation, web scraping, and crawling with React compatibility
Status: β Released (v1.10.0)
Chrome DevTools Protocol (CDP) based browser automation, web scraping, and crawling with daemon-based architecture and Smart Mode for reliable element targeting.
Overview
Browser Pilot provides intelligent browser automation through Chrome DevTools Protocol (CDP) with a daemon-based architecture that maintains persistent browser connections. Features Smart Mode with automatic Interaction Map generation for text-based element search, eliminating brittle CSS selectors.
Key Features:
- π€ Smart Mode - Text-based element search with automatic selector generation
- π Daemon Architecture - Persistent CDP connection for instant command execution
- πΈ Screenshot Capture - Viewport or full-page screenshots
- π Navigation & Interaction - URL navigation, click, fill, type, scroll
- π Data Extraction - Text content, console logs, cookies, accessibility tree
- π PDF Generation - Convert web pages to PDFs
- π Tab Management - List, switch, close tabs programmatically
- π€ Bot Detection Bypass - Maintains
navigator.webdriver = false - βοΈ React/Framework Compatibility - Properly triggers React synthetic events
- βοΈ Chain Mode - Execute multiple commands in a single workflow
Architecture
βββββββββββββββββββ ββββββββββββββββββββ CDP ββββββββββββββββββββββββ
β Claude Code β IPC Commands β Daemon Server βββββββββββββΊβ Chrome Browser β
β (CLI Client) βββββββββββββββββββββΊβ (Background) β WebSocket β (CDP Server) β
β β β β Port 9222+ β β
β - User Request β β - CDP Client β β - Headless/Headed β
β - Command Parseβ β - Map Generator β β - Tab Management β
β - Result Outputβ β - Auto-restart β β - DevTools API β
βββββββββββββββββββ ββββββββββββββββββββ ββββββββββββββββββββββββ
Key Components:
- Daemon Server: Background process maintaining persistent CDP connection
- CLI Client: Fast command execution via IPC communication with daemon
- Interaction Map: Auto-generated JSON map of interactive elements for Smart Mode
- Chrome Browser: Runs in headless or headed mode with CDP enabled
- Auto-Management: Daemon starts on first command, stops at session end (30-min timeout)
Using the Skill in Claude Code
This plugin provides a skill that Claude can use automatically. You don't need to run commands manually.
Prerequisites
Before Claude can use this skill:
-
Build the TypeScript code (one-time setup):
cd plugins/browser-pilot/skills/scripts npm install npm run build -
Install Google Chrome
- Chrome will auto-launch when Claude uses the skill
- Runs on port 9222 (or next available port)
How It Works
Once set up, Claude will automatically use this skill when you ask for browser automation tasks:
- "Take a screenshot of https://example.com"
- "Extract the title from https://news.ycombinator.com"
- "Click the login button on https://example.com"
- "Run browser tests"
- "Load browser-pilot skill to test login functionality"
Claude handles all the command execution through the SKILL.md interface - you just describe what you want!
Installation
Install Plugin
# Add marketplace
/plugin marketplace add https://github.com/Dev-GOM/claude-code-marketplace.git
# Install plugin
/plugin install browser-pilot@dev-gom-plugins
Or install directly:
/plugin add https://github.com/Dev-GOM/claude-code-marketplace/tree/main/plugins/browser-pilot
The plugin will auto-initialize when you start a Claude Code session (via SessionStart hook).
Prerequisites
- Google Chrome (latest version recommended)
- Node.js 18+ (nodejs.org)
- Git Bash (Windows) or Terminal (macOS/Linux)
Quick Start
Smart Mode (Recommended)
Text-based element search with automatic selector generation - more reliable than CSS selectors:
# Navigate to a page
node .browser-pilot/bp navigate -u "https://example.com/login"
# Click using text content (no quotes for single words)
node .browser-pilot/bp click --text Login --type button
# Fill form using text labels (quotes when text contains spaces)
node .browser-pilot/bp fill --text "Email Address" -v "user@example.com"
node .browser-pilot/bp fill --text Password -v "mypass123"
# Handle duplicate elements with indexing (click 2nd Delete button)
node .browser-pilot/bp click --text Delete --index 2
# Take screenshot
node .browser-pilot/bp screenshot -o "login-page.png"
Chain Mode (Multiple Commands)
Execute multiple commands in a single workflow:
# Login workflow
node .browser-pilot/bp chain \
navigate -u "https://example.com/login" \
fill --text Email -v "user@example.com" \
fill --text Password -v "mypass123" \
click --text Login \
screenshot -o "logged-in.png"
# Scraping workflow
node .browser-pilot/bp chain \
navigate -u "https://example.com" \
wait -s ".content-loaded" \
extract -s ".product-title" \
screenshot -o "products.png"
Direct Mode (CSS Selectors)
For unique IDs or when Smart Mode is unavailable:
# Using CSS selectors
node .browser-pilot/bp click -s "#login-button"
node .browser-pilot/bp fill -s "input[name='email']" -v "user@example.com"
node .browser-pilot/bp extract -s "h1.page-title"
Note: Files are saved to .browser-pilot/ in your project root. Daemon auto-starts on first command and stops at session end.
Available Commands
All commands use the wrapper script: node .browser-pilot/bp <command> [options]
Use --help for detailed options: node .browser-pilot/bp <command> --help
Navigation Commands
navigate -u <url> # Navigate to URL
back # Go back in history
forward # Go forward in history
reload # Reload current page
Interaction Commands (Smart Mode)
# Text-based search (recommended)
click --text <text> [options] # Click element by text
--type <type> # Filter by element type (button, link, input, etc.)
--index <n> # Select n-th match (for duplicates)
--viewport-only # Only search visible elements
fill --text <text> -v <value> # Fill input by label text
--type <type> # Filter by input type
--index <n> # Select n-th match
# Direct selector (fallback)
click -s <selector> # Click by CSS selector
fill -s <selector> -v <value> # Fill by CSS selector
Capture Commands
screenshot -o <filename> # Capture screenshot
--full-page # Capture entire page (default: viewport)
pdf -o <filename> # Generate PDF
--landscape # Use landscape orientation
Data Extraction
extract -s <selector> # Extract text from elements
--all # Extract all matches (default: first)
content # Get full page text content
console # Get console messages
cookies # Get cookies
Chain Mode
chain <cmd1> <cmd2> ... # Execute multiple commands
--timeout <ms> # Map wait timeout (default: 10000)
--delay <ms> # Fixed delay between commands
Query Commands
query --text <text> # Find elements in Interaction Map
query --list-types # List all element types
map-status # Check map status
regen-map # Force regenerate map
Other Commands
wait -s <selector> -t <ms> # Wait for element
scroll -s <selector> # Scroll to element
eval -e <expression> # Execute JavaScript
tabs # List all tabs
Daemon Management
daemon-status # Check daemon status
daemon-stop # Stop daemon manually
For complete command reference, see references/commands-reference.md
Configuration
Shared Config File
Location: {plugin-folder}/skills/browser-pilot-config.json
The plugin uses a shared configuration system that manages multiple projects:
{
"projects": {
"my-project": {
"rootPath": "/path/to/my-project",
"port": 9222,
"outputDir": ".browser-pilot",
"lastUsed": "2025-11-04T12:00:00.000Z",
"autoCleanup": false
},
"another-project": {
"rootPath": "/path/to/another-project",
"port": 9223,
"outputDir": ".browser-pilot",
"lastUsed": "2025-11-04T11:00:00.000Z",
"autoCleanup": false
}
}
}
Features:
- Automatic project registration via SessionStart hook
- Unique port allocation per project (9222-9322)
- Project identification by folder name
- Optional cleanup via SessionEnd hook (when
autoCleanup: true)
Output Directory
All screenshots and PDFs are automatically saved to:
.browser-pilot/(in project root)- Auto-created at session start with
.gitignoreto exclude generated files
Example Workflows
Login Workflow (Smart Mode + Chain)
# Complete login workflow using text-based search
node .browser-pilot/bp chain \
navigate -u "https://example.com/login" \
fill --text Email -v "user@example.com" \
fill --text Password -v "mypass123" \
click --text "Sign In" --type button \
screenshot -o "logged-in.png"
Screenshot Capture
# Full-page screenshot
node .browser-pilot/bp chain \
navigate -u "https://github.com" \
screenshot -o "github-homepage.png" --full-page
Form Automation (Smart Mode)
# Contact form submission using text labels
node .browser-pilot/bp chain \
navigate -u "https://example.com/contact" \
fill --text "Your Name" -v "John Doe" \
fill --text "Email Address" -v "john@example.com" \
fill --text Message -v "Hello from Browser Pilot!" \
click --text Submit --type button \
screenshot -o "contact-submitted.png"
Web Scraping
# Extract multiple elements
node .browser-pilot/bp chain \
navigate -u "https://news.ycombinator.com" \
extract -s "a.storylink" --all
PDF Generation
# Generate landscape PDF
node .browser-pilot/bp chain \
navigate -u "https://docs.example.com" \
pdf -o "api-docs.pdf" --landscape
Query Interaction Map
# Find specific elements
node .browser-pilot/bp query --text "Login"
node .browser-pilot/bp query --list-types
# Check map status
node .browser-pilot/bp map-status
Bot Detection Avoidance
Browser Pilot maintains navigator.webdriver = false and properly triggers React synthetic events, bypassing most anti-bot systems.
Chain Mode automatically adds 300-800ms random delays between commands to mimic human behavior.
Test Bot Detection:
node .browser-pilot/bp chain \
navigate -u "https://bot.sannysoft.com" \
screenshot -o "bot-test.png"
Expected: All checks PASS (green).
Best Practices
- π Use Smart Mode by default - Text-based search (
--text Login) is more stable than CSS selectors - Use Chain Mode for workflows - Automatic delays and streamlined execution
- Daemon auto-manages - No manual start/stop needed
- Prefer text-based search - More resilient to UI changes than CSS selectors
- Handle duplicates with indexing -
--index 2for the 2nd matching element - Filter by type for precision -
--type buttonnarrows results - Verify visibility -
--viewport-onlyensures element is on screen - Check console on errors -
node .browser-pilot/bp consolefor debugging - Use
--helpfor guidance -node .browser-pilot/bp <command> --help
Troubleshooting
Build Errors
cd plugins/browser-pilot/skills/scripts
npm install && npm run build
Chrome Not Found
Browser Pilot automatically detects Chrome installation. If it fails:
Windows:
- Default:
C:\Program Files\Google\Chrome\Application\chrome.exe - User:
%LOCALAPPDATA%\Google\Chrome\Application\chrome.exe
macOS:
/Applications/Google Chrome.app/Contents/MacOS/Google Chrome
Linux:
/usr/bin/google-chrome/usr/bin/chromium
Port Already in Use
Browser Pilot auto-increments debug port (9222 β 9223 β ...) if port is busy.
Connection Timeout
Default timeout is 10 seconds (20 attempts Γ 500ms). To increase, modify src/cdp/browser.ts:
const maxAttempts = 40; // Change from 20 to 40 for 20 seconds timeout
Element Not Found
- Verify selector with browser DevTools (
F12β Console βdocument.querySelector("...")) - Wait for page load with
sleep 1before interacting - Use more specific selectors (
#id>.class)
Technical Details
Chrome DevTools Protocol
Browser Pilot uses CDP for all browser operations:
Request:
{
"id": 1,
"method": "Page.navigate",
"params": {
"url": "https://example.com"
}
}
Response:
{
"id": 1,
"result": {
"frameId": "frame-id-here"
}
}
Ethical Usage
Browser Pilot is for authorized automation only. Do NOT use for:
- Unauthorized access or credential theft
- DDoS attacks or server overload
- Copyright infringement (scraping copyrighted content)
- Bypassing paywalls or access controls
- Automated account creation (spam, fake accounts)
- Credential stuffing
Always respect robots.txt and website terms of service.
Performance
Command Chaining:
- Browser launches once and stays open (fast)
- Each command reuses the same browser instance
- No page refresh when URL is omitted (v0.1.6+) - preserves page state
- ~100-200ms overhead per command (npm startup)
- CDP communication is near-instant (milliseconds)
- Add
sleepdelays to avoid bot detection (0.5-2 seconds recommended) - Total workflow time = initial page load + sleep delays + command overhead
License
Apache License 2.0 - see LICENSE and NOTICE for details
Contributing
This plugin is part of the Dev GOM Plugins marketplace. Contributions are welcome!
Documentation
- π Quick Reference: SKILL.md - Concise guide for Claude Code
- π Detailed Guides:
- Command Reference - All 52+ commands with examples
- Selector Guide - Smart Mode strategies and best practices
- Interaction Map - Map system details and query API
Support
- π Issues: GitHub Issues
- π¬ Discussions: GitHub Discussions
- π§ Development Guide: CLAUDE.md - Plugin development guidelines
Note: Browser Pilot v1.4.0 is production-ready and actively maintained. Report bugs or request features via GitHub Issues.
πQuick Installation
/plugin install browser-pilot@dev-gom-pluginsRun this command in Claude Code to install the plugin.