🎨

Dev GOM Plugins

←Back to Plugins
⚑

browser-pilot

Chrome DevTools Protocol (CDP) browser automation, web scraping, and crawling with React compatibility

⚑productivity
v1.10.0
by Dev GOM
browserautomationscrapingcdpchromedevtoolsscreenshotpdfskills
View on GitHub β†’

Status: βœ… Released (v1.10.0)

Language: English | ν•œκ΅­μ–΄

Chrome DevTools Protocol (CDP) based browser automation, web scraping, and crawling with daemon-based architecture and Smart Mode for reliable element targeting.

Overview

Browser Pilot provides intelligent browser automation through Chrome DevTools Protocol (CDP) with a daemon-based architecture that maintains persistent browser connections. Features Smart Mode with automatic Interaction Map generation for text-based element search, eliminating brittle CSS selectors.

Key Features:

  • πŸ€– Smart Mode - Text-based element search with automatic selector generation
  • πŸ”„ Daemon Architecture - Persistent CDP connection for instant command execution
  • πŸ“Έ Screenshot Capture - Viewport or full-page screenshots
  • 🌐 Navigation & Interaction - URL navigation, click, fill, type, scroll
  • πŸ“„ Data Extraction - Text content, console logs, cookies, accessibility tree
  • πŸ“‘ PDF Generation - Convert web pages to PDFs
  • πŸ”— Tab Management - List, switch, close tabs programmatically
  • πŸ€– Bot Detection Bypass - Maintains navigator.webdriver = false
  • βš›οΈ React/Framework Compatibility - Properly triggers React synthetic events
  • ⛓️ Chain Mode - Execute multiple commands in a single workflow

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    CDP     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Claude Code    β”‚  IPC Commands      β”‚  Daemon Server   │◄──────────►│   Chrome Browser     β”‚
β”‚  (CLI Client)   │───────────────────►│  (Background)    β”‚ WebSocket  β”‚   (CDP Server)       β”‚
β”‚                 β”‚                    β”‚                  β”‚ Port 9222+ β”‚                      β”‚
β”‚  - User Request β”‚                    β”‚  - CDP Client    β”‚            β”‚  - Headless/Headed   β”‚
β”‚  - Command Parseβ”‚                    β”‚  - Map Generator β”‚            β”‚  - Tab Management    β”‚
β”‚  - Result Outputβ”‚                    β”‚  - Auto-restart  β”‚            β”‚  - DevTools API      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Components:

  • Daemon Server: Background process maintaining persistent CDP connection
  • CLI Client: Fast command execution via IPC communication with daemon
  • Interaction Map: Auto-generated JSON map of interactive elements for Smart Mode
  • Chrome Browser: Runs in headless or headed mode with CDP enabled
  • Auto-Management: Daemon starts on first command, stops at session end (30-min timeout)

Using the Skill in Claude Code

This plugin provides a skill that Claude can use automatically. You don't need to run commands manually.

Prerequisites

Before Claude can use this skill:

  1. Build the TypeScript code (one-time setup):

    cd plugins/browser-pilot/skills/scripts
    npm install
    npm run build
    
  2. Install Google Chrome

    • Chrome will auto-launch when Claude uses the skill
    • Runs on port 9222 (or next available port)

How It Works

Once set up, Claude will automatically use this skill when you ask for browser automation tasks:

Claude handles all the command execution through the SKILL.md interface - you just describe what you want!

Installation

Install Plugin

# Add marketplace
/plugin marketplace add https://github.com/Dev-GOM/claude-code-marketplace.git

# Install plugin
/plugin install browser-pilot@dev-gom-plugins

Or install directly:

/plugin add https://github.com/Dev-GOM/claude-code-marketplace/tree/main/plugins/browser-pilot

The plugin will auto-initialize when you start a Claude Code session (via SessionStart hook).

Prerequisites

  1. Google Chrome (latest version recommended)
  2. Node.js 18+ (nodejs.org)
  3. Git Bash (Windows) or Terminal (macOS/Linux)

Quick Start

Smart Mode (Recommended)

Text-based element search with automatic selector generation - more reliable than CSS selectors:

# Navigate to a page
node .browser-pilot/bp navigate -u "https://example.com/login"

# Click using text content (no quotes for single words)
node .browser-pilot/bp click --text Login --type button

# Fill form using text labels (quotes when text contains spaces)
node .browser-pilot/bp fill --text "Email Address" -v "user@example.com"
node .browser-pilot/bp fill --text Password -v "mypass123"

# Handle duplicate elements with indexing (click 2nd Delete button)
node .browser-pilot/bp click --text Delete --index 2

# Take screenshot
node .browser-pilot/bp screenshot -o "login-page.png"

Chain Mode (Multiple Commands)

Execute multiple commands in a single workflow:

# Login workflow
node .browser-pilot/bp chain \
  navigate -u "https://example.com/login" \
  fill --text Email -v "user@example.com" \
  fill --text Password -v "mypass123" \
  click --text Login \
  screenshot -o "logged-in.png"

# Scraping workflow
node .browser-pilot/bp chain \
  navigate -u "https://example.com" \
  wait -s ".content-loaded" \
  extract -s ".product-title" \
  screenshot -o "products.png"

Direct Mode (CSS Selectors)

For unique IDs or when Smart Mode is unavailable:

# Using CSS selectors
node .browser-pilot/bp click -s "#login-button"
node .browser-pilot/bp fill -s "input[name='email']" -v "user@example.com"
node .browser-pilot/bp extract -s "h1.page-title"

Note: Files are saved to .browser-pilot/ in your project root. Daemon auto-starts on first command and stops at session end.

Available Commands

All commands use the wrapper script: node .browser-pilot/bp <command> [options]

Use --help for detailed options: node .browser-pilot/bp <command> --help

Navigation Commands

navigate -u <url>              # Navigate to URL
back                           # Go back in history
forward                        # Go forward in history
reload                         # Reload current page

Interaction Commands (Smart Mode)

# Text-based search (recommended)
click --text <text> [options]  # Click element by text
  --type <type>                # Filter by element type (button, link, input, etc.)
  --index <n>                  # Select n-th match (for duplicates)
  --viewport-only              # Only search visible elements

fill --text <text> -v <value>  # Fill input by label text
  --type <type>                # Filter by input type
  --index <n>                  # Select n-th match

# Direct selector (fallback)
click -s <selector>            # Click by CSS selector
fill -s <selector> -v <value>  # Fill by CSS selector

Capture Commands

screenshot -o <filename>       # Capture screenshot
  --full-page                  # Capture entire page (default: viewport)

pdf -o <filename>              # Generate PDF
  --landscape                  # Use landscape orientation

Data Extraction

extract -s <selector>          # Extract text from elements
  --all                        # Extract all matches (default: first)

content                        # Get full page text content
console                        # Get console messages
cookies                        # Get cookies

Chain Mode

chain <cmd1> <cmd2> ...        # Execute multiple commands
  --timeout <ms>               # Map wait timeout (default: 10000)
  --delay <ms>                 # Fixed delay between commands

Query Commands

query --text <text>            # Find elements in Interaction Map
query --list-types             # List all element types
map-status                     # Check map status
regen-map                      # Force regenerate map

Other Commands

wait -s <selector> -t <ms>     # Wait for element
scroll -s <selector>           # Scroll to element
eval -e <expression>           # Execute JavaScript
tabs                           # List all tabs

Daemon Management

daemon-status                  # Check daemon status
daemon-stop                    # Stop daemon manually

For complete command reference, see references/commands-reference.md

Configuration

Shared Config File

Location: {plugin-folder}/skills/browser-pilot-config.json

The plugin uses a shared configuration system that manages multiple projects:

{
  "projects": {
    "my-project": {
      "rootPath": "/path/to/my-project",
      "port": 9222,
      "outputDir": ".browser-pilot",
      "lastUsed": "2025-11-04T12:00:00.000Z",
      "autoCleanup": false
    },
    "another-project": {
      "rootPath": "/path/to/another-project",
      "port": 9223,
      "outputDir": ".browser-pilot",
      "lastUsed": "2025-11-04T11:00:00.000Z",
      "autoCleanup": false
    }
  }
}

Features:

  • Automatic project registration via SessionStart hook
  • Unique port allocation per project (9222-9322)
  • Project identification by folder name
  • Optional cleanup via SessionEnd hook (when autoCleanup: true)

Output Directory

All screenshots and PDFs are automatically saved to:

  • .browser-pilot/ (in project root)
  • Auto-created at session start with .gitignore to exclude generated files

Example Workflows

Login Workflow (Smart Mode + Chain)

# Complete login workflow using text-based search
node .browser-pilot/bp chain \
  navigate -u "https://example.com/login" \
  fill --text Email -v "user@example.com" \
  fill --text Password -v "mypass123" \
  click --text "Sign In" --type button \
  screenshot -o "logged-in.png"

Screenshot Capture

# Full-page screenshot
node .browser-pilot/bp chain \
  navigate -u "https://github.com" \
  screenshot -o "github-homepage.png" --full-page

Form Automation (Smart Mode)

# Contact form submission using text labels
node .browser-pilot/bp chain \
  navigate -u "https://example.com/contact" \
  fill --text "Your Name" -v "John Doe" \
  fill --text "Email Address" -v "john@example.com" \
  fill --text Message -v "Hello from Browser Pilot!" \
  click --text Submit --type button \
  screenshot -o "contact-submitted.png"

Web Scraping

# Extract multiple elements
node .browser-pilot/bp chain \
  navigate -u "https://news.ycombinator.com" \
  extract -s "a.storylink" --all

PDF Generation

# Generate landscape PDF
node .browser-pilot/bp chain \
  navigate -u "https://docs.example.com" \
  pdf -o "api-docs.pdf" --landscape

Query Interaction Map

# Find specific elements
node .browser-pilot/bp query --text "Login"
node .browser-pilot/bp query --list-types

# Check map status
node .browser-pilot/bp map-status

Bot Detection Avoidance

Browser Pilot maintains navigator.webdriver = false and properly triggers React synthetic events, bypassing most anti-bot systems.

Chain Mode automatically adds 300-800ms random delays between commands to mimic human behavior.

Test Bot Detection:

node .browser-pilot/bp chain \
  navigate -u "https://bot.sannysoft.com" \
  screenshot -o "bot-test.png"

Expected: All checks PASS (green).

Best Practices

  1. 🌟 Use Smart Mode by default - Text-based search (--text Login) is more stable than CSS selectors
  2. Use Chain Mode for workflows - Automatic delays and streamlined execution
  3. Daemon auto-manages - No manual start/stop needed
  4. Prefer text-based search - More resilient to UI changes than CSS selectors
  5. Handle duplicates with indexing - --index 2 for the 2nd matching element
  6. Filter by type for precision - --type button narrows results
  7. Verify visibility - --viewport-only ensures element is on screen
  8. Check console on errors - node .browser-pilot/bp console for debugging
  9. Use --help for guidance - node .browser-pilot/bp <command> --help

Troubleshooting

Build Errors

cd plugins/browser-pilot/skills/scripts
npm install && npm run build

Chrome Not Found

Browser Pilot automatically detects Chrome installation. If it fails:

Windows:

  • Default: C:\Program Files\Google\Chrome\Application\chrome.exe
  • User: %LOCALAPPDATA%\Google\Chrome\Application\chrome.exe

macOS:

  • /Applications/Google Chrome.app/Contents/MacOS/Google Chrome

Linux:

  • /usr/bin/google-chrome
  • /usr/bin/chromium

Port Already in Use

Browser Pilot auto-increments debug port (9222 β†’ 9223 β†’ ...) if port is busy.

Connection Timeout

Default timeout is 10 seconds (20 attempts Γ— 500ms). To increase, modify src/cdp/browser.ts:

const maxAttempts = 40; // Change from 20 to 40 for 20 seconds timeout

Element Not Found

  • Verify selector with browser DevTools (F12 β†’ Console β†’ document.querySelector("..."))
  • Wait for page load with sleep 1 before interacting
  • Use more specific selectors (#id > .class)

Technical Details

Chrome DevTools Protocol

Browser Pilot uses CDP for all browser operations:

Request:

{
  "id": 1,
  "method": "Page.navigate",
  "params": {
    "url": "https://example.com"
  }
}

Response:

{
  "id": 1,
  "result": {
    "frameId": "frame-id-here"
  }
}

Ethical Usage

Browser Pilot is for authorized automation only. Do NOT use for:

  • Unauthorized access or credential theft
  • DDoS attacks or server overload
  • Copyright infringement (scraping copyrighted content)
  • Bypassing paywalls or access controls
  • Automated account creation (spam, fake accounts)
  • Credential stuffing

Always respect robots.txt and website terms of service.

Performance

Command Chaining:

  • Browser launches once and stays open (fast)
  • Each command reuses the same browser instance
  • No page refresh when URL is omitted (v0.1.6+) - preserves page state
  • ~100-200ms overhead per command (npm startup)
  • CDP communication is near-instant (milliseconds)
  • Add sleep delays to avoid bot detection (0.5-2 seconds recommended)
  • Total workflow time = initial page load + sleep delays + command overhead

License

Apache License 2.0 - see LICENSE and NOTICE for details

Contributing

This plugin is part of the Dev GOM Plugins marketplace. Contributions are welcome!

Documentation

Support


Note: Browser Pilot v1.4.0 is production-ready and actively maintained. Report bugs or request features via GitHub Issues.

πŸš€Quick Installation

/plugin install browser-pilot@dev-gom-plugins

Run this command in Claude Code to install the plugin.