diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 00000000..f63070cf --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,255 @@ +# Contributing to Agent Browser + +Thank you for your interest in contributing to Agent Browser! This document provides guidelines and instructions for contributing. + +## Table of Contents + +- [Code of Conduct](#code-of-conduct) +- [Getting Started](#getting-started) +- [Development Setup](#development-setup) +- [Making Changes](#making-changes) +- [Submitting Changes](#submitting-changes) +- [Coding Standards](#coding-standards) +- [Testing](#testing) +- [Documentation](#documentation) + +## Code of Conduct + +This project adheres to a code of conduct. By participating, you are expected to: + +- Be respectful and inclusive +- Accept constructive criticism gracefully +- Focus on what's best for the community +- Show empathy towards others + +## Getting Started + +1. **Fork the repository** on GitHub +2. **Clone your fork** locally: + ```bash + git clone https://github.com/YOUR_USERNAME/agent-browser.git + cd agent-browser + ``` +3. **Add upstream remote**: + ```bash + git remote add upstream https://github.com/dextonai/agent-browser.git + ``` +4. **Create a branch** for your changes: + ```bash + git checkout -b feature/your-feature-name + ``` + +## Development Setup + +### Prerequisites + +- Node.js 18+ +- npm or yarn +- Git + +### Installation + +```bash +# Install dependencies +npm install + +# Build the project +npm run build + +# Run in development mode +npm run dev + +# Run tests +npm test +``` + +### Environment Variables + +Create a `.env` file: + +```env +DEBUG=true +HEADLESS=false +DEFAULT_TIMEOUT=30000 +``` + +## Making Changes + +### Types of Contributions + +- ๐Ÿ› **Bug fixes** - Fix existing issues +- โœจ **Features** - Add new functionality +- ๐Ÿ“š **Documentation** - Improve docs +- ๐Ÿงช **Tests** - Add or improve tests +- ๐Ÿ”ง **Refactoring** - Code improvements + +### Branch Naming + +- `feature/description` - New features +- `bugfix/description` - Bug fixes +- `docs/description` - Documentation +- `refactor/description` - Code refactoring + +### Commit Messages + +Follow conventional commits: + +``` +(): + +[optional body] + +[optional footer] +``` + +Types: +- `feat`: New feature +- `fix`: Bug fix +- `docs`: Documentation +- `style`: Formatting +- `refactor`: Code restructuring +- `test`: Tests +- `chore`: Maintenance + +Example: +``` +feat(browser): add screenshot on failure option + +Adds automatic screenshot capture when navigation fails. +Useful for debugging CI failures. + +Closes #123 +``` + +## Submitting Changes + +### Pull Request Process + +1. **Update documentation** if needed +2. **Add tests** for new features +3. **Ensure tests pass**: + ```bash + npm test + npm run lint + ``` +4. **Update CHANGELOG.md** with your changes +5. **Submit PR** with clear description + +### PR Template + +```markdown +## Description +Brief description of changes + +## Type of Change +- [ ] Bug fix +- [ ] New feature +- [ ] Documentation +- [ ] Refactoring + +## Testing +- [ ] Tests pass locally +- [ ] Added tests for new features +- [ ] Manual testing completed + +## Checklist +- [ ] Code follows style guidelines +- [ ] Self-review completed +- [ ] Documentation updated +- [ ] No breaking changes (or documented) +``` + +### Review Process + +- Maintainers will review within 48 hours +- Address review comments +- Squash commits if requested +- PR will be merged by maintainers + +## Coding Standards + +### TypeScript/JavaScript + +- Use TypeScript for new code +- Follow ESLint configuration +- Use meaningful variable names +- Add JSDoc comments for public APIs + +### Code Style + +```typescript +// Good +async function navigateToPage(url: string): Promise { + const page = await browser.newPage(); + await page.goto(url, { waitUntil: 'networkidle' }); + return page; +} + +// Avoid +async function nav(u) { + let p = await b.newPage(); + await p.goto(u); + return p; +} +``` + +### File Organization + +``` +src/ +โ”œโ”€โ”€ core/ # Core functionality +โ”œโ”€โ”€ utils/ # Utility functions +โ”œโ”€โ”€ types/ # TypeScript types +โ””โ”€โ”€ __tests__/ # Test files +``` + +## Testing + +### Running Tests + +```bash +# All tests +npm test + +# Watch mode +npm run test:watch + +# Coverage +npm run test:coverage + +# Specific file +npm test -- src/core/browser.test.ts +``` + +### Writing Tests + +```typescript +describe('Browser', () => { + it('should navigate to URL', async () => { + const browser = new Browser(); + const page = await browser.navigate('https://example.com'); + expect(page.url()).toBe('https://example.com/'); + }); +}); +``` + +## Documentation + +- Update README.md for user-facing changes +- Update API docs for public interface changes +- Add JSDoc comments +- Include code examples + +## Questions? + +- Open an issue for discussion +- Join our Discord: [link] +- Email: maintainers@agent-browser.dev + +## License + +By contributing, you agree that your contributions will be licensed under the MIT License. + +--- + +Thank you for contributing! ๐ŸŽ‰ diff --git a/README.md b/README.md index 8ede376c..40122895 100644 --- a/README.md +++ b/README.md @@ -1,1322 +1,50 @@ -# agent-browser +# Agent Browser -Browser automation CLI for AI agents. Fast native Rust CLI. +[![Build Status](https://github.com/dextonai/agent-browser/workflows/CI/badge.svg)](https://github.com/dextonai/agent-browser/actions) +[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) +[![TypeScript](https://img.shields.io/badge/TypeScript-5.0-blue.svg)](https://www.typescriptlang.org/) +[![npm version](https://badge.fury.io/js/agent-browser.svg)](https://www.npmjs.com/package/agent-browser) -## Installation - -### Global Installation (recommended) +A fast, Rust-based headless browser automation tool with Node.js bindings for AI agents. -Installs the native Rust binary: - -```bash -npm install -g agent-browser -agent-browser install # Download Chrome from Chrome for Testing (first time only) -``` +## Features -### Project Installation (local dependency) +- โšก **High Performance** - Rust core for maximum speed +- ๐Ÿ”ง **Easy Integration** - Simple Node.js API +- ๐Ÿ“ธ **Screenshot Support** - Capture page renders +- ๐Ÿ” **Element Detection** - Smart element selection +- ๐Ÿ“ **Form Automation** - Fill and submit forms +- ๐ŸŒ **Multi-page** - Handle multiple tabs -For projects that want to pin the version in `package.json`: +## Installation ```bash npm install agent-browser -agent-browser install -``` - -Then use via `package.json` scripts or by invoking `agent-browser` directly. - -### Homebrew (macOS) - -```bash -brew install agent-browser -agent-browser install # Download Chrome from Chrome for Testing (first time only) -``` - -### Cargo (Rust) - -```bash -cargo install agent-browser -agent-browser install # Download Chrome from Chrome for Testing (first time only) ``` -### From Source - -```bash -git clone https://github.com/vercel-labs/agent-browser -cd agent-browser -pnpm install -pnpm build -pnpm build:native # Requires Rust (https://rustup.rs) -pnpm link --global # Makes agent-browser available globally -agent-browser install -``` - -### Linux Dependencies - -On Linux, install system dependencies: - -```bash -agent-browser install --with-deps -``` - -### Updating - -Upgrade to the latest version: - -```bash -agent-browser upgrade -``` - -Detects your installation method (npm, Homebrew, or Cargo) and runs the appropriate update command automatically. - -### Requirements - -- **Chrome** - Run `agent-browser install` to download Chrome from [Chrome for Testing](https://developer.chrome.com/blog/chrome-for-testing/) (Google's official automation channel). Existing Chrome, Brave, Playwright, and Puppeteer installations are detected automatically. No Playwright or Node.js required for the daemon. -- **Rust** - Only needed when building from source (see From Source above). - ## Quick Start -```bash -agent-browser open example.com -agent-browser snapshot # Get accessibility tree with refs -agent-browser click @e2 # Click by ref from snapshot -agent-browser fill @e3 "test@example.com" # Fill by ref -agent-browser get text @e1 # Get text by ref -agent-browser screenshot page.png -agent-browser close -``` - -### Traditional Selectors (also supported) - -```bash -agent-browser click "#submit" -agent-browser fill "#email" "test@example.com" -agent-browser find role button click --name "Submit" -``` - -## Commands - -### Core Commands - -```bash -agent-browser open # Navigate to URL (aliases: goto, navigate) -agent-browser click # Click element (--new-tab to open in new tab) -agent-browser dblclick # Double-click element -agent-browser focus # Focus element -agent-browser type # Type into element -agent-browser fill # Clear and fill -agent-browser press # Press key (Enter, Tab, Control+a) (alias: key) -agent-browser keyboard type # Type with real keystrokes (no selector, current focus) -agent-browser keyboard inserttext # Insert text without key events (no selector) -agent-browser keydown # Hold key down -agent-browser keyup # Release key -agent-browser hover # Hover element -agent-browser select # Select dropdown option -agent-browser check # Check checkbox -agent-browser uncheck # Uncheck checkbox -agent-browser scroll [px] # Scroll (up/down/left/right, --selector ) -agent-browser scrollintoview # Scroll element into view (alias: scrollinto) -agent-browser drag # Drag and drop -agent-browser upload # Upload files -agent-browser screenshot [path] # Take screenshot (--full for full page, saves to a temporary directory if no path) -agent-browser screenshot --annotate # Annotated screenshot with numbered element labels -agent-browser screenshot --screenshot-dir ./shots # Save to custom directory -agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80 -agent-browser pdf # Save as PDF -agent-browser snapshot # Accessibility tree with refs (best for AI) -agent-browser eval # Run JavaScript (-b for base64, --stdin for piped input) -agent-browser connect # Connect to browser via CDP -agent-browser stream enable [--port ] # Start runtime WebSocket streaming -agent-browser stream status # Show runtime streaming state and bound port -agent-browser stream disable # Stop runtime WebSocket streaming -agent-browser close # Close browser (aliases: quit, exit) -agent-browser close --all # Close all active sessions -``` - -### Get Info - -```bash -agent-browser get text # Get text content -agent-browser get html # Get innerHTML -agent-browser get value # Get input value -agent-browser get attr # Get attribute -agent-browser get title # Get page title -agent-browser get url # Get current URL -agent-browser get cdp-url # Get CDP WebSocket URL (for DevTools, debugging) -agent-browser get count # Count matching elements -agent-browser get box # Get bounding box -agent-browser get styles # Get computed styles -``` - -### Check State - -```bash -agent-browser is visible # Check if visible -agent-browser is enabled # Check if enabled -agent-browser is checked # Check if checked -``` - -### Find Elements (Semantic Locators) - -```bash -agent-browser find role [value] # By ARIA role -agent-browser find text # By text content -agent-browser find label