diff --git a/skills/cold-outbound/SKILL.md b/skills/cold-outbound/SKILL.md new file mode 100644 index 0000000..9e3779a --- /dev/null +++ b/skills/cold-outbound/SKILL.md @@ -0,0 +1,262 @@ +--- +name: cold-outbound +description: | + Cold outbound lead generation skill for SDR prospecting at scale. Researches a + company's product and ICP, discovers target companies using Browserbase Search API, + deeply researches each using a Plan→Research→Synthesize pattern, scores ICP fit, + and generates personalized outbound emails — all compiled into a scored CSV. + Supports depth modes (quick/deep/deeper) for balancing scale vs intelligence. + Use when the user wants to: (1) generate outbound leads, (2) build a prospecting + list, (3) find companies matching an ICP, (4) create personalized cold emails at + scale, (5) do SDR research. Triggers: "outbound", "lead gen", "prospecting list", + "ICP leads", "cold email", "find companies to sell to", "SDR", "build a lead list", + "outbound campaign". +license: MIT +compatibility: Requires bb CLI (@browserbasehq/cli) and BROWSERBASE_API_KEY env var +allowed-tools: Bash Agent +metadata: + author: browserbase + version: "0.3.0" +--- + +# Cold Outbound + +Generate enriched lead lists with personalized outbound emails. Uses Browserbase Search API for discovery, a deep research pattern for enrichment, and LLM-powered email personalization. + +**Required**: `BROWSERBASE_API_KEY` env var and `bb` CLI installed. + +**First-run setup**: On the first run you'll be prompted to approve `bb fetch`, `bb search`, `cat`, `mkdir`, `sed`, etc. Select **"Yes, and don't ask again for: bb fetch:\*"** (or equivalent) for each to auto-approve for the session. 
To permanently approve, add these to your `~/.claude/settings.json` under `permissions.allow`: +```json +"Bash(bb:*)", "Bash(bunx:*)", "Bash(bun:*)", "Bash(node:*)", +"Bash(cat:*)", "Bash(mkdir:*)", "Bash(sed:*)", "Bash(head:*)", "Bash(tr:*)", "Bash(rm:*)" +``` + +**Path rules**: Always use the full literal path in all Bash commands — NOT `~` or `$HOME` (both trigger "shell expansion syntax" approval prompts). Resolve the home directory once and use it everywhere (e.g., `/Users/jay/skills/skills/cold-outbound/...`). When constructing subagent prompts, replace `{SKILL_DIR}` with the full literal path. When writing files (like profiles), use the Write tool with the full expanded path. + +**CRITICAL — Tool restrictions (applies to main agent AND all subagents)**: +- All web searches: use `bb search`. NEVER use WebSearch. +- All page fetches: use `bb fetch --allow-redirects`. NEVER use WebFetch. `bb fetch` returns raw HTML — to extract text, pipe through: `sed 's/<script[^>]*>.*<\/script>//g; s/<style[^>]*>.*<\/style>//g; s/<[^>]*>//g' | tr -s ' \n'`. Has a 1MB response limit — for large or JS-heavy pages, use `bb browse` instead. +- All research output: subagents write **one markdown file per company** to `/tmp/cold_research/{company-slug}.md` using bash heredoc. NEVER use the Write tool or `python3 -c` — both trigger security prompts. See `references/example-research.md` for the file format. +- CSV compilation: use the bundled script `node {SKILL_DIR}/scripts/compile_csv.mjs /tmp/cold_research`. See `references/workflow.md` for details. +- URL deduplication: use `node {SKILL_DIR}/scripts/list_urls.mjs /tmp` after discovery. +- **Subagents must use ONLY the Bash tool. No other tools allowed.** This is non-negotiable — WebFetch, WebSearch, Write, Read, Glob, and Grep all trigger permission prompts that interrupt the user. +- **Main agent NEVER reads raw discovery JSON batch files.** Use `list_urls.mjs` for URL deduplication.
For enrichment data, read the per-company markdown files directly. + +**CRITICAL — Minimize permission prompts**: +- Subagents MUST batch ALL file writes into a SINGLE Bash call using chained heredocs (`cat << 'COMPANY_MD' > file1.md ... COMPANY_MD && cat << 'COMPANY_MD' > file2.md ...`). One Bash call = one permission prompt. Multiple Bash calls = multiple prompts that frustrate the user. +- Similarly, batch ALL searches and ALL fetches into single Bash calls where possible using `&&` chaining. +- The main agent should also batch operations: run all contact searches in one call, append all contact sections in one call, append all email drafts in one call. + +## Pipeline Overview + +Follow these 8 steps in order. Do not skip steps or reorder. + +1. **Company Research** — Discover the user's product, ICP, and pitch angle +2. **Depth Mode Selection** — Choose research depth based on lead count +3. **Micro-Vertical Generation** — Expand ICP into diverse search queries +4. **Output Schema Design** — Define CSV columns with user input +5. **Batch Discovery** — Subagents search for target companies in parallel +6. **Deep Research & Enrichment** — Subagents research each company, score ICP fit (NO emails yet) +7. **Contact Discovery** — Find decision makers at high-fit companies +8. **Email Generation + CSV Compilation** — Write personalized emails with full context, compile final CSV + +--- + +## Step 1: Deep Company Research + Vertical Scoping + +This is the most important step. The quality of everything downstream depends on deeply understanding the user's company AND the specific vertical they want to target. + +**If the user specifies a target vertical** (e.g., "sell to UI testing companies"), run a quick research on that vertical too: +- Search: `bb search "{vertical} companies landscape types"` +- Use the `AskUserQuestion` tool to ask clarifying questions as checkboxes — NOT as a text wall. 
Combine all questions into a single AskUserQuestion call with multiple questions. Example: + - Question 1 (multiSelect: true): "Which segments?" with options like "E2E testing platforms", "Visual regression tools", "Cross-browser testing", "AI-powered testing" + - Question 2: "Company stage?" with options like "Startups", "Mid-market", "Enterprise", "All" + - Question 3: "How many leads / depth?" with options like "Quick (100+)", "Deep (25-50)", "Deeper (<25)" +- This is the ONLY user interaction after profile confirmation. Fold answers into ICP and sub-verticals, then execute Steps 3-8 silently. +- Do NOT save vertical targeting answers to the profile. These are per-run decisions held in memory only. The profile only stores company facts (product, customers, competitors, use cases). + +If the user doesn't specify a vertical, derive sub-verticals from the company research. Still use AskUserQuestion for depth mode selection. + +**Profiles directory**: `{SKILL_DIR}/profiles/` +A blank template (`example.json`) ships with the skill. Completed profiles persist across sessions. + +1. Ask the user for their company name or URL + +2. **Check for an existing profile**: + - List files in `{SKILL_DIR}/profiles/` (ignore `example.json`) + - If a matching profile exists → load it, present to user: "I have your profile from {researched_at}. Still accurate?" If yes → skip to Step 2. If changes needed → edit fields and re-save. + - If no profile exists → proceed with deep research below. After confirmation, save to `profiles/{company-slug}.json` (copy structure from `example.json`). + +3. **Run a full deep research on the user's company** using the Plan→Research→Synthesize pattern. + See `references/research-patterns.md` for sub-question templates, research loop rules, and synthesis instructions.
+ + **Key research steps:** + - Search: `bb search "{company name}" --num-results 10` + - Fetch homepage: `bb fetch --allow-redirects "{company website}"` + - **Discover site pages via sitemap** (do NOT hardcode paths like `/about` or `/customers`): + 1. `bb fetch --allow-redirects "{company website}/sitemap.xml"` — primary source, has ALL pages + 2. Scan for URLs with keywords: `customer`, `case-stud`, `pricing`, `about`, `use-case`, `industry`, `solution` + 3. Optionally also fetch `bb fetch --allow-redirects "{company website}/llms.txt"` for page descriptions + 4. Pick 3-5 most relevant URLs and fetch those + - Search for external context and competitors + - Accumulate findings with confidence levels + + **Synthesize into a profile** (about the COMPANY, not a specific vertical): + Company, Product, Existing Customers, Competitors, Use Cases. + Do NOT include ICP, pitch angle, or sub-verticals — those are per-run targeting decisions. + +4. Present the profile to the user for confirmation. Do not proceed until confirmed. + +5. 
**Save the confirmed profile** to `{SKILL_DIR}/profiles/{company-slug}.json`: + ```json + { + "company": "Browserbase", + "website": "https://www.browserbase.com", + "product": "Cloud browser infrastructure for AI agents...", + "existing_customers": ["Firecrawl", "Ramp", "..."], + "competitors": ["Browserless", "Apify", "..."], + "use_cases": ["AI agent browser access", "web scraping", "..."], + "researched_at": "2026-03-17" + } + ``` + +## Step 2: Depth Mode Selection + +Ask the user how many leads they want and recommend a depth mode: + +| Mode | Research per company | Best for | Default when | +|------|---------------------|----------|--------------| +| `quick` | Homepage + 1-2 searches | 100+ leads, broad discovery | User asks for 100+ leads | +| `deep` | 2-3 sub-questions, 5-8 tool calls | 25-50 leads, quality enrichment | User asks for 25-100 leads | +| `deeper` | 4-5 sub-questions, 10-15 tool calls | 10-25 leads, full intelligence | User asks for <25 leads | + +Combine Step 1 profile confirmation and depth mode into a single prompt. Once the user responds, execute Steps 3-8 silently — no more questions, no status narration. + +## Step 3: Micro-Vertical Generation + +**Formula**: `ceil(requested_leads / 35)` micro-verticals needed. Over-discover by ~2-3x because filtering typically drops 50-70%. + +Generate search queries with these patterns: +- Industry + company stage + geography ("fintech startups series A Bay Area") +- Technology stack + use case ("companies using Selenium for web scraping") +- Competitor adjacency ("alternatives to {known company in ICP}") +- Buyer persona + pain point ("engineering teams struggling with browser automation") + +Each query: 4-8 descriptive keywords, non-overlapping. Proceed immediately — do not ask the user to approve queries. 
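The sizing arithmetic above can be sketched as follows. This is illustrative only — no such script ships with the skill, and the agent does this math inline:

```javascript
// Illustrative sketch of the Step 3 sizing math — not a bundled script.
function planDiscovery(requestedLeads, overDiscoverFactor = 2.5) {
  // ceil(requested_leads / 35) micro-verticals, per the formula above.
  const microVerticals = Math.ceil(requestedLeads / 35);
  // Filtering typically drops 50-70% of raw URLs, so over-discover ~2-3x.
  const rawUrlTarget = Math.ceil(requestedLeads * overDiscoverFactor);
  return { microVerticals, rawUrlTarget };
}

console.log(planDiscovery(100)); // { microVerticals: 3, rawUrlTarget: 250 }
console.log(planDiscovery(25));  // { microVerticals: 1, rawUrlTarget: 63 }
```

The `overDiscoverFactor` default of 2.5 sits in the middle of the 2-3x range; tune it up for verticals where filtering is aggressive.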
+ +## Step 4: Output Schema Design (auto) + +Use default columns plus enrichment fields that make sense for the ICP: + +| Column | Description | +|--------|-------------| +| `company_name` | Company name (5 words max) | +| `website` | Homepage URL | +| `product_description` | What they do (12 words max) | +| `icp_fit_score` | 1-10 integer | +| `icp_fit_reasoning` | Why this score (20 words max) | +| `personalized_email` | Ready-to-send email draft | + +Always include `industry` and `key_features`. Add `employee_estimate`, `funding_info`, `target_audience` when relevant. Do not ask the user — use sensible defaults. + +## Step 5: Batch Discovery + +Launch subagents to run search queries in parallel. See `references/workflow.md` for subagent prompt templates and wave management rules. + +**Process**: +1. **Clean up prior run**: `rm -rf /tmp/cold_discovery_batch_*.json /tmp/cold_research` +2. Launch ALL discovery subagents at once (up to ~6 per single message). Each subagent runs its queries in a SINGLE Bash call: + ```bash + bb search "{query}" --num-results 25 --output /tmp/cold_discovery_batch_{N}.json + ``` +3. Subagents report back counts only +4. After all waves complete, deduplicate: + ```bash + node {SKILL_DIR}/scripts/list_urls.mjs /tmp + ``` +5. **Filter the URL list**: Remove URLs that are clearly NOT company homepages: + - Blog posts, news articles (globenewswire.com, techcrunch.com, etc.) + - Directories/aggregators (tracxn.com, crunchbase.com, g2.com) + - The sender's own competitors (from the company profile) + - The sender's existing customers (from the company profile) + Keep only URLs that look like company homepages (e.g., `https://acme.com`, `https://www.acme.io`) + +## Step 6: Deep Research & Enrichment + +Launch subagents to research companies in parallel. See `references/workflow.md` for the enrichment subagent prompt template. See `references/research-patterns.md` for the full research methodology. 
+ +**Important**: Enrichment subagents do NOT write emails. Emails come in Step 8 after contacts are found. + +**Process**: +1. `mkdir -p /tmp/cold_research` +2. Split filtered URLs into groups per subagent (quick: ~10, deep: ~5, deeper: ~2-3) +3. Launch ALL enrichment subagents at once (up to ~6 per message) +4. Each subagent uses ONLY Bash — for each company: + + **Phase A — Plan** (skip in quick mode): + Decompose into 2-5 sub-questions based on ICP and enrichment fields. + + **Phase B — Research Loop**: + Search and fetch pages, extract findings. Respect step budget (quick: 2-3, deep: 5-8, deeper: 10-15). + + **Phase C — Synthesize**: + Score ICP fit 1-10, fill enrichment fields. Do NOT write emails yet. + +5. Subagents write ALL markdown files in a SINGLE Bash call using chained heredocs (one prompt, not one per file) +6. After ALL subagents complete, proceed to Step 7 + +**Critical**: Include the confirmed ICP description and pitch angle verbatim in every subagent prompt. + +## Step 7: Contact Discovery + +See `references/workflow.md` for the contact discovery subagent prompt template. + +**Process**: +1. Read the markdown files in `/tmp/cold_research/` — scan YAML frontmatter for `icp_fit_score` values +2. Show a quick interim summary (lead count, score distribution, top 10 by ICP score) +3. Filter for companies with icp_fit_score >= 8 +4. **Pick 3-5 target titles** based on the sender's product: + - Dev tools → Head of DevRel, VP Engineering + - Security → CISO, VP Engineering + - Infrastructure → CTO, VP Engineering, Head of Platform + - Early-stage startups → Founder, CEO, CTO + - Marketing tools → VP Marketing, Head of Growth +5. Launch contact discovery subagents. Each subagent: + - Runs ALL contact searches in a SINGLE Bash call + - Appends ALL `## Contact` sections in a SINGLE Bash call + - Falls back to `bb search "{company name} founder CEO about us"` if title-specific search returns nothing +6. Present a contact table, then proceed to Step 8. 
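The score scan in steps 1-3 can be sketched as below, assuming the YAML frontmatter format from `references/example-research.md`. The file names and contents here are hypothetical examples, not real pipeline output:

```javascript
// Illustrative score scan — the real pipeline reads /tmp/cold_research/*.md;
// the file names and markdown below are hypothetical.
function scoreFromFrontmatter(markdown) {
  // icp_fit_score lives on its own line in the YAML frontmatter.
  const match = markdown.match(/^icp_fit_score:\s*(\d+)\s*$/m);
  return match ? Number(match[1]) : null;
}

const files = {
  "acme-inc.md": "---\ncompany_name: Acme Inc\nicp_fit_score: 8\n---\n## Product\n...",
  "lowfit-co.md": "---\ncompany_name: LowFit Co\nicp_fit_score: 4\n---\n## Product\n...",
};

// Keep only companies scoring >= 8 for contact discovery.
const highFit = Object.entries(files)
  .filter(([, md]) => (scoreFromFrontmatter(md) ?? 0) >= 8)
  .map(([name]) => name);

console.log(highFit); // [ 'acme-inc.md' ]
```

A regex scan like this avoids pulling in a YAML parser; it is enough because the frontmatter field is a bare integer on one line.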
+ +## Step 8: Email Generation + CSV Compilation + +See `references/email-templates.md` for email structure, personalization signals, and anti-patterns. + +**Process**: +1. Launch email generation subagents. Each subagent: + - Writes ALL emails for its batch of companies + - Appends ALL `## Email Draft` sections in a SINGLE Bash call using chained heredocs +2. **Compile final CSV**: + ```bash + node {SKILL_DIR}/scripts/compile_csv.mjs /tmp/cold_research ~/Desktop/{company}_outbound_{date}.csv + ``` +3. Present final results: + +``` +## Outbound Lead List Complete + +- **Total leads**: {count} +- **With contacts found**: {count} +- **Depth mode**: {mode} +- **Score distribution**: + - High fit (8-10): {count} + - Medium fit (5-7): {count} + - Low fit (1-4): {count} +- **Output file**: ~/Desktop/{company}_outbound_{date}.csv +``` + +4. Show the **top 10 leads** in a table +5. Show 3-5 sample personalized emails + +**Note**: Email addresses are estimated using common patterns (first@company.com). Recommend verifying through Apollo.io, Hunter.io, or LinkedIn Sales Navigator before sending. + +Offer to filter the CSV, regenerate emails for specific companies, or search for additional contacts at lower-scored companies. diff --git a/skills/cold-outbound/profiles/browserbase.json b/skills/cold-outbound/profiles/browserbase.json new file mode 100644 index 0000000..12105a7 --- /dev/null +++ b/skills/cold-outbound/profiles/browserbase.json @@ -0,0 +1,9 @@ +{ + "company": "Browserbase", + "website": "https://www.browserbase.com", + "product": "Cloud browser infrastructure for AI agents and web automation. Run Playwright, Puppeteer, and Selenium at scale with stealth mode, CAPTCHA solving, session persistence, residential proxies, and debugging tools. Products include Browserbase (headless infra), Stagehand (browser automation SDK), and Director (workflow builder).
Also offers MCP browser tool and Computer Use Agent support.", + "existing_customers": ["Firecrawl", "Ramp", "Exa", "Reducto", "Cerebras", "Cartesia", "Extend", "Polymarket"], + "competitors": ["Browserless", "Apify", "Scrapfly", "Surfsky", "BrowserTree", "Hyperbrowser", "Anchor Browser"], + "use_cases": ["AI agent browser access", "web scraping and data extraction", "automated testing", "form filling", "document downloading", "price monitoring", "lead research", "computer use agents"], + "researched_at": "2026-03-18" +} diff --git a/skills/cold-outbound/profiles/example.json b/skills/cold-outbound/profiles/example.json new file mode 100644 index 0000000..ae469f5 --- /dev/null +++ b/skills/cold-outbound/profiles/example.json @@ -0,0 +1,9 @@ +{ + "company": "", + "website": "", + "product": "", + "existing_customers": [], + "competitors": [], + "use_cases": [], + "researched_at": "" +} diff --git a/skills/cold-outbound/references/email-templates.md b/skills/cold-outbound/references/email-templates.md new file mode 100644 index 0000000..797a479 --- /dev/null +++ b/skills/cold-outbound/references/email-templates.md @@ -0,0 +1,125 @@ +# Cold Outbound — Email Templates Reference + +## Email Structure + +Every outbound email follows this structure: + +**Subject line**: Specific, references their company or product. Never generic ("Quick question" is bad). +Example: "Thought on {Company}'s {specific feature/challenge}" + +**Body** (100-150 words, 3-4 short paragraphs): + +1. **Opening** (1-2 sentences): Reference something specific from their website — a feature they ship, a blog post, a recent launch, a job posting. Show you actually looked. + +2. **Bridge** (2-3 sentences): Connect their situation to the sender's value prop. Use the confirmed pitch angle. Frame as "companies like yours" or "teams building X often need Y." + +3. **Ask** (1 sentence): Soft CTA. "Would a 15-min call make sense to explore this?" Never "buy now" or "schedule a demo." + +4. 
**Sign-off**: First name only. No title dumps. + +## Personalization Signals + +When enriching a company, look for these on their website to fuel personalization: + +| Signal | Where to find | How to use | +|--------|---------------|------------| +| What they sell | Homepage hero, product page | Opening + bridge | +| Recent launches | Blog, changelog, press page | Opening hook | +| Hiring signals | Careers page, job boards | "I noticed you're scaling your X team" | +| Tech stack | Docs, job descriptions, GitHub | Bridge to technical pitch | +| Customer base | Case studies, logos section | "Working with companies like {their customer}" | +| Pain indicators | Pricing page (complexity), docs (workarounds) | Bridge to how you solve it | +| Growth signals | New markets, new features, funding news | Opening or bridge | + +## Examples + +### Example 1: SaaS company selling to e-commerce + +**Context extracted**: +- Company: CartFlow +- Product: Checkout optimization for Shopify stores +- Recent: Launched A/B testing feature last month +- Stack: React, Node.js, Shopify API + +**Sender pitch angle**: "We help companies automate browser-based testing and monitoring" + +**Email**: +``` +Subject: CartFlow's new A/B testing — monitoring at scale? + +Hi Sarah, + +Saw CartFlow just shipped A/B testing for checkout flows — congrats. That's a big surface area to keep reliable across Shopify's theme ecosystem. + +Teams running checkout experiments at scale often hit a wall with monitoring — catching layout breaks, payment form regressions, or slow renders before customers do. We help companies like yours run automated browser checks across hundreds of store configurations without building the infra in-house. + +Would a 15-min call make sense to see if this fits where CartFlow is headed? 
+ +Best, +Alex +``` + +### Example 2: Data analytics startup + +**Context extracted**: +- Company: InsightPipe +- Product: Real-time analytics for marketing teams +- Recent: Hiring 3 data engineers +- Customers: Mid-market DTC brands + +**Sender pitch angle**: "We provide reliable web data collection infrastructure" + +**Email**: +``` +Subject: InsightPipe's data pipeline — web sources? + +Hi Marcus, + +Noticed InsightPipe is scaling the data engineering team — makes sense given the push into real-time analytics for DTC brands. + +A challenge we hear from analytics companies: reliably collecting web data (pricing, inventory, ad placements) at the frequency your customers need without getting blocked or managing proxy infrastructure. We handle the browser infrastructure side so your team can focus on the analytics layer. + +Worth a quick chat to see if web data collection is on your roadmap? + +Best, +Alex +``` + +### Example 3: AI company + +**Context extracted**: +- Company: AgentKit +- Product: Framework for building AI agents +- Recent: Open-sourced their core library +- Stack: Python, LangChain + +**Sender pitch angle**: "We give AI agents reliable browser access" + +**Email**: +``` +Subject: Browser access for AgentKit agents + +Hi Priya, + +AgentKit's agent framework is impressive — especially the open-source move. One pattern we see in agent builders: giving agents reliable browser access (navigating, extracting, filling forms) without the pain of managing headless Chrome at scale. + +We provide managed browser infrastructure specifically for AI agents — handles anti-bot detection, session management, and scales with your users' workloads. Several agent frameworks have integrated us as their default browser layer. + +Would it be useful to chat about how this could plug into AgentKit? 
+ +Best, +Alex +``` + +## Anti-Patterns + +Avoid these in generated emails: + +- **Generic opener**: "I came across your company and was impressed" — says nothing specific +- **Feature dump**: Listing 5+ features instead of connecting to their need +- **Multiple CTAs**: "Book a demo, check our docs, or reply here" — pick one +- **Over 200 words**: SDR emails get skimmed, not read +- **Mentioning competitors by name**: "Unlike {competitor}..." — unprofessional in cold outreach +- **Fake familiarity**: "Hope you're having a great week!" — transparent filler +- **Title/credential stuffing** in sign-off: "Alex Johnson, Senior Account Executive, ABC Corp, MBA, PMP" — just first name +- **Apologetic tone**: "Sorry to bother you" — undermines the value you're offering diff --git a/skills/cold-outbound/references/example-research.md b/skills/cold-outbound/references/example-research.md new file mode 100644 index 0000000..a4951ed --- /dev/null +++ b/skills/cold-outbound/references/example-research.md @@ -0,0 +1,91 @@ +# Example Company Research File + +Each enrichment subagent writes one markdown file per company to `/tmp/cold_research/{company-slug}.md`. The YAML frontmatter contains structured fields for CSV compilation. The body contains human-readable research, contacts, and email drafts. + +## Template + +```markdown +--- +company_name: Acme Inc +website: https://acme.com +product_description: AI-powered inventory management for e-commerce brands +industry: E-commerce / SaaS +target_audience: Mid-market e-commerce brands +key_features: demand forecasting | automated reordering | multi-warehouse sync +icp_fit_score: 8 +icp_fit_reasoning: Series A e-commerce SaaS, uses Selenium for scraping, expanding to EU — strong fit for browser infrastructure +employee_estimate: 50-100 +funding_info: Series A, $12M +headquarters: San Francisco, CA +--- + +## Product +AI-powered inventory management for e-commerce brands. 
Helps DTC brands +automate reordering and sync across multiple warehouses. + +## Research Findings +- **[high]** Checkout optimization for Shopify stores, serving mid-market DTC brands with $5M-$50M revenue (source: acme.com/about) +- **[high]** Series A, $12M raised in Q3 2025 from Sequoia (source: TechCrunch) +- **[medium]** Recently hired 3 data engineers, expanding platform team (source: LinkedIn job posts) +- **[medium]** Uses Selenium for web scraping in their data pipeline (source: careers page) + +## Contact +- Name: Jane Smith +- Title: VP Engineering +- Email: jane@acme.com +- LinkedIn: https://linkedin.com/in/janesmith + +## Email Draft +Subject: Acme's Shopify data pipeline — scaling scraping? + +Hi Jane, + +Saw Acme just closed a Series A — congrats. The push into multi-warehouse +sync for DTC brands is a big surface area, especially with the EU expansion. + +Teams scaling web scraping for inventory and pricing data often hit a wall +with Selenium — blocked requests, CAPTCHA walls, and proxy management eat +engineering cycles. We handle the browser infrastructure so your team can +focus on the analytics layer. + +Would a 15-min call make sense to see if this fits where Acme is headed? + +Best, +Alex +``` + +## Field Rules + +- **YAML frontmatter**: All structured fields go here. These are extracted for CSV compilation. +- **`key_features`**: Pipe-separated (`|`) list in YAML, not a JSON array. +- **`icp_fit_score`**: Integer 1-10. +- **`icp_fit_reasoning`**: One line, references specific findings. +- **Body sections**: `## Product`, `## Research Findings`, `## Contact`, `## Email Draft`. +- **Findings format**: `- **[confidence]** fact (source: url or description)` +- **Contact section**: Added during Step 7 (contact discovery). Omit if not yet discovered. +- **Email Draft section**: Added during Step 8 (email generation). Omit during Step 6. 
+- **Filename**: `/tmp/cold_research/{company-slug}.md` where slug is lowercase, hyphenated (e.g., `acme-inc.md`). +- **Deduplication**: One file per company. If a subagent encounters a company that already has a file, it should overwrite with richer data. + +## Writing via Bash Heredoc + +Subagents write these files using bash heredoc to avoid security prompts: + +```bash +mkdir -p /tmp/cold_research +cat << 'COMPANY_MD' > /tmp/cold_research/acme-inc.md +--- +company_name: Acme Inc +website: https://acme.com +... +--- + +## Product +... + +## Research Findings +... +COMPANY_MD +``` + +Use `'COMPANY_MD'` (quoted) as the delimiter to prevent shell variable expansion inside the content. diff --git a/skills/cold-outbound/references/research-patterns.md b/skills/cold-outbound/references/research-patterns.md new file mode 100644 index 0000000..85516cc --- /dev/null +++ b/skills/cold-outbound/references/research-patterns.md @@ -0,0 +1,178 @@ +# Cold Outbound — Deep Research Patterns + +## Overview + +This reference defines two research contexts: +1. **Self-Research** (Step 1) — Deep research on the user's own company to build a strong ICP foundation +2. **Target Research** (Step 6) — Research each discovered company using Plan→Research→Synthesize + +Both use the same 3-phase pattern but with different sub-questions and goals. + +## Self-Research (User's Company) + +This is the most important research in the pipeline. Every downstream decision depends on it. + +### Sub-Questions +- "What does {company} sell and what specific problem does it solve?" +- "Who are {company}'s existing customers? What industries, company sizes, and use cases?" +- "Who are {company}'s competitors and what differentiates them?" +- "What pricing model does {company} use and who is the typical buyer persona?" +- "What use cases and pain points does {company}'s marketing emphasize?" + +### Page Discovery +Discover site pages dynamically — do NOT hardcode paths like `/about` or `/customers`: +1. 
Fetch `bb fetch --allow-redirects "{company website}/sitemap.xml"` — primary source, has ALL pages +2. Scan sitemap URLs for keywords: `customer`, `case-stud`, `pricing`, `about`, `use-case`, `blog`, `docs`, `industry`, `solution` +3. Optionally fetch `bb fetch --allow-redirects "{company website}/llms.txt"` for page descriptions +4. Pick the 3-5 most relevant URLs from the sitemap and fetch those +5. Sitemap is the source of truth. llms.txt is bonus context but often incomplete. + +### External Research +- Search: `"{company} customers use cases reviews"` +- Search: `"{company} alternatives competitors vs"` +- Fetch 1-2 of the most informative third-party results (G2, blog posts, comparisons) + +### Synthesis Output +From all findings, produce a company profile: +- **Company**: name +- **Product**: what they sell, how it works, key capabilities (2-3 sentences, specific) +- **Existing Customers**: named customers or customer types found +- **Competitors**: who they compete with, key differentiators +- **Use Cases**: broad list of use cases the product serves (NOT tied to one vertical) + +Do NOT include ICP, pitch angle, or sub-verticals in the profile. Those are per-run targeting decisions made in Step 2 after the profile is confirmed. The profile is a general-purpose company fact sheet that works regardless of which vertical you target next. + +### Why This Matters +A thin profile produces generic search queries, weak lead scoring, and cookie-cutter emails. A rich profile with specific customers, competitors, and use cases produces targeted queries, accurate scoring, and emails that reference real pain points. + +--- + +## Target Company Research (Step 6) + +### Sub-Question Templates + +Generate sub-questions from these categories based on the ICP and enrichment fields requested. Not every category applies to every company — pick the most relevant. + +### Priority 1 (Always ask) +- **Product/Market**: "What does {company} sell and who are their customers?" 
+- **ICP Fit**: "How does {company}'s product/market relate to {sender's ICP description}?" + +### Priority 2 (Ask in deep/deeper) +- **Tech Stack**: "What technologies, frameworks, or infrastructure does {company} use?" +- **Growth Signals**: "Has {company} raised funding, launched products, or expanded recently?" +- **Pain Points**: "What challenges might {company} face that {sender's product} addresses?" + +### Priority 3 (Ask in deeper only) +- **Decision Makers**: "Who leads engineering, product, or growth at {company}?" +- **Competitive Landscape**: "Who are {company}'s competitors and how are they differentiated?" +- **Customers/Case Studies**: "Who are {company}'s notable customers and what results do they highlight?" + +### Search Query Patterns + +For each sub-question, generate 2-3 search query variations: + +``` +# Product/Market +"{company name} what they do" +"{company name} product features customers" + +# Tech Stack +"{company name} tech stack engineering blog" +"{company name} careers software engineer" (job posts reveal stack) + +# Growth Signals +"{company name} funding round 2025 2026" +"{company name} launch announcement" +"{company name} hiring" + +# Pain Points +"{company name} challenges {relevant domain}" +"{company name} {problem sender solves}" + +# Decision Makers +"{company name} VP engineering CTO LinkedIn" +"{company name} head of growth product" +``` + +## Finding Format + +Each finding is a self-contained factual statement tied to a source: + +```json +{ + "subQuestion": "What does Acme sell and who are their customers?", + "fact": "Acme provides checkout optimization for Shopify stores, serving mid-market DTC brands with $5M-$50M revenue", + "sourceUrl": "https://acme.com/about", + "sourceTitle": "About Acme - Checkout Optimization", + "confidence": "high" +} +``` + +**Confidence levels**: +- `high`: Directly stated on the company's own website or official press +- `medium`: Inferred from job postings, third-party articles, or 
indirect signals +- `low`: Speculative based on industry/category, or from outdated sources + +## Research Loop Rules + +1. **Process sub-questions by priority** — Priority 1 first, then 2, then 3 +2. **3-5 findings per sub-question, then move on** — Don't exhaust a topic +3. **Use parallel tool calls** — Search multiple queries simultaneously when possible +4. **Rephrase, don't retry** — If a search returns poor results, try different keywords +5. **Fetch selectively** — Don't fetch every URL from search results. Pick the 1-2 most relevant based on title and URL +6. **Stop at step limit** — Respect the depth mode's step budget per company +7. **Homepage first** — Always fetch the company's homepage before branching to other pages +8. **Deduplicate findings** — Don't record the same fact twice from different sources + +## Depth Mode Behavior + +### Quick Mode (100+ leads) +- **Skip Phase A** — No sub-question decomposition +- **Phase B**: Fetch the company homepage. Run 1-2 supplementary searches if homepage data is thin. +- **Phase C**: Extract available data, score ICP, write email from what's available +- **Budget**: 2-3 total tool calls per company +- **Trade-off**: Fast and cheap, but emails may be less personalized + +### Deep Mode (25-50 leads) +- **Phase A**: Decompose into 2-3 sub-questions (Priority 1 + selected Priority 2) +- **Phase B**: For each sub-question, run 2-3 searches + fetch 1-2 URLs. Target 3-5 findings per sub-question. +- **Phase C**: Synthesize from all findings. ICP reasoning references specific evidence. Email uses the most specific/compelling finding. +- **Budget**: 5-8 total tool calls per company +- **Trade-off**: Good balance of depth and scale + +### Deeper Mode (10-25 leads) +- **Phase A**: Decompose into 4-5 sub-questions (Priority 1 + 2 + selected Priority 3) +- **Phase B**: Research exhaustively. Fetch multiple pages per company (homepage, about, blog, careers, product pages). Target 3-5 findings per sub-question. 
+- **Phase C**: Synthesize with cited evidence. ICP reasoning is detailed. Email references multiple specific signals. +- **Budget**: 10-15 total tool calls per company +- **Trade-off**: High quality intelligence, but slow and expensive + +## Synthesis Instructions + +After the research loop completes for a company, synthesize findings into the output record: + +### ICP Scoring +Score 1-10 using ALL accumulated findings as evidence: +- **8-10**: Strong match. Multiple high-confidence findings confirm right industry, company stage, and clear pain point alignment. The pitch angle directly addresses a visible need supported by evidence. +- **5-7**: Partial match. Some findings suggest relevance but key signals are missing or low-confidence. Adjacent industry or unclear pain point. +- **1-4**: Weak match. Findings indicate wrong segment, too large/small, or no apparent connection to sender's product. + +Write `icp_fit_reasoning` referencing specific findings: "Series A fintech (from Crunchbase), uses Selenium for scraping (from job posting), expanding to EU market (from blog) — strong fit for browser infrastructure." 
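The 8/5 cutoffs above are the same tiering the bundled `compile_csv.mjs` applies in its `score_distribution` summary. As a minimal sketch (the `scoreTier` helper is hypothetical, not part of the bundled scripts):

```javascript
// Hypothetical helper mirroring the rubric's tiers — the same cutoffs
// compile_csv.mjs uses when it buckets scores for its stderr summary.
function scoreTier(icpFitScore) {
  const s = parseInt(icpFitScore, 10) || 0; // unparseable scores fall to 0, i.e. "low"
  if (s >= 8) return 'high';   // strong match: multiple high-confidence findings
  if (s >= 5) return 'medium'; // partial match: relevance suggested, key signals missing
  return 'low';                // weak match: wrong segment or no apparent connection
}

console.log(scoreTier('9')); // high
```

Defaulting unparseable scores to 0 matches the script's `parseInt(...) || 0` behavior, so a company whose subagent wrote a malformed `icp_fit_score` sorts to the bottom rather than crashing compilation.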
+ +### Email Personalization +Use the **richest, most specific** findings for email context: +- Opening: Use the most concrete finding (a specific product feature, a recent launch, a job posting) +- Bridge: Connect a finding about their challenges/stack to the sender's pitch angle +- If only low-confidence findings exist, keep the email shorter and more general — don't fabricate specificity + +### Enrichment Fields +Map findings to enrichment fields: +- `product_description` → from Product/Market findings +- `industry` → inferred from Product/Market +- `employee_estimate` → from LinkedIn search or careers page findings +- `funding_info` → from Growth Signals findings +- `headquarters` → from company homepage or about page +- `target_audience` → from Product/Market findings +- `key_features` → from product page findings + +If a field has no supporting findings, leave it empty rather than guessing. diff --git a/skills/cold-outbound/references/workflow.md b/skills/cold-outbound/references/workflow.md new file mode 100644 index 0000000..2869c19 --- /dev/null +++ b/skills/cold-outbound/references/workflow.md @@ -0,0 +1,289 @@ +# Cold Outbound — Workflow Reference + +## Discovery Batch JSON Schema + +File: `/tmp/cold_discovery_batch_{N}.json` + +`bb search --output` writes a JSON object (NOT a flat array): + +```json +{ + "requestId": "abc123", + "query": "AI data extraction startups", + "results": [ + { "url": "https://example.com", "title": "Example Corp", "author": null, "publishedDate": null }, + ... + ] +} +``` + +The `list_urls.mjs` script handles both formats (flat array and `{ results: [...] }`). + +## Company Research Markdown Format + +File: `/tmp/cold_research/{company-slug}.md` + +Each enrichment subagent writes one markdown file per company. See `references/example-research.md` for the full template with all sections and field rules. 
+
+**YAML frontmatter fields** (used for CSV compilation):
+- `company_name` (required)
+- `website` (required)
+- `product_description`
+- `industry`
+- `target_audience`
+- `key_features` (pipe-separated: `feature1 | feature2 | feature3`)
+- `icp_fit_score` (integer 1-10, required)
+- `icp_fit_reasoning`
+- `employee_estimate`
+- `funding_info`
+- `headquarters`
+
+**Body sections** (added progressively):
+- `## Product` — added in Step 6 (enrichment)
+- `## Research Findings` — added in Step 6 (enrichment)
+- `## Contact` — added in Step 7 (contact discovery)
+- `## Email Draft` — added in Step 8 (email generation)
+
+**CRITICAL**: Use consistent field names across all files. The `compile_csv.mjs` script reads these fields.
+
+## Extracting Text from HTML
+
+`bb fetch --allow-redirects` returns raw HTML. To extract readable text in a subagent Bash call, use:
+
+```bash
+# Fetch and extract text in one pipeline
+bb fetch --allow-redirects "https://example.com" | sed 's/<script[^>]*>.*<\/script>//g; s/<style[^>]*>.*<\/style>//g; s/<[^>]*>//g; s/&amp;/\&/g; s/&lt;/</g; s/&gt;/>/g; s/&nbsp;/ /g; s/&#[0-9]*;//g' | tr -s ' \n' | head -c 3000
+```
+
+Or save to file first and then extract:
+```bash
+bb fetch --allow-redirects "https://example.com" --output /tmp/fetch_example.html && sed 's/<script[^>]*>.*<\/script>//g; s/<style[^>]*>.*<\/style>//g; s/<[^>]*>//g' /tmp/fetch_example.html | tr -s ' \n' | head -c 3000
+```
+
+Limit to ~3000 chars per page to keep subagent context manageable. Focus on extracting the company name, product description, customer names, and key features from the text.
+
+## Discovery Subagent Prompt Template
+
+```
+You are a lead discovery subagent. Run search queries and save results.
+
+TOOL RULES — CRITICAL, FOLLOW EXACTLY:
+1. You may ONLY use the Bash tool. No exceptions.
+2. Run ALL searches in a SINGLE Bash call using && chaining.
+3. BANNED TOOLS: WebFetch, WebSearch, Write, Read, Glob, Grep — ALL BANNED.
+   If you use ANY banned tool, the entire run fails. Use ONLY Bash.
+4.
NEVER use ~ or $HOME in paths — use full literal paths.
+
+TASK:
+Run ALL of the following searches in ONE Bash command:
+
+bb search "{query1}" --num-results 25 --output /tmp/cold_discovery_batch_{N1}.json && \
+bb search "{query2}" --num-results 25 --output /tmp/cold_discovery_batch_{N2}.json && \
+bb search "{query3}" --num-results 25 --output /tmp/cold_discovery_batch_{N3}.json && \
+echo "Discovery complete"
+
+After the command completes, report back ONLY the count of results found per batch.
+Do NOT analyze, summarize, or return the actual results.
+```
+
+## Research & Enrichment Subagent Prompt Template
+
+```
+You are a lead research & enrichment subagent. For each company URL, research the company using a 3-phase pattern and score ICP fit. Do NOT write emails — that happens later in Step 8 after contacts are found.
+
+CONTEXT:
+- Sender's company: {sender_company}
+- Sender's product: {sender_product}
+- ICP description: {icp_description}
+- Pitch angle: {pitch_angle}
+- Depth mode: {depth_mode}
+- Output schema columns: {columns}
+
+URLS TO PROCESS:
+{url_list}
+
+TOOL RULES — CRITICAL, FOLLOW EXACTLY:
+1. You may ONLY use the Bash tool. No exceptions.
+2. All searches: Bash → bb search "..." --num-results 10
+3. All page fetches: Bash → bb fetch --allow-redirects "..."
+   bb fetch returns RAW HTML. To extract text, pipe through:
+   sed 's/<script[^>]*>.*<\/script>//g; s/<style[^>]*>.*<\/style>//g; s/<[^>]*>//g' | tr -s ' \n' | head -c 3000
+   If a page returns thin content or "enable JavaScript", use bb browse instead.
+4. BATCH all file writes: Write ALL markdown files in a SINGLE Bash call using chained heredocs (one permission prompt, not one per file).
+5. BANNED TOOLS: WebFetch, WebSearch, Write, Read, Glob, Grep — ALL BANNED.
+   If you use ANY banned tool, the entire run fails. Use ONLY Bash.
+6. NEVER use ~ or $HOME in paths — use full literal paths.
+
+RESEARCH PATTERN (per company):
+Follow the 3-phase deep research pattern from references/research-patterns.md.
+ +Phase A — Plan (skip in quick mode): +Decompose what you need to know into sub-questions based on ICP and enrichment fields. + +Phase B — Research Loop: +For each sub-question (or just the homepage in quick mode): +1. Run bb search with relevant query +2. Pick 1-2 most relevant URLs from results +3. Run bb fetch --allow-redirects on selected URLs, pipe through sed to extract text +4. Smart page discovery: try /llms.txt or /sitemap.xml to find relevant pages — don't guess paths +5. Extract findings: factual statements with source, confidence level +6. Accumulate findings, move to next sub-question +7. Respect step budget: quick=2-3 calls, deep=5-8, deeper=10-15 + +Phase C — Synthesize: +From accumulated findings: +1. Score ICP fit 1-10 (see rubric below) +2. Fill enrichment fields from findings +3. Reference specific findings in icp_fit_reasoning +4. Do NOT write emails — that happens in Step 8 after contacts are discovered + +ICP SCORING RUBRIC: +- 8-10: Strong match. Multiple high-confidence findings confirm fit. Pitch angle directly addresses a visible need. +- 5-7: Partial match. Some findings suggest relevance but key signals missing or low-confidence. +- 1-4: Weak match. Findings indicate wrong segment or no apparent connection. + +OUTPUT — write ALL company files in a SINGLE Bash call using chained heredocs: + +mkdir -p /tmp/cold_research && cat << 'COMPANY_MD' > /tmp/cold_research/{slug1}.md +--- +company_name: {name} +website: {url} +product_description: {description} +industry: {industry} +target_audience: {audience} +key_features: {feature1} | {feature2} | {feature3} +icp_fit_score: {score} +icp_fit_reasoning: {reasoning} +employee_estimate: {estimate} +funding_info: {funding} +headquarters: {location} +--- + +## Product +{product description paragraph} + +## Research Findings +- **[{confidence}]** {finding} (source: {url}) +COMPANY_MD +cat << 'COMPANY_MD' > /tmp/cold_research/{slug2}.md +--- +... +--- +... 
+COMPANY_MD + +Use 'COMPANY_MD' (quoted) as the heredoc delimiter to prevent shell variable expansion. + +Report back ONLY: "Batch {batch_id}: {succeeded}/{total} enriched, {findings_count} total findings." +Do NOT return raw data to the main conversation. +``` + +## Wave Management + +### Key Principle: Maximize Parallelism, Minimize Prompts +Launch as many subagents as possible in a single message (up to ~6 Agent tool calls per message). Each subagent MUST batch all its Bash operations to minimize permission prompts — the user should only see 1-2 prompts per subagent, not one per file write. + +### Discovery Phase +- Launch up to 6 discovery subagents in a single message (multiple Agent tool calls) +- Each subagent runs ALL its queries in a SINGLE Bash call using `&&` chaining +- After all discovery waves complete, run `node {SKILL_DIR}/scripts/list_urls.mjs /tmp` to get deduplicated URLs +- **Filter URLs**: Remove blog posts, news articles, directories, competitors, and existing customers. Keep only company homepages. 
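The mechanical part of that filter can be sketched in Node — a heuristic only, under stated assumptions: `looksLikeHomepage` and its path blocklist are hypothetical (not part of the bundled scripts), and filtering out competitors and existing customers still requires the main agent's judgment over titles.

```javascript
// Heuristic sketch: keep URLs that look like company homepages —
// root (or one shallow, e.g. localized) path, no blog/news/directory segments.
const SKIP = /\/(blog|news|press|articles?|directory|top-\d+|best-)/i;

function looksLikeHomepage(url) {
  try {
    const u = new URL(url);
    if (SKIP.test(u.pathname)) return false;
    // Allow at most one shallow path segment, since search results
    // sometimes return localized roots like /en.
    return u.pathname.split('/').filter(Boolean).length <= 1;
  } catch {
    return false; // invalid URL
  }
}

const candidates = [
  'https://acme.com/',
  'https://techcrunch.com/2025/01/acme-raises',
  'https://example.com/blog/launch',
];
console.log(candidates.filter(looksLikeHomepage)); // keeps only the acme.com root
```

One way to use it: pipe the `list_urls.mjs` output through a filter like this first, then hand only the ambiguous remainder to the main agent for title-based judgment calls.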
+ +### Research & Enrichment Phase +- Companies per subagent varies by depth: + - `quick`: ~10 companies per subagent (light research per company) + - `deep`: ~5 companies per subagent (moderate research per company) + - `deeper`: ~2-3 companies per subagent (intensive research per company) +- Launch up to 6 subagents in a single message +- Each subagent writes ALL its markdown files in a SINGLE Bash call (chained heredocs) +- After ALL enrichment subagents complete, the main agent reads `/tmp/cold_research/*.md` frontmatter for the interim summary + +### Sizing Formula +``` +micro_verticals = ceil(requested_leads / 35) +discovery_subagents = micro_verticals +expected_urls = micro_verticals * 20 (avg yield ~20 per 25-result query after filtering) + +quick: enrichment_subagents = ceil(expected_urls / 10) +deep: enrichment_subagents = ceil(expected_urls / 5) +deeper: enrichment_subagents = ceil(expected_urls / 3) + +discovery_waves = ceil(discovery_subagents / 6) +enrichment_waves = ceil(enrichment_subagents / 6) +``` + +### Error Handling +- If a subagent fails, log the error and continue with remaining batches +- If >50% of subagents fail in a wave, pause and inform the user +- Never retry identical queries — adjust wording if a query returns poor results +- If `bb fetch --allow-redirects` fails (thin content, timeout, 1MB limit, redirect loop), try `bb browse` as fallback or skip and note in stats + +## Contact Discovery Subagent Prompt Template + +``` +You are a contact discovery subagent. Find decision makers at target companies. + +TOOL RULES — CRITICAL, FOLLOW EXACTLY: +1. You may ONLY use the Bash tool. No exceptions. +2. All searches: Bash → bb search "..." --num-results 10 +3. BATCH all searches into ONE Bash call using && chaining. +4. BATCH all file appends into ONE Bash call using && chaining. +5. BANNED TOOLS: WebFetch, WebSearch, Write, Read, Glob, Grep — ALL BANNED. + If you use ANY banned tool, the entire run fails. Use ONLY Bash. +6. 
NEVER use ~ or $HOME in paths — use full literal paths. + +COMPANIES TO RESEARCH: +{company_list_with_websites} + +SENDER CONTEXT: +- Sender's company: {sender_company} +- Sender's product: {sender_product} +- ICP description: {icp_description} + +TARGET TITLES: +{target_titles} + +RESEARCH PATTERN (per company): +1. Search: bb search "{company name} {title} LinkedIn" --num-results 10 for each target title +2. Search: bb search "{company name} team leadership engineering about" --num-results 10 +3. If no results for specific titles, fall back to: bb search "{company name} founder CEO about us" --num-results 10 +4. From results, extract: name, title, LinkedIn URL +5. Estimate email: first@companydomain.com (use the company's actual domain) + +OUTPUT — append ALL contact sections in a SINGLE Bash call: + +cat << 'CONTACT_MD' >> /tmp/cold_research/{slug1}.md + +## Contact +- Name: {full name} +- Title: {actual title} +- Email: {first}@{domain} +- LinkedIn: {url or "—"} +CONTACT_MD +cat << 'CONTACT_MD' >> /tmp/cold_research/{slug2}.md + +## Contact +- Name: {full name} +... +CONTACT_MD + +If no contact found for a company, write "## Contact\n- No contact found". + +Report back a summary table of contacts found per company. +``` + +## CSV Compilation + +Use the bundled `compile_csv.mjs` script instead of writing one-off compilation code: + +```bash +node {SKILL_DIR}/scripts/compile_csv.mjs /tmp/cold_research ~/Desktop/{company}_outbound_{date}.csv +``` + +The script: +- Reads all `.md` files from the research directory +- Parses YAML frontmatter for structured fields +- Extracts `## Contact` and `## Email Draft` sections from the body +- Deduplicates by normalized company name (keeps highest ICP score) +- Outputs CSV with priority columns first (company_name, website, icp_fit_score, etc.) 
+- Prints a JSON summary to stderr (total leads, score distribution, duplicates removed)
diff --git a/skills/cold-outbound/scripts/compile_csv.mjs b/skills/cold-outbound/scripts/compile_csv.mjs
new file mode 100755
index 0000000..56b4222
--- /dev/null
+++ b/skills/cold-outbound/scripts/compile_csv.mjs
@@ -0,0 +1,155 @@
+#!/usr/bin/env node
+
+// Compiles per-company markdown research files into a deduplicated CSV.
+// Parses YAML frontmatter for structured fields, extracts contact info
+// and email drafts from markdown body sections.
+//
+// Usage: node compile_csv.mjs <research-dir> [output-file]
+// Example: node compile_csv.mjs /tmp/cold_research ~/Desktop/leads.csv
+
+import { readdirSync, readFileSync, writeFileSync } from 'fs';
+import { join } from 'path';
+
+const args = process.argv.slice(2);
+
+if (args.includes('--help') || args.includes('-h') || args.length === 0) {
+  console.error(`Usage: node compile_csv.mjs <research-dir> [output-file]
+
+Reads all .md files from <research-dir>, parses YAML frontmatter and
+body sections (Contact, Email Draft), and writes a deduplicated CSV.
+
+If no output file is specified, writes CSV to stdout.
+
+Options:
+  --help, -h    Show this help message
+
+Examples:
+  node compile_csv.mjs /tmp/cold_research
+  node compile_csv.mjs /tmp/cold_research ~/Desktop/leads.csv`);
+  process.exit(args.includes('--help') || args.includes('-h') ?
0 : 1); +} + +const dir = args[0]; +const outputFile = args[1] || null; + +let files; +try { + files = readdirSync(dir).filter(f => f.endsWith('.md')).sort(); +} catch (err) { + console.error(`Error reading directory ${dir}: ${err.message}`); + process.exit(1); +} + +if (files.length === 0) { + console.error(`No .md files found in ${dir}`); + process.exit(1); +} + +const rows = []; + +for (const file of files) { + const content = readFileSync(join(dir, file), 'utf-8'); + + // Parse YAML frontmatter + const fmMatch = content.match(/^---\n([\s\S]*?)\n---/); + if (!fmMatch) { + console.error(`Warning: No frontmatter in ${file}, skipping`); + continue; + } + const fields = {}; + for (const line of fmMatch[1].split('\n')) { + const idx = line.indexOf(':'); + if (idx > 0) { + const key = line.slice(0, idx).trim(); + const val = line.slice(idx + 1).trim().replace(/^["']|["']$/g, ''); + if (key && val) fields[key] = val; + } + } + + // Extract contact info from ## Contact section + const contactMatch = content.match(/## Contact\n([\s\S]*?)(?=\n## |$)/); + if (contactMatch && !contactMatch[1].includes('No contact found')) { + const block = contactMatch[1]; + const nameM = block.match(/Name:\s*(.+)/); + const titleM = block.match(/Title:\s*(.+)/); + const emailM = block.match(/Email:\s*(.+)/); + const linkedinM = block.match(/LinkedIn:\s*(.+)/); + if (nameM) fields.contact_name = nameM[1].trim(); + if (titleM) fields.contact_title = titleM[1].trim(); + if (emailM) fields.estimated_email = emailM[1].trim(); + if (linkedinM && linkedinM[1].trim() !== '—') fields.linkedin_url = linkedinM[1].trim(); + } + + // Extract email draft from ## Email Draft section + const emailMatch = content.match(/## Email Draft\n([\s\S]*?)(?=\n## |$)/); + if (emailMatch) { + fields.personalized_email = emailMatch[1].trim().replace(/\n/g, '\\n'); + } + + rows.push(fields); +} + +if (rows.length === 0) { + console.error('No valid research files found'); + process.exit(1); +} + +// Deduplicate by 
normalized company name (keep highest ICP score)
+const seen = new Map();
+for (const row of rows) {
+  // Require a separator before the legal suffix so names like "cisco"
+  // don't lose their trailing "co".
+  const name = (row.company_name || '').toLowerCase().replace(/[\s,]+(inc|llc|ltd|corp|co)\s*\.?$/i, '').trim();
+  const score = parseInt(row.icp_fit_score) || 0;
+  if (!seen.has(name) || score > (parseInt(seen.get(name).icp_fit_score) || 0)) {
+    seen.set(name, row);
+  }
+}
+const dedupedRows = [...seen.values()];
+
+// Priority columns first, then rest alphabetically
+const priority = [
+  'company_name', 'website', 'product_description', 'icp_fit_score',
+  'icp_fit_reasoning', 'contact_name', 'contact_title', 'estimated_email',
+  'linkedin_url', 'personalized_email'
+];
+const allCols = [...new Set(dedupedRows.flatMap(r => Object.keys(r)))];
+const cols = [
+  ...priority.filter(c => allCols.includes(c)),
+  ...allCols.filter(c => !priority.includes(c)).sort()
+];
+
+function csvEscape(v) {
+  if (!v) return '';
+  if (v.includes(',') || v.includes('"') || v.includes('\n')) {
+    return '"' + v.replace(/"/g, '""') + '"';
+  }
+  return v;
+}
+
+const csvLines = [cols.join(',')];
+for (const row of dedupedRows) {
+  csvLines.push(cols.map(c => csvEscape(row[c] || '')).join(','));
+}
+const csvContent = csvLines.join('\n') + '\n';
+
+if (outputFile) {
+  writeFileSync(outputFile, csvContent);
+} else {
+  process.stdout.write(csvContent);
+}
+
+// Summary to stderr
+const scores = dedupedRows.map(r => parseInt(r.icp_fit_score) || 0);
+const high = scores.filter(s => s >= 8).length;
+const medium = scores.filter(s => s >= 5 && s < 8).length;
+const low = scores.filter(s => s < 5).length;
+const withContacts = dedupedRows.filter(r => r.contact_name).length;
+const dupsRemoved = rows.length - dedupedRows.length;
+
+console.error(JSON.stringify({
+  total_leads: dedupedRows.length,
+  duplicates_removed: dupsRemoved,
+  with_contacts: withContacts,
+  score_distribution: { high_8_10: high, medium_5_7: medium, low_1_4: low },
+  columns: cols,
+  output_file: outputFile || 'stdout'
+}, null,
2));
diff --git a/skills/cold-outbound/scripts/list_urls.mjs b/skills/cold-outbound/scripts/list_urls.mjs
new file mode 100755
index 0000000..516236c
--- /dev/null
+++ b/skills/cold-outbound/scripts/list_urls.mjs
@@ -0,0 +1,85 @@
+#!/usr/bin/env node
+
+// Deduplicates discovery URLs from bb search JSON output files.
+// Usage: node list_urls.mjs /tmp [--prefix cold]
+// Reads all {prefix}_discovery_batch_*.json files, deduplicates by domain,
+// outputs one URL per line to stdout, stats to stderr.
+
+import { readdirSync, readFileSync } from 'fs';
+import { join } from 'path';
+
+const args = process.argv.slice(2);
+
+if (args.includes('--help') || args.includes('-h') || args.length === 0) {
+  console.error(`Usage: node list_urls.mjs <dir> [--prefix <prefix>]
+
+Reads all <prefix>_discovery_batch_*.json files from <dir>,
+deduplicates URLs by domain, and outputs one URL per line to stdout.
+
+Options:
+  --prefix <prefix>   Batch file prefix (default: "cold")
+  --help, -h          Show this help message
+
+Examples:
+  node list_urls.mjs /tmp
+  node list_urls.mjs /tmp --prefix cold`);
+  process.exit(args.includes('--help') || args.includes('-h') ? 0 : 1);
+}
+
+const dir = args[0];
+const prefixIdx = args.indexOf('--prefix');
+const prefix = prefixIdx !== -1 && args[prefixIdx + 1] ? args[prefixIdx + 1] : 'cold';
+
+const pattern = new RegExp(`^${prefix}_discovery_batch_.*\\.json$`);
+
+let files;
+try {
+  files = readdirSync(dir)
+    .filter(f => pattern.test(f))
+    .sort();
+} catch (err) {
+  console.error(`Error reading directory ${dir}: ${err.message}`);
+  process.exit(1);
+}
+
+if (files.length === 0) {
+  console.error(`No ${prefix}_discovery_batch_*.json files found in ${dir}`);
+  process.exit(1);
+}
+
+const seenDomains = new Set();
+const urls = [];
+let totalResults = 0;
+
+for (const file of files) {
+  try {
+    const data = JSON.parse(readFileSync(join(dir, file), 'utf-8'));
+    const results = Array.isArray(data) ?
data : (data.results || []); + totalResults += results.length; + + for (const result of results) { + const url = result.url; + if (!url) continue; + + try { + const hostname = new URL(url).hostname.replace(/^www\./, ''); + if (!seenDomains.has(hostname)) { + seenDomains.add(hostname); + urls.push(url); + } + } catch { + // Skip invalid URLs + } + } + } catch (err) { + console.error(`Warning: Failed to parse ${file}: ${err.message}`); + } +} + +// Output deduplicated URLs to stdout +for (const url of urls) { + console.log(url); +} + +// Stats to stderr +console.error(`\n${files.length} files, ${totalResults} total results, ${urls.length} unique domains`); diff --git a/skills/cold-outbound/scripts/package.json b/skills/cold-outbound/scripts/package.json new file mode 100644 index 0000000..bdeb272 --- /dev/null +++ b/skills/cold-outbound/scripts/package.json @@ -0,0 +1,6 @@ +{ + "name": "cold-outbound-scripts", + "version": "0.1.0", + "private": true, + "type": "module" +}