A skill for conducting multi-phase deep research using Tavily CLI for search and Firecrawl CLI (REQUIRED) for content scraping, with content quality validation (REQUIRED) and human judgment at each step.
This skill provides a structured 3-phase research methodology:
- Initial Discovery - Map the landscape, identify themes, scrape and validate high-value sources
- Breadth Expansion - Explore multiple angles, scrape and validate content systematically
- Depth Exploration - Deep dive into priority domains, comprehensive scraping with validation
Key principles:
- Human judgment guides the research at every phase, not automation
- Raw content MUST be preserved using Firecrawl on high-value sources
- Content quality MUST be validated - every scraped file is evaluated for errors and relevance
- Quality gates MUST be met - sufficient high-quality sources required before synthesis
- Source attribution is mandatory in every synthesis
🔬 Autoresearch Optimized: All quality evaluation parameters (risk thresholds, scoring weights, content requirements) have been systematically optimized through autoresearch iterations for maximum research accuracy and minimal false positives.
📦 Includes Python Package: This skill bundles a custom Tavily CLI (tavaliy-cli/) - a Python CLI tool with API key rotation, multi-format output, and comprehensive test suite. Install with uv tool install.
Install with the skills CLI:
npx skills add https://github.com/socamalo/deep-research.git
Or install manually:
git clone https://github.com/socamalo/deep-research.git ~/.claude/skills/deep-research
Alternatively:
- Download or clone this repository
- Copy the deep-research folder to your Claude Code skills directory:
  - macOS/Linux: ~/.claude/skills/
  - Windows: %USERPROFILE%\.claude\skills\
After installation, install the required CLI tools:
# Check if already installed
which tavily && tavily --help
# Install from local path (bundled with this skill)
uv tool install ~/.claude/skills/deep-research/tavaliy-cli
# Update (if needed)
uv tool uninstall tavily
uv tool install ~/.claude/skills/deep-research/tavaliy-cli
Configure Tavily API Keys:
cd ~/.claude/skills/deep-research/tavaliy-cli
cp .env.example .env
# Edit .env and add your Tavily API key(s)
Supports multiple keys for automatic rotation (TAVILY_API_KEY_1, TAVILY_API_KEY_2, etc.)
npm install -g firecrawl
Configure Firecrawl API Key:
export FIRECRAWL_API_KEY="your-firecrawl-api-key"
Get your API keys:
- Tavily: https://tavily.com
- Firecrawl: https://firecrawl.dev
BEFORE scraping, use Quality Evaluator (Pre-Scrape Mode) to filter Tavily results.
- Tavily Score ≠ content quality (even results scored 1.00 can turn out to be 404 pages)
- Portal sites have high link rot
- Some sites block direct article access but allow homepage + crawl
- Pre-filtering saves time and reduces failed scrapes
First, read references/quality-evaluator.md (Mode 1: Pre-Scrape Assessment) for detailed evaluation guidelines.
Then invoke:
Evaluate sources (Pre-Scrape Mode):
- Research topic: {topic}
- Results file: ./01-initial-discovery/raw-results/search-01.json
Use skill: quality-evaluator
Output: JSON with recommended/excluded URLs and risk assessment
High Risk (frequent 404/invalid URLs):
- News portals: sina.com.cn, sohu.com, 163.com, ifeng.com
- Regional news subdomains
- Temporary event pages
Strategy for high-risk domains:
- Try direct scrape first (quick fail check)
- If 404 → extract homepage URL
- Use firecrawl map "https://homepage.com" to discover valid content
- Or skip and find alternative sources
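The 404 fallback above can be sketched in a few lines. This is a minimal illustration; homepage_fallback is a hypothetical helper name, not part of either CLI:

```python
from urllib.parse import urlparse

def homepage_fallback(article_url: str) -> str:
    """Given an article URL that returned 404, derive the site's
    homepage URL so it can be passed to `firecrawl map` instead."""
    parsed = urlparse(article_url)
    return f"{parsed.scheme}://{parsed.netloc}/"

# A dead regional-news link falls back to the portal homepage:
print(homepage_fallback("https://news.sina.com.cn/c/2024/dead-article.shtml"))
# → https://news.sina.com.cn/
```
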
Every scraped file MUST be validated using Quality Evaluator (Post-Scrape Mode) before entering synthesis.
Without validation, your research pipeline may include:
- 404/403 error pages (even 5-line nginx errors)
- CAPTCHA/login walls
- Marketing pages with high word count but low substance
- Thin content with mostly navigation/ads
Simple word-counting FAILS:
- A 404 error page can have 20+ lines of nginx HTML
- A marketing page can have 400+ lines of specs without real insight
| Rating | Criteria | Action |
|---|---|---|
| high | Weighted score >= 7.7, passed validity | Keep and prioritize |
| medium | Weighted score 5.0-7.6, passed validity | Keep for synthesis |
| low | Weighted score 3.0-4.9, passed validity | Discard |
| poor | Weighted score < 3.0, passed validity | Discard immediately |
| failed | Failed validity (404/error/CAPTCHA) | Discard + retry |
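The table's tiers reduce to a small threshold function. This sketch mirrors the table's cutoffs; the function name and signature are illustrative, not the evaluator's actual API:

```python
def rate_source(weighted_score: float, passed_validity: bool) -> str:
    """Map a weighted quality score plus the validity check to the
    rating tiers used by the quality gate."""
    if not passed_validity:      # 404 / error page / CAPTCHA
        return "failed"          # discard + retry
    if weighted_score >= 7.7:
        return "high"            # keep and prioritize
    if weighted_score >= 5.0:
        return "medium"          # keep for synthesis
    if weighted_score >= 3.0:
        return "low"             # discard
    return "poor"                # discard immediately

print(rate_source(8.1, True))   # → high
print(rate_source(4.2, True))   # → low
print(rate_source(9.0, False))  # → failed
```
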
Based on systematic optimization (targeting 50 total sources with authority-weighted scoring):
Phase 1 (Initial Discovery):
- Queries: 7 broad queries
- Max results: 15 per query
- Min quality score: 0.52
- Minimum: 5 high/medium quality sources
- Target: 8 sources
- Ideal: 10+ diverse sources
Phase 2 (Breadth Expansion):
- Queries: 9 targeted queries
- Max results: 12 per query
- Min quality score: 0.62
- Minimum: 8 high/medium quality sources
- Target: 12 sources
- Ideal: 15+ sources
Phase 3 (Depth Exploration):
- Queries: 8 deep queries
- Max results: 10 per query
- Min quality score: 0.67
- Minimum: 10 high/medium quality sources
- Target: 15+ sources
- Ideal: 20+ authoritative sources
Overall Strategy:
- max_total_sources: 50 (across all phases)
- Scoring priority: Authority (0.32) > Density (0.25) > Relevance (0.25) > Timeliness (0.10) > Uniqueness (0.08)
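The scoring priority above is a plain weighted sum. In this sketch the 0-10 per-dimension scale and the sample values are assumptions for illustration; only the weights come from the strategy:

```python
# Authority-weighted scoring from the overall strategy above.
WEIGHTS = {
    "authority":  0.32,
    "density":    0.25,
    "relevance":  0.25,
    "timeliness": 0.10,
    "uniqueness": 0.08,
}

def weighted_score(scores: dict) -> float:
    """Combine per-dimension scores (assumed 0-10) into one value."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

# An authoritative but slightly dated source:
s = weighted_score({"authority": 9, "density": 8, "relevance": 8,
                    "timeliness": 5, "uniqueness": 6})
print(round(s, 2))  # → 7.86
```

With these hypothetical inputs the source scores 7.86, which would land in the "high" tier of the rating table.
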
Rule: Do NOT proceed to synthesis until quality gate is met.
- Scrape URL with Firecrawl
- Read scraped file
- Invoke Quality Evaluator (Post-Scrape Mode)
- Get quality rating
- Keep high/medium, discard low/poor/failed
- Retry with new searches if quality gate not met
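The gate check in the steps above is a simple count-and-compare. This sketch assumes ratings have already been collected from the Quality Evaluator:

```python
def quality_gate_met(ratings: list[str], minimum: int) -> bool:
    """Count high/medium sources against the phase minimum."""
    kept = sum(1 for r in ratings if r in ("high", "medium"))
    return kept >= minimum

# Phase 1 example: 12 scrapes, gate requires 5 high/medium sources.
ratings = ["high", "medium", "failed", "low", "high",
           "medium", "poor", "high", "medium", "failed",
           "medium", "low"]
print(quality_gate_met(ratings, minimum=5))  # → True (7 kept)
```

If the gate returns False, the loop repeats: document failures, design new searches, scrape, and re-validate.
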
Every research phase MUST use Firecrawl to scrape high-value sources.
- URLs with Tavily score > 0.75
- Authoritative sources (universities, official docs, recognized experts)
- In-depth articles or guides
- Content you need to quote or reference
# Always save to raw-content directory
firecrawl scrape "https://example.com/article" markdown \
-o ./01-initial-discovery/raw-content/example-com-article.md
# Scrape with only main content
firecrawl scrape "https://example.com/article" markdown \
--only-main-content \
-o ./raw-content/example-com-article.md
# Start Claude
claude
# Ask Claude to research using the skill:
# "Research AI agent frameworks using the deep-research skill"
The skill guides Claude through structured research:
- Phase 1: Claude designs broad searches, scrapes 5-8 URLs, validates content quality, ensures 5+ high-quality sources
- User Checkpoint: Review synthesis with quality ratings, set research direction
- Phase 2: Claude explores multiple angles, scrapes 8-12 sources, validates, ensures 8+ high-quality sources
- User Checkpoint: Identify priority domains for deep dive
- Phase 3: Claude conducts targeted deep research, scrapes 10-15 sources, validates, ensures 10+ high-quality sources
- Final Report: Comprehensive synthesis with full source attribution
┌────────────────────────────────────────────────────────────────┐
│ User provides research topic                                   │
└──────────────────┬─────────────────────────────────────────────┘
                   ▼
┌────────────────────────────────────────────────────────────────┐
│ Phase 1: Initial Discovery                                     │
│ - Claude designs 3-5 broad Tavily searches                     │
│ - Identifies 5-8 high-value URLs (score > 0.75)                │
│ - Scrapes each URL with Firecrawl                              │
│ - Quality Evaluator evaluates each scrape                      │
│ - Quality Gate: Minimum 5 high/medium sources                  │
│ - If not met: retry with new searches                          │
│ - Reviews validated content, synthesizes findings              │
└──────────────────┬─────────────────────────────────────────────┘
                   ▼
┌────────────────────────────────────────────────────────────────┐
│ User Checkpoint 1                                              │
│ - Review synthesis & quality report                            │
│ - Discuss and set direction                                    │
└──────────────────┬─────────────────────────────────────────────┘
                   ▼
┌────────────────────────────────────────────────────────────────┐
│ Phase 2: Breadth Expansion                                     │
│ - Claude designs targeted searches per angle                   │
│ - Identifies 8-12 high-value sources                           │
│ - Scrapes all sources with Firecrawl                           │
│ - Quality Evaluator evaluates each scrape                      │
│ - Quality Gate: Minimum 8 high/medium sources                  │
│ - If not met: retry with new searches                          │
│ - Identifies 3-5 core domains                                  │
└──────────────────┬─────────────────────────────────────────────┘
                   ▼
┌────────────────────────────────────────────────────────────────┐
│ User Checkpoint 2                                              │
│ - Review breadth findings with quality report                  │
│ - Prioritize domains for deep dive                             │
└──────────────────┬─────────────────────────────────────────────┘
                   ▼
┌────────────────────────────────────────────────────────────────┐
│ Phase 3: Depth Exploration                                     │
│ - Targeted searches on priority domains                        │
│ - Identifies 10-15 authoritative sources                       │
│ - Comprehensive Firecrawl scraping                             │
│ - Quality Evaluator evaluates each scrape                      │
│ - Quality Gate: Minimum 10 high/medium sources                 │
│ - If not met: retry with new searches                          │
│ - Domain synthesis with cross-domain analysis                  │
└──────────────────┬─────────────────────────────────────────────┘
                   ▼
┌────────────────────────────────────────────────────────────────┐
│ Final Report                                                   │
│ - Comprehensive synthesis                                      │
│ - Clear arguments with source references                       │
│ - Appendix listing all raw content files with quality ratings  │
└────────────────────────────────────────────────────────────────┘
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Firecrawl   │────▶│   Quality    │────▶│ Quality Gate │
│  Scrape URL  │     │  Validator   │     │    Check     │
└──────────────┘     └──────────────┘     └───────┬──────┘
                                                  │
          ┌─────────────────────┬─────────────────┤
          │                     │                 │
          ▼                     ▼                 ▼
    ┌───────────┐         ┌───────────┐     ┌───────────┐
    │   high    │         │  medium   │     │   low/    │
    │  quality  │         │  quality  │     │  failed   │
    └─────┬─────┘         └─────┬─────┘     └─────┬─────┘
          │                     │                 │
          ▼                     ▼                 ▼
    ┌───────────┐         ┌───────────┐     ┌───────────┐
    │  Keep &   │         │   Keep    │     │  Discard  │
    │ Prioritize│         │           │     │  + Retry  │
    └───────────┘         └───────────┘     └───────────┘
┌────────────────────────────────────────────────────────────┐
│                     Quality Gate Logic                     │
├────────────────────────────────────────────────────────────┤
│  Count high + medium quality sources                       │
│       │                                                    │
│       ▼                                                    │
│  >= Minimum required? ──YES──▶ Proceed to synthesis        │
│       │                                                    │
│       NO                                                   │
│       │                                                    │
│       ▼                                                    │
│  Document failed URLs                                      │
│  Design new search queries                                 │
│  Execute new Tavily search                                 │
│  Scrape new URLs                                           │
│  Validate new content                                      │
│       │                                                    │
│       └──────────────────────▶ Loop until gate met         │
└────────────────────────────────────────────────────────────┘
# Basic search
tavily search "query"
# Advanced search with more results
tavily search --depth advanced --max-results 10 "query"
# Extract content from URL
tavily extract "https://example.com"
# Scrape to markdown (always use -o flag)
firecrawl scrape "https://example.com" markdown \
-o ./raw-content/example-com.md
# Scrape only main content
firecrawl scrape "https://example.com" markdown \
--only-main-content \
-o ./raw-content/example-com.md
# Scrape multiple formats
firecrawl scrape "https://example.com" \
--format markdown,links,summary \
-o ./raw-content/example-com.json
# Crawl website
firecrawl crawl "https://example.com"
All research outputs are saved in research-output/:
research-output/
├── 01-initial-discovery/
│   ├── raw-results/             # Tavily search results (JSON)
│   ├── raw-content/             # Firecrawl outputs - REQUIRED
│   │   ├── source-01-domain-com.md
│   │   └── source-02-authority-org.md
│   ├── synthesis.md             # Phase findings & proposed directions
│   ├── user-discussion.md       # Direction decisions
│   └── quality-report.md        # Quality validation results
├── 02-breadth-expansion/
│   ├── raw-results/
│   ├── raw-content/             # Phase 2 scraped sources
│   │   ├── source-01.md
│   │   ├── source-02.md
│   │   └── source-03.md
│   ├── synthesis.md             # Breadth findings & domains
│   ├── user-discussion.md       # Domain prioritization
│   └── quality-report.md        # Quality validation results
├── 03-depth-exploration/
│   ├── raw-results/
│   ├── raw-content/             # Phase 3 deep scraped sources
│   │   ├── source-01.md
│   │   ├── source-02.md
│   │   └── source-03.md
│   ├── synthesis.md             # Deep domain insights
│   ├── user-discussion.md
│   └── quality-report.md        # Quality validation results
├── 04-final-report/
│   └── comprehensive-report.md  # Full synthesis with source refs
└── meta/
    ├── research-log.md          # Complete search history
    └── iteration-notes.md       # Feedback for improvement
Use descriptive names based on source:
huain-com-guzheng-article.md
guzheng-cn-composer-interview.md
people-cn-culture-report.md
nature-com-ai-research-paper.md
github-com-project-readme.md
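A naming helper along these lines can generate such filenames. This is a hypothetical illustration (source_filename is not part of the skill), shown producing names matching the examples above:

```python
import re
from urllib.parse import urlparse

def source_filename(url: str, descriptor: str) -> str:
    """Build a descriptive raw-content filename: the source domain
    (dots replaced by hyphens) plus a short content descriptor."""
    domain = urlparse(url).netloc.removeprefix("www.")
    slug = re.sub(r"[^a-z0-9]+", "-", descriptor.lower()).strip("-")
    return f"{domain.replace('.', '-')}-{slug}.md"

print(source_filename("https://www.nature.com/articles/xyz", "AI research paper"))
# → nature-com-ai-research-paper.md
```
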
Each scraped file includes metadata:
# Scraped Content
**Source**: https://example.com/article-path
**Scraped Date**: 2026-03-18
**Tavily Score**: 0.95
**Relevance**: High - authoritative source on topic
**Quality**: high (validated)
---
[Original content follows...]
Each phase includes a quality report:
# Content Quality Report - Phase X
## Summary
- Total scraped: 12
- High quality: 5
- Medium quality: 4
- Low quality: 1
- Failed: 2
## Quality Gate Status: PASSED
- Required: 5 high/medium
- Achieved: 9 high/medium
## Failed Sources (Retried)
- source-05-broken.md: 404 Not Found
- source-09-paywall.md: Login required
## Retry Actions
- Searched alternative sources for [topic]
- Found replacements: source-05-alt.md, source-09-alt.md
- All replacements passed quality validation
Every synthesis.md includes source references:
## Source References
- [source-01-domain-com.md](./raw-content/source-01-domain-com.md) - Key findings on X (quality: high)
- [source-02-authority-org.md](./raw-content/source-02-authority-org.md) - Data on Y (quality: medium)
- [source-03-github-com.md](./raw-content/source-03-github-com.md) - Implementation details (quality: high)
- Engage at checkpoints - Your input shapes the research direction
- Ask for rationale - Have Claude explain why certain sources were selected
- Challenge assumptions - If synthesis seems off, push back and refine
- Verify Firecrawl usage - Ensure high-value sources are being scraped
- Check quality validation - Ensure every scraped file is validated
- Monitor quality gates - Confirm sufficient sources before synthesis
- Check source attribution - Every synthesis should reference raw-content files
- Iterate if needed - Don't hesitate to revisit a phase
Topic: "Compare React server components vs traditional SSR"
Phase 1: Map the landscape - RSC architecture, SSR patterns, comparison articles
- Scraped: React docs, Vercel blog posts, comparison articles
- Validated: 6 high/medium quality sources (target: 5)
- Checkpoint 1: User wants focus on performance and developer experience
Phase 2: Explore performance benchmarks, DX feedback, adoption patterns
- Scraped: Benchmark studies, GitHub discussions, case studies
- Validated: 10 high/medium quality sources (target: 8)
- Checkpoint 2: User prioritizes: performance data and migration stories
Phase 3: Deep dive into benchmark studies and case studies
- Scraped: Detailed benchmark reports, migration guides, official docs
- Validated: 12 high/medium quality sources (target: 10)
Report: Data-backed comparison with recommendations and full source attribution
Topic: "AI coding assistant market landscape 2024"
Phase 1: Identify key players, segments, recent developments
- Scraped: Market reports, company blogs, analyst reviews
- Validated: 8 high/medium quality sources (target: 5)
- Checkpoint 1: User wants focus on enterprise adoption
Phase 2: Explore enterprise case studies, pricing, ROI analyses
- Scraped: Enterprise case studies, pricing pages, ROI calculators
- Validated: 12 high/medium quality sources (target: 8)
- Checkpoint 2: User prioritizes: security and integration challenges
Phase 3: Deep research on security audits and integration patterns
- Scraped: Security whitepapers, integration docs, compliance reports
- Validated: 15 high/medium quality sources (target: 10)
Report: Market analysis with security-focused recommendations
| Mistake | Fix |
|---|---|
| Only using Tavily, not Firecrawl | MUST scrape high-value URLs with Firecrawl after every search phase |
| Not saving raw content | ALWAYS save scraped markdown to raw-content/ directory |
| Skipping content validation | ALWAYS validate with Quality Evaluator after scraping |
| Not meeting quality gates | Loop with new searches until minimum sources achieved |
| Immediate synthesis without review | Review scraped AND validated content before synthesizing |
| Losing source attribution | Reference specific scraped files in synthesis.md |
| Scoring threshold too low | Only scrape sources with score > 0.75 or clear authority |
| Not creating raw-content directory | Create directory structure BEFORE starting research |
| Skipping user checkpoints | Always pause for direction - user's input shapes quality |
| Too many searches without synthesis | Stop to analyze patterns every 3-5 searches |
| Rushing to final report | Depth exploration often reveals critical insights |
MIT - Feel free to adapt for your own use.