The codebase scanner automatically detects your project languages and structure, making it easy to populate the Neo4J graph with your code.
🎯 New Feature: The scanner now automatically detects project languages using:
- Build file analysis (package.json, pom.xml, build.gradle, setup.py, pyproject.toml, etc.)
- Project metadata extraction (name, version, description, dependencies, frameworks)
- Multi-language project support (e.g., TypeScript frontend + Java backend)
- Sub-project detection for mono-repositories
- Fallback to file extension detection when build files aren't available
# Auto-detect languages and scan (recommended)
npm run scan /path/to/your/project
# Manual language specification (optional override)
npm run scan /path/to/your/project -- --languages typescript,javascript
# Include test files
npm run scan /path/to/your/project -- --include-tests
# Clear existing project data before scanning
npm run scan /path/to/your/project -- --clear-graph
# Clear ALL database data before scanning (all projects)
npm run scan /path/to/your/project -- --clear-all
# Exclude specific directories
npm run scan /path/to/your/project -- --exclude node_modules,dist,build🌐 New Feature: Scan repositories directly from GitHub, GitLab, and Bitbucket without local cloning:
# Public repositories
npm run scan https://github.com/owner/repo.git
npm run scan https://gitlab.com/owner/repo.git
npm run scan https://bitbucket.org/owner/repo.git
# Private repositories (requires authentication - see installation guide)
GITHUB_TOKEN=ghp_xxx npm run scan https://github.com/private/repo.git
# Specific branches or tags
npm run scan https://github.com/owner/repo.git -- --branch develop
npm run scan https://github.com/owner/repo.git -- --branch v2.0.0
# SSH URLs (requires SSH key setup)
npm run scan git@github.com:owner/repo.git
# With additional options
npm run scan https://github.com/owner/repo.git -- --languages typescript,java --analyzeUnderstanding Clear Options:
--clear-graph: Clears only the current project's data (useful for re-scanning a single project in a multi-project setup)--clear-all: Clears the entire database (all projects) - use when starting fresh or resolving data conflicts
# TypeScript project (detects from package.json)
npm run scan /path/to/react-app
# Output: ✅ TypeScript project detected, Framework: React
# Java project (detects from pom.xml or build.gradle)
npm run scan /path/to/spring-app
# Output: ✅ Java project detected, Build system: Maven, Framework: Spring Boot
# Python project (detects from setup.py or pyproject.toml)
npm run scan /path/to/django-app
# Output: ✅ Python project detected, Build system: Poetry, Framework: Django
# Multi-language project
npm run scan /path/to/fullstack-app
# Output: ✅ Multi-language project detected: TypeScript (frontend), Java (backend)
# Remote repository examples
npm run scan https://github.com/microsoft/vscode.git
# Output: ✅ TypeScript project detected, Build system: npm, Framework: Electron
npm run scan https://github.com/spring-projects/spring-boot.git
# Output: ✅ Java project detected, Build system: Gradle, Framework: Spring BootOnce connected to an AI tool, you can use:
Use the scan_dir tool to scan my current project at /path/to/project
Parameters for scan_dir:
directory_path: Path to your project rootlanguages: Array of languages (auto-detected if not specified)exclude_paths: Paths to exclude (auto-suggested based on detected languages)include_tests: Include test files (true/false, default based on project type)clear_existing: Clear existing graph data (true/false)max_depth: Maximum directory depth to scan (10)
Remote Repository Tools:
Use the scan_remote_repo tool to analyze https://github.com/owner/repo.git
Parameters for scan_remote_repo:
repository_url: Git repository URL (GitHub, GitLab, Bitbucket)branch: Branch name to scan (default: main/master)project_id: Custom project identifier (optional)languages: Array of languages (auto-detected if not specified)include_tests: Include test files (true/false)clear_existing: Clear existing graph data (true/false)use_cache: Use cached repository if available (true/false)shallow_clone: Use shallow clone for faster scanning (true/false)
Auto-Detection Features:
- Languages are automatically detected from build files and file extensions
- Project metadata (name, version, description) extracted from build files
- Framework detection (React, Spring Boot, Django, etc.) from dependencies
- Language-specific exclude paths automatically suggested
- Mono-repository and sub-project detection
- TypeScript (
.ts,.tsx) - JavaScript (
.js,.jsx) - Java (
.java) - Python (
.py) - C# (
.cs) - Coming soon
The scanner extracts:
Node Types:
- Classes and interfaces
- Methods and functions
- Fields and properties
- Packages and modules
- Enums and exceptions
Relationships:
- Inheritance (
extends) - Interface implementation (
implements) - Method calls (
calls) - Field references (
references) - Containment (
contains) - Package membership (
belongs_to)
Default Exclusions:
node_modules/dist/,build/,out/.git/- Test files (unless
--include-testsis specified) - Hidden files and directories (starting with
.)
Custom Exclusions:
# Exclude additional directories
npm run scan /path/to/project -- --exclude vendor,tmp,cacheInclude Test Files:
# Include test files in analysis
npm run scan /path/to/project -- --include-testsAutomatic Detection (Recommended): The scanner now uses intelligent language detection:
- Build File Analysis - Analyzes package.json, pom.xml, build.gradle, setup.py, pyproject.toml, etc.
- Metadata Extraction - Extracts project name, version, dependencies, and frameworks
- Multi-Language Support - Handles projects with multiple languages (e.g., frontend + backend)
- Sub-Project Detection - Identifies mono-repositories with multiple sub-projects
- File Extension Fallback - Uses file extensions when build files aren't available
Manual Override (When Needed):
# Override auto-detection for specific languages only
npm run scan /path/to/project -- --languages java,python
# Force TypeScript only (useful for mixed projects)
npm run scan /path/to/project -- --languages typescript# Limit scan depth to prevent deep recursion
npm run scan /path/to/project -- --max-depth 5=== CodeRAG Project Scanner ===
Scanning: /path/to/my-react-app
Project ID: my-react-app
🔍 Auto-detecting project structure...
✅ TypeScript project detected
📋 Project: my-react-app v1.2.0
🏗️ Framework: React
📦 Build system: npm
💡 Recommendations: Include test files, exclude node_modules, dist
📁 Scanning files...
TypeScript: 45 files
JavaScript: 12 files
📦 Processing entities...
Classes: 23
Interfaces: 8
Methods: 156
Fields: 89
🔗 Building relationships...
Inheritance: 12
Implementations: 15
Calls: 234
References: 178
✅ Scan completed successfully!
Project Summary:
- Project Name: my-react-app
- Version: 1.2.0
- Framework: React
- Total entities: 276
- Total relationships: 439
- Languages: TypeScript, JavaScript
- Scan duration: 2.3s
Common Issues:
- File Permission Errors:
⚠️ Warning: Cannot read file /path/to/file.ts (permission denied)
- Syntax Errors:
⚠️ Warning: Parse error in /path/to/file.ts:line 45
- Unsupported Files:
📁 Skipping unsupported file: /path/to/file.xml
Before Scanning:
- Ensure the project compiles successfully
- Fix any major syntax errors
- Consider excluding generated code directories
- Remove or exclude large vendor directories
New Project (First Time):
# Start with a clean database
npm run scan /path/to/project -- --clear-allRe-scanning Project:
# Update existing project data
npm run scan /path/to/project -- --clear-graphAdding to Multi-Project Database:
# Preserve other projects, add new one
npm run scan /path/to/new-projectLarge Projects:
# Limit depth and exclude unnecessary directories
npm run scan /path/to/large-project -- \
--max-depth 8 \
--exclude node_modules,dist,build,vendor,tmpIncremental Scanning:
- For large codebases, consider scanning modules separately
- Use project isolation to manage different components
- Monitor scan duration and adjust exclusions as needed
Automatic Multi-Language Detection:
# Auto-detects all languages in polyglot projects
npm run scan /path/to/fullstack-project
# Output: ✅ Multi-language project: TypeScript (frontend), Java (backend), Python (scripts)Manual Multi-Language Specification:
# Override auto-detection for specific languages
npm run scan /path/to/polyglot-project -- \
--languages typescript,java,python \
--include-testsMono-Repository Handling:
# Auto-detects sub-projects in mono-repositories
npm run scan /path/to/monorepo
# Output: 🏗️ Mono-repository detected with 3 sub-projects
# Suggestion: Consider scanning sub-projects separately for better organizationCheck Prerequisites:
# Verify Node.js version
node --version # Should be 18+
# Verify build
npm run build
# Check Neo4J connection
neo4j status # or check Docker containerCommon Solutions:
- Update Dependencies:
npm install
npm run build-
Check File Encoding:
- Ensure files are UTF-8 encoded
- Check for byte order marks (BOM)
-
Syntax Validation:
- Fix syntax errors in source files
- Consider excluding problematic files
Slow Scanning:
- Reduce Scope:
# Exclude large directories
npm run scan /path/to/project -- --exclude node_modules,dist,vendor- Limit Depth:
# Reduce maximum directory depth
npm run scan /path/to/project -- --max-depth 6- Language Filtering:
# Scan only specific languages
npm run scan /path/to/project -- --languages typescriptConnection Failures:
- Verify Neo4J is running
- Check
.envconfiguration - Test connection manually
Memory Issues:
- Increase Neo4J heap size
- Clear database before large scans
- Use
--clear-allfor fresh start
Authentication Failures:
# Error: 401 Unauthorized
# Solution: Check your authentication tokens in .env
GITHUB_TOKEN=ghp_xxx npm run scan https://github.com/private/repo.gitNetwork Issues:
# Error: Failed to clone repository
# Solutions:
# 1. Check internet connection
# 2. Verify repository URL
# 3. Try SSH URL if HTTPS fails
npm run scan git@github.com:owner/repo.gitBranch Not Found:
# Error: Branch 'feature-branch' not found
# Solution: Verify branch exists
npm run scan https://github.com/owner/repo.git -- --branch mainLarge Repository Timeouts:
# Solution: Use shallow clone for faster processing
npm run scan https://github.com/large/repo.git -- --shallow-cloneFor advanced users, you can modify scanner behavior by:
- Environment Variables:
# Set custom scan timeout
export SCANNER_TIMEOUT=300000 # 5 minutes
# Enable debug logging
export LOG_LEVEL=debug- Programmatic Usage:
// Use scanner in your own scripts
const { Scanner } = require('./build/scanner');
const scanner = new Scanner({
neo4jConfig: {
uri: 'bolt://localhost:7687',
user: 'neo4j',
password: 'password'
},
scanConfig: {
languages: ['typescript', 'java'],
excludePaths: ['node_modules', 'dist'],
includeTests: false,
maxDepth: 10
}
});
await scanner.scanDirectory('/path/to/project');# Scan multiple projects in sequence
for project in /path/to/projects/*; do
echo "Scanning $project"
npm run scan "$project"
done- Multi-Project Management - Manage multiple codebases
- Available Tools - Explore analysis tools
- Quality Metrics - Understand code quality analysis