Owly-dabs/doc-extractor

Docstring Extractor

A command-line tool that extracts docstrings from source code files across multiple programming languages using tree-sitter parsing.

🚀 Quick Start

  1. Clone the repository

    git clone https://github.com/yourusername/docstring-parser.git
    cd docstring-parser

  2. Install dependencies

    pip install -r requirements.txt

    This installs all required dependencies, including tree-sitter, tree-sitter-languages, and pydantic.

  3. Run the tool

    python3 main.py sample_inputs

    This extracts docstrings from every file in the sample_inputs directory that is written in a supported language and saves them to docstrings.json.

📖 Usage

python3 main.py <input_directory> [output_file] [--language LANG]

Arguments

  • input_directory: Path to the directory containing source files (required)
  • output_file: Path to the output JSON file (optional, defaults to docstrings.json)
  • --language LANG: Restrict extraction to a single language, for example python or cpp (optional; all supported languages are processed when omitted)
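
The argument layout above can be sketched with argparse. This is a minimal illustration, not the code in main.py: the function name build_parser is hypothetical, and the real tool may validate the language name or handle defaults differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented command line (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Extract docstrings from source files.",
    )
    # Required positional: directory containing the source files.
    parser.add_argument("input_directory")
    # Optional positional: falls back to docstrings.json when omitted.
    parser.add_argument("output_file", nargs="?", default="docstrings.json")
    # Optional filter: None is taken here to mean "all supported languages".
    parser.add_argument("--language", metavar="LANG", default=None)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["sample_inputs", "--language", "python"])
print(args.input_directory, args.output_file, args.language)
```

Making output_file an optional positional (nargs="?") is what allows the one-argument form python3 main.py sample_inputs to work while still accepting an explicit output path.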

Examples

# Extract from all languages in sample_inputs, output to docstrings.json
python3 main.py sample_inputs

# Extract only Python docstrings
python3 main.py sample_inputs python_docs.json --language python

# Extract from a custom directory
python3 main.py /path/to/your/codebase output.json

# Extract C++ docstrings only
python3 main.py sample_inputs cpp_docs.json --language cpp
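
After a run, the output file can be inspected from Python. The JSON schema is not described in this README, so the sketch below assumes only that the file is valid JSON and reports its top-level shape. The helper name summarize_output is hypothetical.

```python
import json
from pathlib import Path


def summarize_output(path: str = "docstrings.json") -> str:
    """Describe the top-level shape of the output file without assuming a schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return f"object with {len(data)} keys"
    if isinstance(data, list):
        return f"array with {len(data)} items"
    return f"scalar of type {type(data).__name__}"
```

Calling summarize_output("python_docs.json") after the second example above would show whether the tool wrote a list of entries or a keyed object, which is a reasonable first check before writing code against the output.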

🛠 Prerequisites

  • Python 3.11+
  • pip package manager

🏗 Development Setup

This project uses uv for dependency and virtual environment management.

1. Install uv

Install uv globally:

curl -LsSf https://astral.sh/uv/install.sh | sh

Or with Homebrew (macOS):

brew install uv

2. Clone the repository

git clone https://github.com/yourusername/docstring-parser.git
cd docstring-parser

3. Create a virtual environment

uv venv

4. Activate the environment

source .venv/bin/activate      # macOS/Linux
.venv\Scripts\activate         # Windows (PowerShell or CMD)

5. Install dependencies

uv pip sync requirements.txt

🌐 Supported Languages

  • Python
  • C
  • C++
  • Java
  • JavaScript
  • TypeScript
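
One way a tool like this could decide which files belong to which language is a lookup on the file extension. The sketch below is an assumption about the approach, not the project's actual logic: the EXTENSIONS table and both function names are hypothetical, and only the identifiers python and cpp appear in this README's examples (the other language identifiers are guesses).

```python
from pathlib import PurePath

# Assumed extension-to-language table; the real tool may recognise more extensions.
EXTENSIONS = {
    ".py": "python",
    ".c": "c",
    ".cpp": "cpp",
    ".java": "java",
    ".js": "javascript",
    ".ts": "typescript",
}


def language_for(filename):
    """Return the assumed language identifier for a file name, or None if unsupported."""
    return EXTENSIONS.get(PurePath(filename).suffix.lower())


def select_sources(filenames, language=None):
    """Keep supported files, optionally restricted to one language (like --language)."""
    selected = []
    for name in filenames:
        lang = language_for(name)
        if lang is not None and (language is None or lang == language):
            selected.append(name)
    return selected


files = ["app.py", "lib/util.cpp", "Main.java", "notes.txt"]
print(select_sources(files))
print(select_sources(files, language="cpp"))
```

Unsupported files such as notes.txt are dropped in both calls because they have no entry in the table.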

📁 Project Structure

docstring-parser/
├── main.py                 # CLI entry point
├── extractor_utils.py      # Utility functions for docstring extraction
├── datamodels.py           # Pydantic models for docstrings and symbols
├── logger.py               # Logging configuration
├── extractor/              # Language-specific extractors
├── sample_inputs/          # Sample code files for testing
├── requirements.txt        # Python dependencies
└── README.md               # This file

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test with sample inputs
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
