A command-line tool that extracts docstrings from source code files across multiple programming languages using tree-sitter parsing.
- Clone the repository
git clone https://github.com/yourusername/docstring-parser.git
cd docstring-parser
- Install dependencies
pip install -r requirements.txt
This installs all required dependencies including tree-sitter, tree-sitter-languages, and pydantic.
- Run the tool
python3 main.py sample_inputs
This extracts docstrings from all supported languages in the sample_inputs directory and saves them to docstrings.json.
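To make "extracting docstrings" concrete, here is a minimal sketch for Python sources only. It deliberately swaps in the standard-library ast module for tree-sitter so that it runs without any dependencies; it is not the project's implementation. The function name and the "symbol"/"docstring" record keys are hypothetical and may not match the actual layout of docstrings.json.

```python
import ast
import json


def extract_python_docstrings(source: str) -> list[dict]:
    """Collect docstrings from a module and its functions and classes.

    Stand-in for the tree-sitter based extractors: uses the stdlib ast module.
    """
    tree = ast.parse(source)
    documented = (ast.Module, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    records = []
    for node in ast.walk(tree):
        if isinstance(node, documented):
            doc = ast.get_docstring(node)
            if doc is not None:
                # Module nodes have no name attribute, so label them explicitly.
                name = getattr(node, "name", "<module>")
                # Hypothetical record shape, chosen for illustration only.
                records.append({"symbol": name, "docstring": doc})
    return records


SAMPLE = '''"""Module docs."""

def add(a, b):
    """Return the sum of a and b."""
    return a + b
'''

if __name__ == "__main__":
    print(json.dumps(extract_python_docstrings(SAMPLE), indent=2))
```

A tree-sitter based extractor would walk a concrete syntax tree instead, which is what lets one tool cover several languages rather than Python alone.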
python3 main.py <input_directory> [output_file] [--language LANG]
- input_directory: Path to the directory containing source files (required)
- output_file: Path to the output JSON file (optional, defaults to docstrings.json)
- --language LANG: Extract from a specific language only (optional)
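The argument layout above could be expressed with argparse roughly as follows. This is a sketch under assumptions, not the contents of main.py: the build_parser helper and the help strings are hypothetical, and only the argument names and the docstrings.json default come from the usage description.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented command-line interface."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Extract docstrings from source files.",
    )
    # Required positional argument.
    parser.add_argument("input_directory", help="Directory containing source files")
    # Optional positional argument; nargs="?" lets it fall back to the default.
    parser.add_argument(
        "output_file",
        nargs="?",
        default="docstrings.json",
        help="Path to the output JSON file",
    )
    # Optional filter; None is assumed here to mean "all supported languages".
    parser.add_argument(
        "--language",
        metavar="LANG",
        default=None,
        help="Extract from a specific language only",
    )
    return parser


if __name__ == "__main__":
    # An explicit argument list keeps the demo independent of sys.argv.
    args = build_parser().parse_args(
        ["sample_inputs", "python_docs.json", "--language", "python"]
    )
    print(args.input_directory, args.output_file, args.language)
```

Declaring output_file as an optional positional keeps the short form, python3 main.py sample_inputs, valid while still allowing an explicit output path.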
# Extract from all languages in sample_inputs, output to docstrings.json
python3 main.py sample_inputs
# Extract only Python docstrings
python3 main.py sample_inputs python_docs.json --language python
# Extract from a custom directory
python3 main.py /path/to/your/codebase output.json
# Extract C++ docstrings only
python3 main.py sample_inputs cpp_docs.json --language cpp
Requirements:
- Python 3.11+
- pip package manager
This project uses uv for dependency and virtual environment management.
Install uv globally:
curl -LsSf https://astral.sh/uv/install.sh | sh
Or with Homebrew (macOS):
brew install astral-sh/uv/uv
Clone the repository, create and activate a virtual environment, then install the dependencies:
git clone https://github.com/yourusername/docstring-parser.git
cd docstring-parser
uv venv
source .venv/bin/activate # macOS/Linux
.venv\Scripts\activate # Windows (PowerShell or CMD)
uv pip sync requirements.txt
Supported languages:
- Python
- C
- C++
- Java
- JavaScript
- TypeScript
docstring-parser/
├── main.py # CLI entry point
├── extractor_utils.py # Utility functions for docstring extraction
├── datamodels.py # Pydantic models for docstrings and symbols
├── logger.py # Logging configuration
├── extractor/ # Language-specific extractors
├── sample_inputs/ # Sample code files for testing
├── requirements.txt # Python dependencies
└── README.md # This file
- Fork the repository
- Create a feature branch
- Make your changes
- Test with sample inputs
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.