diff --git a/README.md b/README.md
index a92f9dc..31bf0a0 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # PURL2SRC - Package URL to Source Download URLs
 
-[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
+[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
 [![PyPI version](https://img.shields.io/pypi/v/purl2src.svg)](https://pypi.org/project/purl2src/)
 
diff --git a/docs/api.md b/docs/api.md
new file mode 100644
index 0000000..c3f9f2f
--- /dev/null
+++ b/docs/api.md
@@ -0,0 +1,478 @@
+# PURL2SRC - API Reference
+
+## Table of Contents
+- [Overview](#overview)
+- [Core Functions](#core-functions)
+- [Classes](#classes)
+- [Data Types](#data-types)
+- [Exceptions](#exceptions)
+- [Examples](#examples)
+
+## Overview
+
+The PURL2SRC Python API provides programmatic access to PURL resolution functionality.
+
+## Core Functions
+
+### `get_download_url(purl, validate=False, timeout=30)`
+
+Resolves a PURL to its download URL.
+
+**Parameters:**
+- `purl` (str): Package URL to resolve
+- `validate` (bool): Whether to check that the resolved URL is accessible
+- `timeout` (int): Timeout in seconds for validation
+
+**Returns:**
+- `ResolvedPackage`: Object containing resolution results
+
+**Raises:**
+- `InvalidPURLError`: If PURL format is invalid
+- `ResolutionError`: If resolution fails
+- `ValidationError`: If validation fails
+
+**Example:**
+```python
+from purl2src import get_download_url
+
+result = get_download_url("pkg:npm/express@4.17.1", validate=True)
+print(result.download_url)
+# https://registry.npmjs.org/express/-/express-4.17.1.tgz
+```
+
+### `process_purls(purls, validate=False, parallel=True)`
+
+Processes multiple PURLs in a batch.
+
+**Parameters:**
+- `purls` (List[str]): List of PURLs to process
+- `validate` (bool): Whether to validate URLs
+- `parallel` (bool): Process in parallel for speed
+
+**Returns:**
+- `List[ResolvedPackage]`: List of resolution results
+
+**Example:**
+```python
+from purl2src import process_purls
+
+purls = [
+    "pkg:npm/express@4.17.1",
+    "pkg:pypi/requests@2.28.0"
+]
+
+results = process_purls(purls, validate=True)
+for result in results:
+    print(f"{result.purl} -> {result.download_url}")
+```
+
+### `validate_purl(purl)`
+
+Validates PURL format without resolving.
+
+**Parameters:**
+- `purl` (str): PURL to validate
+
+**Returns:**
+- `bool`: True if valid
+
+**Raises:**
+- `InvalidPURLError`: If format is invalid
+
+**Example:**
+```python
+from purl2src import InvalidPURLError, validate_purl
+
+try:
+    validate_purl("pkg:npm/express@4.17.1")
+    print("Valid PURL")
+except InvalidPURLError as e:
+    print(f"Invalid: {e}")
+```
+
+## Classes
+
+### `PURLResolver`
+
+Main class for PURL resolution with configuration options.
+
+```python
+from purl2src import PURLResolver
+
+resolver = PURLResolver(
+    validate=True,
+    timeout=30,
+    cache_enabled=True,
+    max_retries=3
+)
+```
+
+#### Methods
+
+##### `resolve(purl)`
+
+Resolves a single PURL.
+
+**Parameters:**
+- `purl` (str): PURL to resolve
+
+**Returns:**
+- `ResolvedPackage`: Resolution result
+
+**Example:**
+```python
+result = resolver.resolve("pkg:npm/express@4.17.1")
+```
+
+##### `resolve_batch(purls)`
+
+Resolves multiple PURLs.
+
+**Parameters:**
+- `purls` (List[str]): List of PURLs
+
+**Returns:**
+- `List[ResolvedPackage]`: Resolution results
+
+**Example:**
+```python
+results = resolver.resolve_batch([
+    "pkg:npm/express@4.17.1",
+    "pkg:pypi/django@4.0.0"
+])
+```
+
+##### `set_strategy(strategy)`
+
+Sets resolution strategy.
+
+**Parameters:**
+- `strategy` (ResolutionStrategy): Strategy to use
+
+**Options:**
+- `ResolutionStrategy.DIRECT`: Direct URL construction only
+- `ResolutionStrategy.REGISTRY`: Registry API queries only
+- `ResolutionStrategy.FALLBACK`: Local package managers only
+- `ResolutionStrategy.ALL`: Try all strategies (default)
+
+**Example:**
+```python
+from purl2src import ResolutionStrategy
+
+resolver.set_strategy(ResolutionStrategy.DIRECT)
+```
+
+### `ResolvedPackage`
+
+Result object from PURL resolution.
+
+**Attributes:**
+- `purl` (str): Original PURL
+- `download_url` (str): Resolved download URL
+- `ecosystem` (str): Package ecosystem
+- `name` (str): Package name
+- `version` (str): Package version
+- `namespace` (str): Package namespace (if any)
+- `validated` (bool): Whether URL was validated
+- `resolution_method` (str): Method used for resolution
+- `metadata` (dict): Additional package metadata
+
+**Example:**
+```python
+result = get_download_url("pkg:npm/express@4.17.1")
+
+print(f"Package: {result.name}")
+print(f"Version: {result.version}")
+print(f"URL: {result.download_url}")
+print(f"Method: {result.resolution_method}")
+```
+
+### `PackageRegistry`
+
+Interface to package registries.
+
+```python
+from purl2src import PackageRegistry
+
+registry = PackageRegistry("npm")
+```
+
+#### Methods
+
+##### `get_package_info(name, version)`
+
+Gets package information from registry.
+
+**Parameters:**
+- `name` (str): Package name
+- `version` (str): Package version
+
+**Returns:**
+- `dict`: Package metadata
+
+**Example:**
+```python
+info = registry.get_package_info("express", "4.17.1")
+print(info["dist"]["tarball"])
+```
+
+## Data Types
+
+### `ResolutionStrategy`
+
+Enum for resolution strategies.
+
+```python
+from purl2src import ResolutionStrategy
+
+ResolutionStrategy.DIRECT    # Direct URL construction
+ResolutionStrategy.REGISTRY  # Registry API queries
+ResolutionStrategy.FALLBACK  # Local package managers
+ResolutionStrategy.ALL       # Try all methods
+```
+
+### `Ecosystem`
+
+Enum for supported ecosystems.
+
+```python
+from purl2src import Ecosystem
+
+Ecosystem.NPM      # Node.js packages
+Ecosystem.PYPI     # Python packages
+Ecosystem.MAVEN    # Java packages
+Ecosystem.CARGO    # Rust packages
+Ecosystem.NUGET    # .NET packages
+Ecosystem.GITHUB   # GitHub repositories
+Ecosystem.GEM      # Ruby gems
+Ecosystem.GOLANG   # Go modules
+Ecosystem.CONDA    # Conda packages
+Ecosystem.GENERIC  # Generic packages
+```
+
+### `PURLComponents`
+
+Parsed PURL components.
+
+```python
+from purl2src import parse_purl
+
+components = parse_purl("pkg:npm/@scope/name@1.0.0?qualifier=value")
+
+print(components.type)        # "npm"
+print(components.namespace)   # "@scope"
+print(components.name)        # "name"
+print(components.version)     # "1.0.0"
+print(components.qualifiers)  # {"qualifier": "value"}
+```
+
+## Exceptions
+
+### `PURLError`
+
+Base exception for all PURL-related errors.
+
+### `InvalidPURLError`
+
+Raised when PURL format is invalid.
+
+```python
+from purl2src import InvalidPURLError, get_download_url
+
+try:
+    get_download_url("invalid-purl")
+except InvalidPURLError as e:
+    print(f"Invalid PURL: {e}")
+```
+
+### `ResolutionError`
+
+Raised when PURL cannot be resolved.
+
+```python
+from purl2src import ResolutionError, get_download_url
+
+try:
+    get_download_url("pkg:npm/nonexistent@1.0.0")
+except ResolutionError as e:
+    print(f"Resolution failed: {e}")
+```
+
+### `ValidationError`
+
+Raised when URL validation fails.
+
+```python
+from purl2src import ValidationError, get_download_url
+
+try:
+    get_download_url("pkg:npm/private-package@1.0.0", validate=True)
+except ValidationError as e:
+    print(f"Validation failed: {e}")
+```
+
+### `UnsupportedEcosystemError`
+
+Raised for unsupported package ecosystems.
+
+```python
+from purl2src import UnsupportedEcosystemError, get_download_url
+
+try:
+    get_download_url("pkg:unknown/package@1.0.0")
+except UnsupportedEcosystemError as e:
+    print(f"Unsupported: {e}")
+```
+
+## Examples
+
+### Basic Usage
+
+```python
+from purl2src import get_download_url
+
+# Simple resolution
+result = get_download_url("pkg:npm/express@4.17.1")
+print(result.download_url)
+```
+
+### Batch Processing
+
+```python
+from purl2src import process_purls
+
+# Read PURLs from file, skipping blank lines
+with open("purls.txt") as f:
+    purls = [line.strip() for line in f if line.strip()]
+
+# Process all PURLs
+results = process_purls(purls, validate=True, parallel=True)
+
+# Save results
+import json
+with open("results.json", "w") as f:
+    json.dump([r.__dict__ for r in results], f, indent=2)
+```
+
+### Custom Configuration
+
+```python
+from purl2src import PURLResolver, ResolutionStrategy
+
+# Configure resolver
+resolver = PURLResolver(
+    validate=True,
+    timeout=60,
+    cache_enabled=True,
+    cache_dir="~/.purl2src/cache",
+    max_retries=5,
+    user_agent="MyApp/1.0"
+)
+
+# Use specific strategy
+resolver.set_strategy(ResolutionStrategy.REGISTRY)
+
+# Resolve
+result = resolver.resolve("pkg:pypi/django@4.0.0")
+```
+
+### Error Handling
+
+```python
+from purl2src import (
+    get_download_url,
+    InvalidPURLError,
+    ResolutionError,
+    ValidationError
+)
+
+def safe_resolve(purl):
+    try:
+        result = get_download_url(purl, validate=True)
+        return result.download_url
+    except InvalidPURLError:
+        return f"Invalid PURL format: {purl}"
+    except ResolutionError:
+        return f"Could not resolve: {purl}"
+    except ValidationError:
+        return f"URL not accessible: {purl}"
+    except Exception as e:
+        return f"Unexpected error: {e}"
+
+url = safe_resolve("pkg:npm/express@4.17.1")
+print(url)
+```
+
+### Registry Direct Access
+
+```python
+from purl2src import PackageRegistry
+
+# Access npm registry directly
+npm_registry = PackageRegistry("npm")
+info = npm_registry.get_package_info("express", "4.17.1")
+
+print(f"Description: {info['description']}")
+print(f"License: {info['license']}")
+print(f"Download: {info['dist']['tarball']}")
+```
+
+### Async Support
+
+```python
+import asyncio
+from purl2src import async_get_download_url
+
+async def resolve_async():
+    tasks = [
+        async_get_download_url("pkg:npm/express@4.17.1"),
+        async_get_download_url("pkg:pypi/django@4.0.0"),
+        async_get_download_url("pkg:cargo/serde@1.0.0")
+    ]
+
+    results = await asyncio.gather(*tasks)
+    return results
+
+results = asyncio.run(resolve_async())
+for result in results:
+    print(f"{result.purl} -> {result.download_url}")
+```
+
+### Integration with SEMCL.ONE
+
+```python
+from purl2src import get_download_url
+import subprocess
+
+def download_and_analyze(purl):
+    # Resolve PURL to download URL
+    result = get_download_url(purl, validate=True)
+
+    # Download the package (check=True stops here if the download fails)
+    subprocess.run([
+        "wget", result.download_url,
+        "-O", f"{result.name}-{result.version}.tar.gz"
+    ], check=True)
+
+    # Extract and analyze with other SEMCL.ONE tools
+    subprocess.run([
+        "tar", "xzf", f"{result.name}-{result.version}.tar.gz"
+    ], check=True)
+
+    # Run ossnotices on extracted content
+    subprocess.run([
+        "ossnotices", f"{result.name}-{result.version}",
+        "-o", f"{result.name}-NOTICE.txt"
+    ], check=True)
+
+    return f"{result.name}-NOTICE.txt"
+
+notice_file = download_and_analyze("pkg:npm/express@4.17.1")
+print(f"Notices generated: {notice_file}")
+```
+
+## See Also
+
+- [User Guide](user-guide.md) - Complete usage documentation
+- [Examples](examples.md) - More code examples
+- [PURL Specification](https://github.com/package-url/purl-spec)
\ No newline at end of file
diff --git a/docs/examples.md b/docs/examples.md
new file mode 100644
index 0000000..4e7e6a5
--- /dev/null
+++ b/docs/examples.md
@@ -0,0 +1,599 @@
+# PURL2SRC - Examples
+
+## Table of Contents
+- [Basic Examples](#basic-examples)
+- [Ecosystem-Specific Examples](#ecosystem-specific-examples)
+- [Batch Processing](#batch-processing)
+- [Integration Workflows](#integration-workflows)
+- [Advanced Scenarios](#advanced-scenarios)
+
+## Basic Examples
+
+### Example 1: Simple PURL Resolution
+
+```bash
+# Resolve a single PURL
+purl2src "pkg:npm/express@4.17.1"
+
+# With validation
+purl2src "pkg:npm/express@4.17.1" --validate
+
+# JSON output
+purl2src "pkg:npm/express@4.17.1" --format json
+```
+
+### Example 2: Multiple Output Formats
+
+```bash
+# Text format (default)
+purl2src "pkg:pypi/requests@2.28.0"
+# Output: pkg:pypi/requests@2.28.0 -> https://files.pythonhosted.org/...
+
+# JSON format
+purl2src "pkg:pypi/requests@2.28.0" --format json
+# Output: {"purl": "pkg:pypi/requests@2.28.0", "download_url": "...", ...}
+
+# CSV format
+purl2src "pkg:pypi/requests@2.28.0" --format csv
+# Output: "pkg:pypi/requests@2.28.0","https://files.pythonhosted.org/..."
+```
+
+## Ecosystem-Specific Examples
+
+### NPM (Node.js)
+
+```bash
+# Regular package
+purl2src "pkg:npm/lodash@4.17.21"
+
+# Scoped package
+purl2src "pkg:npm/@angular/core@14.0.0"
+
+# Beta version
+purl2src "pkg:npm/typescript@4.8.0-beta"
+
+# With specific registry
+purl2src "pkg:npm/express@4.17.1?registry=https://npm.pkg.github.com"
+```
+
+### PyPI (Python)
+
+```bash
+# Standard package
+purl2src "pkg:pypi/numpy@1.23.0"
+
+# Pre-release version
+purl2src "pkg:pypi/scipy@1.9.0rc1"
+
+# Package with hyphen
+purl2src "pkg:pypi/django-rest-framework@3.13.0"
+```
+
+### Maven (Java)
+
+```bash
+# Basic artifact
+purl2src "pkg:maven/org.springframework/spring-core@5.3.20"
+
+# With classifier for sources
+purl2src "pkg:maven/org.apache.commons/commons-lang3@3.12.0?classifier=sources"
+
+# With type specification
+purl2src "pkg:maven/org.junit.jupiter/junit-jupiter@5.8.2?type=pom"
+
+# Android library
+purl2src "pkg:maven/com.google.android.material/material@1.6.0"
+```
+
+### Cargo (Rust)
+
+```bash
+# Popular crates
+purl2src "pkg:cargo/serde@1.0.140"
+purl2src "pkg:cargo/tokio@1.20.0"
+purl2src "pkg:cargo/async-trait@0.1.56"
+```
+
+### NuGet (.NET)
+
+```bash
+# Microsoft packages
+purl2src "pkg:nuget/Microsoft.Extensions.Logging@6.0.0"
+
+# Popular libraries
+purl2src "pkg:nuget/Newtonsoft.Json@13.0.1"
+purl2src "pkg:nuget/AutoMapper@11.0.0"
+```
+
+### GitHub
+
+```bash
+# Release by tag
+purl2src "pkg:github/facebook/react@v18.2.0"
+
+# Specific commit
+purl2src "pkg:github/torvalds/linux@5f9e832c1370"
+
+# Branch reference
+purl2src "pkg:github/nodejs/node@main"
+```
+
+### RubyGems
+
+```bash
+# Rails framework
+purl2src "pkg:gem/rails@7.0.3"
+
+# Popular gems
+purl2src "pkg:gem/devise@4.8.1"
+purl2src "pkg:gem/sidekiq@6.5.0"
+```
+
+### Go Modules
+
+```bash
+# Standard library extension
+purl2src "pkg:golang/golang.org/x/crypto@v0.0.0-20220622213112-05595931fe9d"
+
+# Popular frameworks
+purl2src "pkg:golang/github.com/gin-gonic/gin@v1.8.1"
+purl2src "pkg:golang/github.com/gorilla/mux@v1.8.0"
+```
+
+### Conda
+
+```bash
+# With channel specification
+purl2src "pkg:conda/pandas@1.4.3?channel=conda-forge&subdir=linux-64"
+
+# With build string
+purl2src "pkg:conda/tensorflow@2.9.1?channel=anaconda&build=gpu_py39h8c0d9a2_0"
+```
+
+## Batch Processing
+
+### File-Based Processing
+
+Create a file `purls.txt`:
+```text
+pkg:npm/express@4.17.1
+pkg:npm/@angular/core@14.0.0
+pkg:pypi/django@4.0.0
+pkg:pypi/requests@2.28.0
+pkg:maven/org.springframework.boot/spring-boot@2.7.0
+pkg:cargo/serde@1.0.140
+pkg:gem/rails@7.0.3
+pkg:golang/github.com/gin-gonic/gin@v1.8.1
+```
+
+Process the file:
+```bash
+# Basic processing
+purl2src -f purls.txt
+
+# With validation and JSON output
+purl2src -f purls.txt --validate --format json -o results.json
+
+# CSV format for spreadsheet import
+purl2src -f purls.txt --format csv -o results.csv
+```
+
+### Shell Script for Downloading
+
+```bash
+#!/bin/bash
+# download_packages.sh
+
+OUTPUT_DIR="packages"
+mkdir -p "$OUTPUT_DIR"
+
+while IFS= read -r purl; do
+    echo "Processing: $purl"
+
+    # Get download URL (third field of "PURL -> URL")
+    url=$(purl2src "$purl" | awk '{print $3}')
+
+    if [ -n "$url" ]; then
+        # Extract filename from URL
+        filename=$(basename "$url")
+
+        # Download the package
+        wget -q "$url" -O "$OUTPUT_DIR/$filename"
+        echo " Downloaded: $filename"
+    else
+        echo " Failed to resolve"
+    fi
+done < purls.txt
+
+echo "Downloads complete. Files in $OUTPUT_DIR/"
+```
+
+### Python Script for Batch Processing
+
+```python
+#!/usr/bin/env python3
+"""batch_resolver.py - Resolve and download packages"""
+
+import json
+import subprocess
+import requests
+from pathlib import Path
+
+def resolve_and_download(purls_file, output_dir):
+    output_dir = Path(output_dir)
+    output_dir.mkdir(exist_ok=True)
+
+    # Resolve all PURLs
+    result = subprocess.run(
+        ["purl2src", "-f", purls_file, "--format", "json"],
+        capture_output=True,
+        text=True
+    )
+
+    packages = json.loads(result.stdout)
+
+    for package in packages:
+        if package.get("download_url"):
+            print(f"Downloading {package['name']}@{package['version']}...")
+
+            # Download file (timeout avoids hanging on a stalled server)
+            response = requests.get(package["download_url"], timeout=30)
+            filename = package["download_url"].split("/")[-1]
+
+            # Save file
+            output_file = output_dir / filename
+            output_file.write_bytes(response.content)
+
+            print(f" Saved: {filename}")
+
+if __name__ == "__main__":
+    resolve_and_download("purls.txt", "downloads")
+```
+
+## Integration Workflows
+
+### CI/CD Pipeline Integration
+
+#### GitHub Actions
+
+```yaml
+name: Download Dependencies
+
+on:
+  push:
+    paths:
+      - 'dependencies.txt'
+
+jobs:
+  download:
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v3
+
+    - name: Install purl2src
+      run: pip install purl2src
+
+    - name: Resolve PURLs
+      run: |
+        purl2src -f dependencies.txt \
+          --validate \
+          --format json \
+          -o resolved-urls.json
+
+    - name: Download packages
+      run: |
+        mkdir -p packages
+        cat resolved-urls.json | \
+          jq -r '.[] | .download_url' | \
+          xargs -I {} wget {} -P packages/
+
+    - name: Upload artifacts
+      uses: actions/upload-artifact@v3
+      with:
+        name: packages
+        path: packages/
+```
+
+#### Jenkins Pipeline
+
+```groovy
+pipeline {
+    agent any
+
+    stages {
+        stage('Setup') {
+            steps {
+                sh 'pip install purl2src'
+            }
+        }
+
+        stage('Resolve Dependencies') {
+            steps {
+                sh '''
+                    purl2src -f dependencies.txt \
+                        --validate \
+                        --format json \
+                        -o resolved.json
+                '''
+            }
+        }
+
+        stage('Download Packages') {
+            steps {
+                sh '''
+                    mkdir -p packages
+                    cat resolved.json | \
+                    jq -r '.[] | .download_url' | \
+                    while read url; do
+                        wget "$url" -P packages/
+                    done
+                '''
+            }
+        }
+
+        stage('Archive') {
+            steps {
+                archiveArtifacts artifacts: 'packages/*'
+            }
+        }
+    }
+}
+```
+
+### Docker Integration
+
+```dockerfile
+# Dockerfile
+FROM python:3.9-slim
+
+# Install purl2src
+RUN pip install purl2src
+
+# Copy PURLs list
+COPY purls.txt /app/
+
+WORKDIR /app
+
+# Resolve and download packages
+RUN purl2src -f purls.txt --format json -o resolved.json && \
+    mkdir -p packages && \
+    apt-get update && apt-get install -y wget jq && \
+    cat resolved.json | jq -r '.[] | .download_url' | \
+    xargs -I {} wget {} -P packages/
+
+# Continue with your application setup
+COPY . /app
+```
+
+### Makefile Integration
+
+```makefile
+# Makefile
+.PHONY: deps update-deps clean-deps download-npm download-pypi
+
+DEPS_FILE = dependencies.txt
+DEPS_DIR = vendor
+
+# Resolve and download dependencies
+deps: $(DEPS_DIR)
+	@echo "Dependencies up to date"
+
+$(DEPS_DIR): $(DEPS_FILE)
+	@echo "Resolving dependencies..."
+	@purl2src -f $(DEPS_FILE) --validate --format json -o resolved.json
+	@echo "Downloading packages..."
+	@mkdir -p $(DEPS_DIR)
+	@cat resolved.json | jq -r '.[] | .download_url' | \
+		xargs -I {} sh -c 'wget -q {} -P $(DEPS_DIR)/ && echo " Downloaded: $$(basename {})"'
+	@touch $(DEPS_DIR)
+
+# Update dependency URLs
+update-deps:
+	@purl2src -f $(DEPS_FILE) --validate --format json -o resolved.json
+	@echo "Updated resolved.json"
+
+# Clean downloaded dependencies
+clean-deps:
+	@rm -rf $(DEPS_DIR) resolved.json
+	@echo "Cleaned dependencies"
+
+# Download specific ecosystem
+download-npm:
+	@grep "pkg:npm" $(DEPS_FILE) | \
+		purl2src -f - --format json | \
+		jq -r '.[] | .download_url' | \
+		xargs -I {} wget {} -P $(DEPS_DIR)/npm/
+
+download-pypi:
+	@grep "pkg:pypi" $(DEPS_FILE) | \
+		purl2src -f - --format json | \
+		jq -r '.[] | .download_url' | \
+		xargs -I {} wget {} -P $(DEPS_DIR)/pypi/
+```
+
+## Advanced Scenarios
+
+### Mirror Creation
+
+```bash
+#!/bin/bash
+# create_mirror.sh - Create local package mirror
+
+MIRROR_DIR="/var/packages/mirror"
+PURLS_FILE="all-dependencies.txt"
+
+# Create directory structure
+mkdir -p "$MIRROR_DIR"/{npm,pypi,maven,cargo,gem}
+
+# Process each ecosystem separately
+for ecosystem in npm pypi maven cargo gem; do
+    echo "Processing $ecosystem packages..."
+
+    grep "pkg:$ecosystem" "$PURLS_FILE" | \
+    purl2src -f - --validate --format json | \
+    jq -r '.[] | .download_url' | \
+    while IFS= read -r url; do
+        filename=$(basename "$url")
+        wget -q "$url" -O "$MIRROR_DIR/$ecosystem/$filename"
+        echo " $ecosystem/$filename"
+    done
+done
+
+# Create index (excluding the index file itself from the count)
+find "$MIRROR_DIR" -type f ! -name index.txt > "$MIRROR_DIR/index.txt"
+echo "Mirror created with $(wc -l < "$MIRROR_DIR/index.txt") packages"
+```
+
+### License Compliance Check
+
+```python
+#!/usr/bin/env python3
+"""compliance_check.py - Download and check licenses"""
+
+import json
+import subprocess
+import tempfile
+import zipfile
+from pathlib import Path
+
+def check_package_license(purl):
+    # Resolve PURL
+    result = subprocess.run(
+        ["purl2src", purl, "--format", "json"],
+        capture_output=True,
+        text=True
+    )
+
+    package_info = json.loads(result.stdout)[0]
+
+    # Download package
+    with tempfile.NamedTemporaryFile(suffix=".tar.gz") as tmp:
+        subprocess.run(["wget", "-q", package_info["download_url"], "-O", tmp.name])
+
+        # Extract and look for license
+        # (simplified - actual implementation would handle various formats)
+        license_found = "Unknown"
+
+        # Run ossnotices on extracted content
+        result = subprocess.run(
+            ["ossnotices", tmp.name, "--format", "json"],
+            capture_output=True,
+            text=True
+        )
+
+        if result.returncode == 0:
+            notices = json.loads(result.stdout)
+            if notices.get("packages"):
+                license_found = notices["packages"][0].get("license", "Unknown")
+
+    return {
+        "package": f"{package_info['name']}@{package_info['version']}",
+        "license": license_found
+    }
+
+# Check all packages, skipping blank lines
+with open("purls.txt") as f:
+    purls = [line.strip() for line in f if line.strip()]
+
+results = []
+for purl in purls:
+    print(f"Checking {purl}...")
+    results.append(check_package_license(purl))
+
+# Report
+print("\nLicense Report:")
+print("-" * 40)
+for result in results:
+    print(f"{result['package']}: {result['license']}")
+```
+
+### Dependency Graph Building
+
+```python
+#!/usr/bin/env python3
+"""dep_graph.py - Build dependency graph from PURLs"""
+
+import json
+import subprocess
+import networkx as nx
+import matplotlib.pyplot as plt
+
+def resolve_purl(purl):
+    result = subprocess.run(
+        ["purl2src", purl, "--format", "json"],
+        capture_output=True,
+        text=True
+    )
+    return json.loads(result.stdout)[0]
+
+# Build graph
+G = nx.DiGraph()
+
+# Read main dependencies, skipping blank lines
+with open("purls.txt") as f:
+    main_deps = [line.strip() for line in f if line.strip()]
+
+# Add nodes
+for purl in main_deps:
+    info = resolve_purl(purl)
+    node_id = f"{info['name']}@{info['version']}"
+    G.add_node(node_id, ecosystem=info['ecosystem'])
+
+# Add edges (simplified - actual deps would come from package metadata)
+# This is just for visualization
+if len(G.nodes) > 1:
+    nodes = list(G.nodes)
+    for node in nodes:
+        G.add_edge("root", node)
+
+# Visualize
+pos = nx.spring_layout(G)
+nx.draw(G, pos, with_labels=True, node_color='lightblue',
+        node_size=1000, font_size=8, arrows=True)
+plt.savefig("dependency_graph.png")
+print("Dependency graph saved as dependency_graph.png")
+```
+
+### Automated Updates
+
+```bash
+#!/bin/bash
+# check_updates.sh - Check for package updates
+
+PURLS_FILE="dependencies.txt"
+UPDATES_FILE="available_updates.txt"
+
+> "$UPDATES_FILE"
+
+while IFS= read -r purl; do
+    # Extract package and version
+    pkg=$(echo "$purl" | sed 's/@[^@]*$//')
+    current_version=$(echo "$purl" | sed 's/.*@//')
+
+    # Get latest version (simplified - would need registry queries)
+    latest_purl="${pkg}@latest"
+
+    echo "Checking $pkg..."
+
+    # Try to resolve latest
+    if latest_url=$(purl2src "$latest_purl" 2>/dev/null | awk '{print $3}'); then
+        if [ -n "$latest_url" ]; then
+            echo "$purl -> $latest_purl" >> "$UPDATES_FILE"
+        fi
+    fi
+done < "$PURLS_FILE"
+
+if [ -s "$UPDATES_FILE" ]; then
+    echo "Updates available:"
+    cat "$UPDATES_FILE"
+else
+    echo "All packages up to date"
+fi
+```
+
+## See Also
+
+- [User Guide](user-guide.md) - Complete usage documentation
+- [API Reference](api.md) - Python API documentation
+- [SEMCL.ONE Integration](https://semcl.one) - Ecosystem tools
\ No newline at end of file
diff --git a/docs/user-guide.md b/docs/user-guide.md
new file mode 100644
index 0000000..7b21282
--- /dev/null
+++ b/docs/user-guide.md
@@ -0,0 +1,352 @@
+# PURL2SRC - User Guide
+
+## Table of Contents
+- [Introduction](#introduction)
+- [Installation](#installation)
+- [Understanding PURLs](#understanding-purls)
+- [Basic Usage](#basic-usage)
+- [Advanced Features](#advanced-features)
+- [Supported Ecosystems](#supported-ecosystems)
+- [Troubleshooting](#troubleshooting)
+
+## Introduction
+
+PURL2SRC translates Package URLs (PURLs) into validated download URLs for source code artifacts. It provides a reliable way to retrieve source code across multiple package ecosystems.
+
+### Key Concepts
+
+- **PURL (Package URL)**: A standardized format for identifying packages across ecosystems
+- **Download URL**: Direct link to download the source code artifact
+- **Resolution Strategy**: Three-level approach to find download URLs
+
+## Installation
+
+### From PyPI
+
+```bash
+pip install purl2src
+```
+
+### From Source
+
+```bash
+git clone https://github.com/SemClone/purl2src.git
+cd purl2src
+pip install -e .
+```
+
+### Verify Installation
+
+```bash
+purl2src --version
+```
+
+## Understanding PURLs
+
+### PURL Format
+
+A PURL follows this general format:
+```
+pkg:ECOSYSTEM/NAMESPACE/NAME@VERSION?QUALIFIERS#SUBPATH
+```
+
+Examples:
+- `pkg:npm/express@4.17.1` - NPM package
+- `pkg:pypi/django@4.0.0` - Python package
+- `pkg:maven/org.apache.commons/commons-lang3@3.12.0` - Maven package
+- `pkg:github/facebook/react@v18.0.0` - GitHub repository
+
+### Components
+
+- **Ecosystem**: Package type (npm, pypi, maven, etc.)
+- **Namespace**: Optional grouping (e.g., org.apache.commons)
+- **Name**: Package name
+- **Version**: Package version
+- **Qualifiers**: Optional key-value pairs
+- **Subpath**: Optional path within package
+
+## Basic Usage
+
+### Single PURL Resolution
+
+```bash
+# Basic usage - returns download URL
+purl2src "pkg:npm/express@4.17.1"
+
+# Output:
+# pkg:npm/express@4.17.1 -> https://registry.npmjs.org/express/-/express-4.17.1.tgz
+```
+
+### With Validation
+
+```bash
+# Verify the download URL is accessible
+purl2src "pkg:pypi/requests@2.28.0" --validate
+```
+
+### Different Output Formats
+
+```bash
+# JSON output
+purl2src "pkg:npm/express@4.17.1" --format json
+
+# CSV output
+purl2src "pkg:npm/express@4.17.1" --format csv
+```
+
+## Advanced Features
+
+### Batch Processing
+
+Create a file with multiple PURLs:
+
+```text
+# purls.txt
+pkg:npm/express@4.17.1
+pkg:pypi/django@4.0.0
+pkg:maven/org.apache.commons/commons-lang3@3.12.0
+```
+
+Process the file:
+
+```bash
+# Process batch with default text output
+purl2src -f purls.txt
+
+# Save results to JSON file
+purl2src -f purls.txt --format json --output results.json
+
+# Save as CSV
+purl2src -f purls.txt --format csv --output results.csv
+```
+
+### Resolution Strategy
+
+PURL2SRC uses a three-level resolution strategy:
+
+1. **Direct URL Construction**: Uses known patterns for each ecosystem
+2. **Registry API Queries**: Queries package registries for metadata
+3. **Local Fallback**: Uses local package managers if available
+
+### Validation Options
+
+```bash
+# Validate all URLs (slower but ensures accessibility)
+purl2src -f purls.txt --validate
+
+# Skip validation for faster processing
+purl2src -f purls.txt --no-validate
+```
+
+## Supported Ecosystems
+
+### NPM (Node.js)
+
+```bash
+# Regular package
+purl2src "pkg:npm/express@4.17.1"
+
+# Scoped package
+purl2src "pkg:npm/@angular/core@12.0.0"
+
+# With dist-tag
+purl2src "pkg:npm/react@latest"
+```
+
+### PyPI (Python)
+
+```bash
+# Regular package
+purl2src "pkg:pypi/requests@2.28.0"
+
+# Another package
+purl2src "pkg:pypi/numpy@1.23.0"
+```
+
+### Maven (Java)
+
+```bash
+# Basic artifact
+purl2src "pkg:maven/org.apache.commons/commons-lang3@3.12.0"
+
+# With classifier
+purl2src "pkg:maven/org.apache.xmlgraphics/batik-anim@1.9.1?classifier=sources"
+
+# With type
+purl2src "pkg:maven/org.springframework/spring-core@5.3.20?type=jar"
+```
+
+### Cargo (Rust)
+
+```bash
+purl2src "pkg:cargo/serde@1.0.140"
+purl2src "pkg:cargo/tokio@1.20.0"
+```
+
+### NuGet (.NET)
+
+```bash
+purl2src "pkg:nuget/Newtonsoft.Json@13.0.1"
+purl2src "pkg:nuget/Microsoft.Extensions.Logging@6.0.0"
+```
+
+### GitHub
+
+```bash
+# Release tag
+purl2src "pkg:github/facebook/react@v18.0.0"
+
+# Commit hash
+purl2src "pkg:github/torvalds/linux@5f9e832c1370"
+```
+
+### RubyGems
+
+```bash
+purl2src "pkg:gem/rails@7.0.0"
+purl2src "pkg:gem/bundler@2.3.0"
+```
+
+### Go Modules
+
+```bash
+purl2src "pkg:golang/github.com/gin-gonic/gin@v1.8.0"
+purl2src "pkg:golang/golang.org/x/net@v0.0.0-20220127200216-cd36cc0744dd"
+```
+
+### Conda
+
+```bash
+# With channel and subdir
+purl2src "pkg:conda/numpy@1.23.0?channel=conda-forge&subdir=linux-64"
+
+# With build string
+purl2src "pkg:conda/python@3.9.0?build=h1234567_0"
+```
+
+### Generic
+
+```bash
+# With explicit download URL
+purl2src "pkg:generic/mypackage@1.0.0?download_url=https://example.com/mypackage-1.0.0.tar.gz"
+
+# With checksum validation
+purl2src "pkg:generic/mypackage@1.0.0?download_url=https://example.com/pkg.tar.gz&checksum=sha256:abcd1234..."
+```
+
+## Troubleshooting
+
+### Common Issues
+
+#### Invalid PURL Format
+
+**Error**: `Invalid PURL format`
+
+**Solution**: Ensure your PURL follows the correct format:
+- Starts with `pkg:`
+- Has ecosystem type
+- Includes package name
+- Has version with `@`
+
+#### Package Not Found
+
+**Error**: `Package not found in registry`
+
+**Solutions**:
+1. Verify package name and version exist
+2. Check ecosystem type is correct
+3. Try without validation flag
+
+#### Network Issues
+
+**Error**: `Connection timeout`
+
+**Solutions**:
+1. Check internet connection
+2. Try with `--timeout 60` for slower connections
+3. Use `--no-validate` to skip URL verification
+
+#### Validation Failures
+
+**Error**: `URL validation failed`
+
+**Solutions**:
+1. The package might be private or removed
+2. Try a different version
+3. Skip validation with `--no-validate`
+
+### Debug Mode
+
+For detailed troubleshooting:
+
+```bash
+# Enable verbose output
+purl2src "pkg:npm/express@4.17.1" --verbose
+
+# With debug logging
+PURL2SRC_DEBUG=1 purl2src "pkg:npm/express@4.17.1"
+```
+
+### Environment Variables
+
+```bash
+# Set timeout
+export PURL2SRC_TIMEOUT=60
+
+# Set output format
+export PURL2SRC_FORMAT=json
+
+# Enable debug mode
+export PURL2SRC_DEBUG=1
+```
+
+## Integration Examples
+
+### Shell Script
+
+```bash
+#!/bin/bash
+# download_sources.sh
+
+while IFS= read -r purl; do
+    url=$(purl2src "$purl" --no-validate | cut -d' ' -f3)
+    if [ -n "$url" ]; then
+        wget "$url" -P downloads/
+    fi
+done < purls.txt
+```
+
+### Makefile
+
+```makefile
+download-deps:
+	@mkdir -p sources
+	@purl2src -f purls.txt --format json | \
+		jq -r '.[] | .download_url' | \
+		xargs -I {} wget {} -P sources/
+```
+
+### CI/CD Pipeline
+
+```yaml
+# .github/workflows/download-sources.yml
+- name: Download source packages
+  run: |
+    pip install purl2src
+    purl2src -f purls.txt --validate --output urls.txt
+    cat urls.txt | cut -d' ' -f3 | xargs -n1 wget -P sources/
+```
+
+## Best Practices
+
+1. **Always validate in production**: Use `--validate` for critical workflows
+2. **Cache results**: Save outputs to avoid repeated API calls
+3. **Use batch processing**: More efficient than individual PURL resolution
+4. **Handle failures gracefully**: Some packages might not resolve
+5. **Keep PURLs updated**: Maintain accurate version information
+
+## See Also
+
+- [API Reference](api.md) - Python API documentation
+- [Examples](examples.md) - More usage examples
+- [PURL Specification](https://github.com/package-url/purl-spec) - Official PURL specification
\ No newline at end of file
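
Several shell snippets in this patch pull the URL out of the default text output with `awk '{print $3}'` or `cut -d' ' -f3`, which relies on each line having the `PURL -> URL` shape shown in the user guide. A minimal Python sketch of the same extraction follows. It assumes that line shape holds and that an unresolved PURL produces no ` -> ` line; both are assumptions about the CLI output, and `parse_text_output` is a hypothetical helper, not part of purl2src.

```python
from typing import Dict, Iterable


def parse_text_output(lines: Iterable[str]) -> Dict[str, str]:
    """Map each PURL to its URL for lines shaped like 'PURL -> URL'.

    Assumption: this mirrors the default text output shown in the user
    guide; lines without the ' -> ' separator are treated as unresolved
    and skipped.
    """
    resolved: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        purl, sep, url = line.partition(" -> ")
        if sep and purl.startswith("pkg:") and url.strip():
            resolved[purl] = url.strip()
    return resolved


if __name__ == "__main__":
    sample = [
        "pkg:npm/express@4.17.1 -> https://registry.npmjs.org/express/-/express-4.17.1.tgz",
        "",
        "pkg:npm/nonexistent@1.0.0",
    ]
    for purl, url in parse_text_output(sample).items():
        print(f"{purl}\t{url}")
```

Splitting on the full ` -> ` separator rather than on whitespace keeps the parse working if a URL ever contains a space-like character, and skipping unmatched lines is one way to follow the "handle failures gracefully" practice listed above.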