Merged
2 changes: 1 addition & 1 deletion .github/workflows/codeql.yml
@@ -55,7 +55,7 @@ jobs:
# your codebase is analyzed, see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/codeql-code-scanning-for-compiled-languages
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6

# Add any setup steps before running the `github/codeql-action/init` action.
# This includes steps like installing compilers or runtimes (`actions/setup-node`
114 changes: 113 additions & 1 deletion CHANGELOG.md
@@ -5,7 +5,119 @@ All notable changes to the pyUSPTO package will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.3.0] - TBD
## [0.4.5] - 2026-03-11

### Changed

- **Refactor**: Split `_make_request` into three typed request methods:
- `_get_model` — returns parsed model `T`
- `_get_json` — returns `dict[str, Any]`
- `_stream_request` — returns `requests.Response`
- Removed all `assert isinstance` runtime checks and `# type: ignore` annotations at 28 call sites
- `DocumentBag`, `StatusCodeSearchResponse`, and `PetitionDecisionDownloadResponse` now conform to `FromDictProtocol` (`include_raw_data` parameter added to `from_dict`)

### Fixed

- PTAB appeals integration test query (`Appeal` → `REGULAR` for `applicationTypeCategory`)
- Read the Docs build: updated Python from 3.10 to 3.14 to satisfy the Python `>=3.11` requirement of `myst-parser`

## [0.4.4] - 2026-03-08

### Added

- `get_patent(patent_number)` — lookup `PatentFileWrapper` by granted patent number
- `get_publication(publication_number)` — lookup by publication number
- `get_pct(pct_number)` — lookup by PCT application or publication number (auto-detected)
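One way the auto-detection in `get_pct` might distinguish application numbers from publication numbers; this helper is hypothetical (the real detection logic is internal to pyUSPTO), and the number formats assumed are the standard `PCT/CCyyyy/nnnnnn` and `WOyyyy/nnnnnn` patterns.

```python
import re


def detect_pct_kind(pct_number: str) -> str:
    """Guess whether a PCT number refers to an application or a publication.

    Hypothetical sketch -- not pyUSPTO's actual implementation.
    """
    normalized = pct_number.replace(" ", "").upper()
    # PCT application numbers: PCT/ + country code + year + serial
    if re.fullmatch(r"PCT/[A-Z]{2}\d{2,4}/\d{5,6}", normalized):
        return "application"
    # PCT publication numbers: WO + year + serial
    if re.fullmatch(r"WO\d{4}/?\d{6}", normalized):
        return "publication"
    raise ValueError(f"Unrecognized PCT number format: {pct_number!r}")


print(detect_pct_kind("PCT/US2023/012345"))  # application
print(detect_pct_kind("WO 2023/123456"))     # publication
```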

### Fixed

- `sanitize_application_number` now strips leading zeros from PCT serial numbers
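A minimal sketch of the fix, assuming PCT numbers of the form `PCT/CCyyyy/nnnnnn`; the helper name and structure here are illustrative, not the actual `sanitize_application_number` implementation.

```python
def strip_pct_serial_zeros(application_number: str) -> str:
    """Strip leading zeros from the serial portion of a PCT number.

    Hypothetical sketch of the behavior described in the changelog.
    """
    prefix, _, serial = application_number.rpartition("/")
    if prefix.startswith("PCT"):
        # Only the final serial segment loses its leading zeros.
        return f"{prefix}/{serial.lstrip('0')}"
    # Non-PCT numbers pass through unchanged.
    return application_number


print(strip_pct_serial_zeros("PCT/US2023/012345"))  # PCT/US2023/12345
print(strip_pct_serial_zeros("18123456"))           # 18123456
```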

## [0.4.3] - 2026-03-05

### Added

- `get_IFW` method for bulk downloading all documents in an application's IFW
- `IFWResult` model with document map and optional ZIP output
- `get_IFW_metadata` now populates `document_bag` on the returned `PatentFileWrapper`
- Example: searching CPC codes (#102)

### Fixed

- Auto-quote `classification_q` values containing spaces or slashes (#101)
- Missing `typing_extensions` dependency
- CI refactor to exclude dev requirements from install

## [0.4.2] - 2026-02-26

### Fixed

- Quote multi-word values in query convenience parameters
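The quoting behavior can be sketched as below; the helper is hypothetical, and the exact character set that triggers quoting (spaces and slashes, per the 0.4.3 entry) is an assumption.

```python
def quote_if_needed(value: str) -> str:
    """Wrap a query value in double quotes when it contains characters
    that would otherwise split the term in the query syntax.

    Hypothetical sketch of the convenience-parameter quoting.
    """
    if any(ch in value for ch in (" ", "/")) and not value.startswith('"'):
        return f'"{value}"'
    return value


print(quote_if_needed("continuation application"))  # "continuation application"
print(quote_if_needed("G06F16/00"))                 # "G06F16/00"
print(quote_if_needed("patented"))                  # patented
```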

## [0.4.1] - 2026-02-25

### Added

- Download path validation and zip-bomb protection
- Sanitize download filenames to prevent path traversal
- Skip symlinks during archive extraction
- Session lifecycle and extraction safety documentation
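The safeguards listed above can be sketched with the standard library. This is a hypothetical helper with an assumed 100x expansion-ratio cap; pyUSPTO's actual extraction code differs.

```python
import tempfile
import zipfile
from pathlib import Path

MAX_RATIO = 100  # assumed cap: reject archives expanding more than 100x


def safe_extract(archive: Path, destination: Path) -> list[Path]:
    """Extract a ZIP while rejecting path traversal, suspicious
    compression ratios, and symlinked members.

    Hypothetical sketch of the protections described above.
    """
    destination = destination.resolve()
    extracted: list[Path] = []
    with zipfile.ZipFile(archive) as zf:
        compressed = sum(i.compress_size for i in zf.infolist()) or 1
        expanded = sum(i.file_size for i in zf.infolist())
        if expanded / compressed > MAX_RATIO:
            raise ValueError("Suspicious compression ratio (possible zip bomb)")
        for info in zf.infolist():
            # Skip symlinks: the high 16 bits of external_attr hold the
            # Unix mode, and 0o120000 is S_IFLNK.
            if (info.external_attr >> 16) & 0o170000 == 0o120000:
                continue
            target = (destination / info.filename).resolve()
            if not target.is_relative_to(destination):
                raise ValueError(f"Path traversal attempt: {info.filename}")
            zf.extract(info, destination)
            extracted.append(target)
    return extracted


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "sample.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("data/readme.txt", "hello")
    out = safe_extract(archive, Path(tmp) / "out")
    print([p.name for p in out])  # ['readme.txt']
```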

### Changed

- Enable retries for POST requests
- Remove unused `utils.http` module and `ALLOWED_METHODS`
- Enforce keyword-only arguments in `get_IFW_metadata`
- Optimize tox dependencies and enable parallel runs

## [0.4.0] - 2026-02-23

### Changed

- **Refactor**: Centralize session management in `USPTOConfig`
- `FileData.file_date` changed to `datetime` type

### Fixed

- Prevent path traversal in archive extraction
- Fix #79

## [0.3.4] - 2026-01-11

### Added

- JSON parsing error handling
- Pagination validation
- Documentation for HTTP method restrictions and `include_raw_data` flag
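Pagination validation along these lines; the specific bounds (non-negative offset, limit between 1 and 100) are assumptions for illustration.

```python
def validate_pagination(offset: int, limit: int, max_limit: int = 100) -> None:
    """Reject out-of-range pagination parameters before issuing a request.

    Hypothetical sketch -- the API's real maximum page size may differ.
    """
    if offset < 0:
        raise ValueError(f"offset must be >= 0, got {offset}")
    if not 1 <= limit <= max_limit:
        raise ValueError(f"limit must be between 1 and {max_limit}, got {limit}")


validate_pagination(offset=0, limit=25)  # OK
try:
    validate_pagination(offset=-1, limit=25)
except ValueError as exc:
    print(exc)  # offset must be >= 0, got -1
```

Failing fast like this surfaces a clear `ValueError` instead of an opaque HTTP 400 from the API.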

### Changed

- Aligned backoff factor default

## [0.3.3] - 2026-01-08

### Changed

- Bulk data client refactor (#40)

## [0.3.2] - 2025-12-31

### Changed

- Refactor downloads (#35)
- Configurable download chunk size
- Session sharing across clients

## [0.3.1] - 2025-12-15

### Added

- `paginate_decisions` POST support
- Configurable download chunk size
- Session sharing across clients
- Bulk data endpoint updates

## [0.3.0] - 2025-12-09

### Added

155 changes: 49 additions & 106 deletions examples/bulk_data_example.py
@@ -1,25 +1,18 @@
"""Example usage of pyUSPTO for the BulkDataClient.
"""Example usage of pyUSPTO for bulk data products.

This example demonstrates how to use the BulkDataClient to interact with the USPTO Bulk Data API.
It shows how to search for products, retrieve product details, and download files.
Demonstrates the BulkDataClient for searching products, listing files,
and downloading bulk data archives.
"""

import os

from pyUSPTO.clients import BulkDataClient
from pyUSPTO.config import USPTOConfig
from pyUSPTO.models.bulk_data import FileData
from pyUSPTO import BulkDataClient, FileData, USPTOConfig

DEST_PATH = "./notes/download-example"

def format_size(size_bytes: int | float) -> str:
"""Format a size in bytes to a human-readable string (KB, MB, GB, etc.).

Args:
size_bytes: The size in bytes to format

Returns:
A human-readable string representation of the size
"""
def format_size(size_bytes: int | float) -> str:
"""Format a size in bytes to a human-readable string (KB, MB, GB, etc.)."""
if size_bytes == 0:
return "0 B"

@@ -29,39 +22,22 @@ def format_size(size_bytes: int | float) -> str:
size_bytes /= 1024
i += 1

# Round to 2 decimal places
return f"{size_bytes:.2f} {size_names[i]}"


# ============================================================================
# Client Initialization Methods
# ============================================================================

# Method 1: Initialize with USPTOConfig object
print("\nMethod 1: Initialize with USPTOConfig")
config = USPTOConfig(api_key="YOUR_API_KEY_HERE")
# --- Client Initialization ---
api_key = os.environ.get("USPTO_API_KEY", "YOUR_API_KEY_HERE")
if api_key == "YOUR_API_KEY_HERE":
raise ValueError(
"API key is not set. Set the USPTO_API_KEY environment variable."
)
config = USPTOConfig(api_key=api_key)
client = BulkDataClient(config=config)

# Method 2: Initialize from environment variables (recommended)
print("\nMethod 2: Initialize from environment variables")
os.environ["USPTO_API_KEY"] = "YOUR_API_KEY_HERE" # Set this outside your script
config_from_env = USPTOConfig.from_env()
client = BulkDataClient(config=config_from_env)

print("\n" + "=" * 60)
print("Beginning API requests with configured client")
print("=" * 60)

print("-" * 40)
print("Example 1: Search for products")
print("-" * 40)

# ============================================================================
# Example 1: Search for Products
# ============================================================================

print("\n--- Example 1: Search for Products ---")
# The Bulk Data API supports full-text search via the query parameter
# Field-specific queries (e.g., "productIdentifier:value") are not supported

# Search for patent-related products
response = client.search_products(query="patent", limit=5)
print(f"Found {response.count} products matching 'patent'")

@@ -72,30 +48,22 @@ def format_size(size_bytes: int | float) -> str:
print(f" Total files: {product.product_file_total_quantity}")
print(f" Total size: {format_size(product.product_total_file_size)}")

print("-" * 40)
print("Example 2: Paginate through products")
print("-" * 40)

# ============================================================================
# Example 2: Paginate Through All Products
# ============================================================================

print("\n--- Example 2: Paginate Through Products ---")
# Use pagination to iterate through all matching products

max_items = 20
count = 0
for product in client.paginate_products(query="trademark", limit=10):
count += 1
print(f" {count}. {product.product_title_text} ({product.product_identifier})")
if count >= 20: # Limit output for example
print(" ... (stopping after 20 products)")
if count >= max_items:
print(f" ... (stopping at {max_items} products)")
break


# ============================================================================
# Example 3: Get Product Details by ID
# ============================================================================

print("\n--- Example 3: Get Product by ID ---")
# Retrieve a specific product by its identifier
# Use include_files=True to get file listing
print("-" * 40)
print("Example 3: Get product by ID")
print("-" * 40)

product_id = "PTGRXML" # Patent Grant Full-Text Data (No Images) - XML
product = client.get_product_by_id(product_id, include_files=True, latest=True)
@@ -107,13 +75,9 @@ def format_size(size_bytes: int | float) -> str:
print(f"Categories: {product.product_dataset_category_array_text}")
print(f"Date range: {product.product_from_date} to {product.product_to_date}")


# ============================================================================
# Example 4: List Files for a Product
# ============================================================================

print("\n--- Example 4: List Files for a Product ---")
# Get product with files and display file details
print("-" * 40)
print("Example 4: List files for a product")
print("-" * 40)

if product.product_file_bag and product.product_file_bag.file_data_bag:
print(f"Found {len(product.product_file_bag.file_data_bag)} file(s):")
@@ -130,13 +94,9 @@ def format_size(size_bytes: int | float) -> str:
else:
print("No files found for this product")


# ============================================================================
# Example 5: Download a File
# ============================================================================

print("\n--- Example 5: Download a File ---")
# Download a file from the product
print("-" * 40)
print("Example 5: Download a file (with extraction)")
print("-" * 40)

min_file: FileData | None = None
last_bytes: float = float("inf")
@@ -151,40 +111,23 @@ def format_size(size_bytes: int | float) -> str:
print(f"Downloading smallest file: {min_file.file_name}")
print(f"Size: {format_size(min_file.file_size)}")

try:
# Download with extraction (default behavior for archives)
downloaded_path = client.download_file(
file_data=min_file,
destination="./downloads",
overwrite=True,
extract=True, # Auto-extract if it's a tar.gz or zip
)
print(f"SUCCESS: Downloaded to {downloaded_path}")
except Exception as e:
print(f"ERROR: {e}")
downloaded_path = client.download_file(
file_data=min_file,
destination=DEST_PATH,
overwrite=True,
extract=True,
)
print(f"Downloaded to {downloaded_path}")


# ============================================================================
# Example 6: Download Without Extraction
# ============================================================================

print("\n--- Example 6: Download Without Extraction ---")
# Download archive file without extracting
print("-" * 40)
print("Example 6: Download without extraction")
print("-" * 40)

if product.product_file_bag and product.product_file_bag.file_data_bag and min_file:
try:
# Download without extraction
downloaded_path = client.download_file(
file_data=min_file,
destination="./downloads",
overwrite=True,
extract=False, # Keep archive compressed
)
print(f"SUCCESS: Archive saved to {downloaded_path}")
except Exception as e:
print(f"ERROR: {e}")


print("\n" + "=" * 60)
print("Examples complete!")
print("=" * 60)
downloaded_path = client.download_file(
file_data=min_file,
destination=DEST_PATH,
overwrite=True,
extract=False,
)
print(f"Archive saved to {downloaded_path}")