2 changes: 1 addition & 1 deletion Makefile
@@ -25,7 +25,7 @@ multiversion: Makefile
 	@$(BUILD) -M $@ "$(SOURCE)" "$(OUT)" $(OPTS)

 enhance-topics:
-	git diff --name-only --diff-filter=d $(BASE_SHA) $(HEAD_SHA) | xargs -r $(PYTHON) scripts/enhance_topics.py
+	git diff --name-only --diff-filter=d HEAD | xargs -r $(PYTHON) scripts/enhance_topics.py

 lint:
 	./sphinx-lint-with-ros source
1 change: 1 addition & 0 deletions conf.py
@@ -89,6 +89,7 @@
     'sphinx_adopters',
     'sphinxcontrib.googleanalytics',
     'sphinxcontrib.mermaid',
+    'short_description',
 ]

 # Intersphinx mapping
32 changes: 32 additions & 0 deletions plugins/short_description.py
@@ -0,0 +1,32 @@
from __future__ import annotations

from docutils import nodes
from sphinx.util.docutils import SphinxDirective


class ShortDescriptionDirective(SphinxDirective):
    """Directive to render the short description of an article."""

    has_content = True
    required_arguments = 0
    optional_arguments = 0
    option_spec = {}

    def run(self) -> list[nodes.Node]:
        # Create a container node to hold the parsed content
        node = nodes.container()
        node['classes'].append('short-description')

        # Parse the directive content into the container node
        self.state.nested_parse(self.content, self.content_offset, node)

        return [node]


def setup(app):
    app.add_directive('short-description', ShortDescriptionDirective)
    return {
        'parallel_read_safe': True,
        'parallel_write_safe': True,
        'version': '0.1.0',
    }
128 changes: 128 additions & 0 deletions scripts/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
# Topic Enhancement Tools

This directory contains a suite of AI-powered tools designed to automatically enhance the ROS 2 documentation (`.rst` files) with high-quality metadata and descriptive content.

## Overview

The primary tool, `enhance_topics.py`, uses an OpenAI model (set by `GPT_MODEL` in `scripts/config.py`) to analyse technical articles and inject:
1. **SEO Metadata**: `description` and `keywords` fields within a `.. meta::` directive.
2. **Short Descriptions**: A concise summary paragraph injected into the custom `.. short-description::` directive, using Retrieval-Augmented Generation (RAG) to match the project's style.

## Orchestration Logic

The execution follows a top-to-bottom flow through several key layers:

### 1. Entry Point: `main()`
The execution starts in the `main()` function, which handles the high-level setup:
- **Logging**: Configures standard logging and silences noisy HTTP libraries (`httpx`, `httpcore`).
- **Argument Parsing**: Collects file paths from the command line and filters for `.rst` files.
- **Client Setup**: Initialises the `OpenAI` client by loading the API key from environment variables or a `.env` file.
- **Orchestration**: Executes the two main enhancement phases: `enhance_metadata()` and `enhance_short_descriptions()`.
- **Metrics**: Calculates and logs a summary of how many files were processed, how many had valid results, and how many were actually updated.

### 2. Phase 1: Metadata Enhancement (`enhance_metadata`)
This phase focuses on the `.. meta::` block:
- **Task Definition**: Creates two `EnhancementTask` objects—one for `description` and one for `keywords`.
- **Skip Logic**: Each task includes a `should_skip` check that reads the file content to see if these metadata fields already exist.
- **Analysis**: Calls `analyze_files()`, which uploads the file to OpenAI and runs the analysis.
- **Application**: Calls `update_meta_files()`, which uses the `MetadataApplyHook` to merge the new metadata into the existing (or new) `.. meta::` block in the RST file.
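
The skip check above could look roughly like the following. This is a minimal sketch, assuming a field counts as present when it appears as an indented `:name:` line inside a `.. meta::` block; the helper name `has_meta_field` is hypothetical and is not taken from `enhance_topics.py`.

```python
import re


def has_meta_field(rst_text: str, field: str) -> bool:
    """Return True if `field` is an indented option of a `.. meta::` block.

    Hypothetical helper for illustration only.
    """
    in_meta = False
    for line in rst_text.splitlines():
        if line.startswith(".. meta::"):
            in_meta = True
            continue
        if not in_meta:
            continue
        if line.strip() == "":
            continue  # blank lines do not end the directive
        if not line.startswith((" ", "\t")):
            in_meta = False  # a dedented line ends the directive
            continue
        if re.match(rf"\s+:{re.escape(field)}:", line):
            return True
    return False
```

With a check of this shape, a file that already has `:keywords:` but no `:description:` would skip only the keywords task.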

### 3. Phase 2: Short Description Enhancement (`enhance_short_descriptions`)
This phase uses Retrieval-Augmented Generation (RAG):
- **Vector Store Setup**: Uploads a set of "gold standard" example RST files to an OpenAI Vector Store. This allows the model to "search" for examples of good short descriptions to match the project's style.
- **Task Definition**: Creates a task for `short-description`.
- **Skip Logic**: Skips if the file already contains a `.. short-description::` directive.
- **Analysis**: Calls `analyze_files()`, where the analysis function (`analyze_with_responses`) includes the `vector_store_id` to enable the file search capability.
- **Application**: Uses the `ShortDescriptionApplyHook` to inject the generated text into the file.
- **Cleanup**: A `finally` block ensures the temporary vector store and hosted files are deleted from OpenAI.

### 4. Core Processing Engines

#### `analyze_files`
This is the central engine for interacting with the AI:
1. **File Upload**: Uploads the target RST file to OpenAI's Files API.
2. **Task Execution**: For every task that isn't skipped:
   - Calls the task's `analyze` function (wrapped in `tenacity` retries).
   - **Validation**: If a result is returned, it runs `validate_content()`, which performs a **Moderation check** (safety) and a **Language check** (ensuring the LLM actually returned English).
3. **Storage**: Valid results are stored in an `EnhanceData` object.
4. **Cleanup**: Deletes the uploaded file from OpenAI.
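
To illustrate the retry behaviour mentioned in step 2, here is a standard-library sketch of capped exponential backoff. It is not the script's implementation: the script uses `tenacity`, whose exact wait calculation may differ. The names `backoff_schedule` and `call_with_retries` are hypothetical, and the retried exception types are assumptions.

```python
import time
from typing import Callable, Iterable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_schedule(min_wait: float, max_wait: float, max_retries: int) -> List[float]:
    """Wait before each retry: doubles from min_wait and is capped at max_wait."""
    return [min(min_wait * 2 ** attempt, max_wait) for attempt in range(max_retries)]


def call_with_retries(
    func: Callable[[], T],
    waits: Iterable[float],
    sleep: Callable[[float], None] = time.sleep,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call func, sleeping between attempts whenever a retryable error is raised."""
    for wait in waits:
        try:
            return func()
        except retry_on:
            sleep(wait)
    return func()  # final attempt; any error now propagates to the caller
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.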

#### `update_enhanced_files`
This handles the file I/O and content modification:
1. **Hook Application**: Passes the file content and the AI results to an `ApplyHook` (either `MetadataApplyHook` or `ShortDescriptionApplyHook`).
2. **Regex Injection**: The hooks use utility functions (from `rst_utils.py`) to perform precise regex-based injection of the new content.
3. **Persistence**: If the content has changed, it overwrites the file and marks it as "updated" in the metrics.
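
The regex-based injection in step 2 might be sketched as follows. This is an assumption-laden illustration, not the code in `rst_utils.py`: it assumes the directive is placed directly after the first underlined title, and the names `TITLE_RE` and `inject_short_description` are hypothetical.

```python
import re

# A title line followed by an underline of RST adornment characters (simplified).
TITLE_RE = re.compile(r"^[^\n]+\n[=\-~^#*+]{3,}[ \t]*\n", re.MULTILINE)


def inject_short_description(rst_text: str, description: str) -> str:
    """Insert a `.. short-description::` block after the first title.

    Returns the text unchanged if the directive already exists or no title is found.
    """
    if ".. short-description::" in rst_text:
        return rst_text
    match = TITLE_RE.search(rst_text)
    if match is None:
        return rst_text
    # One sentence per line, indented as directive content.
    body = "\n".join(
        "   " + line.strip() for line in description.splitlines() if line.strip()
    )
    block = f"\n.. short-description::\n\n{body}\n"
    return rst_text[: match.end()] + block + rst_text[match.end():]
```

Because the function returns its input unchanged when nothing needs injecting, a caller can compare old and new text to decide whether to rewrite the file.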

### 5. Key Abstractions
- **`EnhancementTask`**: Bundles the "what" (key), "when to skip" (logic), and "how to analyze" (API call).
- **`ApplyHook`**: A strategy pattern for defining how different types of AI results should be written back to the RST format.
- **`EnhanceData`**: A state-tracking object that carries results and metrics through the various stages of the pipeline.
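
One way these three abstractions could fit together is sketched below. Only the names `EnhancementTask`, `ApplyHook`, and `EnhanceData` (with its `results` and `updated_files` fields) come from this project; the field types, the `apply` method name, and the `run_tasks` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Optional, Protocol


@dataclass(frozen=True)
class EnhancementTask:
    key: str                                  # what to produce, e.g. "keywords"
    should_skip: Callable[[str], bool]        # when to skip, given file content
    analyze: Callable[[str], Optional[str]]   # how to analyze (an API call in practice)


class ApplyHook(Protocol):
    """Strategy for writing results back into RST content (assumed signature)."""

    def apply(self, content: str, results: Dict[str, str]) -> str: ...


@dataclass(frozen=True)
class EnhanceData:
    results: Dict[str, Dict[str, str]] = field(default_factory=dict)
    updated_files: FrozenSet[str] = frozenset()


def run_tasks(filename: str, content: str, tasks: List[EnhancementTask],
              data: EnhanceData) -> EnhanceData:
    """Run each non-skipped task and return a new EnhanceData with its results."""
    for task in tasks:
        if task.should_skip(content):
            continue
        result = task.analyze(content)
        if result:
            # Copy rather than mutate, so the caller's EnhanceData is unchanged.
            file_results = {**data.results.get(filename, {}), task.key: result}
            data = EnhanceData({**data.results, filename: file_results},
                               data.updated_files)
    return data
```

Building a new `EnhanceData` on every result keeps earlier pipeline stages free of side effects.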

## Workflow Diagrams

### High-Level Orchestration
```mermaid
graph TD
    A[Start: main] --> B[Configure Logging & Client]
    B --> C[Filter .rst Files]
    C --> D[Phase 1: enhance_metadata]
    D --> E[Phase 2: enhance_short_descriptions]
    E --> F[Calculate & Log Metrics]
    F --> G[End]

    subgraph "Per File Processing"
        D1[Check existing meta] --> D2[Upload to OpenAI]
        D2 --> D3[Analyze & Validate]
        D3 --> D4[Inject via Regex]
    end
```

### Analysis & Validation Loop
```mermaid
sequenceDiagram
    participant S as Script
    participant O as OpenAI API
    participant F as Filesystem

    S->>O: Upload File (purpose=user_data)
    S->>O: Create Response (Analysis Task)
    O-->>S: Raw Content
    S->>O: Moderation Check
    S->>O: English Language Check
    alt Valid
        S->>F: Write updated RST
    else Invalid
        S->>S: Log Warning (Skip)
    end
    S->>O: Delete Uploaded File
```

## User Guide

### Prerequisites
1. **API Key**: Ensure you have an `OPENAI_API_KEY` set in your environment or in a `.env` file in the repository root.
2. **Dependencies**: Install the required Python packages:
```bash
pip install -r requirements.txt -c constraints.txt
```

### Running via Makefile (Recommended)
The simplest way to process your recent changes is via the `Makefile`. This command automatically identifies files that have been modified or staged in Git and runs the enhancement script on them.

```bash
make enhance-topics
```

### Running Directly
You can call the script directly to process specific files or directories:

```bash
# Process a single file
python scripts/enhance_topics.py source/path/to/article.rst

# Process multiple files
python scripts/enhance_topics.py file1.rst file2.rst
```

### Configuration
Tuning constants, such as the model version (`gpt-5.4-nano`), timeouts, and prompt strings, are centrally managed in `scripts/config.py`.
73 changes: 73 additions & 0 deletions scripts/config.py
@@ -0,0 +1,73 @@
"""
Central configuration for the enhancement scripts.

Holds tuning constants and prompt strings used by ``enhance_topics`` and
``openai_retrieval``. Kept as a leaf module (no imports from sibling scripts)
so it can be imported freely without risk of circular dependencies.
"""

# Define constants
GPT_MODEL = "gpt-5.4-nano" # GPT model to use for the API calls
# Maximum content length in characters, approximately 300k tokens (leaving 100k for instructions/output)
MAX_CONTENT_LENGTH = 1_200_000
RST_EXTENSION = '.rst' # File extension for RST files

# Define timeout and retry parameters for API calls
# - Individual API calls timeout after DEFAULT_TIMEOUT seconds
# - On rate limits/connection errors, retry up to MAX_RETRIES times
# - Wait between retries, increasing exponentially: MIN_WAIT → MAX_WAIT (capped)
DEFAULT_TIMEOUT = 30 # Default timeout in seconds for an individual API call
MAX_RETRIES = 10 # Maximum number of retry attempts for exponential backoff
MIN_WAIT = 10 # Minimum wait time between retries in seconds
MAX_WAIT = 120 # Maximum wait time between retries in seconds

# Responses API tuning (used by openai_retrieval for short descriptions)
# Maximum wall-clock time for one article: file upload plus responses.create
RESPONSE_TIMEOUT = 120

# Example RST paths (relative to repository root) indexed into the vector store for file_search
SHORT_DESCRIPTION_EXAMPLE_PATHS = [
    "source/About-ROS.rst",
    "source/First-Steps.rst",
    "source/Concepts/Basic/Interfaces-Topics-Services-Actions.rst",
]

# Define prompts for the AI model

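# Bullet list of example file names, built up front so the prompt can interpolate it.
# (Without the f-prefix the braces below would be sent to the model literally, and a
# backslash inside an f-string expression is a syntax error before Python 3.12.)
_EXAMPLE_FILE_LIST = "\n".join(
    f"- {path.split('/')[-1]}" for path in SHORT_DESCRIPTION_EXAMPLE_PATHS
)

SHORT_DESCRIPTION_PROMPT = f"""You are a technical author, and your role is to analyze RST content within supplied documents, and then create new, supplementary content for a new draft article based on this analysis.

## Examples
Use file_search to read through the following RST files in their entirety as examples of completed articles:

{_EXAMPLE_FILE_LIST}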

## Short Description
For each article in this set of examples, analyse the content associated with the "short-description" directive, and what it constitutes in relation to the article it describes.
For example, in the First-Steps article, the 3 sentences which begin as follows comprise the specified short description:

* "Interfaces in ROS..."
* "This article explains the..."
* "With this information..."

This short description content does not include the single line of text commencing with "**Area...", or the "contents" (Table of Contents) directive.

When you have identified the short description in all example articles, remember the formatting and how the paragraph is constructed, including tone/style and length. We call this the article Short Description.

Finally, generate the short description for the new article given in the attached article file, with no additional styling, characters, or formatting. Each sentence must start on a new line.
"""

KEYWORDS_PROMPT = """You are a content analyst, and your role is to analyze text content within supplied documents.

Your role is to extract 3 to 5 keywords from the content for use in metadata. The keywords should be single words that are the most important and relevant words to the content topic.

Finally, generate a comma-separated list of these keywords, in lowercase, with no additional styling, characters, or formatting."""

DESCRIPTION_PROMPT = """You are a content analyst, and your role is to analyze text content within supplied documents.

Your role is to create a concise description of the content for use in metadata. The description should be a single sentence (of a maximum of 130 characters) that captures the main idea of the content.

Finally, generate this description, with no additional styling, characters, or formatting."""

ENGLISH_LANGUAGE_CHECK_PROMPT = """You are a validation assistant, and your role is to determine whether the following text is written entirely in English. Common technical terms, acronyms, and internationally recognised proper nouns are acceptable if they are normally used in English technical documentation.

Answer ONLY with the single word yes or no in lowercase, with no punctuation, explanation, or additional text."""
7 changes: 5 additions & 2 deletions scripts/enhance_data.py
@@ -78,8 +78,11 @@ def add_analysis_result(data: EnhanceData, filename: str, analysis_type: str, re
     Returns:
         New EnhanceData with the result added.
     """
-    new_results = {**data.results}  # Shallow copy: replace one filename entry immutably
-    file_results = {**new_results.get(filename, {})}  # Preserve other analysis keys for this file
+
+    # Creates a new EnhanceData object with the analysis result added for the given file and analysis type,
+    # making copies so that original data is not changed (keeping EnhanceData immutable).
+    new_results = {**data.results}
+    file_results = {**new_results.get(filename, {})}
     file_results[analysis_type] = result
     new_results[filename] = file_results
     return EnhanceData(results=new_results, updated_files=data.updated_files)  # ``updated_files`` unchanged here