
Update the gallery pages to use jekyll templates#5

Open
dstopp wants to merge 11 commits into main from 4_galleries_templates

Conversation

Collaborator

@dstopp commented Jan 27, 2026

  • Modernize the gallery pages to create Jekyll YAML files instead of HTML.
  • Create a template for the Jekyll YAML files that builds the gallery pages using modern, accessible HTML.
  • Upgrade the code to Python 3; note that changes to creating individual press release pages are included but not necessarily working, as we have decided not to continue maintaining those pages.
  • Add a requirements file.
  • Update the README.md file to reflect these changes.

Summary by CodeRabbit

  • Documentation

    • Added Python 3 install instructions; marked legacy gallery workflow DEPRECATED/LEGACY and pointed to modern catalog; reorganized legacy/unsupported workflow guidance.
  • New Features

    • Gallery/page output now uses YAML front-matter for templated site rendering; optional resolution of Photojournal final URLs for thumbnails/links.
  • Bug Fixes

    • Prevented iteration/mutation errors and improved URL resolution, parsing, and navigation output.
  • Refactor

    • Modernized codebase for Python 3 (I/O/encoding, iteration, imports, string handling).
  • Chores

    • Added requirements manifest.
  • Removal

    • Removed legacy Julian time utilities.

…art of the template

Upgrade all supporting code to Python 3; note that changes to creating individual press release pages are included but not necessarily working, as we have decided not to continue maintaining those pages.

Also included a requirements file.
@dstopp requested a review from rfrenchseti January 27, 2026 21:37

coderabbitai bot commented Jan 27, 2026

Walkthrough

Migrated repository to Python 3, added a requirements manifest, replaced HTML gallery outputs with YAML-front-matter Jekyll pages (with redirect resolution), refactored PiaPage parsing/IO, modernized imports/iteration/encoding across modules, and removed the julian.py time utilities module.

Changes

Cohort / File(s) Summary
Docs & deps
README.md, requirements.txt
Added Python 3 installation instructions; removed Python 2 references; marked PiaPage workflow DEPRECATED/LEGACY; added requirements.txt with pinned packages.
dicts package
dicts/__init__.py, dicts/TARGET_FULL_NAMES.py
Switched absolute imports to relative imports; iterate over list(...) / convert dict views to lists to allow safe in-loop mutation when adding lowercased keys.
Gallery generation core
galleries.py, gallerypage.py
Replaced HTML-generation with YAML front-matter outputs for Jekyll (uses FixIndent dumper); added Photojournal URL redirect resolution; introduced gallery/Jekyll path constants; removed prior neighbor/navigation/table HTML logic; multiple Python 3 I/O/iteration updates.
piapage core
piapage/__init__.py, piapage/piapage_galleries.py
Reworked parsing to use different BeautifulSoup subtrees, switched to UTF-8/binary I/O, replaced iteritems() with items(), renamed get_caption_and_background() to description_and_background(), and changed local page URL generation to use new gallery constants; sorting key logic for targets updated.
piapage scripts & utilities
piapage/make_COLOR_VS_PIAPAGE.py, piapage/random-scripts/fix_binary_text.py, piapage/repairs.py
Modernized shebangs and print() calls, enforced UTF‑8 file I/O, adjusted dict/key handling for Python 3, switched regexes to raw strings, and changed membership-based normalization logic in repairs.
Stored page / catalog I/O
storedpage.py
Converted dict iteration to .items() and switched pickle serialization to binary read/write modes; behavior preserved aside from Python 3 compatibility.
Removed module — time utilities
julian.py
Deleted entire julian.py module — removes all Julian/time-system utilities, leapsecond logic, ISO parsing/formatting, and associated tests/APIs.
Misc compatibility & I/O
piapage/..., piapage/random-scripts/...
Replaced StringIO with io.BytesIO, switched various file reads/writes to UTF‑8/binary modes, updated BeautifulSoup imports, and applied multiple Python 2→3 compatibility fixes across files.
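One of the Python 2 → 3 fixes summarized above is snapshotting dict views with list(...) before mutating inside a loop. A minimal sketch with a hypothetical mapping (the real TARGET_FULL_NAMES contents differ):

```python
# In Python 3, dict.keys() is a live view: adding keys while iterating it
# raises "RuntimeError: dictionary changed size during iteration".
# Snapshotting the keys with list(...) first makes in-loop mutation safe.
TARGET_FULL_NAMES = {'Io': 'Io (J1)', 'Europa': 'Europa (J2)'}  # hypothetical

for key in list(TARGET_FULL_NAMES):            # snapshot before mutating
    TARGET_FULL_NAMES[key.lower()] = TARGET_FULL_NAMES[key]

print(sorted(TARGET_FULL_NAMES))               # ['Europa', 'Io', 'europa', 'io']
```

Iterating `TARGET_FULL_NAMES.keys()` directly here would fail as soon as the first lowercased key is inserted.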

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 61.90% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: updating gallery pages to use Jekyll templates, which aligns with the primary objective of replacing HTML generation with Jekyll YAML files and templates.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




@coderabbitai bot left a comment


Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
piapage/repairs.py (1)

25-41: Conditions are always False: Magellan repair logic is dead code.

missions is a list, so missions == 'Magellan' will never be True. Both Magellan-related code blocks (host reordering at lines 27-33 and instrument remapping at lines 37-41) will never execute. The commented-out original code correctly used missions[0] == 'Magellan' to check the primary mission.

🐛 Proposed fix to restore primary mission checks
-    #if missions[0] == 'Magellan' and len(hosts) > 1:
-    if missions == 'Magellan' and len(hosts) > 1:
+    if missions[0] == 'Magellan' and len(hosts) > 1:
         if 'Magellan' in hosts[1:]:
             hosts.remove('Magellan')
             hosts = ['Magellan'] + hosts

         if 'Arecibo Observatory' in hosts:
             if 'Arecibo' in instruments[0]:
                 instruments = instruments[1:] + [instruments[0]]

-    #if missions[0] == 'Magellan' and 'Radar System' in instruments:
-    if missions == 'Magellan' and 'Radar System' in instruments:
+    if missions[0] == 'Magellan' and 'Radar System' in instruments:
         k = instruments.index('Radar System')
         instruments[k] = 'Imaging Radar'
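The always-False comparison is easy to demonstrate in isolation (the missions value below is hypothetical):

```python
# In Python, a list never compares equal to a string, so the Magellan
# branches guarded by `missions == 'Magellan'` can never execute.
missions = ['Magellan', 'Pioneer Venus']       # hypothetical parsed value

assert (missions == 'Magellan') is False       # list vs. str: never equal
assert missions[0] == 'Magellan'               # the intended primary-mission check
assert 'Magellan' in missions                  # membership test also works
```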
piapage/random-scripts/fix_binary_text.py (1)

1-15: Replace eval() with ast.literal_eval().
Line 13 uses eval() to parse file content, which executes arbitrary code if the input is compromised. Since the file contains bytes literals (e.g., b'...'), use ast.literal_eval() instead for safe parsing.

🛠️ Proposed fix
import os
import sys
+import ast

for arg in sys.argv[1:]:
    if arg[-12:-9] != 'PIA': continue
    if arg[-4:] != '.txt': continue

    with open(arg, 'r', encoding='utf-8') as f:
        content = f.read()

    if content[:2] != "b'": continue

-    text = eval(content)
+    text = ast.literal_eval(content)

    with open(arg, 'w', encoding='utf-8') as f:
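As a standalone illustration of why ast.literal_eval is the safer parser here (the sample file content is hypothetical):

```python
# ast.literal_eval only accepts Python literals (including bytes literals
# such as b'...'), so a compromised file cannot execute code the way eval()
# would.
import ast

content = "b'caption with UTF-8 bytes: \\xc3\\xa9'"   # hypothetical file content
text = ast.literal_eval(content)                      # parsed, never executed
assert text.decode('utf-8') == 'caption with UTF-8 bytes: é'

# A malicious expression is rejected instead of run:
try:
    ast.literal_eval("__import__('os').system('echo pwned')")
    raised = False
except ValueError:
    raised = True
assert raised
```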
🤖 Fix all issues with AI agents
In `@galleries.py`:
- Around line 34-51: The URLError handler in get_final_NASA_url currently
returns a string like "Error: {e.reason}" which ends up as an invalid href;
change the except block to log or print the error (including e and/or e.reason)
and then return the original url argument so callers receive a valid href (i.e.,
keep the error output to stdout but ensure get_final_NASA_url returns url on
failure).

In `@gallerypage.py`:
- Around line 60-63: The FixIndent class (subclassing yaml.Dumper with
overridden increase_indent) is duplicated in gallerypage.py and galleries.py;
extract this helper into a single shared utility module (e.g.,
utils/yaml_helpers or similar) and import FixIndent from there in both
gallerypage.py and galleries.py, keeping the implementation of FixIndent (class
FixIndent(yaml.Dumper) and its increase_indent override) exactly once to prevent
divergence and ensure both files reference the same symbol.
- Around line 1072-1089: The local links and icon paths written in GalleryPage's
output omit the site baseurl which breaks subpath deployments; update the
f.write calls that use self.local_medium_url (the medium image link in the first
big image block) and the movie icon string construction (the icon variable where
movie icon path is set) to prefix the URL/path with "{{ site.baseurl }}" like
the existing use for self.local_small_url and self.local_medium_url earlier so
all local assets use the site baseurl consistently.

In `@piapage/__init__.py`:
- Around line 179-188: Uncomment and restore initialization of self.article_soup
and self.section_soup so they are assigned before calling
description_and_background() (the commented triple-quoted block around
self.article_soup = self.soup.find('article') and self.section_soup =
self.article_soup.find('section') must be removed); additionally add a nil-check
guard after obtaining self.article_soup to raise or handle a clear error if
article_soup is None so methods like description_and_background() and
get_title() don't raise AttributeError when section_soup is missing.
- Around line 727-731: The code reads the wrong attribute name on the time
element: replace the incorrect time_element['datatime'] with the correct HTML
attribute and safe access (e.g. time_element.get('datetime', '') ) so the date
variable is populated from the "datetime" attribute; update the reference in
piapage/__init__.py where date, time_element are used to avoid KeyError and
ensure the proper "datetime" attribute is read.
- Around line 771-782: The code assumes header_element =
self.section_soup.find('h2', string="Description") always exists and immediately
calls header_element.find_next_sibling(), which will throw if header_element is
None; update the logic in the block that builds
caption_as_soup/background_as_soup to first check if header_element is truthy
and only run the while loop (starting from header_element.find_next_sibling())
when it exists, otherwise leave caption_as_soup and background_as_soup empty or
handle the missing-description case (e.g., skip processing or return) to avoid
calling methods on None; reference header_element, section_soup,
find_next_sibling, caption_as_soup, background_as_soup, and current_node to
locate and change the code.
- Around line 173-178: The block using caption_soup and extracting piatext is
unsafe and prints a debug probe using the builtin id; update the code in the
method that contains caption_soup to first check caption_soup.find('figcaption')
exists before accessing .text (e.g., assign figcap =
caption_soup.find('figcaption') and only read figcap.text when truthy), compare
against the intended identifier via self.soup.id (or the correct attribute)
rather than printing, and remove the debug print statement (print(f'id: {id}'))
altogether so no builtin is leaked.
- Around line 423-427: The function local_page_url_for_id calls non-existent
GalleryPage.PHOTOJOURNAL_URL causing AttributeError; update this by either
referencing the correct existing attribute on GalleryPage (e.g.,
PHOTOJOURNAL_BASE, URL_PREFIX, or similar) or add a new class attribute
PHOTOJOURNAL_URL on GalleryPage with the intended base path, then return that
base + PiaPage.get_id(id) + '.html' inside local_page_url_for_id to avoid the
AttributeError.
- Around line 820-853: The parsing loop mistakenly processes `item` after the
`for k in range(len(items)):` loop so only the last `item` survives; move the
logic that splits `item` into `parts` and appends to `new_items` (and the
subsequent `pair.append(new_items)`) inside the per-item loop in the block that
iterates `for k in range(len(items)):` (keeping the existing cleaning steps that
trim and remove escaped newlines), so each cleaned `item` is split by commas,
stripped and added to `new_items` as you iterate through `items` before
appending `new_items` to `pair`.
- Around line 883-887: Replace the temporary hardcoded Windows path return with
the original Jekyll-root based return: remove the developer-specific string
"c:/seti/..." and restore the commented-out return that concatenates
PiaPage.JEKYLL_ROOT_ and GalleryPage.PRESS_RELEASES_SUBDIR_ (the existing
commented line using 'pages/%sxxx/%s.html' % (id[:5], id)); ensure the function
returns that constructed path using PiaPage.JEKYLL_ROOT_ and
GalleryPage.PRESS_RELEASES_SUBDIR_ instead of the local filesystem path.

In `@piapage/piapage_galleries.py`:
- Around line 514-516: The code converts targets to a list from a set but
disables sorting and leaves a debug print, causing nondeterministic ordering and
leaking debug output; re-enable deterministic ordering by calling
targets.sort(key=exoplanet_sort) (or an equivalent stable key) after targets =
list(targets) and remove the unconditional print(f'targets: {targets}') so no
debug information is emitted.

In `@piapage/repairs.py`:
- Around line 21-23: The comparison targets == 'Comet' is wrong because targets
is a list; update the Rosetta branch so it checks the primary target element
(e.g., ensure targets is non-empty and test targets[0] == 'Comet') or check
membership with 'Comet' in targets, then set targets to
'67P/Churyumov-Gerasimenko'; refer to the symbols 'Rosetta', 'targets', and
'67P/Churyumov-Gerasimenko' when applying the change.
- Around line 11-19: The code assigns strings to variables systems and targets
which are lists; change those assignments to keep list types: replace
assignments of '' with an empty list and 'Sun' with a single-element list
['Sun'] so downstream list operations on systems and targets (e.g., indexing,
iteration, membership) continue to work; update the branches shown (the checks
for 'Sol'/'Sun' in systems, 'Unknown' in systems, and 'Sol' in targets) to
assign [] or ['Sun'] to systems/targets as appropriate to preserve list
semantics.

In `@README.md`:
- Around line 5-9: Update the README.md installation step that currently shows
the Windows-only activation command "source venv\Scripts\activate": replace it
with a platform-aware instruction that shows both commands (Windows:
"venv\Scripts\activate" or "venv\\Scripts\\Activate.ps1" for PowerShell;
macOS/Linux: "source venv/bin/activate") or add a short note like "Activate the
virtualenv (Windows: venv\\Scripts\\activate, macOS/Linux: source
venv/bin/activate)" so users on all platforms can follow the setup.
- Around line 28-31: Fix the typo and formatting in the README entry for the
Photojournal gallery: change "obseletes" to "obsoletes" in the sentence
referencing the new gallery design (mentioning PiaPage), remove the trailing
space at the end of that line, and normalize indentation from 4 spaces to 2
spaces so the bullet lines align consistently with the surrounding list.

In `@requirements.txt`:
- Around line 1-10: Pin all dependencies in requirements.txt to specific secure
versions or constrained ranges (e.g., numpy, pyparsing, pycurl, lxml, pillow,
PyYAML, rms-filecache, rms-textkernel, rms-julian) and replace the ambiguous
"bs4" entry with the proper package name "beautifulsoup4"; use >= and < bounds
where appropriate to prevent breaking changes, choose versions that avoid known
vulnerable transitive deps (urllib3, protobuf), and produce a lockfile
(pip-compile or pip freeze) plus run pip-audit to verify no HIGH vulnerabilities
remain.
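The fallback behavior requested for get_final_NASA_url earlier in this list might look like the sketch below. The scheme check, HEAD request, timeout, and logging style are assumptions based on the review comments, not the repository's exact code:

```python
import urllib.error
import urllib.parse
import urllib.request

def get_final_NASA_url(url):
    # Only attempt to resolve http(s) URLs; anything else is returned untouched.
    parsed = urllib.parse.urlparse(url)
    if parsed.scheme not in ('http', 'https'):
        return url
    try:
        req = urllib.request.Request(url, method='HEAD')
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.geturl()                 # final URL after any redirects
    except urllib.error.URLError as err:
        # Log the failure, but hand the caller back a usable href.
        print(f'Could not resolve {url}: {err}')
        return url
```

Because the except branch returns the original url, a transient network failure no longer produces an "Error: ..." string in an href attribute.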

Collaborator

@rfrenchseti left a comment


  • Does this mean we can close #3?
  • This code is such a mess I didn't spend a lot of time on it, but just let my eyes run over it to find things that stood out. I do think the things I noticed (and commented on separately) are important, though.
  • There are a bunch of CodeRabbit comments (all of which appear to be accurate and important) and it isn't clear which were responded to. When you respond to CR, make sure each conversation is resolved so you can keep track of what you did. If you disagree with a CR comment, then add a comment to the conversation and let CR know.


@coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
galleries.py (1)

505-537: 🧹 Nitpick | 🔵 Trivial

Consider breaking down the _gallery function for testability.

This function spans 200+ lines with 6 nested helper functions that share state through closures. While functional, this structure makes unit testing difficult. Consider extracting the helper functions to module level with explicit parameters, or at minimum, documenting the expected structure of the links parameter more clearly with type hints.

🤖 Fix all issues with AI agents
In `@galleries.py`:
- Around line 553-577: The nested function create_previous_next_menu currently
closes over url, all_links, item, subindex, index, and main_links which makes it
hard to test or reuse; refactor it to accept explicit parameters (e.g., def
create_previous_next_menu(isSublist, previous_link, url, all_links, item,
subindex, index, main_links)) and keep the same return value page_menu_data and
logic, then update all call sites to pass those values from the outer scope so
behavior remains identical while removing the hidden dependencies.
- Around line 641-642: The dict comprehension that filters out None values is
applied twice and the first result is discarded; update the code to filter
row_data once and append that filtered dict to floated_img_blocks (use the
filtered result variable or replace the two lines with a single append of the
filtered dict) so only one comprehension runs and row_data is trimmed before
appending to floated_img_blocks (identify and change occurrences around the
variables row_data and floated_img_blocks).
- Around line 539-542: The hardcoded flag get_referring_url should be made
configurable: replace the module-level constant with a configurable option
(e.g., an environment variable or a parameter) and propagate it into
create_floated_img_blocks; add a new parameter named resolve_urls (bool) to
create_floated_img_blocks and any callers, defaulting to the current behavior
via os.environ.get('RESOLVE_URLS') or similar; ensure you remove or repurpose
get_referring_url and update all references inside create_floated_img_blocks to
use resolve_urls so callers can toggle NASA URL resolution without editing
source.
- Around line 646-656: The write_yaml_file function may raise FileNotFoundError
because it assumes the output directory exists; before opening filepath
(constructed from fileroot and filename) ensure the target directory exists by
calling os.makedirs on the directory portion (e.g., os.path.dirname(filepath) or
fileroot) with exist_ok=True; add this directory-creation step immediately
before the with open(filepath, 'w') block in write_yaml_file to guarantee the
path is present.
- Around line 496-499: The FixIndent class in galleries.py duplicates the
identical class in gallerypage.py; extract this class into a shared utility
module (e.g., yaml_utils.py) and import it from both galleries.py and
gallerypage.py to avoid drift, and update the class to use modern super() syntax
(replace super(FixIndent, self).increase_indent with super().increase_indent).
Ensure the shared module exposes FixIndent and update any references to use the
imported class name to preserve behavior in functions that call FixIndent.
- Around line 41-50: The code opens arbitrary schemes and captures but never
uses the URLError; before calling urllib.request.Request/open, parse the URL
with urllib.parse.urlparse and only proceed if parsed.scheme is "http" or
"https" (otherwise return the original url), and on exception catch
urllib.error.URLError as err (or a different name) and either log err or include
it in the return-path decision; update the try/except around
urllib.request.build_opener / Request / opener.open to use the validated scheme
and replace the unused variable `e` with a used name `err` (or remove the name
and log the exception) so the exception is not ignored.
- Around line 691-708: The loop unpacks title and subtitle into variables that
are never used; update the unpacking in the branch that assigns (url, title,
subtitle, alt,) to use intentionally ignored names (e.g. (url, _title,
_subtitle, alt,)) so the linter recognizes them as unused, leaving the else
branch (url, _, alt,) unchanged; touch the loop around create_yaml_header,
create_previous_next_menu, create_jump_to_links, create_floated_img_blocks, and
write_yaml_file to ensure consistent underscore-prefixed ignored variables.
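One possible shape for the shared FixIndent helper suggested above. The module name (yaml_utils) and demo data are the reviewers' suggestion and an assumption respectively; the increase_indent override follows the common PyYAML recipe for indenting block sequences:

```python
# yaml_utils.py (suggested shared module): a Dumper that indents sequence
# items under their parent key instead of flush-left.
import yaml

class FixIndent(yaml.Dumper):
    def increase_indent(self, flow=False, indentless=False):
        # Force indentless=False so "- item" lines are indented under the key.
        return super().increase_indent(flow, False)

data = {'images': [{'url': 'a.jpg'}, {'url': 'b.jpg'}]}   # hypothetical data
print(yaml.dump(data, Dumper=FixIndent, default_flow_style=False))
```

Both galleries.py and gallerypage.py would then `from yaml_utils import FixIndent`, keeping one definition.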

galleries.py Outdated
Comment on lines +553 to +577
def create_previous_next_menu(isSublist, previous_link):
    page_menu_data = {}

    if url != all_links[0][0]:
        page_menu_data['first_link'] = all_links[0][0]

    if url != all_links[-1][0]:
        page_menu_data['last_link'] = all_links[-1][0]

    # handle the sublist next/previous differently
    if isSublist:
        if subindex + 1 < len(item):
            page_menu_data['next_link'] = item[subindex + 1][0]
        elif index + 1 < len(main_links):
            page_menu_data['next_link'] = main_links[index + 1][0]
    else:
        if index + 1 < len(main_links):
            page_menu_data['next_link'] = main_links[index + 1][0]

    if previous_link is not None:
        page_menu_data['previous_link'] = previous_link

    return page_menu_data

🧹 Nitpick | 🔵 Trivial

Nested function relies heavily on closure variables from outer scope.

create_previous_next_menu accesses url, all_links, item, subindex, index, and main_links from the enclosing scope. This tight coupling makes the function difficult to test, understand, and refactor. Consider passing required variables as explicit parameters.

🧰 Tools
🪛 Ruff (0.14.14)

[warning] 553-553: Missing return type annotation for private function create_previous_next_menu

(ANN202)

🤖 Prompt for AI Agents
In `@galleries.py` around lines 553 - 577, The nested function
create_previous_next_menu currently closes over url, all_links, item, subindex,
index, and main_links which makes it hard to test or reuse; refactor it to
accept explicit parameters (e.g., def create_previous_next_menu(isSublist,
previous_link, url, all_links, item, subindex, index, main_links)) and keep the
same return value page_menu_data and logic, then update all call sites to pass
those values from the outer scope so behavior remains identical while removing
the hidden dependencies.
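A sketch of the parameter-explicit refactor the prompt describes; the link tuples (url-first) and the call pattern are assumptions based on the diff above:

```python
# Explicit-parameter version: everything the function needs is passed in,
# so it can be unit-tested without reconstructing the enclosing closure.
def create_previous_next_menu(is_sublist, previous_link, url, all_links,
                              item, subindex, index, main_links):
    page_menu_data = {}

    if url != all_links[0][0]:
        page_menu_data['first_link'] = all_links[0][0]
    if url != all_links[-1][0]:
        page_menu_data['last_link'] = all_links[-1][0]

    # Sublists advance within the sublist first, then fall back to main links.
    if is_sublist and subindex + 1 < len(item):
        page_menu_data['next_link'] = item[subindex + 1][0]
    elif index + 1 < len(main_links):
        page_menu_data['next_link'] = main_links[index + 1][0]

    if previous_link is not None:
        page_menu_data['previous_link'] = previous_link
    return page_menu_data
```

Call sites in _gallery would pass the formerly-captured variables explicitly, e.g. `create_previous_next_menu(True, prev, url, all_links, item, k, j, main_links)`.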


@coderabbitai bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (6)
piapage/__init__.py (4)

425-428: GalleryPage.PHOTOJOURNAL_URL is undefined — raises AttributeError.

The new module-level constant PHOTOJOURNAL_URL = '//photojournal.jpl.nasa.gov/catalog/' (line 41) exists in this module but local_page_url_for_id references GalleryPage.PHOTOJOURNAL_URL, which GalleryPage is unlikely to define. The intent of this change (per the AI summary) appears to be to use the module-level constant.

🐛 Proposed fix
-        return GalleryPage.PHOTOJOURNAL_URL + id + '.html'
+        return PHOTOJOURNAL_URL + id + '.html'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 425 - 428, The function
local_page_url_for_id currently references GalleryPage.PHOTOJOURNAL_URL which
does not exist and raises AttributeError; change it to use the module-level
constant PHOTOJOURNAL_URL instead and construct the URL as PHOTOJOURNAL_URL +
PiaPage.get_id(id) + '.html' (ensure you still call PiaPage.get_id(id) to
normalize the id). Update the reference inside local_page_url_for_id to
PHOTOJOURNAL_URL and remove the GalleryPage qualifier so the function uses the
module-level constant.

730-732: 'datatime' typo causes KeyError at runtime.

HTML's attribute is datetime, not datatime. Using time_element['datatime'] will raise a KeyError on every page that has a <time> element.

🐛 Proposed fix
-            date = time_element['datatime']
+            date = time_element.get('datetime', '')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 730 - 732, Typo: change the attribute
access time_element['datatime'] to use the correct HTML attribute and avoid
KeyError by using time_element.get('datetime') (or time_element['datetime'] if
you need to enforce presence); update the code around time_element and
section_soup where date is set so it reads the 'datetime' attribute and fall
back to time_element.get_text(strip=True) or None if missing.

884-887: Hardcoded developer-local Windows path must not ship.

jekyll_filepath_for_id returns a path specific to one developer's machine and the intended path (using PiaPage.JEKYLL_ROOT_ + GalleryPage.PRESS_RELEASES_SUBDIR_) remains commented out.

🐛 Proposed fix
-        # uncomment for testing
-        return f'c:/seti/rms-website/website/galleries/{id[:5]}xxx/{id}.html'
-        """return PiaPage.JEKYLL_ROOT_ + GalleryPage.PRESS_RELEASES_SUBDIR_ + \
-                    'pages/%sxxx/%s.html' % (id[:5], id)"""
+        return PiaPage.JEKYLL_ROOT_ + GalleryPage.PRESS_RELEASES_SUBDIR_ + \
+                    'pages/%sxxx/%s.html' % (id[:5], id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 884 - 887, The function
jekyll_filepath_for_id currently returns a hardcoded developer-local Windows
path; replace that line with the intended construction using
PiaPage.JEKYLL_ROOT_ and GalleryPage.PRESS_RELEASES_SUBDIR_ to build the
relative path for the gallery HTML (use the commented-out expression as the
basis), ensuring you format the id into the 'pages/%sxxx/%s.html' pattern and
avoid embedding any machine-specific absolute paths; prefer using the existing
constants (PiaPage.JEKYLL_ROOT_, GalleryPage.PRESS_RELEASES_SUBDIR_) and, if
needed, use os.path.join to produce a portable path.
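The portable-path construction the prompt asks for might look like this; the constant values below are placeholders, not the repository's real JEKYLL_ROOT_ or PRESS_RELEASES_SUBDIR_:

```python
# Build the gallery HTML path from configuration constants instead of a
# machine-specific absolute path.
JEKYLL_ROOT_ = 'website/'                 # placeholder value
PRESS_RELEASES_SUBDIR_ = 'press_releases/'  # placeholder value

def jekyll_filepath_for_id(id):
    # 'PIA01234' -> 'website/press_releases/pages/PIA01xxx/PIA01234.html'
    return JEKYLL_ROOT_ + PRESS_RELEASES_SUBDIR_ + \
           'pages/%sxxx/%s.html' % (id[:5], id)

print(jekyll_filepath_for_id('PIA01234'))
```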

180-376: self.section_soup (and most instance attributes) are never initialized.

The constructor body from line 180 to line 376 is wrapped in a triple-quoted string, so self.section_soup, self.title, self.credit, self.release_date, self.caption_soup, etc. are never set. Any call to get_title(), get_credit(), get_release_date(), get_is_movie(), description_and_background(), or get_pia_tables() on a non-_dict instance will raise AttributeError.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 180 - 376, The large constructor block is
accidentally enclosed in a triple-quoted string, so instance attributes like
self.section_soup, self.caption_soup, self.title, self.credit, self.release_date
and the calls to description_and_background(), get_title(), get_credit(),
get_release_date(), get_is_movie(), get_pia_tables(), etc. never run; fix this
by removing the surrounding triple quotes (or moving the code out of the string)
so the initialization code executes, restore assignment of self.article_soup and
self.section_soup and the subsequent calls (description_and_background,
get_pia_tables, get_is_movie, etc.), and run any dependent steps
(remote_version_info population, PiaPage.download_images_for_id, write_jekyll,
thumbnail shape logic) as originally intended.
galleries.py (1)

641-642: Redundant dict comprehension — first result is immediately discarded.

Line 641 filters row_data but the result is overwritten on line 642 by an identical comprehension.

🐛 Proposed fix
-            row_data = {k: v for k, v in row_data.items() if v is not None}
-            floated_img_blocks.append({k: v for k, v in row_data.items() if v is not None})
+            floated_img_blocks.append({k: v for k, v in row_data.items() if v is not None})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@galleries.py` around lines 641 - 642, The dict comprehension filtering of
row_data is done twice: the first assignment "row_data = {k: v for k, v in
row_data.items() if v is not None}" is immediately overwritten by the next line
that appends a newly built identical dict to floated_img_blocks; fix by
performing the filtering once and reusing that result when appending to
floated_img_blocks (i.e., compute the filtered dict into row_data or a new
variable and then call floated_img_blocks.append(...) with that same filtered
dict), updating the code paths around row_data and floated_img_blocks
accordingly.
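The filter-once pattern, with a hypothetical row_data:

```python
# Build the None-filtered dict once and reuse it, instead of running the
# same comprehension twice and discarding the first result.
floated_img_blocks = []
row_data = {'url': 'PIA01234.html', 'thumbnail': 'thumb.jpg', 'movie_icon': None}

filtered = {k: v for k, v in row_data.items() if v is not None}  # build once
floated_img_blocks.append(filtered)                              # reuse it
```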
piapage/piapage_galleries.py (1)

514-514: ⚠️ Potential issue | 🟠 Major

Exoplanet targets are not sorted with exoplanet_sort — regression from intended fix.

Every other target category uses a domain-specific sort key (asteroid_sort, comet_sort), but this change replaced the disabled targets.sort(key=exoplanet_sort) with plain alphabetical sorted(), losing the numeric/string decomposition in exoplanet_sort. Additionally, Ruff C414 flags the unnecessary list() call inside sorted().

🐛 Proposed fix
-targets = sorted(list(targets))
+targets = sorted(targets, key=exoplanet_sort)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/piapage_galleries.py` at line 514, The exoplanet targets lost their
domain-specific ordering because the code uses alphabetical
sorted(list(targets)) instead of the exoplanet_sort key; replace that line so
targets are sorted with exoplanet_sort (e.g., use targets = sorted(targets,
key=exoplanet_sort) or targets.sort(key=exoplanet_sort)) and remove the
unnecessary list() call to satisfy Ruff C414; update the spot where targets is
assigned (variable name targets) and reference the exoplanet_sort function to
restore the intended numeric/string decomposition ordering.
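The implementation of `exoplanet_sort` is not shown in this review; a typical numeric/string decomposition key for names like these (a hypothetical sketch, not the module's actual function) looks like:

```python
import re

def exoplanet_sort(name):
    """Hypothetical sort key: split a target name into alternating text and
    numeric runs so that, e.g., 'Kepler-9' orders before 'Kepler-10'."""
    parts = re.split(r'(\d+)', name)
    # Numeric runs compare as ints, text runs as lowercased strings
    return [int(p) if p.isdigit() else p.lower() for p in parts]

targets = ['Kepler-10', 'Kepler-9', 'HD 189733', 'Kepler-22']
targets = sorted(targets, key=exoplanet_sort)
# Plain alphabetical sorted() would instead put 'Kepler-10' before 'Kepler-9'
```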
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@galleries.py`:
- Line 506: The resolve_urls parameter on _gallery is declared but never used;
either re-enable the URL-resolution logic inside create_floated_img_blocks
(uncomment the lines that use resolve_urls, wire in whatever URL-resolver
function is required, ensure images’ src/href are replaced only when
resolve_urls is True, and add/adjust unit tests to cover both True/False paths)
or remove the parameter and its docstring entry entirely (and update all callers
and tests to stop passing resolve_urls). Locate symbols _gallery and
create_floated_img_blocks to make the change and ensure the public API and
docstring remain consistent with the chosen option.
- Around line 518-519: Remove the stray '+' diff marker from the docstring line
describing the parameter resolve_urls in _gallery: edit the _gallery docstring
so the line reads "resolve_urls    if True, resolve NASA URLs to their final
redirected location (time-consuming operation)." (no leading '+') and ensure
indentation/spacing matches surrounding docstring style so help(_gallery) and
generated docs are clean.
- Line 674: The tuple unpacking in the galleries.py snippet binds a variable
named title that is never used; update the unpacking where item is destructured
(the line using (url, title, alt) = item) to either drop/ignore the unused
element by renaming it to _ (e.g., (url, _, alt) = item) or remove the title
slot and only unpack the values you need, ensuring no unused variable remains;
if title was intended to be used, instead reference the variable where
appropriate in the surrounding function or loop that processes item.

In `@piapage/__init__.py`:
- Around line 958-964: The bare "except:" in the block that sets piapages and
updating should be restricted to application exceptions; replace the
unconstrained except with "except Exception as e:" (or "except Exception:") so
KeyboardInterrupt/SystemExit propagate, and optionally include the caught
exception in the verbose print/log to preserve error context for debugging;
update the handler around the piapages = {} / if verbose: print('starting new
catalog') / updating = False section accordingly.
- Line 783: Remove the dead assignment to background_indices in
piapage.__init__.py: the variable background_indices is created but never used
because the code that referenced it is commented out, so delete the line
"background_indices = []" to eliminate the unused-variable warning (Ruff F841)
and keep the commented block unchanged; search for the symbol background_indices
in the file to confirm no other references before removing.

---

Duplicate comments:
In `@galleries.py`:
- Around line 641-642: The dict comprehension filtering of row_data is done
twice: the first assignment "row_data = {k: v for k, v in row_data.items() if v
is not None}" is immediately overwritten by the next line that appends a newly
built identical dict to floated_img_blocks; fix by performing the filtering once
and reusing that result when appending to floated_img_blocks (i.e., compute the
filtered dict into row_data or a new variable and then call
floated_img_blocks.append(...) with that same filtered dict), updating the code
paths around row_data and floated_img_blocks accordingly.

In `@piapage/__init__.py`:
- Around line 425-428: The function local_page_url_for_id currently references
GalleryPage.PHOTOJOURNAL_URL which does not exist and raises AttributeError;
change it to use the module-level constant PHOTOJOURNAL_URL instead and
construct the URL as PHOTOJOURNAL_URL + PiaPage.get_id(id) + '.html' (ensure you
still call PiaPage.get_id(id) to normalize the id). Update the reference inside
local_page_url_for_id to PHOTOJOURNAL_URL and remove the GalleryPage qualifier
so the function uses the module-level constant.
- Around line 730-732: Typo: change the attribute access
time_element['datatime'] to use the correct HTML attribute and avoid KeyError by
using time_element.get('datetime') (or time_element['datetime'] if you need to
enforce presence); update the code around time_element and section_soup where
date is set so it reads the 'datetime' attribute and fall back to
time_element.get_text(strip=True) or None if missing.
- Around line 884-887: The function jekyll_filepath_for_id currently returns a
hardcoded developer-local Windows path; replace that line with the intended
construction using PiaPage.JEKYLL_ROOT_ and GalleryPage.PRESS_RELEASES_SUBDIR_
to build the relative path for the gallery HTML (use the commented-out
expression as the basis), ensuring you format the id into the
'pages/%sxxx/%s.html' pattern and avoid embedding any machine-specific absolute
paths; prefer using the existing constants (PiaPage.JEKYLL_ROOT_,
GalleryPage.PRESS_RELEASES_SUBDIR_) and, if needed, use os.path.join to produce
a portable path.
- Around line 180-376: The large constructor block is accidentally enclosed in a
triple-quoted string, so instance attributes like self.section_soup,
self.caption_soup, self.title, self.credit, self.release_date and the calls to
description_and_background(), get_title(), get_credit(), get_release_date(),
get_is_movie(), get_pia_tables(), etc. never run; fix this by removing the
surrounding triple quotes (or moving the code out of the string) so the
initialization code executes, restore assignment of self.article_soup and
self.section_soup and the subsequent calls (description_and_background,
get_pia_tables, get_is_movie, etc.), and run any dependent steps
(remote_version_info population, PiaPage.download_images_for_id, write_jekyll,
thumbnail shape logic) as originally intended.

In `@piapage/piapage_galleries.py`:
- Line 514: The exoplanet targets lost their domain-specific ordering because
the code uses alphabetical sorted(list(targets)) instead of the exoplanet_sort
key; replace that line so targets are sorted with exoplanet_sort (e.g., use
targets = sorted(targets, key=exoplanet_sort) or
targets.sort(key=exoplanet_sort)) and remove the unnecessary list() call to
satisfy Ruff C414; update the spot where targets is assigned (variable name
targets) and reference the exoplanet_sort function to restore the intended
numeric/string decomposition ordering.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b4ce77d and 1d03ce7.

📒 Files selected for processing (3)
  • galleries.py
  • piapage/__init__.py
  • piapage/piapage_galleries.py


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@gallerypage.py`:
- Line 1064: The NameError occurs because the code references
galleries.FixIndent but the file imports FixIndent directly (from galleries
import FixIndent); change the call to use the imported symbol directly by
passing Dumper=FixIndent to yaml.dump (or alternatively import the module as
import galleries and keep galleries.FixIndent) so the reference matches the
import and avoids the undefined name.
- Around line 1049-1062: The loop using zip(names, values) can silently ignore
length mismatches; update the loop header (for (name, value) in zip(names,
values):) to use zip(names, values, strict=True) so a ValueError is raised if
lengths differ, ensuring misalignment is detected (or if running on Python
<3.10, replace with itertools.zip_longest and explicitly raise when a fill value
is seen).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c7d1a3d8-3fe8-46b9-95b5-b61d9284976f

📥 Commits

Reviewing files that changed from the base of the PR and between 1d03ce7 and f7b0356.

📒 Files selected for processing (2)
  • galleries.py
  • gallerypage.py


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (2)
gallerypage.py (1)

1064-1064: ⚠️ Potential issue | 🔴 Critical

Use the imported FixIndent symbol here.

Line 16 imports FixIndent directly, but Line 1064 references galleries.FixIndent. galleries is not defined in this module, so _write_jekyll() raises a NameError before it writes any front matter.

🐛 Proposed fix
-            yaml_output = yaml.dump(yaml_data, Dumper=galleries.FixIndent, default_flow_style=False, sort_keys=False)
+            yaml_output = yaml.dump(yaml_data, Dumper=FixIndent, default_flow_style=False, sort_keys=False)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gallerypage.py` at line 1064, In _write_jekyll() the yaml.dump call
incorrectly references galleries.FixIndent causing a NameError; change that
reference to the imported FixIndent symbol (i.e., use FixIndent as the Dumper)
so the yaml_output line uses FixIndent directly rather than galleries.FixIndent.
requirements.txt (1)

1-10: ⚠️ Potential issue | 🟠 Major

Lock the dependency graph before documenting this as the install path.

With only minimum versions here and three completely unpinned rms-* packages, pip install -r requirements.txt is still non-reproducible. Dependency resolution for this change currently pulls in a vulnerable urllib3, so this is not an auditable baseline yet. Please add a constraints/lock file, or tighten the unresolved ranges, before making this the canonical setup flow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@requirements.txt` around lines 1 - 10, The requirements.txt currently lists
only minimum versions and three unpinned packages (rms-filecache,
rms-textkernel, rms-julian), making installs non-reproducible; generate and
commit a pinned constraints/lock file (e.g., constraints.txt or
requirements.lock) that pins every package (including numpy, pyparsing, pycurl,
beautifulsoup4, lxml, pillow, PyYAML and the rms-* packages) to exact versions,
or tighten the ranges in requirements.txt to exact versions, and update
installation docs to instruct pip install -r requirements.txt -c
constraints.txt; use your package manager of choice (pip-compile, pip freeze, or
poetry lock) to produce the lock and ensure the vulnerable urllib3 version is
resolved.
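One way to produce the lock the review asks for, assuming pip-tools and pip-audit are available (these commands are a workflow sketch, not part of the PR):

```shell
# Compile exact pins from the loose requirements (pip-tools)
pip-compile --output-file constraints.txt requirements.txt

# Install against the pinned set
pip install -r requirements.txt -c constraints.txt

# Re-check the pinned set for known-vulnerable releases such as urllib3
pip-audit -r constraints.txt
```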
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@gallerypage.py`:
- Around line 1073-1085: The code is incorrectly prepending "{{ site.baseurl }}"
to remote URLs from remote_version_info, which breaks absolute remote links;
update the f.write call that emits the link (using the local variables url and
key inside the loop that iterates remote_keys and reads
self.remote_version_info[key]) to use the raw url value (do not prefix with
site.baseurl), so the rendered link remains the original remote URL.

In `@piapage/repairs.py`:
- Around line 25-26: The condition uses missions == 'Magellan' which always
fails because missions is a list; change the checks to test the list contents
(e.g., use "if missions and missions[0] == 'Magellan':" or "if 'Magellan' in
missions:") for the two Magellan-specific blocks (the host reordering block that
references hosts and the Radar System normalization block that checks Radar
System values). Ensure you guard against empty lists (missions) to avoid
IndexError when using missions[0].

In `@README.md`:
- Around line 30-33: Update README to mark the PiaPage/Photojournal workflow as
legacy: add a clear "Legacy / Unsupported" heading and move the press-release
setup/deploy steps that still reference the old Photojournal flow (the PiaPage
workflow entries referenced around lines that update press-release steps) into
that legacy section, or remove them from the main onboarding flow; ensure any
mentions of PiaPage or Photojournal point to the new catalog URL and explicitly
state that the old gallery workflow is deprecated so contributors won’t follow
it by default.

---

Duplicate comments:
In `@gallerypage.py`:
- Line 1064: In _write_jekyll() the yaml.dump call incorrectly references
galleries.FixIndent causing a NameError; change that reference to the imported
FixIndent symbol (i.e., use FixIndent as the Dumper) so the yaml_output line
uses FixIndent directly rather than galleries.FixIndent.

In `@requirements.txt`:
- Around line 1-10: The requirements.txt currently lists only minimum versions
and three unpinned packages (rms-filecache, rms-textkernel, rms-julian), making
installs non-reproducible; generate and commit a pinned constraints/lock file
(e.g., constraints.txt or requirements.lock) that pins every package (including
numpy, pyparsing, pycurl, beautifulsoup4, lxml, pillow, PyYAML and the rms-*
packages) to exact versions, or tighten the ranges in requirements.txt to exact
versions, and update installation docs to instruct pip install -r
requirements.txt -c constraints.txt; use your package manager of choice
(pip-compile, pip freeze, or poetry lock) to produce the lock and ensure the
vulnerable urllib3 version is resolved.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 984e550d-83d6-45b3-b7a0-c7b55413c140

📥 Commits

Reviewing files that changed from the base of the PR and between f7b0356 and 2ac6a26.

📒 Files selected for processing (4)
  • README.md
  • gallerypage.py
  • piapage/repairs.py
  • requirements.txt


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
gallerypage.py (1)

320-326: ⚠️ Potential issue | 🟡 Minor

Incomplete comment — sentence is cut off.

Line 320 reads "Failing that, see if there is exactly" but the sentence is incomplete. This appears to be a truncated comment that should describe what condition is being tested.

📝 Proposed fix
-                # Failing that, see if there is exactly
+                # Failing that, see if there is exactly one in the background
                 else:
                     test = self._keywords_with_suffixes_from_background
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gallerypage.py` around lines 320 - 326, The comment "Failing that, see if
there is exactly" is incomplete; update it to clearly state the condition being
checked in the else block: that we look for exactly one unique base keyword
(prefix before '+') in self._keywords_with_suffixes_from_background that
contains the given suffix and, if exactly one exists, set primary to that base
keyword (variables involved: self._keywords_with_suffixes_from_background,
suffix, test, primary). Make the comment a single concise sentence describing
this exact check and outcome.
piapage/__init__.py (2)

472-476: ⚠️ Potential issue | 🔴 Critical

page_filepath_for_id will construct malformed paths.

local_page_url_for_id now returns a URL-like string (e.g., //pds-rings.seti.org/press_releases/pages/...), but page_filepath_for_id concatenates this with DOCUMENTS_FILE_ROOT_ (which is /Library/WebServer/Documents/). This produces an invalid filesystem path like /Library/WebServer/Documents//pds-rings.seti.org/....

Per the relevant snippet from gallerypage.py:68-71, DOCUMENTS_FILE_ROOT_ expects a local path segment, not a URL.

Analysis

The original local_page_url_for_id returned /press_releases/pages/... which worked with the filesystem concatenation. The new implementation returns a protocol-relative URL which breaks this contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 472 - 476, page_filepath_for_id currently
concatenates GalleryPage.DOCUMENTS_FILE_ROOT_ with
PiaPage.local_page_url_for_id(id) but local_page_url_for_id now returns a
protocol-relative URL (e.g., //host/...), producing invalid filesystem paths;
fix by normalizing the value returned by PiaPage.local_page_url_for_id(id)
inside page_filepath_for_id: detect and strip any leading protocol or host
portion (remove leading '//' or any '://.../host' prefix) so you keep only the
local path segment (e.g., /press_releases/...), then safely join it with
GalleryPage.DOCUMENTS_FILE_ROOT_ (or remove/adjust duplicate slashes) so the
final filesystem path is valid; reference: page_filepath_for_id,
PiaPage.local_page_url_for_id, GalleryPage.DOCUMENTS_FILE_ROOT_.

424-428: ⚠️ Potential issue | 🔴 Critical

GalleryPage.PDS_PHOTOJOURNAL_URL does not exist — use the module-level constant.

Line 428 references GalleryPage.PDS_PHOTOJOURNAL_URL, but this constant is defined at module level as PDS_PHOTOJOURNAL_URL (line 42), not as a class attribute on GalleryPage. This will raise AttributeError at runtime.

🐛 Fix
     @staticmethod
     def local_page_url_for_id(id):
 
         id = PiaPage.get_id(id)
-        return GalleryPage.PDS_PHOTOJOURNAL_URL + id + '.html'
+        return PDS_PHOTOJOURNAL_URL + id + '.html'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 424 - 428, The method local_page_url_for_id
uses GalleryPage.PDS_PHOTOJOURNAL_URL which doesn't exist; change it to use the
module-level constant PDS_PHOTOJOURNAL_URL instead so it returns
PDS_PHOTOJOURNAL_URL + id + '.html' after resolving id with PiaPage.get_id(id);
update the reference in local_page_url_for_id accordingly.
♻️ Duplicate comments (10)
piapage/__init__.py (4)

726-734: ⚠️ Potential issue | 🟠 Major

Typo: datatime should be datetime.

Line 732 reads the HTML attribute 'datatime' which doesn't exist — the correct HTML5 attribute is 'datetime'. This will raise KeyError or return None depending on how BeautifulSoup handles missing attributes.

🐛 Proposed fix
     def get_release_date(self):
         """Return the release date in yyyy-mm-dd format."""
 
         date = ''
         time_element = self.section_soup.find('time')
         if time_element:
-            date = time_element['datatime']
+            date = time_element.get('datetime', '')
 
         return date
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 726 - 734, In get_release_date(), the code
reads the non-existent attribute 'datatime' from time_element which causes
errors; update the attribute access to use the correct HTML5 attribute
'datetime' and make the access defensive (use time_element.get('datetime') or
similar) so get_release_date() returns an empty string instead of raising
KeyError when the attribute is missing.
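The defensive `.get` access generalizes beyond BeautifulSoup; the same pattern is shown below with the stdlib ElementTree purely to keep the sketch dependency-free (bs4's `Tag.get` behaves analogously):

```python
import xml.etree.ElementTree as ET

section = ET.fromstring(
    '<section><time datetime="2026-01-27">January 27, 2026</time></section>'
)
time_element = section.find('time')

# .get returns a default instead of raising on a missing attribute
date = time_element.get('datetime', '')
assert date == '2026-01-27'
assert time_element.get('datatime', '') == ''  # the typo'd name finds nothing
```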

770-781: ⚠️ Potential issue | 🟠 Major

Guard against missing "Description" header.

Line 776 calls header_element.find_next_sibling() without checking if header_element is None. If the page lacks an <h2>Description</h2> element, this will raise AttributeError.

🛡️ Proposed fix
     def description_and_background(self):
         """Return the description and any identified background information as two soups."""
 
         # Get the caption as soup
 
         header_element = self.section_soup.find('h2', string="Description")
 
         caption_as_soup = BeautifulSoup('', PARSER)
         background_as_soup = BeautifulSoup('', PARSER)
 
+        if header_element is None:
+            return (caption_as_soup, background_as_soup)
+
         # loop thru the elements after "Description"
         current_node = header_element.find_next_sibling()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 770 - 781, The code assumes header_element
exists before calling header_element.find_next_sibling(), which will raise
AttributeError if no <h2 string="Description"> is found; update the logic in the
block that creates header_element, caption_as_soup, background_as_soup and the
while loop to first check if header_element is None and short-circuit (e.g.,
leave caption_as_soup/background_as_soup empty or return/skip processing) so you
only call header_element.find_next_sibling() and iterate current_node when
header_element is truthy; reference the variables header_element, section_soup,
caption_as_soup, background_as_soup, and current_node when making the guard.

180-376: ⚠️ Potential issue | 🔴 Critical

Commented-out block leaves self.section_soup uninitialized.

The entire constructor body from line 180 to 376 is wrapped in a triple-quoted string, which means:

  • self.section_soup is never assigned
  • Methods like get_title(), get_credit(), get_release_date(), description_and_background() that reference self.section_soup will raise AttributeError

If this code is intentionally disabled pending future work, consider raising an explicit error or providing a stub implementation rather than leaving the class in a broken state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 180 - 376, The constructor body is
accidentally wrapped in a triple-quoted string so self.section_soup (and other
attributes) are never assigned, causing AttributeError in methods like
description_and_background, get_title, get_credit, and get_release_date; fix by
either restoring the constructor block (remove the triple quotes) so the code
that assigns self.article_soup and self.section_soup runs, or if the code must
remain disabled, explicitly initialize self.section_soup (and any other
attributes those methods expect) and raise a clear exception (e.g.,
NotImplementedError) from __init__ to fail fast; reference the class
constructor, self.section_soup, description_and_background(), get_title(),
get_credit(), and get_release_date() when making the change.

843-852: ⚠️ Potential issue | 🔴 Critical

Typo .strop() and loop logic error — only last item is processed.

Two issues in this block:

  1. Line 845: .strop() is a typo for .strip() — this will raise AttributeError.
  2. Lines 849-850: The for part in item.split(',') loop is outside the for item in items loop, so only the last item is processed.
🐛 Proposed fix
                 new_items = []
                 for item in items:
                     item = item.strip()
-                    item = item.replace('\\r','').replace('\\n','').strop() # Fix HTML
+                    item = item.replace('\\r','').replace('\\n','').strip() # Fix HTML
                     if not item:
                         continue
+                    for part in item.split(','):
+                        new_items.append(part.strip())
 
-                for part in item.split(','):
-                    new_items.append(part.strip())
-
                 pair.append(new_items)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 843 - 852, There are two bugs in the block
working with items: fix the typo `.strop()` to `.strip()` on the `item` string
and move the `for part in item.split(','):` loop (and its
`new_items.append(part.strip())`) inside the `for item in items:` loop so each
item is split and added rather than only the last one; ensure you preserve the
existing `if not item: continue` behavior and that `pair.append(new_items)` runs
after processing all items for that pair.
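The corrected control flow can be verified standalone (`items` here is synthetic sample data, not the page's actual keyword table):

```python
items = [' Saturn, Titan \\r\\n', '', ' Enceladus ']

new_items = []
for item in items:
    # Strip literal '\r'/'\n' artifacts left over from the HTML, then whitespace
    item = item.replace('\\r', '').replace('\\n', '').strip()
    if not item:
        continue
    # The split must happen inside the loop so every item contributes,
    # not just the last one
    for part in item.split(','):
        new_items.append(part.strip())

assert new_items == ['Saturn', 'Titan', 'Enceladus']
```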
galleries.py (5)

518-519: ⚠️ Potential issue | 🔴 Critical

Diff artifact + in docstring must be removed.

Line 519 contains a literal + character (a diff marker) that was accidentally committed. This will appear in help(_gallery) and documentation output.

🐛 Proposed fix
         resolve_urls    if True, resolve NASA URLs to their final redirected
-+                        location (time-consuming operation).
+                         location (time-consuming operation).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@galleries.py` around lines 518 - 519, The docstring for the gallery function
contains a stray diff marker '+' in the parameter description for resolve_urls;
remove the literal '+' character so the sentence reads correctly (e.g., "resolve
NASA URLs to their final redirected location...") in the docstring associated
with _gallery (or the function/class where resolve_urls is documented) to
prevent the '+' from appearing in help() and generated docs.

506-506: ⚠️ Potential issue | 🟠 Major

resolve_urls parameter is declared but never used.

The resolve_urls parameter is documented but the code that would use it (lines 632-633) is double-commented out. Callers passing resolve_urls=True will see no effect. Either uncomment the resolution code or remove the parameter until the feature is ready.

Also applies to: 632-633

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@galleries.py` at line 506, The _gallery function declares a resolve_urls
parameter but never uses it; either re-enable the URL resolution block or remove
the parameter and update callers/docs. Fix by either (A) uncommenting and wiring
the existing resolution code inside _gallery so that when resolve_urls is True
it runs the URL resolution logic (ensure any helper functions it calls are
imported/available and that the conditional checks resolve_urls), or (B) remove
the resolve_urls parameter from _gallery signature, delete the commented
resolution block, and update any callers and the function docstring to remove
references to resolve_urls. Make sure to keep the function name _gallery and
update tests/call sites accordingly.

47-56: ⚠️ Potential issue | 🟠 Major

Validate URL scheme before opening to prevent SSRF.

urllib.request.Request(url) accepts any scheme including file:, which could be a security concern if URLs originate from untrusted sources. Additionally, the URLError exception variable was removed per past feedback but the scheme validation was not added.

🛡️ Proposed fix
 def get_final_NASA_url(url):
     """ Since NASA is now redirecting all the original photojournal links,
         this will (hopefully) get the redirected links for the updated galleries

     Input:
         url     Original photojournal URL w/PIA embedded, example:
                 https://photojournal.jpl.nasa.gov/catalog/PIA24615
     """
+    # Only allow http/https schemes
+    if not url.startswith(('http://', 'https://')):
+        return url
+
     try:
         opener = urllib.request.build_opener()
         request = urllib.request.Request(url)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@galleries.py` around lines 47 - 56, Before calling
urllib.request.Request(url)/opener.open(request) validate the URL scheme using
urllib.parse.urlparse(url).scheme and only allow 'http' and 'https'; if the
scheme is missing or not allowed return the original url. Insert this check at
the start of the try block (before building the opener/request) surrounding the
existing opener.open(request)/response.geturl() logic, and keep the existing
exception handling that returns url on urllib.error.URLError (you may
reintroduce the exception variable if needed for logging but behavior should
still return the original url on failure).
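The scheme check described above can be expressed with `urllib.parse` and tested without any network access (`is_fetchable` is an illustrative helper name, not a function in the PR):

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {'http', 'https'}

def is_fetchable(url):
    """Return True only for http/https URLs, rejecting file:, ftp:,
    and scheme-less strings before any request is built."""
    return urlparse(url).scheme in ALLOWED_SCHEMES

assert is_fetchable('https://photojournal.jpl.nasa.gov/catalog/PIA24615')
assert not is_fetchable('file:///etc/passwd')
assert not is_fetchable('photojournal.jpl.nasa.gov/catalog/PIA24615')  # no scheme
```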

34-37: 🧹 Nitpick | 🔵 Trivial

FixIndent class duplicated across modules.

This class is identical to the one in gallerypage.py. Consider extracting to a shared utility module to avoid drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@galleries.py` around lines 34 - 37, The FixIndent class (subclassing
yaml.Dumper) and its increase_indent method are duplicated (also present in
gallerypage.py); extract this class into a single shared utility module (e.g.,
utils or yaml_helpers), export FixIndent from there, replace the local
definitions in galleries.py and gallerypage.py with an import of FixIndent, and
update any references so both modules use the single shared implementation to
avoid drift.

645-646: ⚠️ Potential issue | 🟡 Minor

Redundant dict comprehension — filtering applied twice.

Line 645 filters row_data but the result is discarded. Line 646 repeats the same operation.

🐛 Proposed fix
-            row_data = {k: v for k, v in row_data.items() if v is not None}
-            floated_img_blocks.append({k: v for k, v in row_data.items() if v is not None})
+            floated_img_blocks.append({k: v for k, v in row_data.items() if v is not None})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@galleries.py` around lines 645 - 646, The code performs the same
None-filtering of row_data twice; remove the redundant comprehension by either
assigning the filtered dict to a temp (e.g., filtered_row_data) and use that
when appending to floated_img_blocks, or simply drop the first line and keep the
single append that builds the filtered dict; update references to use row_data
only where unmodified and floated_img_blocks.append(...) to receive the filtered
dict.
gallerypage.py (1)

1073-1093: ⚠️ Potential issue | 🟠 Major

Remote URLs should not be prefixed with site.baseurl.

Line 1084 prepends {{ site.baseurl }} to URLs from remote_version_info, which are absolute remote URLs (e.g., https://photojournal.jpl.nasa.gov/...). This produces invalid paths like /subpathhttps://....

However, per past discussion, this code path is deprecated and not actively used. Consider adding a comment or removing the block entirely to prevent confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gallerypage.py` around lines 1073 - 1093, The loop over remote_keys is
incorrectly prefixing absolute remote URLs with '{{ site.baseurl }}', causing
invalid links; in the block that writes the link (using
self.remote_version_info, remote_keys and local variable url) remove the
unconditional '{{ site.baseurl }}' prefix and instead write the raw url, or
conditionally prepend site.baseurl only when url is a relative path (e.g.,
startswith '/'); since this path is deprecated you may alternatively add a
clarifying comment above the loop or remove the entire block to avoid future
confusion.
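The conditional prefix suggested above, in isolation (the `baseurl` value and helper name are illustrative, and in the real template the prefix would be the literal `{{ site.baseurl }}` tag):

```python
def render_link(url, baseurl='/press_releases'):
    """Prefix site-relative paths with the Jekyll baseurl; leave absolute
    and protocol-relative remote URLs untouched."""
    if url.startswith(('http://', 'https://', '//')):
        return url
    return baseurl + url

assert render_link('https://photojournal.jpl.nasa.gov/catalog/PIA24615') == \
    'https://photojournal.jpl.nasa.gov/catalog/PIA24615'
assert render_link('/pages/PIA24xxx/PIA24615.html') == \
    '/press_releases/pages/PIA24xxx/PIA24615.html'
```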
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@galleries.py`:
- Line 622: The loop uses the variable name `id`, which shadows the built-in
id() function; rename the loop variable (e.g., to `product_id` or `pid`) in the
`for id in product_ids[filename]:` statement and update all references to that
variable within the loop body (search for uses of `id` in the same scope) so the
logic and behavior remain identical but no longer shadow the builtin.

In `@gallerypage.py`:
- Around line 7-11: Update the obsolescence notice comment in gallerypage.py by
fixing the typos: change "obseleted" to "obsoleted" (first sentence) and change
"obselete" to "obsolete" (last sentence) so the explanatory comment reads
correctly; locate the multi-line comment block at the top of the file and apply
these two word corrections.

In `@piapage/repairs.py`:
- Line 25: Remove the obsolete commented conditional checks that reference
missions[0] (e.g., the lines beginning with "#if missions[0] == 'Magellan' and
len(hosts) > 1:" found in repairs.py) — delete both occurrences (around the
earlier block and the second occurrence mentioned) so the function no longer
contains dead/commented logic referencing missions and hosts; ensure no other
code depends on those commented lines before removing them.
- Around line 11-19: The current membership checks overwrite entire lists;
instead normalize and filter only offending entries: for systems, replace any
'Sol' with 'Sun' but keep other values (e.g., systems = [ 'Sun' if s == 'Sol'
else s for s in systems ]), remove any 'Unknown' entries (filter them out), and
if the resulting systems list is empty then set systems = ['']; for targets,
replace 'Sol' with 'Sun' similarly (e.g., targets = [ 'Sun' if t == 'Sol' else t
for t in targets ]), and if any 'Sun' appears in targets ensure systems contains
'Sun' (append if missing) rather than wiping the whole list.

---

Outside diff comments:
In `@gallerypage.py`:
- Around line 320-326: The comment "Failing that, see if there is exactly" is
incomplete; update it to clearly state the condition being checked in the else
block: that we look for exactly one unique base keyword (prefix before '+') in
self._keywords_with_suffixes_from_background that contains the given suffix and,
if exactly one exists, set primary to that base keyword (variables involved:
self._keywords_with_suffixes_from_background, suffix, test, primary). Make the
comment a single concise sentence describing this exact check and outcome.

In `@piapage/__init__.py`:
- Around line 472-476: page_filepath_for_id currently concatenates
GalleryPage.DOCUMENTS_FILE_ROOT_ with PiaPage.local_page_url_for_id(id) but
local_page_url_for_id now returns a protocol-relative URL (e.g., //host/...),
producing invalid filesystem paths; fix by normalizing the value returned by
PiaPage.local_page_url_for_id(id) inside page_filepath_for_id: detect and strip
any leading protocol or host portion (remove leading '//' or any '://.../host'
prefix) so you keep only the local path segment (e.g., /press_releases/...),
then safely join it with GalleryPage.DOCUMENTS_FILE_ROOT_ (or remove/adjust
duplicate slashes) so the final filesystem path is valid; reference:
page_filepath_for_id, PiaPage.local_page_url_for_id,
GalleryPage.DOCUMENTS_FILE_ROOT_.
- Around line 424-428: The method local_page_url_for_id uses
GalleryPage.PDS_PHOTOJOURNAL_URL which doesn't exist; change it to use the
module-level constant PDS_PHOTOJOURNAL_URL instead so it returns
PDS_PHOTOJOURNAL_URL + id + '.html' after resolving id with PiaPage.get_id(id);
update the reference in local_page_url_for_id accordingly.
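The path normalization described for page_filepath_for_id can be sketched as follows; the helper name `local_path_from_url` is hypothetical:

```python
from urllib.parse import urlparse

def local_path_from_url(url: str) -> str:
    """Reduce a protocol-relative or absolute URL to its local path segment.

    '//host/press_releases/x.html' -> '/press_releases/x.html', so the
    result can be safely joined with a documents file root. Plain paths
    pass through unchanged.
    """
    if url.startswith('//') or '://' in url:
        # urlparse handles both 'https://host/path' and '//host/path'.
        return urlparse(url).path
    return url
```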

---

Duplicate comments:
In `@galleries.py`:
- Around line 518-519: The docstring for the gallery function contains a stray
diff marker '+' in the parameter description for resolve_urls; remove the
literal '+' character so the sentence reads correctly (e.g., "resolve NASA URLs
to their final redirected location...") in the docstring associated with
_gallery (or the function/class where resolve_urls is documented) to prevent the
'+' from appearing in help() and generated docs.
- Line 506: The _gallery function declares a resolve_urls parameter but never
uses it; either re-enable the URL resolution block or remove the parameter and
update callers/docs. Fix by either (A) uncommenting and wiring the existing
resolution code inside _gallery so that when resolve_urls is True it runs the
URL resolution logic (ensure any helper functions it calls are
imported/available and that the conditional checks resolve_urls), or (B) remove
the resolve_urls parameter from _gallery signature, delete the commented
resolution block, and update any callers and the function docstring to remove
references to resolve_urls. Make sure to keep the function name _gallery and
update tests/call sites accordingly.
- Around line 47-56: Before calling
urllib.request.Request(url)/opener.open(request) validate the URL scheme using
urllib.parse.urlparse(url).scheme and only allow 'http' and 'https'; if the
scheme is missing or not allowed return the original url. Insert this check at
the start of the try block (before building the opener/request) surrounding the
existing opener.open(request)/response.geturl() logic, and keep the existing
exception handling that returns url on urllib.error.URLError (you may
reintroduce the exception variable if needed for logging but behavior should
still return the original url on failure).
- Around line 34-37: The FixIndent class (subclassing yaml.Dumper) and its
increase_indent method are duplicated (also present in gallerypage.py); extract
this class into a single shared utility module (e.g., utils or yaml_helpers),
export FixIndent from there, replace the local definitions in galleries.py and
gallerypage.py with an import of FixIndent, and update any references so both
modules use the single shared implementation to avoid drift.
- Around line 645-646: The code performs the same None-filtering of row_data
twice; remove the redundant comprehension by either assigning the filtered dict
to a temp (e.g., filtered_row_data) and use that when appending to
floated_img_blocks, or simply drop the first line and keep the single append
that builds the filtered dict; update references to use row_data only where
unmodified and floated_img_blocks.append(...) to receive the filtered dict.
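The URL-resolution hardening called out in the galleries.py comments above (an http/https allow-list, a request timeout, and falling back to the original URL on failure) can be sketched as below; `resolve_url` and `RESOLVE_TIMEOUT` are hypothetical names:

```python
import socket
import urllib.error
import urllib.request
from urllib.parse import urlparse

RESOLVE_TIMEOUT = 10  # seconds; a named, tunable constant rather than a magic number

def resolve_url(url: str) -> str:
    """Follow redirects and return the final URL, or the original on failure.

    Only http/https schemes are opened, and every request carries a
    timeout so a stalled server cannot hang gallery generation.
    """
    if urlparse(url).scheme not in ('http', 'https'):
        return url
    try:
        request = urllib.request.Request(url, method='HEAD')
        with urllib.request.urlopen(request, timeout=RESOLVE_TIMEOUT) as response:
            return response.geturl()
    except (urllib.error.URLError, socket.timeout, ValueError):
        # socket.timeout may surface directly or wrapped in URLError;
        # either way, keep the original URL.
        return url
```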

In `@gallerypage.py`:
- Around line 1073-1093: The loop over remote_keys is incorrectly prefixing
absolute remote URLs with '{{ site.baseurl }}', causing invalid links; in the
block that writes the link (using self.remote_version_info, remote_keys and
local variable url) remove the unconditional '{{ site.baseurl }}' prefix and
instead write the raw url, or conditionally prepend site.baseurl only when url
is a relative path (e.g., startswith '/'); since this path is deprecated you may
alternatively add a clarifying comment above the loop or remove the entire block
to avoid future confusion.

In `@piapage/__init__.py`:
- Around line 726-734: In get_release_date(), the code reads the non-existent
attribute 'datatime' from time_element which causes errors; update the attribute
access to use the correct HTML5 attribute 'datetime' and make the access
defensive (use time_element.get('datetime') or similar) so get_release_date()
returns an empty string instead of raising KeyError when the attribute is
missing.
- Around line 770-781: The code assumes header_element exists before calling
header_element.find_next_sibling(), which will raise AttributeError if no <h2
string="Description"> is found; update the logic in the block that creates
header_element, caption_as_soup, background_as_soup and the while loop to first
check if header_element is None and short-circuit (e.g., leave
caption_as_soup/background_as_soup empty or return/skip processing) so you only
call header_element.find_next_sibling() and iterate current_node when
header_element is truthy; reference the variables header_element, section_soup,
caption_as_soup, background_as_soup, and current_node when making the guard.
- Around line 180-376: The constructor body is accidentally wrapped in a
triple-quoted string so self.section_soup (and other attributes) are never
assigned, causing AttributeError in methods like description_and_background,
get_title, get_credit, and get_release_date; fix by either restoring the
constructor block (remove the triple quotes) so the code that assigns
self.article_soup and self.section_soup runs, or if the code must remain
disabled, explicitly initialize self.section_soup (and any other attributes
those methods expect) and raise a clear exception (e.g., NotImplementedError)
from __init__ to fail fast; reference the class constructor, self.section_soup,
description_and_background(), get_title(), get_credit(), and get_release_date()
when making the change.
- Around line 843-852: There are two bugs in the block working with items: fix
the typo `.strop()` to `.strip()` on the `item` string and move the `for part in
item.split(','):` loop (and its `new_items.append(part.strip())`) inside the
`for item in items:` loop so each item is split and added rather than only the
last one; ensure you preserve the existing `if not item: continue` behavior and
that `pair.append(new_items)` runs after processing all items for that pair.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 873b6080-e188-4eae-8c15-944d7f6235f7

📥 Commits

Reviewing files that changed from the base of the PR and between 2ac6a26 and dcd1d29.

📒 Files selected for processing (5)
  • README.md
  • galleries.py
  • gallerypage.py
  • piapage/__init__.py
  • piapage/repairs.py

@coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@galleries.py`:
- Around line 115-117: The tuples list built from catalog.items() (tuples = [];
for (product_id, page) in catalog.items(): tuples.append((page.release_date,
product_id))) is not sorted, so release-date galleries inherit insertion order;
fix by sorting the tuples before grouping—call tuples.sort() (or
tuples.sort(key=lambda t: t[0]) for explicit date key) immediately after the
loop so entries are deterministic and chronological when later processing
product_id and page.release_date.
- Around line 52-59: The URL resolution call using urllib.request.build_opener()
and opener.open(request) must include a timeout to avoid indefinite blocking
when resolve_urls is enabled; modify the code that creates the opener/request
(the opener.open(request) call) to pass a sensible timeout value (e.g.,
timeout=10) — preferably a configurable constant — and update the exception
handling to also catch socket.timeout (or treat timeout as a URLError) so
stalled requests fail fast instead of hanging gallery generation.
- Around line 660-661: The gallery HTML output is currently appended using
Markdown tokens (yaml_output in galleries.py), causing literal "# {{ page.title
}}" and '---' to appear in .html files; update the appended string that builds
yaml_output to use HTML tags instead (replace the Markdown heading with an H1
element for {{ page.title }} and the '---' separators with <hr />) so the
generated .html pages render correctly without relying on Jekyll markdown
conversion.
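The release-date sorting fix above can be sketched as a minimal example; the catalog contents here are invented for illustration:

```python
# Hypothetical catalog mapping product ids to ISO release dates. ISO-8601
# strings sort lexicographically in chronological order, so a plain tuple
# sort yields date order with the product id as a tiebreaker.
catalog = {
    'PIA00003': '1996-08-26',
    'PIA00001': '1996-10-29',
    'PIA00002': '1996-08-26',
}

tuples = [(release_date, product_id) for product_id, release_date in catalog.items()]
tuples.sort()  # deterministic and chronological, regardless of insertion order
```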
ℹ️ Review info

Run ID: d44c388d-f5bb-47b7-80c1-efc2d9bec5a5

📥 Commits

Reviewing files that changed from the base of the PR and between dcd1d29 and a7b3872.

📒 Files selected for processing (1)
  • galleries.py

@coderabbitai bot left a comment

♻️ Duplicate comments (3)
piapage/repairs.py (2)

34-35: 🧹 Nitpick | 🔵 Trivial

Remove the stale commented condition.

Line 34 documents the old missions[0] check and now just adds noise next to the fixed condition below it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/repairs.py` around lines 34 - 35, Remove the stale commented-out
condition "#if missions[0] == 'Magellan' and 'Radar System' in instruments"
which is redundant noise next to the active check using "if 'Magellan' in
missions and 'Radar System' in instruments:"; simply delete that commented line
so only the correct conditional (using missions membership) remains, keeping
references to the "missions" and "instruments" variables intact.

11-19: ⚠️ Potential issue | 🟠 Major

Sun normalization still drops valid system metadata.

After Line 11 strips 'Sun' from systems, Lines 15-19 only restore it when the original target was 'Sol'. That makes already-normalized inputs like targets=['Sun'] collapse to systems=[''], while the equivalent targets=['Sol'] path keeps ['Sun'].

🐛 Proposed fix
     systems = [s for s in systems if s not in ('Sol', 'Sun', 'Unknown')]
     if not systems:
         systems = ['']

-    if 'Sol' in targets:
-        targets = ['Sun' if t == 'Sol' else t for t in targets]
-        # Preserve existing non-empty system values; only backfill if blank.
-        if systems == ['']:
-           systems = ['Sun']
+    targets = ['Sun' if t == 'Sol' else t for t in targets]
+    if 'Sun' in targets and 'Sun' not in systems:
+        if systems == ['']:
+            systems = ['Sun']
+        else:
+            systems.append('Sun')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/repairs.py` around lines 11 - 19, The code removes 'Sun' from systems
then only restores it when an original target was 'Sol', causing inputs with
targets=['Sun'] to lose valid system metadata; fix by normalizing targets first
(replace 'Sol' with 'Sun') or, after the existing replacement block, check
against the normalized targets (not just original) and if systems == [''] and
any(t == 'Sun' for t in targets) set systems = ['Sun']; ensure you preserve
existing non-empty system values and reference the variables systems and targets
in repairs.py (the replacement list comprehension and the systems backfill
logic).
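The per-entry normalization proposed above can be sketched with a hypothetical helper `normalize_systems_targets` that mirrors the fix: offending values are replaced or filtered individually rather than overwriting whole lists:

```python
def normalize_systems_targets(systems, targets):
    """Normalize 'Sol' to 'Sun' without discarding other metadata."""
    targets = ['Sun' if t == 'Sol' else t for t in targets]
    systems = [s for s in systems if s not in ('Sol', 'Sun', 'Unknown')]
    if 'Sun' in targets:
        # Backfill the system entry whenever the Sun is a target.
        systems.append('Sun')
    if not systems:
        systems = ['']
    return systems, targets
```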
galleries.py (1)

657-663: ⚠️ Potential issue | 🟠 Major

Use HTML, not Markdown, in generated .html pages.

write_yaml_file() appends Markdown (# {{ page.title }} and ---) to files that are written with a .html extension. Unless the layout explicitly runs {{ content | markdownify }}, Jekyll will leave those tokens literal, so gallery headings/separators won't render.

🔧 Proposed fix
-        yaml_output += "\n# {{ page.title }}\n\n---\n\n{% include gallery.html %}\n"
+        yaml_output += "\n<h1>{{ page.title }}</h1>\n\n<hr />\n\n{% include gallery.html %}\n"
Does Jekyll convert Markdown inside files with a `.html` extension when they contain YAML front matter, or is Markdown conversion limited to `.md` / `.markdown` files unless the layout explicitly uses `markdownify`?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@galleries.py` around lines 657 - 663, The write_yaml_file function is
appending Markdown tokens into files that will be served as .html; update
write_yaml_file to emit HTML instead of Markdown by replacing the "# {{
page.title }}" line with an HTML heading (e.g., <h1>{{ page.title }}</h1>) and
replace the Markdown separators ("---") used as visual breaks with HTML
equivalents (e.g., <hr/> or appropriate div wrappers) so yaml_output contains
valid HTML content; ensure you still preserve the YAML front matter boundaries
and the "{% include gallery.html %}" include remains after the HTML content.
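The HTML-emitting variant can be sketched as below, assuming a hypothetical helper `gallery_page_body`; only the body markup changes, while the front-matter delimiters stay:

```python
def gallery_page_body(front_matter: str) -> str:
    """Build the body of a generated .html gallery page.

    Jekyll converts Markdown only in .md/.markdown files, so the body uses
    real HTML tags; '---' lines remain only as front-matter delimiters.
    """
    return (
        '---\n' + front_matter + '---\n\n'
        '<h1>{{ page.title }}</h1>\n\n'
        '<hr />\n\n'
        '{% include gallery.html %}\n'
    )
```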
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@galleries.py`:
- Around line 657-663: The write_yaml_file function is appending Markdown tokens
into files that will be served as .html; update write_yaml_file to emit HTML
instead of Markdown by replacing the "# {{ page.title }}" line with an HTML
heading (e.g., <h1>{{ page.title }}</h1>) and replace the Markdown separators
("---") used as visual breaks with HTML equivalents (e.g., <hr/> or appropriate
div wrappers) so yaml_output contains valid HTML content; ensure you still
preserve the YAML front matter boundaries and the "{% include gallery.html %}"
include remains after the HTML content.

In `@piapage/repairs.py`:
- Around line 34-35: Remove the stale commented-out condition "#if missions[0]
== 'Magellan' and 'Radar System' in instruments" which is redundant noise next
to the active check using "if 'Magellan' in missions and 'Radar System' in
instruments:"; simply delete that commented line so only the correct conditional
(using missions membership) remains, keeping references to the "missions" and
"instruments" variables intact.
- Around line 11-19: The code removes 'Sun' from systems then only restores it
when an original target was 'Sol', causing inputs with targets=['Sun'] to lose
valid system metadata; fix by normalizing targets first (replace 'Sol' with
'Sun') or, after the existing replacement block, check against the normalized
targets (not just original) and if systems == [''] and any(t == 'Sun' for t in
targets) set systems = ['Sun']; ensure you preserve existing non-empty system
values and reference the variables systems and targets in repairs.py (the
replacement list comprehension and the systems backfill logic).

ℹ️ Review info
Run ID: 431f5b86-b904-468e-93bb-bd9f881b024c

📥 Commits

Reviewing files that changed from the base of the PR and between a7b3872 and bd404ca.

📒 Files selected for processing (4)
  • galleries.py
  • gallerypage.py
  • piapage/piapage_galleries.py
  • piapage/repairs.py

@coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
piapage/__init__.py (1)

376-403: ⚠️ Potential issue | 🔴 Critical

Fix the broken cross-file symbol references in the remote URL helpers.

These methods reference Galleries.PHOTOJOURNAL_URL, but this file imports galleries, and galleries.py exposes PHOTOJOURNAL_DOMAIN / PHOTOJOURNAL_URL_ instead. url_from_source() calls remote_page_url_for_id() during __init__, so Line 380 raises before any PiaPage can be constructed.

Suggested fix
-        return Galleries.PHOTOJOURNAL_URL + '/catalog/' + id
+        return galleries.PHOTOJOURNAL_URL_ + id

Apply the same symbol fix to the other remote URL builders in this file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 376 - 403, The remote URL helpers
remote_page_url_for_id, remote_thumbnail_url_for_id, and remote_small_url_for_id
reference Galleries.PHOTOJOURNAL_URL which doesn't exist; change those
references to the exported symbol from the galleries module (use
galleries.PHOTOJOURNAL_URL_ or galleries.PHOTOJOURNAL_DOMAIN + appropriate path
as the project exposes) so url_from_source() no longer raises during __init__;
update all three methods to use the correct galleries.* symbol consistently.
♻️ Duplicate comments (3)
piapage/__init__.py (2)

175-184: ⚠️ Potential issue | 🔴 Critical

The triple-quoted block disables the rest of __init__.

Everything from Line 176 through Line 370 is currently a string literal, so self.section_soup, self.title, self.release_date, self.is_planetary, and the local/remote URL fields are never initialized. build_catalog() later reads p.is_planetary, so even a page that constructs successfully fails immediately afterward.

Suggested fix
-        """ This code is currently underconstruction for the individual PIA pages
-        self.article_soup = self.soup.find('article')
-        self.section_soup = self.article_soup.find('section')
+        # This code is currently under construction for the individual PIA pages.
+        self.article_soup = self.soup.find('article')
+        self.section_soup = self.article_soup.find('section') if self.article_soup else self.soup
...
-    """

Also applies to: 371-371

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 175 - 184, The constructor (__init__)
contains an unintended triple-quoted string that turns lines initializing
self.article_soup, self.section_soup and all subsequent attribute assignments
(e.g., self.title, self.release_date, self.is_planetary, local/remote URL
fields) into a string literal; remove or close the stray triple-quote so those
lines execute normally and ensure description_and_background() is called to set
self.caption_soup and self.background_soup; verify build_catalog() usage of
p.is_planetary now reads a properly initialized attribute.

838-845: ⚠️ Potential issue | 🔴 Critical

Fix the strop() typo and keep per-item splitting inside the loop.

Line 840 raises AttributeError, and because Lines 844-845 are outside the for item in items loop only the last list entry is preserved.

Suggested fix
-                for item in items:
-                    item = item.strip()
-                    item = item.replace('\\r','').replace('\\n','').strop() # Fix HTML
-                    if not item:
-                        continue
-
-                for part in item.split(','):
-                    new_items.append(part.strip())
+                for item in items:
+                    item = item.strip()
+                    item = item.replace('\\r', '').replace('\\n', '').strip()
+                    if not item:
+                        continue
+                    for part in item.split(','):
+                        new_items.append(part.strip())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 838 - 845, There is a typo calling strop()
on item and the per-item splitting loop is placed outside the items loop so only
the last item is processed; fix by changing item =
item.replace('\\r','').replace('\\n','').strop() to use the correct method chain
(e.g., .strip()) and move the subsequent "for part in item.split(',')" block
inside the "for item in items:" loop so each stripped item is split and appended
to new_items (use item, items, new_items as the identifiers shown).
galleries.py (1)

662-663: ⚠️ Potential issue | 🟠 Major

Use HTML, not Markdown, in generated .html pages.

write_yaml_file() appends # {{ page.title }} and --- to files named *.html. Jekyll will process the front matter, but it will not Markdown-convert .html page bodies, so those tokens render literally unless the layout explicitly runs them through the `markdownify` filter.

Suggested fix
-        yaml_output += "\n# {{ page.title }}\n\n---\n\n{% include gallery.html %}\n"
+        yaml_output += "\n<h1>{{ page.title }}</h1>\n\n<hr />\n\n{% include gallery.html %}\n"
Does Jekyll convert Markdown syntax inside files with a `.html` extension when the file has YAML front matter, or only in `.md`/`.markdown` pages?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@galleries.py` around lines 662 - 663, write_yaml_file is appending Markdown
tokens ("# {{ page.title }}" and extra '---') into files with a .html extension
which won't be Markdown-converted by Jekyll; update the code that builds
yaml_output (the yaml_output string in write_yaml_file) to append HTML instead:
replace the Markdown heading with an HTML heading tag (e.g., <h1>{{ page.title
}}</h1>) and remove or replace the trailing Markdown front-matter divider so the
body contains valid HTML before the "{% include gallery.html %}" include.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@piapage/__init__.py`:
- Around line 420-423: local_page_url_for_id currently returns
GalleryPage.PDS_PHOTOJOURNAL_URL + id + '.html' which produces a flat URL;
restore the bucketed directory segment so the URL matches where pages are
written (jekyll_filepath_for_id writes to pages/%sxxx/%s.html and galleries.py
links to that path). Update local_page_url_for_id (and references to
PiaPage.get_id and GalleryPage.PDS_PHOTOJOURNAL_URL) to insert the "%sxxx/"
directory prefix derived from the numeric id (the same bucketing logic used by
jekyll_filepath_for_id) before the filename so the returned URL points to
pages/%sxxx/<id>.html.

---

Outside diff comments:
In `@piapage/__init__.py`:
- Around line 376-403: The remote URL helpers remote_page_url_for_id,
remote_thumbnail_url_for_id, and remote_small_url_for_id reference
Galleries.PHOTOJOURNAL_URL which doesn't exist; change those references to the
exported symbol from the galleries module (use galleries.PHOTOJOURNAL_URL_ or
galleries.PHOTOJOURNAL_DOMAIN + appropriate path as the project exposes) so
url_from_source() no longer raises during __init__; update all three methods to
use the correct galleries.* symbol consistently.

---

Duplicate comments:
In `@galleries.py`:
- Around line 662-663: write_yaml_file is appending Markdown tokens ("# {{
page.title }}" and extra '---') into files with a .html extension which won't be
Markdown-converted by Jekyll; update the code that builds yaml_output (the
yaml_output string in write_yaml_file) to append HTML instead: replace the
Markdown heading with an HTML heading tag (e.g., <h1>{{ page.title }}</h1>) and
remove or replace the trailing Markdown front-matter divider so the body
contains valid HTML before the "{% include gallery.html %}" include.

In `@piapage/__init__.py`:
- Around line 175-184: The constructor (__init__) contains an unintended
triple-quoted string that turns lines initializing self.article_soup,
self.section_soup and all subsequent attribute assignments (e.g., self.title,
self.release_date, self.is_planetary, local/remote URL fields) into a string
literal; remove or close the stray triple-quote so those lines execute normally
and ensure description_and_background() is called to set self.caption_soup and
self.background_soup; verify build_catalog() usage of p.is_planetary now reads a
properly initialized attribute.
- Around line 838-845: There is a typo calling strop() on item and the per-item
splitting loop is placed outside the items loop so only the last item is
processed; fix by changing item =
item.replace('\\r','').replace('\\n','').strop() to use the correct method chain
(e.g., .strip()) and move the subsequent "for part in item.split(',')" block
inside the "for item in items:" loop so each stripped item is split and appended
to new_items (use item, items, new_items as the identifiers shown).
ℹ️ Review info

Run ID: 3734c3da-30fd-4976-8212-ed301b699f1e

📥 Commits

Reviewing files that changed from the base of the PR and between bd404ca and 6c47b13.

📒 Files selected for processing (2)
  • galleries.py
  • piapage/__init__.py

@coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
piapage/__init__.py (1)

376-423: ⚠️ Potential issue | 🔴 Critical

Fix undefined symbol references in URL helper methods.

Galleries.PHOTOJOURNAL_URL references a non-existent class—galleries is imported as a module, and the correct constant is PHOTOJOURNAL_URL_ (with underscore). Similarly, GalleryPage.PDS_PHOTOJOURNAL_URL is not defined as a class attribute on GalleryPage; it exists as a module-level constant in galleries.py. Any invocation of these methods raises AttributeError before building a URL.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 376 - 423, All URL helper methods
(remote_page_url_for_id, remote_thumbnail_url_for_id, remote_small_url_for_id,
remote_medium_url_for_id, local_page_url_for_id) currently reference
non-existent class attributes (Galleries.PHOTOJOURNAL_URL and
GalleryPage.PDS_PHOTOJOURNAL_URL); change those to use the correct module-level
constants exported by the galleries module (e.g. replace
Galleries.PHOTOJOURNAL_URL with galleries.PHOTOJOURNAL_URL_ and replace
GalleryPage.PDS_PHOTOJOURNAL_URL with the PDS constant from the galleries module
such as galleries.PDS_PHOTOJOURNAL_URL), updating all occurrences in those
functions so the module-level constants are referenced directly.
♻️ Duplicate comments (2)
piapage/__init__.py (2)

175-185: ⚠️ Potential issue | 🔴 Critical

Remove the triple-quoted constructor block.

Line 175 starts a multiline string that comments out the self.article_soup / self.section_soup setup and every later assignment in __init__ through Line 371. Instances created here never populate fields like title, xrefs, remote_version_info, or local_page_url, and any method that expects self.section_soup will fail once it is called.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 175 - 185, The constructor currently has a
multiline string blocking the assignments (self.article_soup, self.section_soup,
the call to description_and_background, and subsequent field initializations),
causing instance fields like title, xrefs, remote_version_info, and
local_page_url to remain unset; remove the opening and closing triple quotes
around that block in __init__ so the statements (including self.article_soup =
self.soup.find('article'), self.section_soup =
self.article_soup.find('section'), (self.caption_soup, self.background_soup) =
self.description_and_background(), and the later initializations) execute
normally and restore expected behavior for methods that rely on
self.section_soup.

840-850: ⚠️ Potential issue | 🔴 Critical

Fix the item-normalization loop before table parsing ships.

str.strop() raises immediately, and the comma-splitting loop still runs after for item in items, so only the last cleaned entry survives. Multi-valued fields like missions or targets will parse incorrectly even after the typo is fixed.

🛠️ Proposed fix
                 new_items = []
                 for item in items:
-                    item = item.strip()
-                    item = item.replace('\\r','').replace('\\n','').strop() # Fix HTML
-                    if not item:
+                    cleaned_item = item.strip()
+                    cleaned_item = cleaned_item.replace('\\r', '').replace('\\n', '').strip()
+                    if not cleaned_item:
                         continue
-
-                for part in item.split(','):
-                    new_items.append(part.strip())
+                    for part in cleaned_item.split(','):
+                        new_items.append(part.strip())
 
                 pair.append(new_items)
#!/bin/bash
set -euo pipefail

sed -n '840,850p' piapage/__init__.py
python - <<'PY'
print("has strop:", hasattr("", "strop"))

text = "alpha\nbeta,gamma"
items = text.split("\n")
new_items = []
for item in items:
    item = item.strip()
    if not item:
        continue
for part in item.split(","):
    new_items.append(part.strip())
print("current loop output:", new_items)
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 840 - 850, The loop currently calls the
nonexistent method strop() and mis-indents the comma-splitting so only the last
item is used; change item = item.replace('\\r','').replace('\\n','').strop() to
item = item.replace('\\r','').replace('\\n','').strip() (or simply item =
item.strip()) and move the inner loop that splits on ',' (for part in
item.split(','): new_items.append(part.strip())) inside the outer for item in
items: loop so each cleaned item is split and appended to new_items, then keep
pair.append(new_items) after the outer loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@piapage/__init__.py`:
- Around line 706-709: get_title currently reads the <title> from
self.section_soup which is scoped to the article section and thus empty; update
get_title to look up the document title from the full parsed document (use
self.soup.find('title')) or return the real heading element instead (e.g., find
an <h1> in the section) and pass that node to GalleryPage.soup_as_text so the
title is derived from the correct DOM (refer to get_title, self.section_soup,
self.soup, and GalleryPage.soup_as_text).

---

Outside diff comments:
In `@piapage/__init__.py`:
- Around line 376-423: All URL helper methods (remote_page_url_for_id,
remote_thumbnail_url_for_id, remote_small_url_for_id, remote_medium_url_for_id,
local_page_url_for_id) currently reference non-existent class attributes
(Galleries.PHOTOJOURNAL_URL and GalleryPage.PDS_PHOTOJOURNAL_URL); change those
to use the correct module-level constants exported by the galleries module (e.g.
replace Galleries.PHOTOJOURNAL_URL with galleries.PHOTOJOURNAL_URL_ and replace
GalleryPage.PDS_PHOTOJOURNAL_URL with the PDS constant from the galleries module
such as galleries.PDS_PHOTOJOURNAL_URL), updating all occurrences in those
functions so the module-level constants are referenced directly.

---

Duplicate comments:
In `@piapage/__init__.py`:
- Around line 175-185: The constructor currently has a multiline string blocking
the assignments (self.article_soup, self.section_soup, the call to
description_and_background, and subsequent field initializations), causing
instance fields like title, xrefs, remote_version_info, and local_page_url to
remain unset; remove the opening and closing triple quotes around that block in
__init__ so the statements (including self.article_soup =
self.soup.find('article'), self.section_soup =
self.article_soup.find('section'), (self.caption_soup, self.background_soup) =
self.description_and_background(), and the later initializations) execute
normally and restore expected behavior for methods that rely on
self.section_soup.
- Around line 840-850: The loop currently calls the nonexistent method strop()
and mis-indents the comma-splitting so only the last item is used; change item =
item.replace('\\r','').replace('\\n','').strop() to item =
item.replace('\\r','').replace('\\n','').strip() (or simply item = item.strip())
and move the inner loop that splits on ',' (for part in item.split(','):
new_items.append(part.strip())) inside the outer for item in items: loop so each
cleaned item is split and appended to new_items, then keep
pair.append(new_items) after the outer loop.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1c6820bf-b173-49ac-b2f3-bec9f0a09ffa

📥 Commits

Reviewing files that changed from the base of the PR and between 6c47b13 and 74aaed1.

📒 Files selected for processing (1)
  • piapage/__init__.py

Comment on lines 706 to +709
def get_title(self):
"""Return the title of the page. Example: Artemis Corona"""

title = GalleryPage.soup_as_text(self.soup.find('b'))
if title[:8] == self.id and title[8] == ':':
title = title[9:].strip()

return title
return GalleryPage.soup_as_text(self.section_soup.find('title'))

⚠️ Potential issue | 🟠 Major

Don’t read <title> from self.section_soup.

<title> lives under <head>, not inside the article <section>. Once self.section_soup is actually narrowed to page content, this returns an empty title for every page. Use self.soup.find('title') or the real heading element instead.
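A minimal reproduction of this scoping problem, using a simplified stand-in for the real Photojournal markup (the actual page layout may differ):

```python
# find('title') on a soup narrowed to <section> returns None, because
# <title> lives under <head>, outside the section subtree.
from bs4 import BeautifulSoup

html = """
<html>
  <head><title>PIA00004: Artemis Corona</title></head>
  <body><article><section><h1>Artemis Corona</h1></section></article></body>
</html>
"""
soup = BeautifulSoup(html, "html.parser")
section_soup = soup.find("article").find("section")

print(section_soup.find("title"))          # None: <title> is not in <section>
print(soup.find("title").get_text())       # PIA00004: Artemis Corona
print(section_soup.find("h1").get_text())  # Artemis Corona
```

Either the document-level `soup.find('title')` or the section's own heading (`find('h1')`) yields usable text; the section-scoped `find('title')` never will.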

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 706 - 709, get_title currently reads the
<title> from self.section_soup which is scoped to the article section and thus
empty; update get_title to look up the document title from the full parsed
document (use self.soup.find('title')) or return the real heading element
instead (e.g., find an <h1> in the section) and pass that node to
GalleryPage.soup_as_text so the title is derived from the correct DOM (refer to
get_title, self.section_soup, self.soup, and GalleryPage.soup_as_text).


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
piapage/__init__.py (1)

376-417: ⚠️ Potential issue | 🔴 Critical

Change Galleries to galleries in all remote URL helper methods.

This file imports galleries at Line 12, but Lines 380, 389, 391, 400, 402, 413, and 417 reference Galleries (uppercase). Each helper will raise NameError when called.
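A toy reproduction of the failure mode; a stand-in module object is used here instead of the real galleries.py, and the `catalog/` path segment is illustrative, not the actual Photojournal URL shape:

```python
# The module is bound as lowercase `galleries`, so any reference to
# `Galleries.PHOTOJOURNAL_URL` is an undefined name and raises NameError.
import types

galleries = types.ModuleType("galleries")  # stand-in for the real module
galleries.PHOTOJOURNAL_URL = "https://photojournal.jpl.nasa.gov/"

def remote_page_url_for_id(pia_id):
    # Correct: reference the imported (lowercase) module object.
    return galleries.PHOTOJOURNAL_URL + f"catalog/{pia_id}"

def broken_remote_page_url_for_id(pia_id):
    # Broken: `Galleries` was never defined anywhere.
    return Galleries.PHOTOJOURNAL_URL + f"catalog/{pia_id}"

print(remote_page_url_for_id("PIA00004"))
try:
    broken_remote_page_url_for_id("PIA00004")
except NameError as exc:
    print("raises:", exc)  # name 'Galleries' is not defined
```

Note that the NameError only surfaces at call time, not at import time, which is why the broken helpers can sit unnoticed until something actually builds a remote URL.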

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 376 - 417, The NameError comes from using
the undefined symbol Galleries (capital G) in the URL helper methods; update all
references to the imported lowercase module galleries inside
remote_page_url_for_id, remote_thumbnail_url_for_id, remote_small_url_for_id,
and remote_medium_url_for_id (replace Galleries.PHOTOJOURNAL_URL with
galleries.PHOTOJOURNAL_URL) while keeping the existing path concatenation and
suffix logic unchanged.
♻️ Duplicate comments (4)
piapage/__init__.py (4)

175-371: ⚠️ Potential issue | 🔴 Critical

The triple-quoted block comments out the rest of __init__.

Everything from self.article_soup through self.thumbnail_shape is inside a string literal. section_soup, parsed metadata, local URLs, and is_planetary never initialize, so downstream code like build_catalog() will hit missing attributes immediately.
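A stripped-down illustration of why the stray triple quotes silently disable the constructor: the quoted statements become a single string expression that Python evaluates and discards, so none of the assignments inside it ever run.

```python
# A triple-quoted block in a function body is just an expression
# statement, not a comment: the code inside it never executes.
class Broken:
    def __init__(self):
        self.a = 1
        """
        self.b = 2
        self.c = 3
        """

class Fixed:
    def __init__(self):
        self.a = 1
        self.b = 2
        self.c = 3

print(hasattr(Broken(), "b"))  # False: the assignment was inside a string
print(hasattr(Fixed(), "b"))   # True
```

No syntax error is raised, which is why the bug only shows up later as an AttributeError in downstream code such as build_catalog().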

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 175 - 371, The rest of PiaPage.__init__ is
accidentally enclosed in a triple-quoted string, so attributes like
self.article_soup, self.section_soup, self.is_planetary and self.thumbnail_shape
are never initialized; fix by removing the opening or closing triple quote that
turns the block into a string literal (restore normal code execution) so the
code from self.article_soup through self.thumbnail_shape runs as statements
inside __init__; verify the triple quote around the large comment is deleted (or
converted to a true docstring placed at the top of __init__) and run tests that
exercise PiaPage.__init__ and build_catalog() to ensure attributes are created.

706-709: ⚠️ Potential issue | 🟠 Major

Read the title from the full document, not self.section_soup.

<title> lives under <head>, so once section_soup is scoped to article content this returns an empty title for every page.

🛠️ Suggested fix
-        return GalleryPage.soup_as_text(self.section_soup.find('title'))
+        return GalleryPage.soup_as_text(self.soup.find('title'))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 706 - 709, get_title currently reads the
<title> from self.section_soup (scoped to article body) which is empty; update
get_title to read the title from the full document soup instead (e.g. use
self.soup.find('title') or the class's root/document soup) and fall back to
GalleryPage.soup_as_text(self.section_soup.find('title')) only if the
document-level title is missing; adjust the call to GalleryPage.soup_as_text
accordingly in the get_title method.

169-173: ⚠️ Potential issue | 🟠 Major

Guard the figcaption lookup and remove the debug probe.

Line 171 can still raise when .find('figcaption') returns None, and Line 173 prints the builtin id instead of a page identifier. This block is still on the constructor path, so it can fail before any page parsing happens.
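One way to write the guarded lookup the comment asks for; the helper name and return convention here are illustrative, not taken from the repository:

```python
# Guarded figcaption lookup: never dereferences a None result from find().
from bs4 import BeautifulSoup

def figcaption_text(caption_soup):
    """Return the <figcaption> text, or '' when the tag is absent."""
    if caption_soup is None:
        return ""
    figcap = caption_soup.find("figcaption")
    return figcap.get_text(strip=True) if figcap is not None else ""

with_cap = BeautifulSoup(
    "<figure><img src='x.jpg'><figcaption> A crater </figcaption></figure>",
    "html.parser")
without_cap = BeautifulSoup("<figure><img src='x.jpg'></figure>", "html.parser")

print(figcaption_text(with_cap))     # A crater
print(figcaption_text(without_cap))  # empty string, no AttributeError
```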

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 169 - 173, The block using caption_soup
risks AttributeError when caption_soup.find('figcaption') returns None and also
contains a stray debug print that prints the builtin id; change it to first
check caption_soup is truthy then assign figcap =
caption_soup.find('figcaption') and only access figcap.text if figcap is not
None (use this guarded value to set piatext), and remove the debug print (or
replace it with a proper logger message that references the intended page
identifier such as self.soup.id or self.id if that is the correct field) to
avoid printing the builtin id.

840-850: ⚠️ Potential issue | 🔴 Critical

This table parser still crashes and only preserves the last item.

Line 843 calls str.strop(), which does not exist, so any non-empty entry raises immediately. Even after fixing that typo, Lines 847-848 are outside the for item in items loop, so only the final list entry is split and stored.

🛠️ Suggested fix
                 new_items = []
                 for item in items:
                     item = item.strip()
-                    item = item.replace('\\r','').replace('\\n','').strop() # Fix HTML
+                    item = item.replace('\\r', '').replace('\\n', '').strip() # Fix HTML
                     if not item:
                         continue
 
-                for part in item.split(','):
-                    new_items.append(part.strip())
+                    for part in item.split(','):
+                        new_items.append(part.strip())
 
                 pair.append(new_items)
#!/bin/bash
sed -n '840,850p' piapage/__init__.py
python - <<'PY'
print("has_str_strop =", hasattr(str, "strop"))
PY

Expected result: the snippet shows the comma-splitting loop is dedented outside for item in items, and Python prints has_str_strop = False.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 840 - 850, Fix the typo str.strop() ->
str.strip() and move the comma-splitting loop inside the item iteration so every
entry is split instead of only the last one: in the block using new_items,
items, item and pair, call item =
item.replace('\\r','').replace('\\n','').strip(), then inside the for item in
items loop run for part in item.split(','): new_items.append(part.strip());
leave pair.append(new_items) after the outer loop so pair collects all parts
from every item.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@piapage/__init__.py`:
- Around line 420-423: local_page_url_for_id currently returns a
PDS_PHOTOJOURNAL_URL-based absolute path (and references
GalleryPage.PDS_PHOTOJOURNAL_URL) which doesn't match the site-relative path
shape used by jekyll_filepath_for_id and page_filepath_for_id; update
local_page_url_for_id to return the site-relative pages path used elsewhere (the
'/<PRESS_RELEASES_SUBDIR_>/pages/.../...' shape) instead of using
GalleryPage.PDS_PHOTOJOURNAL_URL, and if needed import the correct constant
(e.g. PRESS_RELEASES_SUBDIR_ or PDS_PHOTOJOURNAL_URL) from galleries.py; ensure
the returned string matches the pattern produced by jekyll_filepath_for_id and
consumed by page_filepath_for_id (use the same id slicing id[:5]xxx/{id}.html).
- Around line 774-779: The loop that copies nodes after header_element using
current_node = header_element.find_next_sibling() reparents nodes when calling
caption_as_soup.append(current_node), so you must save the next sibling before
appending; change the loop to obtain next_node = current_node.next_sibling at
the top (or just before calling caption_as_soup.append(current_node)), then
append current_node, and finally set current_node = next_node so the traversal
doesn't stop early.

---

Outside diff comments:
In `@piapage/__init__.py`:
- Around line 376-417: The NameError comes from using the undefined symbol
Galleries (capital G) in the URL helper methods; update all references to the
imported lowercase module galleries inside remote_page_url_for_id,
remote_thumbnail_url_for_id, remote_small_url_for_id, and
remote_medium_url_for_id (replace Galleries.PHOTOJOURNAL_URL with
galleries.PHOTOJOURNAL_URL) while keeping the existing path concatenation and
suffix logic unchanged.

---

Duplicate comments:
In `@piapage/__init__.py`:
- Around line 175-371: The rest of PiaPage.__init__ is accidentally enclosed in
a triple-quoted string, so attributes like self.article_soup, self.section_soup,
self.is_planetary and self.thumbnail_shape are never initialized; fix by
removing the opening or closing triple quote that turns the block into a string
literal (restore normal code execution) so the code from self.article_soup
through self.thumbnail_shape runs as statements inside __init__; verify the
triple quote around the large comment is deleted (or converted to a true
docstring placed at the top of __init__) and run tests that exercise
PiaPage.__init__ and build_catalog() to ensure attributes are created.
- Around line 706-709: get_title currently reads the <title> from
self.section_soup (scoped to article body) which is empty; update get_title to
read the title from the full document soup instead (e.g. use
self.soup.find('title') or the class's root/document soup) and fall back to
GalleryPage.soup_as_text(self.section_soup.find('title')) only if the
document-level title is missing; adjust the call to GalleryPage.soup_as_text
accordingly in the get_title method.
- Around line 169-173: The block using caption_soup risks AttributeError when
caption_soup.find('figcaption') returns None and also contains a stray debug
print that prints the builtin id; change it to first check caption_soup is
truthy then assign figcap = caption_soup.find('figcaption') and only access
figcap.text if figcap is not None (use this guarded value to set piatext), and
remove the debug print (or replace it with a proper logger message that
references the intended page identifier such as self.soup.id or self.id if that
is the correct field) to avoid printing the builtin id.
- Around line 840-850: Fix the typo str.strop() -> str.strip() and move the
comma-splitting loop inside the item iteration so every entry is split instead
of only the last one: in the block using new_items, items, item and pair, call
item = item.replace('\\r','').replace('\\n','').strip(), then inside the for
item in items loop run for part in item.split(','):
new_items.append(part.strip()); leave pair.append(new_items) after the outer
loop so pair collects all parts from every item.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2d911a41-2bc9-4e0f-bf82-7c2ae586b264

📥 Commits

Reviewing files that changed from the base of the PR and between 6c47b13 and 74aaed1.

📒 Files selected for processing (1)
  • piapage/__init__.py

Comment on lines 420 to +423
def local_page_url_for_id(id):

id = PiaPage.get_id(id)
return '/' + GalleryPage.PRESS_RELEASES_SUBDIR_ + \
'pages/%sxxx/%s.html' % (id[:5], id)
return GalleryPage.PDS_PHOTOJOURNAL_URL + f'{id[:5]}xxx/{id}.html'

⚠️ Potential issue | 🔴 Critical

local_page_url_for_id() no longer matches the page paths written elsewhere.

page_filepath_for_id() concatenates this return value onto GalleryPage.DOCUMENTS_FILE_ROOT_[:-1], but this helper now returns a PDS_PHOTOJOURNAL_URL-based path instead of the site-relative /<PRESS_RELEASES_SUBDIR_>/pages/... shape that jekyll_filepath_for_id() writes. In the provided context, PDS_PHOTOJOURNAL_URL also lives in galleries.py, not on GalleryPage.

🛠️ Suggested fix
     def local_page_url_for_id(id):
 
         id = PiaPage.get_id(id)
-        return GalleryPage.PDS_PHOTOJOURNAL_URL + f'{id[:5]}xxx/{id}.html'
+        return '/' + GalleryPage.PRESS_RELEASES_SUBDIR_ + \
+                        f'pages/{id[:5]}xxx/{id}.html'
🧰 Tools
🪛 Ruff (0.15.9)

[warning] 420-420: Missing return type annotation for staticmethod local_page_url_for_id

(ANN205)


[error] 420-420: Function argument id is shadowing a Python builtin

(A002)


[error] 422-422: Variable id is shadowing a Python builtin

(A001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 420 - 423, local_page_url_for_id currently
returns a PDS_PHOTOJOURNAL_URL-based absolute path (and references
GalleryPage.PDS_PHOTOJOURNAL_URL) which doesn't match the site-relative path
shape used by jekyll_filepath_for_id and page_filepath_for_id; update
local_page_url_for_id to return the site-relative pages path used elsewhere (the
'/<PRESS_RELEASES_SUBDIR_>/pages/.../...' shape) instead of using
GalleryPage.PDS_PHOTOJOURNAL_URL, and if needed import the correct constant
(e.g. PRESS_RELEASES_SUBDIR_ or PDS_PHOTOJOURNAL_URL) from galleries.py; ensure
the returned string matches the pattern produced by jekyll_filepath_for_id and
consumed by page_filepath_for_id (use the same id slicing id[:5]xxx/{id}.html).

Comment on lines +774 to +779
current_node = header_element.find_next_sibling()
while current_node is not None:
if isinstance(current_node, (Tag, NavigableString)):
caption_as_soup.append(current_node)

filtered_paragraphs_in_soup.append(paragraphs_in_soup[k])
filtered_paragraphs.append(paragraphs[k])
current_node = current_node.next_sibling

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
from bs4 import BeautifulSoup, NavigableString, Tag

html = "<section><h2>Description</h2><p>one</p><p>two</p><p>three</p></section>"
section = BeautifulSoup(html, "lxml").find("section")
caption = BeautifulSoup("", "lxml")

node = section.find("h2", string="Description").find_next_sibling()
while node is not None:
    if isinstance(node, (Tag, NavigableString)):
        caption.append(node)
    node = node.next_sibling

print("captured_paragraphs =", len(caption.find_all("p")))
PY

Repository: SETI/rms-galleries

Length of output: 390


🏁 Script executed:

#!/bin/bash
python3 << 'PY'
from bs4 import BeautifulSoup, NavigableString, Tag

# Test 1: Original code (without saving next_sibling)
html = "<section><h2>Description</h2><p>one</p><p>two</p><p>three</p></section>"
section = BeautifulSoup(html, "html.parser").find("section")
caption = BeautifulSoup("", "html.parser")

node = section.find("h2", string="Description").find_next_sibling()
while node is not None:
    if isinstance(node, (Tag, NavigableString)):
        caption.append(node)
    node = node.next_sibling

print("Test 1 (Original): captured_paragraphs =", len(caption.find_all("p")))

# Test 2: Fixed code (saving next_sibling before append)
section = BeautifulSoup(html, "html.parser").find("section")
caption = BeautifulSoup("", "html.parser")

node = section.find("h2", string="Description").find_next_sibling()
while node is not None:
    next_node = node.next_sibling
    if isinstance(node, (Tag, NavigableString)):
        caption.append(node)
    node = next_node

print("Test 2 (Fixed):    captured_paragraphs =", len(caption.find_all("p")))
PY

Repository: SETI/rms-galleries

Length of output: 147


🏁 Script executed:

sed -n '770,785p' piapage/__init__.py

Repository: SETI/rms-galleries

Length of output: 733


Save next_sibling before moving nodes into the new soup.

caption_as_soup.append(current_node) reparents the node. Reading current_node.next_sibling after that terminates the walk early, so only the first element after "Description" gets copied.

🛠️ Suggested fix
         current_node = header_element.find_next_sibling()
         while current_node is not None:
+            next_node = current_node.next_sibling
             if isinstance(current_node, (Tag, NavigableString)):
                 caption_as_soup.append(current_node)
 
-            current_node = current_node.next_sibling
+            current_node = next_node
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@piapage/__init__.py` around lines 774 - 779, The loop that copies nodes after
header_element using current_node = header_element.find_next_sibling() reparents
nodes when calling caption_as_soup.append(current_node), so you must save the
next sibling before appending; change the loop to obtain next_node =
current_node.next_sibling at the top (or just before calling
caption_as_soup.append(current_node)), then append current_node, and finally set
current_node = next_node so the traversal doesn't stop early.
