Skip to content

Fix post-processing steps and add CMSO/ASMO ontologies#109

Open
cmungall wants to merge 1 commit into
mainfrom
fix-postprocessing-and-add-cmso-asmo
Open

Fix post-processing steps and add CMSO/ASMO ontologies#109
cmungall wants to merge 1 commit into
mainfrom
fix-postprocessing-and-add-cmso-asmo

Conversation

@cmungall
Copy link
Copy Markdown
Collaborator

Summary

  • Fix get_postprocessing_steps: The function was returning raw ontology entries instead of formatted step strings. Now correctly formats {ont}, {ontology}, and {db} placeholders in post-processing commands.
  • Fix ontology_from_target_path: Replace fragile regex with pathlib-based extraction that handles hyphenated ontology names (e.g. reactome-mm). The old \w+ pattern silently failed on these.
  • Add CMSO and ASMO to the ontology registry with their prefix mappings and Makefile download rules.
  • Add tests for post-processing step formatting, missing ontology handling, and target path parsing.

Test plan

  • test_get_postprocessing_steps — verifies placeholder substitution
  • test_get_postprocessing_steps_missing_ontology — returns empty list for unregistered ontology
  • test_ontology_from_target_path — covers normal, hyphenated, wrong-dir, and wrong-extension cases
  • Doctests on ontology_from_target_path

🤖 Generated with Claude Code

@claude
Copy link
Copy Markdown

claude Bot commented May 11, 2026

Claude finished @cmungall's task —— View job


PR Review Complete ✅

Reviewing PR #109: Fix post-processing steps and add CMSO/ASMO ontologies

Todo List

  • Read repository guidelines (CLAUDE.md)
  • Analyze core logic changes in builder.py and cli.py
  • Review ontology registry additions
  • Examine test coverage and quality
  • Check for potential bugs and security issues
  • Provide comprehensive feedback

Code Quality Assessment

Excellent Improvements

1. Fixed Post-processing Logic (builder.py:139-142)

  • Before: Was returning raw ontology entries instead of formatted strings
  • After: Properly formats {ont}, {ontology}, and {db} placeholders
  • Quality: Clean, readable implementation with proper parameter substitution

2. Robust Path Parsing (cli.py:28-42)

  • Before: Fragile regex \w+ that silently failed on hyphenated names like reactome-mm
  • After: Uses pathlib.Path for reliable extraction
  • Quality: Much more robust, handles edge cases properly
  • Includes doctests: Great practice following repo guidelines

Test Coverage - Comprehensive

New Tests in test_builder.py

  • test_get_postprocessing_steps: Verifies placeholder substitution works correctly
  • test_get_postprocessing_steps_missing_ontology: Handles missing ontology gracefully (returns empty list)

New Tests in test_cli.py

  • test_ontology_from_target_path: Covers normal, hyphenated, wrong-dir, and wrong-extension cases
  • Uses @pytest.mark.parametrize style: Follows repo best practices

Registry Additions - Well Structured

CMSO (Computational Materials Sample Ontology)

cmso:
  description: Computational Materials Sample Ontology
  url: https://raw.githubusercontent.com/OCDO/cmso/v0.0.1/cmso.owl
  prefixmap:
    CMSO: http://purls.helmholtz-metadaten.de/cmso/
    CDCO: http://purls.helmholtz-metadaten.de/cdos/cdco/

ASMO (Atomistic Simulation Methods Ontology)

asmo:
  description: Atomistic Simulation Methods Ontology  
  url: https://raw.githubusercontent.com/OCDO/asmo/v0.3.0/asmo.owl
  prefixmap:
    ASMO: https://purls.helmholtz-metadaten.de/asmo/
    MDO.calculation: https://w3id.org/mdo/calculation/

Quality: Proper versioned URLs, clear descriptions, comprehensive prefix mappings


Security & Performance Assessment

🔒 Security - One Concern

⚠️ Shell Injection Risk (cli.py:89)

subprocess.run(step, shell=True)  # noqa S602
  • Issue: The # noqa S602 comment acknowledges this bandit security warning but shell=True with user-controlled input is dangerous
  • Context: step comes from ontology registry post-processing steps, so risk is limited to registry maintainers
  • Recommendation: Consider using subprocess.run(step.split(), shell=False) or shlex.split(step) for safer execution

Performance - Good

  • Path operations using pathlib are efficient
  • No unnecessary loops or expensive operations
  • Registry loading is done once per function call (appropriate)

Minor Suggestions

1. Error Handling

The new ontology_from_target_path function could benefit from explicit error handling for malformed paths, though current implementation with early returns is acceptable.

2. Consistency

Consider adding type hints to the new ontology_from_target_path function to match the codebase style:

def ontology_from_target_path(path: str) -> Optional[str]:

3. Documentation

The doctest examples in ontology_from_target_path are excellent and follow repo best practices.


Overall Assessment

✅ High Quality PR - This is a well-crafted fix that:

  • Solves real bugs with robust solutions
  • Includes comprehensive test coverage
  • Follows repository conventions (pytest, pathlib, doctests)
  • Adds valuable ontology support with proper metadata
  • Uses defensive programming practices

Recommendation: Approve with consideration for the shell injection mitigation.

The fixes are solid improvements that replace fragile regex/string manipulation with robust pathlib operations and proper string formatting. The test coverage is thorough and follows project standards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant