Skip to content

refactor: enhance timetable conversion and CLI functionality; improve…#27

Open
brainer3220 wants to merge 1 commit intomasterfrom
refactor
Open

refactor: enhance timetable conversion and CLI functionality; improve…#27
brainer3220 wants to merge 1 commit intomasterfrom
refactor

Conversation

@brainer3220
Copy link
Owner

@brainer3220 brainer3220 commented Aug 21, 2025

… error handling and logging

Summary by Sourcery

Refactor timetable conversion pipeline, improve CLI, web server, and Everytime client with enhanced type safety, logging, and error handling.

New Features:

  • Support XML string or file path inputs in Convert for timetable parsing
  • Add CLI --source option to specify file paths or Everytime URLs and emit meaningful exit codes
  • Enable Flask endpoint to upload generated ICS files to S3 using dynamic environment credentials

Enhancements:

  • Rewrite Convert class with type hints, structured logging, and unified XML parsing
  • Streamline ICS generation to return file paths, skip empty timetables, and log event counts
  • Extract API URL and default headers in Everytime client with request timeouts and HTTP error handling
  • Improve Flask app with environment variable loader, normalized date handling, exception logging, and sitemap fallback
  • Update README with new CLI usage and Flask server instructions

@sourcery-ai
Copy link

sourcery-ai bot commented Aug 21, 2025

Reviewer's Guide

This PR modernizes the timetable conversion tools by refactoring core XML parsing and ICS generation with type annotations, pathlib and logging; it overhauls the CLI and Flask entrypoints for clearer argument handling and error responses; it tightens the Everytime API client with constants and HTTP error checks; and updates documentation with concrete usage examples.

Sequence diagram for Flask API timetable conversion and S3 upload

sequenceDiagram
    participant User as actor User
    participant Flask as Flask App
    participant Everytime as Everytime
    participant Convert as Convert
    participant S3 as S3
    User->>Flask: HTTP GET /dwn_cal?start_date&end_date&schd_url
    Flask->>Everytime: e.get_timetable()
    Everytime-->>Flask: XML timetable
    Flask->>Convert: c.get_calendar(subjects, start_date, end_date, identifier)
    Convert-->>Flask: Path to .ics file
    Flask->>S3: upload_to_s3(calendar_path, bucket_name, s3_path)
    S3-->>Flask: (upload complete)
    Flask-->>User: send_file(calendar_path)
Loading

Class diagram for refactored Convert and Everytime classes

classDiagram
    class Convert {
        - _xml_or_path: str
        + __init__(xml_or_path: str)
        + _get_root() ET.Element
        + get_subjects() List[Dict[str, Any]]
        + get_calendar(timetable: Iterable[Dict[str, Any]], start_date: str, end_date: str, identifier: str) Optional[str]
        + get_nearest_date(start_date: str, weekday: str|int) dt.date
    }
    class Everytime {
        - path: str
        + __init__(path: str)
        + get_timetable(timeout: float = 10.0) str
    }
    Convert --> Everytime : uses
Loading

File-Level Changes

Change Details Files
Refactor Convert class for robust parsing and ICS generation
  • Introduced _get_root to unify file vs. string parsing via pathlib and debug logs
  • Added type hints, logging for missing data, and consolidated subject/time extraction
  • Reworked get_calendar to track event_count, write .ics via pathlib, and return path or None
convert.py
Overhaul CLI entrypoint for non-interactive usage
  • Replaced interactive prompts with a mandatory --source flag
  • Enhanced argparse setup with exit codes and clear output messages
  • Extract identifier from source for file naming
every2cal.py
Enhance Flask app environment handling and error flows
  • Built _get_env helper to warn on missing env vars
  • Normalized date inputs, improved logging and error responses
  • Conditionally upload to S3 and added optional sitemap handling
index.py
Refactor Everytime API client
  • Extracted API_URL and DEFAULT_HEADERS constants
  • Added type annotations and timeout parameter
  • Enforced HTTP error checking via resp.raise_for_status()
everytime.py
Update documentation with concrete usage examples
  • Added CLI example with --source, --begin, and --end flags
  • Included instructions to run Flask server
  • Clarified date formats in README
README.md

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@sonarqubecloud
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
1 Security Hotspot
E Security Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

upload_to_s3(calendar_path, bucket_name, f"ical/{os.path.basename(calendar_path)}")

return send_file(path, as_attachment=True)
return send_file(calendar_path, as_attachment=True, download_name=f"{identifier}.ics")

Check failure

Code scanning / SonarCloud

I/O function calls should not be vulnerable to path injection attacks High

Change this code to not construct the path from user-controlled data. See more on SonarQube Cloud
Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

Blocking issues:

  • The native Python xml library is vulnerable to XML External Entity (XXE) attacks. These attacks can leak confidential data and "XML bombs" can cause denial of service. Do not use this library to parse untrusted input. Instead the Python documentation recommends using defusedxml. (link)

General comments:

  • Consider using a platform‐agnostic temp directory (e.g. tempfile.gettempdir()) instead of hardcoding “/tmp” for writing .ics files.
  • Extracting the identifier by slicing schd_url[22:] is brittle; parse the URL path or use a regex to reliably pull out the timetable ID.
  • The CLI help suggests you can pipe XML via stdin, but the code exits unless --source is provided—either implement stdin support or update the help text to reflect the requirement.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider using a platform‐agnostic temp directory (e.g. tempfile.gettempdir()) instead of hardcoding “/tmp” for writing .ics files.
- Extracting the identifier by slicing schd_url[22:] is brittle; parse the URL path or use a regex to reliably pull out the timetable ID.
- The CLI help suggests you can pipe XML via stdin, but the code exits unless --source is provided—either implement stdin support or update the help text to reflect the requirement.

## Individual Comments

### Comment 1
<location> `convert.py:57` </location>
<code_context>
+                logger.debug("Skipping subject missing name/professor")
+                continue
+
+            def _slot(x: ET.Element) -> Dict[str, str]:
+                start_raw = int(x.get("starttime", "0"))
+                end_raw = int(x.get("endtime", "0"))
+                # Each unit is 5 minutes; convert to HH:MM
+                start_at = "{:02d}:{:02d}".format(*divmod(start_raw * 5, 60))
+                end_at = "{:02d}:{:02d}".format(*divmod(end_raw * 5, 60))
+                return {
+                    "day": x.get("day", "0"),
+                    "place": x.get("place", ""),
</code_context>

<issue_to_address>
Default values are provided for missing XML attributes.

Defaulting missing attributes improves robustness, but may lead to events with invalid times or locations. Consider skipping entries with critical missing data instead.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
            def _slot(x: ET.Element) -> Dict[str, str]:
                start_raw = int(x.get("starttime", "0"))
                end_raw = int(x.get("endtime", "0"))
                # Each unit is 5 minutes; convert to HH:MM
                start_at = "{:02d}:{:02d}".format(*divmod(start_raw * 5, 60))
                end_at = "{:02d}:{:02d}".format(*divmod(end_raw * 5, 60))
                return {
                    "day": x.get("day", "0"),
                    "place": x.get("place", ""),
                    "startAt": start_at,
                    "endAt": end_at,
                }

            time_node = subject.find("time")
            times = [] if time_node is None else list(map(_slot, time_node.findall("data")))
=======
            def _slot(x: ET.Element) -> Optional[Dict[str, str]]:
                starttime = x.get("starttime")
                endtime = x.get("endtime")
                day = x.get("day")
                place = x.get("place")
                if starttime is None or endtime is None or day is None or place is None:
                    logger.debug("Skipping slot missing starttime/endtime/day/place")
                    return None
                try:
                    start_raw = int(starttime)
                    end_raw = int(endtime)
                except ValueError:
                    logger.debug("Skipping slot with invalid starttime/endtime")
                    return None
                # Each unit is 5 minutes; convert to HH:MM
                start_at = "{:02d}:{:02d}".format(*divmod(start_raw * 5, 60))
                end_at = "{:02d}:{:02d}".format(*divmod(end_raw * 5, 60))
                return {
                    "day": day,
                    "place": place,
                    "startAt": start_at,
                    "endAt": end_at,
                }

            time_node = subject.find("time")
            times = []
            if time_node is not None:
                for slot in time_node.findall("data"):
                    slot_dict = _slot(slot)
                    if slot_dict is not None:
                        times.append(slot_dict)
>>>>>>> REPLACE

</suggested_fix>

## Security Issues

### Issue 1
<location> `convert.py:36` </location>

<issue_to_address>
**security (python.lang.security.use-defused-xml-parse):** The native Python `xml` library is vulnerable to XML External Entity (XXE) attacks.  These attacks can leak confidential data and "XML bombs" can cause denial of service. Do not use this library to parse untrusted input. Instead  the Python documentation recommends using `defusedxml`.

```suggestion
            tree = defusedxml.etree.ElementTree.parse(candidate)
```

*Source: opengrep*
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +57 to +71
def _slot(x: ET.Element) -> Dict[str, str]:
start_raw = int(x.get("starttime", "0"))
end_raw = int(x.get("endtime", "0"))
# Each unit is 5 minutes; convert to HH:MM
start_at = "{:02d}:{:02d}".format(*divmod(start_raw * 5, 60))
end_at = "{:02d}:{:02d}".format(*divmod(end_raw * 5, 60))
return {
"day": x.get("day", "0"),
"place": x.get("place", ""),
"startAt": start_at,
"endAt": end_at,
}

time_node = subject.find("time")
times = [] if time_node is None else list(map(_slot, time_node.findall("data")))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Default values are provided for missing XML attributes.

Defaulting missing attributes improves robustness, but may lead to events with invalid times or locations. Consider skipping entries with critical missing data instead.

Suggested change
def _slot(x: ET.Element) -> Dict[str, str]:
start_raw = int(x.get("starttime", "0"))
end_raw = int(x.get("endtime", "0"))
# Each unit is 5 minutes; convert to HH:MM
start_at = "{:02d}:{:02d}".format(*divmod(start_raw * 5, 60))
end_at = "{:02d}:{:02d}".format(*divmod(end_raw * 5, 60))
return {
"day": x.get("day", "0"),
"place": x.get("place", ""),
"startAt": start_at,
"endAt": end_at,
}
time_node = subject.find("time")
times = [] if time_node is None else list(map(_slot, time_node.findall("data")))
def _slot(x: ET.Element) -> Optional[Dict[str, str]]:
starttime = x.get("starttime")
endtime = x.get("endtime")
day = x.get("day")
place = x.get("place")
if starttime is None or endtime is None or day is None or place is None:
logger.debug("Skipping slot missing starttime/endtime/day/place")
return None
try:
start_raw = int(starttime)
end_raw = int(endtime)
except ValueError:
logger.debug("Skipping slot with invalid starttime/endtime")
return None
# Each unit is 5 minutes; convert to HH:MM
start_at = "{:02d}:{:02d}".format(*divmod(start_raw * 5, 60))
end_at = "{:02d}:{:02d}".format(*divmod(end_raw * 5, 60))
return {
"day": day,
"place": place,
"startAt": start_at,
"endAt": end_at,
}
time_node = subject.find("time")
times = []
if time_node is not None:
for slot in time_node.findall("data"):
slot_dict = _slot(slot)
if slot_dict is not None:
times.append(slot_dict)

if candidate.exists():
logger.debug("Parsing timetable from file: %s", candidate)
tree = ET.parse(candidate)
return tree.getroot()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security (python.lang.security.use-defused-xml-parse): The native Python xml library is vulnerable to XML External Entity (XXE) attacks. These attacks can leak confidential data and "XML bombs" can cause denial of service. Do not use this library to parse untrusted input. Instead the Python documentation recommends using defusedxml.

Suggested change
tree = ET.parse(candidate)
tree = defusedxml.etree.ElementTree.parse(candidate)

Source: opengrep

Comment on lines +106 to +107
if place:
event.add("location", place)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

Suggested change
place = time.get("place")
if place:
if place := time.get("place"):

Comment on lines +19 to +25
if args.source:
src = args.source
e = everytime.Everytime(src)
xml = e.get_timetable()
else:
# Expect XML to be piped via stdin or provided as file path; keep simple
raise SystemExit("--source is required when using CLI")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): We've found these issues:

Comment on lines +67 to +68
bucket_name = _get_env('BUCKET_NAME')
if bucket_name:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

Suggested change
bucket_name = _get_env('BUCKET_NAME')
if bucket_name:
if bucket_name := _get_env('BUCKET_NAME'):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant