refactor: enhance timetable conversion and CLI functionality; improve…#27
refactor: enhance timetable conversion and CLI functionality; improve…#27brainer3220 wants to merge 1 commit intomasterfrom
Conversation
… error handling and logging
Reviewer's GuideThis PR modernizes the timetable conversion tools by refactoring core XML parsing and ICS generation with type annotations, pathlib and logging; it overhauls the CLI and Flask entrypoints for clearer argument handling and error responses; it tightens the Everytime API client with constants and HTTP error checks; and updates documentation with concrete usage examples. Sequence diagram for Flask API timetable conversion and S3 uploadsequenceDiagram
participant User as actor User
participant Flask as Flask App
participant Everytime as Everytime
participant Convert as Convert
participant S3 as S3
User->>Flask: HTTP GET /dwn_cal?start_date&end_date&schd_url
Flask->>Everytime: e.get_timetable()
Everytime-->>Flask: XML timetable
Flask->>Convert: c.get_calendar(subjects, start_date, end_date, identifier)
Convert-->>Flask: Path to .ics file
Flask->>S3: upload_to_s3(calendar_path, bucket_name, s3_path)
S3-->>Flask: (upload complete)
Flask-->>User: send_file(calendar_path)
Class diagram for refactored Convert and Everytime classesclassDiagram
class Convert {
- _xml_or_path: str
+ __init__(xml_or_path: str)
+ _get_root() ET.Element
+ get_subjects() List[Dict[str, Any]]
+ get_calendar(timetable: Iterable[Dict[str, Any]], start_date: str, end_date: str, identifier: str) Optional[str]
+ get_nearest_date(start_date: str, weekday: str|int) dt.date
}
class Everytime {
- path: str
+ __init__(path: str)
+ get_timetable(timeout: float = 10.0) str
}
Convert --> Everytime : uses
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
| upload_to_s3(calendar_path, bucket_name, f"ical/{os.path.basename(calendar_path)}") | ||
|
|
||
| return send_file(path, as_attachment=True) | ||
| return send_file(calendar_path, as_attachment=True, download_name=f"{identifier}.ics") |
Check failure
Code scanning / SonarCloud
I/O function calls should not be vulnerable to path injection attacks High
There was a problem hiding this comment.
Hey there - I've reviewed your changes - here's some feedback:
Blocking issues:
- The native Python
xmllibrary is vulnerable to XML External Entity (XXE) attacks. These attacks can leak confidential data and "XML bombs" can cause denial of service. Do not use this library to parse untrusted input. Instead the Python documentation recommends usingdefusedxml. (link)
General comments:
- Consider using a platform‐agnostic temp directory (e.g. tempfile.gettempdir()) instead of hardcoding “/tmp” for writing .ics files.
- Extracting the identifier by slicing schd_url[22:] is brittle; parse the URL path or use a regex to reliably pull out the timetable ID.
- The CLI help suggests you can pipe XML via stdin, but the code exits unless --source is provided—either implement stdin support or update the help text to reflect the requirement.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider using a platform‐agnostic temp directory (e.g. tempfile.gettempdir()) instead of hardcoding “/tmp” for writing .ics files.
- Extracting the identifier by slicing schd_url[22:] is brittle; parse the URL path or use a regex to reliably pull out the timetable ID.
- The CLI help suggests you can pipe XML via stdin, but the code exits unless --source is provided—either implement stdin support or update the help text to reflect the requirement.
## Individual Comments
### Comment 1
<location> `convert.py:57` </location>
<code_context>
+ logger.debug("Skipping subject missing name/professor")
+ continue
+
+ def _slot(x: ET.Element) -> Dict[str, str]:
+ start_raw = int(x.get("starttime", "0"))
+ end_raw = int(x.get("endtime", "0"))
+ # Each unit is 5 minutes; convert to HH:MM
+ start_at = "{:02d}:{:02d}".format(*divmod(start_raw * 5, 60))
+ end_at = "{:02d}:{:02d}".format(*divmod(end_raw * 5, 60))
+ return {
+ "day": x.get("day", "0"),
+ "place": x.get("place", ""),
</code_context>
<issue_to_address>
Default values are provided for missing XML attributes.
Defaulting missing attributes improves robustness, but may lead to events with invalid times or locations. Consider skipping entries with critical missing data instead.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
def _slot(x: ET.Element) -> Dict[str, str]:
start_raw = int(x.get("starttime", "0"))
end_raw = int(x.get("endtime", "0"))
# Each unit is 5 minutes; convert to HH:MM
start_at = "{:02d}:{:02d}".format(*divmod(start_raw * 5, 60))
end_at = "{:02d}:{:02d}".format(*divmod(end_raw * 5, 60))
return {
"day": x.get("day", "0"),
"place": x.get("place", ""),
"startAt": start_at,
"endAt": end_at,
}
time_node = subject.find("time")
times = [] if time_node is None else list(map(_slot, time_node.findall("data")))
=======
def _slot(x: ET.Element) -> Optional[Dict[str, str]]:
starttime = x.get("starttime")
endtime = x.get("endtime")
day = x.get("day")
place = x.get("place")
if starttime is None or endtime is None or day is None or place is None:
logger.debug("Skipping slot missing starttime/endtime/day/place")
return None
try:
start_raw = int(starttime)
end_raw = int(endtime)
except ValueError:
logger.debug("Skipping slot with invalid starttime/endtime")
return None
# Each unit is 5 minutes; convert to HH:MM
start_at = "{:02d}:{:02d}".format(*divmod(start_raw * 5, 60))
end_at = "{:02d}:{:02d}".format(*divmod(end_raw * 5, 60))
return {
"day": day,
"place": place,
"startAt": start_at,
"endAt": end_at,
}
time_node = subject.find("time")
times = []
if time_node is not None:
for slot in time_node.findall("data"):
slot_dict = _slot(slot)
if slot_dict is not None:
times.append(slot_dict)
>>>>>>> REPLACE
</suggested_fix>
## Security Issues
### Issue 1
<location> `convert.py:36` </location>
<issue_to_address>
**security (python.lang.security.use-defused-xml-parse):** The native Python `xml` library is vulnerable to XML External Entity (XXE) attacks. These attacks can leak confidential data and "XML bombs" can cause denial of service. Do not use this library to parse untrusted input. Instead the Python documentation recommends using `defusedxml`.
```suggestion
tree = defusedxml.etree.ElementTree.parse(candidate)
```
*Source: opengrep*
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| def _slot(x: ET.Element) -> Dict[str, str]: | ||
| start_raw = int(x.get("starttime", "0")) | ||
| end_raw = int(x.get("endtime", "0")) | ||
| # Each unit is 5 minutes; convert to HH:MM | ||
| start_at = "{:02d}:{:02d}".format(*divmod(start_raw * 5, 60)) | ||
| end_at = "{:02d}:{:02d}".format(*divmod(end_raw * 5, 60)) | ||
| return { | ||
| "day": x.get("day", "0"), | ||
| "place": x.get("place", ""), | ||
| "startAt": start_at, | ||
| "endAt": end_at, | ||
| } | ||
|
|
||
| time_node = subject.find("time") | ||
| times = [] if time_node is None else list(map(_slot, time_node.findall("data"))) |
There was a problem hiding this comment.
suggestion: Default values are provided for missing XML attributes.
Defaulting missing attributes improves robustness, but may lead to events with invalid times or locations. Consider skipping entries with critical missing data instead.
| def _slot(x: ET.Element) -> Dict[str, str]: | |
| start_raw = int(x.get("starttime", "0")) | |
| end_raw = int(x.get("endtime", "0")) | |
| # Each unit is 5 minutes; convert to HH:MM | |
| start_at = "{:02d}:{:02d}".format(*divmod(start_raw * 5, 60)) | |
| end_at = "{:02d}:{:02d}".format(*divmod(end_raw * 5, 60)) | |
| return { | |
| "day": x.get("day", "0"), | |
| "place": x.get("place", ""), | |
| "startAt": start_at, | |
| "endAt": end_at, | |
| } | |
| time_node = subject.find("time") | |
| times = [] if time_node is None else list(map(_slot, time_node.findall("data"))) | |
| def _slot(x: ET.Element) -> Optional[Dict[str, str]]: | |
| starttime = x.get("starttime") | |
| endtime = x.get("endtime") | |
| day = x.get("day") | |
| place = x.get("place") | |
| if starttime is None or endtime is None or day is None or place is None: | |
| logger.debug("Skipping slot missing starttime/endtime/day/place") | |
| return None | |
| try: | |
| start_raw = int(starttime) | |
| end_raw = int(endtime) | |
| except ValueError: | |
| logger.debug("Skipping slot with invalid starttime/endtime") | |
| return None | |
| # Each unit is 5 minutes; convert to HH:MM | |
| start_at = "{:02d}:{:02d}".format(*divmod(start_raw * 5, 60)) | |
| end_at = "{:02d}:{:02d}".format(*divmod(end_raw * 5, 60)) | |
| return { | |
| "day": day, | |
| "place": place, | |
| "startAt": start_at, | |
| "endAt": end_at, | |
| } | |
| time_node = subject.find("time") | |
| times = [] | |
| if time_node is not None: | |
| for slot in time_node.findall("data"): | |
| slot_dict = _slot(slot) | |
| if slot_dict is not None: | |
| times.append(slot_dict) |
| if candidate.exists(): | ||
| logger.debug("Parsing timetable from file: %s", candidate) | ||
| tree = ET.parse(candidate) | ||
| return tree.getroot() |
There was a problem hiding this comment.
security (python.lang.security.use-defused-xml-parse): The native Python xml library is vulnerable to XML External Entity (XXE) attacks. These attacks can leak confidential data and "XML bombs" can cause denial of service. Do not use this library to parse untrusted input. Instead the Python documentation recommends using defusedxml.
| tree = ET.parse(candidate) | |
| tree = defusedxml.etree.ElementTree.parse(candidate) |
Source: opengrep
| if place: | ||
| event.add("location", place) |
There was a problem hiding this comment.
suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)
| place = time.get("place") | |
| if place: | |
| if place := time.get("place"): |
| if args.source: | ||
| src = args.source | ||
| e = everytime.Everytime(src) | ||
| xml = e.get_timetable() | ||
| else: | ||
| # Expect XML to be piped via stdin or provided as file path; keep simple | ||
| raise SystemExit("--source is required when using CLI") |
There was a problem hiding this comment.
issue (code-quality): We've found these issues:
- Swap if/else branches (
swap-if-else-branches) - Remove unnecessary else after guard condition (
remove-unnecessary-else)
| bucket_name = _get_env('BUCKET_NAME') | ||
| if bucket_name: |
There was a problem hiding this comment.
suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)
| bucket_name = _get_env('BUCKET_NAME') | |
| if bucket_name: | |
| if bucket_name := _get_env('BUCKET_NAME'): |




… error handling and logging
Summary by Sourcery
Refactor timetable conversion pipeline, improve CLI, web server, and Everytime client with enhanced type safety, logging, and error handling.
New Features:
--sourceoption to specify file paths or Everytime URLs and emit meaningful exit codesEnhancements: