
Conversation

@itskavin
Member

This pull request introduces major improvements to CI/CD automation, dependency management, environment configuration, and documentation for the Thinkific Downloader project. The changes add a comprehensive CI pipeline with multi-platform testing, code quality checks, Docker image verification, security scanning, and documentation validation. Automated dependency security checks are now scheduled weekly. The .env.example file is reorganized for clarity and feature grouping. Documentation and development guides are updated to reflect the new atomic resume/backup system and the migration from file_validator.py to resume_tracker.py.

Continuous Integration & Automation

  • Added a full-featured CI/CD pipeline in .github/workflows/ci.yml covering code formatting, linting, multi-platform testing, Docker build verification, security scans, documentation checks, and summary reporting.
  • Introduced a scheduled workflow in .github/workflows/dependencies.yml for automated dependency security checks, update suggestions, and informational pull requests.

Environment Configuration

  • Reorganized .env.example for clarity, grouping settings by authentication, basic, enhanced, advanced, and optional features; added new options for validation, resume, rate limiting, debug mode, and improved documentation.

Documentation & Developer Experience

  • Updated README.md and DEVELOPMENT.md to highlight the new atomic resume/backup system, clarify quick start instructions, and reflect the migration from file_validator.py to resume_tracker.py in both the architecture docs and the test suite.
  • Added example test for atomic save and backup in resume_tracker.py to DEVELOPMENT.md.

These changes greatly improve project reliability, maintainability, and developer onboarding.
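As a sketch of what the scheduled dependency check described above might look like, a minimal workflow could be (illustrative only; the actual .github/workflows/dependencies.yml in this PR may use different tools, schedule, and steps):

```yaml
name: Dependency Security Check
on:
  schedule:
    - cron: '0 6 * * 1'   # weekly, Monday 06:00 UTC
  workflow_dispatch:        # allow manual runs

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Audit dependencies for known vulnerabilities
        run: |
          pip install pip-audit
          pip-audit -r requirements.txt
```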

@gemini-code-assist

Summary of Changes

Hello @itskavin, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the Thinkific Downloader project by introducing parallel download capabilities with a modern, rich terminal user interface. It refactors the core download logic for improved reliability and performance, alongside implementing an atomic resume and backup system to prevent data loss during interruptions. Additionally, the project's development lifecycle is bolstered with new CI/CD pipelines for automated quality assurance and security, and all user-facing documentation has been thoroughly updated to guide users through the enhanced features and setup processes.

Highlights

  • Parallel Downloads: Implemented a new DownloadManager to support concurrent file downloads, significantly improving download efficiency and overall performance.
  • Rich UI Integration: Integrated the rich library for enhanced terminal output, providing dynamic progress bars, informative banners, and clearer status updates during downloads.
  • Atomic Resume & Backup System: Introduced a robust, cross-platform atomic resume and backup system for downloads, ensuring data integrity and seamless continuation of interrupted downloads.
  • Refactored Download Logic: Centralized download handling into a new download_manager.py and refactored downloader.py to queue all tasks before parallel execution, leading to a more modular and maintainable codebase.
  • Improved CI/CD & Security: Added comprehensive CI/CD pipelines for multi-platform testing, code quality checks, Docker image verification, security scanning, and documentation validation, along with scheduled dependency security checks.
  • Enhanced Configuration: Reorganized and expanded .env.example for clearer environment variable management, including new options for rate limiting, validation, resume behavior, and debug mode.
  • Documentation Updates: Updated README.md, DEVELOPMENT.md, and SETUP.md to reflect new features, installation methods, and the migration from file_validator.py to resume_tracker.py.
Ignored Files
  • Ignored by pattern: .github/workflows/** (3)
    • .github/workflows/ci.yml
    • .github/workflows/dependencies.yml
    • .github/workflows/release.yml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature                Command               Description
Code Review            /gemini review        Performs a code review for the current pull request in its current state.
Pull Request Summary   /gemini summary       Provides a summary of the current pull request in its current state.
Comment                @gemini-code-assist   Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                   /gemini help          Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a major refactoring to support parallel downloads, a more robust resume/backup system, and an improved user interface using rich. The changes are extensive and significantly improve the project's architecture by introducing a DownloadManager and a two-phase process of collecting tasks then executing them. My review focuses on addressing significant maintainability issues in downloader.py, such as the heavy reliance on global variables and overly complex functions, by suggesting a move to a class-based structure. I've also included feedback on standardizing HTTP libraries, improving dependency management, and correcting documentation to ensure consistency and clarity.

Comment on lines 16 to +25
# Globals to mirror PHP behavior
ROOT_PROJECT_DIR = Path.cwd()
COURSE_CONTENTS: List[Dict[str, Any]] = []
SETTINGS: Optional[Settings] = None
BASE_HOST: Optional[str] = None
DOWNLOAD_MANAGER: Optional[DownloadManager] = None
DOWNLOAD_TASKS: List[Dict[str, Any]] = [] # Collect all download tasks for parallel execution
CONTENT_PROCESSOR: Optional[ContentProcessor] = None


high

The heavy use of global variables (SETTINGS, DOWNLOAD_MANAGER, DOWNLOAD_TASKS, etc.) makes the application state difficult to manage, test, and reason about. It creates tight coupling between functions and can lead to unpredictable behavior.

This should be refactored to an object-oriented approach. Encapsulating the state and logic within a class (e.g., CourseDownloader) would make the code more modular, maintainable, and robust.
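A minimal sketch of the suggested class-based structure (the class and attribute names are illustrative, not a prescribed design):

```python
from pathlib import Path
from typing import Any, Dict, List, Optional

class CourseDownloader:
    """Encapsulates the state currently held in module-level globals."""

    def __init__(self, settings: Optional[Any] = None,
                 root_dir: Optional[Path] = None) -> None:
        self.settings = settings
        self.root_dir = root_dir or Path.cwd()
        self.course_contents: List[Dict[str, Any]] = []
        self.base_host: Optional[str] = None
        self.download_tasks: List[Dict[str, Any]] = []

    def add_download_task(self, url: str, dest: Path, kind: str) -> None:
        # State lives on the instance, so tests can construct
        # isolated downloaders without patching globals
        self.download_tasks.append({"url": url, "dest": dest, "type": kind})
```

Each instance carries its own task queue, which removes the tight coupling the comment describes and makes the collection logic unit-testable.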

Comment on lines +378 to +651
def collect_chapter_tasks(content_ids: Iterable[Any], chapter_path: Path):
    """Collect download tasks for a specific chapter."""
    from .wistia_downloader import video_downloader_wistia, video_downloader_videoproxy
    global COURSE_CONTENTS, SETTINGS, DOWNLOAD_TASKS

    index = 1
    for content_id in content_ids:
        match = next((c for c in COURSE_CONTENTS if c['id'] == content_id), None)
        if not match:
            print(f" ⚠️ No content found for id {content_id}")
            index += 1
            continue

        ctype = match.get('contentable_type') or match.get('default_lesson_type_label')
        print(f" 🔍 Found {ctype}: {match.get('name')}")

        # HTML Item (Notes) - Collect download tasks
        if ctype == 'HtmlItem':
            fname = filter_filename(f"{match['slug']}.html")
            dc = chapter_path / filter_filename(f"{index}. {match['name']} Text")
            dc.mkdir(exist_ok=True)

            if not (dc / fname).exists():
                j = api_get(f"/api/course_player/v2/html_items/{match['contentable']}")
                if j:
                    html_text = j.get('html_item', {}).get('html_text', '')
                    decoded = unicode_decode(html_text)

                    # Collect MP3 audio files
                    mp3_matches = MP3_PATTERN.findall(decoded)
                    if mp3_matches:
                        for audio_url in set(mp3_matches):
                            audio_name = filter_filename(Path(urlparse(audio_url).path).name)
                            add_download_task(audio_url, dc / audio_name, "audio")

                    # Save HTML content to file
                    fname = fname.replace(" ", "-")
                    (dc / fname).write_text(decoded, encoding='utf-8', errors='replace')

                    # Collect video download tasks
                    videoproxy_matches = VIDEOPROXY_PATTERN.findall(decoded)
                    if videoproxy_matches:
                        for video_url in set(videoproxy_matches):
                            collect_video_task_videoproxy(video_url, filter_filename(match['name']), dc)

                    wistia_matches = WISTIA_PATTERN.findall(decoded)
                    if wistia_matches:
                        for wistia_id in set(wistia_matches):
                            collect_video_task_wistia(wistia_id, filter_filename(match['name']), dc)

            index += 1
            continue

        # Multimedia (iframe) - Collect download tasks
        if match.get('default_lesson_type_label') == 'Multimedia':
            dc = chapter_path / filter_filename(f"{index}. {match['name']} Multimedia")
            dc.mkdir(exist_ok=True)

            j = api_get(f"/api/course_player/v2/iframes/{match['contentable']}")
            file_contents = ''
            if j:
                src_url = unicode_decode(j.get('iframe', {}).get('source_url') or '')
                if re.search(r"(\.md|\.html|/)$", src_url):
                    try:
                        file_contents = http_get(src_url)
                    except Exception:
                        file_contents = src_url
                else:
                    file_contents = src_url

                # Collect attached files
                if j.get('download_files'):
                    for download_file in j['download_files']:
                        download_file_name = filter_filename(download_file.get('label') or 'file')
                        download_file_url = download_file.get('download_url')
                        if download_file_url:
                            add_download_task(download_file_url, dc / download_file_name, "file")

            # Save HTML file
            fname = f"{match['name']}.html"
            fname = re.sub(r"[^A-Za-z0-9\_\-\. \?]", '', fname)
            fname = filter_filename(fname)
            (dc / fname).write_text(file_contents, encoding='utf-8', errors='replace')

            index += 1
            continue

        # Lesson (videos + html + attachments) - Collect download tasks
        if ctype == 'Lesson':
            dc = chapter_path / filter_filename(f"{index}. {match['name']} Lesson")
            dc.mkdir(exist_ok=True)
            vname = filter_filename(match['name'])

            j = api_get(f"/api/course_player/v2/lessons/{match['contentable']}")
            if j:
                # Collect video download tasks
                videos = j.get('videos') or []
                if videos:
                    for video in videos:
                        storage = video.get('storage_location')
                        identifier = video.get('identifier')
                        if storage == 'wistia' and identifier:
                            collect_video_task_wistia(identifier, vname, dc)
                        elif storage == 'videoproxy' and identifier:
                            collect_video_task_videoproxy(f"https://platform.thinkific.com/videoproxy/v1/play/{identifier}", vname, dc)
                        else:
                            direct = video.get('url')
                            if direct:
                                add_download_task(direct, dc / f"{vname}.mp4", "video")

                # Save lesson HTML content
                lesson_info = j.get('lesson', {})
                html_text = lesson_info.get('html_text') if isinstance(lesson_info, dict) else None
                if html_text and html_text.strip():
                    html_filename = f"{vname}.html"
                    (dc / html_filename).write_text(html_text, encoding='utf-8', errors='replace')

                # Collect attached files
                for dlf in j.get('download_files', []) or []:
                    download_file_name = filter_filename(dlf.get('label') or 'file')
                    download_file_url = dlf.get('download_url')
                    if download_file_url:
                        add_download_task(download_file_url, dc / download_file_name, "file")

            index += 1
            continue

        # PDF - Collect download tasks
        if ctype == 'Pdf':
            dc = chapter_path / filter_filename(f"{index}. {match['name']}")
            dc.mkdir(exist_ok=True)

            j = api_get(f"/api/course_player/v2/pdfs/{match['contentable']}")
            if j:
                pdf = j.get('pdf', {})
                pdf_url = pdf.get('url')
                if pdf_url:
                    fname = filter_filename(Path(urlparse(pdf_url).path).name)
                    add_download_task(pdf_url, dc / fname, "pdf")

            index += 1
            continue

        # Download (shared files) - Collect download tasks
        if ctype == 'Download':
            dc = chapter_path / filter_filename(f"{index}. {match['name']}")
            dc.mkdir(exist_ok=True)

            j = api_get(f"/api/course_player/v2/downloads/{match['contentable']}")
            if j:
                for dlf in j.get('download_files', []) or []:
                    label = filter_filename(dlf.get('label') or 'file')
                    url = dlf.get('download_url')
                    if url:
                        add_download_task(url, dc / label, "file")

            index += 1
            continue

        # Audio - Collect download tasks
        if ctype == 'Audio':
            dc = chapter_path / filter_filename(f"{index}. {match['name']}")
            dc.mkdir(exist_ok=True)

            j = api_get(f"/api/course_player/v2/audio/{match['contentable']}")
            if j:
                audio = j.get('audio', {})
                audio_url = audio.get('url')
                if audio_url:
                    fname = filter_filename(Path(urlparse(audio_url).path).name)
                    add_download_task(audio_url, dc / fname, "audio")

            index += 1
            continue

        # Presentation - Collect download tasks
        if ctype == 'Presentation':
            dc = chapter_path / filter_filename(f"{index}. {match['name']}")
            dc.mkdir(exist_ok=True)

            j = api_get(f"/api/course_player/v2/presentations/{match['contentable']}")
            if j:
                pres = j.get('presentation', {})
                pdf_url = pres.get('source_file_url')
                pdf_name = filter_filename(pres.get('source_file_name') or 'slides.pdf')
                if pdf_url:
                    add_download_task(pdf_url, dc / pdf_name, "presentation")

                # Handle presentation merging - collect slide assets
                merge_flag = SETTINGS.ffmpeg_presentation_merge if SETTINGS else False
                if merge_flag:
                    from shutil import which
                    if which('ffmpeg'):
                        items = j.get('presentation_items') or []
                        for it in items:
                            pos = it.get('position')
                            img_url = it.get('image_file_url')
                            aud_url = it.get('audio_file_url')
                            if img_url:
                                img_url = 'https:' + img_url if img_url.startswith('//') else img_url
                                img_name = filter_filename(f"{pos}{it.get('image_file_name','slide.png')}")
                                add_download_task(img_url, dc / img_name, "image")
                            if aud_url:
                                aud_url = 'https:' + aud_url if aud_url.startswith('//') else aud_url
                                aud_name = filter_filename(f"{pos}{it.get('audio_file_name','audio.m4a')}")
                                add_download_task(aud_url, dc / aud_name, "audio")

            index += 1
            continue

        # Quiz - Handle separately (complex logic)
        if ctype == 'Quiz':
            dc = chapter_path / filter_filename(f"{index}. {match['name']} Quiz")
            dc.mkdir(exist_ok=True)

            fname = filter_filename(f"{match['name']} Answers.html")
            qname = filter_filename(f"{match['name']} Questions.html")

            result = api_get(f"/api/course_player/v2/quizzes/{match['contentable']}")
            if result:
                file_contents_with_answers = "<h3 style='color: red;'>Answers of this Quiz are marked in RED </h3>"
                file_contents_with_questions = ""

                for qs in result.get("questions", []):
                    choice = 'A'
                    position = qs.get("position", 0) + 1
                    prompt = unicode_decode(qs.get("prompt", ""))
                    explanation = unicode_decode(qs.get("text_explanation", ""))

                    file_contents_with_answers += f"{position}) <strong>{prompt}</strong> Explanation: {explanation}<br><br>"

                    # Collect embedded video tasks
                    wistia_matches = WISTIA_PATTERN.findall(prompt)
                    if wistia_matches:
                        for wistia_match in set(wistia_matches):
                            collect_video_task_wistia(wistia_match, f"QA Video {position}", dc)

                    file_contents_with_questions += f"{position}) <strong>{prompt}</strong><br><br>"

                    for ch in result.get("choices", []):
                        if ch.get("question_id") == qs.get("id"):
                            try:
                                import base64
                                ans = base64.b64decode(ch.get("credited", "")).decode('utf-8', 'ignore')
                                ans = re.sub(r'\d', '', ans)
                            except Exception:
                                ans = ""

                            choice_text = unicode_decode(ch.get("text", ""))
                            if ans == "true":
                                file_contents_with_questions += f"{choice}) {choice_text}<br>"
                                file_contents_with_answers += f"<em style='color: red;'>{choice}) {choice_text}</em><br>"
                            else:
                                file_contents_with_questions += f"{choice}) {choice_text}<br>"
                                file_contents_with_answers += f"{choice}) {choice_text}<br>"

                            choice = chr(ord(choice) + 1)

                    file_contents_with_questions += "<br>"
                    file_contents_with_answers += "<br>"

                (dc / qname).write_text(file_contents_with_questions, encoding='utf-8', errors='replace')
                (dc / fname).write_text(file_contents_with_answers, encoding='utf-8', errors='replace')

            index += 1
            continue

        # Assignment/Survey placeholders
        if ctype in ['Assignment', 'Survey']:
            print(f" ⚠️ {ctype} content type not yet implemented: {match['name']}")
            index += 1
            continue

        index += 1


high

The function collect_chapter_tasks is excessively long and complex, handling the logic for every content type. This violates the Single Responsibility Principle and makes the code very difficult to read, maintain, and test.

This function should be broken down into smaller, more focused functions, each responsible for a single content type (e.g., _collect_lesson_tasks, _collect_pdf_tasks, etc.). A dispatcher can then call the appropriate function based on the content type, which will greatly improve readability and maintainability.
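The dispatcher idea can be sketched as follows (the handler names and the string return values are hypothetical, chosen only to keep the example self-contained):

```python
from pathlib import Path
from typing import Any, Callable, Dict

def _collect_pdf_tasks(match: Dict[str, Any], dest: Path) -> str:
    # Real handler would call api_get and add_download_task
    return f"pdf:{match['name']}"

def _collect_lesson_tasks(match: Dict[str, Any], dest: Path) -> str:
    return f"lesson:{match['name']}"

# Map each content type to a small, focused collector function
HANDLERS: Dict[str, Callable[[Dict[str, Any], Path], str]] = {
    "Pdf": _collect_pdf_tasks,
    "Lesson": _collect_lesson_tasks,
    # "HtmlItem": _collect_html_item_tasks, etc.
}

def collect_content_tasks(match: Dict[str, Any], dest: Path) -> str:
    """Dispatch to the handler registered for this content type."""
    ctype = match.get("contentable_type") or match.get("default_lesson_type_label")
    handler = HANDLERS.get(ctype)
    if handler is None:
        return f"skipped:{ctype}"
    return handler(match, dest)
```

Adding a new content type then means registering one new function in HANDLERS rather than extending a 250-line if-chain.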

Comment on lines +345 to +355
def test_resume_tracker_atomic_save(self):
    from thinkific_downloader.resume_tracker import ResumeTracker
    import tempfile
    with tempfile.TemporaryDirectory() as tmpdir:
        status_file = Path(tmpdir) / ".download_status.json"
        tracker = ResumeTracker(str(status_file))
        tracker.status_data["test"] = {"status": "completed"}
        tracker._save_status()
        assert status_file.exists()
        backup_file = status_file.with_suffix('.json.bak')
        assert backup_file.exists()


medium

The example test test_resume_tracker_atomic_save is placed within the TestProgressManager class. For better test organization and clarity, tests related to ResumeTracker should be in their own test class, such as TestResumeTracker, and ideally in a separate test_resume_tracker.py file.

Comment on lines 186 to 187
```
```


medium

There's a Markdown syntax error here: a duplicate ``` that will cause rendering issues.

Suggested change
```
```
FFMPEG_PRESENTATION_MERGE=false # Enable FFmpeg presentation merging

Comment on lines 30 to 35
install_requires=[
    "requests>=2.31.0",
    "rich>=13.0.0",
    "tqdm>=4.65.0",
    "urllib3>=2.0.0",
],


medium

There appears to be an inconsistency between the dependencies listed in setup.py and requirements.txt. For example, requests has different version specifiers in each file. To ensure consistency and a single source of truth, it's best to define abstract dependencies in install_requires here. requirements.txt can then be used for development dependencies or for pinning exact versions for deployment. A common practice for development is to have requirements.txt simply contain -e . to install the package in editable mode along with its dependencies from setup.py.
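One common layout (illustrative; the exact version specifiers are the maintainers' call):

```text
# setup.py — abstract, minimally constrained runtime dependencies
#   install_requires=["requests>=2.31.0", "rich>=13.0.0",
#                     "tqdm>=4.65.0", "urllib3>=2.0.0"]

# requirements.txt — install the package itself in editable mode;
# runtime dependencies then resolve from setup.py's install_requires
-e .
```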

Comment on lines +64 to +66
from rich.console import Console
from .config import Settings
from .file_utils import filter_filename


medium

According to PEP 8, all imports should be at the top of the file. These imports are currently in the middle of the module and should be moved to the top with the other imports.

@itskavin itskavin merged commit 409ce19 into main Sep 24, 2025
3 checks passed