Multi dwl #1
Conversation
Summary of Changes

Hello @itskavin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request significantly upgrades the Thinkific Downloader project by introducing parallel download capabilities with a modern, rich terminal user interface. It refactors the core download logic for improved reliability and performance, and implements an atomic resume and backup system to prevent data loss during interruptions. The project's development lifecycle is also bolstered with new CI/CD pipelines for automated quality assurance and security, and all user-facing documentation has been updated to guide users through the enhanced features and setup process.
Code Review
This pull request introduces a major refactoring to support parallel downloads, a more robust resume/backup system, and an improved user interface using rich. The changes are extensive and significantly improve the project's architecture by introducing a DownloadManager and a two-phase process of collecting tasks and then executing them. My review focuses on significant maintainability issues in downloader.py, such as the heavy reliance on global variables and overly complex functions, and suggests a move to a class-based structure. I've also included feedback on standardizing HTTP libraries, improving dependency management, and correcting documentation for consistency and clarity.
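The two-phase design can be sketched in miniature as follows (the task shape and worker function here are illustrative, not the PR's actual DownloadManager API):

```python
from concurrent.futures import ThreadPoolExecutor

# Phase 1: walk the course structure and collect lightweight task descriptions.
tasks = [{"url": f"https://example.com/{i}.mp4", "dest": f"video{i}.mp4"} for i in range(4)]

# Stand-in worker; the real one would stream the URL to disk with resume support.
def download(task):
    return task["dest"]

# Phase 2: execute all collected tasks in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(download, tasks))

print(results)  # → ['video0.mp4', 'video1.mp4', 'video2.mp4', 'video3.mp4']
```

Separating collection from execution keeps the parallelism concerns in one place and makes the task list easy to persist for resume support.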
```python
# Globals to mirror PHP behavior
ROOT_PROJECT_DIR = Path.cwd()
COURSE_CONTENTS: List[Dict[str, Any]] = []
SETTINGS: Optional[Settings] = None
BASE_HOST: Optional[str] = None
DOWNLOAD_MANAGER: Optional[DownloadManager] = None
DOWNLOAD_TASKS: List[Dict[str, Any]] = []  # Collect all download tasks for parallel execution
CONTENT_PROCESSOR: Optional[ContentProcessor] = None
```
The heavy use of global variables (SETTINGS, DOWNLOAD_MANAGER, DOWNLOAD_TASKS, etc.) makes the application state difficult to manage, test, and reason about. It creates tight coupling between functions and can lead to unpredictable behavior.
This should be refactored to an object-oriented approach. Encapsulating the state and logic within a class (e.g., CourseDownloader) would make the code more modular, maintainable, and robust.
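A minimal sketch of that refactor could look like this (the class name `CourseDownloader` is taken from the suggestion above; the attribute and method names are illustrative):

```python
from pathlib import Path
from typing import Any, Dict, List, Optional

class CourseDownloader:
    """Encapsulates the state currently held in module-level globals."""

    def __init__(self, root_dir: Optional[Path] = None):
        self.root_project_dir = root_dir or Path.cwd()
        self.course_contents: List[Dict[str, Any]] = []
        self.download_tasks: List[Dict[str, Any]] = []
        self.base_host: Optional[str] = None

    def add_download_task(self, url: str, dest: Path, kind: str) -> None:
        # Instance state replaces the DOWNLOAD_TASKS global.
        self.download_tasks.append({"url": url, "dest": dest, "type": kind})

# Two instances no longer share state, which makes unit testing straightforward.
dl = CourseDownloader(root_dir=Path("/tmp"))
dl.add_download_task("https://example.com/a.mp4", Path("a.mp4"), "video")
print(len(dl.download_tasks))  # → 1
```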
```python
def collect_chapter_tasks(content_ids: Iterable[Any], chapter_path: Path):
    """Collect download tasks for a specific chapter."""
    from .wistia_downloader import video_downloader_wistia, video_downloader_videoproxy
    global COURSE_CONTENTS, SETTINGS, DOWNLOAD_TASKS

    index = 1
    for content_id in content_ids:
        match = next((c for c in COURSE_CONTENTS if c['id'] == content_id), None)
        if not match:
            print(f"  ⚠️ No content found for id {content_id}")
            index += 1
            continue

        ctype = match.get('contentable_type') or match.get('default_lesson_type_label')
        print(f"  🔍 Found {ctype}: {match.get('name')}")

        # HTML Item (Notes) - Collect download tasks
        if ctype == 'HtmlItem':
            fname = filter_filename(f"{match['slug']}.html")
            dc = chapter_path / filter_filename(f"{index}. {match['name']} Text")
            dc.mkdir(exist_ok=True)

            if not (dc / fname).exists():
                j = api_get(f"/api/course_player/v2/html_items/{match['contentable']}")
                if j:
                    html_text = j.get('html_item', {}).get('html_text', '')
                    decoded = unicode_decode(html_text)

                    # Collect MP3 audio files
                    mp3_matches = MP3_PATTERN.findall(decoded)
                    if mp3_matches:
                        for audio_url in set(mp3_matches):
                            audio_name = filter_filename(Path(urlparse(audio_url).path).name)
                            add_download_task(audio_url, dc / audio_name, "audio")

                    # Save HTML content to file
                    fname = fname.replace(" ", "-")
                    (dc / fname).write_text(decoded, encoding='utf-8', errors='replace')

                    # Collect video download tasks
                    videoproxy_matches = VIDEOPROXY_PATTERN.findall(decoded)
                    if videoproxy_matches:
                        for video_url in set(videoproxy_matches):
                            collect_video_task_videoproxy(video_url, filter_filename(match['name']), dc)

                    wistia_matches = WISTIA_PATTERN.findall(decoded)
                    if wistia_matches:
                        for wistia_id in set(wistia_matches):
                            collect_video_task_wistia(wistia_id, filter_filename(match['name']), dc)

            index += 1
            continue

        # Multimedia (iframe) - Collect download tasks
        if match.get('default_lesson_type_label') == 'Multimedia':
            dc = chapter_path / filter_filename(f"{index}. {match['name']} Multimedia")
            dc.mkdir(exist_ok=True)

            j = api_get(f"/api/course_player/v2/iframes/{match['contentable']}")
            file_contents = ''
            if j:
                src_url = unicode_decode(j.get('iframe', {}).get('source_url') or '')
                if re.search(r"(\.md|\.html|/)$", src_url):
                    try:
                        file_contents = http_get(src_url)
                    except Exception:
                        file_contents = src_url
                else:
                    file_contents = src_url

                # Collect attached files
                if j.get('download_files'):
                    for download_file in j['download_files']:
                        download_file_name = filter_filename(download_file.get('label') or 'file')
                        download_file_url = download_file.get('download_url')
                        if download_file_url:
                            add_download_task(download_file_url, dc / download_file_name, "file")

            # Save HTML file
            fname = f"{match['name']}.html"
            fname = re.sub(r"[^A-Za-z0-9\_\-\. \?]", '', fname)
            fname = filter_filename(fname)
            (dc / fname).write_text(file_contents, encoding='utf-8', errors='replace')

            index += 1
            continue

        # Lesson (videos + html + attachments) - Collect download tasks
        if ctype == 'Lesson':
            dc = chapter_path / filter_filename(f"{index}. {match['name']} Lesson")
            dc.mkdir(exist_ok=True)
            vname = filter_filename(match['name'])

            j = api_get(f"/api/course_player/v2/lessons/{match['contentable']}")
            if j:
                # Collect video download tasks
                videos = j.get('videos') or []
                if videos:
                    for video in videos:
                        storage = video.get('storage_location')
                        identifier = video.get('identifier')
                        if storage == 'wistia' and identifier:
                            collect_video_task_wistia(identifier, vname, dc)
                        elif storage == 'videoproxy' and identifier:
                            collect_video_task_videoproxy(f"https://platform.thinkific.com/videoproxy/v1/play/{identifier}", vname, dc)
                        else:
                            direct = video.get('url')
                            if direct:
                                add_download_task(direct, dc / f"{vname}.mp4", "video")

                # Save lesson HTML content
                lesson_info = j.get('lesson', {})
                html_text = lesson_info.get('html_text') if isinstance(lesson_info, dict) else None
                if html_text and html_text.strip():
                    html_filename = f"{vname}.html"
                    (dc / html_filename).write_text(html_text, encoding='utf-8', errors='replace')

                # Collect attached files
                for dlf in j.get('download_files', []) or []:
                    download_file_name = filter_filename(dlf.get('label') or 'file')
                    download_file_url = dlf.get('download_url')
                    if download_file_url:
                        add_download_task(download_file_url, dc / download_file_name, "file")

            index += 1
            continue

        # PDF - Collect download tasks
        if ctype == 'Pdf':
            dc = chapter_path / filter_filename(f"{index}. {match['name']}")
            dc.mkdir(exist_ok=True)

            j = api_get(f"/api/course_player/v2/pdfs/{match['contentable']}")
            if j:
                pdf = j.get('pdf', {})
                pdf_url = pdf.get('url')
                if pdf_url:
                    fname = filter_filename(Path(urlparse(pdf_url).path).name)
                    add_download_task(pdf_url, dc / fname, "pdf")

            index += 1
            continue

        # Download (shared files) - Collect download tasks
        if ctype == 'Download':
            dc = chapter_path / filter_filename(f"{index}. {match['name']}")
            dc.mkdir(exist_ok=True)

            j = api_get(f"/api/course_player/v2/downloads/{match['contentable']}")
            if j:
                for dlf in j.get('download_files', []) or []:
                    label = filter_filename(dlf.get('label') or 'file')
                    url = dlf.get('download_url')
                    if url:
                        add_download_task(url, dc / label, "file")

            index += 1
            continue

        # Audio - Collect download tasks
        if ctype == 'Audio':
            dc = chapter_path / filter_filename(f"{index}. {match['name']}")
            dc.mkdir(exist_ok=True)

            j = api_get(f"/api/course_player/v2/audio/{match['contentable']}")
            if j:
                audio = j.get('audio', {})
                audio_url = audio.get('url')
                if audio_url:
                    fname = filter_filename(Path(urlparse(audio_url).path).name)
                    add_download_task(audio_url, dc / fname, "audio")

            index += 1
            continue

        # Presentation - Collect download tasks
        if ctype == 'Presentation':
            dc = chapter_path / filter_filename(f"{index}. {match['name']}")
            dc.mkdir(exist_ok=True)

            j = api_get(f"/api/course_player/v2/presentations/{match['contentable']}")
            if j:
                pres = j.get('presentation', {})
                pdf_url = pres.get('source_file_url')
                pdf_name = filter_filename(pres.get('source_file_name') or 'slides.pdf')
                if pdf_url:
                    add_download_task(pdf_url, dc / pdf_name, "presentation")

                # Handle presentation merging - collect slide assets
                merge_flag = SETTINGS.ffmpeg_presentation_merge if SETTINGS else False
                if merge_flag:
                    from shutil import which
                    if which('ffmpeg'):
                        items = j.get('presentation_items') or []
                        for it in items:
                            pos = it.get('position')
                            img_url = it.get('image_file_url')
                            aud_url = it.get('audio_file_url')
                            if img_url:
                                img_url = 'https:' + img_url if img_url.startswith('//') else img_url
                                img_name = filter_filename(f"{pos}{it.get('image_file_name', 'slide.png')}")
                                add_download_task(img_url, dc / img_name, "image")
                            if aud_url:
                                aud_url = 'https:' + aud_url if aud_url.startswith('//') else aud_url
                                aud_name = filter_filename(f"{pos}{it.get('audio_file_name', 'audio.m4a')}")
                                add_download_task(aud_url, dc / aud_name, "audio")

            index += 1
            continue

        # Quiz - Handle separately (complex logic)
        if ctype == 'Quiz':
            dc = chapter_path / filter_filename(f"{index}. {match['name']} Quiz")
            dc.mkdir(exist_ok=True)

            fname = filter_filename(f"{match['name']} Answers.html")
            qname = filter_filename(f"{match['name']} Questions.html")

            result = api_get(f"/api/course_player/v2/quizzes/{match['contentable']}")
            if result:
                file_contents_with_answers = "<h3 style='color: red;'>Answers of this Quiz are marked in RED </h3>"
                file_contents_with_questions = ""

                for qs in result.get("questions", []):
                    choice = 'A'
                    position = qs.get("position", 0) + 1
                    prompt = unicode_decode(qs.get("prompt", ""))
                    explanation = unicode_decode(qs.get("text_explanation", ""))

                    file_contents_with_answers += f"{position}) <strong>{prompt}</strong> Explanation: {explanation}<br><br>"

                    # Collect embedded video tasks
                    wistia_matches = WISTIA_PATTERN.findall(prompt)
                    if wistia_matches:
                        for wistia_match in set(wistia_matches):
                            collect_video_task_wistia(wistia_match, f"QA Video {position}", dc)

                    file_contents_with_questions += f"{position}) <strong>{prompt}</strong><br><br>"

                    for ch in result.get("choices", []):
                        if ch.get("question_id") == qs.get("id"):
                            try:
                                import base64
                                ans = base64.b64decode(ch.get("credited", "")).decode('utf-8', 'ignore')
                                ans = re.sub(r'\d', '', ans)
                            except Exception:
                                ans = ""

                            choice_text = unicode_decode(ch.get("text", ""))
                            if ans == "true":
                                file_contents_with_questions += f"{choice}) {choice_text}<br>"
                                file_contents_with_answers += f"<em style='color: red;'>{choice}) {choice_text}</em><br>"
                            else:
                                file_contents_with_questions += f"{choice}) {choice_text}<br>"
                                file_contents_with_answers += f"{choice}) {choice_text}<br>"

                            choice = chr(ord(choice) + 1)

                    file_contents_with_questions += "<br>"
                    file_contents_with_answers += "<br>"

                (dc / qname).write_text(file_contents_with_questions, encoding='utf-8', errors='replace')
                (dc / fname).write_text(file_contents_with_answers, encoding='utf-8', errors='replace')

            index += 1
            continue

        # Assignment/Survey placeholders
        if ctype in ['Assignment', 'Survey']:
            print(f"  ⚠️ {ctype} content type not yet implemented: {match['name']}")
            index += 1
            continue

        index += 1
```
The function collect_chapter_tasks is excessively long and complex, handling the logic for every content type. This violates the Single Responsibility Principle and makes the code very difficult to read, maintain, and test.
This function should be broken down into smaller, more focused functions, each responsible for a single content type (e.g., _collect_lesson_tasks, _collect_pdf_tasks, etc.). A dispatcher can then call the appropriate function based on the content type, which will greatly improve readability and maintainability.
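A dispatcher along those lines could look like this (the handler names come from the suggestion above; their bodies and return values here are illustrative stubs, not the real collection logic):

```python
# Each handler owns exactly one content type; only two stubs are shown.
def _collect_pdf_tasks(match, dc):
    return f"pdf task for {match['name']}"

def _collect_lesson_tasks(match, dc):
    return f"lesson tasks for {match['name']}"

CONTENT_HANDLERS = {
    'Pdf': _collect_pdf_tasks,
    'Lesson': _collect_lesson_tasks,
}

def collect_content_tasks(match, dc):
    """Dispatch to the handler for this content type, if one exists."""
    handler = CONTENT_HANDLERS.get(match.get('contentable_type'))
    if handler is None:
        return f"unsupported content type: {match.get('contentable_type')}"
    return handler(match, dc)

print(collect_content_tasks({'contentable_type': 'Pdf', 'name': 'Syllabus'}, None))
# → pdf task for Syllabus
```

Adding a new content type then means writing one small handler and registering it in the table, rather than growing a several-hundred-line function.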
```python
def test_resume_tracker_atomic_save(self):
    from thinkific_downloader.resume_tracker import ResumeTracker
    import tempfile
    with tempfile.TemporaryDirectory() as tmpdir:
        status_file = Path(tmpdir) / ".download_status.json"
        tracker = ResumeTracker(str(status_file))
        tracker.status_data["test"] = {"status": "completed"}
        tracker._save_status()
        assert status_file.exists()
        backup_file = status_file.with_suffix('.json.bak')
        assert backup_file.exists()
```
```python
install_requires=[
    "requests>=2.31.0",
    "rich>=13.0.0",
    "tqdm>=4.65.0",
    "urllib3>=2.0.0",
],
```
There appears to be an inconsistency between the dependencies listed in setup.py and requirements.txt. For example, requests has different version specifiers in each file. To ensure consistency and a single source of truth, it's best to define abstract dependencies in install_requires here. requirements.txt can then be used for development dependencies or for pinning exact versions for deployment. A common practice for development is to have requirements.txt simply contain -e . to install the package in editable mode along with its dependencies from setup.py.
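Concretely, the single-source-of-truth arrangement described above would leave the version ranges in `setup.py` and reduce `requirements.txt` to an editable install (file contents below are illustrative):

```
# requirements.txt — delegates to setup.py's install_requires
-e .
```

With this layout, `pip install -r requirements.txt` resolves the same dependency set as `pip install .`, so the two files cannot drift apart.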
```python
from rich.console import Console
from .config import Settings
from .file_utils import filter_filename
```
This pull request introduces major improvements to CI/CD automation, dependency management, environment configuration, and documentation for the Thinkific Downloader project. The changes add a comprehensive CI pipeline with multi-platform testing, code quality checks, Docker image verification, security scanning, and documentation validation. Automated dependency security checks are now scheduled weekly. The `.env.example` file is reorganized for clarity and feature grouping. Documentation and development guides are updated to reflect the new atomic resume/backup system and the migration from `file_validator.py` to `resume_tracker.py`.

Continuous Integration & Automation

- Added `.github/workflows/ci.yml` covering code formatting, linting, multi-platform testing, Docker build verification, security scans, documentation checks, and summary reporting.
- Added `.github/workflows/dependencies.yml` for automated dependency security checks, update suggestions, and informational pull requests.

Environment Configuration

- Reorganized `.env.example` for clarity, grouping settings by authentication, basic, enhanced, advanced, and optional features; added new options for validation, resume, rate limiting, and debug mode, with improved documentation.

Documentation & Developer Experience

- Updated `README.md` and `DEVELOPMENT.md` to highlight the new atomic resume/backup system, clarify quick start instructions, and reflect the migration from `file_validator.py` to `resume_tracker.py` in both the architecture and the test suite. [1] [2] [3] [4] [5] [6]
- Added documentation for `resume_tracker.py` to `DEVELOPMENT.md`.

These changes greatly improve project reliability, maintainability, and developer onboarding.