- Add archive_watchlist.py: single-shot script that fetches the top-100
most-starred GitHub repos and archives any new or updated ones via
iagitup. Features local state cache (watchlist_state.json) to skip
repos unchanged since last run, enriched IA metadata (stars, forks,
language, topics, rank), --dry-run and --top-n flags, cron-ready.
- Add tests/test_archive_watchlist.py: 30 tests covering load/save state,
build_custom_meta, fetch_top_repos, and archive_repo (skip, dry-run,
success, failure, and cleanup paths).
- Add [tool.pytest.ini_options] pythonpath to pyproject.toml so the
root-level script is importable from the test suite.
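The state cache described above could look roughly like the following. This is a minimal sketch, not the script's actual code: the function signatures, the JSON layout (one entry per repo, keyed by full name) and the use of GitHub's `pushed_at` timestamp as the change marker are all assumptions made for illustration.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the cached state, or an empty dict if the file is missing or corrupt."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file first, then rename it over the target."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")
    tmp.replace(path)


def needs_archiving(state: dict, full_name: str, pushed_at: str) -> bool:
    """Assumed rule: a repo is new or updated if its cached pushed_at differs."""
    return state.get(full_name, {}).get("pushed_at") != pushed_at
```

Writing through a temp file means an interrupted cron run leaves the previous state file intact rather than a half-written one.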
release v3.0.0 — modernize project
- Migrate packaging from setup.py/setup.cfg to pyproject.toml (PEP 517/621)
- Drop Python 2 compat shims; require Python 3.10+
- Slim dependencies to 4 direct deps (was 27)
- Rewrite core library: type hints, f-strings, pathlib, logging, custom
exceptions (no more exit() in library code), proper URL parsing via
urllib.parse, GitHub token auth via GITHUB_TOKEN env var
- Implement wiki archiving (bundle + upload + description link)
- Fix argparse running at import time in __main__.py
- Add tests/ suite covering URL parsing and README detection
- Update README: Python 3.10+, GITHUB_TOKEN docs, remove Python 2 notes
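As an illustration of the URL parsing, custom exception and token auth items above, here is a minimal sketch. `InvalidRepoURL`, `parse_github_url` and `github_headers` are hypothetical names chosen for this example and may not match the library's own identifiers; only `urllib.parse` and the `GITHUB_TOKEN` variable come from the commit message.

```python
import os
from urllib.parse import urlparse


class InvalidRepoURL(ValueError):
    """Hypothetical exception raised instead of calling exit() in library code."""


def parse_github_url(url: str) -> tuple[str, str]:
    """Split a GitHub repository URL into (owner, repo)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or parsed.hostname != "github.com":
        raise InvalidRepoURL(f"not a GitHub URL: {url!r}")
    # Drop empty segments so trailing slashes do not matter.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise InvalidRepoURL(f"expected /owner/repo in {url!r}")
    return parts[0], parts[1].removesuffix(".git")


def github_headers() -> dict[str, str]:
    """Build API request headers, adding auth only when GITHUB_TOKEN is set."""
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Raising an exception leaves the decision to exit with the CLI entry point, which keeps the library usable from other programs and from tests.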
Author: gio <giovanni@archive.org>
Date: Wed Feb 19 19:35:06 2020 -0800
updated setup with configparser req
commit 7b4ab22
Author: gio <giovanni@archive.org>
Date: Wed Feb 19 19:32:28 2020 -0800
requirements updated
commit 2a4ed75
Author: gio <giovanni@archive.org>
Date: Wed Feb 19 18:21:22 2020 -0800
added TODO
commit c106453
Author: gio <giovanni@archive.org>
Date: Wed Feb 19 17:52:17 2020 -0800
license updated
commit 6f71ca4
Author: gio <giovanni@archive.org>
Date: Wed Feb 19 17:52:05 2020 -0800
minor
commit 83f215b
Author: gio <giovanni@archive.org>
Date: Wed Feb 19 17:37:10 2020 -0800
fixed issue with import
commit d95156a
Author: gio <giovanni@archive.org>
Date: Wed Feb 19 17:30:38 2020 -0800
minor descriptions
commit c797e04
Author: gio <giovanni@archive.org>
Date: Wed Feb 19 17:27:02 2020 -0800
added ia session
commit c54f3cb
Author: gio <giovanni@archive.org>
Date: Wed Feb 19 16:58:13 2020 -0800
print error when account creation fails
commit 89ab281
Author: gio <giovanni@archive.org>
Date: Wed Feb 19 16:53:08 2020 -0800
using proper tmp dir
commit 7374301
Author: gio <giovanni@archive.org>
Date: Wed Feb 19 16:52:57 2020 -0800
start rewriting
commit c3e61ba
Author: gio <giovanni@archive.org>
Date: Wed Feb 19 16:46:00 2020 -0800
start refactoring
Bugs fixed
- upload_ia: item.exists check now happens before any heavy work (avatar download, wiki clone, bundle creation), avoiding wasted effort when the same snapshot is already on IA.
- All callers (archive_repo, __main__): clean repo_folder.parent (the mkdtemp root) instead of repo_folder only, so the wiki/ subdirectory and the temp dir itself are no longer leaked on disk.
- fetch_top_repos: status code is now checked before reading X-RateLimit-Remaining, preventing a silent fallback to 9999 on errors.
- archive_watchlist: save_state is now skipped for "skipped" results, reducing unnecessary disk I/O.
Parallelism
- upload_ia: avatar download and wiki clone now run concurrently via ThreadPoolExecutor(max_workers=2), since both are independent network calls.
- archive_watchlist: repos are now archived in parallel using ThreadPoolExecutor with a configurable --workers flag (default: 4). State writes are protected by a threading.Lock.
Other
- _download_avatar extracted as a standalone function (cleaner, testable).
- All functions have thorough docstrings and inline comments.
- README updated: archive_watchlist documented, --workers flag, duplicate prevention explained, state file format shown.
- Tests updated to use the correct mkdtemp-root cleanup structure.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
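The parallel archiving with lock-protected state writes can be sketched as follows. `archive_all`, its parameters and the status strings other than "skipped" are hypothetical; the sketch also assumes state is saved only after a successful archive, whereas the commit message only states that "skipped" results no longer trigger a save.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def archive_all(repos, archive_one, state, save, workers=4):
    """Archive repos in parallel; serialize state updates behind one lock."""
    lock = threading.Lock()
    results = {}

    def run(repo):
        status = archive_one(repo)  # assumed to return "archived", "skipped" or "failed"
        if status == "archived":
            # Only one thread at a time may mutate and persist the shared state.
            with lock:
                state[repo["full_name"]] = {"pushed_at": repo["pushed_at"]}
                save(state)
        return status

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(run, r): r["full_name"] for r in repos}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

Holding the lock across both the dict update and the save keeps the file on disk consistent with the in-memory state, at the cost of serializing the (cheap) writes.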
- Table of contents with anchor links
- iagitup: all CLI options, custom metadata, GitHub auth, IA credentials setup, step-by-step description of what gets archived, IA item structure table, automatic metadata fields table, duplicate prevention explained
- archive_watchlist: full options table with defaults, duplicate prevention two-layer table, extra metadata fields table, cron setup with GITHUB_TOKEN, state file format and how to force re-archive
- Restore section with download + git clone steps
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move archive_watchlist.py into the iagitup package and register it as an `archive-watchlist` console entry point so both commands are available after `pip install iagitup`. Root script kept as thin backward-compat wrapper. Update README, tests, and bump version to 3.1.1.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add .gitlab-ci.yml with lint (py_compile) and test (pytest across Python 3.10–3.13) stages. Bump version to 3.1.2.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add ~/.config/internetarchive/ia.ini to credential search paths (ia v5.x)
- Set GIT_TERMINAL_PROMPT=0 for wiki clones to prevent auth prompts
- Cast year metadata to str to fix upload TypeError
- Rewrite README with badges, clear sections, and professional formatting
- Add comments throughout source code for maintainability
- Add 15 new tests covering _github_headers, create_bundle, _download_wiki, _download_avatar, and new credential config path
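The GIT_TERMINAL_PROMPT change can be illustrated with a short sketch. `noninteractive_git_env` and `clone_wiki` are hypothetical helpers, not the project's `_download_wiki`; the environment variable itself is a standard Git setting that makes Git fail instead of prompting for credentials on the terminal.

```python
import os
import subprocess


def noninteractive_git_env(base=None):
    """Copy the environment and disable Git's terminal credential prompts."""
    env = dict(os.environ if base is None else base)
    env["GIT_TERMINAL_PROMPT"] = "0"
    return env


def clone_wiki(wiki_url, dest):
    """Try to clone a wiki repo; return False when it is missing or needs auth."""
    result = subprocess.run(
        ["git", "clone", wiki_url, str(dest)],
        env=noninteractive_git_env(),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

Without this setting, an unattended run against a repository that has no public wiki can hang waiting for a username that nobody will type.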
Detect LFS via .gitattributes, fetch objects with git lfs fetch --all, create a separate _lfs.tar.gz archive, and upload it alongside the bundle. Warns (doesn't error) when git-lfs is not installed. Includes tests, README updates, and --help description update.
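A minimal sketch of the detection step, assuming it only inspects the top-level `.gitattributes` file for `filter=lfs` entries; `uses_lfs` and `lfs_available` are hypothetical names, and the real implementation may look at nested attribute files as well.

```python
import shutil
from pathlib import Path


def uses_lfs(repo_dir: Path) -> bool:
    """Return True if the root .gitattributes routes any pattern through LFS."""
    attrs = repo_dir / ".gitattributes"
    if not attrs.is_file():
        return False
    for line in attrs.read_text(encoding="utf-8", errors="replace").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # First token is the path pattern; the rest are attributes.
        if "filter=lfs" in line.split()[1:]:
            return True
    return False


def lfs_available() -> bool:
    """Check whether the git-lfs executable is on PATH."""
    return shutil.which("git-lfs") is not None
```

A caller would combine the two: when `uses_lfs` is true but `lfs_available` is false, log a warning and continue with the plain bundle, matching the warn-only behaviour described above.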
…p (v3.3.0)
Support GitLab, Bitbucket, Codeberg, self-hosted Gitea, and any HTTPS git URL. GitHub keeps its rich API path; all other platforms clone directly and extract metadata from local git history. Clone now shows real-time progress via git clone --progress. Temp files are always cleaned up, even on Ctrl+C.
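One way to guarantee cleanup even on Ctrl+C is a context manager around the mkdtemp root. This is a sketch under that assumption; `temp_workspace` and its prefix are hypothetical and not taken from the project.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temp_workspace(prefix="iagitup_"):
    """Yield a fresh temp directory and always remove it afterwards."""
    root = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield root
    finally:
        # Runs on normal exit, on exceptions, and on KeyboardInterrupt (Ctrl+C).
        shutil.rmtree(root, ignore_errors=True)
```

Removing the whole root, rather than only the cloned repo folder inside it, also covers sibling directories such as a wiki clone.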
new version