Main#25

Closed
gdamdam wants to merge 17 commits into master from main

gdamdam (Owner) commented Mar 11, 2026

new version

gdamdam and others added 17 commits February 28, 2026 14:19
    - Add archive_watchlist.py: single-shot script that fetches the top-100
      most-starred GitHub repos and archives any new or updated ones via
      iagitup. Features local state cache (watchlist_state.json) to skip
      repos unchanged since last run, enriched IA metadata (stars, forks,
      language, topics, rank), --dry-run and --top-n flags, cron-ready.
    - Add tests/test_archive_watchlist.py: 30 tests covering load/save state,
      build_custom_meta, fetch_top_repos, and archive_repo (skip, dry-run,
      success, failure, and cleanup paths).
    - Add [tool.pytest.ini_options] pythonpath to pyproject.toml so the
      root-level script is importable from the test suite.
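A minimal sketch of the local state cache described above. It assumes watchlist_state.json maps each repo's `full_name` to the GitHub API `pushed_at` timestamp seen on the last successful run. The load/save names echo the commit message, but these signatures and the `needs_archive` helper are hypothetical, not the script's actual API.

```python
import json
from pathlib import Path

# Hypothetical default location; the real script's path handling may differ.
STATE_FILE = Path("watchlist_state.json")

def load_state(path: Path = STATE_FILE) -> dict[str, str]:
    """Return the cached {full_name: pushed_at} map, or {} on the first run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text())

def save_state(state: dict[str, str], path: Path = STATE_FILE) -> None:
    """Write to a temp file, then rename it over the old cache."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2, sort_keys=True))
    tmp.replace(path)  # rename is atomic on POSIX filesystems

def needs_archive(repo: dict, state: dict[str, str]) -> bool:
    """True if the repo is new or has been pushed to since the last run."""
    return state.get(repo["full_name"]) != repo["pushed_at"]
```

The temp-file-then-rename write keeps a crash mid-write from leaving a truncated cache; whether the real script does this is not stated in the commit.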

    release v3.0.0 — modernize project

    - Migrate packaging from setup.py/setup.cfg to pyproject.toml (PEP 517/621)
    - Drop Python 2 compat shims; require Python 3.10+
    - Slim dependencies to 4 direct deps (was 27)
    - Rewrite core library: type hints, f-strings, pathlib, logging, custom
      exceptions (no more exit() in library code), proper URL parsing via
      urllib.parse, GitHub token auth via GITHUB_TOKEN env var
    - Implement wiki archiving (bundle + upload + description link)
    - Fix argparse running at import time in __main__.py
    - Add tests/ suite covering URL parsing and README detection
    - Update README: Python 3.10+, GITHUB_TOKEN docs, remove Python 2 notes
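A sketch of the URL parsing and token handling described above, using only `urllib.parse` and `os.environ`. `InvalidURLError`, `parse_github_url`, and `github_headers` are hypothetical names. The sketch shows two behaviours from the commit: library code raises a custom exception instead of calling exit(), and it sends an Authorization header only when GITHUB_TOKEN is set.

```python
import os
from urllib.parse import urlparse

class InvalidURLError(ValueError):
    """Raised by library code instead of exit(); the CLI decides how to report it."""

def parse_github_url(url: str) -> tuple[str, str]:
    """Split https://github.com/<owner>/<repo>[.git][/...] into (owner, repo)."""
    parsed = urlparse(url.strip())
    host = parsed.hostname or ""  # hostname is lowercased, without any port
    if parsed.scheme not in ("http", "https") or host not in ("github.com", "www.github.com"):
        raise InvalidURLError(f"not a GitHub URL: {url!r}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise InvalidURLError(f"missing owner/repo in {url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo

def github_headers() -> dict[str, str]:
    """Build API headers; authenticate only when GITHUB_TOKEN is set."""
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```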

Author: gio <giovanni@archive.org>
Date:   Wed Feb 19 19:35:06 2020 -0800

    updated setup with configparser req

commit 7b4ab22
Author: gio <giovanni@archive.org>
Date:   Wed Feb 19 19:32:28 2020 -0800

    requirements updated

commit 2a4ed75
Author: gio <giovanni@archive.org>
Date:   Wed Feb 19 18:21:22 2020 -0800

    added TODO

commit c106453
Author: gio <giovanni@archive.org>
Date:   Wed Feb 19 17:52:17 2020 -0800

    license updated

commit 6f71ca4
Author: gio <giovanni@archive.org>
Date:   Wed Feb 19 17:52:05 2020 -0800

    minor

commit 83f215b
Author: gio <giovanni@archive.org>
Date:   Wed Feb 19 17:37:10 2020 -0800

    fixed issue with import

commit d95156a
Author: gio <giovanni@archive.org>
Date:   Wed Feb 19 17:30:38 2020 -0800

    minor descriptions

commit c797e04
Author: gio <giovanni@archive.org>
Date:   Wed Feb 19 17:27:02 2020 -0800

    added ia session

commit c54f3cb
Author: gio <giovanni@archive.org>
Date:   Wed Feb 19 16:58:13 2020 -0800

    print error when account creation fails

commit 89ab281
Author: gio <giovanni@archive.org>
Date:   Wed Feb 19 16:53:08 2020 -0800

    using proper tmp dir

commit 7374301
Author: gio <giovanni@archive.org>
Date:   Wed Feb 19 16:52:57 2020 -0800

    start rewriting

commit c3e61ba
Author: gio <giovanni@archive.org>
Date:   Wed Feb 19 16:46:00 2020 -0800

    start refactoring

Bugs fixed
- upload_ia: item.exists check now happens before any heavy work (avatar
  download, wiki clone, bundle creation), avoiding wasted effort when the
  same snapshot is already on IA.
- All callers (archive_repo, __main__): clean repo_folder.parent (the
  mkdtemp root) instead of repo_folder only, so the wiki/ subdirectory
  and the temp dir itself are no longer leaked on disk.
- fetch_top_repos: status code is now checked before reading
  X-RateLimit-Remaining, preventing a silent fallback to 9999 on errors.
- archive_watchlist: save_state is now skipped for "skipped" results,
  reducing unnecessary disk I/O.
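A sketch of the fetch_top_repos fix: check the HTTP status before trusting any header or body. It assumes a requests-style response object (`status_code`, `headers`, `json()`) and the `items` field returned by the GitHub search API. `GitHubAPIError` and `check_search_response` are hypothetical names.

```python
from typing import Any

class GitHubAPIError(RuntimeError):
    """Raised when the GitHub API answers with a non-200 status."""

def check_search_response(resp: Any) -> tuple[list[dict[str, Any]], int]:
    """Return (items, remaining quota) from a search response, failing loudly on errors."""
    # Check the status first. Reading the header with a fallback such as
    # resp.headers.get("X-RateLimit-Remaining", 9999) on a failed request
    # would silently report plenty of quota left.
    if resp.status_code != 200:
        raise GitHubAPIError(f"GitHub API returned HTTP {resp.status_code}")
    remaining = int(resp.headers["X-RateLimit-Remaining"])
    return resp.json()["items"], remaining
```

Given a 403 response, this raises instead of returning an invented quota, so the caller can stop or back off.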

Parallelism
- upload_ia: avatar download and wiki clone now run concurrently via
  ThreadPoolExecutor(max_workers=2) — both are independent network calls.
- archive_watchlist: repos are now archived in parallel using
  ThreadPoolExecutor with a configurable --workers flag (default: 4).
  State writes are protected by a threading.Lock.
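A sketch of the parallel archiving loop with a lock around state writes. It assumes `archive_one(repo)` returns a status string and `save_state(state)` persists the whole dict. The status names ("archived", "skipped"), `archive_all`, and its callable parameters are hypothetical. The sketch also assumes failed repos are not recorded, so they are retried on the next run.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def archive_all(
    repos: list[dict],
    archive_one: Callable[[dict], str],
    state: dict[str, str],
    save_state: Callable[[dict[str, str]], None],
    workers: int = 4,
) -> dict[str, str]:
    """Archive repos concurrently; return {full_name: status}."""
    lock = threading.Lock()  # serializes every read-modify-write of `state`

    def worker(repo: dict) -> tuple[str, str]:
        status = archive_one(repo)  # network-bound, so it runs in parallel
        if status == "archived":
            # Only successful uploads touch the cache; "skipped" (unchanged)
            # and failed repos cause no disk write at all.
            with lock:
                state[repo["full_name"]] = repo["pushed_at"]
                save_state(state)
        return repo["full_name"], status

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, regardless of completion order.
        return dict(pool.map(worker, repos))
```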

Other
- _download_avatar extracted as a standalone function (cleaner, testable).
- All functions have thorough docstrings and inline comments.
- README updated: archive_watchlist documented, --workers flag, duplicate
  prevention explained, state file format shown.
- Tests updated to use the correct mkdtemp-root cleanup structure.
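A sketch of the mkdtemp-root cleanup. It assumes the layout the commit describes: a fresh temp root holding the repo clone and a sibling wiki/ directory. `make_workspace` and `cleanup` are hypothetical helper names.

```python
import shutil
import tempfile
from pathlib import Path

def make_workspace(repo_name: str) -> Path:
    """Create <mkdtemp root>/<repo_name>; a wiki clone would go in <root>/wiki."""
    root = Path(tempfile.mkdtemp(prefix="iagitup-"))
    repo_folder = root / repo_name
    repo_folder.mkdir()
    return repo_folder

def cleanup(repo_folder: Path) -> None:
    """Delete the whole mkdtemp root so wiki/ and the root itself go too."""
    # Removing only repo_folder would leak <root>/wiki and <root>.
    shutil.rmtree(repo_folder.parent, ignore_errors=True)
```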

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Table of contents with anchor links
- iagitup: all CLI options, custom metadata, GitHub auth, IA credentials
  setup, step-by-step description of what gets archived, IA item
  structure table, automatic metadata fields table, duplicate prevention
  explained
- archive_watchlist: full options table with defaults, duplicate
  prevention two-layer table, extra metadata fields table, cron setup
  with GITHUB_TOKEN, state file format and how to force re-archive
- Restore section with download + git clone steps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Move archive_watchlist.py into the iagitup package and register it as
an `archive-watchlist` console entry point so both commands are available
after `pip install iagitup`. Root script kept as thin backward-compat wrapper.
Update README, tests, and bump version to 3.1.1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add .gitlab-ci.yml with lint (py_compile) and test (pytest across
Python 3.10–3.13) stages. Bump version to 3.1.2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add ~/.config/internetarchive/ia.ini to credential search paths (ia v5.x)
- Set GIT_TERMINAL_PROMPT=0 for wiki clones to prevent auth prompts
- Cast year metadata to str to fix upload TypeError
- Rewrite README with badges, clear sections, and professional formatting
- Add comments throughout source code for maintainability
- Add 15 new tests covering _github_headers, create_bundle, _download_wiki,
  _download_avatar, and new credential config path
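A sketch of the credential lookup and the non-interactive git environment. The ~/.config/internetarchive/ia.ini path comes from the commit message. The other two candidates (~/.config/ia.ini and ~/.ia) and the search order are assumptions about older internetarchive locations. `find_ia_config` and `git_env` are hypothetical names.

```python
from __future__ import annotations

import os
from pathlib import Path

def ia_config_candidates(home: Path) -> list[Path]:
    """Candidate config files, newest convention first (order is an assumption)."""
    return [
        home / ".config" / "internetarchive" / "ia.ini",  # ia v5.x
        home / ".config" / "ia.ini",
        home / ".ia",
    ]

def find_ia_config(home: Path | None = None) -> Path | None:
    """Return the first existing config file, or None if none is found."""
    home = home or Path.home()
    for candidate in ia_config_candidates(home):
        if candidate.is_file():
            return candidate
    return None

def git_env() -> dict[str, str]:
    """Environment for cloning a wiki: git fails instead of prompting for credentials."""
    env = dict(os.environ)
    env["GIT_TERMINAL_PROMPT"] = "0"
    return env
```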
Detect LFS via .gitattributes, fetch objects with git lfs fetch --all,
create a separate _lfs.tar.gz archive, and upload it alongside the bundle.
Warns (doesn't error) when git-lfs is not installed. Includes tests,
README updates, and --help description update.
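A sketch of the LFS detection step. It assumes "uses LFS" means the top-level .gitattributes contains a rule with `filter=lfs`; nested .gitattributes files are ignored here for brevity. `uses_lfs`, `lfs_available`, and `check_lfs` are hypothetical names. The follow-up `git lfs fetch --all` and the _lfs.tar.gz packaging are not shown.

```python
import logging
import shutil
from pathlib import Path

log = logging.getLogger(__name__)

def uses_lfs(repo_dir: Path) -> bool:
    """True if a .gitattributes rule routes paths through the LFS filter."""
    attrs = repo_dir / ".gitattributes"
    if not attrs.is_file():
        return False
    for line in attrs.read_text(errors="replace").splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # comments never enable a filter
        # e.g. "*.psd filter=lfs diff=lfs merge=lfs -text"
        if "filter=lfs" in line.split():
            return True
    return False

def lfs_available() -> bool:
    """True if a git-lfs executable is on PATH."""
    return shutil.which("git-lfs") is not None

def check_lfs(repo_dir: Path) -> bool:
    """Decide whether to fetch LFS objects; warn (don't fail) if git-lfs is missing."""
    if not uses_lfs(repo_dir):
        return False
    if not lfs_available():
        log.warning("repo uses Git LFS but git-lfs is not installed; skipping LFS objects")
        return False
    return True
```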
…p (v3.3.0)

Support GitLab, Bitbucket, Codeberg, self-hosted Gitea, and any HTTPS git URL.
GitHub keeps its rich API path; all other platforms clone directly and extract
metadata from local git history. Clone now shows real-time progress via
git clone --progress. Temp files are always cleaned up, even on Ctrl+C.
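A sketch of the platform routing and the always-cleanup behaviour. It assumes GitHub is recognized by hostname alone and that each run works inside a single temp root. `is_github`, `clone_cmd`, and `with_workspace` are hypothetical names. The `finally` clause also runs for `KeyboardInterrupt`, which is what makes Ctrl+C safe.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, TypeVar
from urllib.parse import urlparse

T = TypeVar("T")

def is_github(url: str) -> bool:
    """GitHub gets the rich API path; every other host is cloned directly."""
    return (urlparse(url).hostname or "") in ("github.com", "www.github.com")

def clone_cmd(url: str, dest: Path) -> list[str]:
    """--progress forces progress output on stderr even when it is not a terminal."""
    return ["git", "clone", "--progress", url, str(dest)]

def with_workspace(url: str, work: Callable[[str, Path], T]) -> T:
    """Run `work` in a fresh temp root and always delete the root afterwards."""
    root = Path(tempfile.mkdtemp(prefix="iagitup-"))
    try:
        return work(url, root)
    finally:
        # Runs on normal return, on exceptions, and on KeyboardInterrupt.
        shutil.rmtree(root, ignore_errors=True)
```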
gdamdam closed this Mar 11, 2026
