Turn changing documentation sites into maintained, versioned NotebookLM source packs.
notebooklm-source-pack-builder crawls or imports product documentation, normalizes it into Markdown, bundles the content into NotebookLM-friendly source files, and tracks enough metadata to keep those packs updated over time without deleting older sources.
Status: early local MVP. The core local workflow, NotebookLM deploy path, and non-destructive versioned refresh flow are implemented and tested. General-purpose web crawling is still being expanded; the stable tested ingestion path is currently --from-markdown-dir.
| Area | Summary |
|---|---|
| Input | Docs roots, llms.txt, sitemap.xml, sidebar links, same-site links, or pre-cleaned Markdown directories. |
| Output | NotebookLM-sized Markdown bundles plus a manifest that records notebook/source IDs and statuses. |
| Refresh model | Non-destructive dated snapshots; changed docs are added without deleting old sources. |
| Operations | Optional NotebookLM deploy, auth preflight, source readiness wait, quiet no-change refreshes, and Hermes cron installation. |
| Safety | Local/private root URLs are blocked by default; cookies, tokens, and generated private packs stay out of the repo. |
Create a tiny local pack without Google login, then list it:
mkdir -p /tmp/nlm-pack-demo && printf '# Demo docs\n\nHello NotebookLM.\n' > /tmp/nlm-pack-demo/index.md
uv run nlm-pack create --url https://docs.example.com --title "Demo Docs" --no-upload
uv run nlm-pack sync demo-docs --from-markdown-dir /tmp/nlm-pack-demo --no-upload
uv run nlm-pack list | grep demo-docs
After a local sync, the pack directory contains a manifest and NotebookLM-sized Markdown bundles:
~/.notebooklm-source-packs/demo-docs/
config.yaml
manifest.json
bundles/
bundle-001.md
The image above summarizes the supported production path: discover or import docs, bundle them with nlm-pack, deploy sources to NotebookLM, and add non-destructive dated snapshots only when content changes.
NotebookLM is useful for source-grounded product and developer documentation, but maintaining sources manually becomes painful when docs change. This tool is built around a simple operating model:
docs -> clean Markdown -> bundled sources -> NotebookLM notebook -> dated incremental snapshots
Instead of replacing old NotebookLM sources, versioned refreshes add dated snapshot bundles only when upstream content changes. That preserves historical command/API context, so later answers can distinguish “old docs said X” from “current docs say Y”.
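The "only when upstream content changes" rule can be pictured as a hash comparison. The following is a minimal sketch, not the tool's actual implementation: the function names, the SHA-256 choice, and the whitespace normalization are all assumptions made for illustration.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of lightly normalized page text."""
    # Normalizing line endings and outer whitespace is an assumption;
    # it keeps cosmetic differences from registering as changes.
    normalized = text.replace("\r\n", "\n").strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_pages(current: dict[str, str], recorded: dict[str, str]) -> list[str]:
    """List source URLs whose content hash differs from the recorded hash.

    `current` maps source URL -> Markdown text.
    `recorded` maps source URL -> previously stored hash.
    URLs that were never recorded count as changed.
    """
    return sorted(
        url
        for url, text in current.items()
        if recorded.get(url) != content_hash(text)
    )
```

Under this sketch, an empty result would mean no new snapshot is needed, and a non-empty result would trigger a new dated snapshot while leaving earlier sources untouched.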
- Create local source-pack configs and manifests.
- Discover URLs from llms-full.txt, llms.txt, sitemap.xml, root-page sidebar/navigation links, and same-site links.
- Import cleaned Markdown from a local directory, including nested directories with stable source URLs.
- Generate NotebookLM-sized Markdown bundles.
- Track page and bundle content hashes for change detection.
- Deploy bundles to NotebookLM through the notebooklm CLI.
- Check NotebookLM authentication before deploy and stop before modifying manifests if auth is missing.
- Wait for uploaded NotebookLM sources to become ready.
- Run non-destructive versioned refreshes with dated snapshots.
- Support quiet scheduled refreshes that emit nothing when no content changed.
- Optionally install a Hermes cron job during deploy.
- Preserve existing notebook_id, source_id, ready statuses, and historical snapshot entries across syncs.
- Guard against exceeding NotebookLM's 50-source notebook limit during deploy.
- Reject localhost/private-network root URLs by default to reduce SSRF risk.
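The bundle-generation item in the list above can be illustrated with a greedy packer. This is a sketch under stated assumptions: the `max_chars` budget, the `bundle_pages` name, and the `<!-- source: ... -->` header line are hypothetical, and the real size limit and bundle format used by nlm-pack may differ.

```python
def bundle_pages(pages: list[tuple[str, str]], max_chars: int) -> list[str]:
    """Greedily pack (url, markdown) pages into bundles of at most max_chars.

    A single page longer than max_chars is placed in a bundle of its own
    rather than being split.
    """
    bundles: list[str] = []
    current: list[str] = []
    size = 0  # length the joined bundle would have, plus one trailing newline
    for url, markdown in pages:
        # Hypothetical per-page header so each page keeps its source URL.
        section = f"<!-- source: {url} -->\n{markdown.strip()}\n"
        if current and size + len(section) > max_chars:
            bundles.append("\n".join(current))
            current, size = [], 0
        current.append(section)
        size += len(section) + 1  # +1 for the newline added by join
    if current:
        bundles.append("\n".join(current))
    return bundles
```

Keeping pages whole is a deliberate choice in this sketch: it keeps each source URL attached to a complete page, at the cost of occasionally oversized bundles.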
Requirements:
- Python 3.11+
- uv
- Optional for uploads: an authenticated notebooklm CLI
- Optional for scheduled refreshes: Hermes Agent cron support
You can run this project by itself; it does not require the Yorha/Hermes all-in-one stack.
git clone https://github.com/yelixir-dev/notebooklm-source-pack-builder.git
cd notebooklm-source-pack-builder
uv sync
uv run nlm-pack doctor
To install the CLI directly from GitHub:
uv tool install git+https://github.com/yelixir-dev/notebooklm-source-pack-builder.git
nlm-pack doctor
Local pack creation works without Google login. Uploading to NotebookLM requires the external NotebookLM CLI and a browser login, because NotebookLM is a Google service and this repo must not ship cookies or storage_state.json.
uv tool install notebooklm-py
notebooklm login
notebooklm auth check --test
After that, nlm-pack deploy and nlm-pack refresh --upload can add sources to NotebookLM.
If you already use the Yorha/Hermes memory-stack all-in-one setup, connect this tool there for scheduled refreshes, inventory sync, and agent-operated maintenance:
https://github.com/yelixir-dev/hermes-memory-stack
Standalone is fine; the all-in-one stack is just the nicer operations layer.
uv run nlm-pack create \
--url https://docs.example.com \
--title "Example Docs" \
--no-upload
uv run nlm-pack sync example-docs \
--from-markdown-dir ./docs-markdown \
--no-upload
uv run nlm-pack list
Generated files are stored under:
~/.notebooklm-source-packs/example-docs/
config.yaml
manifest.json
bundles/
bundle-001.md
Nested Markdown directories are preserved in generated source URLs. For example, ./docs-markdown/getting-started/intro.md becomes https://docs.example.com/getting-started/intro, while ./docs-markdown/reference/index.md becomes https://docs.example.com/reference.
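The path-to-URL mapping described above can be sketched as follows. The function name `source_url_for` is hypothetical, and mapping a top-level index.md to the root URL itself is an assumption extrapolated from the reference/index.md example.

```python
from pathlib import PurePosixPath


def source_url_for(root_url: str, relative_md_path: str) -> str:
    """Map a Markdown path (relative to the import directory) to a source URL."""
    # Drop the .md suffix, then split into path segments.
    parts = list(PurePosixPath(relative_md_path).with_suffix("").parts)
    if parts and parts[-1] == "index":
        parts.pop()  # index.md stands for its parent directory
    base = root_url.rstrip("/")
    return "/".join([base, *parts]) if parts else base


print(source_url_for("https://docs.example.com", "getting-started/intro.md"))
print(source_url_for("https://docs.example.com", "reference/index.md"))
```

The two printed URLs correspond to the two examples in the paragraph above.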
First make sure the external NotebookLM CLI is authenticated:
notebooklm auth status --json
Then deploy a pack:
uv run nlm-pack deploy example-docs
deploy will:
- load the local pack manifest,
- verify the external NotebookLM CLI has an authenticated session,
- check that the notebook will not exceed the 50-source NotebookLM limit,
- create a NotebookLM notebook if the manifest does not already have one,
- upload missing bundle files,
- wait for each source to become ready,
- write notebook_id, source_id, and source status back into manifest.json, and
- ask whether to install a versioned incremental refresh cron.
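Two of the steps above, the source-limit check and the "upload missing bundle files" step, can be sketched together. The manifest shape used here (a `bundles` list of entries with `path` and `source_id`) and the `plan_uploads` name are assumptions for illustration, not the tool's real schema.

```python
NOTEBOOK_SOURCE_LIMIT = 50  # the NotebookLM per-notebook limit named above


def plan_uploads(manifest: dict, limit: int = NOTEBOOK_SOURCE_LIMIT) -> list[str]:
    """Return bundle paths that still need uploading.

    Raises ValueError before anything is uploaded if the notebook would
    end up with more than `limit` sources.
    Hypothetical manifest shape:
    {"bundles": [{"path": "...", "source_id": "..." or None}, ...]}
    """
    bundles = manifest.get("bundles", [])
    already_uploaded = sum(1 for b in bundles if b.get("source_id"))
    missing = [b["path"] for b in bundles if not b.get("source_id")]
    total = already_uploaded + len(missing)
    if total > limit:
        raise ValueError(f"deploy would create {total} sources; limit is {limit}")
    return missing
```

Checking the limit before the first upload means a failed check leaves both the notebook and the manifest unchanged.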
To install the cron non-interactively, provide the Markdown source directory used for future refreshes:
uv run nlm-pack deploy example-docs \
--install-refresh-cron \
--refresh-markdown-dir ./docs-markdown \
--schedule "0 9 * * 1"
Use refresh --versioned to compare current content against the latest recorded hash. If nothing changed, it exits without creating a new snapshot. If content changed, it writes a dated snapshot and, with --upload, adds the new bundle to NotebookLM without deleting older sources.
uv run nlm-pack refresh example-docs \
--from-markdown-dir ./docs-markdown \
--versioned \
--upload
For scheduled jobs, add --quiet to suppress output when there are no changes:
uv run nlm-pack refresh example-docs \
--from-markdown-dir ./docs-markdown \
--versioned \
--quiet
Snapshot layout:
~/.notebooklm-source-packs/example-docs/
snapshots/
YYYY-MM-DD/
pages/
bundles/
example-docs-docs-YYYY-MM-DD-bundle-001.md
history.jsonl
If more than one changed snapshot is created on the same day, suffixes such as YYYY-MM-DD-01 are used to avoid manifest collisions.
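The same-day suffix rule can be sketched as below. The function name is hypothetical, and the exact numbering (first snapshot unsuffixed, then -01, -02, and so on) is an assumption inferred from the YYYY-MM-DD-01 example.

```python
from datetime import date


def next_snapshot_name(day: date, existing: set[str]) -> str:
    """Pick a snapshot directory name for `day` that is not already in use."""
    base = day.isoformat()  # YYYY-MM-DD
    if base not in existing:
        return base
    counter = 1
    # Walk -01, -02, ... until a free name is found.
    while f"{base}-{counter:02d}" in existing:
        counter += 1
    return f"{base}-{counter:02d}"
```

Here `existing` would be the set of snapshot names already recorded for the pack, for example the directory names under snapshots/.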
uv run nlm-pack discover https://docs.example.com
Current discovery sources:
- llms-full.txt
- llms.txt
- sitemap.xml
- root-page sidebar/navigation links
- bounded same-site HTML links
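As an illustration of one discovery source from the list above, here is a minimal sketch of reading URLs out of a sitemap.xml document. It uses only the standard library; the function name and the same-host filter are assumptions, and the tool's own discovery logic may behave differently.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str, root_url: str) -> list[str]:
    """Extract same-host <loc> URLs from sitemap XML, de-duplicated in order."""
    root_host = urlsplit(root_url).netloc
    seen: dict[str, None] = {}  # dict preserves insertion order
    for loc in ET.fromstring(xml_text).iterfind(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url and urlsplit(url).netloc == root_host:
            seen.setdefault(url, None)
    return list(seen)
```

Fetching the sitemap over HTTP is left out on purpose so the sketch stays offline and easy to test.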
The broader crawler and site-specific extraction strategies are still evolving. For reliable production ingestion today, pre-clean docs into a Markdown directory and use sync --from-markdown-dir / refresh --from-markdown-dir.
Before publishing or pushing a pack builder workspace, double-check that the repository does not contain:
- Google/NotebookLM cookies (SID, HSID, SSID, etc.)
- browser profiles or storage_state.json
- .env files
- API keys or tokens
- generated private source packs from ~/.notebooklm-source-packs/
- local cron scripts from ~/.hermes/scripts/
- Python caches or virtual environments
This repository should contain source code, tests, examples, and documentation only.
This tool handles documentation content and NotebookLM upload metadata, but it must not store or publish Google cookies, storage_state.json, API keys, generated private source packs, or local Hermes cron scripts. Local/private root URLs are rejected by default to reduce SSRF risk.
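The default rejection of local/private root URLs can be sketched with the standard library. This is an illustrative check, not the tool's actual guard: the function name is hypothetical, rejecting non-HTTP schemes is an added assumption, and only literal hosts are inspected here, whereas a fuller guard would also resolve DNS names and check every resolved address.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_root_url(url: str) -> bool:
    """Return True for root URLs that point at localhost or non-public IPs."""
    parts = urlsplit(url)
    if parts.scheme not in {"http", "https"}:
        return True  # assumption: only web URLs are valid docs roots
    host = (parts.hostname or "").lower()
    if not host or host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; not resolved in this sketch
    # is_global is False for loopback, private, and link-local ranges.
    return not address.is_global
```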
NotebookLM upload depends on the external notebooklm CLI and the user's Google session. Keep authentication material outside this repository and verify with notebooklm auth check --test before deploy or refresh operations. For private security reports, use the GitHub Security Advisory form at https://github.com/yelixir-dev/notebooklm-source-pack-builder/security/advisories/new or email yelixir.dev@gmail.com.
Contributions are welcome when they keep the local-first, source-safe workflow intact. Before opening a change, run:
uv run pytest -q
uv run ruff check .
uv run ruff format --check .
For crawler, deploy, or refresh changes, include tests that prove manifests preserve notebook_id, source_id, ready statuses, and historical snapshot entries.
uv sync
uv run pytest -q
uv run ruff check .
uv run ruff format .
- The most stable ingestion path is Markdown-directory based.
- NotebookLM upload support depends on the external notebooklm CLI.
- Refresh cron installation is Hermes-specific.
- Automatic archive-notebook rotation for long-running histories is not implemented yet.
- Detailed semantic diff summaries are not implemented yet.
MIT
