notebooklm-source-pack-builder

NotebookLM Source Pack Builder flow overview

Turn changing documentation sites into maintained, versioned NotebookLM source packs.


notebooklm-source-pack-builder crawls or imports product documentation, normalizes it into Markdown, bundles the content into NotebookLM-friendly source files, and tracks enough metadata to keep those packs updated over time without deleting older sources.

Status: early local MVP. The core local workflow, NotebookLM deploy path, and non-destructive versioned refresh flow are implemented and tested. General-purpose web crawling is still being expanded; the stable tested ingestion path is currently --from-markdown-dir.

At a glance

| Area | Summary |
| --- | --- |
| Input | Docs roots, llms.txt, sitemap.xml, sidebar links, same-site links, or pre-cleaned Markdown directories. |
| Output | NotebookLM-sized Markdown bundles plus a manifest that records notebook/source IDs and statuses. |
| Refresh model | Non-destructive dated snapshots; changed docs are added without deleting old sources. |
| Operations | Optional NotebookLM deploy, auth preflight, source readiness wait, quiet no-change refreshes, and Hermes cron installation. |
| Safety | Local/private root URLs are blocked by default; cookies, tokens, and generated private packs stay out of the repo. |

Quick start + verify

Create a tiny local pack without Google login, then list it:

mkdir -p /tmp/nlm-pack-demo && printf '# Demo docs\n\nHello NotebookLM.\n' > /tmp/nlm-pack-demo/index.md
uv run nlm-pack create --url https://docs.example.com --title "Demo Docs" --no-upload
uv run nlm-pack sync demo-docs --from-markdown-dir /tmp/nlm-pack-demo --no-upload
uv run nlm-pack list | grep demo-docs

First successful output

After a local sync, the pack directory contains a manifest and NotebookLM-sized Markdown bundles:

~/.notebooklm-source-packs/demo-docs/
  config.yaml
  manifest.json
  bundles/
    bundle-001.md

Infographic overview

The image above summarizes the supported production path: discover or import docs, bundle them with nlm-pack, deploy sources to NotebookLM, and add non-destructive dated snapshots only when content changes.

Why this exists

NotebookLM is useful for source-grounded product and developer documentation, but maintaining sources manually becomes painful when docs change. This tool is built around a simple operating model:

docs -> clean Markdown -> bundled sources -> NotebookLM notebook -> dated incremental snapshots

Instead of replacing old NotebookLM sources, versioned refreshes add dated snapshot bundles only when upstream content changes. That preserves historical command/API context, so later answers can distinguish “old docs said X” from “current docs say Y”.

Features

  • Create local source-pack configs and manifests.
  • Discover URLs from llms-full.txt, llms.txt, sitemap.xml, root-page sidebar/navigation links, and same-site links.
  • Import cleaned Markdown from a local directory, including nested directories with stable source URLs.
  • Generate NotebookLM-sized Markdown bundles.
  • Track page and bundle content hashes for change detection.
  • Deploy bundles to NotebookLM through the notebooklm CLI.
  • Check NotebookLM authentication before deploy and stop before modifying manifests if auth is missing.
  • Wait for uploaded NotebookLM sources to become ready.
  • Run non-destructive versioned refreshes with dated snapshots.
  • Support quiet scheduled refreshes that emit nothing when no content changed.
  • Optionally install a Hermes cron job during deploy.
  • Preserve existing notebook_id, source_id, ready statuses, and historical snapshot entries across syncs.
  • Guard against exceeding NotebookLM's 50-source notebook limit during deploy.
  • Reject localhost/private-network root URLs by default to reduce SSRF risk.

Installation

Requirements:

  • Python 3.11+
  • uv
  • Optional for uploads: an authenticated notebooklm CLI
  • Optional for scheduled refreshes: Hermes Agent cron support

Standalone install

You can run this project by itself; it does not require the Yorha/Hermes all-in-one stack.

git clone https://github.com/yelixir-dev/notebooklm-source-pack-builder.git
cd notebooklm-source-pack-builder
uv sync
uv run nlm-pack doctor

To install the CLI directly from GitHub:

uv tool install git+https://github.com/yelixir-dev/notebooklm-source-pack-builder.git
nlm-pack doctor

NotebookLM authentication

Local pack creation works without Google login. Uploading to NotebookLM requires the external NotebookLM CLI and a browser login, because NotebookLM is a Google service and this repo must not ship cookies or storage_state.json.

uv tool install notebooklm-py
notebooklm login
notebooklm auth check --test

After that, nlm-pack deploy and nlm-pack refresh --upload can add sources to NotebookLM.

With the Yorha all-in-one stack

If you already use the Yorha/Hermes memory-stack all-in-one setup, connect this tool there for scheduled refreshes, inventory sync, and agent-operated maintenance:

https://github.com/yelixir-dev/hermes-memory-stack

Standalone is fine; the all-in-one stack is just the nicer operations layer.

Quick start: local pack

uv run nlm-pack create \
  --url https://docs.example.com \
  --title "Example Docs" \
  --no-upload

uv run nlm-pack sync example-docs \
  --from-markdown-dir ./docs-markdown \
  --no-upload

uv run nlm-pack list

Generated files are stored under:

~/.notebooklm-source-packs/example-docs/
  config.yaml
  manifest.json
  bundles/
    bundle-001.md

Nested Markdown directories are preserved in generated source URLs. For example, ./docs-markdown/getting-started/intro.md becomes https://docs.example.com/getting-started/intro, while ./docs-markdown/reference/index.md becomes https://docs.example.com/reference.
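
The path-to-URL mapping described above can be sketched in a few lines of Python. This is a minimal illustration rather than the tool's actual code: the function name markdown_path_to_url is hypothetical, and the rule that index.md stands for its parent directory is inferred from the two examples in the previous paragraph.

```python
from pathlib import PurePosixPath


def markdown_path_to_url(root_url: str, relative_path: str) -> str:
    """Map a Markdown path (relative to the import directory) to a source URL.

    Hypothetical helper: the rules are inferred from the README examples.
    """
    # Drop the ".md" suffix and split the path into its directory parts.
    parts = list(PurePosixPath(relative_path).with_suffix("").parts)
    # An index file represents its parent directory.
    if parts and parts[-1] == "index":
        parts.pop()
    base = root_url.rstrip("/")
    return base if not parts else f"{base}/{'/'.join(parts)}"


print(markdown_path_to_url("https://docs.example.com", "getting-started/intro.md"))
# https://docs.example.com/getting-started/intro
print(markdown_path_to_url("https://docs.example.com", "reference/index.md"))
# https://docs.example.com/reference
```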

Deploy to NotebookLM

First make sure the external NotebookLM CLI is authenticated:

notebooklm auth status --json

Then deploy a pack:

uv run nlm-pack deploy example-docs

deploy will:

  1. load the local pack manifest,
  2. verify the external NotebookLM CLI has an authenticated session,
  3. check that the notebook will not exceed the 50-source NotebookLM limit,
  4. create a NotebookLM notebook if the manifest does not already have one,
  5. upload missing bundle files,
  6. wait for each source to become ready,
  7. write notebook_id, source_id, and source status back into manifest.json, and
  8. ask whether to install a versioned incremental refresh cron.
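
Step 3 above is a simple budget check. The sketch below shows one way such a guard might look; the function name check_source_budget and its parameters are assumptions made for illustration, and only the 50-source limit comes from this README.

```python
NOTEBOOK_SOURCE_LIMIT = 50  # per-notebook source limit stated in this README


def check_source_budget(existing_sources: int, bundles_to_upload: int) -> int:
    """Return the slots left after the upload, or raise if the limit is exceeded.

    Hypothetical helper: the real deploy command may count sources differently.
    """
    total = existing_sources + bundles_to_upload
    if total > NOTEBOOK_SOURCE_LIMIT:
        raise ValueError(
            f"upload would bring the notebook to {total} sources "
            f"(limit: {NOTEBOOK_SOURCE_LIMIT})"
        )
    return NOTEBOOK_SOURCE_LIMIT - total


print(check_source_budget(existing_sources=10, bundles_to_upload=5))  # 35
```

Running the check before any upload means a failed deploy leaves both the notebook and manifest.json untouched.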

To install the cron non-interactively, provide the Markdown source directory used for future refreshes:

uv run nlm-pack deploy example-docs \
  --install-refresh-cron \
  --refresh-markdown-dir ./docs-markdown \
  --schedule "0 9 * * 1"
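
Assuming --schedule takes a standard five-field cron expression (this README does not spell out the syntax Hermes expects), "0 9 * * 1" would mean every Monday at 09:00. The hypothetical helper below only labels the fields so the expression is easier to read; it does not validate ranges.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict[str, str]:
    """Label the five fields of a standard cron expression (illustrative only)."""
    values = expression.split()
    if len(values) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(values)}")
    return dict(zip(CRON_FIELDS, values))


print(describe_cron("0 9 * * 1"))
# {'minute': '0', 'hour': '9', 'day_of_month': '*', 'month': '*', 'day_of_week': '1'}
```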

Versioned refresh

Use refresh --versioned to compare current content against the latest recorded hash. If nothing changed, it exits without creating a new snapshot. If content changed, it writes a dated snapshot and, with --upload, adds the new bundle to NotebookLM without deleting older sources.

uv run nlm-pack refresh example-docs \
  --from-markdown-dir ./docs-markdown \
  --versioned \
  --upload

For scheduled jobs, add --quiet to suppress output when there are no changes:

uv run nlm-pack refresh example-docs \
  --from-markdown-dir ./docs-markdown \
  --versioned \
  --quiet

Snapshot layout:

~/.notebooklm-source-packs/example-docs/
  snapshots/
    YYYY-MM-DD/
      pages/
      bundles/
        example-docs-docs-YYYY-MM-DD-bundle-001.md
  history.jsonl

If more than one changed snapshot is created on the same day, suffixes such as YYYY-MM-DD-01 are used to avoid manifest collisions.
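
A minimal sketch of the same-day suffix rule follows. The helper name next_snapshot_label is hypothetical, and the assumption that suffixes continue as -02, -03, and so on is extrapolated from the single -01 example above.

```python
from datetime import date


def next_snapshot_label(today: date, existing_labels: set[str]) -> str:
    """Pick a snapshot label that does not collide with existing ones."""
    base = today.isoformat()  # YYYY-MM-DD
    if base not in existing_labels:
        return base
    # Assumed numbering: first collision gets -01, the next -02, and so on.
    counter = 1
    while f"{base}-{counter:02d}" in existing_labels:
        counter += 1
    return f"{base}-{counter:02d}"


print(next_snapshot_label(date(2025, 1, 6), set()))           # 2025-01-06
print(next_snapshot_label(date(2025, 1, 6), {"2025-01-06"}))  # 2025-01-06-01
```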

URL discovery

uv run nlm-pack discover https://docs.example.com

Current discovery sources:

  1. llms-full.txt
  2. llms.txt
  3. sitemap.xml
  4. root-page sidebar/navigation links
  5. bounded same-site HTML links

The broader crawler and site-specific extraction strategies are still evolving. For reliable production ingestion today, pre-clean docs into a Markdown directory and use sync --from-markdown-dir / refresh --from-markdown-dir.
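
To make the first three discovery sources concrete, the sketch below builds the candidate URLs for a docs root. It assumes the well-known files are resolved relative to the root URL and tried in the listed order; neither detail is confirmed by this README, and discovery_candidates is a hypothetical name.

```python
from urllib.parse import urljoin

# File-based discovery sources, in the order listed above.
DISCOVERY_FILES = ("llms-full.txt", "llms.txt", "sitemap.xml")


def discovery_candidates(root_url: str) -> list[str]:
    """Return the well-known file URLs to probe for a docs root."""
    # A trailing slash makes urljoin append instead of replacing the last segment.
    base = root_url if root_url.endswith("/") else root_url + "/"
    return [urljoin(base, name) for name in DISCOVERY_FILES]


for candidate in discovery_candidates("https://docs.example.com"):
    print(candidate)
```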

Safety and publishing checklist

Before publishing or pushing a pack builder workspace, double-check that the repository does not contain:

  • Google/NotebookLM cookies (SID, HSID, SSID, etc.)
  • browser profiles or storage_state.json
  • .env files
  • API keys or tokens
  • generated private source packs from ~/.notebooklm-source-packs/
  • local cron scripts from ~/.hermes/scripts/
  • Python caches or virtual environments

This repository should contain source code, tests, examples, and documentation only.

Security

This tool handles documentation content and NotebookLM upload metadata, but it must not store or publish Google cookies, storage_state.json, API keys, generated private source packs, or local Hermes cron scripts. Local/private root URLs are rejected by default to reduce SSRF risk.
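
The root-URL check can be illustrated with the Python standard library. This is a simplified sketch, not the tool's actual guard: is_blocked_root_url is a hypothetical name, and the function only inspects literal hostnames and IP addresses. A fuller check would also resolve DNS names and test the resulting addresses.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_root_url(url: str) -> bool:
    """Return True for localhost, loopback, private, or link-local hosts."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # nothing to validate, so refuse
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name; this sketch does not resolve it.
        return False
    return address.is_private or address.is_loopback or address.is_link_local


print(is_blocked_root_url("http://127.0.0.1:8000/docs"))  # True
print(is_blocked_root_url("https://docs.example.com"))    # False
```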

NotebookLM upload depends on the external notebooklm CLI and the user's Google session. Keep authentication material outside this repository and verify with notebooklm auth check --test before deploy or refresh operations. For private security reports, use the GitHub Security Advisory form at https://github.com/yelixir-dev/notebooklm-source-pack-builder/security/advisories/new or email yelixir.dev@gmail.com.

Contributing

Contributions are welcome when they keep the local-first, source-safe workflow intact. Before opening a change, run:

uv run pytest -q
uv run ruff check .
uv run ruff format --check .

For crawler, deploy, or refresh changes, include tests that prove manifests preserve notebook_id, source_id, ready statuses, and historical snapshot entries.
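
A preservation test could look roughly like the sketch below. The manifest shape (a sources list keyed by bundle, a snapshots list) and the merge_manifest helper are hypothetical stand-ins used only to show the kind of assertion that is useful; adapt them to the project's real manifest code.

```python
# Hypothetical manifest shape and merge helper, for illustration only.
def merge_manifest(previous: dict, regenerated: dict) -> dict:
    """Carry upload state from the previous manifest into a regenerated one."""
    merged = dict(regenerated)
    merged["notebook_id"] = previous.get("notebook_id")
    old_by_bundle = {s["bundle"]: s for s in previous.get("sources", [])}
    sources = []
    for source in regenerated.get("sources", []):
        old = old_by_bundle.get(source["bundle"], {})
        # Keep upload state; take everything else from the regenerated entry.
        kept = {key: old[key] for key in ("source_id", "status") if key in old}
        sources.append({**source, **kept})
    merged["sources"] = sources
    merged["snapshots"] = list(previous.get("snapshots", []))
    return merged


def test_sync_preserves_upload_state() -> None:
    previous = {
        "notebook_id": "nb-123",
        "sources": [
            {"bundle": "bundle-001.md", "source_id": "src-1", "status": "ready", "hash": "old"}
        ],
        "snapshots": [{"label": "2025-01-06"}],
    }
    regenerated = {"sources": [{"bundle": "bundle-001.md", "hash": "new"}]}
    merged = merge_manifest(previous, regenerated)
    assert merged["notebook_id"] == "nb-123"
    assert merged["sources"][0]["source_id"] == "src-1"
    assert merged["sources"][0]["status"] == "ready"
    assert merged["sources"][0]["hash"] == "new"
    assert merged["snapshots"] == [{"label": "2025-01-06"}]
```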

Development

uv sync
uv run pytest -q
uv run ruff check .
uv run ruff format .

Current limitations

  • The most stable ingestion path is Markdown-directory based.
  • NotebookLM upload support depends on the external notebooklm CLI.
  • Refresh cron installation is Hermes-specific.
  • Automatic archive-notebook rotation for long-running histories is not implemented yet.
  • Detailed semantic diff summaries are not implemented yet.

License

MIT
