
feat(xiaohongshu): paginate creator-notes past the 10-row /analyze/list cap#1736

Open
Benjamin-eecs wants to merge 2 commits into
jackwener:mainfrom
Benjamin-eecs:feat/xhs-creator-notes-pagination

Conversation

@Benjamin-eecs
Contributor

Description

The /api/galaxy/creator/datacenter/note/analyze/list endpoint serves 10 note rows per page, and the previous fetchCreatorNotesByApi only ever requested page 1, because its in-page direct fetch() bypassed xhs's signing interceptor and got HTTP 406 on subsequent pages. As a result, opencli xiaohongshu creator-notes --limit 25 silently capped at 10 rows, even for accounts with hundreds of notes.

Install the same window.__xhsCapture fetch + XHR hook used by creator-note-detail (#1732), SPA-navigate to /statistics/data-analysis so the dashboard fires its own signed page_num=1 request under the hook, then click .d-pagination-page buttons for pages 2..N to make the dashboard's React router fire successive signed requests. Dedupe by note.id and return up to --limit.
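
As a rough illustration of the "dedupe by note.id and return up to --limit" step, here is a minimal sketch. The helper name `mergeNotePages` and the shape of its input (an array of pages, each an array of objects with an `id` field) are assumptions made for this example, not the PR's actual code.

```javascript
// Hypothetical helper (not from the PR): merge captured pages in order,
// keep the first occurrence of each note id, and stop once `limit` is reached.
function mergeNotePages(pages, limit) {
  if (limit <= 0) return [];
  const seen = new Set();
  const merged = [];
  for (const page of pages) {
    for (const note of page) {
      // Skip rows without an id and rows already collected from an earlier page.
      if (!note || !note.id || seen.has(note.id)) continue;
      seen.add(note.id);
      merged.push(note);
      if (merged.length >= limit) return merged;
    }
  }
  return merged;
}
```

Deduping matters here because the capture map can hold the same page more than once, for example if the dashboard re-fires a request while the hook is installed.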

The pagination buttons render the page number duplicated in textContent ("22" for page 2 because of an inner accessibility span + visible span), so the click selector tolerates both the raw digit and the doubled form. CAPTURE_POLL_ATTEMPTS / CAPTURE_POLL_INTERVAL_S match the constant naming used by sibling delete-note.js.
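
The matching rule described above can be isolated as a small pure function. This is a sketch only; `matchesPageLabel` is a name invented for the example, and it mirrors the raw-or-doubled comparison used in the click selector.

```javascript
// Hypothetical helper (not from the PR): true when a button's textContent
// is either the raw page number ("2") or the doubled render ("22").
function matchesPageLabel(text, pageNum) {
  const target = String(pageNum);
  const trimmed = (text || '').trim();
  return trimmed === target || trimmed === target + target;
}

console.log(matchesPageLabel('22', 2));   // doubled render of page 2
console.log(matchesPageLabel(' 2 ', 2));  // raw render of page 2
console.log(matchesPageLabel('11', 1));   // doubled render of page 1
console.log(matchesPageLabel('11', 11));  // raw render of page 11
```

One thing this makes visible: the string "11" satisfies the rule for both page 1 (doubled) and page 11 (raw). If the dashboard always renders the doubled form, a request for page 11 could match the page 1 button first, so it may be worth checking the behaviour on accounts with more than 100 notes.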

Fresh-published notes return title: "" from /note/analyze/list (xhs's own data-analysis UI labels them 无标题笔记). To preserve the title coverage that the existing DOM-fallback path delivered, the helper enriches missing titles from the /new/note-manager card DOM — that view derives a title from the note's first line of content, which is what users actually see in their note-manager UI. Selectors are structural (scrollable-ancestor walk, indexed reads) rather than Chinese-text matches.
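
The enrichment step amounts to a keyed fill-in: only rows whose API title is empty are touched, and only when the note-manager map has an entry for that id. A minimal sketch follows; `enrichTitles` is a hypothetical name, and unlike the loop in the PR (which mutates the notes in place) this version returns copies.

```javascript
// Hypothetical helper (not from the PR): fill empty titles from a Map of
// note id -> title scraped elsewhere. Rows that already have a title, or
// whose id is missing from the map, are returned unchanged.
function enrichTitles(notes, titleMap) {
  return notes.map((note) => {
    if (!note.title && note.id && titleMap.has(note.id)) {
      return { ...note, title: titleMap.get(note.id) };
    }
    return note;
  });
}
```

Keeping the API title whenever it is non-empty means the DOM scrape never overrides data the endpoint already provides.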

Related issue: Closes #1729. Reporter diagnosis: @ppop123 traced the signing-bypass + pagination click pattern and verified it on accounts with 86 to 148 notes.

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 🌐 New site adapter
  • 📝 Documentation
  • ♻️ Refactor
  • 🔧 CI / build / tooling

Checklist

  • I ran the checks relevant to this PR
  • I updated tests or docs if needed
  • I included output or screenshots when useful

Documentation (if adding/modifying an adapter)

  • Added doc page under docs/adapters/ (if new adapter)
  • Updated docs/adapters/index.md table (if new adapter)
  • Updated sidebar in docs/.vitepress/config.mts (if new adapter)
  • Updated README.md / README.zh-CN.md when command discoverability changed
  • Used positional args for the command's primary subject unless a named flag is clearly better
  • Normalized expected adapter failures to CliError subclasses instead of raw Error

Screenshots / Output

Before (capped at 10 even on accounts with more notes):

$ opencli xiaohongshu creator-notes --limit 15
(returns 10 rows, no rank 11+)

After (live-verified on benjamin-eecs's account, 11 published notes, data-permission active):

$ opencli xiaohongshu creator-notes --limit 15
- rank: 1   id: 6a11ae4d...   title: 'OpenCLI #1729 verify test 11 ...'
- rank: 2   id: 6a11a752...   title: 'OpenCLI #1729 verify test 10 ...'
- rank: 3   id: 6a11a68f...   title: 'OpenCLI #1729 verify test 9 ...'
...
- rank: 10  id: 6a11a077...   title: 'OpenCLI #1729 verify test 1 ...'
- rank: 11  id: 6a1195a8...   title: 'OpenCLI #1728 fix verification ...'

All 11 rows returned with titles populated. Rank 11 is the oldest note (past the first hydration batch of note-manager) and required the inner-list scroll-load to surface its title; an earlier, simpler implementation that did not scroll returned an empty title for it.
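
The scroll-load mentioned above relies on finding the first scrollable ancestor of a note card. The walk can be sketched as a pure function so it is testable outside a browser. `findScrollableAncestor` and its injected `getStyle` parameter are assumptions for this example; in the page itself the style lookup would be `window.getComputedStyle`.

```javascript
// Hypothetical helper (not from the PR): walk up from `start` and return the
// first ancestor that scrolls vertically and has overflowing content.
// `getStyle` is injected so the walk can be exercised with plain objects.
function findScrollableAncestor(start, getStyle, slack = 10) {
  let el = start && start.parentElement;
  while (el) {
    const overflowY = getStyle(el).overflowY;
    const scrolls = overflowY === 'auto' || overflowY === 'scroll';
    // Require some overflow beyond `slack` px so near-fitting containers are skipped.
    if (scrolls && el.scrollHeight > el.clientHeight + slack) return el;
    el = el.parentElement;
  }
  return null;
}
```

In a browser context this could be called as `findScrollableAncestor(card, (el) => window.getComputedStyle(el))`, after which setting `scrollTop = scrollHeight` on the result triggers the load-more.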

…st cap

The /api/galaxy/creator/datacenter/note/analyze/list endpoint serves 10
note rows per page, and the previous fetchCreatorNotesByApi only ever
requested page 1 because the in-page direct fetch() bypassed xhs's
signing interceptor and returned HTTP 406 for subsequent pages. As a
result `opencli xiaohongshu creator-notes --limit 25` silently capped
at 10 even for accounts with hundreds of notes.

Install the same window.__xhsCapture fetch + XHR hook used by
creator-note-detail (jackwener#1732), SPA-navigate to /statistics/data-analysis
so the dashboard fires its own signed page_num=1 request under the
hook, then click .d-pagination-page buttons for pages 2..N to make the
dashboard's React router fire successive signed requests. Dedupe by
note.id and return up to --limit.

Pagination buttons render the page number duplicated in textContent
("22" for page 2 because of an inner accessibility span + visible
span), so the click selector tolerates both the raw digit and the
doubled form. CAPTURE_POLL_ATTEMPTS / CAPTURE_POLL_INTERVAL_S match
the constant naming used by sibling delete-note.js.

Fresh notes whose title field is still empty in the API response get
enriched from the note-manager card DOM (which derives a title from the
content's first line), so the pre-existing title coverage is preserved
for the rows the API surfaces empty.

Live-verified on benjamin-eecs's 圣诞薯 account (11 published notes,
data-analysis permission active): creator-notes --limit 15 now returns
all 11 rows, with 10 titles enriched via note-manager and 1 left empty
because that note is older than note-manager's first 10 visible cards.
For real-world use (e.g. @ppop123's reported 148-note account), all
titles populate directly from the API.

Closes jackwener#1729.

The /note/analyze/list endpoint returns title: "" for notes whose title
field xhs has not yet populated (the dashboard's own data-analysis table
labels these "无标题笔记"). Scrape the /new/note-manager card DOM as a
secondary source — its cards derive a title from the note's first line
of content, which is what users actually see in their note-manager UI.

The card list lazy-loads past the first hydration batch, so the helper
polls for renders and scrolls the first scrollable ancestor of a card
to its bottom to trigger load-more. Tab selection is left at the
default (全部笔记 / all-notes view) which already covers every state;
the previous tab-click attempt was unnecessary.

Selectors are structural (".d-tabs-headers > div:first-child" style
indexed reads, scrollable-ancestor walk) rather than Chinese-text
matches, to avoid the brittleness of localized UI strings.

Live re-verified on benjamin-eecs's 圣诞薯 account: creator-notes
--limit 15 now returns 11 rows with all 11 titles populated, including
the older 1728 test post that lives past the first hydration batch.
@Benjamin-eecs Benjamin-eecs marked this pull request as ready for review May 24, 2026 07:27
Copilot AI review requested due to automatic review settings May 24, 2026 07:27

Copilot AI left a comment


Pull request overview

This PR enhances the xiaohongshu/creator-notes adapter to return more than the first 10 notes by leveraging the creator dashboard’s own signed requests (captured via a fetch/XHR hook) and driving pagination through the UI, with an additional fallback to enrich missing titles from the note-manager DOM.

Changes:

  • Add a window.__xhsCapture fetch + XHR hook and polling/harvesting logic to capture signed /api/galaxy/.../note/analyze/list responses.
  • Paginate the data-analysis table by clicking .d-pagination-page buttons to collect pages 2..N (deduping by note.id) up to --limit.
  • Enrich empty title fields by scraping titles from /new/note-manager (including scroll-triggered lazy-load).
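
For reference, the number of pages the adapter needs to visit follows from the total reported by the endpoint and the requested limit. The sketch below reproduces that arithmetic in isolation; `pagesNeeded` is a name chosen for the example, and the page size of 10 is taken from the PR description rather than from the constant's definition.

```javascript
// Page size of 10 is assumed from the PR description ("10 note rows per page").
const PAGE_SIZE = 10;

// Hypothetical helper (not from the PR): how many pages must be visited to
// satisfy `limit`, capped by how many pages the account actually has.
function pagesNeeded(total, limit, pageSize = PAGE_SIZE) {
  const totalPages = total > 0 ? Math.ceil(total / pageSize) : 1;
  return Math.min(totalPages, Math.ceil(limit / pageSize));
}

console.log(pagesNeeded(148, 25)); // 3
console.log(pagesNeeded(11, 15));  // 2
```

With the 11-note test account and `--limit 15`, two pages are enough; a 148-note account with `--limit 25` needs three.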


// since a direct fetch() from page.evaluate bypasses the x-s signing and gets 406.
async function installXhsFetchCaptureHook(page) {
    await page.evaluate(`(() => {
        window.__xhsCapture = {};
Comment on lines +118 to +126
        const origFetch = window.fetch;
        window.fetch = async function(...args) {
            const resp = await origFetch.apply(this, args);
            try {
                const url = typeof args[0] === 'string' ? args[0] : (args[0] && args[0].url) || '';
                if (url.includes('/api/galaxy/')) {
                    resp.clone().text().then((body) => {
                        try { window.__xhsCapture[url] = { status: resp.status, ok: resp.ok, body }; } catch (_) {}
                    }).catch(() => {});
Comment on lines +256 to +271
    for (let pageNum = 2; pageNum <= neededPages && items.length < limit; pageNum++) {
        const clicked = await page.evaluate(`(() => {
            const target = String(${pageNum});
            // .d-pagination-page renders the page number doubled (a visible span +
            // an accessibility span), so textContent for page 2 reads "22". Match
            // both the raw digit and the doubled form to tolerate either render.
            const btns = Array.from(document.querySelectorAll('.d-pagination-page'));
            const match = btns.find((btn) => {
                const text = (btn.textContent || '').trim();
                return text === target || text === target + target;
            });
            if (match) { match.click(); return true; }
            return false;
        })()`);
        if (!clicked) break;
        const before = items.length;

    const notes = mapAnalyzeItems(items).slice(0, limit);
    const missingTitles = notes.filter((note) => !note.title).length;
    if (missingTitles > 0) {
        const titleMap = await fetchNoteManagerTitleMap(page, notes.length);
Comment on lines +211 to +217
    return page.evaluate(`(() => {
        const firstCard = document.querySelector('div.note[data-impression]');
        let el = firstCard && firstCard.parentElement;
        while (el) {
            const s = window.getComputedStyle(el);
            if ((s.overflowY === 'auto' || s.overflowY === 'scroll') && el.scrollHeight > el.clientHeight + 10) {
                el.scrollTop = el.scrollHeight;
Comment on lines +242 to +295
async function fetchCreatorNotesByCapture(page, limit) {
    // Land on dashboard root before installing the hook so the data-analysis
    // SPA navigation fires page_num=1's signed request UNDER the hook.
    await page.goto('https://creator.xiaohongshu.com/statistics');
    await installXhsFetchCaptureHook(page);
    await page.evaluate(`(() => {
        history.pushState({}, '', '/statistics/data-analysis?source=official');
        window.dispatchEvent(new PopStateEvent('popstate'));
    })()`);
    let captureMap = await pollCaptureMap(page);
    let { items, total } = harvestAnalyzeListCaptures(captureMap);
    if (items.length === 0) return [];
    const totalPages = total > 0 ? Math.ceil(total / NOTE_ANALYZE_PAGE_SIZE) : 1;
    const neededPages = Math.min(totalPages, Math.ceil(limit / NOTE_ANALYZE_PAGE_SIZE));
    for (let pageNum = 2; pageNum <= neededPages && items.length < limit; pageNum++) {
        const clicked = await page.evaluate(`(() => {
            const target = String(${pageNum});
            // .d-pagination-page renders the page number doubled (a visible span +
            // an accessibility span), so textContent for page 2 reads "22". Match
            // both the raw digit and the doubled form to tolerate either render.
            const btns = Array.from(document.querySelectorAll('.d-pagination-page'));
            const match = btns.find((btn) => {
                const text = (btn.textContent || '').trim();
                return text === target || text === target + target;
            });
            if (match) { match.click(); return true; }
            return false;
        })()`);
        if (!clicked) break;
        const before = items.length;
        for (let attempt = 0; attempt < CAPTURE_POLL_ATTEMPTS; attempt++) {
            await page.wait(CAPTURE_POLL_INTERVAL_S);
            const raw = await page.evaluate('JSON.stringify(window.__xhsCapture || {})');
            captureMap = typeof raw === 'string' ? JSON.parse(raw) : {};
            const harvested = harvestAnalyzeListCaptures(captureMap);
            if (harvested.items.length > before) {
                items = harvested.items;
                total = Math.max(total, harvested.total);
                break;
            }
        }
    }
    const notes = mapAnalyzeItems(items).slice(0, limit);
    const missingTitles = notes.filter((note) => !note.title).length;
    if (missingTitles > 0) {
        const titleMap = await fetchNoteManagerTitleMap(page, notes.length);
        for (const note of notes) {
            if (!note.title && note.id && titleMap.has(note.id)) {
                note.title = titleMap.get(note.id);
            }
        }
    }
    return notes;
}
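
The excerpt above calls `pollCaptureMap`, whose body is not shown on this page. Purely as an illustration of the polling pattern the inner loop already uses, a generic version might look like the sketch below. The name `pollCaptureUntil`, its parameters, and the `isReady` predicate are assumptions; only `page.wait`, `page.evaluate`, and the `window.__xhsCapture` read are taken from the code above.

```javascript
// Hypothetical helper (not from the PR): re-read window.__xhsCapture every
// `intervalSeconds` until `isReady(captureMap)` holds or attempts run out.
// Returns the last capture map read, which may be empty on timeout.
async function pollCaptureUntil(page, isReady, attempts, intervalSeconds) {
  let captureMap = {};
  for (let attempt = 0; attempt < attempts; attempt++) {
    await page.wait(intervalSeconds);
    // Serialize inside the page so only a string crosses the evaluate boundary.
    const raw = await page.evaluate('JSON.stringify(window.__xhsCapture || {})');
    captureMap = typeof raw === 'string' ? JSON.parse(raw) : {};
    if (isReady(captureMap)) break;
  }
  return captureMap;
}
```

A caller could pass a predicate such as "the harvested item count grew past the previous count", which is the condition the per-page loop checks inline.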

Development

Successfully merging this pull request may close these issues.

[Feature] xiaohongshu/creator-notes-all — paginate the full published-notes list (today caps at 10)

2 participants