feat(xiaohongshu): paginate creator-notes past the 10-row /analyze/list cap #1736
Open
Benjamin-eecs wants to merge 2 commits into
…st cap

The /api/galaxy/creator/datacenter/note/analyze/list endpoint serves 10 note rows per page, and the previous fetchCreatorNotesByApi only ever requested page 1 because the in-page direct fetch() bypassed xhs's signing interceptor and returned HTTP 406 for subsequent pages. As a result `opencli xiaohongshu creator-notes --limit 25` silently capped at 10 even for accounts with hundreds of notes.

Install the same window.__xhsCapture fetch + XHR hook used by creator-note-detail (jackwener#1732), SPA-navigate to /statistics/data-analysis so the dashboard fires its own signed page_num=1 request under the hook, then click .d-pagination-page buttons for pages 2..N to make the dashboard's React router fire successive signed requests. Dedupe by note.id and return up to --limit.

Pagination buttons render the page number duplicated in textContent ("22" for page 2 because of an inner accessibility span + visible span), so the click selector tolerates both the raw digit and the doubled form. CAPTURE_POLL_ATTEMPTS / CAPTURE_POLL_INTERVAL_S match the constant naming used by sibling delete-note.js.

Fresh notes whose title field is still empty in the API response get enriched from the note-manager card DOM (which derives a title from the content's first line), so the pre-existing title coverage is preserved for the rows the API surfaces empty.

Live-verified on benjamin-eecs's 圣诞薯 account (11 published notes, data-analysis permission active): creator-notes --limit 15 now returns all 11 rows, with 10 titles enriched via note-manager and 1 left empty because that note is older than note-manager's first 10 visible cards. For real-world use (e.g. @ppop123's reported 148-note account), all titles populate directly from the API.

Closes jackwener#1729.
The /note/analyze/list endpoint returns title: "" for notes whose title
field xhs has not yet populated (the dashboard's own data-analysis table
labels these "无标题笔记", i.e. "untitled note"). Scrape the
/new/note-manager card DOM as a secondary source: its cards derive a
title from the note's first line of content, which is what users
actually see in their note-manager UI.
The card list lazy-loads past the first hydration batch, so the helper
polls for renders and scrolls the first scrollable ancestor of a card
to its bottom to trigger load-more. Tab selection is left at the
default (全部笔记 / all-notes view) which already covers every state;
the previous tab-click attempt was unnecessary.
Selectors are structural (".d-tabs-headers > div:first-child" style
indexed reads, scrollable-ancestor walk) rather than Chinese-text
matches, to avoid the brittleness of localized UI strings.
Live re-verified on benjamin-eecs's 圣诞薯 account: creator-notes
--limit 15 now returns 11 rows with all 11 titles populated, including
the older 1728 test post that lives past the first hydration batch.
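
The scrollable-ancestor walk described in this commit message can be illustrated in isolation. The following is a minimal sketch, not the PR's helper: `findScrollableAncestor`, `scrollToBottom`, and the injected `getStyle` argument are hypothetical names introduced here so the walk can run without a browser (in a page, `getStyle` would be `window.getComputedStyle`). The 10px slack mirrors the `clientHeight + 10` check in the diff.

```javascript
// Hypothetical helper: walk up from a card to the first ancestor that scrolls.
// `getStyle` is injected (assumption) so plain objects can stand in for DOM nodes.
function findScrollableAncestor(startEl, getStyle, slackPx = 10) {
  let el = startEl && startEl.parentElement;
  while (el) {
    const overflowY = getStyle(el).overflowY;
    const scrolls = overflowY === 'auto' || overflowY === 'scroll';
    // Require real overflow, not just a scrollable style on a short container.
    if (scrolls && el.scrollHeight > el.clientHeight + slackPx) return el;
    el = el.parentElement;
  }
  return null;
}

// Hypothetical helper: jump the container to its bottom to trigger load-more.
function scrollToBottom(el) {
  if (!el) return false;
  el.scrollTop = el.scrollHeight;
  return true;
}

// Usage with plain objects standing in for DOM nodes.
const root = { parentElement: null, overflowY: 'visible', scrollHeight: 900, clientHeight: 900 };
const list = { parentElement: root, overflowY: 'auto', scrollHeight: 2400, clientHeight: 600, scrollTop: 0 };
const wrapper = { parentElement: list, overflowY: 'visible', scrollHeight: 2400, clientHeight: 2400 };
const card = { parentElement: wrapper };
const fakeStyle = (el) => ({ overflowY: el.overflowY });

const container = findScrollableAncestor(card, fakeStyle);
scrollToBottom(container);
console.log(container === list, list.scrollTop); // true 2400
```

Walking by computed style rather than by class name or label text is what keeps the lookup independent of localized UI strings, at the cost of depending on the page's layout.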
Pull request overview
This PR enhances the xiaohongshu/creator-notes adapter to return more than the first 10 notes by leveraging the creator dashboard’s own signed requests (captured via a fetch/XHR hook) and driving pagination through the UI, with an additional fallback to enrich missing titles from the note-manager DOM.
Changes:
- Add a `window.__xhsCapture` fetch + XHR hook and polling/harvesting logic to capture signed `/api/galaxy/.../note/analyze/list` responses.
- Paginate the data-analysis table by clicking `.d-pagination-page` buttons to collect pages 2..N (deduping by `note.id`) up to `--limit`.
- Enrich empty `title` fields by scraping titles from `/new/note-manager` (including scroll-triggered lazy-load).
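
To make the harvest-and-dedupe step concrete, here is a rough sketch of how captured responses keyed by URL might be folded into one list. It is not the PR's `harvestAnalyzeListCaptures`, whose body is not shown in this conversation: `harvestCaptures` is a hypothetical name, and the response shape (`data.note_infos`, `data.total`) is an assumption made for illustration only. The capture entry shape (`status`, `ok`, `body`) follows the hook shown in the diff.

```javascript
const LIST_PATH = '/api/galaxy/creator/datacenter/note/analyze/list';

// Hypothetical sketch: merge every captured list response, deduping by note id.
// ASSUMPTION: each body parses to { data: { note_infos: [...], total: N } }.
function harvestCaptures(captureMap) {
  const byId = new Map();
  let total = 0;
  for (const [url, entry] of Object.entries(captureMap)) {
    if (!url.includes(LIST_PATH) || !entry || !entry.ok) continue;
    let payload;
    try {
      payload = JSON.parse(entry.body);
    } catch (_) {
      continue; // skip truncated or non-JSON bodies
    }
    const data = (payload && payload.data) || {};
    const rows = Array.isArray(data.note_infos) ? data.note_infos : [];
    total = Math.max(total, Number(data.total) || 0);
    for (const row of rows) {
      // First occurrence wins, so a re-fired page cannot duplicate rows.
      if (row && row.id && !byId.has(row.id)) byId.set(row.id, row);
    }
  }
  return { items: Array.from(byId.values()), total };
}

// Usage: two captured pages that overlap on note "b", plus an unrelated request.
const sampleCaptures = {
  [LIST_PATH + '?page_num=1']: {
    status: 200, ok: true,
    body: JSON.stringify({ data: { total: 3, note_infos: [{ id: 'a' }, { id: 'b' }] } }),
  },
  [LIST_PATH + '?page_num=2']: {
    status: 200, ok: true,
    body: JSON.stringify({ data: { total: 3, note_infos: [{ id: 'b' }, { id: 'c' }] } }),
  },
  '/api/galaxy/other': { status: 200, ok: true, body: '{}' },
};
const harvested = harvestCaptures(sampleCaptures);
console.log(harvested.items.map((n) => n.id), harvested.total); // [ 'a', 'b', 'c' ] 3
```

Keying the merge on the note id rather than on the URL means the result stays stable even if the dashboard re-requests a page that was already captured.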
    // since a direct fetch() from page.evaluate bypasses the x-s signing and gets 406.
    async function installXhsFetchCaptureHook(page) {
      await page.evaluate(`(() => {
        window.__xhsCapture = {};

Comment on lines +118 to +126

    const origFetch = window.fetch;
    window.fetch = async function(...args) {
      const resp = await origFetch.apply(this, args);
      try {
        const url = typeof args[0] === 'string' ? args[0] : (args[0] && args[0].url) || '';
        if (url.includes('/api/galaxy/')) {
          resp.clone().text().then((body) => {
            try { window.__xhsCapture[url] = { status: resp.status, ok: resp.ok, body }; } catch (_) {}
          }).catch(() => {});
Comment on lines +256 to +271

    for (let pageNum = 2; pageNum <= neededPages && items.length < limit; pageNum++) {
      const clicked = await page.evaluate(`(() => {
        const target = String(${pageNum});
        // .d-pagination-page renders the page number doubled (a visible span +
        // an accessibility span), so textContent for page 2 reads "22". Match
        // both the raw digit and the doubled form to tolerate either render.
        const btns = Array.from(document.querySelectorAll('.d-pagination-page'));
        const match = btns.find((btn) => {
          const text = (btn.textContent || '').trim();
          return text === target || text === target + target;
        });
        if (match) { match.click(); return true; }
        return false;
      })()`);
      if (!clicked) break;
      const before = items.length;

    const notes = mapAnalyzeItems(items).slice(0, limit);
    const missingTitles = notes.filter((note) => !note.title).length;
    if (missingTitles > 0) {
      const titleMap = await fetchNoteManagerTitleMap(page, notes.length);
Comment on lines +211 to +217

    return page.evaluate(`(() => {
      const firstCard = document.querySelector('div.note[data-impression]');
      let el = firstCard && firstCard.parentElement;
      while (el) {
        const s = window.getComputedStyle(el);
        if ((s.overflowY === 'auto' || s.overflowY === 'scroll') && el.scrollHeight > el.clientHeight + 10) {
          el.scrollTop = el.scrollHeight;
Comment on lines +242 to +295

    async function fetchCreatorNotesByCapture(page, limit) {
      // Land on dashboard root before installing the hook so the data-analysis
      // SPA navigation fires page_num=1's signed request UNDER the hook.
      await page.goto('https://creator.xiaohongshu.com/statistics');
      await installXhsFetchCaptureHook(page);
      await page.evaluate(`(() => {
        history.pushState({}, '', '/statistics/data-analysis?source=official');
        window.dispatchEvent(new PopStateEvent('popstate'));
      })()`);
      let captureMap = await pollCaptureMap(page);
      let { items, total } = harvestAnalyzeListCaptures(captureMap);
      if (items.length === 0) return [];
      const totalPages = total > 0 ? Math.ceil(total / NOTE_ANALYZE_PAGE_SIZE) : 1;
      const neededPages = Math.min(totalPages, Math.ceil(limit / NOTE_ANALYZE_PAGE_SIZE));
      for (let pageNum = 2; pageNum <= neededPages && items.length < limit; pageNum++) {
        const clicked = await page.evaluate(`(() => {
          const target = String(${pageNum});
          // .d-pagination-page renders the page number doubled (a visible span +
          // an accessibility span), so textContent for page 2 reads "22". Match
          // both the raw digit and the doubled form to tolerate either render.
          const btns = Array.from(document.querySelectorAll('.d-pagination-page'));
          const match = btns.find((btn) => {
            const text = (btn.textContent || '').trim();
            return text === target || text === target + target;
          });
          if (match) { match.click(); return true; }
          return false;
        })()`);
        if (!clicked) break;
        const before = items.length;
        for (let attempt = 0; attempt < CAPTURE_POLL_ATTEMPTS; attempt++) {
          await page.wait(CAPTURE_POLL_INTERVAL_S);
          const raw = await page.evaluate('JSON.stringify(window.__xhsCapture || {})');
          captureMap = typeof raw === 'string' ? JSON.parse(raw) : {};
          const harvested = harvestAnalyzeListCaptures(captureMap);
          if (harvested.items.length > before) {
            items = harvested.items;
            total = Math.max(total, harvested.total);
            break;
          }
        }
      }
      const notes = mapAnalyzeItems(items).slice(0, limit);
      const missingTitles = notes.filter((note) => !note.title).length;
      if (missingTitles > 0) {
        const titleMap = await fetchNoteManagerTitleMap(page, notes.length);
        for (const note of notes) {
          if (!note.title && note.id && titleMap.has(note.id)) {
            note.title = titleMap.get(note.id);
          }
        }
      }
      return notes;
    }
Description
The `/api/galaxy/creator/datacenter/note/analyze/list` endpoint serves 10 note rows per page, and the previous `fetchCreatorNotesByApi` only ever requested page 1 because the in-page direct `fetch()` bypassed xhs's signing interceptor and got HTTP 406 on subsequent pages. As a result `opencli xiaohongshu creator-notes --limit 25` silently capped at 10 even for accounts with hundreds of notes.

Install the same `window.__xhsCapture` fetch + XHR hook used by `creator-note-detail` (#1732), SPA-navigate to `/statistics/data-analysis` so the dashboard fires its own signed `page_num=1` request under the hook, then click `.d-pagination-page` buttons for pages 2..N to make the dashboard's React router fire successive signed requests. Dedupe by `note.id` and return up to `--limit`.

The pagination buttons render the page number duplicated in `textContent` (`"22"` for page 2 because of an inner accessibility span + visible span), so the click selector tolerates both the raw digit and the doubled form. `CAPTURE_POLL_ATTEMPTS` / `CAPTURE_POLL_INTERVAL_S` match the constant naming used by sibling `delete-note.js`.

Fresh-published notes return `title: ""` from `/note/analyze/list` (xhs's own data-analysis UI labels them `无标题笔记`, "untitled note"). To preserve the title coverage that the existing DOM-fallback path delivered, the helper enriches missing titles from the `/new/note-manager` card DOM; that view derives a title from the note's first line of content, which is what users actually see in their note-manager UI. Selectors are structural (scrollable-ancestor walk, indexed reads) rather than Chinese-text matches.

Related issue: Closes #1729. Reporter diagnosis: @ppop123 traced the signing-bypass + pagination click pattern and verified on 86-148-note accounts.
Type of Change
Checklist
Documentation (if adding/modifying an adapter)
- `docs/adapters/` (if new adapter)
- `docs/adapters/index.md` table (if new adapter)
- `docs/.vitepress/config.mts` (if new adapter)
- `README.md` / `README.zh-CN.md` when command discoverability changed
- `CliError` subclasses instead of raw `Error`

Screenshots / Output
Before (capped at 10 even on accounts with more notes):
After (live-verified on benjamin-eecs's account, 11 published notes, data-permission active):
All 11 rows returned with titles populated. Rank 11 is the oldest note (past the first hydration batch of note-manager) and required the inner-list scroll-load to surface its title; the simpler implementation that didn't scroll returned its title empty.