Implement updates tracking OWID grapher by xrendan · Pull Request #6 · BuildCanada/bcds

xrendan · 2026-02-26T18:55:17Z

Context

Links to issues, Figma, Slack, and a technical introduction to the work.

Screenshots / Videos / Diagrams

Add if relevant, i.e. might not be necessary when there are no UI changes.

Testing guidance

Step-by-step instructions on how to test this change

Does the change work in the archive?
Does the staging experience have sign-off from product stakeholders?

Reminder to annotate the PR diff with design notes, alternatives you considered, and any other helpful context.

Checklist

(delete all that do not apply)

Before merging

Google Analytics events were adapted to fit the changes in this PR
Changes to CSS/HTML were checked on Desktop and Mobile Safari at all three breakpoints
Changes to HTML were checked for accessibility concerns

If DB migrations exist:

If columns have been added/deleted, all necessary views were recreated
The DB type definitions have been updated
The DB types in the ETL have been updated
If tables/views were added/removed, the Datasette export has been updated to take this into account
Update the documentation in db/docs

After merging

If a table was touched that is synced to R2, the sync script to update R2 has been run

Previously, for mixed-content cells (e.g. a paragraph + a list), only the list items were indexed, and all other visible text was silently discarded. Now each cell's HTML is converted to enriched blocks via htmlToEnrichedBlocks and processed through the same enrichedBlocksToIndexableText path as regular table cells.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Use a generic type constraint instead of a union of specific interfaces, adding OwidGdocProfileInterface support for search indexing.
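To illustrate the idea, here is a minimal sketch of swapping a union of interfaces for a generic constraint. The interfaces `GdocWithBody` and `ProfileLike` and the function `getIndexableTextSketch` are hypothetical stand-ins, not the real OWID types or the real `getPreprocessedIndexableText` signature.

```typescript
// Hypothetical, heavily simplified shapes; the real interfaces carry many more fields.
interface GdocWithBody {
    content: { body?: { type: string; text?: string }[] }
}

interface ProfileLike extends GdocWithBody {
    kind: "profile"
    entity: string
}

// With a union parameter (e.g. `gdoc: PostLike | DataInsightLike`) every new
// gdoc type must be added to the signature. A generic constraint accepts any
// type that has the required shape.
function getIndexableTextSketch<T extends GdocWithBody>(gdoc: T): string {
    return (gdoc.content.body ?? [])
        .map((block) => block.text ?? "")
        .filter((text) => text.length > 0)
        .join("\n\n")
}

const profile: ProfileLike = {
    kind: "profile",
    entity: "France",
    content: {
        body: [
            { type: "text", text: "Hello" },
            { type: "chart" }, // no text, so it is skipped
            { type: "text", text: "World" },
        ],
    },
}

const profileText = getIndexableTextSketch(profile)
```

A profile-like document passes the constraint without the function having to name its type.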

- Test "returns undefined for component blocks" now uses prominent-link (a true no-text block) instead of chart, which can have a caption
- Rename "skip component blocks" test to clarify it targets caption-less charts
- Add new test verifying chart captions are included in indexable text

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…r search (#6103)

## Summary

Replaces the indirect markdown-to-plaintext pipeline with a new `enrichedBlocksToIndexableText` module that converts enriched Gdoc blocks directly to plaintext for Algolia search indexing, with comprehensive test coverage.

## Rationale: why branch out of the markdown pipeline

The previous approach to generating search-indexable text repurposed the markdown pipeline: enriched blocks → markdown (via `enrichedToMarkdown`) → strip custom component tags → strip markdown formatting → regex cleanup. This worked but had two structural issues:

**The markdown round-trip is lossy and wasteful.** `spanToMarkdown` wraps formatting spans in markdown syntax (`**bold**`, `_italic_`, `[text](url)`) and `formatGdocMarkdown` immediately strips it back out, imprecisely, because `MarkdownTextWrap` couldn't handle all variants. The workaround was to remove all asterisks wholesale and use regex heuristics to strip footnote numbers (`word.1` → `word.`). These are fragile patches on a representation that discards the structural information (like `span-ref` for footnotes) that would have made clean extraction trivial.

**The markdown pipeline's inclusion decisions didn't match the search use case.** Some blocks that aren't meaningful narrative content (e.g. `prominent-link` titles/URLs, `research-and-writing` URL lists, `pill-row` navigation links) were included in the markdown and survived the stripping pipeline into search results. This could have been fixed in `enrichedBlocksToMarkdown` itself, but it highlights that search was inheriting inclusion decisions that didn't match its scope.

The new `enrichedBlocksToIndexableText` module sidesteps both issues by operating directly on the enriched block AST with an explicit, search-specific indexing policy:

- **No round-trip:** formatting spans are unwrapped to text content in one step; footnote refs (`span-ref`) are simply skipped, with no regex needed
- **Explicit policy per block type:** narrative content is indexed (text, headings, blockquotes, callouts, lists, tables, key insights, captions); navigational/promotional/UI blocks are explicitly excluded
- **Paragraph-aware chunking:** block boundaries are preserved as `\n\n` so `chunkParagraphs` can split semantically, then flattened per-chunk; the old pipeline collapsed all newlines before chunking
- **No dependency on `gdoc.markdown`:** reads directly from `gdoc.content.body`, decoupling search indexing from the markdown pipeline

Linked callout resolution carries over from the old pipeline: both resolve `span-callout` values identically.

## Test cases

Issues failing on production, fixed on staging (points the next staging server up the stack to get better testing tools with admin preview):

| Problem | Production | Staging |
| --- | --- | --- |
| Missing spaces between paragraphs with a chart in between | [link](https://ourworldindata.org/search?q=Complications+from+measles+are+most+severe&resultType=writing): snippet shows `infection.Complications` with no space between sentences | [link](http://staging-site-feat-add-plain-text-preview/search?q=Complications+from+measles+are+most+severe&resultType=writing): fixed: chart blocks return `undefined`, `joinBlocksAsParagraphs` inserts `\n\n` separators so output becomes `infection. Complications` |
| Missing spaces/delimiters around cells of raw HTML tables | [link](https://ourworldindata.org/search?q=Armed+conflicts%3A+interstate%2C+intrastate%2C+extrastate&resultType=writing): `toPlaintext()` strips HTML tags but adds no whitespace between cells, producing `UCDPArmed conflicts: interstate, intrastate, extrastate...` | [link](http://staging-site-feat-add-plain-text-preview/search?q=Armed+conflicts%3A+interstate%2C+intrastate%2C+extrastate&resultType=writing): fixed: cheerio parses HTML tables, cells joined with `\|`, list items with `; ` |
| Missing spaces around delimiters of regular tables | [link](https://ourworldindata.org/search?q=Estimate+of+the+effect+size&resultType=writing): snippet shows `\|Intervention\|Estimate of the effect size\| \|Handwashing with soap\|48% risk …` with no spaces around pipe delimiters | [link](http://staging-site-feat-add-plain-text-preview/search?q=Estimate+of+the+effect+size&resultType=writing): fixed: snippet shows `Intervention \| Estimate of the effect size \| Handwashing with soap \| 48% risk reduction` with proper spacing |
| Missing spaces around headers | [link](https://ourworldindata.org/search?q=emissions+changed+over+time+in+the+visualizations+above&resultType=writing): heading text runs into adjacent paragraph: `...visualizations above.How have emissions changed over time` | [link](http://staging-site-feat-add-plain-text-preview/search?q=emissions+changed+over+time+in+the+visualizations+above&resultType=writing): fixed: `joinBlocksAsParagraphs` inserts `\n\n` between all blocks including headings |
| Href of prominent links to non-gdoc URLs shown | [link](https://ourworldindata.org/search?q=childhood+stunting&resultType=writing): snippet shows `What is childhood stunting?https://ourworldindata.org/stunting-definitionExplore our page on …` with raw URL leaked into text | [link](http://staging-site-feat-add-plain-text-preview/search?q=childhood+stunting&resultType=writing): fixed: `prominent-link` blocks excluded entirely, no raw URLs in snippets |
| Endnotes not being filtered out | [link](https://ourworldindata.org/search?q=L%C3%BChrmann%2C+Anna%2C+Marcus+Tannnberg%2C+and+Staffan+Lindberg&resultType=writing): snippet shows endnote citation text: `Lührmann, Anna, Marcus Tannnberg, and Staffan Lindberg. 2018. Regimes of the World (RoW): Opening New Avenues …` | [link](http://staging-site-feat-add-plain-text-preview/search?q=L%C3%BChrmann%2C+Anna%2C+Marcus+Tannnberg%2C+and+Staffan+Lindberg&resultType=writing): fixed: no results returned, endnote content no longer indexed (`span-ref` returns `""`) |
| Footnote numbers not preceded by "." not excluded in body | [link](https://ourworldindata.org/search?q=and+distinguishes+between+two+types+of+democracies&resultType=writing): snippet shows stray footnote number: `(V-Dem) project2 and distinguishes between two types of democracies` | [link](http://staging-site-feat-add-plain-text-preview/search?q=and+distinguishes+between+two+types+of+democracies&resultType=writing): fixed: snippet shows `(V-Dem) project and distinguishes between two types of democracies`; stray `2` removed |
| Missing spaces around headers | [link](https://ourworldindata.org/search?q=How+effective+is+the+measles+vaccine%2C+and+is+it+safe%3F&resultType=writing): heading merges into paragraph: `...end of paragraph.How effective is the measles vaccine` | [link](http://staging-site-feat-add-plain-text-preview/search?q=How+effective+is+the+measles+vaccine%2C+and+is+it+safe%3F&resultType=writing): fixed: `joinBlocksAsParagraphs` adds `\n\n` separators and `.` terminators, ensuring spaces around all headings |

## Test plan

_A global before/after comparison would be too noisy to be useful. A more useful approach is to look at this from the perspective of what content should make it into the index, compare against the extraction rules and promote as a new baseline_

- [x] Run `yarn test run db/model/Gdoc/enrichedToIndexableText.test.ts`: all tests pass
- [x] Run `yarn typecheck`: no type errors
- [x] Verify search results on staging maintain proper formatting across paragraph/sentence boundaries
- [x] Verify all failure modes before/after
- [ ] Does the staging experience have sign-off from product stakeholders?

🤖 Generated with [Claude Code](https://claude.com/claude-code)
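The indexing policy described in this PR can be sketched roughly as follows. This is not the actual `enrichedBlocksToIndexableText` code: the `Span` and `Block` types are trimmed-down assumptions, and `joinAsParagraphsSketch` is a hypothetical helper that only inserts `\n\n` separators (it does not add the `.` terminators mentioned in the test cases).

```typescript
// Hypothetical, simplified span and block types (the real AST has many more variants).
type Span =
    | { spanType: "span-simple-text"; text: string }
    | { spanType: "span-bold"; children: Span[] }
    | { spanType: "span-link"; url: string; children: Span[] }
    | { spanType: "span-ref"; children: Span[] }

type Block =
    | { type: "text"; value: Span[] }
    | { type: "heading"; text: Span[] }
    | { type: "chart"; url: string; caption?: Span[] }
    | { type: "prominent-link"; url: string; title?: string }

// Unwrap formatting spans to their text content; footnote refs contribute nothing.
function spansToText(spans: Span[]): string {
    return spans
        .map((span) => {
            switch (span.spanType) {
                case "span-simple-text":
                    return span.text
                case "span-ref":
                    return ""
                default:
                    return spansToText(span.children)
            }
        })
        .join("")
}

// Explicit per-block policy: narrative blocks yield text, everything else yields undefined.
function blockToText(block: Block): string | undefined {
    switch (block.type) {
        case "text":
            return spansToText(block.value)
        case "heading":
            return spansToText(block.text)
        case "chart":
            return block.caption ? spansToText(block.caption) : undefined
        case "prominent-link":
            return undefined
    }
}

// Keep block boundaries as blank lines so a later chunking step can split on them.
function joinAsParagraphsSketch(blocks: Block[]): string {
    return blocks
        .map(blockToText)
        .filter((text): text is string => text !== undefined && text.trim() !== "")
        .join("\n\n")
}

const body: Block[] = [
    {
        type: "text",
        value: [
            { spanType: "span-simple-text", text: "The (V-Dem) project" },
            { spanType: "span-ref", children: [{ spanType: "span-simple-text", text: "2" }] },
            { spanType: "span-simple-text", text: " distinguishes two types." },
        ],
    },
    { type: "chart", url: "https://example.org/chart" },
    { type: "prominent-link", url: "https://example.org/page", title: "Explore" },
    { type: "heading", text: [{ spanType: "span-simple-text", text: "How effective is the vaccine?" }] },
    {
        type: "text",
        value: [
            { spanType: "span-bold", children: [{ spanType: "span-simple-text", text: "Very" }] },
            { spanType: "span-simple-text", text: " effective." },
        ],
    },
]

const indexable = joinAsParagraphsSketch(body)
```

In this sketch the footnote number, the caption-less chart, and the prominent link all drop out, while the heading stays separated from its neighbouring paragraphs.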

## Context

This PR adds a new "Plain text" preview mode to the Algolia index preview drawer in the admin site. This allows users to view the raw text content extracted from a Google Doc, which can be useful for debugging search indexing issues.

## Screenshots / Videos / Diagrams

![Screenshot 2026-02-13 at 10.07.09.png](https://app.graphite.com/user-attachments/assets/34ee1f60-1e5f-400e-891d-f8e11d3f1ea5.png)

## Testing guidance

1. Open a Google Doc in the admin site
2. Click on the "Index" button to open the index preview drawer
3. Toggle between "Algolia records" and "Plain text" modes
4. Verify that both modes display the expected content
5. Verify that the loading states work correctly for both modes

## Context

The BDD test "Search from homepage with country extraction" was flaky: it failed on 2 of 3 retries in [build #28275](https://buildkite.com/our-world-in-data/grapher-automated-staging-environment/builds/28275).

**Root cause**: Country extraction and URL sanitization happen in a React `useEffect` that runs after the first paint. The test's synchronous `page.url()` check could see the stale URL (e.g. `?q=co2+france`) before the effect rewrote it to the sanitized form (`?q=co2&countries=France`).

**Fix**: Replace all synchronous `page.url()` assertions with Playwright's polling `expect(page).toHaveURL()`, which retries until the URL matches. Consolidated all URL param helpers into a single generic `expectUrlParam` that handles exact match, absence, and multi-value `~`-separated params.

## Testing guidance

- BDD tests should pass consistently without needing retries for the country extraction scenario.
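As a rough illustration of the fix, the sketch below separates the URL check from the retry loop. Both `urlParamMatches` and `pollUntil` are hypothetical names written for this example; they are not the PR's `expectUrlParam` and do not use Playwright, whose `expect(page).toHaveURL()` handles the polling in the real test.

```typescript
// undefined means "the param must be absent"; an array means "~"-separated values.
type ExpectedParam = string | string[] | undefined

// Pure check covering exact match, absence, and multi-value params.
function urlParamMatches(url: string, name: string, expected: ExpectedParam): boolean {
    const actual = new URL(url).searchParams.get(name)
    if (expected === undefined) return actual === null
    if (actual === null) return false
    if (Array.isArray(expected)) {
        const values = actual.split("~")
        return values.length === expected.length && values.every((value, i) => value === expected[i])
    }
    return actual === expected
}

// Generic polling loop: re-run the check until it passes or the timeout expires.
// A single synchronous check (the flaky pattern) corresponds to calling check() once.
async function pollUntil(check: () => boolean, timeoutMs = 5000, intervalMs = 50): Promise<void> {
    const deadline = Date.now() + timeoutMs
    while (!check()) {
        if (Date.now() > deadline) throw new Error("condition not met before timeout")
        await new Promise((resolve) => setTimeout(resolve, intervalMs))
    }
}

// Example URLs modelled on the ones quoted in the description; the host is made up.
const staleUrl = "https://example.org/search?q=co2+france"
const sanitizedUrl = "https://example.org/search?q=co2&countries=France"
```

A test would then call something like `await pollUntil(() => urlParamMatches(currentUrl(), "countries", ["France"]))`, where `currentUrl` is whatever accessor the test harness provides, so a stale first read no longer fails the scenario.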

…tributes

Improves accessibility and fixes #5930.

* Add containerTitle to charts search index

  This enables searching by multi-dim and explorer titles, which are sometimes different than the titles of their views.

  Test case queries:
  - multi-dim: `childhood vaccination coverage`
  - explorer: `species habitat availability`

  Fixes #5243

* 🐝 trigger CI

---------

Co-authored-by: Marigold <mojmir.vinkler@gmail.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Resolves #5060

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Small Grapher refactors I did as part of the Causes of Death Treemap:

- Rename GrapherTooltipAnchor options: renames enum values in GrapherTypes.ts (e.g. for clarity)
- Make TooltipValue component more flexible: adjusts TooltipContents.tsx to be more reusable
- Extract sparkline component: moves the sparkline out of DataTable.tsx into a new sparkline/Sparkline.tsx file
- Extract SASS variables: pulls shared SCSS variables (colors, sizes, etc.) from grapher.scss into a new core/variables.scss
- Split tooltip components into separate files: breaks the monolithic Tooltip.tsx into TooltipCard.tsx and TooltipContainer.tsx
- Drop NO_DATA_LABEL import from ColorScale: removes an unused import
- Move makeAxisLabel to AxisUtils: relocates the helper from ChartUtils.tsx to axis/AxisUtils.ts

Refactors TextWrap and MarkdownTextWrap. The main motivation is to refactor MobX away so that we can use these utilities in bespoke projects, but I went a bit further and also split state from rendering and introduced a common interface for TextWrap and MarkdownTextWrap.

In summary,

- Remove MobX from `TextWrap` and `MarkdownTextWrap`, either dropping `@computed` or replacing it with `@imemo`
- Convert MarkdownTextWrap from a React component to a plain class, removing the JSX rendering pattern
- Separate state from rendering by extracting render methods into standalone React components: TextWrapSvg, TextWrapHtml, MarkdownTextWrapSvg, MarkdownTextWrapHtml
- Introduce a shared TextWrap interface
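For readers unfamiliar with the pattern, here is a minimal sketch of a framework-free class that caches a derived value and leaves rendering to a separate function. It is only an assumption-laden illustration: `WrappedTextSketch`, `breakIntoLines`, and `renderLinesAsHtml` are hypothetical names, the cache is hand-written rather than using the `@imemo` decorator, and line breaking is done by character count instead of measured text width.

```typescript
// Greedy word wrapping by character count (a stand-in for real text measurement).
function breakIntoLines(text: string, maxChars: number): string[] {
    const lines: string[] = []
    let line = ""
    for (const word of text.split(/\s+/).filter((w) => w.length > 0)) {
        const candidate = line === "" ? word : `${line} ${word}`
        if (candidate.length > maxChars && line !== "") {
            lines.push(line)
            line = word
        } else {
            line = candidate
        }
    }
    if (line !== "") lines.push(line)
    return lines
}

// State only: a plain class with an immutable input and a lazily cached result,
// roughly what a memoizing getter gives you without an observable framework.
class WrappedTextSketch {
    private cachedLines: string[] | undefined
    computeCount = 0 // exposed only so the caching is visible in this example

    constructor(
        readonly text: string,
        readonly maxChars: number
    ) {}

    get lines(): string[] {
        if (this.cachedLines === undefined) {
            this.computeCount++
            this.cachedLines = breakIntoLines(this.text, this.maxChars)
        }
        return this.cachedLines
    }
}

// Rendering only: a standalone function that takes the state object as input.
function renderLinesAsHtml(wrap: WrappedTextSketch): string {
    return wrap.lines.join("<br>")
}

const wrapped = new WrappedTextSketch("one two three four", 9)
const html = renderLinesAsHtml(wrapped)
const linesAgain = wrapped.lines // second access reuses the cached array
```

Because the inputs are read-only, the cache never needs invalidating, which is the property that makes dropping reactive `@computed` values workable for this kind of utility.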

This PR renders Grapher’s Dropdown component in the example bespoke viz, since most bespoke viz projects will likely need it, but there is a bit of setup required to make React Aria work with the Shadow DOM. I initially thought we’d need to portal the popover into the shadow DOM. But while porting this code over from the Causes of Death project, where I first experimented with this, I realised the solution is much simpler: it’s actually fine to attach the popover to the body outside the shadow DOM. What’s then missing are the popover styles, because they don’t exist on the demo page. But they should exist on any OWID page, since they’re bundled into owid.css, right? So the simplest fix seems to be to just import those styles on the demo page. Of course, this is a bit brittle, because we’re relying on the embedder to provide those styles. But I think it should be fine in practice, since we’ll always be embedding these in GDoc articles that live on our site.

Adds a few shared components for bespoke projects and adds an example chart to the example project.

In summary,

- Adds reusable components for bespoke projects: ChartHeader, ChartFooter, Frame, TimeSlider, and BezierArrow
- Adds a shared useDimensions hook for responsive chart sizing via ResizeObserver
- Adds a new "chart" variant to the example bespoke project that demonstrates how to compose these shared components using `@visx` packages
- Improves the layout of the demo page (the boxes looked nice for the small examples, but I found them distracting when working on the Causes of Death Treemap)

Closes #6229

Explicitly pass ADMIN_SERVER_PORT, VITE_PORT, WRANGLER_PORT, and COMPOSE_PROJECT_NAME to tmux shell commands so user overrides are respected. Also use TMUX_SESSION_NAME in up.devcontainer instead of a hardcoded session name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Forward env vars to tmux subshells in Makefile

mlbrgl and others added 7 commits February 26, 2026 17:42

🔨🤖 Generalize getPreprocessedIndexableText to accept profile gdocs

16206fd

Use a generic type constraint instead of a union of specific interfaces, adding OwidGdocProfileInterface support for search indexing.

feat: add plain text preview mode to index preview

f7addc4

github-actions bot assigned xrendan Feb 26, 2026

sophiamersmann and others added 22 commits February 27, 2026 16:28

enhance: add none as peerCountryStrategy

10da5c5

Merge branch 'master' into copilot/test-datasetproducers-searchableat…

afa3815

…tributes

Autofocus search input only when it's empty (#6155)

fa5819f

Improves accessibility and fixes #5930.

trigger staging build

eeb0703

Merge pull request #6161 from owid/add-peer-strategy-none

136fa12

build(deps): bump rollup from 4.46.2 to 4.59.0 (#6167)

a95a9ea

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

build(deps): bump multer from 2.0.2 to 2.1.0 (#6168)

662b289

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Add option to force a chart to be a datapage (#6164)

ba9ca3d

Resolves #5060

🐛 fix prominent links in key insight blocks

296ae6b

Migrate from prettier to oxfmt (#6146)

9070d3d

Allow previewing chart with forceDatapage

eee06f6

chore(deps): update algolia

301f421

chore(deps): update sentry

13e3dcd

chore(deps): update typeorm, dayjs

f879aba

chore(deps): update glob

2759cb1

chore(deps): update node-gyp

152958d

chore(deps): update google packages

c46d8b1

build(deps): bump minimatch from 3.1.2 to 3.1.5 (#6173)

aaee256

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

chore(deps): update mysql2

03a3fab

chore(deps): update typescript native preview

36fe098

chore: fix tsgo issues

6ef0eca

sophiamersmann and others added 30 commits March 19, 2026 09:40

🔨 rename GrapherTooltipAnchor options

ee1f397

🔨 make TooltipValue component more flexible

285fa85

🔨 extract sparkline component

83bcaf1

🔨 extract sass variables

f71c490

🔨 split tooltip components into separate files

db2cd1a

🔨 drop NO_DATA_LABEL import from ColorScale

9c820ba

🔨 move makeAxisLabel to AxisUtils file

fd646c1

🔨 split tooltip components into more files

2502f22

🔨 drop mobx from TextWrap

acb6431

🔨 drop mobx from MarkdownTextWrap

2874bfd

🔨 separate state and rendering for TextWraps

6d4a7ba

🔨 separate state and rendering for MarkdownTextWrap

54dca31

🔨 use default options object for TextWrap classes

5d0cbd6

🔨 introduce shared TextWrap interface

5f8a422

🔨 drop unnecessary eslint-disable

bfa0399

🔨 change TextWrap extension to ts

0e2bf51

✨ add dropdown component to example bespoke project

1b4b2fa

🎉 add shared bespoke hooks

f43be59

🎉 (bespoke) add shared components and example chart

15ef71e

🔨 improve type safety

1af9436

🔨 improve type safety

e50dde1

🐛 fix cc tooltip in the footer

a261999

Fix invalid Algolia autocomplete CSS vars (#6258)

b18d0d3

Closes #6229

Merge pull request #6260 from mlbrgl/makefile-forward-env-vars-to-tmux

635cbbf

Forward env vars to tmux subshells in Makefile

✨ country-profile-selector anchor tag

788aada

xrendan wants to merge 703 commits into BuildCanada:master from owid:master


11 participants
