12030/fix/title search requires phrase match of all words#12416
12030/fix/title search requires phrase match of all words#12416AKADaniel-hub wants to merge 3 commits intointernetarchive:masterfrom
Conversation
|
Thanks for contributing to Open Library, @AKADaniel-hub! Welcome — this appears to be your first PR to the project. 🤖 Copilot has been assigned for an initial review. @cdrini is assigned to this PR and currently has:
PR triage checklist (maintainers / Pam)
Note This comment was automatically generated by Pam, Open Library's Project AI Manager, on behalf of @mekarpeles. Pam is designed to provide status visibility, perform basic project management functions and relevant codebase research, and provide actionable feedback so contributors aren't left waiting. |
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Pull request overview
This PR updates the header “Title” search facet query marshaling to avoid Solr phrase-query behavior (order-dependent matches) by emitting grouped term queries (e.g. title:(term1 term2)) and adjusting URL parsing/tests accordingly.
Changes:
- Update
SearchBar.marshalBookSearchQueryto emittitle:(...)instead oftitle:"...". - Update
SearchBar.initFromUrlParams(and unit tests) to strip thetitle:prefix and surrounding parentheses.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| openlibrary/plugins/openlibrary/js/SearchBar.js | Changes title-facet query marshaling and URL query parsing for the search bar. |
| tests/unit/js/SearchBar.test.js | Updates unit tests to reflect the new title:(...) query format. |
Comments suppressed due to low confidence (1)
tests/unit/js/SearchBar.test.js:121
- The new
title:(...)marshaling behavior should be covered for inputs containing lucene special characters / boolean operators (e.g.C++ Primer,E-mail,A AND B) to ensure the generated query remains Solr-safe after removing the protective quotes.
describe('marshalBookSearchQuery', () => {
const fn = SearchBar.marshalBookSearchQuery;
test('Empty string', () => {
expect(fn('')).toBe('');
});
test('Adds title prefix to plain strings', () => {
expect(fn('Harry Potter')).toBe('title:(Harry Potter)');
});
test('Does not add title prefix to lucene-style queries', () => {
expect(fn('author:"Harry Potter"')).toBe('author:"Harry Potter"');
expect(fn('"Harry Potter"')).toBe('"Harry Potter"');
});
| if (urlParams.q && this.getCurUrl().pathname.match(/^\/search/)) { | ||
| let q = urlParams.q.replace(/\+/g, ' '); | ||
| if (this.facet.read() === 'title' && q.indexOf('title:') !== -1) { | ||
| const parts = q.split('"'); | ||
| if (parts.length === 3) { | ||
| q = parts[1]; | ||
| q = q.replace('title:', '').trim(); | ||
| if (q.startsWith('(') && q.endsWith(')')){ | ||
| q = q.substring(1, q.length -1); | ||
| } | ||
|
|
||
| } | ||
| this.$input.val(q); |
There was a problem hiding this comment.
initFromUrlParams now only strips title:(...) and will leave legacy title:"..." queries showing quotes in the input (e.g. title:"Harry Potter" becomes "Harry Potter"). Since existing shared URLs may still use the quoted form, consider supporting both wrappers (parentheses and quotes) when q starts with the title: field.
| if (this.facet.read() === 'title' && q.indexOf('title:') !== -1) { | ||
| const parts = q.split('"'); | ||
| if (parts.length === 3) { | ||
| q = parts[1]; | ||
| q = q.replace('title:', '').trim(); | ||
| if (q.startsWith('(') && q.endsWith(')')){ | ||
| q = q.substring(1, q.length -1); | ||
| } |
There was a problem hiding this comment.
The title-prefix stripping logic checks q.indexOf('title:') !== -1 and then does q.replace('title:', ''), which will also strip title: if it appears mid-query (or inside another field name), not just as a leading field. To avoid mangling arbitrary lucene queries in the URL, restrict this to cases where the trimmed query starts with title: (and ideally handle optional whitespace after the colon).
| static marshalBookSearchQuery(q) { | ||
| if (q && q.indexOf(':') === -1 && q.indexOf('"') === -1) { | ||
| q = `title: "${q}"`; | ||
| q = `title:(${q})`; | ||
| } |
There was a problem hiding this comment.
marshalBookSearchQuery now interpolates the raw user input inside title:(...) without escaping lucene/solr special characters. This is a behavioral regression vs the previous quoted form for inputs like C++ Primer, E-mail, M*A*S*H, or A AND B, which can be parsed as operators/wildcards and may error or change meaning. Before wrapping, escape special characters (and lowercase boolean operators) similarly to openlibrary/solr/query_utils.py::fully_escape_query.
| if (q.startsWith('(') && q.endsWith(')')){ | ||
| q = q.substring(1, q.length -1); |
There was a problem hiding this comment.
Formatting: if (q.startsWith('(') && q.endsWith(')')){ is missing spaces before { and around - 1 compared to the surrounding style in this file; this will likely be flagged by linting and hurts readability.
| if (q.startsWith('(') && q.endsWith(')')){ | |
| q = q.substring(1, q.length -1); | |
| if (q.startsWith('(') && q.endsWith(')')) { | |
| q = q.substring(1, q.length - 1); |
| expect(sb.$input.val()).toBe('Harry Potter'); | ||
| sb.initFromUrlParams({q: 'title: "Harry"', facet: 'title'}); | ||
| sb.initFromUrlParams({q: 'title: (Harry)', facet: 'title'}); | ||
| expect(sb.$input.val()).toBe('Harry'); |
There was a problem hiding this comment.
The updated initFromUrlParams test no longer covers the legacy title:"..." URL format. Add a test case asserting that title:"Harry Potter" (and title: "Harry") are still parsed into plain input text, so we don't regress existing shared URLs.
| expect(sb.$input.val()).toBe('Harry'); | |
| expect(sb.$input.val()).toBe('Harry'); | |
| sb.initFromUrlParams({q: 'title:"Harry Potter"', facet: 'title'}); | |
| expect(sb.$input.val()).toBe('Harry Potter'); | |
| sb.initFromUrlParams({q: 'title: "Harry"', facet: 'title'}); | |
| expect(sb.$input.val()).toBe('Harry'); |
Closes #12030
fix: search requires phrase match of all words
Technical
The error is related to how Solr search handles phrase queries. Currently, using
title:"query"matches the exact phrase, requiring terms to appear in that specific order.To allow the title to contain terms regardless of their order (e.g., "harry" AND "potter"), I updated the logic to use grouped terms: title:(term1 term2).
Changes:
Testing
Screenshot
Local test Screenshot:
old

after

Production
old:

expected:

Stakeholders
@cdrini
Please let me know if i went to right direction to address the issue or if there are any improvements I should make.