Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML search: serialize search index as JSON #13097

Closed
Show file tree
Hide file tree
Changes from 5 commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
47f6547
search: check that query terms exist as properties in term indices be…
jayaddison Oct 24, 2024
713d6bb
Tests: fixtures: add document title for ECMAScript testroot
jayaddison Nov 2, 2024
0fff170
Tests: add JavaScript test coverage for prototype-property-name query
jayaddison Nov 2, 2024
64d6e09
Add CHANGES.rst entry
jayaddison Nov 2, 2024
9bab17c
Tests: add negative-test case
jayaddison Nov 5, 2024
9b1edac
search: add explanatory comment for document title/text term lookup
jayaddison Nov 11, 2024
52b9de3
search: retain `undefined` result instead of `[]` for non-matching terms
jayaddison Nov 11, 2024
bcb02fe
search: add explanatory comment regarding `hasOwnProperty` usage
jayaddison Nov 11, 2024
7267990
Merge branch 'master' into issue-13096/html-search-handle-proto-query
AA-Turner Nov 13, 2024
462c859
HTML search: encapsulate the search index content within a JSON string
jayaddison Nov 14, 2024
443fd21
HTML search: use the `JSON.parse` `reviver` argument to freeze search…
jayaddison Nov 14, 2024
295d230
Merge branch 'master' into issue-13096/html-search-handle-proto-query
jayaddison Nov 14, 2024
3498755
Tests: fixup: accommodate adjusted `setIndex` prefix/suffix lengths
jayaddison Nov 14, 2024
f38ce20
HTML search: replace `repr` with simple embedded single-quotes
jayaddison Nov 14, 2024
0d9a26c
Revert "HTML search: use the `JSON.parse` `reviver` argument to freez…
jayaddison Nov 15, 2024
d4bba1e
WIP: HTML search: use JS `Map`s in serialized index
jayaddison Nov 15, 2024
d9dee52
HTML search: fixup for object search
jayaddison Nov 15, 2024
d056a8a
HTML search: ensure array-format index entries are sorted
jayaddison Nov 16, 2024
15a7dfa
HTML search: refactor-out redundant code
jayaddison Nov 16, 2024
6e670ea
HTML search: rectify variable names
jayaddison Nov 16, 2024
f2848ed
HTML search: compress index representation
jayaddison Nov 16, 2024
798cb48
HTML search: enclose index names in double-quotes
jayaddison Nov 16, 2024
af61293
Tests: HTML search: use `ast.literal_eval` to parse index contents
jayaddison Nov 16, 2024
2bcf7a6
Tests: HTML search: update test expectation
jayaddison Nov 16, 2024
381c26f
HTML search: cleanup: remove redundant `assert` statement
jayaddison Nov 16, 2024
ac281dc
HTML search: typing/lint fixups for `ruff` and `mypy`
jayaddison Nov 16, 2024
fae81a0
Edit CHANGES.rst entry
jayaddison Nov 16, 2024
6d5a121
HTML search: use JSON format for the search index
jayaddison Nov 24, 2024
39b4a4e
Merge branch 'master' into issue-13096/html-search-handle-proto-query
jayaddison Nov 24, 2024
7428932
Tests: cleanup: remove unused `ast` import
jayaddison Nov 24, 2024
22acc49
HTML search: refactor loader to use Fetch API
jayaddison Nov 24, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,9 @@ Bugs fixed

* #13060: HTML Search: use ``Map`` to store per-file term scores.
Patch by James Addison
* #13097: HTML Search: add a precautionary check for query term
presence in index properties before accessing them.
Patch by James Addison

Testing
-------
4 changes: 2 additions & 2 deletions sphinx/themes/basic/static/searchtools.js
Original file line number Diff line number Diff line change
Expand Up @@ -514,8 +514,8 @@ const Search = {
searchTerms.forEach((word) => {
const files = [];
const arr = [
{ files: terms[word], score: Scorer.term },
{ files: titleTerms[word], score: Scorer.title },
{ files: terms.hasOwnProperty(word) ? terms[word] : [], score: Scorer.term },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually a little confused by how this might work, as it doesn't appear that JavaScript supports setting the __proto__ property in an object. Am I missing something? Should we be using a map to store these terms instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When parsing JSON, an object key named "__proto__" is accepted, and converted to the JavaScript object key ["__proto__"] -- distinct from the __proto__ property (a reference to a class prototype).

During read-access, introducing hasOwnProperty allows us to filter out the latter, so that we don't mistakenly retrieve the prototype class reference from terms.

I'm not certain whether the below example will help to clarify, but I find experimentation helpful to understand some of the behaviours:

> x = JSON.parse('{}')
{}
> y = JSON.parse('{"__proto__": "testing"}')
{ ['__proto__']: 'testing' }
> '__proto__' in x
true
> x.hasOwnProperty('__proto__')
false
> '__proto__' in y
true
> y.hasOwnProperty('__proto__')
true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to respond to your Map question: that is certainly possible, although I feel that doing so might add some runtime overhead (since we'd need to load the native JS object contents from the index into respective Map objects).

As I understand it, Map objects are also more difficult to make immutable in JavaScript than Object and Array instances. Freezing a Map object doesn't prevent its private contents from being updated:

> x = {'foo': 'bar'}
{ foo: 'bar' }
> Object.freeze(x)
{ foo: 'bar' }
> x['foo'] = 'baz'
'baz'
> x['foo']
'bar'
> y = new Map([['foo', 'bar']])
Map(1) { 'foo' => 'bar' }
> Object.freeze(y)
Map(1) { 'foo' => 'bar' }
> y.set('foo', 'baz')
Map(1) { 'foo' => 'baz' }
> y.get('foo')
'baz'

So another factor is that, to the extent that I'd like to see the search index contents become immutable at load-time in future (#13098), I'd prefer not to migrate to the Map type unless there is a compelling reason to (maybe avoiding prototype pollution is that reason -- but I don't think we're exposed to that currently).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok got it, this makes sense to me re: the property. TIL!

I'm a little bit less sure about whether the index should be immutable, but happy to defer that discussion to another day.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, one extra piece of optional feedback: since this is kind of counter-intuitive, you might want to add a comment on this line explaining what you're doing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, OK - I'll try to add some phrasing to explain this succinctly. Thanks for the code review!

{ files: titleTerms.hasOwnProperty(word) ? titleTerms[word] : [], score: Scorer.title },
jayaddison marked this conversation as resolved.
Show resolved Hide resolved
];
// add support for partial matches
if (word.length > 2) {
Expand Down
1 change: 1 addition & 0 deletions tests/js/fixtures/ecmascript/searchindex.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Empty file.
8 changes: 8 additions & 0 deletions tests/js/roots/ecmascript/index.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
ECMAScript
----------

This is a sample JavaScript (aka ``ECMAScript``) project used to generate a search engine index fixture.

.. js:attribute:: Object.__proto__
jayaddison marked this conversation as resolved.
Show resolved Hide resolved

Used to access the `prototype <https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/Object_prototypes>`_ of an object instance.
32 changes: 32 additions & 0 deletions tests/js/searchtools.spec.js
Original file line number Diff line number Diff line change
Expand Up @@ -209,6 +209,38 @@ describe('Basic html theme search', function() {

});

describe('can handle edge-case search queries', function() {

it('can search for the javascript prototype property', function() {
eval(loadFixture("ecmascript/searchindex.js"));

searchParameters = Search._parseQuery('__proto__');

hits = [
[
'index',
'Object.__proto__',
'#Object.__proto__',
'JavaScript attribute, in ECMAScript',
16,
'index.rst',
'object'
]
];
expect(Search._performSearch(...searchParameters)).toEqual(hits);
});

it('does not find the javascript prototype property in unrelated documents', function() {
eval(loadFixture("partial/searchindex.js"));

searchParameters = Search._parseQuery('__proto__');

hits = [];
expect(Search._performSearch(...searchParameters)).toEqual(hits);
});

});

});

describe("htmlToText", function() {
Expand Down
Loading