Autocomplete: stricter housenumber matching #1308

Closed
missinglink opened this issue Jun 3, 2019 · 5 comments

Comments

@missinglink
Member

We currently consider housenumber as a 'SHOULD' match condition for autocomplete.

https://github.com/pelias/api/blob/master/query/autocomplete.js#L29

We could probably move this to a 'MUST' condition to improve performance and reduce noise.

I suspect the reason we have this as a SHOULD is to handle the case where the house number doesn't exist in the index; in that case a MUST query would return 0 results.

It's probably possible to rewrite these subqueries to be MUST and match term OR NULL using something like https://www.elastic.co/guide/en/elasticsearch/reference/current/null-value.html, or https://www.elastic.co/guide/en/elasticsearch/reference/2.3/query-dsl-missing-query.html
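
As a rough illustration of the "match term OR NULL" idea, here is a minimal sketch of an Elasticsearch bool clause that could sit under a MUST. It assumes the house number lives in a field named `address_parts.number` (an assumption about the schema, not taken from this thread), and the helper name `housenumberOrMissing` is hypothetical. Because the `missing` query from the 2.3 docs was removed in later Elasticsearch versions, the sketch uses `must_not` + `exists` instead.

```javascript
// Sketch only, not the Pelias implementation.
// Builds a clause that matches documents whose house number equals `number`
// OR that have no house number field at all.
// The default field name is an assumption about the index schema.
function housenumberOrMissing(number, field = 'address_parts.number') {
  return {
    bool: {
      should: [
        // the house number matches the input token
        { match: { [field]: number } },
        // ...or the document has no value for the field
        { bool: { must_not: { exists: { field } } } }
      ],
      // at least one of the two alternatives is required
      minimum_should_match: 1
    }
  };
}

// Wrapping the clause in `must` makes it a hard condition on every result.
const query = {
  bool: {
    must: [housenumberOrMissing('180')]
  }
};

console.log(JSON.stringify(query, null, 2));
```

With this shape, records that carry a different house number are excluded, while records with no house number at all (streets, venues) can still match.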

@missinglink
Member Author

Looking at what we currently have in production, there don't seem to be any significant noise issues (such as showing other house numbers on the same street).

The reason for this seems to be that the house number is being matched against the name.default field using a MUST condition, so if the token '180' isn't in the index with 'grolmanstrasse' then 0 results are returned:

/v1/autocomplete?debug=true&text=180 grolmanstrasse
/v1/autocomplete?debug=true&text=grolmanstrasse 180

...so I'm not sure why the subquery isn't already a MUST.
Any ideas, @orangejulius?

@orangejulius
Member

Just came across this old issue from last year, which I don't recall. I think the link above is meant to point to this line, which doesn't appear to have changed:

query.score( peliasQuery.view.address('housenumber') );

I think that the housenumber- and street-specific query clauses are not having much of an effect since, as mentioned, the full address is also included in a must condition. In fact, we are looking to make that less strict with #1432.

The only effect I imagine these two query clauses are having is to boost address records. As we are looking at in #1430 and elsewhere, we may or may not want this scoring boost.

I also bet that we could improve performance by skipping these should query clauses altogether. Each one requires a lookup in the inverted index for another field, plus additional scoring logic. Might be worth experimenting with.

@missinglink
Member Author

We need to ensure that 5 E 10 st, 5 E 5 st, 10 E 10 st and 10 E 5 st score appropriately depending on the input.

Ideally we wouldn't show the incorrect versions at all, but the extra noise may be a fair compromise, so long as the ranking is preserved.
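
To make the ranking requirement concrete, here is a toy check (not Pelias code) over the four addresses above. The `parse` and `rank` helpers and the weights are hypothetical: a street match is weighted above a house number match, so the wrong house number on the right street still appears, but below the exact address.

```javascript
// Toy ranking check, not the Pelias scoring logic.
// Assumes inputs of the form "<housenumber> <street>"; weights are arbitrary.
function parse(text) {
  const [housenumber, ...rest] = text.trim().split(/\s+/);
  return { housenumber, street: rest.join(' ').toLowerCase() };
}

function rank(input, candidates) {
  const want = parse(input);
  return candidates
    .map((label) => {
      const got = parse(label);
      // street match counts for more than house number match
      const score =
        (got.street === want.street ? 2 : 0) +
        (got.housenumber === want.housenumber ? 1 : 0);
      return { label, score };
    })
    // highest score first
    .sort((a, b) => b.score - a.score);
}

const candidates = ['5 E 10 st', '5 E 5 st', '10 E 10 st', '10 E 5 st'];
console.log(rank('5 E 10 st', candidates));
```

For the input `5 E 10 st`, the exact address ranks first, followed by `10 E 10 st` (same street, different number), then `5 E 5 st`, then `10 E 5 st`.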

@taygun

taygun commented Jul 20, 2020

Hello. I have a question about how Pelias autocomplete handles requests containing street suffixes and prefixes. If I search autocomplete for "Tartu mnt 5" (mnt = street/road in Estonian), I get results with house number "5" only.
If I remove "mnt" and search for "Tartu 5", I get the other house numbers too.

I've looked at the Elasticsearch queries for both requests and can see the difference that causes this. My question is: why was this behavior chosen? As a user, I would expect the first result to be "Tartu mnt 5", but I would also expect other options (so that I don't have to type the entire number), regardless of whether or not I write "mnt". A possible list could be "Tartu mnt 5", "Tartu mnt 55", "Tartu mnt 57". Thank you.

@orangejulius
Member

We just merged #1469 which removes the fields mentioned in this issue. Overall, they cause more issues with POI matches than they appear to be worth, at least in their current form.

Exploring queries that account for missing fields would be great follow-up work.
