Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add index for all wof languages (when differents to default language) #446

Merged
merged 2 commits into from
May 20, 2019

Conversation

Joxit
Copy link
Member

@Joxit Joxit commented May 16, 2019

What is this for ?

This importer did not index preferred names of all languages.
We need this for the multi-lang search in autocomplete.

I remove all elements when they are present in the default index. That means, if we say New York in the default language and New York in FRA, I will drop the FRA entry. This should prevent a massive increase of ES indexes.

For now, only name:{lang}_x_preferred is used, I need to add the support to label:{lang}_x_preferred
Should I Add variant too ?

related: pelias/api#1296

@Joxit Joxit force-pushed the joxit/feat/multi-lang-index branch from 53c139a to a3f2849 Compare May 16, 2019 15:25
@orangejulius
Copy link
Member

Oooh, looking forward to this. Keep in mind I just made a PR (#444) that drastically changes the format of some tests. You'll probably have to rebase :( On the plus side there are 1000 fewer lines in that test file :)

@Joxit
Copy link
Member Author

Joxit commented May 16, 2019

ha ha thx I saw that, I force push :p

… default language)

This importer did not index preferred names of all languages.
We need this for the multi-lang search in autocomplete.

I remove all elements when they are present in the default index. That means, if we say New York in the default language and New York in FRA, I will drop the FRA entry. This should prevent a massive increase of ES indexes.
@Joxit Joxit force-pushed the joxit/feat/multi-lang-index branch from a3f2849 to 85765c1 Compare May 16, 2019 16:24
@Joxit Joxit marked this pull request as ready for review May 16, 2019 16:24
@Joxit
Copy link
Member Author

Joxit commented May 16, 2019

Updated, I added the support for label:* and variants 😄 ready for review !

…ad of iso3166.

WOF is using iso639 in their names. That caused some fail with 
spanish/spain (spa vs es)
@Joxit
Copy link
Member Author

Joxit commented May 17, 2019

Added commit for tests and iso639. WOF is using iso639 in their names. That caused some fail with
spanish/spain (spa vs esp).

@orangejulius
Copy link
Member

Cool. this looks good to me. I ran a quick import and the size of WOF data increased from 1.8GB to 2.8GB, which is well worth it!

@orangejulius orangejulius merged commit 9a3fc2c into master May 20, 2019
@Joxit Joxit deleted the joxit/feat/multi-lang-index branch May 20, 2019 14:06
@NickStallman
Copy link

Hey @Joxit I just tried doing a planet import and it looks like there is a clash between WOF and this patch.

2019-06-23T22:33:33.340Z - error: [whosonfirst] {
  "id": 1310593459,
  "name": "Bahrdorf",
  "name_aliases": [],
  "name_langs": {
    "de": [
      [
        "Bahrdorf (Samtgemeinde Velpke)"
      ]
    ]
  },```

It looks like the nested array needs to be compacted down to a single array.

@Joxit
Copy link
Member Author

Joxit commented Jun 25, 2019

Hi @NickStallman,
Are you using bundle import or SQLite ?
It seems that 1310593459 is now deprecated in the repository whosonfirst-data-admin-de

I tried a Germany import only (which contains Bahrdorf) and I didn't find any error (with the docker pelias/whosonfirst:master) 🤔

@bboure
Copy link
Member

bboure commented Feb 2, 2020

@Joxit @orangejulius I think this change might have introduced a small bug.
When a language has a preferred name equal to the default, but also has a variant, then the variant is returned by the API.

Example: Madrid
In English, preferred is "Madrid" (same as default), which is skipped.
There is also a variant: "MAD".
This results in ES like this:

"name": {
    "default": [
        "Madrid",
        "MAD"
    ],
	"en": "MAD",
}

And the API returns "MAD".

"properties": {
        "id": "101748283",
        "gid": "whosonfirst:locality:101748283",
        "layer": "locality",
        "source": "whosonfirst",
        "source_id": "101748283",
        "name": "MAD",
        "accuracy": "centroid",
        "country": "Spain",
        "country_gid": "whosonfirst:country:85633129",
        "country_a": "ESP",
        "macroregion": "Community of Madrid",
        "macroregion_gid": "whosonfirst:macroregion:404227387",
        "region": "Provincia de Madrid",
        "region_gid": "whosonfirst:region:85682783",
        "region_a": "MD",
        "locality": "Madrid",
        "locality_gid": "whosonfirst:locality:101748283",
        "continent": "Europe",
        "continent_gid": "whosonfirst:continent:102191581",
        "label": "MAD, Madrid, Spain"
      },

@Joxit
Copy link
Member Author

Joxit commented Feb 2, 2020

Hi @bboure, is your placeholder up and contains Madrid ?
When you use search, it uses placeholder to get the correct name in the correct language.
When placeholder is not present, it fallbacks on ES name.

@bboure
Copy link
Member

bboure commented Feb 2, 2020

@Joxit I was indeed missing data in placeholder. It now works as expected.
Thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants