Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use "name:en" as a fallback for when no default name was found #498

Merged
merged 2 commits into from
Aug 14, 2019

Conversation

missinglink
Copy link
Member

resolves #497

@mihneadb
Copy link

mihneadb commented Aug 13, 2019

@missinglink great stuff for adding this! I was wondering, in the context of pelias/api#1301, wouldn't it make sense to import all available names in the docs?
Like even if there is no default, but there is, say, name:ru, import that under ru.

@missinglink
Copy link
Member Author

@mihneadb we're already doing that in https://github.com/pelias/openstreetmap/blob/master/stream/tag_mapper.js#L37-L45, so it's already set up for allowing us to query for various languages at runtime :)

The issue highlighted by #497 is that the address is available as GID openstreetmap:address:node/6600732186 but the venue should also be available as GIDopenstreetmap:venue:node/6600732186 too, which it isn't at the moment.

@missinglink
Copy link
Member Author

missinglink commented Aug 13, 2019

Our code requires a default name be present or the record is discarded, its set up like that because other code expects a default name to always be present and having to do language checks all over the place would be messy.

Having said that, it might be something we need to change in the future to improve i18n.

@mihneadb
Copy link

Thanks for the insight. That still means that if a place has only a some-other-language-than-EN name it will be discarded as a venue, right?

@missinglink
Copy link
Member Author

Unfortunately yes, the order of priority for default would now be the following OSM tags:

  1. name
  2. loc_name
  3. alt_name
  4. short_name
  5. name:en

Do you think it would be best to add another level of fallback which accepts any other set language in the case all of these fail?
cc/ @orangejulius thoughts?

@missinglink
Copy link
Member Author

We could also check if the official or national tags are set and use them where available?
ref: https://github.com/pelias/openstreetmap/blob/master/schema/name_osm.js

@orangejulius
Copy link
Member

This PR looks good as is.

Another thing we might consider is this, for essentially item 6 on the order of priority above:

If there is any single name:* tag and no name tag, then assign that name:* tag to name.default

So for example if there is only name:es, then that would become the default name.

@missinglink
Copy link
Member Author

missinglink commented Aug 13, 2019

What about if I change the PR to this:

      // Handle the case where no default name was set but there were
      // other names which could use as the default.
      if( !doc.getName('default') ){

        var defaultName =
          doc.getName('official') ||
          doc.getName('international') ||
          doc.getName('national') ||
          doc.getName('regional') ||
          doc.getName('en');

        // use one of the preferred name tags
        if ( defaultName ){
          doc.setName('default', defaultName);
        }
        
        // else just select the first available name tag
        else {
          var tags = Object.keys(doc.name)
          if ( !!tags.length ){
            doc.setName('default', doc.getName(tags[0]));
          }
        }
      }

I ran the end-to-end tests and it didn't catch any other cases, but since it's only covering Vancouver (because its small and dual-language), but this issue might be more exaggerated in other parts of the world.

@orangejulius
Copy link
Member

I don't really like the idea of using the first name tag. That would effectively prioritize languages in alphabetical order, which doesn't really make sense.

It's hard to evaluate the code snippet above, since it's not a diff. How about we merge this PR and then discuss refactoring/other improvements in their own PR?

orangejulius
orangejulius previously approved these changes Aug 13, 2019
@missinglink
Copy link
Member Author

missinglink commented Aug 13, 2019

Yea, picking the first name will either favour tags by lexical sorting or insertion order, which isn't very elegant 😟

In a rare case where none of the preferred tags are available and name:en is also not available and also there are multiple other language names available, in that case, we would pick one at random the first one.

Since its fairly rare (I'm assuming), we might be better off selecting any name rather than no name, just so the record isn't discarded during import.

Is there possibly another algorithm we could use to prioritize languages?

@orangejulius orangejulius dismissed their stale review August 13, 2019 15:43

out of date

@mariusa
Copy link

mariusa commented Aug 14, 2019

Since its fairly rare (I'm assuming), we might be better off selecting any name rather than no name, just so the record isn't discarded during import.

+1

I don't really like the idea of using the first name tag.
Agreeing that having a name, even first, is better than nothing, let's see if we can improve this.
Note that 'en' is already prioritized. If 'en' isn't present either, what to do?

How about trying the language code which matches the country of the place? This needs a map of official languages for each country code though. It would have to be sorted by language popularity too (eg Switzerland has 4 languages)

I guess this could also be filled as a separate enhancement request. It's lower priority and shouldn't block this PR.

@missinglink
Copy link
Member Author

missinglink commented Aug 14, 2019

okay, here are some stats I generated today from a planet build.

note: I changed the code slightly to only consider language tags for the 'select first available'
ie. var keys = Object.keys(doc.name).filter(n => n.length === 2);
this was to avoid selecting a tag such as 'old_name' which would be undesirable.

Adding the first-level fallback (the one which considers ['official', 'international', 'national', 'regional', 'en']) would yeild an additional 11587 records globally.

For the second-level fallback (the contentious bit where we select one of the names prefixed with a language code), we could import an additional 2941 records, of which the split is:

1x name:** tag  present => 2744 records
2x name:** tags present => 194 records
3x name:** tags present => 3 records

some examples:

3x name:** fields

2x name:** fields

note: it looks like some of these OSM record have now been fixed (someone added a 'name' tag) since the extract I used for this analysis was created (~2019.FEB.01)

@missinglink
Copy link
Member Author

missinglink commented Aug 14, 2019

So I'm going to rebase this PR to solve this issue for the 11587 'simple cases' and also the 2744 'unambiguous' cases.

I'm not going to try and tackle the 197 'ambiguous' cases because I don't have time for it, although I'd be happy if someone else would like to address them, I've added examples above to aid with this task.

@missinglink missinglink merged commit f897b83 into master Aug 14, 2019
@missinglink missinglink deleted the name_fallback branch August 14, 2019 11:32
@orangejulius
Copy link
Member

Nice analysis. Looks like a really clean and good solution and thankfully the data shows we don't have to make any difficult choices this time :)

var keys = Object.keys(doc.name).filter(n => n.length === 2);

// unambiguous (there is only a single two-letter name tag)
if ( keys.length === 1 ){

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pursuing this! Looks like the best solution.

michaelkirk pushed a commit to michaelkirk-pelias/openstreetmap that referenced this pull request Jun 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Place in OSM not found in Pelias after importing
4 participants