Is it possible to keep only the types provided in the ner without linking to wikipedia ? #126
Comments
If you want to extract only the named entities from the 27 types, without using the lookup to Wikipedia, you should just use "ner" in the [...]. If you want ALL the possible entities (NER and Wikipedia) without any link to the knowledge base, then you can use [...].
Actually, when a named entity is "linked", the type is removed - it's a design choice. The idea is that a fully disambiguated entity is more informative than a type, and this avoids having a named entity type that is inconsistent with the Wikidata disambiguated entity (the type used to be kept and that looked really bad, so I removed it). So the only way to get the NE "type" is indeed to have [...]. You could simply run the NER component (https://github.com/kermitt2/grobid-ner) - but it is a library... there is no web service. If you're only interested in geographical named entities, a NER based on OntoNotes is more relevant, and a modern deep learning implementation will perform much better than grobid-ner (for instance from https://github.com/kermitt2/delft#ontonotes-50-conll-2012 using ELMo, GPE -> 96.22 F1-score!). (GPE = geopolitical entity: countries, cities, states.)
@kermitt2 Is it possible now to get the entity type along with the Wikipedia ID/Wikidata ID for the entities? I read in the above comment that the entity type is disabled for the disambiguated entities. Is there any option available to get the entity type for every entity possible?
Hi @Vasistareddy ! The "entity types" here are "named entity types", so only entities corresponding to named entity types (e.g. name, location, date, etc.) could in theory have such a type associated. If the entity_type comes from an NER and we have a Wikidata disambiguated entity, the entity_type is currently discarded - as I said above, it's a design choice: when we have a Wikidata entity, we have all the statements and attributes of Wikidata available to characterize this entity, so it's usually richer than an arbitrary named entity type, and we had issues with inconsistent named entity types given to the disambiguated entity (like PERSON predicted by the NER for a disambiguated Wikidata entity corresponding to a city). So entity_type is currently only kept for entities found by the NER but not disambiguated against Wikidata (missing in Wikidata or too ambiguous), because it's better than nothing. However, we tried to do better. So I think, based on this, that with some further work (some improvement and integration/data aggregation), it would be possible to add named entity types for all relevant Wikidata entities in the future.
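To illustrate the behaviour described above on the client side, here is a minimal sketch that splits a list of returned entities into those that kept an NER type (not disambiguated) and those linked to Wikidata, then keeps only chosen types. The field names `type`, `wikidataId` and `rawName`, the `LOCATION` label, and the sample data are assumptions made for illustration; check them against the actual service response before relying on this.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

Entity = Dict[str, Any]

# Assumed field names (hypothetical, not confirmed in this thread).
TYPE_KEY = "type"
WIKIDATA_KEY = "wikidataId"


def split_entities(entities: Iterable[Entity]) -> Tuple[List[Entity], List[Entity]]:
    """Return (typed_only, linked).

    linked: entities carrying a Wikidata identifier (their NER type is dropped).
    typed_only: entities with an NER type but no Wikidata identifier.
    Entities with neither field are ignored.
    """
    typed_only: List[Entity] = []
    linked: List[Entity] = []
    for entity in entities:
        if entity.get(WIKIDATA_KEY):
            linked.append(entity)
        elif entity.get(TYPE_KEY):
            typed_only.append(entity)
    return typed_only, linked


def keep_types(entities: Iterable[Entity], wanted: Set[str]) -> List[Entity]:
    """Keep only entities whose NER type is in `wanted`."""
    return [e for e in entities if e.get(TYPE_KEY) in wanted]


# Made-up sample data mimicking a response's entity list.
sample = [
    {"rawName": "Austria", "wikidataId": "Q40"},
    {"rawName": "Smallville", "type": "LOCATION"},
    {"rawName": "John Doe", "type": "PERSON"},
]

typed_only, linked = split_entities(sample)
locations = keep_types(typed_only, {"LOCATION"})
print([e["rawName"] for e in locations])
```

Note that under the design choice described above, a linked entity such as the first one carries no type, so a type-based filter alone will miss geographical entities that were successfully disambiguated.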
I am trying to find all the named geographical entities in a text, and I would like to get only the 27 types from the NER. When I use the online API, the only way to get the types used by the NER is to set minSelectorScore = 1. Is there any other way to get these types?