Extract place names from a URL or text, and add context to those names -- for example, distinguishing whether a name refers to a country, a region or a city.
Grab the package using pip (this will take a few minutes)
pip install geograpy2
Geograpy2 uses NLTK for entity recognition, so you'll also need to download the models we're using. Fortunately there's a command that'll take care of this for you.
geograpy-nltk
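If you'd rather fetch the NLTK data from Python instead of the command line, something like the sketch below should work. The corpora names here are an assumption about what NLTK's tokenizer, POS tagger and named-entity chunker typically need; geograpy-nltk remains the supported way and knows the exact list.

import nltk

# Assumed set of NLTK models for tokenizing, POS tagging and NE chunking.
# If geograpy2 needs additional corpora, geograpy-nltk will fetch them.
for package in ('punkt', 'averaged_perceptron_tagger', 'maxent_ne_chunker', 'words'):
    nltk.download(package)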
Import the module, give it some text or a URL, and presto.
import geograpy2
url = 'http://www.bbc.com/news/world-europe-26919928'
places = geograpy2.get_place_context(url=url)
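The call returns a place-context object. In the original geograpy the results are exposed as attribute lists on that object, so, assuming geograpy2 keeps the same interface, you can inspect them like this:

# Attribute names inherited from geograpy's PlaceContext -- treat them as an
# assumption if the geograpy2 API has diverged.
print(places.countries)   # countries mentioned in the article
print(places.regions)
print(places.cities)
print(places.other)       # names that couldn't be classified

# Plain text works too, instead of a URL.
places = geograpy2.get_place_context(text='Barcelona is the capital of Catalonia, Spain.')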
Geograpy2 is a fork of geograpy and inherits most of its functionality, but it solves several problems, such as support for UTF-8, place names made up of multiple words, and confusion over homonyms.
Geograpy2 uses the following excellent libraries:
- NLTK for entity recognition
- newspaper for text extraction from HTML
- jellyfish for fuzzy text matching
- pycountry for country/region lookups
Geograpy2 uses the following data sources:
- GeoLite2 for city lookups
- ISO3166ErrorDictionary for common country misspellings, via Sara-Jayne Terp
Hat tip to Chris Albon for the name.
Released under the MIT license.