Finding concepts for keywords
When you search for articles or events, one of the conditions you can specify is a set of keywords. If, for example, you would like to find articles about Barack Obama, you can make the following query:
from eventregistry import *

er = EventRegistry(apiKey = YOUR_API_KEY)
# keyword search: match articles by the words they contain
q = QueryArticles(keywords = "Barack Obama")
q.addRequestedResult(RequestArticlesInfo())
res = er.execQuery(q)
Here, res will contain a list of articles that mention the words "Barack" and "Obama". A better and faster approach to such queries, however, is to use a concept search. In Event Registry, articles are annotated with entities (people, organizations, and locations) and important words that are mentioned in the articles. These annotations are called concepts. Since articles are annotated with concepts, we are also able to annotate events with concepts: an event is annotated with those concepts that appear frequently enough in the articles describing the event.
Now, why would you care about using concepts? The main reason is that natural languages are ambiguous, so keyword searches can yield undesirable results. The same word can mean different things, and different words can often mean the same thing. This is where concepts come in. Each concept in Event Registry is represented by a unique identifier (URI), which in our case is the URL of the concept's Wikipedia page. For "Barack Obama", for example, the concept URI is http://en.wikipedia.org/wiki/Barack_Obama. Using the concept URI, we can repeat the query above in this way:
er = EventRegistry(apiKey = YOUR_API_KEY)
q = QueryArticles(conceptUri = "http://en.wikipedia.org/wiki/Barack_Obama")
q.addRequestedResult(RequestArticlesInfo())
res = er.execQuery(q)
The difference in the obtained results would mainly be twofold:
- Results would also include articles in languages that use a different script, such as Russian, Arabic or Chinese. This is possible because we know how each concept is spelled in different languages, and we use the same concept URI regardless of the language in which the concept is mentioned.
- Results would also include articles where Barack Obama is mentioned simply as "Obama". This is even more common with organizations or things that are often referred to using different words, phrases or abbreviations. This feature is available because we use Wikipedia as a knowledge base and are aware of several ways in which each concept can be mentioned.
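To make the two differences concrete, here is a small offline sketch. It does not call Event Registry at all: the alias table, the sample sentences, and the helper names keyword_match and annotate are all made up for illustration, and the real annotation pipeline is certainly more sophisticated than a substring lookup.

```python
# Toy illustration only: the aliases and articles below are invented.
OBAMA = "http://en.wikipedia.org/wiki/Barack_Obama"

# Several surface forms, in different scripts, map to one concept URI.
ALIASES = {
    "barack obama": OBAMA,
    "obama": OBAMA,
    "барак обама": OBAMA,  # Russian
    "奥巴马": OBAMA,        # Chinese
}

ARTICLES = [
    "Barack Obama spoke in Chicago.",
    "Obama met with European leaders.",
    "Барак Обама посетил Москву.",
    "A report on the weather in Hawaii.",
]

def keyword_match(text, keywords):
    """True if every keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return all(k.lower() in lowered for k in keywords)

def annotate(text):
    """Return the set of concept URIs whose aliases occur in the text."""
    lowered = text.lower()
    return {uri for alias, uri in ALIASES.items() if alias in lowered}

# Keyword search requires both words, so it finds only the first article.
by_keyword = [a for a in ARTICLES if keyword_match(a, ["Barack", "Obama"])]

# Concept search also finds the "Obama"-only and the Russian article.
by_concept = [a for a in ARTICLES if OBAMA in annotate(a)]

print(len(by_keyword), len(by_concept))
```

The point of the sketch is only that matching on a shared identifier, rather than on the literal words, is what lets one query cover alternative names and other scripts.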
I hope the reason for preferring concepts over simple keywords is evident by now. The only question that remains is how you can find the concept URI for a concept you are interested in. The simplest way to find the concept URI based on the label of some entity or word is to use the getConceptUri() API call. Here is an example:
er = EventRegistry(apiKey = YOUR_API_KEY)
uri = er.getConceptUri("sandra bullock")
q = QueryArticles(conceptUri = uri)
q.addRequestedResult(RequestArticlesInfo())
res = er.execQuery(q)
As you can see, the call uri = er.getConceptUri("sandra bullock") searches for the concept URI that best matches the label "sandra bullock". In this case, uri would get the value http://en.wikipedia.org/wiki/Sandra_Bullock. When multiple concepts match the given label, the one that appears most often in the news articles will be returned.
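In practice it can be useful to guard against labels that match no concept before building a query. The sketch below assumes that getConceptUri() returns None when nothing matches; please verify this against the library version you use. The helper require_concept_uri and the StubRegistry class are hypothetical names introduced here. The stub only stands in for EventRegistry so the example runs offline, and its article counts are invented to mimic the "most frequent concept wins" behavior described above.

```python
class StubRegistry:
    """Offline stand-in for EventRegistry; candidate counts are invented."""

    def __init__(self, candidates):
        # candidates: label -> list of (concept URI, number of articles)
        self._candidates = candidates

    def getConceptUri(self, label):
        matches = self._candidates.get(label.lower(), [])
        if not matches:
            return None  # assumed behavior when no concept matches
        # Prefer the concept that is mentioned in the most articles.
        return max(matches, key=lambda m: m[1])[0]


def require_concept_uri(er, label):
    """Return the concept URI for label, or raise if nothing matched."""
    uri = er.getConceptUri(label)
    if uri is None:
        raise LookupError(f"no concept found for label {label!r}")
    return uri


stub = StubRegistry({
    "sandra bullock": [("http://en.wikipedia.org/wiki/Sandra_Bullock", 10)],
    "jaguar": [
        ("http://en.wikipedia.org/wiki/Jaguar", 3),
        ("http://en.wikipedia.org/wiki/Jaguar_Cars", 7),
    ],
})

print(require_concept_uri(stub, "sandra bullock"))
```

With a real client you would pass the EventRegistry instance instead of the stub and hand the returned URI to QueryArticles(conceptUri = uri), as in the example above.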