- Scraping e-print abstracts from the arXiv servers for any query.
- Cleaning each abstract by removing MathJax/LaTeX tags, extra whitespace, punctuation, and common stopwords, and then normalizing the case of the remaining text.
- Grouping the most common bigram and trigram collocations in each abstract into single tokens (this preserves most technical phrases).
- Filtering non-technical terminology out of each abstract (i.e. words found in NLTK's corpus of English-language words).
- Computing the frequency of each buzzword in the abstracts over a rolling time window.
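
The cleaning, grouping, filtering, and counting steps above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: it uses only the standard library, skips the scraping step (it takes `(date, abstract)` pairs as input), merges bigrams by raw frequency only (no trigrams, no association scoring), and replaces NLTK's stopword and English-word corpora with tiny hard-coded sets. All function names, the `min_count` threshold, and the word lists are assumptions made for the example.

```python
import re
from collections import Counter
from datetime import date, timedelta

# Tiny stand-ins (assumption): a real pipeline would load NLTK's corpora instead.
STOPWORDS = {"the", "a", "an", "of", "in", "we", "for", "and", "is", "to", "on", "with"}
COMMON_ENGLISH = {"study", "show", "new", "results", "model", "present"}


def clean(abstract):
    """Drop inline math, LaTeX commands, punctuation and stopwords; lowercase."""
    text = re.sub(r"\$[^$]*\$", " ", abstract)      # remove $...$ math spans
    text = re.sub(r"\\[a-zA-Z]+", " ", text)         # remove commands such as \emph
    text = re.sub(r"[^a-z\s]", " ", text.lower())    # keep only letters and whitespace
    return [word for word in text.split() if word not in STOPWORDS]


def merge_bigrams(docs, min_count=2):
    """Join bigrams seen at least min_count times across docs into one token."""
    counts = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    frequent = {pair for pair, n in counts.items() if n >= min_count}
    merged_docs = []
    for doc in docs:
        merged, i = [], 0
        while i < len(doc):
            if i + 1 < len(doc) and (doc[i], doc[i + 1]) in frequent:
                merged.append(doc[i] + "_" + doc[i + 1])
                i += 2
            else:
                merged.append(doc[i])
                i += 1
        merged_docs.append(merged)
    return merged_docs


def drop_common_words(doc):
    """Keep only tokens that are not ordinary English words."""
    return [word for word in doc if word not in COMMON_ENGLISH]


def buzzword_counts(dated_abstracts, end, window_days):
    """Count buzzwords in abstracts dated within (end - window_days, end]."""
    days = [day for day, _ in dated_abstracts]
    docs = merge_bigrams([clean(text) for _, text in dated_abstracts])
    docs = [drop_common_words(doc) for doc in docs]
    start = end - timedelta(days=window_days)
    counts = Counter()
    for day, doc in zip(days, docs):
        if start < day <= end:
            counts.update(doc)
    return counts
```

Grouping runs before filtering on purpose: a merged token such as `string_landscape` is not an ordinary English word, so it survives the filter even though `string` and `landscape` on their own might not.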
knowbodynos/arx-live
About
An arXiv-trend predictive engine