Releases: hardikp/fnlp

Refreshed data files

05 Apr 17:00
  • Sources: qplum, investopedia, wikipedia
  • Number of tokens: 45M
  • Vocab size: 91944
  • Window size: 15

Larger data source

02 May 22:17

The major difference is a larger text data source, gathered by traversing Wikipedia links to a depth of 2 levels.
Number of tokens: 37M
Vocab size: 85322
Lowercase vocabulary
Word dimensions: 50 (glove.37M.50d.zip) and 300 (glove.37M.300d.zip)
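
A minimal sketch of how vectors like these might be read, assuming the archives unpack to the plain-text format written by the GloVe software (one word per line followed by its space-separated components). The function names `load_glove` and `cosine` and the toy sample values are illustrative assumptions, not part of this release.

```python
import io
import math


def load_glove(lines):
    """Parse GloVe text-format lines ("word v1 v2 ... vd") into a dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy stand-in for an extracted vectors file (values are made up).
sample = io.StringIO("bond 0.1 0.2 0.3\nequity 0.1 0.2 0.25\n")
vectors = load_glove(sample)
print(cosine(vectors["bond"], vectors["equity"]))
```

With a real file, an open file handle for the extracted text file can be passed to `load_glove` in place of the `io.StringIO` sample.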

lowercase tokens

01 May 18:30

The text data is the same as in the previous release, but only the lowercase tokens are considered here.
Number of tokens: 2.2M
Vocab size: 13522

wikipedia finance topics + qplum pages

28 Apr 17:17

2.2M tokens
About 15,000 words

First release - GloVe vectors for financial terms

24 Apr 22:36

The text was gathered using a Scrapy crawler for Wikipedia. The crawler simply visits all the pages linked from here, with depth == 1.
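
The depth-limited traversal described above can be sketched as a breadth-first walk over links. This is not the project's Scrapy crawler: the `crawl` function, the `get_links` callback, and the toy link graph are hypothetical stand-ins so the example runs without network access.

```python
from collections import deque


def crawl(start, get_links, max_depth):
    """Breadth-first traversal following links at most max_depth hops from start."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in get_links(page):
            if link not in seen:
                seen.add(link)
                order.append(link)
                queue.append((link, depth + 1))
    return order


# Hypothetical link graph standing in for real pages.
toy_links = {
    "Finance": ["Bond", "Stock"],
    "Bond": ["Yield"],
    "Stock": ["Dividend"],
}


def get_links(page):
    return toy_links.get(page, [])


print(crawl("Finance", get_links, max_depth=1))
print(crawl("Finance", get_links, max_depth=2))
```

Raising `max_depth` from 1 to 2 also pulls in the pages linked from the first-level pages, which is the kind of growth in the text source that a deeper traversal produces.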

The word vectors were trained using the GloVe software obtained from https://nlp.stanford.edu/projects/glove/