Releases · hardikp/fnlp
Refreshed data files
Larger data source
The major difference is the larger text data source, obtained by traversing Wikipedia to a depth of 2 links.
Number of tokens: 37M
Vocabulary size: 85,322
Lowercase vocabulary
Word vector dimensions: 50 (glove.37M.50d.zip) and 300 (glove.37M.300d.zip)
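The released files follow the standard GloVe text format: one token per line, followed by its vector components separated by spaces. A minimal loading sketch (the toy 4-dimensional sample below stands in for the actual 50- or 300-dimensional release files, whose inner file names are not specified here):

```python
import io

# Toy stand-in for a GloVe text file; real release vectors have
# 50 or 300 components per token rather than 4.
SAMPLE = io.StringIO(
    "stock 0.1 -0.2 0.3 0.4\n"
    "bond 0.0 0.5 -0.1 0.2\n"
)

def load_glove(handle):
    """Parse a GloVe-format text stream into a {token: [float, ...]} dict."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

vectors = load_glove(SAMPLE)
print(len(vectors["stock"]))  # 4 components in the toy sample
```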
lowercase tokens
The text data is the same as in the previous release, but only lowercase tokens are considered here.
Number of tokens: 2.2M
Vocabulary size: 13,522
Wikipedia finance topics + qplum pages
2.2M tokens
About 15,000 words
First release - GloVe vectors for financial terms
The text was gathered using a Wikipedia scrapy crawler. The web crawler simply visits all the pages linked from here with depth == 1.
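The depth-1 idea is that only pages linked directly from the seed page are fetched; links found on those pages are not followed. A stdlib sketch of that traversal over a made-up in-memory "site" (the actual crawler used scrapy over live Wikipedia pages; page names and contents below are illustrative only):

```python
from html.parser import HTMLParser

# Toy in-memory site; the real crawler fetched Wikipedia over HTTP.
PAGES = {
    "seed": '<a href="finance">Finance</a> <a href="stock">Stock</a>',
    "finance": '<a href="bond">Bond</a> text about finance',
    "stock": "text about stocks",
    "bond": "text about bonds",
}

class LinkExtractor(HTMLParser):
    """Collect href targets from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

def crawl(start, max_depth=1):
    """Return the set of pages reachable from `start` within `max_depth` hops."""
    seen = {start}
    frontier = [start]
    for _ in range(max_depth):
        next_frontier = []
        for page in frontier:
            parser = LinkExtractor()
            parser.feed(PAGES[page])
            for link in parser.links:
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return seen

print(sorted(crawl("seed", max_depth=1)))  # ['finance', 'seed', 'stock']
```

With max_depth=1 the "bond" page is never reached, since its only inbound link sits one hop past the seed's neighbors.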
The word vectors were trained using the GloVe software from https://nlp.stanford.edu/projects/glove/.