Releases · hardikp/fnlp
Refreshed data files
Larger data source
The major difference is the larger text data source, obtained by traversing Wikipedia to a depth of 2 links.
Number of tokens: 37M
Vocabulary size: 85,322
Lowercase vocabulary
Word vector dimensions: 50 (glove.37M.50d.zip) and 300 (glove.37M.300d.zip)
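The released files follow the standard GloVe text format: one token per line, followed by its vector components separated by spaces. A minimal loading sketch (the toy 4-dimensional sample below stands in for the actual 50- or 300-dimensional release files, whose inner file names are not specified here):

```python
import io

# Toy stand-in for a GloVe text file; real release vectors have
# 50 or 300 components per token rather than 4.
SAMPLE = io.StringIO(
    "stock 0.1 -0.2 0.3 0.4\n"
    "bond 0.0 0.5 -0.1 0.2\n"
)

def load_glove(handle):
    """Parse a GloVe-format text stream into a {token: [float, ...]} dict."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

vectors = load_glove(SAMPLE)
print(len(vectors["stock"]))  # 4 components in the toy sample
```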
lowercase tokens
The text data is the same as in the previous release, but only lowercase tokens are considered here.
Number of tokens: 2.2M
Vocabulary size: 13,522
Wikipedia finance topics + qplum pages
2.2M tokens
About 15,000 words
First release - GloVe vectors for financial terms
The text was gathered using a Wikipedia scrapy crawler. The web crawler simply visits all the pages linked from here with depth == 1.
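The depth-1 idea is that only pages linked directly from the seed page are fetched; links found on those pages are not followed. A stdlib sketch of that traversal over a made-up in-memory "site" (the actual crawler used scrapy over live Wikipedia pages; page names and contents below are illustrative only):

```python
from html.parser import HTMLParser

# Toy in-memory site; the real crawler fetched Wikipedia over HTTP.
PAGES = {
    "seed": '<a href="finance">Finance</a> <a href="stock">Stock</a>',
    "finance": '<a href="bond">Bond</a> text about finance',
    "stock": "text about stocks",
    "bond": "text about bonds",
}

class LinkExtractor(HTMLParser):
    """Collect href targets from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

def crawl(start, max_depth=1):
    """Return the set of pages reachable from `start` within `max_depth` hops."""
    seen = {start}
    frontier = [start]
    for _ in range(max_depth):
        next_frontier = []
        for page in frontier:
            parser = LinkExtractor()
            parser.feed(PAGES[page])
            for link in parser.links:
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return seen

print(sorted(crawl("seed", max_depth=1)))  # ['finance', 'seed', 'stock']
```

With max_depth=1 the "bond" page is never reached, since its only inbound link sits one hop past the seed's neighbors.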
The word vectors were trained using the GloVe software from https://nlp.stanford.edu/projects/glove/.