This repository has been archived by the owner on Feb 27, 2022. It is now read-only.

Releases: tos-kamiya/d2vg

v2.0.0-beta.1 Introducing index DB and sentence transformer models

15 Feb 04:32

Although I once marked d2vg as a beta version, I will put it back to alpha in order to explore new models.

This major version update brings:

  • a revised index DB structure for parallelization
  • the introduction of sentence transformer models
  • (somewhat breaking) changes to the command line

Refer to the dev page for installation and usage.

Full Changelog: v1.0.0...v2.0.0-beta.1

v2.0.0-alpha.3

17 Dec 16:51
Pre-release

See the dev branch page for command-line information on the new feature.

WARNING: In the dev branch, an experiment to change the structure of the DB is currently underway. The index DB created with the alpha or beta release will not be usable in the official release.

What's new in this release (since v1.0.0)

  • Searching within indexed document files
  • Explicit indexing command (changed to automatically delete obsolete index data in alpha 3)
  • Changes in index database (breaking change)

Doc2Vec model files:

Full Changelog: v1.0.0...v2.0.0-alpha.3

v2.0.0-alpha.2 Explicit indexing. Tested on Ubuntu 20.04.

16 Dec 14:06

See the dev branch page for command-line information on the new feature.

WARNING: In the dev branch, an experiment to change the structure of the DB is currently underway. The index DB created with the alpha or beta release will not be usable in the official release.

What's new in this release (since v1.0.0)

  • Searching within indexed document files
  • Explicit indexing command
  • Changes in index database (breaking change)

Doc2Vec model files:

Full Changelog: v1.0.0...v2.0.0-alpha.2

v1.0.0

09 Dec 02:33

We are proud to announce the release of 1.0.0! 🎊🎉

What's new in this release (since v0.8.0)

  • Introducing parallel processing to parse document files (-j)
  • Stabilized search results (fixed wrong argument of infer_vector)
  • Updating the algorithm for extracting headings
  • Clearer warning message when a PDF or docx file is encrypted
  • Removing bugs and performance glitches
  • Utilize optimized options of Janome tokenizer (-l ja)
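As a rough illustration of what a job-count option such as -j typically controls, here is a minimal sketch of parsing several files with a bounded worker pool. It is not d2vg's actual implementation: `parse_document` and `parse_all` are hypothetical names, the "parser" only reads plain text, and a thread pool is used here for simplicity, whereas a real tool may well use separate processes.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, Iterable, List


def parse_document(path: Path) -> List[str]:
    """Hypothetical parser: read a text file and return its non-empty lines."""
    text = path.read_text(encoding="utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_all(paths: Iterable[Path], jobs: int = 1) -> Dict[Path, List[str]]:
    """Parse files with up to `jobs` workers (the idea behind a -j option)."""
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max(1, jobs)) as pool:
        # map() preserves input order, so results line up with path_list.
        results = list(pool.map(parse_document, path_list))
    return dict(zip(path_list, results))
```

Calling `parse_all(paths, jobs=4)` would parse up to four files at a time and return a mapping from each path to its parsed lines.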

Doc2Vec model files:

Full Changelog: v0.8.0...v1.0.0

v1.0.0-rc.4 Release Candidate 4

07 Dec 09:31

What's new in this release (since v0.8.0)

  • Introducing parallel processing to parse document files (-j)
  • Stabilized search results (fixed wrong argument of infer_vector)
  • Updating the algorithm for extracting headings
  • Clearer warning message when a PDF or docx file is encrypted
  • Removing bugs and performance glitches
  • Utilize optimized options of Janome tokenizer (-l ja)

Doc2Vec model files:

Full Changelog: v0.8.0...v1.0.0-rc.4

v1.0.0-rc.3 Release Candidate 3

06 Dec 10:18

What's new in this release (since v0.8.0)

  • Introducing parallel processing to parse document files (-j)
  • Stabilized search results (fixed wrong argument of infer_vector)
  • Updating the algorithm for extracting headings
  • Removing bugs and performance glitches
  • Utilize optimized options of Janome tokenizer (-l ja)

Doc2Vec model files:

Full Changelog: v0.8.0...v1.0.0-rc.3

v1.0.0-rc.2 Release Candidate 2

04 Dec 14:44

What's new in this release:

  • Introducing parallel processing to parse document files (-j)
  • Stabilized search results (fixed wrong argument of infer_vector)
  • Updating the algorithm for extracting headings
  • Utilize optimized options of Janome tokenizer (-l ja)

Doc2Vec model files:

Full Changelog: v0.8.0...v1.0.0-rc.2

v0.8.0 Easier installation and improved Doc2Vec models

02 Dec 10:59

What's new:

  • Easier to install
  • Improved display of search results
  • Tuning of Doc2Vec models
  • Enhanced error messages

(Note: For Japanese, be sure to use the new Doc2Vec model, as the tokenizer has changed since 0.7.0.)

Doc2Vec models can be found below: ... *.bz2 file.

Full Changelog: v0.7.0...v0.8.0

v0.7.0 Experimental support for languages: ko, zh

26 Nov 20:59

What's new in this release:

  • Experimental support for languages: ko, zh

Honestly, I can't read or write Chinese or Korean, so I am not sure about the accuracy of the models. Any comments, patches, or tuned models are welcome :)

To install, follow the instructions at https://github.com/tos-kamiya/d2vg#installation.

For the en and ja Doc2Vec model files (enw50k.*, jaw50k.*), please use the same files as in v0.5.1's assets:
enw50k.tar.bz2.aa, enw50k.tar.bz2.ab, jaw50k.tar.bz2
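The `.aa` and `.ab` suffixes suggest that the English model archive was split into parts that need to be joined before extraction. Assuming that is the case (the release notes do not say so explicitly), a minimal Python sketch for joining and unpacking could look like the following; `join_parts` and `extract_archive` are illustrative names, not part of d2vg.

```python
import shutil
import tarfile
from pathlib import Path
from typing import Sequence


def join_parts(parts: Sequence[Path], output: Path) -> Path:
    """Concatenate split archive parts, in the given order, into one file."""
    with output.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
    return output


def extract_archive(archive: Path, dest: Path) -> None:
    """Extract a bzip2-compressed tar archive into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:bz2") as tar:
        tar.extractall(dest)


# Example usage (assumes the parts were downloaded to the current directory):
# parts = sorted(Path(".").glob("enw50k.tar.bz2.*"))  # .aa sorts before .ab
# extract_archive(join_parts(parts, Path("enw50k.tar.bz2")), Path("."))
```

Sorting the part names lexicographically is enough to put `.aa` before `.ab`; the single-file `jaw50k.tar.bz2` could be passed to `extract_archive` directly.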

The ko and zh Doc2Vec model files are available in this release's assets.

Note: There is an ongoing attempt to refine the en and ja Doc2Vec models, based on the experience of adding the zh and ko Doc2Vec models. v0.7.0 may therefore be a short-lived release.

Full Changelog: v0.6.0...v0.7.0

v0.6.0 Add keyword search

25 Nov 00:43

What's new in this release:

  • Add keyword search
  • Performance improvement by replacing the PDF library

For the Doc2Vec model files (enw50k.*, jaw50k.*), please use the same files as in v0.5.1's assets:
enw50k.tar.bz2.aa, enw50k.tar.bz2.ab, jaw50k.tar.bz2

It is recommended that you install the latest version by doing the following:

pip3 install git+https://github.com/tos-kamiya/d2vg.git

However, if you want to install this version, explicitly specify the version as follows:

pip3 install https://github.com/tos-kamiya/d2vg/releases/download/v0.6.0/d2vg-0.6.0-py3-none-any.whl

Full Changelog: v0.5.3...v0.6.0