Releases: scribeocr/scribe.js

v0.5.1

10 Dec 09:30

What's Changed

  • Fixed bug causing crashes when recognizing certain PDFs using Node.js (#26)
  • Minor updates

Full Changelog: v0.5.0...v0.5.1

v0.5.0

25 Nov 09:08

What's Changed

  • Added config argument to recognize, which allows for passing arguments to Tesseract.js (#22)
  • Added support for parsing PDF text at various orientations (90/180/270 degrees).
  • Minor improvements to OCR quality.
  • Various improvements to imports of HOCR and native PDF text.
  • Added saveAs utility function for saving files.
  • Added opt.kerning option that can be used to enable or disable kerning.

Full Changelog: v0.4.1...v0.5.0

v0.4.1

10 Nov 19:24

What's Changed

  • Implemented parallel processing by default for the Node.js version
    • To restore the previous behavior (1 worker), set scribe.opt.workerN = 1 before calling any functions.
  • Non-default behavior for extracting text from PDF files is now handled by setting the properties of scribe.opt.usePDFText.
  • Added Nimbus Mono font (similar to Courier)
  • Improvements to text extraction from PDF files.
  • Improvements to text positioning.
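
The single-worker setting mentioned above could be applied roughly as in the sketch below. To stay runnable without the library installed, it uses a stand-in object (`scribeStub`, hypothetical) in place of the imported scribe module; only `opt.workerN` comes from these notes, and the helper name `useSingleWorker` is an assumption for illustration.

```javascript
// Hypothetical stand-in for the imported scribe module.
// Per the notes above, the real module exposes its settings on scribe.opt.
const scribeStub = { opt: {} };

// Hypothetical helper: restore the previous behavior (1 worker).
// Per the notes, this must run before any other scribe function is called.
function useSingleWorker(scribe) {
  scribe.opt.workerN = 1;
  return scribe.opt.workerN;
}

useSingleWorker(scribeStub);
console.log(scribeStub.opt.workerN); // prints 1
```

In real use, the same assignment would be made on the imported module object instead of the stub.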

Full Changelog: v0.3.1...v0.4.1

Note: This post combines the changes for 0.4.0 and 0.4.1, since 0.4.0 was the most recent version for only a few hours.

v0.3.1

31 Oct 08:38

What's Changed

  • Fixed memory leaks

Full Changelog: v0.3.0...v0.3.1

v0.3.0

31 Oct 03:59

What's Changed

  • Improvements to parsing existing text from PDF files
  • Various improvements to OCR text and bounding box quality
  • Fixed memory leak
  • Various minor changes

Full Changelog: v0.2.8...v0.3.0

v0.2.8

30 Sep 07:30
  • Improved performance of "Quality" recognition mode.
    • Many documents should run up to 10-15% faster in quality mode.
  • Updated Scribe Tesseract build to improve recognition accuracy.
    • Accuracy for data tables and other complex layouts has been noticeably improved.
  • Improved image pre-processing.
  • Updated Vanilla Tesseract build to support debugging features and image upscaling.
  • Other minor changes

Full Changelog: v0.2.7...v0.2.8

v0.2.7

25 Sep 05:21
  • Fixed bug preventing existing text in some PDFs from being detected (025456a)
  • Increased resolution at which PDFs are rendered (0dd8801)
  • Added calcSuppFontInfo option that calculates font metrics for the fonts in text-native PDFs (4b2b43e)
    • This is useful for niche applications that require highly-accurate visual coordinates from text-native PDFs.
  • Various other minor updates

Full Changelog: v0.2.6...v0.2.7

v0.2.6

06 Sep 08:00

v0.2.5

06 Sep 07:40
  • Improved performance, especially for single-page documents.
  • Improved accuracy for "Quality" recognition mode (the default).
  • Fixed various minor bugs

Full Changelog: v0.2.4...v0.2.5

v0.2.4

29 Aug 07:56
  • Improved compatibility with build tools such as Webpack
  • Fixed bug where PDF resources were being loaded when not necessary (dd99124)
  • Fixed Tesseract bug causing incorrect metrics for single-word recognition (Recognize Word) in Scribe OCR UI (f6be561)

Full Changelog: v0.2.3...v0.2.4