-
In short, no. The files are available on general npm CDNs such as unpkg and jsDelivr, but I believe you will run into problems loading this library from them, because browsers require a Web Worker script to be served from the same origin as the page that creates it. Tesseract.js does have a CDN distribution, and it looks like it uses a workaround for this same-origin restriction. Perhaps tesseract-wasm could integrate something similar in the future.
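For illustration, one common way around the same-origin rule is to start the worker from a same-origin `blob:` URL whose only job is to call `importScripts` on the CDN file. This is a minimal sketch under that assumption; I have not checked that it matches what Tesseract.js does internally. The helper names `buildWorkerBootstrap` and `createWorkerURL` and the `cdn.example.com` URL are hypothetical, not part of either library.

```javascript
// Hypothetical helpers; not part of tesseract-wasm or Tesseract.js.

// Build the source of a tiny bootstrap script. When run inside a classic
// worker, it loads the real (cross-origin) worker script via importScripts.
function buildWorkerBootstrap(scriptURL) {
  // JSON.stringify quotes and escapes the URL so it is a valid JS string literal.
  return `importScripts(${JSON.stringify(scriptURL)});`;
}

// Wrap the bootstrap source in a Blob and return a blob: URL for it.
// A blob: URL created by the page counts as same-origin for `new Worker()`.
function createWorkerURL(scriptURL) {
  const blob = new Blob([buildWorkerBootstrap(scriptURL)], {
    type: 'application/javascript',
  });
  return URL.createObjectURL(blob);
}

// Usage in a browser (placeholder URL):
//   const worker = new Worker(createWorkerURL('https://cdn.example.com/worker.js'));
```

Note that `importScripts` is only available in classic workers, so this would not work for a worker created with `{ type: 'module' }`.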
-
It turns out it is possible to load tesseract-wasm from jsDelivr using a similar approach to Tesseract.js, plus a workaround for an issue with URLs. Here is a demo HTML page, which you will need to serve from a local HTTP server (or upload to an HTTP URL):

<html>
  <body>
    <script type="module">
      import { OCRClient } from 'https://cdn.jsdelivr.net/npm/[email protected]/+esm';

      async function runOCR() {
        // Fetch document image and decode it into an ImageBitmap.
        const imageResponse = await fetch('https://i.imgur.com/YfEZa63.jpeg');
        const imageBlob = await imageResponse.blob();
        const image = await createImageBitmap(imageBlob);

        // Fetch the WASM binary up front so the worker does not need to
        // work out its URL.
        const wasmResponse = await fetch(
          'https://cdn.jsdelivr.net/npm/[email protected]/dist/tesseract-core.wasm'
        );
        const wasmBinary = await wasmResponse.arrayBuffer();

        // Wrap the real worker script in a same-origin blob: URL, since a
        // Worker cannot be created directly from a cross-origin URL.
        const realWorkerURL =
          'https://cdn.jsdelivr.net/npm/[email protected]/dist/tesseract-worker.js';
        const workerSource = new Blob([`
          // Make URL constructor a no-op to work around an error when emscripten
          // tries to determine the WASM URL (which is not essential here).
          globalThis.URL = function () {};
          importScripts("${realWorkerURL}");
        `], {
          type: 'application/javascript',
        });
        const workerURL = URL.createObjectURL(workerSource);

        // Initialize the OCR engine. This will start a Web Worker to do the
        // work in the background.
        const ocr = new OCRClient({
          workerURL,
          wasmBinary,
        });

        try {
          // Load the appropriate OCR training data for the image(s) we want to
          // process.
          await ocr.loadModel(
            'https://raw.githubusercontent.com/tesseract-ocr/tessdata_fast/main/eng.traineddata'
          );
          await ocr.loadImage(image);

          // Perform text recognition and return text in reading order.
          const text = await ocr.getText();
          console.log('OCR text: ', text);
        } finally {
          // Once all OCR-ing has been done, shut down the Web Worker and free up
          // resources.
          ocr.destroy();
        }
      }

      runOCR();
    </script>
  </body>
</html>
-
Hi Robert,
Isn't there a CDN distribution of tesseract-wasm? I'm trying to use it in a browser, and it should be simpler than using npm.
Cheers
Davide