Why not just use calibre to convert all supported calibre files into txt with the built-in ebook-convert function? #30
Replies: 1 comment
-
It's a good suggestion - I already use Calibre for PDF and EPUB files, and it certainly wouldn't hurt or cost much time to add support for additional formats, but I've been preoccupied with a few features related to OCR, layout detection, cleanup and chapter detection, especially for PDFs (since most of my books are in that format). I'm testing Surya, a set of OCR, layout detection and reading order detection models that should make it possible to handle headers and footers, properly detect section titles, and manage footnotes and tables in a way that makes sense for TTS (plus it has much higher accuracy than Tesseract, on par with commercial solutions like Google Vision or whatever the Amazon thing is called). At the moment I'm also working on enhancing the subtitle/dubbing workflow, which I actually need for my work.
-
I noticed that only about three file types are supported.
This seems like a good solution: using calibre's ebook-convert command should allow it to support all of these:
What formats does calibre support conversion to/from?
calibre supports the conversion of many input formats to many output formats. It can convert every input format in the following list, to every output format.
Input Formats: AZW, AZW3, AZW4, CBZ, CBR, CB7, CBC, CHM, DJVU, DOCX, EPUB, FB2, FBZ, HTML, HTMLZ, LIT, LRF, MOBI, ODT, PDF, PRC, PDB, PML, RB, RTF, SNB, TCR, TXT, TXTZ
Output Formats: AZW3, EPUB, DOCX, FB2, HTMLZ, OEB, LIT, LRF, MOBI, PDB, PMLZ, RB, PDF, RTF, SNB, TCR, TXT, TXTZ, ZIP
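To make the suggestion concrete, here is a minimal sketch of how a script could hand any of the listed input formats to ebook-convert and get a txt file back. It assumes calibre is installed and ebook-convert is on the PATH. The function names (is_convertible, build_command, convert_to_txt) are made up for illustration and are not taken from this project; the only calibre behaviour relied on is the basic `ebook-convert input_file output_file` form, where the output format is inferred from the output file's extension.

```python
import shutil
import subprocess
from pathlib import Path

# Input formats copied from the calibre FAQ list quoted above.
CALIBRE_INPUT_FORMATS = {
    "azw", "azw3", "azw4", "cbz", "cbr", "cb7", "cbc", "chm", "djvu",
    "docx", "epub", "fb2", "fbz", "html", "htmlz", "lit", "lrf", "mobi",
    "odt", "pdf", "prc", "pdb", "pml", "rb", "rtf", "snb", "tcr", "txt",
    "txtz",
}


def is_convertible(path):
    """Return True if the file extension is in calibre's input list."""
    return Path(path).suffix.lower().lstrip(".") in CALIBRE_INPUT_FORMATS


def build_command(src, out_dir):
    """Build the ebook-convert argument list for a txt conversion.

    ebook-convert picks the output format from the output extension,
    so naming the target "<stem>.txt" is enough to request plain text.
    """
    src = Path(src)
    dst = Path(out_dir) / (src.stem + ".txt")
    return ["ebook-convert", str(src), str(dst)]


def convert_to_txt(src, out_dir):
    """Convert one file to txt (hypothetical helper, not project code)."""
    src = Path(src)
    if not is_convertible(src):
        raise ValueError(f"unsupported input format: {src.suffix}")
    if src.suffix.lower() == ".txt":
        return src  # already plain text, nothing to convert
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is calibre installed?")
    cmd = build_command(src, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[2])
```

Calling convert_to_txt("books/novel.mobi", "out") would then produce out/novel.txt, provided calibre can read the file. Scanned PDFs and DJVU files would probably still need the OCR path, since ebook-convert only extracts text that is already present in the file.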