Documentation updates.
euske committed Nov 17, 2013
1 parent cf1e3c9 commit e39e39f
Showing 2 changed files with 34 additions and 2 deletions.
21 changes: 21 additions & 0 deletions README.md
@@ -10,6 +10,7 @@ It includes a PDF converter that can transform PDF files
into other text formats (such as HTML). It has an extensible
PDF parser that can be used for other purposes than text analysis.


Features
--------

@@ -23,6 +24,7 @@ Features
* Tagged contents extraction.
* Automatic layout analysis.


How to Install
--------------

@@ -37,6 +39,7 @@ How to Install

$ pdf2txt.py samples/simple1.pdf


For CJK Languages
-----------------

@@ -60,6 +63,7 @@ paste the following commands on a command line prompt:
python tools\conv_cmap.py -c KSC-EUC=euc-kr -c KSC-Johab=johab -c KSCms-UHC=cp949 -c UniKS-UTF8=utf-8 pdfminer\cmap Adobe-Korea1 cmaprsrc\cid2code_Adobe_Korea1.txt
python setup.py install


Command Line Tools
------------------

@@ -87,6 +91,21 @@ but it's also possible to extract some meaningful contents (e.g. images).

(For details, refer to the html document.)


API Changes
-----------

As of November 2013, the PDFMiner API differs in a few ways from the
API prior to October 2013. These changes result from code restructuring.
Here is a list of the changes:

* PDFDocument class is moved to pdfdocument.py.
* PDFDocument class now takes a PDFParser object as an argument.
PDFDocument.set_parser() and PDFParser.set_document() are removed.
* PDFPage class is moved to pdfpage.py.
* process_pdf function is implemented as a class method PDFPage.get_pages.
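
The changes above can be sketched roughly as follows. This is a minimal illustration, not the official usage: the helper names `load_document` and `count_pages` are hypothetical, the file is assumed to be unencrypted, and the imports are placed inside the functions only so that the sketch can be loaded without PDFMiner installed.

```python
def load_document(fp):
    """Build a PDFDocument from an open binary file object (hypothetical helper)."""
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument  # now lives in pdfdocument.py

    parser = PDFParser(fp)
    # The parser is passed to the constructor; set_parser() and
    # set_document() are no longer called.
    return PDFDocument(parser)


def count_pages(path):
    """Count the pages of a PDF file via PDFPage.get_pages (hypothetical helper)."""
    from pdfminer.pdfpage import PDFPage  # now lives in pdfpage.py

    with open(path, "rb") as fp:
        # get_pages is a class method that yields PDFPage objects.
        return sum(1 for _ in PDFPage.get_pages(fp))
```

For example, `count_pages("samples/simple1.pdf")` would iterate over the pages of the sample file used in the installation check.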


TODO
----

@@ -97,6 +116,7 @@ TODO
* Better documentation.
* Crypt stream filter support.


Related Projects
----------------

@@ -105,6 +125,7 @@ Related Projects
* <a href="http://www.pdfbox.org/">pdfbox</a>
* <a href="http://mupdf.com/">mupdf</a>


Terms and Conditions
--------------------

15 changes: 13 additions & 2 deletions docs/index.html
@@ -9,7 +9,7 @@

<div align=right class=lastmod>
<!-- hhmts start -->
Last Modified: Sat Oct 26 15:03:35 UTC 2013
Last Modified: Sun Nov 17 06:32:44 UTC 2013
<!-- hhmts end -->
</div>

@@ -368,7 +368,18 @@ <h4>Options</h4>

<h2><a name="changes">Changes</a></h2>
<ul>
<li> 2013/10/22: Sudden resurgence of interest.
<li> 2013/11/13: Bugfixes and minor improvements.<br>
As of November 2013, the PDFMiner API differs in a few ways from the
API prior to October 2013. These changes result from code restructuring.
Here is a list of the changes:
<ul>
<li> <code>PDFDocument</code> class is moved to <code>pdfdocument.py</code>.
<li> <code>PDFDocument</code> class now takes a <code>PDFParser</code> object as an argument.
<li> <code>PDFDocument.set_parser()</code> and <code>PDFParser.set_document()</code> are removed.
<li> <code>PDFPage</code> class is moved to <code>pdfpage.py</code>.
<li> <code>process_pdf</code> function is implemented as a class method <code>PDFPage.get_pages</code>.
</ul>
<li> 2013/10/22: Sudden resurgence of interest. API changes.
Incorporated a lot of patches and robust handling of broken PDFs.
<li> 2011/05/15: Speed improvements for layout analysis.
<li> 2011/05/15: API changes. <code>LTText.get_text()</code> is added.
