From c33bbc8d8a4f78757f1752eccc834f3bb3352259 Mon Sep 17 00:00:00 2001
From: user202729 <25191436+user202729@users.noreply.github.com>
Date: Wed, 13 Jan 2021 16:08:27 +0700
Subject: [PATCH 1/5] Fix wrong method name updatefStream -> updateStream in faq.rst

---
 docs/faq.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/faq.rst b/docs/faq.rst
index 773ea5b02..8a3c5902f 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -2034,7 +2034,7 @@ How to Handle Object Streams
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Some object types contain additional data apart from their object definition. Examples are images, fonts, embedded files or commands describing the appearance of a page.

-Objects of these types are called "stream objects". PyMuPDF allows reading an object's stream via method :meth:`Document.xrefStream` with the object's :data:`xref` as an argument. And it is also possible to write back a modified version of a stream using :meth:`Document.updatefStream`.
+Objects of these types are called "stream objects". PyMuPDF allows reading an object's stream via method :meth:`Document.xrefStream` with the object's :data:`xref` as an argument. And it is also possible to write back a modified version of a stream using :meth:`Document.updateStream`.

 Assume that the following snippet wants to read all streams of a PDF for whatever reason::

@@ -2044,9 +2044,9 @@ Assume that the following snippet wants to read all streams of a PDF for whateve
         # do something with it (it is a bytes object or None)
         # e.g. just write it back:
         if stream:
-            doc.updatefStream(xref, stream)
+            doc.updateStream(xref, stream)

-:meth:`Document.xrefStream` automatically returns a stream decompressed as a bytes object -- and :meth:`Document.updatefStream` automatically compresses it (where beneficial).
+:meth:`Document.xrefStream` automatically returns a stream decompressed as a bytes object -- and :meth:`Document.updateStream` automatically compresses it (where beneficial).
 ----------------------------------

@@ -2159,7 +2159,7 @@ PyMuPDF has no way to **interpret or change** this information directly, because
 Using some XML package, the XML data can be interpreted and / or modified and then stored back::

     >>> # write back modified XML metadata:
-    >>> doc.updatefStream(metaxref, xmlmetadata)
+    >>> doc.updateStream(metaxref, xmlmetadata)
     >>>
     >>> # if these data are not wanted, delete them:
     >>> doc._delXmlMetadata()
From 5bc31ae1603d01e3dffe67fd4ec55d2b6ca985d8 Mon Sep 17 00:00:00 2001
From: user202729 <25191436+user202729@users.noreply.github.com>
Date: Wed, 13 Jan 2021 16:17:13 +0700
Subject: [PATCH 2/5] Fix several references to deleted methods and self-references in the documentation

---
 docs/document.rst | 10 +++++-----
 docs/faq.rst | 6 +++---
 docs/functions.rst | 2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/document.rst b/docs/document.rst
index 3fa5e0ea0..531788488 100644
--- a/docs/document.rst
+++ b/docs/document.rst
@@ -1307,27 +1307,27 @@ For details on **embedded files** refer to Appendix 3.

       *(New in version 1.16.8)*

-      PDF only: Return the definition of a PDF object. For details please refer to :meth:`Document.xrefObject`.
+      PDF only: Return the definition of a PDF object.

    .. method:: PDFCatalog()

       *(New in version 1.16.8)*

-      PDF only: Return the :data:`xref` of the PDF catalog (or root) object. For details please refer to :meth:`Document._getPDFroot`.
+      PDF only: Return the :data:`xref` of the PDF catalog (or root) object.

    .. method:: PDFTrailer(compressed=False)

       *(New in version 1.16.8)*

-      PDF only: Return the trailer of the PDF (UTF-8), which is usually located at the PDF file's end. For details please refer to :meth:`Document._getTrailerString`.
+      PDF only: Return the trailer of the PDF (UTF-8), which is usually located at the PDF file's end.

    .. method:: metadataXML()

       *(New in version 1.16.8)*

-      PDF only: Return the :data:`xref` of the document's XML metadata. For details please refer to :meth:`Document._getXmlMetadataXref`.
+      PDF only: Return the :data:`xref` of the document's XML metadata.

    .. method:: xrefStream(xref)

@@ -1517,7 +1517,7 @@ Clear metadata information. If you do this out of privacy / data protection conc
 {'producer': 'none', 'format': 'PDF 1.4', 'encryption': None, 'author': 'none', 'modDate': 'none', 'keywords': 'none', 'title': 'none', 'creationDate': 'none', 'creator': 'none', 'subject': 'none'}
->>> doc._delXmlMetadata() # clear any XML metadata
+>>> doc.del_xml_metadata() # clear any XML metadata
 >>> doc.save("anonymous.pdf", garbage = 4) # save anonymized doc

 :meth:`setToC` Demonstration
diff --git a/docs/faq.rst b/docs/faq.rst
index 8a3c5902f..0925f5624 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -2125,11 +2125,11 @@ ID array File identifier consisting of two byte strings.
 XRefStm int Offset of a cross-reference stream. See :ref:`AdobeManual` p. 109.
 ======= =========== ===================================================================================

-Access this information via PyMuPDF with :meth:`Document._getTrailerString`.
+Access this information via PyMuPDF with :meth:`Document.PDFTrailer`.

     >>> import fitz
     >>> doc=fitz.open("PyMuPDF.pdf")
-    >>> trailer=doc._getTrailerString()
+    >>> trailer=doc.PDFTrailer()
     >>> print(trailer)
     <>
     >>>
@@ -2162,4 +2162,4 @@ Using some XML package, the XML data can be interpreted and / or modified and th
     >>> doc.updateStream(metaxref, xmlmetadata)
     >>>
     >>> # if these data are not wanted, delete them:
-    >>> doc._delXmlMetadata()
+    >>> doc.del_xml_metadata()
diff --git a/docs/functions.rst b/docs/functions.rst
index a275622f4..7fcdc62b2 100644
--- a/docs/functions.rst
+++ b/docs/functions.rst
@@ -410,7 +410,7 @@ Yet others are handy, general-purpose utilities.

    .. method:: Document.xml_metadata_xref()

-      Return the XML-based metadata :data:`xref` of the PDF if present -- also refer to :meth:`Document._delXmlMetadata`. You can use it to retrieve the content via :meth:`Document.xrefStream` and then work with it using some XML software.
+      Return the XML-based metadata :data:`xref` of the PDF if present -- also refer to :meth:`Document.del_xml_metadata`. You can use it to retrieve the content via :meth:`Document.xrefStream` and then work with it using some XML software.

       :rtype: int
       :returns: :data:`xref` of PDF file level XML metadata -- or 0 if none exists.
From 1726be5dc2878f72c32ddca5edf992481fbafdf1 Mon Sep 17 00:00:00 2001
From: user202729 <25191436+user202729@users.noreply.github.com>
Date: Wed, 13 Jan 2021 18:56:21 +0700
Subject: [PATCH 3/5] Fix several typos in the code snippets in the documentation

---
 docs/faq.rst | 4 ++--
 docs/functions.rst | 6 +++---
 docs/page.rst | 4 ++--
 docs/rect.rst | 2 +-
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/faq.rst b/docs/faq.rst
index 0925f5624..b9ce87663 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -1952,12 +1952,12 @@ If it is *False* or if you want to be on the safe side, pick one of the followin

 * **Prepend** the missing stacking command by executing *fitz.TOOLS._insert_contents(page, b"q\n", False)*.
 * **Append** an unstacking command by executing *fitz.TOOLS._insert_contents(page, b"\nQ", True)*.
-* Alternatively, just use :meth:`Page._wrapContents`, which executes the previous two functions.
+* Alternatively, just use :meth:`Page.wrap_contents`, which executes the previous two functions.

 .. note:: If small incremental update deltas are a concern, this approach is the most effective. Other contents objects are not touched. The utility method creates two new PDF :data:`stream` objects and inserts them before, resp. after the page's other :data:`contents`. We therefore recommend the following snippet to get this situation under control:

     >>> if not page._isWrapped:
-            page._wrapContents()
+            page.wrap_contents()
     >>> # start inserting text, images or annotations here

 --------------------------

diff --git a/docs/functions.rst b/docs/functions.rst
index 7fcdc62b2..4ad8a5fca 100644
--- a/docs/functions.rst
+++ b/docs/functions.rst
@@ -20,7 +20,7 @@ Yet others are handy, general-purpose utilities.
 :meth:`ConversionTrailer` return trailer string for *getText* methods
 :meth:`Document.del_xml_metadata` PDF only: remove XML metadata
 :meth:`Document.set_xml_metadata` PDF only: remove XML metadata
-:meth:`Document.delete_object` PDF only: delete an object
+:meth:`Document._deleteObject` PDF only: delete an object
 :meth:`Document.get_new_xref` PDF only: create and return a new :data:`xref` entry
 :meth:`Document._getOLRootNumber` PDF only: return / create :data:`xref` of */Outline*
 :meth:`Document.pdf_catalog` PDF only: return the :data:`xref` of the catalog
@@ -346,7 +346,7 @@ Yet others are handy, general-purpose utilities.

 -----

-   .. method:: Document.delete_object(xref)
+   .. method:: Document._deleteObject(xref)

       PDF only: Delete an object given by its cross reference number.

@@ -521,7 +521,7 @@ Yet others are handy, general-purpose utilities.

       PDF only: Clean and concatenate all :data:`contents` objects associated with this page. "Cleaning" includes syntactical corrections, standardizations and "pretty printing" of the contents stream. Discrepancies between :data:`contents` and :data:`resources` objects will also be corrected if sanitize is true. See :meth:`Page.getContents` for more details.

-      Changed in version 1.16.0 Annotations are no longer implicitely cleaned by this method. Use :meth:`Annot._cleanContents` separately.
+      Changed in version 1.16.0 Annotations are no longer implicitely cleaned by this method. Use :meth:`Annot.cleanContents` separately.

       :arg bool sanitize: *(new in v1.17.6)* if true, synchronization between resources and their actual use in the contents object is snychronized. For example, if a font is not actually used for any text of the page, then it will be deleted from the ``/Resources/Font`` object.

diff --git a/docs/page.rst b/docs/page.rst
index 92efd677d..85742f6d8 100644
--- a/docs/page.rst
+++ b/docs/page.rst
@@ -96,7 +96,7 @@ In a nutshell, this is what you can do with PyMuPDF:
 :meth:`Page.showPDFpage` PDF only: display PDF page image
 :meth:`Page.updateLink` PDF only: modify a link
 :meth:`Page.widgets` return a generator over the fields on the page
-:meth:`Page.writeText` write one or more :ref:`Textwriter` objects
+:meth:`Page.writeText` write one or more :ref:`TextWriter` objects
 :attr:`Page.CropBox` the page's :data:`CropBox`
 :attr:`Page.CropBoxPosition` displacement of the :data:`CropBox`
 :attr:`Page.firstAnnot` first :ref:`Annot` on the page
@@ -472,7 +472,7 @@ In a nutshell, this is what you can do with PyMuPDF:

       *(New in version 1.16.18)*

-      PDF only: Write the text of one or more :ref:`Textwriter` ojects to the page.
+      PDF only: Write the text of one or more :ref:`TextWriter` ojects to the page.

       :arg rect_like rect: where to place the text. If omitted, the rectangle union of the text writers is used.
       :arg sequence writers: a non-empty tuple / list of :ref:`TextWriter` objects or a single :ref:`TextWriter`.
diff --git a/docs/rect.rst b/docs/rect.rst
index 381b4d920..7d440f1f9 100644
--- a/docs/rect.rst
+++ b/docs/rect.rst
@@ -33,7 +33,7 @@ Hence some useful classification:
 :meth:`Rect.morph` transform with a point and a matrix
 :meth:`Rect.norm` the Euclidean norm
 :meth:`Rect.normalize` makes a rectangle finite
-:meth:`Rect.round` create smallest :ref:`Irect` containing rectangle
+:meth:`Rect.round` create smallest :ref:`IRect` containing rectangle
 :meth:`Rect.transform` transform rectangle with a matrix
 :attr:`Rect.bottom_left` bottom left point, synonym *bl*
 :attr:`Rect.bottom_right` bottom right point, synonym *br*
From 8c4d9b5eac9e69ddf7a277089b9bef9d4f01a69e Mon Sep 17 00:00:00 2001
From: user202729 <25191436+user202729@users.noreply.github.com>
Date: Wed, 13 Jan 2021 19:01:44 +0700
Subject: [PATCH 4/5] Fix several typos in the documentation

---
 docs/document.rst | 2 +-
 docs/functions.rst | 4 ++--
 docs/page.rst | 2 +-
 docs/tools.rst | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/document.rst b/docs/document.rst
index 531788488..bd179303a 100644
--- a/docs/document.rst
+++ b/docs/document.rst
@@ -219,7 +219,7 @@ For details on **embedded files** refer to Appendix 3.
       >>> for item in doc.layer_configs: print(item)
       {'number': 0, 'name': 'my-config', 'creator': ''}
-      >>> # use 'number' as config identifyer in add_ocg
+      >>> # use 'number' as config identifier in add_ocg

    .. method:: add_layer_config(name, creator=None, on=None)

diff --git a/docs/functions.rst b/docs/functions.rst
index 4ad8a5fca..2aca78868 100644
--- a/docs/functions.rst
+++ b/docs/functions.rst
@@ -521,9 +521,9 @@ Yet others are handy, general-purpose utilities.

       PDF only: Clean and concatenate all :data:`contents` objects associated with this page. "Cleaning" includes syntactical corrections, standardizations and "pretty printing" of the contents stream. Discrepancies between :data:`contents` and :data:`resources` objects will also be corrected if sanitize is true. See :meth:`Page.getContents` for more details.

-      Changed in version 1.16.0 Annotations are no longer implicitely cleaned by this method. Use :meth:`Annot.cleanContents` separately.
+      Changed in version 1.16.0 Annotations are no longer implicitly cleaned by this method. Use :meth:`Annot.cleanContents` separately.

-      :arg bool sanitize: *(new in v1.17.6)* if true, synchronization between resources and their actual use in the contents object is snychronized. For example, if a font is not actually used for any text of the page, then it will be deleted from the ``/Resources/Font`` object.
+      :arg bool sanitize: *(new in v1.17.6)* if true, synchronization between resources and their actual use in the contents object is synchronized. For example, if a font is not actually used for any text of the page, then it will be deleted from the ``/Resources/Font`` object.

       .. warning:: This is a complex function which may generate large amounts of new data and render old data unused. It is **not recommended** using it together with the **incremental save** option. Also note that the resulting singleton new */Contents* object is **uncompressed**. So you should save to a **new file** using options *"deflate=True, garbage=3"*.

diff --git a/docs/page.rst b/docs/page.rst
index 85742f6d8..35baeba3c 100644
--- a/docs/page.rst
+++ b/docs/page.rst
@@ -472,7 +472,7 @@ In a nutshell, this is what you can do with PyMuPDF:

       *(New in version 1.16.18)*

-      PDF only: Write the text of one or more :ref:`TextWriter` ojects to the page.
+      PDF only: Write the text of one or more :ref:`TextWriter` objects to the page.

       :arg rect_like rect: where to place the text. If omitted, the rectangle union of the text writers is used.
       :arg sequence writers: a non-empty tuple / list of :ref:`TextWriter` objects or a single :ref:`TextWriter`.
diff --git a/docs/tools.rst b/docs/tools.rst
index b007fe931..f6f52ca87 100644
--- a/docs/tools.rst
+++ b/docs/tools.rst
@@ -8,7 +8,7 @@ This class is a collection of utility methods and attributes, mainly around memo
 ====================================== =================================================
 **Method / Attribute** **Description**
 ====================================== =================================================
-:meth:`Tools.gen_id` generate a unique identifyer
+:meth:`Tools.gen_id` generate a unique identifier
 :meth:`Tools.image_profile` report basic image properties
 :meth:`Tools.store_shrink` shrink the storables cache [#f1]_
 :meth:`Tools.mupdf_warnings` return the accumulated MuPDF warnings
From 50caae0972d929364c1e6c750cf8a9dd6709bbdf Mon Sep 17 00:00:00 2001
From: user202729 <25191436+user202729@users.noreply.github.com>
Date: Wed, 13 Jan 2021 19:12:09 +0700
Subject: [PATCH 5/5] Add extractTEXT function to TextPage for consistency

---
 fitz/fitz.i | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fitz/fitz.i b/fitz/fitz.i
index 42dbae39c..b91c2cfeb 100644
--- a/fitz/fitz.i
+++ b/fitz/fitz.i
@@ -10002,6 +10002,7 @@ struct TextPage {
             """Return simple, bare text on the page."""
             return self._extractText(0)

+        extractTEXT = extractText
         def extractHTML(self) -> str:
             """Return page content as a HTML string."""