Skip to content

Commit a22f435

Browse files
JorjMcKiejulian-smith-artifex-com
authored andcommitted
Clarify role of filetype Parameter
1 parent ff6f64e commit a22f435

File tree

2 files changed

+6
-18
lines changed

2 files changed

+6
-18
lines changed

docs/document.rst

Lines changed: 4 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -177,17 +177,13 @@ For details on **embedded files** refer to Appendix 3.
177177
* If ``stream`` is given, then the document is created from memory.
178178
* If ``stream`` is `None`, then a document is created from the file given by ``filename``.
179179

180-
:arg str,pathlib filename: A UTF-8 string or ``pathlib.Path`` object containing a file path. The document type is always determined from the file content. The ``filetype`` parameter can be used to ensure that the detected type is as expected or, respectively, to force treating any file as plain text.
180+
:arg str,pathlib filename: A UTF-8 string or ``pathlib.Path`` object containing a file path. The document type is always determined from the file content. The ``filetype`` parameter is ignored, except when content inspection was unsuccessful. This is regularly the case for plain text types like "txt", "html", "xml" etc. with a wrong or missing file extension.
181181

182-
:arg bytes,bytearray,BytesIO stream: A memory area containing file data. The document type is **always** detected from the data content. The ``filetype`` parameter is ignored except for undetected data content. In that case only, using ``filetype="txt"`` will treat the data as containing plain text.
182+
:arg bytes,bytearray,BytesIO stream: A memory area containing file data. The document type is always detected from the data content. The ``filetype`` parameter is ignored, except when content inspection was unsuccessful. This is regularly the case for plain text types like "txt", "html", "xml" etc.
183183

184-
:arg str filetype: A string specifying the type of document. This may be anything looking like a filename (e.g. "x.pdf"), in which case MuPDF uses the extension to determine the type, or a mime type like ``application/pdf``. Just using strings like "pdf" or ".pdf" will also work. Can be omitted for :ref:`a supported document type<Supported_File_Types>`.
185-
186-
If opening a file name / path only, it will be used to ensure that the detected type is as expected. An exception is raised for a mismatch. Using `filetype="txt"` will treat any file as containing plain text.
187-
188-
When opening from memory, this parameter is ignored except for undetected data content. Only in that case, using ``filetype="txt"`` will treat the data as containing plain text.
184+
:arg str filetype: A string specifying the type of document. This is only ever needed when file content inspection fails. Text types like "txt", "html", "xml" etc. cannot be disambiguated by their content. When such files are provided in memory or being provided with the wrong file extension, this parameter **must** be used.
189185

190-
:arg rect_like rect: a rectangle specifying the desired page size. This parameter is only meaningful for documents with a variable page layout ("reflowable" documents), like e-books or HTML, and ignored otherwise. If specified, it must be a non-empty, finite rectangle with top-left coordinates (0, 0). Together with parameter *fontsize*, each page will be accordingly laid out and hence also determine the number of pages.
186+
:arg rect_like rect: a rectangle specifying the desired page size. This parameter is only meaningful for documents with a variable page layout ("reflowable" documents), like e-books or HTML, and ignored otherwise. If specified, it must be a non-empty, finite rectangle with top-left coordinates (0, 0). Together with parameter :data:`fontsize`, each page will be accordingly laid out and hence also determine the number of pages.
191187

192188
:arg float width: may used together with ``height`` as an alternative to ``rect`` to specify layout information.
193189

docs/how-to-open-a-file.rst

Lines changed: 2 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -81,17 +81,9 @@ This component looks at the actual data in the file using a number of heuristics
8181

8282
Here is a list of details about how the file content recognizer works:
8383

84-
* When opening from a file name, use the ``filetype`` parameter if you need to make sure that the created :ref:`Document` is of the expected type. An exception is raised for any mismatch.
85-
86-
* Text files are an exception: they do not contain recognizable internal structures at all. Here, the file extension ".txt" and the ``filetype`` parameter continue to play a role and are used to create a "Tex" document. Correspondingly, text files with other / no extensions, can successfully be opened using `filetype="txt"`.
87-
88-
* Using `filetype="txt"` will treat **any** file as containing plain text when opened from a file name / path -- even when its content is a supported document type.
89-
90-
* When opening from a stream, the file content recognizer will ignore the ``filetype`` parameter entirely for known file types -- even in case of a mismatch or when `filetype="txt"` was specified.
91-
92-
* Streams with a known file type cannot be opened as plain text.
93-
* Specifying ``filetype`` currently only has an effect when no match was found. Then using ``filetype="txt"`` will treat the file as containing plain text.
84+
* When opening from a file name, use the ``filetype`` parameter if your file format cannot be determined by content inspection. This is for instance the case for all text files: "txt", "html", "xml" or source files. If the file extension is missing or wrong or the file resides in memory, the ``filetype`` must be used. File formats that can successfully be recognized will be opened even without or wrong extensions, and the ``filetype`` paraneter will be ignored.
9485

86+
* Files based on text content do not contain unambiguously recognizable internal structures. This is true for source files (Python, C, etc.) but also HTML, XML and so on. Here, the file extensions and the ``filetype`` parameter continue to play a role and are used to create a "Tex" / "HTML" / ... document. Correspondingly, text files with other / no extensions, can successfully be opened using ``filetype``.
9587

9688
----------
9789

0 commit comments

Comments
 (0)