update docs

Python-Markdown · Mar 7, 2024 · c4a139f
1 parent 9d5d813
commit c4a139f
Showing 2 changed files with 30 additions and 0 deletions.
diff --git a/docs/changelog.md b/docs/changelog.md
@@ -10,6 +10,25 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [unreleased]
 
+### Changed
+
+#### Refactor TOC Sanitation
+
+* All postprocessors are run on heading content.
+* Footnote references are stripped from heading content. Fixes #660.
+* A more robust `striptags` is provided to convert headings to plain text.
+  Unlike markupsafe's implementation, HTML entities are not unescaped.
+* The plain text `name`, rich `html` and unescaped raw `data-toc-label` are
+  saved to `toc_tokens`, allowing users to access the full rich text content of
+  the headings directly from `toc_tokens`.
+* `data-toc-label` is sanitized separately from heading content.
+* An `html.unescape` call is made just prior to calling `slugify` so that
+  `slugify` only operates on Unicode characters. Note that `html.unescape` is
+  not run on the `name` or `html`.
+* The `get_name` and `stashedHTML2text` functions defined in the `toc` extension
+  are both **deprecated**. Instead, use some combination of `run_postprocessors`,
+  `render_inner_html` and `striptags`.
+
 ### Fixed
 
 * Include `scripts/*.py` in the generated source tarballs (#1430).
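
Note on the `html.unescape` entry in the changelog hunk above: the ordering matters because a slug function that strips punctuation would otherwise keep the letters of an entity name in the generated ID. The following is a minimal sketch using only the standard library. `simple_slugify` and `make_id` are hypothetical stand-ins for illustration, not the functions the `toc` extension actually uses.

```python
import html
import re


def simple_slugify(value, separator="-"):
    """Hypothetical slug function: drop punctuation, lowercase, join on separator."""
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    return re.sub(r"[\s-]+", separator, value)


def make_id(name):
    """Unescape HTML entities first so the slug sees real Unicode characters."""
    return simple_slugify(html.unescape(name))


escaped_name = "Q&amp;A"  # entity-escaped plain text, as `name` would hold it

print(make_id(escaped_name))         # qa
print(simple_slugify(escaped_name))  # qampa (entity letters leak into the slug)
```

Without the unescape step, the `amp` from `&amp;` survives punctuation stripping and ends up in the ID.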

diff --git a/docs/extensions/toc.md b/docs/extensions/toc.md
@@ -80,6 +80,8 @@ the following object at `md.toc_tokens`:
         'level': 1,
         'id': 'header-1',
         'name': 'Header 1',
+        'html': 'Header 1',
+        'data-toc-label': '',
         'children': [
             {'level': 2, 'id': 'header-2', 'name': 'Header 2', 'children':[]}
         ]
@@ -91,6 +93,11 @@ Note that the `level` refers to the `hn` level. In other words, `<h1>` is level
 `1` and `<h2>` is level `2`, etc. Be aware that improperly nested levels in the
 input may result in odd nesting of the output.
 
+`name` is the sanitized, plain text value which is also used as the label in
+the HTML version of the Table of Contents. `html` contains the fully rendered
+HTML content of the heading and has not been sanitized in any way. It may be
+used with your own custom sanitation to build a custom table of contents.
+
 ### Custom Labels
 
 In most cases, the text label in the Table of Contents should match the text of
@@ -131,6 +138,10 @@ attribute list to provide a cleaner URL when linking to the header. If the ID is
 not manually defined, it is always derived from the text of the header, never
 from the `data-toc-label` attribute.
 
+The value of the `data-toc-label` attribute is sanitized and stripped of any HTML
+tags. However, `toc_tokens` will contain the raw content under
+`data-toc-label`.
+
 Usage
 -----
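
Note on the `toc_tokens` hunks above: the following is a minimal sketch of walking the nested token structure to build a custom table of contents. The `sample_tokens` literal is copied from the documentation example in this diff, not produced by a live `Markdown` instance, and `flatten_tokens` is a hypothetical helper. Because the nested child in the docs example shows no `html` key, the helper falls back to `name`.

```python
# Sample data copied from the docs example in this diff (not live output).
sample_tokens = [
    {
        'level': 1,
        'id': 'header-1',
        'name': 'Header 1',
        'html': 'Header 1',
        'data-toc-label': '',
        'children': [
            {'level': 2, 'id': 'header-2', 'name': 'Header 2', 'children': []}
        ]
    }
]


def flatten_tokens(tokens):
    """Yield (level, id, label) for every token, depth first.

    Prefers the unsanitized `html` value and falls back to `name`
    when a token has no `html` key.
    """
    for token in tokens:
        yield token['level'], token['id'], token.get('html', token['name'])
        yield from flatten_tokens(token.get('children', []))


for level, anchor, label in flatten_tokens(sample_tokens):
    # Indent each entry by heading level to show the nesting.
    print('  ' * (level - 1) + f'{label} -> #{anchor}')
```

Since `html` is not sanitized, any label taken from it should pass through your own sanitation before being written into a page.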