diff --git a/scripts/process_notebooks.py b/scripts/process_notebooks.py
index 09ea0d35..3aa8fb6f 100755
--- a/scripts/process_notebooks.py
+++ b/scripts/process_notebooks.py
@@ -33,6 +33,114 @@
 ``noteout.interact-nb-suffix``.
 """
 
+"""
+Notes on implementation
+-----------------------
+
+Crossreferences
++++++++++++++++
+
+Our problem is that we generate the notebooks before Quarto has resolved its
+cross-references.  For example, in the generated `sampling_tools.Rmd`
+notebook, we have the following ``quarto-unresolved-ref`` instances.  Here is
+the first::
+
+    sec-resampling-two, we were
+
+This refers to the ``resampling_with_code2.html`` page, identifiable by the
+matching link in the generated `sampling_tools.html` page containing the
+notebook, that has::
+
+    Chapter
+    6
+
+We can also find it via ``grep``, in the ``resampling_with_code2.html`` page::
+
+    6  More
+    resampling with code
+
+Here is the second unresolved reference::
+
+    sec-random-zero-through-ninety-nine:
+
+This is more easily resolved, because we see this in the
+``sampling_tools.html`` page::
+
+    Section 6.3.3
+
+as well as some matching lines in ``resampling_with_code2.html``::
+
+    6.3.3 Random numbers from 0 through
+    99
+
+    6.3.3 Random numbers from 0 through
+    99
+
+Now consider another notebook ``billies_bill.Rmd`` derived from the
+``about_technology.Rmd`` page, with generated HTML ``about_technology.html``.
+One reference in the notebook is::
+
+    sec-running-own-computer
+
+This is an internal reference (to the same page as the source of the notebook).
+The generated HTML has::
+
+    Section
+    4.10.
+
+To find potential cross-reference resolutions, we therefore process the
+whole HTML tree.  We go through each page and look for Quarto reference hrefs
+(labeled as such with the ``quarto-xref`` class), as well as the span ids from
+the titles of the pages.
+
+We reprocess the cross-reference hrefs to add the page reference for each
+anchor reference.  That is::
+
+
+
+becomes::
+
+
+
+We compile a dictionary where the anchor text is the key (e.g.
+``#fig_mercury_reserves``) and the processed href tag is the value (e.g. ````.
+
+This is the generated `xrefs` property of the ``NBProcessor`` class.
+
+We still haven't covered the case of cross-references to pages, as these
+no longer have their anchors (Quarto section titles) to use as keys.  For
+these we go to the matching page of the cross-ref, and fetch the id.  So for
+the cross-ref above::
+
+    Chapter
+    6
+
+We look for the matching id in the title for that page, and generate a
+key, value pair in the xref dictionary accordingly; the key is
+``#sec-resampling-two`` from the title span above, and the value is the href
+tag above.
+
+Next we look through the Markdown of the generated notebooks.  Each undefined
+(``quarto-unresolved-ref``) reference has an anchor.  We just replace the
+whole unresolved ref tag with the matching tag in `xrefs` (parsed from the
+HTML).
+"""
+
 from argparse import ArgumentParser, RawDescriptionHelpFormatter
 from copy import deepcopy
 from functools import partial
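
The two steps the notes describe (building the anchor-to-tag dictionary, then substituting into the notebook Markdown) might look roughly like the sketch below. It is not the code in ``process_notebooks.py``. The function names ``collect_xrefs`` and ``resolve_refs``, the regular expressions, and the exact shape of the ``<a ... class="quarto-xref">`` and ``<span class="quarto-unresolved-ref">`` tags are assumptions made for illustration. It also leaves out the page-level case (links with no anchor, resolved via the page title id).

```python
import re

# Assumed tag shapes; real Quarto output may order attributes differently,
# so an HTML parser would be more robust than these regular expressions.
XREF_RE = re.compile(
    r'<a href="(?P<page>[^"#]*)#(?P<anchor>[^"]+)" class="quarto-xref">'
    r'(?P<text>.*?)</a>',
    flags=re.DOTALL,
)
UNRESOLVED_RE = re.compile(
    r'<span class="quarto-unresolved-ref">(?P<anchor>[^<]+)</span>'
)


def collect_xrefs(pages):
    """Map '#anchor' keys to link tags whose href names the target page.

    `pages` (hypothetical input) maps a page filename to its HTML text.
    """
    xrefs = {}
    for page_name, html in pages.items():
        for match in XREF_RE.finditer(html):
            # An empty page part means the link points within this page.
            target = match['page'] or page_name
            anchor = match['anchor']
            text = match['text']
            xrefs['#' + anchor] = (
                f'<a href="{target}#{anchor}" class="quarto-xref">{text}</a>'
            )
    return xrefs


def resolve_refs(markdown, xrefs):
    """Replace each unresolved-ref span with its tag from `xrefs`."""
    def _sub(match):
        # Leave the span untouched when no resolution is known.
        return xrefs.get('#' + match['anchor'], match.group(0))
    return UNRESOLVED_RE.sub(_sub, markdown)


# Usage with made-up page fragments modelled on the examples in the notes.
pages = {
    'about_technology.html': (
        '<a href="#sec-running-own-computer" class="quarto-xref">'
        'Section 4.10</a>'
    ),
}
xrefs = collect_xrefs(pages)
resolved = resolve_refs(
    'See <span class="quarto-unresolved-ref">sec-running-own-computer</span>.',
    xrefs,
)
```

Keying the dictionary on the bare anchor, as the notes do, lets the Markdown pass stay a single lookup per unresolved span. Leaving unknown anchors unchanged keeps any reference that could not be resolved visible in the output.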