
Interlinks filter: support extremely large inventory files #249

Closed
machow opened this issue Aug 22, 2023 · 3 comments

Comments


machow commented Aug 22, 2023

Large inventory files slow the interlinks filter. For sites with many pages (e.g. 100+), this can substantially increase build times.

Background

quartodoc uses this process for handling interlinks:

  • Running quartodoc interlinks downloads inventory files and saves them as JSON (e.g. _inv/python_objects.json)
  • The interlinks lua filter (installation, code) does this for each page being rendered:
    • reads the json files
    • replaces any uses of interlinks syntax with links derived from the json files
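In Python pseudocode, the per-page work looks roughly like the sketch below. (The real filter is Lua, and the "items"/"name"/"uri" field names here are illustrative assumptions, not quartodoc's actual JSON schema.)

```python
import json

def load_inventory(path):
    # Read one _inv/*.json file into a list of entries.
    # The "items"/"name"/"uri" keys are assumptions for illustration.
    with open(path) as f:
        return json.load(f)["items"]

def resolve(entries, target):
    # Mirror the filter's current lookup: a linear scan over every entry.
    for entry in entries:
        if entry["name"] == target:
            return entry["uri"]
    return None
```

Both steps run again for every page rendered, which is where the build-time cost comes from.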

However, this raises the following challenges:

  • io and parsing: the files must be read and parsed every time a page is rendered, and this delay adds up across many files.
  • lookup: we currently loop over every entry when resolving a link. This is inefficient, though likely still fast in human terms.
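One way to address the lookup cost is to build a dictionary once per inventory, so each reference resolves via a hash lookup rather than a scan over ~40,000 entries. A sketch (field names are assumptions, as above):

```python
def index_entries(entries):
    # One pass to build a name -> uri hash map; each subsequent lookup
    # is then O(1). "name" and "uri" are assumed field names.
    return {entry["name"]: entry["uri"] for entry in entries}
```

Usage would be `index = index_entries(entries)` once per inventory, then `index.get("statsmodels.api.OLS")` per reference.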

Inventories like the one for statsmodels.org are ~10MB unzipped and contain ~40,000 entries. A single parse is not a huge time sink, but the cost is multiplied by the number of pages rendered. From what I can tell, the Lua filter takes ~0.75 seconds just to read and parse this file (mostly spent parsing JSON).
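A rough way to reproduce that kind of measurement (sketched in Python; note the ~0.75s figure above is for Quarto's Lua JSON parser, so CPython's json module will be considerably faster on the same file):

```python
import json
import time

def time_json_parse(path):
    # Measure read + parse time for an inventory JSON file.
    t0 = time.perf_counter()
    with open(path) as f:
        json.load(f)
    return time.perf_counter() - t0
```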

Example

Run the following on the files below:

  • quartodoc interlinks (creates _inv folder with inventory files as json)
  • quarto render example.qmd --to gfm --output --

_quarto.yml:

filters:
  - interlinks

# interlinks are slow
interlinks:
  sources:
    python:
      url: https://docs.python.org/3/
    statsmodels:
      url: https://www.statsmodels.org/stable/
#   matplotlib:
#     url: https://matplotlib.org/stable/
#   mizani:
#     url: https://mizani.readthedocs.io/stable/
#   numpy:
#     url: https://numpy.org/doc/stable/
#   scipy:
#     url: https://docs.scipy.org/doc/scipy/
#   pandas:
#     url: https://pandas.pydata.org/pandas-docs/stable/
#   sklearn:
#     url: https://scikit-learn.org/stable/
#   skmisc:
#     url: https://has2k1.github.io/scikit-misc/stable/
#   adjustText:
#     url: https://adjusttext.readthedocs.io/en/latest/
#   patsy:
#     url: https://patsy.readthedocs.io/en/stable/

example.qmd:

---
---

[](`statsmodels.base.distributed_estimation`)

Potential Solutions

In order of complexity:

  • speed up parsing
    • find a way to parse the json files faster, OR
    • use a format that takes advantage of the fact that this data has a fixed structure (e.g. similar to a SQL table)
  • provide some kind of persistence (and only parse once)
    • e.g. run a webserver during site rendering, serve data or answers over a file socket.
    • e.g. at the extreme, Quarto's rendering approach gives us a "we need something Redis-like" problem.
    • (other tools like Sphinx or MkDocs provide these mechanisms out of the box)
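The "fixed structure" idea could be as simple as one tab-separated line per entry, so parsing becomes a single split per line instead of a full JSON parse. A sketch of a hypothetical format (not anything quartodoc actually ships):

```python
def write_flat(entries, path):
    # One "name<TAB>uri" line per entry; relies on every entry having
    # the same two fields (assumed names, for illustration).
    with open(path, "w") as f:
        for e in entries:
            f.write(f"{e['name']}\t{e['uri']}\n")

def read_flat(path):
    # Parsing is a bounded split per line; no JSON parser involved.
    index = {}
    with open(path) as f:
        for line in f:
            name, uri = line.rstrip("\n").split("\t", 1)
            index[name] = uri
    return index
```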
@machow machow added the .epic label Aug 22, 2023

machow commented Aug 22, 2023

cc @has2k1 who raised this as part of has2k1/plotnine#706, pairing w/ Carlos tomorrow on it


machow commented Sep 25, 2023

Setting the fast option speeds up loading of interlinks files. Instead of saving inventories as JSON, it saves the original inventory files as .txt and parses them in Lua (the JSON parsing Quarto provides to Lua filters is very slow).

interlinks:
  fast: true
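If the stored .txt mirrors the decompressed Sphinx objects.inv v2 layout (lines of `name domain:role priority uri dispname`), parsing one line is just a bounded split. A Python sketch of that format (the actual Lua implementation may differ):

```python
def parse_inv_line(line):
    # Sphinx v2 inventory fields: name, domain:role, priority, uri,
    # dispname. dispname may contain spaces, so split at most 4 times.
    name, domain_role, priority, uri, dispname = line.split(" ", 4)
    domain, role = domain_role.split(":", 1)
    return {"name": name, "domain": domain, "role": role,
            "priority": int(priority), "uri": uri, "dispname": dispname}
```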

@machow machow closed this as completed Sep 25, 2023