Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat interlinks fast #253

Merged
merged 10 commits into from
Aug 31, 2023
Merged

Feat interlinks fast #253

merged 10 commits into from
Aug 31, 2023

Conversation

machow
Copy link
Owner

@machow machow commented Aug 25, 2023

This PR addresses #249 , by adding a "fast" mode via the quartodoc interlinks --fast command.

This should cut parse time to about 1/5th what it was. Note that the package being generated still produces a json inventory. I think we eventually should swap that out with json (and move it into _inv, and later the .quartodoc scratchpad folder).

@machow
Copy link
Owner Author

machow commented Aug 25, 2023

@has2k1, do you mind checking to see if this speeds up your plotnine build? I pulled the plotnine quartodoc PR and built, and it seems to cut the time down to ~3.5mins for the API docs.

You should just need to add the fast option to your _quarto.yml config:

interlinks:
  fast: true
  ...

Then run

quartodoc build
quartodoc interlinks
quarto preview

The build and interlinks command should produce inventory files that are .txt instead of .json now...!

@machow machow marked this pull request as ready for review August 26, 2023 14:58
@has2k1
Copy link
Contributor

has2k1 commented Aug 30, 2023

@has2k1, do you mind checking to see if this speeds up your plotnine build?

On my laptop build time is 10min down from 40min.

I think for plotnine I have all the pieces to close this out.

  1. No JSON
  2. Inverted Inventory so no need to search all items.
  3. Compile and cache inventory so no need to rebuild it for every page
  4. Implicit py domain and all others explicit.

All this together leaves plotnine with different interlinks code, the results would be the same though.

@has2k1
Copy link
Contributor

has2k1 commented Aug 30, 2023

Is there a way to rationalise removing fast: true config option? It looks like there is an issue of keeping the python interlinks code in sync with the lua-filter.

Maybe an error message that prompts the user to either (or both of):

  1. Run quartdoc interlinks
  2. Update the lua-filter

@machow
Copy link
Owner Author

machow commented Aug 30, 2023

All this together leaves plotnine with different interlinks code, the results would be the same though.

Just to double check, when you say it dropped from 40 minutes to 10 minutes, are you saying you ran this PR and it dropped the time to 10 minutes? If you also know the time from your custom interlinks filter that would be useful to know, too!

On my laptop build time is 10min down from 40min.

What version of quarto are you using? Later versions of quarto have dropped some of the time cost per render, so I wonder if the 10 minutes could be cut down by building with a pre-release

@machow
Copy link
Owner Author

machow commented Aug 30, 2023

It looks like there is an issue of keeping the python interlinks code in sync with the lua-filter.

I'm not sure I understand. Do you mind giving me an example of how to produce an error / bad behavior? Are you saying that if a user with the old lua filter sets fast: true, then something bad happens? (I think it will silently ignore the .txt files, so won't load the inventories, which is not great :/ )

The fast: true is for backwards compatibility for now, but I think we may also be able to set and check the filter version, to flag if fast: true is set, and the filter needs to be updated.

@has2k1
Copy link
Contributor

has2k1 commented Aug 31, 2023

Just to double check, when you say it dropped from 40 minutes to 10 minutes, are you saying you ran this PR and it dropped the time to 10 minutes? If you also know the time from your custom interlinks filter that would be useful to know, too!

Yes this PR got the time down to 10 min.

The inverted inventory (.2 above) plus the trick with the compiling and caching (.3 above) got about 7 min. This involved a single reading of JSON files once.

Putting it all (1 - 4 above) together I expect the time to be about 7 min on my end. The 3 min is the difference between repeatedly creating an inventory from text files VS repeatedly loading compiled inventory.

quarto-1.3.450

@has2k1
Copy link
Contributor

has2k1 commented Aug 31, 2023

The fast: true is for backwards compatibility ...

I got that.

... for now,

Because of the "for now", I don't think there should be a config option. The config option will be redundant soon, but it will live on much longer because it was once required! As everyone wants "fast", new quartodoc users will see the option in some random config and may include it because they want "fast", yet they already do have "fast".

What if the option to have fast: true in the config is off the table, what are the options? Maybe in the lua-filter, detect the filetypes in _inv/ (json or txt) then do the right thing. If dealing with JSON files issue a future warning.

The deeper issue is, quartodoc and the interlinks lua-filter are coupled two times:

  1. by the _inv/ files
  2. by project source repo, yet they are installed via different means

Effectively, the interlinking implementation is split across two projects. I think this will create a bit of confusion. It is a hard one to resolve, but here is a try.

qaurtodoc should:

  • know the version compatibility of its (first party) lua filters
  • check for compatibility when the filters are enabled
  • install, upgrade and delete filters. i.e. some cli management

I think that is more forward looking than the option of having all interlinks implementation in the lua-filter!

@machow
Copy link
Owner Author

machow commented Aug 31, 2023

Doing those three things (check filter version, verify compat, manage filters somehow) make a lot of sense. You've convinced me this needs to be tackled soon (as part of comprehensive, out of the box support for interlinks), but I likely won't be able to pick it up for the next week (or maybe after posit conf).

I'm going to merge as is, so we can see how the new parsing does, while allowing a fallback to the old behavior. Once we reach a place where we want to remove the fast: option altogether, I feel okay with emitting a warning that it no longer needs to be set.

Opened issue on validation: #259

@machow
Copy link
Owner Author

machow commented Aug 31, 2023

Thanks for unearthing this crazy source of slowness, and digging so deep into optimizing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants