Handling expensive data sources #75

Open
Waitak opened this issue Nov 29, 2024 · 1 comment
Labels
question Further information is requested

Comments


Waitak commented Nov 29, 2024

Thank you for making this wonderful package available. As I mentioned in a Slashdot question, I have a use case that I'm not sure it handles. I'd like to use pandoc-plot to generate a number of charts based on a Pandas data frame that is computationally costly to construct. I initially thought that I could use the preamble parameter to invoke a script that constructs the data frame, but that isn't working. Do you have any suggestions?

@LaurentRDC LaurentRDC added the question Further information is requested label Nov 29, 2024
@LaurentRDC
Owner

Hi Scott,

Thank you for the kind words.

Out of the box, the pandoc-plot filter does not handle your use case. Each code block that gets turned into a plot is intended to be independent of all others. This has many benefits, most importantly performance: I wrote pandoc-plot for book-sized workloads, with close to 100 figures.

Using preamble isn't working because the preamble script gets copy-pasted into every code block before pandoc-plot renders a figure. The creation of your dataframe is therefore still repeated for every figure.

I would recommend you proceed with a script to wrap your usage of pandoc. For example (assuming you use bash):

# Run a script that goes through your expensive computation,
# storing the results as a CSV file
python create-data.py

# Render the document, where plots can reference the file created by 
# your python script instead of re-creating the pandas dataframe for every plot
pandoc --filter pandoc-plot ...

# Clean up temporary data file if you know where it is

You can communicate between the bash script above and your document plots using environment variables.
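
As a rough sketch of that hand-off, the snippet below shows one way the two sides could agree on a file location. The variable name `PLOT_DATA_CSV`, the helper names, and the toy data are all made up for illustration, and it assumes the interpreter started by pandoc-plot inherits the environment of the wrapper script. It uses the standard-library `csv` module to stay self-contained; with Pandas you would likely use `DataFrame.to_csv` and `pandas.read_csv` instead.

```python
import csv
import os
import tempfile

# Hypothetical variable name: the wrapper script and the plot code blocks
# only need to agree on it.
DATA_ENV_VAR = "PLOT_DATA_CSV"


def build_rows():
    """Stand-in for the expensive computation (illustrative data only)."""
    return [{"x": i, "y": i * i} for i in range(5)]


def write_data(path):
    """What create-data.py might do: compute once and store the result as CSV."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["x", "y"])
        writer.writeheader()
        writer.writerows(build_rows())


def load_data():
    """What each plot code block might do: read the cached CSV, not recompute."""
    path = os.environ[DATA_ENV_VAR]
    with open(path, newline="") as f:
        return [{key: float(value) for key, value in row.items()}
                for row in csv.DictReader(f)]


if __name__ == "__main__":
    # Simulate the wrapper: write the file, export its location, then read it
    # back the way a plot block would.
    data_path = os.path.join(tempfile.mkdtemp(), "data.csv")
    write_data(data_path)
    os.environ[DATA_ENV_VAR] = data_path
    print(load_data())
```

In the real setup the wrapper script would export the variable before calling pandoc, and only the `load_data` part would live in the document's plot blocks (or in the preamble, since reading a CSV is cheap enough to repeat).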
