Conversation

Contributor

@fk128 fk128 commented Mar 16, 2025

Add support for selecting specific pages in multi-page PDFs

In LaTeX, when including a multi-page PDF as a graphic, it's possible to specify a page number:

 \includegraphics[width=1.0\textwidth,page=3]{figures/my_pic.pdf}

However, the page parameter is currently ignored, causing all the links to point only to the first page, even though ImageMagick extracts and converts all pages.

This PR adds support for the page parameter, ensuring that the correct page is selected when converting multi-page PDFs.

changeset-bot bot commented Mar 16, 2025

🦋 Changeset detected

Latest commit: 56921a5

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 5 packages
Name Type
myst-cli Patch
tex-to-myst Patch
myst-to-tex Patch
mystmd Patch
myst-migrate Patch

Member

@rowanc1 rowanc1 left a comment

Nice, looks great. Left a few minor things.

Could you also add page with a comment here:

https://github.com/executablebooks/mystmd/blob/main/packages/myst-spec-ext/src/types.ts#L117

e.g.

export type Image = SpecImage & {
  urlSource?: string;
  urlOptimized?: string;
  height?: string;
  placeholder?: boolean;
  /** Optional page number for PDF images; this ensures the correct page is extracted when converting to web and translated to LaTeX */
  page?: number;
};

  } ${output}`;
- session.log.debug(`Executing: ${executable}`);
+ session.log.info(`Executing: ${executable}`);
Member

Suggested change
- session.log.info(`Executing: ${executable}`);
+ session.log.debug(`Executing: ${executable}`);

Contributor Author

done!

  import type { Handler, ITexParser } from './types.js';
- import { getArguments, texToText } from './utils.js';
+ import { getArguments, extractParams, texToText } from './utils.js';
+ import { group } from 'console';
Member

I don't think this is used?

Contributor Author

Removed

  }
  if (params.page) {
    if (typeof params.page === 'number') {
      params.page = Number(params.page) - 1; // Convert to 0-based for imagemagick
Member

Do we need to round or parse this or have something similar to Number.isFinite(Number.parseFloat(params.page))?

Contributor Author

You're right! Unless I'm mistaken, I think the extractParams should already have done the parsing, but I've added the missing Number.isFinite and the rounding.
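
For readers following along, the check discussed here could look roughly like the sketch below. This is not the code merged in this PR: `toZeroBasedPage` is a hypothetical helper name, and rejecting pages below 1 is an assumption added for illustration.

```typescript
// Hypothetical helper (not the PR's actual code): validate a LaTeX `page`
// value and convert it from 1-based (LaTeX) to 0-based (ImageMagick).
function toZeroBasedPage(page: unknown): number | undefined {
  // Accept numbers directly; fall back to parsing anything else as text.
  const parsed = typeof page === 'number' ? page : Number.parseFloat(String(page));
  // Reject NaN and +/-Infinity, e.g. from `page=abc`.
  if (!Number.isFinite(parsed)) return undefined;
  // Page indices must be whole numbers.
  const rounded = Math.round(parsed);
  // Assumption: LaTeX page numbers start at 1, so anything lower is invalid.
  if (rounded < 1) return undefined;
  return rounded - 1;
}
```

With this sketch, `page=3` becomes index `2`, and a non-numeric value is dropped rather than passed on to the converter.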

Contributor

agoose77 commented Mar 17, 2025

@fwkoch @rowanc1 I'm wondering whether it makes sense to bring page awareness to our normal AST. Should we not treat this as a post-tex transform, and rewrite the image node with the proper figure?

Member

@fwkoch fwkoch left a comment

I just looked at this PR now for the first time - looks great, and I think we should work on getting it landed.

First, just to summarize my understanding, this has two parts: (1) On parsing a tex file, this looks at image params and adds page to the image node (alongside width, which was previously supported). (2) On image conversion, if page is present, imagemagick extracts the correct page.

A couple points:

  • @agoose77 - I think your concern was around adding page to the image node. It feels a little clutter-y since it only applies in a few specific cases. Thinking through the alternative a bit: If we do not store page on the node, we would need to do initial image processing at parse time, pulling out the specific page as a separate file. This separate file would still need final conversion to correct format. This means we have extra intermediate files stored on users' machines.
  • I noticed if this page-specific PDF image is selected as the "implicit" thumbnail, it also causes problems: thumbnail is just a file, not an entire image node. That means thumbnail does not get the page value, and is stuck trying to process/convert the full PDF file. There are certainly ways around this where we could get these thumbnails working... but I also think we might not need to worry about this edge case. An "explicit" thumbnail can always be set if there are errors with the "implicit" thumbnail.

My inclination is to keep this as-is. It works nicely and it's relatively simple. The only downside is an extra page field on image nodes that will usually be undefined.
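
As a rough illustration of part (2) of the summary above: ImageMagick selects a single page of a multi-page input when a zero-based index is appended in square brackets to the filename (for example `my_pic.pdf[2]` for the third page). The sketch below only builds that input argument. `imageMagickInput` is a hypothetical name, and how myst-cli actually assembles its command is not shown in this thread.

```typescript
// Hypothetical helper: build the ImageMagick input argument for an image.
// Assumes `zeroBasedPage` has already been validated and converted to 0-based.
function imageMagickInput(file: string, zeroBasedPage?: number): string {
  // Without a page, pass the file through unchanged (all pages are converted).
  if (zeroBasedPage === undefined) return file;
  // `file[N]` asks ImageMagick to read only frame/page N of the input.
  return `${file}[${zeroBasedPage}]`;
}

// Example: the third page of the PDF from the PR description.
const input = imageMagickInput('figures/my_pic.pdf', 2);
```

Keeping `page` on the image node means this selection can happen once, at conversion time, without writing an intermediate single-page file.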

);
}

export function extractParams(args: { content: string }[]): Record<string, string | number> {
Member

This seems like it has potential for wider use across other macros/parameter types. 👍
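
To show the kind of reuse meant here, below is a minimal sketch of key/value option parsing for a string such as `width=1.0\textwidth,page=3`. It is not the PR's `extractParams`, which takes parsed argument nodes (`{ content: string }[]`); `parseKeyValueOptions` and its raw-string input are assumptions made to keep the example self-contained.

```typescript
// Hypothetical sketch, not the PR's extractParams: parse a raw LaTeX
// optional-argument string into a key/value record.
function parseKeyValueOptions(raw: string): Record<string, string | number> {
  const params: Record<string, string | number> = {};
  for (const part of raw.split(',')) {
    const eq = part.indexOf('=');
    if (eq === -1) continue; // skip bare flags such as `keepaspectratio`
    const key = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim();
    if (!key) continue;
    // Convert only fully numeric values; `1.0\textwidth` stays a string.
    params[key] = /^-?\d+(\.\d+)?$/.test(value) ? Number(value) : value;
  }
  return params;
}

// Options from the \includegraphics example in the PR description.
const options = parseKeyValueOptions('width=1.0\\textwidth,page=3');
```

Splitting on commas is a simplification: values wrapped in braces that contain commas would need a real tokenizer.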

4 participants