Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Proper WMF/EMF image handling in PPTXReader #17819

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Leon-Sander
Copy link

@Leon-Sander Leon-Sander commented Feb 14, 2025

Description

This PR fixes a bug in PPTXReader where .wmf and .emf images cause crashes due to PIL.Image.open() not supporting these formats.

Key Changes:

  • If LibreOffice is installed, .wmf and .emf images are automatically converted to PNG before processing.
  • If LibreOffice is not available, an error message string is returned instead of crashing.
  • Cross-platform support: Works on Windows, Linux, and macOS.
  • Exception handling added for PIL.Image.open() to prevent crashes from unreadable or corrupt images.
    • If exception handling is not desired, I can remove it, and the code will behave as before by raising an exception.

Fixes: #17820


New Package?

  • Yes
  • No

Version Bump?

  • Yes
  • No

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • I manually tested this with PowerPoint files containing .wmf and .emf images.
  • I added new unit tests to cover this change.
  • I believe this change is already covered by existing unit tests.

Suggested Checklist

  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • My changes generate no new warnings.
  • New and existing unit tests pass locally with my changes.
  • I ran make format; make lint to ensure proper formatting.

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Feb 14, 2025
output_path = file_path.with_suffix(".png")

if sys.platform == "win32":
libreoffice_path = r"C:\Program Files\LibreOffice\program\soffice.exe"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uhhh this is a pretty strong assumption on the location of this executable

Copy link
Author

@Leon-Sander Leon-Sander Feb 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, how about just searching for the path first, and if nothing was found checking for default installation paths as fallback. I also created a new function with the single purpose of finding the libreoffice path, and put the basic imports into the import section.

    def find_libreoffice(self) -> str:
        """Finds the LibreOffice executable path."""
        libreoffice_path = shutil.which("soffice")

        if not libreoffice_path and sys.platform == "win32":
            # Check common installation paths on Windows
            possible_paths = [
                r"C:\Program Files\LibreOffice\program\soffice.exe",
                r"C:\Program Files (x86)\LibreOffice\program\soffice.exe",
            ]
            libreoffice_path = next(
                (path for path in possible_paths if os.path.exists(path)), None
            )

        if not libreoffice_path:
            raise OSError(
                "LibreOffice (soffice) not found. Please install LibreOffice or add it to your system PATH."
            )

        return libreoffice_path

    def convert_wmf_to_png(self, input_path: str) -> str:
        """Convert WMF/EMF to PNG using LibreOffice."""
        file_path = Path(input_path)
        output_path = file_path.with_suffix(".png")

        libreoffice_path = self.find_libreoffice()

        subprocess.run(
            [
                libreoffice_path,
                "--headless",
                "--convert-to",
                "png",
                "--outdir",
                str(file_path.parent),
                str(file_path),
            ],
            check=True,
        )

        return str(output_path)

@Leon-Sander Leon-Sander force-pushed the fix-pptx-reader-wmf-emf-handling branch from 749075e to 3a53caf Compare February 16, 2025 23:59
@Leon-Sander Leon-Sander force-pushed the fix-pptx-reader-wmf-emf-handling branch from 3a53caf to e316a1c Compare February 17, 2025 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:M This PR changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: PPTXReader crashes when handling WMF/EMF images
2 participants