Remove scipy dependency #253

Open
r-bit-rry opened this issue Mar 18, 2025 · 4 comments

Comments

@r-bit-rry

It's a heavy dependency that requires external libraries and tooling to be installed.
We only use it for the scipy.ndimage.zoom method, and equivalent resizing is available in both Pillow and OpenCV, which are already dependencies of this project.

Off the top of my head, these two implementations could replace the zoom method. Note that the OpenCV one should perform best:

from PIL import Image
import numpy as np

def pillow_zoom(input_array, zoom_factor):
    """Resize using Pillow with similar functionality to scipy.ndimage.zoom"""
    if input_array.ndim == 2:
        # Handle 2D arrays (single-channel images)
        img = Image.fromarray(input_array)
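        # Round each dimension instead of truncating, to match how scipy.ndimage.zoom sizes its output
        new_size = tuple(int(round(s * zoom_factor)) for s in input_array.shape)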
        # Swap dimensions for PIL (width, height)
        resized = img.resize((new_size[1], new_size[0]), Image.Resampling.BICUBIC)
        return np.array(resized)
    elif input_array.ndim == 3:
        # Handle a channel-first stack: resize each 2D slice along axis 0 (axis 0 itself is not zoomed)
        results = []
        for i in range(input_array.shape[0]):
            results.append(pillow_zoom(input_array[i], zoom_factor))
        return np.stack(results)
    else:
        raise ValueError("Unsupported array dimension")

or

import cv2
import numpy as np

def opencv_zoom(input_array, zoom_factor):
    """Resize using OpenCV with similar functionality to scipy.ndimage.zoom"""
    if input_array.ndim == 2:
        # Single channel 2D array
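        # Round each dimension instead of truncating, to match how scipy.ndimage.zoom sizes its output
        new_shape = tuple(int(round(s * zoom_factor)) for s in input_array.shape)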
        # Note: cv2.resize takes (width, height) which is opposite of numpy's (height, width)
        return cv2.resize(input_array, (new_shape[1], new_shape[0]), interpolation=cv2.INTER_CUBIC)
    elif input_array.ndim == 3:
        # Channel-first stack: resize each 2D slice along axis 0 (axis 0 itself is not zoomed)
        results = []
        for i in range(input_array.shape[0]):
            results.append(opencv_zoom(input_array[i], zoom_factor))
        return np.stack(results)
    else:
        raise ValueError("Unsupported array dimension")
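
One detail worth checking when swapping implementations is the output size. As far as I recall, scipy.ndimage.zoom rounds each scaled dimension, while a plain int() truncates, so the two can disagree by one pixel. The sketch below only illustrates that arithmetic and the (width, height) ordering that Pillow and OpenCV expect; `zoomed_shape` and `resize_dsize` are hypothetical helper names, not part of any library.

```python
def zoomed_shape(shape, zoom_factor, use_round=True):
    """Hypothetical helper: scale every axis of `shape` by `zoom_factor`.

    use_round=True rounds each dimension; use_round=False truncates with int().
    """
    if use_round:
        return tuple(int(round(s * zoom_factor)) for s in shape)
    return tuple(int(s * zoom_factor) for s in shape)


def resize_dsize(shape_2d, zoom_factor):
    """Hypothetical helper: target size in (width, height) order.

    NumPy shapes are (height, width), so the two values are swapped.
    """
    height, width = zoomed_shape(shape_2d, zoom_factor)
    return (width, height)


if __name__ == "__main__":
    # A 5x7 array scaled by 0.75 gives 3.75 and 5.25 before conversion to int.
    print(zoomed_shape((5, 7), 0.75, use_round=True))
    print(zoomed_shape((5, 7), 0.75, use_round=False))
    print(resize_dsize((5, 7), 0.75))
```

If the replacement has to be a drop-in for existing callers, comparing output shapes on a few non-integer zoom factors is a cheap first check before looking at pixel values.
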
@Blaizzy
Owner

Blaizzy commented Mar 20, 2025

I agree with you!

Could you send a PR and include some metrics (accuracy and performance)?
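
A rough sketch of how such metrics could be collected, assuming "accuracy" means element-wise error against the current scipy output and "performance" means wall-clock time. The names `compare_outputs`, `best_time`, and `nearest_zoom` are hypothetical; `nearest_zoom` is only a stand-in so the snippet runs without scipy, Pillow, or OpenCV, and would be swapped for the real zoom functions.

```python
import time

import numpy as np


def compare_outputs(reference, candidate):
    """Return (max_abs_err, mean_abs_err) between two equally shaped arrays."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    diff = np.abs(reference - candidate)
    return float(diff.max()), float(diff.mean())


def best_time(fn, *args, repeats=5):
    """Best wall-clock time in seconds over `repeats` calls of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def nearest_zoom(arr, factor):
    """Stand-in zoom for demonstration only: repeat pixels by an integer factor."""
    return np.repeat(np.repeat(arr, factor, axis=0), factor, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64)).astype(np.float32)
    reference = nearest_zoom(image, 2)   # would be the scipy result
    candidate = nearest_zoom(image, 2)   # would be the Pillow or OpenCV result
    print("max/mean abs error:", compare_outputs(reference, candidate))
    print("best time (s):", best_time(nearest_zoom, image, 2))
```

Taking the best of several runs reduces noise from warm-up and scheduling. The shape check is deliberate, since a size mismatch between implementations would otherwise surface as a broadcasting error.
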

@asmeurer
Contributor

Also, scipy is pinned in the requirements, which makes it harder to install mlx-vlm in an environment with other packages.

@r-bit-rry
Author

on it

@r-bit-rry
Author

#268
