Request: Duplicate images #6

Open
jorgb90 opened this issue Apr 17, 2024 · 3 comments

Comments

@jorgb90

jorgb90 commented Apr 17, 2024

A great addition to this already great plugin would be an option to remove duplicates. Duplicate images occur, for example, when importing products in bulk.

@hostep
Member

hostep commented Apr 17, 2024

Hi @jorgb90: can you give me some more info about this particular situation?

The tool already removes any file that isn't referenced in the database, so even if duplicated images exist on the filesystem, they should get removed as long as they aren't referenced in the database.

So I'm not quite sure what you mean exactly. A concrete example would help here.

Thanks!

@jorgb90
Author

jorgb90 commented Apr 17, 2024

@hostep I have a Magento setup which has all images per product three times. I first thought this was happening because of updating the products through imports, but upon further investigation it's happening because it's uploading them for global and 2 storeviews. :/ I guess we just need to delete the duplicates in the storeviews.

My initial thought was that they are uploaded and present in both the database and the filesystem, so they currently won't be detected by this extension. But the hash of those images should be the same, since it's the same image. That way the duplicates could be detected by hash and cleaned up as well.

@hostep
Member

hostep commented Apr 17, 2024

Aha okay, it makes sense now.

Do the database entries use the exact same filename for global and storeview values? Or are the files also duplicated on disk with a different filename?

If they are the same filename, I think the EAV Cleaner module, with its eav:attributes:restore-use-default-value command, might be able to get rid of those unneeded values in the database.

If they aren't the same filename, then I guess we could add a check for this in this module by:

  • looping over all products that have more than a single image (global + storeview values combined)
  • calculating the hash of each image file
  • trying to delete the duplicated value from the database whenever two hashes match
  • running our catalog:images:remove-unused-files command afterwards, which should then delete the duplicates from disk as they are no longer in the database

This is probably a very expensive operation and might take many minutes, if not hours, to run if you have tons of products. So I'm not sure yet if we will have time to implement this, or if it's worth it at the moment.
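
To make the hash-comparison idea above more concrete, here is a minimal Python sketch of the detection step only. It is not code from this module: the `product_images` mapping, the function names, and the choice of SHA-256 are assumptions made purely for illustration, and the sketch only reports candidate duplicates instead of touching the database.

```python
import hashlib
from pathlib import Path


def file_hash(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_values(product_images, media_root):
    """Report image values whose file content repeats within one product.

    product_images is a hypothetical mapping of
    product id -> list of (store_id, relative_path) tuples,
    standing in for whatever would be read from the database.
    Returns a list of (product_id, store_id, relative_path) tuples.
    """
    duplicates = []
    for product_id, values in product_images.items():
        if len(values) < 2:
            continue  # a single image cannot be a duplicate
        seen = {}  # digest -> first relative path seen with that content
        # Sorting visits store 0 (global) first, so the global value is kept.
        for store_id, rel_path in sorted(values):
            # Strip the leading slash so the path stays below media_root.
            path = Path(media_root) / rel_path.lstrip("/")
            if not path.is_file():
                continue
            digest = file_hash(path)
            if digest in seen and seen[digest] != rel_path:
                duplicates.append((product_id, store_id, rel_path))
            else:
                seen.setdefault(digest, rel_path)
    return duplicates
```

Values that share the exact same filename are deliberately skipped here, since that case is the one the EAV Cleaner suggestion covers. Reading every file is what makes this expensive; comparing file sizes first and only hashing files of equal size would be one way to reduce the work.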
