Use thumbnail_url for thumbnail generation when present #675
Comments
Just a heads-up, the field name in the API right now is just |
Hey @krysal @sarayourfriend @stacimc, can I work on this? I think I can close it. |
Sure @sahil-R, please feel free to do so! I'll assign it to you. |
Thank you @dhruvkb |
Following @obulat's request to evaluate the stored thumbnails, I ran a couple of queries on the API database to investigate.
See the providers with thumbnails
SELECT DISTINCT(provider) FROM image WHERE thumbnail IS NOT NULL ORDER BY provider ASC;
+--------------------+
| provider           |
|--------------------|
| 500px              |
| eol                |
| flickr             |
| iha                |
| met                |
| nappy              |
| phylopic           |
| rawpixel           |
| rijksmuseum        |
| sciencemuseum      |
| sketchfab          |
| smk                |
| stocksnap          |
| thingiverse        |
| thorvaldsensmuseum |
| waltersartmuseum   |
| wordpress          |
+--------------------+
SELECT 17
Time: 1950.363s (32 minutes 30 seconds), executed in: 1950.335s (32 minutes 30 seconds)
Following this result, I was curious to know how many thumbnails we have for each provider, omitting the '500px', 'eol', 'iha', 'thorvaldsensmuseum' and 'waltersartmuseum' providers because those are disabled at the moment.
Thumbnails availability by (active) provider
SELECT provider,
COALESCE(SUM(CASE WHEN thumbnail IS NOT NULL THEN 1 ELSE 0 END), 0) AS thumbs_available,
COALESCE(SUM(CASE WHEN thumbnail IS NULL THEN 1 ELSE 0 END), 0) AS thumbs_not_available
FROM image
WHERE provider IN (
'flickr', 'met', 'nappy', 'phylopic', 'rawpixel', 'rijksmuseum', 'sciencemuseum', 'sketchfab', 'smk', 'stocksnap', 'thingiverse', 'wordpress'
)
GROUP BY provider
ORDER BY provider ASC;
+---------------+------------------+----------------------+
| provider      | thumbs_available | thumbs_not_available |
|---------------+------------------+----------------------|
| flickr        | 497009314        | 911578               |
| met           | 128707           | 234675               |
| nappy         | 2059             | 0                    |
| phylopic      | 3145             | 1011                 |
| rawpixel      | 26239            | 136508               |
| rijksmuseum   | 30000            | 0                    |
| sciencemuseum | 82558            | 12143                |
| sketchfab     | 37872            | 0                    |
| smk           | 106651           | 1416                 |
| stocksnap     | 35436            | 1625                 |
| thingiverse   | 32659            | 0                    |
| wordpress     | 2518             | 3202                 |
+---------------+------------------+----------------------+
SELECT 12
Time: 2324.308s (38 minutes 44 seconds), executed in: 2324.287s (38 minutes 44 seconds)
The 'nappy', 'rawpixel', 'stocksnap' and 'wordpress' providers were recently added, so I assume their thumbnails are acceptable. For the rest, I queried some samples and gathered some notes.
Old providers with thumbnails stored
The size shown is only for the sample at the right; it does not apply to all of the provider's thumbnails.
The only thumbnails not working are from 'phylopic', so I'll omit them. For the rest, some are small but since we don't have specific quality requirements at the moment I'll assume they are fine as long as they are online. |
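To make the per-provider numbers above easier to compare, here is a minimal sketch that turns the two counts into a coverage ratio. The counts are copied from the query output in this comment; the `coverage` helper and the `counts` dictionary are hypothetical names used only for illustration, and only a few providers are listed.

```python
# Counts copied from the query output above:
# provider -> (thumbs_available, thumbs_not_available).
counts = {
    "flickr": (497009314, 911578),
    "met": (128707, 234675),
    "nappy": (2059, 0),
    "rawpixel": (26239, 136508),
    "wordpress": (2518, 3202),
}


def coverage(available: int, missing: int) -> float:
    """Return the share of rows that have a stored thumbnail (0.0 to 1.0)."""
    total = available + missing
    # Guard against a provider with no rows at all.
    return available / total if total else 0.0


if __name__ == "__main__":
    for provider, (available, missing) in sorted(counts.items()):
        print(f"{provider:<12} {coverage(available, missing):7.2%}")
```

A ratio like this only says whether a thumbnail is stored, not whether it is large enough or still online, so it complements the manual sampling rather than replacing it.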
Thank you for this amazing investigation, @krysal!
I would prefer not to use the small thumbnails (flickr, rijksmuseum, sciencemuseum in the table) because when we discussed this issue (https://wordpress.slack.com/archives/C02012JB00N/p1678203055330969), the lowest width mentioned seems to have been 400px. Having said that, Flickr might be OK to use because I suspect that the smaller thumbnails are from older images, and newer ones will have larger thumbnails available soon. |
@obulat how would you suggest proceeding in that case? Set the thumbnail to null for all providers other than Flickr that don't have at least 400px on a side? |
I'd like to leave the code for using thumbnails without exception and address the small thumbnails in subsequent issues. As I said, since they're working, it should be okay, and if it's important to have better thumbnail quality we can raise the priority of WordPress/openverse-catalog#905 and the sub-issues that may come from there. |
From a frontend perspective I really wouldn't want to go live with thumbnails below the 400px size we identified. The thumbnails are a user's first glimpse at potential results and if they're pixelated and blurry it's going to discourage them from ever using Openverse again. Using thumbnails which are smaller than the ones we currently use feels like a big usability reduction that I don't think we should ship. |
On the catalog side, I can think of two approaches for providers with no acceptable thumbnail available: either use the same URL as the main image or set the thumbnail to None. The first approach would mean that the code in the API is the same for all providers: take the
Based on what we do in the catalog, the API would either always use the
I have one more question about this issue. Do we want to thumbnail a thumbnail? If we already have a thumbnail URL in the database, do we want to generate our own thumbnail out of it? Would this adversely affect the quality of the resulting image? What are the reasons for not using the provider's thumbnail link directly? To provide the Openverse cache layer for it and prevent dead images for the cached items? |
Considering the implications of bad thumbnail quality on users' perceptions, I would prefer for us to go in another direction: start with only the providers that have good thumbnails (SMK and Met), and add other thumbnails when the other providers' thumbnail quality improves. |
@obulat @zackkrida So are you suggesting we should delete existing thumbnails for rijksmuseum and sketchfab? Flickr is the main source from where the
That number was also quite arbitrary. Does it mean we should lower the default thumbnail size to 400px as well? We currently use 600px, which I thought was a high number too. (openverse/api/catalog/settings.py, line 220 in 8e9d23a)
I think the last point is a bit of an overreaction as well. They might seem a little blurry on high-spec equipment and to an eye keen on the details, but most internet users nowadays navigate using a mobile phone, where these sizes should be acceptable and, in fact, small images are preferred for data savings. These thumbnails are coming from our providers, and we usually go with the idea that if they do that then it's fine for us until we can come up with a better solution (in this case, review each provider and get better thumbnails).
The thumbnail column should be empty if there is no thumbnail. Using the same link as the image will make the column lose its semantic meaning, and it's an unnecessary redundancy.
Avoiding this complexity is precisely what I'm trying to do here; it's just delaying an issue. Let's work on the source of the problem instead. Providers should have acceptable thumbnails in the thumbnail column; if not, the direct image URL is used.
As far as I know, these questions are solved on the side of Photon. I don't appreciate a high reduction in the quality (depending on the size). |
Would it be possible for us to test this out on staging? It's hard to assess how this might impact the search view since we're presently generating our own thumbnails for all of these providers. I think given whatever we discuss here, we'll also need to update our ingesters (or make a note for future ingesters) that we should only record a thumbnail if it meets a minimum size criterion. |
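As a rough illustration of the "only record a thumbnail if it meets a minimum size" idea, here is a minimal sketch. The function name, its parameters, and the assumption that an ingester knows the thumbnail width are all hypothetical; the 400px default is the lowest width mentioned earlier in this thread, not a setting that exists in the catalog.

```python
from typing import Optional

# Assumption: 400px is the lowest acceptable width discussed in this thread.
MIN_THUMBNAIL_WIDTH = 400


def acceptable_thumbnail(
    thumbnail_url: Optional[str],
    width: Optional[int],
    min_width: int = MIN_THUMBNAIL_WIDTH,
) -> Optional[str]:
    """Return the thumbnail URL only if its known width meets the minimum.

    Returns None when there is no URL, the width is unknown, or the
    thumbnail is too small, so the thumbnail column stays empty.
    """
    if not thumbnail_url or width is None:
        return None
    return thumbnail_url if width >= min_width else None
```

Treating an unknown width as "not acceptable" is a conservative choice made for this sketch; a provider script could equally decide to trust thumbnails whose size it cannot determine.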
My only goal is to prevent us from shipping a noticeable visual regression to users. In the case of Openverse, 90% of our users are on desktop. On a typical laptop screen the dimensions of each image thumbnail are around 200-300px. When we factor in contemporary high DPI displays, that means 400-600px is an ideal size. Anything smaller is going to stretch. Here's an example with a 200x200 image of a cat which at its default size is sharp and high quality: (example images not preserved)
I don't feel qualified to manage the technical approach to how to fix the PR. Perhaps deleting all non-viable thumbnail URLs will be the best. |
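The arithmetic behind the 400-600px figure can be sketched as follows. This assumes a device pixel ratio of 2 for "high DPI" displays, which is what the doubling from 200-300px to 400-600px implies; both function names are hypothetical.

```python
def required_source_width(css_width_px: int, device_pixel_ratio: float = 2.0) -> int:
    """Width in image pixels needed to fill a slot without upscaling."""
    return round(css_width_px * device_pixel_ratio)


def upscale_factor(
    source_width_px: int, css_width_px: int, device_pixel_ratio: float = 2.0
) -> float:
    """How much the image is stretched; values above 1.0 mean visible upscaling."""
    return (css_width_px * device_pixel_ratio) / source_width_px


if __name__ == "__main__":
    for slot in (200, 300):
        print(slot, "->", required_source_width(slot))
    # A 200px-wide thumbnail shown in a 300px slot on a 2x display:
    print(upscale_factor(200, 300))
```

Under these assumptions a 200px thumbnail in a 300px slot is stretched to three times its native width, which is the kind of blur described above.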
Good idea! As @sarayourfriend will perform tests in staging with near-production data size, the timing couldn't be more perfect. We just need to merge #1331 before.
Agreed, I can do that after @zackkrida or @obulat (or anyone else willing to) evaluates the use of the stored thumbnails in staging, and delete the bad ones by provider if needed. Hope this gives us more confidence in this solution. |
Problem
Some providers (like SMK) may link to quite large images in their image_url. This can cause unacceptably slow response times or even timeouts when generating thumbnails via our thumbnail service.
Description
We already have a thumbnail_url available on the Image model which we can use.
When the thumbnail_url is available, we should send this to the thumbnail service instead of the image_url. This should be a small change here: (using the serialized field names)
Additional context
#1450 tracks updating the SMK provider script in the Catalog to populate thumbnail_url with a link to a smaller image size.
Implementation
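A minimal sketch of the selection logic described above, not the actual API code: prefer the stored thumbnail URL and fall back to the full image URL. The function name and the dictionary shape are hypothetical, and the keys use the serialized field names from this issue, which may differ from the current API field names.

```python
from typing import Mapping, Optional


def thumbnail_source_url(image: Mapping[str, Optional[str]]) -> str:
    """Pick the URL to send to the thumbnail service.

    Uses `thumbnail_url` when it is present and non-empty; otherwise
    falls back to `image_url`, which is assumed to always be set.
    """
    thumbnail_url = image.get("thumbnail_url")
    if thumbnail_url:
        return thumbnail_url
    image_url = image.get("image_url")
    if not image_url:
        raise ValueError("image has neither thumbnail_url nor image_url")
    return image_url
```

For example, `thumbnail_source_url({"image_url": "https://example.com/full.jpg", "thumbnail_url": None})` falls back to the full image URL, which matches the current behaviour for records without a stored thumbnail.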