Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new thumbnail matching #16071

Open
i30817 opened this issue Jan 2, 2024 · 6 comments
Open

new thumbnail matching #16071

i30817 opened this issue Jan 2, 2024 · 6 comments

Comments

@i30817
Copy link
Contributor

i30817 commented Jan 2, 2024

I'd like to understand how the new thumbnail matching works, in the context of fetching the thumbnail to see how contributors should go about it @zoltanvb

I understand that previously RA got its thumbnails from db_name of each entry to get the system and the label to get the name which then applies a denylist of illegal characters replacing them by underscore.

If the user had used the manual scanner without a .dat file, the label would be the same as the filename in all cases (instead of just a vast majority due to no-intro \ redump naming schemes sometimes replacing illegal characters by legible alternatives instead of underscore). In normal labels there are no underscore replacements, but that happens when the download happens, so it's only cases like Title: subtitle being turned into Title - subtitle in redump filenames where the label thumbnail variant should exist.

Therefore, thumbnail makers tended to use the filename and in the few cases where it didn't match the .dat game name (label) after illegal characters replacement by _, added that variant too. This allowed just one server call, which is good for limited platforms.

Now, there was a commit a while ago to make the playlist present the filename, even in non manually scanned playlists. I thought that was simply cosmetic and it still used the underlying playlist label to fetch thumbnails, and even if it wasn't for well made thumbnails set it wouldn't matter, so I did not care.

Then the new thumbnail download retry pr was committed, and I don't get it.

It apparently retries, first the filename thumbnail (why??? The previous label check worked for both because the manual scanner places the filename on the label without a dat), then label (duplicating work in the majority of the cases of nonmatches) then the 'short' (cutting off the name in the first parenthesis, not sure if all variants of parenthesis or just the round ones - I of course, agree with this one).

In short, I think there might have been a misunderstanding of the scheme the thumbnails and the manual\automatic scanner follow and there is a extra useless server call? Label and short should be enough?

@i30817
Copy link
Contributor Author

i30817 commented Jan 2, 2024

I also hope the recent minor changes commit uses 'the first db_name' just for databases and not thumbnails, because playlists actually save that by entry (for instance, for history playlist I think, or another hypothetical multiplatform playlist). If that is already so, ignore this post.

But from the commit message it really seems not...

Ie: the message indicates it's getting that database names from info files instead of entries in the playlist, which I suppose can be got from the core name. But this is still not a guarantee of the right system for the thumbnails like db_name is, afaict, for instance, because default_core_name, default_core_path can be empty (history), and core_name, core_path can be 'DETECT'.

If this is a attempt to only read the system name once, I suspect it's a bad optimization, especially if later people want to do smart playlists features (like 'all two player games and not fighting-games') or just do something Iike that manually.

@zoltanvb
Copy link
Contributor

zoltanvb commented Jan 3, 2024

Hi, thank you for caring about this topic. Not sure if I can put a response to all questions now, in this case I will return later.

About character replacements, No-Intro naming convention says:
the following Low ASCII characters are NOT allowed:
\ / : * ? " < > | `

RetroArch code has one difference to this - it was like that and I did not make any change in this. Ampersand ("&") is also replaced to underscore. I checked the thumbnail repositories, and uploaded files do not have any of the other characters, but for the ampersand, there are sometimes double versions, as an example:

Itchy & Scratchy Game, The (USA, Europe).png
Itchy _ Scratchy Game, The (USA, Europe).png

There is no intended change for this part.

@i30817
Copy link
Contributor Author

i30817 commented Jan 3, 2024

Yes I know about that replacement algorithm. Some dump sets replace & by ' and ' or vice versa, so I had to take care of that specifically in my fuzzy thumbnail downloader.
The whdload set does that.

I also ended up replacing all of the illegal windows characters (there are more, in the range 0-30-something, I want to say 31, but could be more), just in case Linux users had some in their local filenames.

When I mentioned ': ' vs ' - ' I was talking about some of the other transformation of game titles (labels in the RetroArch playlist context) redump and tosec do for filenames before the denylist. It's not unusual, but it's not in the 'official rules', but can be seen from context when looking at a .dat.

For instance (not a redump example but similar) this is one the scummvm discworld 2 with scummvm names with the label (as it appears in the playlist, the png asked for, and the actual filename, and the png sent for in that case):

Discworld II: Missing presumed …!? (CD/DOS/English(US))
Discworld II_ Missing presumed …!_ (CD_DOS_English(US)).png

Discworld II_ Missing presumed …!_ (CD_DOS_English(US)).scummvm
Discworld II_ Missing presumed …!_ (CD_DOS_English(US)).png

As you can see, the result for the png is the same in this case, so you only need one thumbnail in the server, and only one call. If for some reason the scummvm file had replaced the actual title ': ' for ' - ' like redump sometimes does, it would need 2 (3 with the new metadata chopped off fallback).

Therefore, thumbnail creators that care enough put both versions in the server, the label and the filename because of the way RetroArch can present both and both can be used as argument to get the thumbnail.

It's that subtilty that I think the PR missed, there is no actual need to query both since the scanners are purposefully made to place both in the labels, with the automatic always going for the game name\label and the manual doing both filename and name\label depending on setting. The only fallback the PR needed for the intent was the 'chop off metadata' fallback.

At least in most cases where the thumbnail server is well done.

Look, I just think you're trading latency and performance for a perceived convinience of thumbnail creators that is worse than what's there now because now the rules are unclear. Do you get the thumbnails named after the .dat name, filename or both? Previously it was 'only add the label\name if it would be different from the filename', now its 'add both, or one, plus the metadata-less fallback'.

More flexible rules at the cost of performance is not a good deal. For the metadata null versions thats ok, since lots of people don't use redump names but would get the (american\main country) thumbnails if that was done because people often already put in those names\thumbs as a source of linked symlinks so that's a win for people with non default sets, if a partial one with possible false positives.

A extra useless server call? No, I don't think it's good. If fact, I think you can have the best of all worlds if you check that the png queried is not the same on all 3 calls if you insist in 3 calls (id probably only do 2 anyway because playlist responsiveness is nice, but it's probably not that bad to compare the strings after replacing the illegal characters in the label and cutting off the metadata in the 'short' whichever source that uses).

The vast majority of label\filename derived thumbs will be the same. And some short ones too if the original source didn't have metadata.

@zoltanvb
Copy link
Contributor

zoltanvb commented Jan 3, 2024

the label and the filename because of the way RetroArch can present both and both can be used as argument to get the thumbnail.
While this is technically true, it needed a specific setting from the user, as opposed to just working automatically.

if you check that the png queried is not the same on all 3 calls
This is definitely reasonable, and I had that in mind but the code structure turned out to be a bit more complex than what I hoped for. Anyways, now I have an idea on how to handle that, so it will come eventually.

The vast majority of label\filename derived thumbs will be the same.
If it was just those databases where redump/no-intro sets are well formulated and maintained (where this is probably true), I would not have bothered with this. The main motivation was to handle those typical home computer systems better, where there is a huge e.g. TOSEC list with several different cracks, etc. of the same game.

The minor change commit enhances the case when there is no database associated, such as an entry in History playlist which was opened with Load Content directly. It should not have any effect in normal playlists, or when history entry is coming from a playlist.

@i30817
Copy link
Contributor Author

i30817 commented Jan 4, 2024

the label and the filename because of the way RetroArch can present both and both can be used as argument to get the thumbnail.

While this is technically true, it needed a specific setting from the user, as opposed to just working automatically.

Using the manual scanner is very common especially with sets that RetroArch won't scan. That by default always uses the filename (if not providing a dat file). The automatic scanner always uses the label, since if it finds the game, it has a label in the database.

I still believe that not following the standard the old code used of only trying the label is very much wasted effort and might make the thumbnail makers more lackadaisical about what thumbnail names they put in, but if you managed the 'check if the names are the same' thing at least it won't remove too much performance in the actual playlists. Note that there were people complaining about the performance of changing games image display quickly when the image was in actual local cache (hopefully you save and read the thumbnail as the label in the local filesystem so the code when reading the local doesn't have to care to check two possible names right?)

I suspect you're mentioning the new setting to display playlist names as the filename, which I actually very much hoped didn't actually affect thumbnails download.

The minor change commit enhances the case when there is no database associated, such as an entry in History playlist which was opened with Load Content directly.

Good to know.

@i30817
Copy link
Contributor Author

i30817 commented Jan 4, 2024

Particularly I also think thumbnail makers are not very enthusiastic about adding TOSEC names, and I can't blame them myself. They're a mess. The C64 tosec playlist is 64 mb! And it's just text. Now imagine that as thumbnails, and often duplicated thumbnails because the label and the filename are so different often in that godforsaken dump group.

Even with my fuzzy thumbnail downloader while stripping metadata, the nearest thing I have to a tosec set, the whdload set, which is a set without spaces between words, capitalized words, and that replaces and by &, I can only get a 75-80% coverage. Most of that is just missing thumbs in the server but not all I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants