Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better scene detection #76

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

Better scene detection #76

wants to merge 11 commits into from

Conversation

plotski
Copy link
Collaborator

@plotski plotski commented Jun 9, 2020

See pm on bb.

pythonbits/bb.py Outdated
Comment on lines 46 to 55
def prompt_yesno(question, default):
while True:
choices = '[Y/n]' if default else '[y/N]'
choice = input('%s %s ' % (question, choices))
if not choice:
return default
elif choice.casefold() == 'y':
return True
elif choice.casefold() == 'n':
return False
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go into a separate module

log.error(str(error))

path = self['path']
modified = scene.check_integrity(path, on_error=handle_error)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so if it thinks it's not a scene release, it queries the user every time?

The CRC check should stay as an option IMO, since it's the "safest". And it had the advantage of not querying the user if it wasn't found. My usecase for it is headless submissions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so if it thinks it's not a scene release, it queries the user every time?

No. If it finds nothing, it's not a scene release. This should work all the time unless the release group or title were changed. This is an improvement over the old algorithm that always asked the user unless it found the exact title.

The CRC check should stay as an option IMO

I don't object as long as I can disable it.


import json
import os
import pickle
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for in fact writing tests, but I cannot accept pickle objects into the repository. This is inherently unsafe.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's wrong with pickle? How do you propose we store API responses for testing?

Copy link
Owner

@mueslo mueslo Jun 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A pickled object is just that, an object, which can contain almost arbitrary code, i.e. it is a (remote/hidden) code execution vulnerability as you can't even inspect it properly. The API responses could just be stored in plaintext HTML or JSON. If there is HTML then the common structure should of course be factored out.

@plotski
Copy link
Collaborator Author

plotski commented Jun 13, 2020 via email

@mueslo
Copy link
Owner

mueslo commented Jun 13, 2020

@plotski you as the committer can still hide almost arbitrary code in there, which will get executed if you run tests. It's like a proprietary software blob. This is what I disagree with.

@plotski
Copy link
Collaborator Author

plotski commented Jun 13, 2020 via email

@plotski
Copy link
Collaborator Author

plotski commented Jun 14, 2020

API responses are now stored as plain text files. Are we good on the other points you raised?

Comment on lines +368 to +371
def _get_basename_noext(path):
# os.path.splitext() removes anything after the rightmost "." which is no good for
# scene release names.
return re.sub(r'\.[a-z0-9]{1,3}$$', '', os.path.basename(path))
Copy link
Owner

@mueslo mueslo Jun 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this instead of something like os.path.splitext(os.path.basename(path))[0]? I guess because you want to use it for directories too?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>>> os.path.splitext(os.path.basename("The.Title.2000.x265-GRP"))
('The.Title.2000', '.x265-GRP')

@plotski
Copy link
Collaborator Author

plotski commented Jun 21, 2020 via email

Comment on lines +46 to +48
if 'release_group' not in guess:
log.debug('No release group found - cannot find scene release: {}', guess)
return None
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now the scene check completely fails when you have a renamed file with the group removed. This is a regression. It's also meant to help people discover scene files that they in fact have but that may have been removed due to some renaming script.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you suggest? You can't really search for a release without the group and always asking the user is annoying.

To be honest, I've come to the conclusion that the scene rule is annoying. I've seen so many existing renamed and/or modified scene releases. I'm not sure if I really care about this pull request anymore. Nobody else seems to care about this tag.

Comment on lines 118 to 128
def _check_file(relpath, fspath, info):
"""
Return True if `path` is unmodified or None if not sure

relpath: Relative path that starts with the release name
fspath: Path that exists in the file system

It is important that `path` is relative and starts with the release/torrent name.

Raise SceneError if `path` does not match `info`
"""
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call signature does not match documentation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@mueslo
Copy link
Owner

mueslo commented Jun 21, 2020

A scene file in a directory with the wrong folder name yields "scene: False". This is another regression, while it would be possible to detect that this is basically equivalent to a renamed scene release, previously it at least queried the user.

@plotski
Copy link
Collaborator Author

plotski commented Jun 21, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants