Add support for finding PVR Channels and PVR Broadcasts #7
kodi_voice/kodi.py
Outdated
except:
    pass
wordified = wordified + word + " "
return wordified[:-1]
We have a function for this already -- words2digits(). It also supports German.
That was added after I forked. Should have checked, sorry. I've rebased onto the latest version and switched to the proper function.
name = re.sub(r"\sch\s", " channel ", name)
name = re.sub("^channel", "", name)
name = re.sub(r"(?<=\D)(?=\d)|(?<=\d)(?=\D)", " ", name)
name = words2digits(name, lang=lang)
You shouldn't call words2digits() here because the fuzzy matcher will do it later on. Also, have a look at the other things we added to sanitize_name() -- Amazon's new builder is more restrictive about which characters are allowed in slots.
It might be easier just to call sanitize_name from within sanitize_channel?
I put in the words2digits here because I couldn't get the matching to work reliably otherwise, but I've probably misunderstood something somewhere...
Channels on the backend often have "number words" in the name, e.g. "BBC One HD". When you say "ask kodi to switch to channel BBC One HD", the JSON from Amazon seems to contain the string "BBC 1 hd". I think it's this string that gets words2digits applied to it by the fuzzy matcher?
Without this words2digits call in sanitize_channel, the simple match fails (BBC 1 hd != BBC One HD) and the digits2roman fuzzy match kept returning CBBC HD instead. What I've tried to do in sanitize_channel is make all the channel names more similar to the string you get from Amazon so that the simple match stands a chance.
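To make the mismatch described above concrete, here is a minimal sketch. `number_words_to_digits` is a hypothetical stand-in for words2digits() (its real implementation is not shown in this thread), and `simple_match` is an assumed approximation of the straight comparison, not the project's code.

```python
import re

# Hypothetical stand-in for words2digits(); covers only a few English words.
_NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5"}


def number_words_to_digits(text):
    """Replace whole-word number words with digits (illustrative only)."""
    pattern = r"\b(" + "|".join(_NUMBER_WORDS) + r")\b"

    def swap(match):
        return _NUMBER_WORDS[match.group(0).lower()]

    return re.sub(pattern, swap, text, flags=re.IGNORECASE)


def simple_match(heard, stored):
    """Case-insensitive exact comparison (assumed shape of the 'simple' match)."""
    return heard.strip().lower() == stored.strip().lower()


heard = "BBC 1 hd"      # string that Alexa appears to send
stored = "BBC One HD"   # channel name as stored on the PVR backend

# The raw names differ, so the straight comparison fails.
raw_match = simple_match(heard, stored)

# Once number words in the stored name become digits, the two strings agree.
normalised_match = simple_match(heard, number_words_to_digits(stored))
```

Whether this normalisation belongs in sanitize_channel or in the fuzzy matcher is exactly the point under discussion; the sketch only shows why the two strings fail to compare equal.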
For text mangling, that's what the fuzzy matching is for. We shouldn't do this in sanitize_channel().
If we need to make adjustments to get it to work more reliably, we can, but this isn't the place to do that.
Furthermore, in German, Amazon spits out number-words rather than digits.
Fair enough. In that case I'll change it to call sanitize_name from sanitize_channel and remove the words2digits call? Are you happy with keeping the rest of sanitize_channel?
- switching + to plus (so channel 4+1 -> channel 4 plus 1)
- removing "channel" since it's so common it doesn't seem to be useful in the matching (I can combine the "ch" and "channel" regexps into one)
- putting breaks between numbers and words (4seven -> 4 seven)
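The three remaining steps listed above could look roughly like this. It is a sketch only: `sanitize_channel_sketch` is a hypothetical name, the lower-casing and whitespace collapsing are assumptions added for the example, and the combined "ch"/"channel" regexp is one possible way to merge the two patterns from the diff.

```python
import re


def sanitize_channel_sketch(name):
    """Illustrative version of the proposed channel clean-up steps."""
    name = name.lower()  # assumption: names are compared case-insensitively

    # Step 1: spell out "+" so that "4+1" becomes "4 plus 1".
    name = name.replace("+", " plus ")

    # Step 2: insert a space at every digit/non-digit boundary ("4seven" -> "4 seven").
    name = re.sub(r"(?<=\D)(?=\d)|(?<=\d)(?=\D)", " ", name)

    # Step 3: drop the whole words "channel" and "ch" with a single pattern.
    name = re.sub(r"\b(channel|ch)\b", " ", name)

    # Collapse the extra whitespace left behind by the steps above.
    return " ".join(name.split())
```

For example, `sanitize_channel_sketch("Channel 4+1")` gives `"4 plus 1"` and `sanitize_channel_sketch("4seven")` gives `"4 seven"`.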
I think where you're getting confused is that sanitize_name() was for sanitizing the output from Kodi for the slots, primarily. It's not intended to sanitize the user's input, with the sole exception of a quick attempt at a straight ('simple') match.
We handle input-mangling (that is, stuff from Alexa) in Kodi.matchHeard(), where it tries to massage the input to more closely match how it's stored in Kodi.
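As a rough illustration of that division of labour, the sketch below tries a quick straight match first and only then falls back to fuzzy matching on the heard input. It is not the code in Kodi.matchHeard(): `match_heard_sketch` is a hypothetical name, and the standard library's `difflib` is swapped in here purely for illustration in place of whatever fuzzy matcher the project uses.

```python
import difflib


def match_heard_sketch(heard, stored_names, cutoff=0.6):
    """Return the stored name that best matches what was heard, or None."""
    # Index stored names by a lower-cased key; the stored names stay untouched.
    lowered = {name.lower(): name for name in stored_names}
    key = heard.strip().lower()

    # 1. Quick straight ('simple') match.
    if key in lowered:
        return lowered[key]

    # 2. Fuzzy fallback: all input massaging happens on the heard string only.
    close = difflib.get_close_matches(key, list(lowered), n=1, cutoff=cutoff)
    return lowered[close[0]] if close else None
```

With `["BBC One HD", "BBC Two HD"]` as the stored names, a slightly misheard `"bbc too hd"` still resolves to `"BBC Two HD"`, while an unrelated string such as `"xyz"` returns `None`.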
kodi_voice/kodi.py
Outdated
@@ -479,6 +486,44 @@ def FindSong(self, heard_search):

    return None, None

  def FindPVRChannel(self, heard_search):
    print 'Searching for channel "%s"' % (sanitize_name(heard_search))
Use heard_search.encode("utf-8") instead of sanitize_name(heard_search) here and in all other print statements. I know we used to do this, but with the more restrictive sanitization especially, it makes debugging a bit harder since the string we see in the log might not look anything like the actual input string.
edit: sorry, I posted my comments on the wrong PR >.>
This is based on the work by @freemans13 here:
m0ngr31/kanzi#47
Hopefully I've managed to make it fit in with the new flask-ask layout.