Feature innertube client fix #219
base: master
1da876d to 5276825
Hi, it's been a while without any feedback. For
Great work on this; still need to finish reviewing. Should have
something back to you this weekend
Thanks for the feedback. It gives me peace of mind. No need to rush; I am open to any improvement suggestions for this pull request.
youtube/util.py (outdated)
```python
print('Unable to access ' + player_file)

signature_timestamp = None
signature_timestamp_cache = settings.data_dir + '/sts_' + player_version + 'txt'
```
Recommend using `os.path.join` here. Also, `.txt`, not `txt`.
Sorry for the late reply.
I've used `os.path.join` in place of string concatenation in this file and in similar places.
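For reference, a minimal sketch of building the signature timestamp cache path with `os.path.join` and the `.txt` extension the reviewer asked for. The helper name `sts_cache_path` is hypothetical; `data_dir` and `player_version` stand in for `settings.data_dir` and the extracted player version.

```python
import os


def sts_cache_path(data_dir: str, player_version: str) -> str:
    """Return <data_dir>/sts_<player_version>.txt using the platform's separator."""
    # os.path.join inserts the separator itself, so data_dir needs no trailing '/'
    return os.path.join(data_dir, 'sts_' + player_version + '.txt')


print(sts_cache_path('data', '3bb1f723'))
```

Unlike the concatenated `settings.data_dir + '/sts_' + ...`, this works with either separator convention and keeps the dot before `txt`.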
youtube/util.py (outdated)
```python
response_dict = json.loads(response)
if settings.use_visitor_data:
    if not settings.use_po_token:
        if response_dict['responseContext'].get('visitorData'):
```
We shouldn't assume `'responseContext'` will be present; otherwise this will raise an exception when YouTube changes something.
I've put this inside a `try ... except` block that catches `KeyError` specifically and prints a specific message.
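A minimal sketch of the defensive lookup, assuming the API response arrives as a JSON string; `extract_visitor_data` is a hypothetical helper name, not the function in the PR. Chaining `.get()` calls (`response_dict.get('responseContext', {}).get('visitorData')`) would be an equivalent alternative to the `try ... except KeyError` shown here.

```python
import json
from typing import Optional


def extract_visitor_data(response: str) -> Optional[str]:
    """Return visitorData from an innertube API response, or None if absent."""
    response_dict = json.loads(response)
    try:
        # Indexing raises KeyError if YouTube drops 'responseContext'
        return response_dict['responseContext'].get('visitorData')
    except KeyError:
        print('responseContext missing from API response; ignoring visitorData')
        return None
```

A response without `responseContext` then yields `None` instead of crashing the request handler.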
youtube/util.py (outdated)
```python
if settings.use_visitor_data:
    if not settings.use_po_token:
        if response_dict['responseContext'].get('visitorData'):
            if not os.path.exists(visitor_data_file):
```
Not sure how the visitor system works, but do we ever want to refresh this file? For example, maybe YouTube issues an updated token and marks the old one as invalid?
I added an `os.path.getmtime` check to make sure that the `visitorData.txt` file is less than 86400 seconds (24 hours) old before using its content. Otherwise, the visitor data file is deleted and replaced with a new one.
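A minimal sketch of that freshness check, assuming the visitor data is stored as plain text; `is_fresh` and `load_visitor_data` are hypothetical names, not the PR's functions. The age comparison is split into a pure function so it can be checked without touching the filesystem.

```python
import os
import time
from typing import Optional

MAX_AGE_SECONDS = 86400  # 24 hours, the limit described above


def is_fresh(mtime: float, now: float, max_age: float = MAX_AGE_SECONDS) -> bool:
    """True if a file last modified at `mtime` is younger than `max_age` at `now`."""
    return (now - mtime) < max_age


def load_visitor_data(visitor_data_file: str) -> Optional[str]:
    """Return cached visitor data if it is less than a day old; otherwise drop it."""
    if not os.path.exists(visitor_data_file):
        return None
    if is_fresh(os.path.getmtime(visitor_data_file), time.time()):
        with open(visitor_data_file, 'r') as file:
            return file.read()
    # Stale: delete it so visitorData from the next API response can replace it
    os.remove(visitor_data_file)
    return None
```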
youtube/util.py (outdated)
```python
else:
    if os.path.exists(visitor_data_file):
        print('Removing visitor_data file')
        os.remove(visitor_data_file)
```
For anonymity's sake, should we consider refreshing the visitor data every day? Again, I'm not really sure what constraints apply here.
Done, as I mentioned above.
server.py (outdated)
```python
    with open(visitor_data_file, "r") as file:
        visitor_data = file.read()
        file.close()
except:
```
Use `except Exception`; otherwise you'll catch `KeyboardInterrupt` and `SystemExit`: https://stackoverflow.com/questions/54948548/what-is-wrong-with-using-a-bare-except
I used `except OSError` to report any file access error that prevents reading the visitor data file.
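A minimal sketch of the narrowed handler, with a hypothetical `read_visitor_data` helper. It also drops the `file.close()` from the snippet above, since the `with` block already closes the file.

```python
from typing import Optional


def read_visitor_data(visitor_data_file: str) -> Optional[str]:
    """Read the visitor data file, reporting (not hiding) file access errors."""
    try:
        # The with-block closes the file itself; no explicit file.close() needed
        with open(visitor_data_file, 'r') as file:
            return file.read()
    except OSError as e:
        # OSError covers FileNotFoundError, PermissionError, IsADirectoryError, etc.,
        # but not KeyboardInterrupt or SystemExit, which a bare except would swallow
        print('Unable to read visitor data file: ' + str(e))
        return None
```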
```python
def extract_nsig_func(base_js):
    for i, member in enumerate(NSIG_FUNCTION_ARRAYS):
        func_array_re = regex.compile(member.replace('$', '\\$'))
```
I recommend against this; I would just put the three `\` escapes you need into your regex instead of modifying it at runtime.
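To illustrate the reviewer's point, a small sketch comparing the two approaches on a hypothetical pattern `a$b=(\w+)` (the real `NSIG_FUNCTION_ARRAYS` entries are not shown in this thread). Both compile to the same regex; the literal version just makes the intent visible in the source, and a blanket `replace('$', '\\$')` would also silently turn any `$` meant as an end anchor into a literal.

```python
import re

js = 'var a$b=decode;'

# As in the PR: escape every '$' at runtime before compiling
runtime_escaped = re.compile('a$b=(\\w+)'.replace('$', '\\$'))

# Reviewer's suggestion: write the escape directly into the pattern literal
literal_escaped = re.compile(r'a\$b=(\w+)')

# Unescaped, '$' acts as an end-of-string anchor, so this cannot match here
unescaped = re.compile(r'a$b=(\w+)')

print(runtime_escaped.search(js).group(1))
print(literal_escaped.search(js).group(1))
print(unescaped.search(js))
```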
I only tried to do what `inv_sig_helper` does: copy exactly the same regex pattern and escape the dollar signs at runtime. This is only done once, for as long as the extracted `nsig_func_{player_version}.js` file exists.
The resulting `n_sig_code` is cached as `data/nsig_func_{player_version}.js` and loaded as `info['nsig_func'] = { player_version: js_nsig_decrypt_code }` during the runtime of the `youtube-local` session.
So the `n_sig_function` extraction is only done once; subsequent accesses load it either directly from the `info['nsig_func']` dict or from the `nsig_func_{player_version}.js` file if that file already exists.
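A minimal sketch of the two-level cache described above: in-memory dict first, then the `.js` file, and only on a double miss the regex extraction. `get_nsig_code` and its signature are hypothetical, and `extract` is a stand-in callable for the actual extraction from `base_js`.

```python
import os
from typing import Callable, Dict


def get_nsig_code(info: Dict, player_version: str, data_dir: str,
                  extract: Callable[[], str]) -> str:
    """Return nsig JS code for player_version, extracting it at most once."""
    cache = info.setdefault('nsig_func', {})
    if player_version in cache:
        return cache[player_version]

    path = os.path.join(data_dir, 'nsig_func_' + player_version + '.js')
    if os.path.exists(path):
        with open(path, 'r') as file:
            code = file.read()
    else:
        code = extract()  # the expensive regex extraction happens only here
        os.makedirs(data_dir, exist_ok=True)
        with open(path, 'w') as file:
            file.write(code)

    cache[player_version] = code
    return code
```

Because the file cache outlives the process, a restarted session reuses the extracted code instead of re-running the regexes.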
```python
func_body_re = []
for i, member in enumerate(NSIG_FUNCTION_ENDINGS):
    func_body_re_item = ''
    func_body_re_item += func_context.group(1)
```
You need to call `re.escape()` on this before appending it to your regexes.
Done.
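Why the escape matters: extracted JavaScript identifiers can contain `$`, which is a regex metacharacter. A minimal sketch, with a hypothetical `build_body_patterns` helper and a hypothetical ending pattern (the real `NSIG_FUNCTION_ENDINGS` are not shown in this thread):

```python
import re
from typing import List


def build_body_patterns(func_name: str, endings: List[str]) -> list:
    """Compile one regex per ending, treating func_name as literal text."""
    # re.escape turns 'a$b' into 'a\$b' so '$' is matched literally
    escaped_name = re.escape(func_name)
    return [re.compile(escaped_name + ending) for ending in endings]


js = 'a$b=function(x){return x}'
patterns = build_body_patterns('a$b', [r'=function\((\w+)\)'])
print(patterns[0].search(js).group(1))
```

Without `re.escape`, the same concatenation would contain an end anchor in the middle of the pattern and never match.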
```python
print('jscode len is: ' + str(len(jscode)))
dukpy_session = dukpy.JSInterpreter()
# Loading the function into dukpy session
dukpy_session.evaljs(jscode)
print('n_sig = ' + n_sig)
#n_sig_result = dukpy_session.evaljs('decrypt_nsig("' + n_sig + '")')
n_sig_result = dukpy_session.evaljs("decrypt_nsig(dukpy['n'])", n=n_sig)
print('n_sig_result = ' + n_sig_result)
return n_sig_result
```
Have we verified that dukpy has limited execution privileges? For instance, can JavaScript code executed with dukpy make network requests or open files? If so, it would be a massive security hole.
Also recommend removing these debugging print statements when you're done.
`dukpy` is just a Python wrapper for the Duktape JS engine.
The PyPI package of `dukpy` has a `dukpy-install` command which can download npm packages from the internet. Unless told to do so, the `dukpy` module doesn't access the internet, as far as I know. The `nsig_func` doesn't need internet access at runtime, which I have verified by doing manual `n_sig` decryption using various Python JS bindings.
I also consider `dukpy` a lightweight JS engine for Python that just works, since it has wheels for `arm64` on PyPI and `armhf` on piwheels.org, so anyone running this on a single-board computer will hopefully hit no problems at runtime.
Confirming that this works on arm64.
requirements.txt (outdated)

```
@@ -6,3 +6,6 @@ urllib3>=1.24.1
defusedxml>=0.5.0
cachetools>=4.0.0
stem>=1.8.0
fake-useragent>=1.5.1
flpc>=0.2.5
```
Did you do any performance testing that suggested the need for this? Is there a noticeable speedup? I'd rather avoid dependencies if possible.
Without a `break` statement, the `re` module hangs for some time. `flpc` does not hang when finding the specified regex, even without a `break` statement in the `for` loop.
I added the `break` statement to the `for` loop so that the regex engine stops working (i.e. testing further regex patterns) once a match is found, which mitigates the hanging with the built-in `re` module.
I've removed `flpc` from the requirements to use the built-in `re` module as you wished, with very little or no performance degradation in my extended testing.
youtube/util.py (outdated)
```python
print("Debugging headers")
for item in headers:
    print(item)
print("Debugging data payload")
print(json.dumps(data, indent=4))
```
Recommend removing these debugging print statements.
Done.
Any chance you could make a version of this, similar to the youtube-local main branch, that uses Python 3.6 or earlier so it can run on Windows 7?
I actually tried several times to also build packages for So I reverted my GitHub Actions recipe to only build for Python 3.11 on Windows.
Add a setting option to enable video download. Downloading is disabled by default.
Use fake useragent library instead of hardcoded useragent.
Allow selecting innertube client via settings page.
Update ios and web innertube client context.
Make the loading of age_restricted contents configurable. Disabled by default.
Add functional mweb client. Also includes several features:
- Fix sig decryption code
- Add nsig decryption with function body extraction taken from iv-org/inv_sig_helper, using dukpy to execute the nsig JS decryption code
- Add support for using visitorData from the API response
- Add support for using poToken by saving a JSON file as 'po_token_cache.txt': { "visitorData": "long_base64_visitorData_value", "poToken": "long_base64_poToken_value" }
- More consistent request headers across several modules: server.py, channel.py, and comments.py
Avoid removing dist-info directories so fake-useragent can start, since the module calls importlib.metadata on init. Also use HEAD instead of master during the archive copying step so generate_release.py can run on branches other than master.
- Use os.path.join to access visitorData.txt and the signature timestamp cache file
- Remove visitorData.txt if the file is more than 24h old
- Try accessing responseContext and raise an exception on KeyError

- Use os.path.join to define the visitorData.txt file path
- Use OSError exception instead of a bare exception

- Remove flpc import
- Add re.escape to the extracted function name before appending it to the list
- Remove debugging messages
Remove flpc from requirements, since the built-in Python regex engine is good enough.
Return error message when n signature is None
Remove debugging messages during yt-api request
Only send visitor data header to specific Google domains.
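A minimal sketch of restricting a header to an allowlist of hosts. The domain list and the `X-Goog-Visitor-Id` header name are assumptions for illustration (the PR's actual list and header are not shown in this thread), and `should_send_visitor_data` / `build_headers` are hypothetical names. Matching the exact host or a `.`-prefixed suffix, rather than a substring, keeps look-alike hosts such as `notyoutube.com` out.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# Hypothetical allowlist; the PR's actual domain list is not shown here
VISITOR_DATA_DOMAINS = ('youtube.com', 'googlevideo.com')


def should_send_visitor_data(url: str) -> bool:
    """True if url's host is an allowlisted domain or a subdomain of one."""
    host = urlparse(url).hostname or ''
    return any(host == d or host.endswith('.' + d) for d in VISITOR_DATA_DOMAINS)


def build_headers(url: str, base_headers: Dict[str, str],
                  visitor_data: Optional[str]) -> Dict[str, str]:
    headers = dict(base_headers)
    if visitor_data and should_send_visitor_data(url):
        # Assumed header name for visitorData; adjust to whatever the PR uses
        headers['X-Goog-Visitor-Id'] = visitor_data
    return headers
```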
f50cae3 to c7e1cac
Update ios innertube client context.
Update to require fake-useragent>=2.0.0
Update player version for innertube clients.
Emergency fix for player 3bb1f723 to fix both signature and `n` query parameter decryption. Credits to the yt-dlp team.
Fix wrong indentation in po_token conditional statements.
Update for today: Currently experiencing 403 errors with Full video can only be played on
Update for today: It seems that the cause of the 1-minute playable stream is
Adding By the way, any improvement suggestions are appreciated.
Fix missing `import json` to load po_token_cache.txt when settings.use_po_token=True
The major change is the addition of the `mweb` innertube client, which includes some refactoring of how `base_js` is handled and a new trick to decrypt the `n` signature by extracting the relevant decryption code from the `base_js` file, using a technique similar to `iv-org/inv_sig_helper`.

This pull request also introduces three dependencies: `fake-useragent` to simplify creating user agent headers for mobile and desktop browsers, `flpc` to parse the nsig decryption regex, and `dukpy` to execute the extracted nsig decryption code.

It also adds the ability to parse `visitorData` in the YT API response and to specify your own `visitorData` and `poToken` pair using properly formatted JSON in the `data/po_token_cache.txt` file.

There are also several more fixes for the `android` and `ios` clients, and the `innertube` client is now selectable.

There are also several changes in `settings.py`, notably to allow reloading of the `tv_embedded` client in case of missing player URLs, and showing a `Download` placeholder via the `use_video_download` option (credits to `~heckyel/yt-local`).

This will hopefully fix #218
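A minimal sketch of reading the `visitorData`/`poToken` pair from `po_token_cache.txt`, assuming the JSON object format given in the commit message above; `load_po_token` is a hypothetical helper, not the PR's code. Missing files, malformed JSON, or missing keys all fall back to `None` so the caller can continue without a PO token.

```python
import json
import os
from typing import Optional, Tuple


def load_po_token(data_dir: str) -> Optional[Tuple[str, str]]:
    """Return (visitorData, poToken) from data_dir/po_token_cache.txt, or None."""
    path = os.path.join(data_dir, 'po_token_cache.txt')
    try:
        with open(path, 'r') as file:
            cache = json.load(file)
    except (OSError, ValueError) as e:
        # ValueError also covers json.JSONDecodeError (malformed JSON)
        print('Unable to load po_token_cache.txt: ' + str(e))
        return None
    if not isinstance(cache, dict):
        return None
    visitor_data = cache.get('visitorData')
    po_token = cache.get('poToken')
    if not visitor_data or not po_token:
        return None
    return visitor_data, po_token
```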