
Bug in Spider_XHS/xhs_utils/xhs_util.py #17

Open
j1nse opened this issue Jan 8, 2024 · 6 comments

Comments

@j1nse

j1nse commented Jan 8, 2024

  File "/home/xxxx/tmp/Spider_XHS/xhs_utils/xhs_util.py", line 74, in handle_profile_info
    info = re.findall(r'<script>window.__INITIAL_STATE__=(.*?)</script>', html_text)[0]
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
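The IndexError comes from the trailing [0]: re.findall returns an empty list when the page contains no <script>window.__INITIAL_STATE__=...</script> tag, and indexing an empty list fails. Below is a minimal defensive sketch, assuming only the raw state string is needed. extract_initial_state is a hypothetical helper, not Spider_XHS's code. It turns the crash into a checkable None; it does not fix whatever made the page change.

```python
import re
from typing import Optional

# Hypothetical helper (not part of Spider_XHS). Compared with the pattern in the
# traceback, the dot in "window." is escaped, and re.DOTALL lets the lazy group
# cross newlines in case the embedded state ever contains line breaks.
_INITIAL_STATE_RE = re.compile(
    r"<script>window\.__INITIAL_STATE__=(.*?)</script>", re.DOTALL
)


def extract_initial_state(html_text: str) -> Optional[str]:
    """Return the raw window.__INITIAL_STATE__ payload, or None if it is absent."""
    match = _INITIAL_STATE_RE.search(html_text)
    if match is None:
        # No state script at all: the server sent a different page, so signal it
        # instead of indexing into an empty findall() result.
        return None
    return match.group(1)


page = '<script>window.__INITIAL_STATE__={"user":{}}</script>'
print(extract_initial_state(page))                     # {"user":{}}
print(extract_initial_state('<div id="app"></div>'))   # None
```

A caller such as handle_profile_info could then log, for example, the first 500 characters of html_text whenever None comes back. That is essentially what the last comment in this thread did by hand.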

@cv-cat
Owner

cv-cat commented Jan 9, 2024

This is the first time I've seen this error. You can add me on WeChat, or describe in detail how you ran it.

@j1nse
Author

j1nse commented Jan 9, 2024

> This is the first time I've seen this error. You can add me on WeChat, or describe in detail how you ran it.

At first the script hangs. When I stop it manually (Ctrl+C), I get a KeyboardInterrupt traceback.
On the second run it fails as in my original report: the script prints "cookie有效" ("cookie valid"), raises the IndexError in handle_profile_info, and finally prints "用户 https://www.xiaohongshu.com/user/profile/5d024990000000001602b9b8 查询失败None" ("user ... query failed"). Both tracebacks are posted in full in my next two comments.
Environment: WSL2, Ubuntu 22.
I suspect the first run got me banned, so on the second connection the server returns nothing at all.

@j1nse
Author

j1nse commented Jan 9, 2024

^CTraceback (most recent call last):
  File "/home/xxxx/tmp/Spider_XHS/home.py", line 95, in <module>
    home.main(url_list)
  File "/home/xxxx/tmp/Spider_XHS/home.py", line 83, in main
    self.save_all_note_info(url)
  File "/home/xxxx/tmp/Spider_XHS/home.py", line 49, in save_all_note_info
    profile = self.profile.save_profile_info(url)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/tmp/Spider_XHS/profile.py", line 22, in save_profile_info
    profile = self.get_profile_info(url)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/tmp/Spider_XHS/profile.py", line 15, in get_profile_info
    response = requests.get(url, headers=headers, cookies=self.cookies)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/urllib3/connectionpool.py", line 404, in _make_request
    self._validate_conn(conn)
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1058, in _validate_conn
    conn.connect()
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/urllib3/connection.py", line 363, in connect
    self.sock = conn = self._new_conn()
                       ^^^^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
KeyboardInterrupt
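This traceback ends in sock.connect(sa), so the process was stuck opening the TCP connection. requests applies no timeout unless one is passed, which means requests.get(url, headers=headers, cookies=self.cookies) can wait indefinitely.

The sketch below bounds each attempt and retries with exponential backoff. It assumes the real call is wrapped in a zero-argument callable such as lambda: requests.get(url, headers=headers, cookies=self.cookies, timeout=(5, 15)). The helper fetch_with_retries, its parameters, and the (5, 15) connect/read values are illustrative assumptions. Retrying only prevents the hang; it will not help if the account or IP is being blocked.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch() up to `attempts` times, waiting base_delay * 2**i between tries.

    Returns the first successful result, or None if every attempt raised one of
    `retry_on`. The default (OSError,) also covers requests' exceptions, which
    subclass IOError (an alias of OSError), and the built-in TimeoutError.
    """
    for i in range(attempts):
        try:
            return fetch()
        except retry_on as exc:
            print(f"attempt {i + 1}/{attempts} failed: {exc!r}")
            if i + 1 < attempts:
                sleep(base_delay * 2 ** i)  # 2 s, then 4 s, with the defaults
    return None


# Offline demo (hypothetical): a fake fetch that fails twice before succeeding.
calls = []


def flaky_fetch() -> str:
    calls.append(None)
    if len(calls) < 3:
        raise ConnectionError("simulated connect failure")
    return "<html>...</html>"


print(fetch_with_retries(flaky_fetch, sleep=lambda seconds: None))
```

Injecting sleep keeps the demo instant. In real use, leave the default time.sleep so the backoff actually spaces out requests.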

@j1nse
Author

j1nse commented Jan 9, 2024

cookie有效
Traceback (most recent call last):
  File "/home/xxxx/tmp/Spider_XHS/home.py", line 83, in main
    self.save_all_note_info(url)
  File "/home/xxxx/tmp/Spider_XHS/home.py", line 49, in save_all_note_info
    profile = self.profile.save_profile_info(url)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/tmp/Spider_XHS/profile.py", line 22, in save_profile_info
    profile = self.get_profile_info(url)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/tmp/Spider_XHS/profile.py", line 18, in get_profile_info
    profile = handle_profile_info(userId, html_text)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/tmp/Spider_XHS/xhs_utils/xhs_util.py", line 74, in handle_profile_info
    info = re.findall(r'<script>window.__INITIAL_STATE__=(.*?)</script>', html_text)[0]
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
用户 https://www.xiaohongshu.com/user/profile/5d024990000000001602b9b8 查询失败None

@xuyao91

xuyao91 commented Jan 18, 2024

I'm running into the same problem:
cookie有效
Traceback (most recent call last):
  File "home.py", line 93, in <module>
    home.main(url_list)
  File "home.py", line 83, in main
    self.save_all_note_info(url)
  File "home.py", line 49, in save_all_note_info
    profile = self.profile.save_profile_info(url)
  File "/Users/xuyao/Workspaces/learns/Spider_XHS/profile.py", line 22, in save_profile_info
    profile = self.get_profile_info(url)
  File "/Users/xuyao/Workspaces/learns/Spider_XHS/profile.py", line 18, in get_profile_info
    profile = handle_profile_info(userId, html_text)
  File "/Users/xuyao/Workspaces/learns/Spider_XHS/xhs_utils/xhs_util.py", line 74, in handle_profile_info
    info = re.findall(r'<script>window.__INITIAL_STATE__=(.*?)</script>', html_text)[0]
IndexError: list index out of range
Environment: macOS, Python 3.8.

@xuyao91

xuyao91 commented Jan 18, 2024


I printed the response. Its core content is:
<body><div id="app"></div><script>function vue3Check(){void 0===window.Proxy&&alert("您当前系统版本过低,请升级后再试")}vue3Check()</script></body>
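The server just returns a page whose only message is "您当前系统版本过低,请升级后再试" ("Your system version is too old, please upgrade and try again"). Does this mean I've been banned?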
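That response has only an empty <div id="app"> and the vue3Check() script. It has no window.__INITIAL_STATE__ payload, so the regex in handle_profile_info has nothing to match and findall returns an empty list. The "system version too old" alert only fires in a browser that lacks window.Proxy, and a plain requests client never runs that script. The message is therefore not a real diagnosis of the client. Nothing in this thread confirms whether the stub means a ban.

The minimal triage sketch below assumes that the two markers seen in this single sample are stable. classify_profile_page and PageKind are hypothetical names:

```python
from enum import Enum


class PageKind(Enum):
    PROFILE = "profile"        # page carries the embedded window.__INITIAL_STATE__
    VUE3_CHECK_STUB = "stub"   # bare shell like the response posted above
    UNKNOWN = "unknown"        # anything else: log it and inspect by hand


def classify_profile_page(html_text: str) -> PageKind:
    """Rough triage of a profile response, based only on markers seen in this thread."""
    if "window.__INITIAL_STATE__=" in html_text:
        return PageKind.PROFILE
    if "vue3Check" in html_text:
        return PageKind.VUE3_CHECK_STUB
    return PageKind.UNKNOWN


stub = ('<body><div id="app"></div><script>function vue3Check(){void 0===window.Proxy'
        '&&alert("您当前系统版本过低,请升级后再试")}vue3Check()</script></body>')
print(classify_profile_page(stub))  # PageKind.VUE3_CHECK_STUB
```

Checking for the state marker first means a real profile page is never misclassified, even if it also happens to include the vue3Check script.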
