ERROR:Error occurred while extracting data: Traceback (most recent call last): File "/usr/local/lib/python3.10/site-packages/magic_html/extractors/weixin_extractor.py", line 29, in extract body_tree = tree.xpath('.//*[@id="img-content"]')[0] #21

Open
totorofly opened this issue Jan 13, 2025 · 2 comments


@totorofly

totorofly commented Jan 13, 2025

One WeChat Official Account article raises an error:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/magic_html/extractors/weixin_extractor.py", line 29, in extract
    body_tree = tree.xpath('.//*[@id="img-content"]')[0]
IndexError: list index out of range
```

The link is: https://mp.weixin.qq.com/s?scene=1&__biz=MzA3MzA5MTU4NA==&mid=2247519770&idx=1&sn=1861a3d46e41c487b39f491e3cc35877&sharer_shareinfo_first=b0d0484c2623d4519f415968ce8e931c&sharer_shareinfo=b0d0484c2623d4519f415968ce8e931c&from=groupmessage&isappinstalled=0&clicktime=1736731317&enterid=1736731317&ascene=1&devicetype=iOS18.1.1&version=1800363a&nettype=WIFI&abtest_cookie=AAACAA%3D%3D&lang=zh_CN&countrycode=CN&fontScale=100&exportkey=n_ChQIAhIQgWOm%2FmvUr3W%2FO%2FDUgtpEFxL2AQIE97dBBAEAAAAAAFxRMT9axk0AAAAOpnltbLcz9gKNyK89dVj0wTDQWycK72s8ROkE8OW71helCpNEdNER9xVzCD6L%2FkaIHTFf9Iqtc5hzR1tM%2BGJe40rahZ6vEWpIELZGaCFElXW1aXxARWTGqteXNk5gcM%2BbJRMk2Sm2XxoapQLmt51xZSc0LXNXpyizkFvbimPZWLPr1twY642dBBnNM%2FqG4unzLljPVNbYrZeb1KEJYUGE9R%2BqP0qggmfl%2BuwTN8%2BaP7NI9P%2FwrGCfMLeoMMAH6YCM2x880pfYVzc3Kt9MXrz3QfCZbeSqUT2hVyeHF%2FMHfg%3D%3D&pass_ticket=MHZTVtiHlD9Y5lyrrsNlsO0HGkFqZAAx22OnAXv2xUl49qzEYpyElGbRoPzhJn3d&wx_header=3



@sixgad
Collaborator

sixgad commented Jan 13, 2025

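@totorofly This article's page layout differs slightly. The default XPath rule used inside the extractor is `//*[@id="img-content"]`, whereas the rule that matches this article should be `//*[@id="js_content"]`. Thanks for the report; this rule will be added in a future release!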
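A minimal sketch of the rule fallback described above, assuming the extractor tries an ordered list of XPath rules and stops at the first one that matches. It uses Python's standard-library `xml.etree.ElementTree` on a toy, well-formed page as a stand-in for the `tree.xpath(...)` call in the traceback; `BODY_RULES` and `find_body` are hypothetical names, not part of magic_html's API. Checking for an empty match before indexing also avoids the `IndexError: list index out of range` shown in the traceback.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence

# Hypothetical ordered rule list: the default rule from the traceback first,
# then the js_content rule suggested for this article.
BODY_RULES = ('.//*[@id="img-content"]', './/*[@id="js_content"]')


def find_body(tree: ET.Element, rules: Sequence[str] = BODY_RULES) -> Optional[ET.Element]:
    """Return the first element matched by any rule, or None if none match."""
    for rule in rules:
        matches = tree.findall(rule)
        if matches:  # guard instead of blindly taking matches[0]
            return matches[0]
    return None


# Toy page in which only the js_content container exists.
page = ET.fromstring(
    '<html><body><div id="js_content"><p>article text</p></div></body></html>'
)
body = find_body(page)
print(body.get("id") if body is not None else None)  # js_content
```

Trying the default rule first keeps the existing behavior for articles that do contain `img-content`, and returning `None` lets the caller report a clean extraction failure instead of crashing.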

@totorofly
Author


Thanks, looking forward to the new release!
