Article list cannot be fetched; the request returns HTTP 301 #1

Open
Edward-liang opened this issue Jan 1, 2019 · 3 comments

Comments

Edward-liang commented Jan 1, 2019

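The URL currently used in get_toutiao_news_byapi.py is:
http://www.toutiao.com/api/pc/feed/?category=__all__&utm_source=toutiao&widen=1&max_behot_time=0&max_behot_time_tmp=0&tadrequire=true&as=A1B5D9F152FBC03&cp=59123B3CE0B3FE1
According to what I found online, a 301 means the resource has moved permanently.
Has the API URL changed? Where can I find the new API endpoint? Thanks.

Also, accessing this URL directly works and returns a JSON result.
The response begins with {"has_more": false, "message": "success".
Could this be caused by the crawler's configuration? Thanks.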
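One way to narrow this down is to request the URL without following redirects and read the `Location` header of the 301 response, which shows where the server wants the client to go. The sketch below uses only the Python standard library. The helper name `inspect_redirect` and the `User-Agent` value are assumptions for illustration and are not part of this repository.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 3xx response stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None declines the redirect; urllib then raises HTTPError.
        return None


def inspect_redirect(url, timeout=10):
    """Return (status_code, redirect_target_or_None) without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    # Hypothetical browser-like User-Agent; adjust to match the crawler's settings.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, None
    except urllib.error.HTTPError as err:
        location = err.headers.get("Location")
        # Location may be relative, so resolve it against the requested URL.
        target = urljoin(url, location) if location else None
        return err.code, target
```

Calling `inspect_redirect` on the feed URL above would return the status code together with the redirect target, if any. If the target turns out to be the same path under a different scheme or host, the crawler likely only needs the updated URL or redirect following enabled; that is a guess to verify, not a confirmed diagnosis.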

haibincoder (Owner) commented

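These endpoints are fairly old. They were developed in 2016, and many of them have since expired; some of the XPath expressions have changed as well.

If you want to get a news list:

  1. The fallback option is to use selenium to load the Toutiao home page directly and extract the news list from it.
  2. https://toutiao.com/search_content/?offset=0&format=json&keyword=手机&autoload=true&count=20&cur_tab=1&from=search_tab is the endpoint for fetching news by keyword, and it should still work.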

Edward-liang (Author) commented

OK, I understand. Thank you.

honyxiao commented Mar 7, 2019

@haibincoder The endpoint for searching news by keyword has changed. The change below tested as working as of 2019-03-07. In get_toutiao_news_bykeyword.py, set:
url = 'https://www.toutiao.com/api/search/content/?aid=24&app_name=web_search&offset=0&format=json&keyword=' + keyword + '&autoload=true&count=20&en_qc=1&cur_tab=1&from=search_tab&pd=synthesis'
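Concatenating the keyword into the URL leaves non-ASCII keywords such as 手机 unencoded. A minimal sketch of building the same query string with `urllib.parse.urlencode` is shown below. The parameter names and values are copied from the URL above; the function name `build_search_url` and its `offset` and `count` arguments are assumptions for illustration, and whether the endpoint still accepts these parameters has not been checked here.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.toutiao.com/api/search/content/"


def build_search_url(keyword, offset=0, count=20):
    """Build the keyword search URL with every parameter percent-encoded."""
    params = {
        "aid": 24,
        "app_name": "web_search",
        "offset": offset,
        "format": "json",
        "keyword": keyword,  # urlencode encodes this as UTF-8 percent escapes
        "autoload": "true",
        "count": count,
        "en_qc": 1,
        "cur_tab": 1,
        "from": "search_tab",
        "pd": "synthesis",
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)
```

For example, `build_search_url("手机")` yields a URL whose `keyword` parameter is `%E6%89%8B%E6%9C%BA`, and raising `offset` by `count` would request the next page, assuming the endpoint pages that way.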
