Modify the existing Sohu route rules #13028

Closed
1 task done
sasasqt opened this issue Aug 14, 2023 · 3 comments
Labels
no archive (RSS is not a Web Archive), RSS enhancement (New feature or request to existing RSS)

Comments


sasasqt commented Aug 14, 2023

What feature is it?

https://v2.sohu.com/author-page-api/author-articles/pc/220095?pNo=3&columnId=

Modify the following lines in the existing mp.js:

const authorArticleAPI = `https://v2.sohu.com/author-page-api/author-articles/pc/${id}`;

const list = response.data.data.pcArticleVOS.splice(0, 10);

so that the route supports multiple pages. For example, change the route from

/sohu/mp/:id

to

/sohu/mp/:id/:page

With page set to 3, the route would fetch

https://v2.sohu.com/author-page-api/author-articles/pc/220095?pNo=1&columnId=
https://v2.sohu.com/author-page-api/author-articles/pc/220095?pNo=2&columnId=
https://v2.sohu.com/author-page-api/author-articles/pc/220095?pNo=3&columnId=
and return all (20) entries from each page.
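The proposed paging can be sketched roughly as follows. This is a minimal illustration, not RSSHub's actual code: `buildPageUrls` and `mergeArticles` are hypothetical helper names, and the response shape (`data.pcArticleVOS` inside the response body) is assumed from the `response.data.data.pcArticleVOS` access quoted above.

```javascript
// Hypothetical helpers for illustration only; they are not part of RSSHub's mp.js.
const API_BASE = 'https://v2.sohu.com/author-page-api/author-articles/pc';

// Build one API URL per page, from page 1 up to and including `page`.
// Missing or invalid `page` values fall back to a single page.
function buildPageUrls(id, page = 1) {
    const total = Math.max(1, Number.parseInt(page, 10) || 1);
    return Array.from({ length: total }, (_, i) => `${API_BASE}/${id}?pNo=${i + 1}&columnId=`);
}

// Concatenate the article lists of several response bodies, in page order.
// Assumption: each body looks like { data: { pcArticleVOS: [...] } }.
function mergeArticles(bodies) {
    return bodies.flatMap((body) => body?.data?.pcArticleVOS ?? []);
}
```

Called as `buildPageUrls('220095', 3)`, this yields the three URLs listed above; the bodies fetched from those URLs could then be passed to `mergeArticles` instead of taking only the first 10 entries of the first page.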

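What problem does this feature solve?

Currently the Sohu route fetches only the first 10 entries of the first page.

Additional description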

@HenryQW

Thanks to the author in advance.

This is not a duplicated feature request or new RSS proposal

sasasqt added the RSS enhancement label Aug 14, 2023
TonyRL (Collaborator) commented Nov 14, 2023

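Many routes are capped at 10 items, and the cap cannot be raised:

  1. It consumes a lot of server resources.
  2. It easily triggers anti-crawler measures.
  3. The vast majority of outdated feed items are meaningless.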

Originally posted by HenryQW in #572 (comment)

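RSS is for getting updates; increasing the number of items is pointless.

For special needs, please modify the corresponding module yourself.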

Originally posted by DIYgod in #572 (comment)

RSS exists to track new content from a source. Collecting everything is not what RSSHub is for; I suggest you look for a crawler.

Originally posted by yefoenix in #7291 (comment)

You should not expect RSSHub to retrospectively retrieve every feed item.

Originally posted by HenryQW in #9633 (comment)

Do not expect it can provide you everything that happens since the stone age.

Originally posted by TonyRL in #9633 (comment)

RSS emphasizes getting content updates. If you need to fetch all items in one go, you should build a crawler for that content.

Originally posted by nczitzk in #9964 (comment)

TonyRL closed this as completed Nov 14, 2023
TonyRL closed this as not planned Nov 14, 2023
sasasqt (Author) commented Nov 14, 2023

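RSS exists to track new content from a source.
RSS emphasizes getting content updates. If you need to fetch all items in one go, you should build a crawler for that content.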

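Hard-coding the limit to 10 items means articles are missed whenever the source publishes faster than RSSHub fetches it, so some of the incremental articles and updates are lost.
For example, https://v2.sohu.com/author-page-api/author-articles/pc/260616?pNo=1&columnId=

image
published more than 10 (12) articles within half an hour (19:44-20:14).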
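The concern can be made concrete with a toy calculation. This is only an illustrative sketch: `missedSinceLastFetch` is a hypothetical function, and it assumes the feed returns the newest items first and that nothing was fetched during the half-hour window.

```javascript
// Toy model, not RSSHub code.
// `published` lists the ids that appeared since the previous fetch, newest first.
// A feed capped at `limit` items exposes only the newest `limit` of them,
// so everything after that position never reaches the reader.
function missedSinceLastFetch(published, limit) {
    return published.slice(limit);
}

// 12 articles published between two fetches, with the feed capped at 10 items.
const newItems = Array.from({ length: 12 }, (_, i) => `article-${12 - i}`);
const missed = missedSinceLastFetch(newItems, 10);
console.log(missed); // [ 'article-2', 'article-1' ]
```

Under these assumptions the two oldest of the 12 articles never appear in the feed.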

@TonyRL

TonyRL (Collaborator) commented Nov 14, 2023

RSSHub will not visit any sites proactively. The frequency of a feed update depends on your RSS reader and the value of CACHE_EXPIRE from the associated RSSHub instance.
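The behaviour described here, where the source is contacted only when a request arrives and the cached copy has expired, can be sketched as a small time-to-live cache. This is a simplified model, not RSSHub's implementation: `createCachedFetcher`, `fetchUpstream`, and the injectable `now` clock are hypothetical names, and treating the expiry value as a number of seconds is an assumption.

```javascript
// Simplified TTL cache for illustration; not RSSHub's actual caching code.
// `fetchUpstream` stands in for the route handler that contacts the source site.
// `cacheExpireSeconds` plays the role of CACHE_EXPIRE (assumed to be in seconds).
function createCachedFetcher(fetchUpstream, cacheExpireSeconds, now = () => Date.now()) {
    let cached = null;
    return function get() {
        const t = now();
        // Serve the cached copy while it is still fresh.
        if (cached !== null && t - cached.at < cacheExpireSeconds * 1000) {
            return cached.value;
        }
        // Otherwise contact the source, but only because a request came in.
        cached = { at: t, value: fetchUpstream() };
        return cached.value;
    };
}
```

In this model nothing is fetched between reader requests, so how often the feed is refreshed is bounded by both the reader's polling interval and the cache lifetime.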

TonyRL added the no archive label Aug 6, 2024