Spooky things happen once the crawler has been running for more than 24 hours #26
Comments
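Observation: besides slow responses from the website, another reason for the slowness is that a running crawler sometimes fails to get its next target, so the degree of parallelism drops easily.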
Observation: sometimes request_ts ends up with 1 to 3 times as many duplicate requests stuffed into it XD
One of the causes: when the DB is busy and, within a short period, there are n […] Possible points of intervention:
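Temporary workaround: since the problem itself can't be solved yet, get rid of the duplicate requests that cause the trouble instead. Periodically run the SQL below to delete duplicate requests. Each run removes only the lowest-id row of every duplicated group, so a group with more than two copies needs several runs before it is clean: delete from request_ts where id in (select id from (select min(id) as id, count(*) as n from request_ts group by year, month, day, (seed->>'house_id')) as t where n > 1);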
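The per-run behaviour of that delete statement can be tried out locally with a small sketch. This is only an illustration under assumptions: it uses an in-memory SQLite database instead of the project's real DB, and a plain `house_id` column stands in for `seed->>'house_id'`; the helper name `dedup_until_clean` is made up for this example.

```python
import sqlite3

# Same shape as the SQL in the comment above. Assumption: a plain
# `house_id` column replaces the JSON lookup seed->>'house_id'.
DEDUP_SQL = """
delete from request_ts where id in (
    select id from (
        select min(id) as id, count(*) as n
        from request_ts
        group by year, month, day, house_id
    ) as t
    where n > 1
)
"""


def dedup_until_clean(conn):
    """Re-run the delete until a pass removes nothing.

    Each pass deletes only the lowest id of every duplicated group,
    so a group with k copies needs k - 1 passes. Returns the total
    number of rows deleted.
    """
    total = 0
    while True:
        deleted = conn.execute(DEDUP_SQL).rowcount
        if deleted == 0:
            return total
        total += deleted
```

Calling `dedup_until_clean(conn)` on a connection whose `request_ts` table holds three copies of one request deletes two of them over two passes and keeps the row with the highest id.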
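Fill in only the parts you know; you don't have to fill in everything.
Problem
The crawler is currently launched by crontab, so when a crawl runs for more than 24 hours, several instances end up running at the same time.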
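One common way to keep cron from starting a second crawler while the previous one is still running is an exclusive, non-blocking file lock. The following is a minimal sketch of that idea, not a fix decided in this issue: the function names and the lock path are hypothetical, and it relies on `fcntl.flock`, which is available on Unix-like systems only.

```python
import fcntl
import os


def try_acquire_lock(path):
    """Try to take an exclusive lock on `path` without waiting.

    Returns the open file descriptor when the lock was obtained, or
    None when another holder (e.g. yesterday's crawl) still has it.
    The lock is released when the descriptor is closed or the
    process exits.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def run_once(lock_path, crawl):
    """Run `crawl()` only if no other run holds the lock.

    `crawl` is a placeholder for whatever starts the crawler.
    Returns True if the crawl ran, False if it was skipped.
    """
    fd = try_acquire_lock(lock_path)
    if fd is None:
        return False
    try:
        crawl()
        return True
    finally:
        os.close(fd)
```

With a wrapper like this as the crontab entry point, a run that starts while the previous crawl is still alive returns immediately instead of adding another parallel instance.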
What is this issue about?
Solution
Workaround
TBD
Root-cause fix
TBD
Fixing existing data
Does this issue affect existing data? What are the steps to fix it?