This repository has been archived by the owner on Mar 12, 2024. It is now read-only.

Use random ua for every request. (#54)
* Use random ua for every request.

* modify ua.csv location error
puppylpg authored Dec 11, 2020
1 parent 7a49969 commit 4dfac6d
Showing 6 changed files with 9,067 additions and 18 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -116,3 +116,7 @@
* Features
- Add `requirements.txt`
- Add documentation to the README on using the UU accelerator;

## 3.9.0(2020-12-11)
* Features
- Pick a random user agent for every request, which may help against bans;
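
The changelog entry above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the commit mentions a `ua.csv` file but does not show its layout, so the sketch assumes one user-agent string per line, and the names `load_user_agents`, `random_headers`, and the sample strings are hypothetical.

```python
import random


def load_user_agents(lines):
    """Collect non-empty user-agent strings, assuming one per line."""
    return [line.strip() for line in lines if line.strip()]


def random_headers(user_agents, rng=random):
    """Build request headers with a user agent drawn at random."""
    return {"User-Agent": rng.choice(user_agents)}


# Stand-in for the contents of ua.csv (hypothetical sample values).
SAMPLE_UA_FILE = """\
Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/2.0
"""

USER_AGENTS = load_user_agents(SAMPLE_UA_FILE.splitlines())

# Draw fresh headers before each request instead of reusing one user agent.
for _ in range(3):
    print(random_headers(USER_AGENTS))
```

The resulting dictionary could be passed as the `headers` argument of an HTTP client call; drawing it per request, rather than once at startup, is what the entry describes.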
33 changes: 31 additions & 2 deletions README.md
@@ -10,12 +10,34 @@
Oddish: asleep by day, prowling by night. I was here, and I carved what I saw deep into memory.

## Aim
# Aim
To crawl csgo skin from `buff.163.com`.
If there is no data available, crawl from the website, then analyse data from local pandas DataFrame to avoid more crawling behavior.

> **First Rule: BE GOOD AND TRY TO FOLLOW A WEBSITE’S CRAWLING POLICIES. Don't crawl the website with a high frequency!**
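# Disclaimer
1. Abusing the crawler carries the risk of getting your account banned;
1. Malicious large-scale crawling of buff data is prohibited; you bear full responsibility for any consequences;
1. Using this crawler, or the data it collects, for commercial purposes is prohibited; you bear full responsibility for any consequences;
1. This crawler is a hobby project: it is free of charge and contains no malicious code. If something still goes wrong, we can discuss how to improve the code, but we cannot help with the consequences (such as a banned account);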

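# How to avoid getting your account banned
This crawler exists only to find the most cost-effective skin to buy and resell on Steam for money to buy games, so by nature it is not meant to be used often or to crawl much data.

The general principle is to fetch a "small amount" of data "slowly". Here are some suggestions that, by editing `config.ini`, help you use oddish more sensibly:

1. Slowly: raise `frequency_interval_low` and `frequency_interval_high` in the config as far as you can; the longer the interval between requests, the slower and safer the crawl;
1. Small amount:
    1. Use `category_white_list` in the config to restrict which skin categories are crawled; the fewer categories listed, the less data there is to crawl;
    1. Use `category_black_list` in the config to exclude skin categories; the more categories listed, the less data there is to crawl;
    1. Use `crawl_min_price_item` and `crawl_max_price_item` in the config to narrow the price range of skins to crawl; the narrower the range, the less data there is to crawl;
1. Don't use it often. Personally, once a month is already a lot... Only a big CSGO operation or a new game release gives me a reason to use oddish. For those of you crawling every day, several times a day: do you really have that many games to buy...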
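
As a rough sketch of how the category lists can shrink a crawl: the config comments in this commit say both lists accept shell-style wildcards and that the whitelist takes priority. The helper name `should_crawl` is hypothetical, and reading "priority" as "a non-empty whitelist makes the blacklist irrelevant" is an assumption, not something the commit confirms.

```python
from fnmatch import fnmatchcase


def should_crawl(category, white_list, black_list):
    """Decide whether a category is crawled.

    Assumption: a non-empty whitelist wins outright, otherwise anything
    matching the blacklist is skipped.
    """
    if white_list:
        return any(fnmatchcase(category, pattern) for pattern in white_list)
    return not any(fnmatchcase(category, pattern) for pattern in black_list)


categories = ["weapon_ak47", "weapon_awp", "weapon_knife_karambit"]

# Whitelist only: crawl just the AK.
print([c for c in categories if should_crawl(c, ["weapon_ak47"], [])])

# Blacklist only: drop every knife through a wildcard.
print([c for c in categories if should_crawl(c, [], ["weapon_knife*"])])
```

`fnmatchcase` is used rather than `fnmatch` so that matching stays case-sensitive on every platform.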

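- If you are interested in how the crawler is implemented, you are welcome to study it and exchange ideas;
- If you want to buy a skin and sell it on Steam for some game money, you are welcome to use it;
- If you want to flip skins for profit (what is there to flip into Steam anyway? Are you going to sell the wallet balance afterwards???), get lost (ノ`Д)ノ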

# How do I use it
## Video tutorial
The most straightforward way is to follow the video step by step: [oddish tutorial for complete beginners](https://www.bilibili.com/video/BV1ET4y1w7e1/)
@@ -27,7 +49,7 @@ If there is no data available, crawl from the website, then analyse data from lo

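## Must read before launching
### Warning
**Warning: buff now has anti-crawling measures, and crawling too frequently puts your account on cooldown. The program is currently configured to crawl once every 2-4s. You can set the interval yourself in the configuration file `config.ini`, but for the safety of your account the program will never crawl at an interval shorter than 2s**
**Warning: buff now has anti-crawling measures, and crawling too frequently puts your account on cooldown. The program is currently configured to crawl once every 4-8s. You can set the interval yourself in the configuration file `config.ini`, but for the safety of your account the program will never crawl at an interval shorter than 4s**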

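If you are still worried, increase the interval further. Of course, the larger it is, the slower the crawl, so we recommend also using the blacklist and whitelist in the config to narrow the range of skins crawled
and cut out unnecessary crawling.
@@ -72,6 +94,13 @@ steam_cookie = timezoneOffset=28800,0; steamMachineAuth76561198093333055=649A9B5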
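
The warning describes two behaviours: a random wait between the configured bounds, and a hard floor that the config cannot lower. A minimal sketch of that idea follows; the function `next_delay` and the constant `MIN_INTERVAL_SECONDS` are hypothetical names, and the actual implementation in the repository may differ.

```python
import random

# Hard floor from the updated warning (4s); assumed to be enforced in code.
MIN_INTERVAL_SECONDS = 4.0


def next_delay(low, high, rng=random):
    """Pick a random wait in [low, high], never shorter than the floor."""
    low = max(low, MIN_INTERVAL_SECONDS)
    high = max(high, low)  # keep the range valid after clamping
    return rng.uniform(low, high)


# Values as they would come from frequency_interval_low / frequency_interval_high.
print(next_delay(4, 8))  # somewhere between 4 and 8 seconds
print(next_delay(1, 2))  # clamped: a config below the floor still waits 4 seconds
```

A caller would pass the result to `time.sleep` before each request, so that a too-small value in `config.ini` cannot speed the crawl past the floor.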
If you don't care about the process, just look at the analysis results.

## Dependencies
If you only know how to install things by hand, install the following dependencies:
- python: 3.8.5
- pandas: 1.1.0
- numpy: 1.19.1
- requests: 2.24.0

If you know pip, install them directly with the following command:
> pip install -r requirements.txt
# Customize the configuration to your needs
12 changes: 6 additions & 6 deletions config/config.ini
@@ -2,17 +2,17 @@
# After logging in to buff and steam in your browser, paste the browser cookies here, otherwise the program cannot run. Copying from Chrome is recommended; below are two sample cookies
buff_cookie = _ga=GA1.2.162602080.1551374933; _ntes_nuid=8ce0cf6bdce55512e73f49cb8a49960e; mail_psc_fingerprint=d80ec72871726e9b192181fd1a3633d6; OUTFOX_SEARCH_USER_ID_NCOO=29659292.15961449; Device-Id=33u998YqmNWbhH5GbWUo; vjuids=369cb7d82.170e16a9519.0.3eb2c52902997; vjlast=1584329824.1584329824.30; _ntes_nnid=8ce0cf6bdce55512e73f49cb8a49960e,1584329823520; vinfo_n_f_l_n3=d81bf3a25989eb31.1.4.1561837557589.1576393349946.1585037711031; NTES_CMT_USER_INFO=305053074%7C%E6%9C%89%E6%80%81%E5%BA%A6%E7%BD%91%E5%8F%8B0ibHSi%7Chttp%3A%2F%2Fcms-bucket.nosdn.127.net%2F2018%2F08%2F13%2F078ea9f65d954410b62a52ac773875a1.jpeg%7Cfalse%7CeWQuNzU3YTdkZjAwZWNiNDJlOGJAMTYzLmNvbQ%3D%3D; [email protected]:-1:1; __oc_uuid=ed078220-12cd-11eb-8ff5-199a4d2b4ac4; Locale-Supported=zh-Hans; game=csgo; _gid=GA1.2.648285736.1605190175; _gat_gtag_UA_109989484_1=1; NTES_YD_SESS=SN.CH9UV_zHPlqCiCLvgBOoLTrvc2fBGRyieqBhbqAP1HglxHydKU4DQmq5B7At06lgkSmM_II0j06AJnuMWYnpdtYe8PPxUJMsM4X5yH3jBY3xJdC_d59nM8A1bksgKL51SSXhh3Rbd4SeDy6ZIwse2MUjzElPeLdPKBaMoZafPdtNUF9E67TduT0krt3r6_s46hz3dnGE.y20NruVavQP3kETqQCAqK6iZ3b0Nc6tJw; S_INFO=1605190207|0|3&80##|2051; P_INFO=2051|1605190207|1|netease_buff|00&99|bej&1602387932&netease_buff#bej&null#10#0#0|&0|null|2051; remember_me=U1094050600|T3zeeLJIc6y9kVtTTAGV0mdqvIXDpeX0; session=1-WfP1TH9yGjtZniGRmbfFSezTOMS-ZeYguhJFzDIT5Fem2046524528; csrf_token=ImFjMWE4YTc4MDFkMTAyZjYyYWZhZWVhYzllZGFlNTJiZjc1NWE1MDEi.Eo7TwA.f39WinRhrzJgSTG4as2EjhD6za0
steam_cookie = ActListPageSize=100; steamMachineAuth76561198251761676=B89D7B0897180E54C9F2E93F8AAFA4583CAADE7D; timezoneOffset=28800,0; _ga=GA1.2.1902489943.1551205764; steamMachineAuth76561198874249759=E46DCE6095514E3D489CAF1E7CBC3F9F8CD3ACC6; browserid=1066728544083117486; recentlyVisitedAppHubs=271590%2C80%2C730; Steam_Language=english; steamCountry=US%7C4705a9aaf22f908f9e4452081abd865a; sessionid=56b51232f9f3936a0ebbf88d; _gid=GA1.2.1847664544.1605190173; steamLoginSecure=76561198251761676%7C%7CE4B6E3BBDD5AF069692D8C8A56755ECBB34ECC68; steamRememberLogin=76561198251761676%7C%7Ca5e43585d1cd13db87c3d856d7676178; webTradeEligibility=%7B%22allowed%22%3A1%2C%22allowed_at_time%22%3A0%2C%22steamguard_required_days%22%3A15%2C%22new_device_cooldown_days%22%3A7%2C%22time_checked%22%3A1605190194%7D
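# Provide a proxy for accessing the Steam Community Market
# Provide a proxy for accessing the Steam Community Market. If you can reach the market directly without a proxy, just leave this empty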
proxy = socks5://127.0.0.1:10808

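# Control program behavior
[BEHAVIOR]
# Warning: buff now puts accounts on cooldown for a while after too much crawling, so set these generously!!!
# With a larger interval the crawl becomes very slow, so consider using category_white_list/category_black_list below to narrow the target skins
# Lower bound of the crawl interval: 2s
frequency_interval_low=2
# Upper bound of the crawl interval: 4s, i.e. one request every 2-4s
frequency_interval_high=4
# Lower bound of the crawl interval: 4s
frequency_interval_low=4
# Upper bound of the crawl interval: 8s, i.e. one request every 4-8s
frequency_interval_high=8
# Interval in hours before cached files are crawled again
url_cache_hour = 6
# Crawl data ignoring the cache
@@ -35,7 +35,7 @@ crawl_max_price_item = 160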
min_sold_threshold = 70
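# Category whitelist. For example, to crawl only the AK and M4 (A1 & A4), set it to: ["weapon_ak47", "weapon_m4a1", "weapon_m4a1_silencer"]
# See `config/reference/category.md` for the category names; details are in the README
# Both the blacklist and the whitelist support wildcard matching, e.g. 'weapon_knife*'; for more, search for "shell wildcards"
# Both the blacklist and the whitelist support wildcard matching, e.g. 'weapon_knife*'; for more, search for "shell wildcards", though it's fine if you don't understand them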
category_white_list = []
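# Category blacklist. If both a blacklist and a whitelist are set, the whitelist takes priority
# The default blacklist includes the following, excluding the assorted weapon cases, music kits, stickers, agents and the like; knives are excluded too (surely nobody actually sells knives into Steam :D)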
