build(deps): bump twisted from 18.9.0 to 19.7.0 #2

Open
wants to merge 699 commits into
base: master
4aca8fb
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie Apr 23, 2019
47511a6
Merge branches 'master' and 'master' of github.com:Jannchie/biliob
Jannchie Apr 23, 2019
2ba92f2
Merge branches 'master' and 'master' of github.com:Jannchie/biliob
Jannchie Apr 23, 2019
8218f91
Merge branches 'master' and 'master' of github.com:Jannchie/biliob
Jannchie Apr 23, 2019
2250ee7
Merge branch 'master' into dev
Jannchie Apr 27, 2019
17540d5
Merge branch 'master' into dev
Jannchie Apr 27, 2019
084ab10
Merge branch 'master' into dev
Jannchie Apr 27, 2019
043484c
base tracer task
Jannchie Apr 29, 2019
3cac612
base tracer task
Jannchie Apr 29, 2019
ff551d3
base tracer task
Jannchie Apr 29, 2019
91b26fd
feature: optimize the base task
Jannchie Apr 29, 2019
528548d
feature: optimize the base task
Jannchie Apr 29, 2019
cd206e5
feature: optimize the base task
Jannchie Apr 29, 2019
534d421
feature: finished base task for dectectiing whether the process alive
Jannchie Apr 29, 2019
c0560c9
feature: finished base task for dectectiing whether the process alive
Jannchie Apr 29, 2019
1ac951e
feature: finished base task for dectectiing whether the process alive
Jannchie Apr 29, 2019
8f1bd58
feature: upgrade user operation priority
Jannchie Apr 29, 2019
057cb1d
feature: upgrade user operation priority
Jannchie Apr 29, 2019
df5e772
feature: upgrade user operation priority
Jannchie Apr 29, 2019
8249368
feature: tracer send message to mongodb
Jannchie Apr 29, 2019
676a1e1
feature: tracer send message to mongodb
Jannchie Apr 29, 2019
55f2253
feature: tracer send message to mongodb
Jannchie Apr 29, 2019
42efd24
feature: add task trace
Jannchie Apr 29, 2019
5f04800
feature: add task trace
Jannchie Apr 29, 2019
33696c1
feature: add task trace
Jannchie Apr 29, 2019
e6c79ad
feature: redesign the generator of link to be crawled.
Jannchie Apr 29, 2019
ac5e11d
feature: redesign the generator of link to be crawled.
Jannchie Apr 29, 2019
10450a2
feature: redesign the generator of link to be crawled.
Jannchie Apr 29, 2019
51f3c63
feature: update subchannel2channel dict
Jannchie Apr 29, 2019
6a98fd3
feature: update subchannel2channel dict
Jannchie Apr 29, 2019
b706211
feature: update subchannel2channel dict
Jannchie Apr 29, 2019
7b607a4
refactor: spider
Jannchie Apr 29, 2019
919f7a2
refactor: spider
Jannchie Apr 29, 2019
102c924
refactor: spider
Jannchie Apr 29, 2019
8b7ecac
feature: add tracer to redis spider
Jannchie Apr 29, 2019
b0d86a6
fix: author rate not write to the Mongodb
Jannchie Apr 29, 2019
fa249d2
feature: add tracer to redis spider
Jannchie Apr 29, 2019
283642c
fix: author rate not write to the Mongodb
Jannchie Apr 29, 2019
667a254
feature: add tracer to redis spider
Jannchie Apr 29, 2019
919b5f7
fix: author rate not write to the Mongodb
Jannchie Apr 29, 2019
5a18ead
feature: update tracer
Jannchie Apr 29, 2019
90aa37f
feature: update tracer
Jannchie Apr 29, 2019
05ee6f7
feature: update tracer
Jannchie Apr 29, 2019
13da266
fix: remove zero data of author
Jannchie Apr 29, 2019
d18a3d4
fix: remove zero data of author
Jannchie Apr 29, 2019
a672773
fix: remove zero data of author
Jannchie Apr 29, 2019
51c85ad
update: start_spider_task shell script
Jannchie Apr 30, 2019
0f3d628
update: start_spider_task shell script
Jannchie Apr 30, 2019
dd326f1
update: start_spider_task shell script
Jannchie Apr 30, 2019
68842b0
featue: modify the tracer in the spider
Jannchie Apr 30, 2019
204f388
featue: modify the tracer in the spider
Jannchie Apr 30, 2019
a3cf3a5
featue: modify the tracer in the spider
Jannchie Apr 30, 2019
cae4c73
update: start spider task shell script
Jannchie Apr 30, 2019
66246c2
update: start spider task shell script
Jannchie Apr 30, 2019
ea83b3f
update: start spider task shell script
Jannchie Apr 30, 2019
2438dca
update: schedule
Jannchie Apr 30, 2019
61fa2a1
update: schedule
Jannchie Apr 30, 2019
b7bf0e4
update: schedule
Jannchie Apr 30, 2019
a16e609
update: start scheduler script
Jannchie Apr 30, 2019
ded1143
update: start scheduler script
Jannchie Apr 30, 2019
47bb90a
update: start scheduler script
Jannchie Apr 30, 2019
38dfa5a
Merge branch 'dev'
Jannchie Apr 30, 2019
f392e6a
Merge branch 'dev'
Jannchie Apr 30, 2019
96a0c91
Merge branch 'dev'
Jannchie Apr 30, 2019
d931216
delete: useless file
Jannchie Apr 30, 2019
43b5c35
delete: useless file
Jannchie Apr 30, 2019
31630de
delete: useless file
Jannchie Apr 30, 2019
f36dc20
update: git ignore
Jannchie May 1, 2019
faa0bfc
update: git ignore
Jannchie May 1, 2019
a359838
update: git ignore
Jannchie May 1, 2019
beed5f5
Merge branch 'master' into dev
Jannchie May 1, 2019
35209bd
Merge branch 'master' into dev
Jannchie May 1, 2019
c282071
Merge branch 'master' into dev
Jannchie May 1, 2019
1f7f393
update: requirement.txt
Jannchie May 1, 2019
73f1db7
update: requirement.txt
Jannchie May 1, 2019
566ce69
update: requirement.txt
Jannchie May 1, 2019
d317ae5
Merge branch 'master' of github.com:Jannchie/biliob-spider
Jannchie May 1, 2019
0257f33
Merge branch 'master' of github.com:Jannchie/biliob-spider
Jannchie May 1, 2019
1d6c53a
Merge branch 'master' of github.com:Jannchie/biliob-spider
Jannchie May 1, 2019
7ddcc54
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie May 1, 2019
285fe4e
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie May 1, 2019
2f197c7
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie May 1, 2019
08f552e
feature: design SpiderTask to record the number of crawling and faile…
Jannchie May 1, 2019
5781ecb
feature: design SpiderTask to record the number of crawling and faile…
Jannchie May 1, 2019
34786e1
feature: design SpiderTask to record the number of crawling and faile…
Jannchie May 1, 2019
968de62
Merge branch 'dev'
Jannchie May 1, 2019
dceea34
Merge branch 'dev'
Jannchie May 1, 2019
cff4abd
Merge branch 'dev'
Jannchie May 1, 2019
1b4adb9
hotfix: fix arguements number error
Jannchie May 1, 2019
69e842a
hotfix: fix arguements number error
Jannchie May 1, 2019
7e0d7ae
hotfix: fix arguements number error
Jannchie May 1, 2019
a1cfa5a
fix: upload SpiderTask data when update
Jannchie May 1, 2019
c907ca1
fix: upload SpiderTask data when update
Jannchie May 1, 2019
23376b6
fix: upload SpiderTask data when update
Jannchie May 1, 2019
63dc05e
fix: danmaku aggregate
Jannchie May 2, 2019
8391b25
fix: danmaku aggregate
Jannchie May 2, 2019
9a6d108
fix: danmaku aggregate
Jannchie May 2, 2019
8a4dd46
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie May 2, 2019
5c58880
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie May 2, 2019
0d09871
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie May 2, 2019
00e0761
add analytic
Jannchie May 4, 2019
1cd98f6
add analytic
Jannchie May 4, 2019
374114c
add analytic
Jannchie May 4, 2019
9e16fb6
fix: author not crawl
Jannchie May 4, 2019
19494c3
fix: author not crawl
Jannchie May 4, 2019
3c41688
fix: author not crawl
Jannchie May 4, 2019
4f8a508
fix: fans get zero
Jannchie May 7, 2019
956ea14
fix: fans get zero
Jannchie May 7, 2019
bf3a97b
fix: fans get zero
Jannchie May 7, 2019
ad44bab
update
Jannchie May 7, 2019
b52e6c6
update
Jannchie May 7, 2019
ccecde5
update
Jannchie May 7, 2019
8df0fb6
feature: add retry system for get user coin
Jannchie May 10, 2019
083b3f8
feature: add retry system for get user coin
Jannchie May 10, 2019
67fc895
feature: add retry system for get user coin
Jannchie May 10, 2019
cfc72d6
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie May 10, 2019
1ac6483
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie May 10, 2019
f51cfd3
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie May 10, 2019
170d2e0
fix: update fans data occured get zero error
Jannchie May 17, 2019
bd404b8
fix: update fans data occured get zero error
Jannchie May 17, 2019
918fffc
fix: update fans data occured get zero error
Jannchie May 17, 2019
8edb373
fix: exit tracer task when disconnect database
Jannchie May 18, 2019
093f13f
fix: exit tracer task when disconnect database
Jannchie May 18, 2019
91d1f16
fix: exit tracer task when disconnect database
Jannchie May 18, 2019
1d13f8f
fix: remove useless code
Jannchie May 18, 2019
f51be68
fix: remove useless code
Jannchie May 18, 2019
7cff95f
fix: remove useless code
Jannchie May 18, 2019
ced56ed
update: progress task
Jannchie May 20, 2019
08e56aa
update: progress task
Jannchie May 20, 2019
cedcdec
update: progress task
Jannchie May 20, 2019
794b3d0
update: progress task
Jannchie May 20, 2019
8e65c84
update: progress task
Jannchie May 20, 2019
d4b454d
update: progress task
Jannchie May 20, 2019
efc2d3e
feature: add analytic schedule
Jannchie May 21, 2019
4647ff6
feature: add analytic schedule
Jannchie May 21, 2019
21345d4
feature: add analytic schedule
Jannchie May 21, 2019
f9f8353
update: fans watcher
Jannchie May 25, 2019
2a7749b
update: fans watcher
Jannchie May 25, 2019
a1bda5c
update: fans watcher
Jannchie May 25, 2019
371554c
update: tracer tasks
Jannchie May 25, 2019
600d712
update: tracer tasks
Jannchie May 25, 2019
8c5defa
update: tracer tasks
Jannchie May 25, 2019
caf603e
feature: start_scheduler bash script
Jannchie May 25, 2019
a3687cc
feature: start_scheduler bash script
Jannchie May 25, 2019
c2035b7
feature: start_scheduler bash script
Jannchie May 25, 2019
00bf895
update: bash script
Jannchie May 25, 2019
7c4e5ef
update: bash script
Jannchie May 25, 2019
6ba42f8
update: bash script
Jannchie May 25, 2019
8f453ff
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie May 25, 2019
ca4cf43
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie May 25, 2019
5037864
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie May 25, 2019
8080a47
add ranking 188
Jannchie May 27, 2019
a30c861
add ranking 188
Jannchie May 27, 2019
e126519
add ranking 188
Jannchie May 27, 2019
accaa42
feature: tag spider
Jannchie May 27, 2019
a9b662d
feature: tag spider
Jannchie May 27, 2019
a58c019
feature: tag spider
Jannchie May 27, 2019
e9ddc8d
update: dict
Jannchie May 27, 2019
a9fddb2
update: dict
Jannchie May 27, 2019
a52dc6c
update: dict
Jannchie May 27, 2019
fb3bf95
Merge branch 'master' into dev
Jannchie May 30, 2019
011f5b6
Merge branch 'master' into dev
Jannchie May 30, 2019
c7d755f
Merge branch 'master' into dev
Jannchie May 30, 2019
a9eb44d
update: dict
Jannchie May 30, 2019
56d353e
update: dict
Jannchie May 30, 2019
bab756c
update: dict
Jannchie May 30, 2019
db23794
feature add tracer task
Jannchie May 30, 2019
75e9329
feature add tracer task
Jannchie May 30, 2019
d721629
feature add tracer task
Jannchie May 30, 2019
baf516b
enhance: tracer task
Jannchie May 30, 2019
aaca55f
enhance: tracer task
Jannchie May 30, 2019
64ea971
enhance: tracer task
Jannchie May 30, 2019
8cc367d
Merge branch 'master' of github.com:Jannchie/biliob-spider
Jannchie May 30, 2019
d0ac970
update deploy script
Jannchie May 31, 2019
cd70718
update deploy script
Jannchie May 31, 2019
45d10eb
update deploy script
Jannchie May 31, 2019
f6a2156
Merge branch 'dev'
Jannchie May 31, 2019
72fdbd6
Merge branch 'dev'
Jannchie May 31, 2019
ff1d5e4
Merge branch 'dev'
Jannchie May 31, 2019
1976daf
fix: tag adder spider
Jannchie May 31, 2019
7a045b6
fix: tag adder spider
Jannchie May 31, 2019
c2840c9
fix: tag adder spider
Jannchie May 31, 2019
900d7dc
update: restart system
Jannchie Jun 11, 2019
aaa46f4
update: restart system
Jannchie Jun 11, 2019
ccbc48f
update: video from kanbilibili
Jannchie Jul 6, 2019
a168a55
update dict
Jannchie Jul 6, 2019
c432e5e
update dict
Jannchie Jul 6, 2019
2aa58a3
update dict
Jannchie Jul 6, 2019
3527d5d
merge conflict
Jannchie Jul 6, 2019
9cdb137
merge conflict
Jannchie Jul 6, 2019
8575b04
merge conflict
Jannchie Jul 6, 2019
55f519f
fix confict
Jannchie Jul 6, 2019
09d985c
fix confict
Jannchie Jul 6, 2019
cbc8fd5
fix confict
Jannchie Jul 6, 2019
4370f71
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie Jul 6, 2019
b56bd8f
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie Jul 6, 2019
5ca2e47
merge confict
Jannchie Jul 6, 2019
ab697ed
merge confict
Jannchie Jul 6, 2019
54fc183
merge conflict
Jannchie Jul 6, 2019
36533be
merge conflict
Jannchie Jul 6, 2019
5d35f2a
merge config
Jannchie Jul 6, 2019
1305153
merge config
Jannchie Jul 6, 2019
62486ff
update api
Jannchie Jul 6, 2019
26f9ca7
update api
Jannchie Jul 6, 2019
db8d09f
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie Jul 8, 2019
05fbc63
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie Jul 8, 2019
7e85f76
update: from kan
Jannchie Jul 9, 2019
13691b1
update: from kan
Jannchie Jul 9, 2019
b8b3560
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie Jul 11, 2019
af972f8
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie Jul 11, 2019
2cc45c5
update: api limit
Jannchie Jul 11, 2019
577d47f
update: api limit
Jannchie Jul 11, 2019
034aa77
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie Jul 11, 2019
dcf97b4
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie Jul 11, 2019
9246fa5
update: git ignore
Jannchie Jul 11, 2019
817516f
update: git ignore
Jannchie Jul 11, 2019
d8cde1b
update: dict
Jannchie Aug 17, 2019
a016245
update: dict
Jannchie Aug 17, 2019
0cd08c9
update: get data
Jannchie Aug 17, 2019
6479c67
update: get data
Jannchie Aug 17, 2019
ef18cd3
Merge branch 'master' of github.com:Jannchie/biliob-spider
Jannchie Aug 17, 2019
1c0b30b
Merge branch 'master' of github.com:Jannchie/biliob-spider
Jannchie Aug 17, 2019
6002b43
update: get data script
Jannchie Sep 5, 2019
8d91b08
update: get data script
Jannchie Sep 5, 2019
6cf60fe
update: let author spider crawls quicker
Jannchie Sep 18, 2019
269c131
update: let author spider crawls quicker
Jannchie Sep 18, 2019
c6244fb
update: video spider
Jannchie Sep 29, 2019
2533d4e
update: video spider
Jannchie Sep 29, 2019
aad5de8
update: video analyzer
Jannchie Oct 1, 2019
dc6880a
update: video analyzer
Jannchie Oct 1, 2019
0020529
fix
Jannchie Oct 18, 2019
bc51249
fix
Jannchie Oct 18, 2019
e471a9e
update
Jannchie Oct 22, 2019
26ace32
update
Jannchie Oct 22, 2019
c3e875d
update: check
Jannchie Oct 23, 2019
d987b97
update: check
Jannchie Oct 23, 2019
0082b24
update: check
Jannchie Oct 23, 2019
27c6338
update: db connection
Jannchie Oct 24, 2019
07db288
fix: null object id bug
Jannchie Oct 24, 2019
5f06b1f
Merge branch 'master' of github.com:Jannchie/biliob-spider
Jannchie Oct 24, 2019
82457c0
Merge branch 'master' of github.com:Jannchie/biliob-spider
Jannchie Oct 24, 2019
eef16eb
Merge branch 'master' of github.com:Jannchie/biliob-spider
Jannchie Oct 24, 2019
6454131
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie Oct 24, 2019
eff40a7
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie Oct 24, 2019
ba2dbc9
update: event
Jannchie Oct 24, 2019
52c61d2
fix: redis connection
Jannchie Oct 24, 2019
aa49962
rm: db.py
Jannchie Oct 24, 2019
97fe502
fix: filter
Jannchie Oct 24, 2019
2841867
Merge branch 'master' of github.com:Jannchie/biliob
Jannchie Nov 14, 2019
224f71b
build(deps): bump twisted from 18.9.0 to 19.7.0
dependabot[bot] Nov 14, 2019
9 changes: 9 additions & 0 deletions .gitignore
@@ -0,0 +1,9 @@
*.pyc
*.log
*.out
biliob_spider.log
debug.py
mail.py
get_data/color.py
get_data/face.py
.vscode/*
4 changes: 4 additions & 0 deletions 1
@@ -0,0 +1,4 @@
nohup: ignoring input
nohup: ignoring input
nohup: ignoring input
nohup: ignoring input
1 change: 1 addition & 0 deletions README.md
@@ -1,2 +1,3 @@
# BiliOB

BiliOB Observer is a web application that tracks and analyzes changes in the data of Bilibili uploaders (UP主) and their videos.
Empty file added biliob_analyzer/__init__.py
Empty file.
3 changes: 3 additions & 0 deletions biliob_analyzer/add_credit.py
@@ -0,0 +1,3 @@
from db import db
u = db['user']
u.update_many({}, {'$inc': {'credit': 50}})
11 changes: 11 additions & 0 deletions biliob_analyzer/add_focus.py
@@ -0,0 +1,11 @@
from db import db
from pymongo import MongoClient
# Connect to MongoDB

coll = db['author']  # Handle to the collection
docs = coll.find({'focus': {'$exists': False}}).batch_size(60)
for each_doc in docs:
    if 'mid' in each_doc:
        # Set only the missing field instead of rewriting the whole document
        coll.update_one({'mid': each_doc['mid']}, {'$set': {'focus': True}})
        print('Fixed mid ' + str(each_doc['mid']))
11 changes: 11 additions & 0 deletions biliob_analyzer/add_focus_video.py
@@ -0,0 +1,11 @@
from db import db
from pymongo import MongoClient
# Connect to MongoDB

coll = db['video']  # Handle to the collection
docs = coll.find({'focus': {'$exists': False}}).batch_size(60)
for each_doc in docs:
    if 'aid' in each_doc:
        # Set only the missing field instead of rewriting the whole document
        coll.update_one({'aid': each_doc['aid']}, {'$set': {'focus': True}})
        print('Fixed aid ' + str(each_doc['aid']))
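The two backfill scripts above loop over every document that lacks a `focus` field and issue one update per document. As a minimal sketch, assuming the goal is simply to set `focus` to `True` wherever it is missing, the same effect could be expressed as a single filter/update pair passed to pymongo's `update_many`. The helper name `build_focus_backfill` is hypothetical and not part of this PR; it only builds the two documents so that it can run without a database.

```python
def build_focus_backfill(id_field):
    """Return (query, update) documents for a one-shot 'focus' backfill.

    id_field is 'mid' for authors or 'aid' for videos, mirroring the
    scripts above. This helper is illustrative, not code from the PR.
    """
    # Match documents that have the id field but no 'focus' field yet.
    query = {'focus': {'$exists': False}, id_field: {'$exists': True}}
    # Only the missing field is written; the rest of the document is untouched.
    update = {'$set': {'focus': True}}
    return query, update


if __name__ == '__main__':
    query, update = build_focus_backfill('aid')
    print(query, update)
    # With a real pymongo collection this would be applied as:
    #     db['video'].update_many(query, update)
```

A single `update_many` avoids one round trip per document, at the cost of losing the per-document progress output that the original loop prints.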
156 changes: 156 additions & 0 deletions biliob_analyzer/add_keyword.py
@@ -0,0 +1,156 @@
from pymongo import ReturnDocument
import jieba
from db import db
from time import sleep
# Load the dictionary
from biliob_tracer.task import ProgressTask


class KeywordAdder():

    def __init__(self):
        self.mongo_author = db['author']
        self.mongo_video = db['video']
        self.mongo_word = db['search_word']
        jieba.load_userdict('./biliob_analyzer/dict.txt')

    def get_video_kw_list(self, aid):
        # Keywords are extracted from the title, channel, sub-channel, author and tags
        video = self.mongo_video.find_one(
            {'aid': aid}, {'_id': 0, 'title': 1, 'channel': 1, 'subChannel': 1, 'author': 1, 'tag': 1})
        kw = []
        for each_key in video:
            # Scalar fields are lower-cased strings; list fields are merged item by item
            if each_key not in ('keyword', 'tag'):
                kw.append(str(video[each_key]).lower())
            elif each_key == 'tag':
                kw += video['tag']
            else:
                kw += video['keyword']
        seg_list = jieba.lcut_for_search(
            ' '.join(kw), True)  # Search-engine mode

        # The full author name also counts as a keyword
        if 'author' in video and video['author'].lower() not in seg_list:
            seg_list.append(video['author'].lower())

        while ' ' in seg_list:
            seg_list.remove(' ')
        while '、' in seg_list:
            seg_list.remove('、')
        return list(set(seg_list))

    def add_to_video(self, aid, seg_list):
        sleep(0.01)
        self.mongo_video.update_one({'aid': aid}, {'$set': {
            'keyword': seg_list
        }})

    def add_video_kw(self, aid):
        self.add_to_video(aid, self.get_video_kw_list(aid))
        return True

    def get_author_kw_list(self, mid):
        # Keywords are extracted from name and official
        author = self.mongo_author.find_one(
            {'mid': mid}, {'_id': 0, 'name': 1, 'official': 1, 'keyword': 1})
        kw = []
        for each_key in author:
            if each_key != 'keyword':
                kw.append(str(author[each_key]).lower())
            else:
                kw += author['keyword']
        seg_list = jieba.lcut_for_search(
            ' '.join(kw), True)  # Search-engine mode

        # The full name also counts as a keyword
        if 'name' in author and author['name'].lower() not in seg_list:
            seg_list.append(author['name'].lower())

        while ' ' in seg_list:
            seg_list.remove(' ')
        while '、' in seg_list:
            seg_list.remove('、')
        return list(set(seg_list))

    def add_author_kw(self, mid):
        self.add_to_author(mid, self.get_author_kw_list(mid))
        return True

    def add_to_author(self, mid, seg_list):
        sleep(0.01)
        self.mongo_author.update_one(
            {'mid': mid}, {'$set': {'keyword': seg_list}})

    def add_all_author(self):
        authors = self.mongo_author.find(
            {
                '$or': [
                    {
                        'keyword': []
                    }, {
                        'keyword': {
                            '$exists': False
                        }
                    }
                ]
            }, {'_id': 0, 'mid': 1}).batch_size(200)
        for each_author in authors:
            mid = each_author['mid']
            self.add_author_kw(mid)

    def add_all_video(self):
        videos = self.mongo_video.find({
            '$or': [
                {
                    'keyword': []
                }, {
                    'keyword': {
                        '$exists': False
                    }
                }
            ]
        }, {'_id': 0, 'aid': 1}).batch_size(200)
        for each_video in videos:
            aid = each_video['aid']
            self.add_video_kw(aid)

    def refresh_all_author(self):
        authors = self.mongo_author.find(
            {}, {'_id': 0, 'mid': 1}).batch_size(500)
        for each_author in authors:
            mid = each_author['mid']
            print("[mid]"+str(mid))
            self.add_author_kw(mid)

    def refresh_all_video(self):
        videos = self.mongo_video.find(
            {}, {'_id': 0, 'aid': 1}).batch_size(500)
        for each_video in videos:
            aid = each_video['aid']
            print("[aid]"+str(aid))
            self.add_video_kw(aid)

    def add_omitted(self):
        total_value = self.mongo_word.count_documents({})
        if total_value < 100:
            return
        # Task name: "Update the search keyword dictionary"
        t = ProgressTask("更新查询关键词字典", total_value=total_value, collection=db['tracer'])
        with open('./biliob_analyzer/dict.txt', 'r', encoding='utf8') as f:
            d = f.read().split('\n')
        for each in self.mongo_word.find():
            if 'aid' in each and each['aid'] not in d:
                d.append(each['aid'])
            elif 'mid' in each and each['mid'] not in d:
                d.append(each['mid'])
            t.current_value += 1
        t.finished = True
        with open('./biliob_analyzer/dict.txt',
                  'w', encoding='utf8', newline='') as o:
            for each in d:
                o.write(each+'\n')
        self.mongo_word.delete_many({})
        jieba.load_userdict('./biliob_analyzer/dict.txt')
        self.refresh_all_author()
        self.refresh_all_video()
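A note on the field loop in `get_video_kw_list`: a condition of the form `key != 'keyword' or key != 'tag'` is true for every key, because no key can equal both strings at once, so the list-handling branches are never reached. The sketch below shows the intended split between scalar and list fields as a pure function. The name `collect_keyword_parts` and the `str()` conversion of list items are assumptions for illustration, not code from this PR, and jieba segmentation is deliberately left out so the example runs on its own.

```python
def collect_keyword_parts(doc, list_fields=('keyword', 'tag')):
    """Flatten a projected document into a list of strings for segmentation.

    Scalar fields are lower-cased, as in the PR; list fields are merged
    item by item. Converting items with str() is an added assumption.
    """
    parts = []
    for key, value in doc.items():
        if key in list_fields:
            # List-valued field: merge each item rather than the list's repr.
            parts.extend(str(item) for item in value)
        else:
            parts.append(str(value).lower())
    return parts


if __name__ == '__main__':
    key = 'tag'
    # The 'or' form is always true, even for the keys it means to exclude.
    print(key != 'keyword' or key != 'tag')
    print(collect_keyword_parts({'title': 'Hello World', 'tag': ['MAD', 'game']}))
```

The resulting list can then be joined with spaces and passed to the segmenter, as the method above does.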
69 changes: 69 additions & 0 deletions biliob_analyzer/author_analyzer.py
@@ -0,0 +1,69 @@
from db import db
from pymongo import MongoClient
from datetime import datetime
from datetime import timedelta
import logging

logging.basicConfig(level=logging.INFO,
                    format='[%(asctime)s] %(levelname)s @ %(name)s: %(message)s')
logger = logging.getLogger(__name__)


class AuthorAnalyzer(object):
    def __init__(self):
        self.db = db  # Handle to the database
        self.coll = self.db['author']  # Handle to the collection

    def author_filter(self):
        pre_fans = -1
        c_fans = -1
        delta = timedelta(1)
        pre_date = datetime
        c_date = datetime
        count_unfocus = 0
        count_focus = 0
        # MongoDB comparison operators need the '$' prefix
        for each_doc in self.coll.find({'focus': True, 'cFans': {'$lt': 50000}}):
            flag_cool = 0
            focus = True  # Keep tracking unless one of the rules below fires
            if 'data' in each_doc:
                each_doc['data'].reverse()
                for each_data in each_doc['data']:
                    if pre_fans == -1:
                        pre_fans = each_data['fans']
                        pre_date = each_data['datetime']
                        continue
                    c_fans = each_data['fans']
                    c_date = each_data['datetime']
                    if pre_date + delta > c_date:
                        continue
                    if abs(c_fans-pre_fans) < 100:
                        flag_cool += 1
                    else:
                        flag_cool = 0
                    pre_fans = c_fans
                    pre_date = c_date

                    # Stop tracking when daily fan growth has stayed below 100 for
                    # 30 consecutive days and the fan count is below 100,000
                    if flag_cool > 30 and each_data['fans'] < 100000:
                        focus = False
                        break
                    elif flag_cool > 15 and each_data['fans'] < 5000:
                        focus = False
                        break
                    elif flag_cool > 7 and each_data['fans'] < 1000:
                        focus = False
                        break
                    else:
                        focus = True

            if focus:
                count_focus += 1
            else:
                count_unfocus += 1
            pre_fans = -1
            c_fans = -1
        logger.info("· Results of this filtering round:")
        logger.info("× Authors no longer tracked: "+str(count_unfocus))
        logger.info("√ Authors still tracked: "+str(count_focus))

    def fans_variation(self):
        pass
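The cooling rule in `author_filter` can be isolated from the database as a pure function, which makes the thresholds easier to check. This is a minimal sketch under stated assumptions: samples are `(datetime, fans)` pairs ordered oldest first, and the names `RULES` and `should_keep_tracking` are hypothetical, not part of this PR. The thresholds are copied from the method above.

```python
from datetime import datetime, timedelta

# (minimum streak in days, fan-count cap), taken from the rules in author_filter.
RULES = ((30, 100000), (15, 5000), (7, 1000))


def should_keep_tracking(samples, min_gap=timedelta(days=1), threshold=100):
    """Return False once an author has 'cooled down' under any rule.

    samples: list of (datetime, fans) pairs, oldest first (an assumption).
    """
    if not samples:
        return True
    pre_date, pre_fans = samples[0]
    streak = 0
    for c_date, c_fans in samples[1:]:
        # Skip samples taken less than min_gap after the previous kept one.
        if pre_date + min_gap > c_date:
            continue
        # Count consecutive days whose fan change stays under the threshold.
        streak = streak + 1 if abs(c_fans - pre_fans) < threshold else 0
        pre_date, pre_fans = c_date, c_fans
        if any(streak > days and c_fans < cap for days, cap in RULES):
            return False
    return True


if __name__ == '__main__':
    start = datetime(2019, 1, 1)
    flat = [(start + timedelta(days=i), 500) for i in range(10)]
    print(should_keep_tracking(flat))
```

Keeping the rule in one function also means the "no data" case has an explicit answer instead of relying on a variable left over from the previous author.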