Feat/seasearch add wiki search sup #366
09f88bf to 9f1909d
254f4eb to 444663d
@@ -159,6 +159,8 @@ def search_files(self, repos, keyword, start=0, size=10, suffixes=None, search_p
bulk_search_params.append(data)
search_path = None


logger.debug('search in repo_filename_index params: %s', json.dumps(bulk_search_params))
Remove this.
wiki_index.delete_index_by_index_name(wiki_index_name)
wiki_status_index.delete_documents_by_repo(wiki_id)

def keyword_search(self, query, repos, repo_filename_index, count, suffixes=None):
There is already a keyword_search defined above.
seasearch/index_store/wiki_index.py
Outdated
def check_index(self, index_name):
return self.seasearch_api.check_index_mapping(index_name).get('is_exist')

def query_data_by_doc_uuid(self, index_name, doc_uuids_list, start, size):
Rename this to query_data_by_doc_uuids.
seasearch/index_store/wiki_index.py
Outdated
SEASEARCH_WIKI_BULK_OPETATE_LIMIT = 25
SEASEARCH_WIKI_QUERY_DOC_UUID_STEP = 10
These values seem too small, don't they?
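To illustrate why the size of a step constant matters, here is a minimal sketch of batching doc UUIDs by a fixed step. It assumes that SEASEARCH_WIKI_QUERY_DOC_UUID_STEP is used as a batch size when querying by doc UUID, which the diff excerpt does not confirm. `iter_batches` and the sample UUIDs are hypothetical and not part of this PR.

```python
def iter_batches(items, step):
    """Yield consecutive slices of `items`, each at most `step` long."""
    if step <= 0:
        raise ValueError('step must be positive')
    for start in range(0, len(items), step):
        yield items[start:start + step]


# Hypothetical workload: 95 doc UUIDs to look up.
doc_uuids = ['uuid-%d' % i for i in range(95)]

# With a step of 10, the lookup is split into 10 separate batches.
small_step_batches = list(iter_batches(doc_uuids, 10))

# With a step of 100, a single batch covers every UUID.
large_step_batches = list(iter_batches(doc_uuids, 100))
```

Under that assumption, a small step multiplies the number of round trips to the search backend, which is presumably the concern behind the comment.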
app/app.py
Outdated
@@ -8,8 +8,9 @@
from seafevents.repo_metadata.index_worker import RepoMetadataIndexWorker
from seafevents.repo_metadata.slow_task_handler import SlowTaskHandler
from seafevents.seafevent_server.seafevent_server import SeafEventServer
from seafevents.app.config import ENABLE_METADATA_MANAGEMENT
from seafevents.app.config import ENABLE_METADATA_MANAGEMENT, ENABLE_WIKI
This is the config option for the old version of wiki.
seasearch/utils/constants.py
Outdated
WIKI_INDEX_PREFIX = 'wiki_'

SEASEARCH_QUERY_DOC_UUID_STEP = 20
SEASEARCH_BULK_OPETATE_LIMIT = 25
These two constants are unused, aren't they?
seasearch/utils/__init__.py
Outdated
@@ -99,3 +100,26 @@ def need_index_metadata_info(repo_id, session):
return False

return True


def is_wiki(path):
Please rename this. It checks whether a file belongs to a wiki, not whether the file itself is a wiki.
seasearch/index_store/wiki_index.py
Outdated
doc_uuids = [page['docUuid'] for page in config['pages'] if page['id'] in navigation_ids]
return doc_uuids

def extract_deleted_doc_uuids(self, config):
Wouldn't it be more appropriate to handle this and extract_doc_uuids above in a single method?
seasearch/index_store/wiki_index.py
Outdated
def get_wiki_conf(self, wiki_id):
# Get wiki config dict
conf_path = posixpath.join(WIKI_CONFIG_PATH, WIKI_CONFIG_FILE_NAME)
conf_id = seafile_api.get_file_id_by_path(wiki_id, conf_path)
This shouldn't be called conf_id, should it?
if wiki_status.need_recovery():
logger.warning('%s: wiki index inrecovery', wiki_id)
wiki_index.update(index_name, wiki_id, commit_id, to_commit)
With the current logic, can recovery still work properly here?
@@ -35,9 +39,6 @@ class RepoStatusIndex(object):
'updatingto': {
'type': 'keyword'
},
'metadata_updated_time': {
'type': 'keyword'
},
This deletes a previously added property, so other features will stop working.
08447c9 to 7e21984
index_local.run()

logger.info('\n\nWiki index updated, statistic report:\n')
logger.info('[commit read] %s', commit_mgr.read_count())
Why were the other statistics removed?
seasearch/index_store/wiki_index.py
Outdated
conf = self.get_wiki_conf(wiki_id)

doc_uuids = self.extract_doc_uuids(conf)
deleted_doc_uuids = self.extract_doc_uuids(conf, deleted=True)
doc_uuids and deleted_doc_uuids can both be obtained from a single call to extract_doc_uuids; otherwise the same extraction runs twice.
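A minimal sketch of the single-pass extraction the reviewer describes. The `pages`, `id`, and `docUuid` keys come from the diff earlier in this review. Passing `navigation_ids` as a parameter and treating every page outside the navigation as deleted are assumptions made for illustration, not the PR's confirmed logic.

```python
def extract_doc_uuids(config, navigation_ids):
    """Return (doc_uuids, deleted_doc_uuids) from one pass over config['pages'].

    Assumption: a page whose id is not in navigation_ids counts as deleted.
    """
    doc_uuids = []
    deleted_doc_uuids = []
    for page in config.get('pages', []):
        if page['id'] in navigation_ids:
            doc_uuids.append(page['docUuid'])
        else:
            deleted_doc_uuids.append(page['docUuid'])
    return doc_uuids, deleted_doc_uuids


# Hypothetical wiki config with one live page and one removed page.
sample_config = {
    'pages': [
        {'id': 'p1', 'docUuid': 'uuid-1'},
        {'id': 'p2', 'docUuid': 'uuid-2'},
    ]
}
live, deleted = extract_doc_uuids(sample_config, navigation_ids={'p1'})
```

Returning both lists as a tuple lets the caller unpack them in one statement, so the pages list is walked only once.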
seasearch/script/update.lock
Outdated
Delete this file.
seasearch/index_store/wiki_index.py
Outdated
return content.strip()

def get_wiki_conf(self, wiki_id):
This needs to fetch the config by commit_id; otherwise the logic is incorrect.
seafevent_server/request_handler.py
Outdated
@app.route('/wiki-search', methods=['POST'])
def search_wikis():
This should be singular; update the other related names accordingly.
seasearch/index_store/wiki_index.py
Outdated
get_library_diff_files(wiki_id, old_commit_id, new_commit_id)

conf = self.get_wiki_conf(wiki_id, new_commit_id)
if conf is None:
This None comes from swallowing an exception. The exception should not be handled here; otherwise the program will assume this index update completed successfully.
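The point about not swallowing the exception can be sketched as follows. This is a simplified stand-in under stated assumptions: `read_conf` and the `status` dict are hypothetical placeholders for reading the config file and recording index status, and the real functions in the PR take different arguments.

```python
import json


def get_wiki_conf(read_conf):
    # No try/except here: a read or parse failure propagates to the caller
    # instead of being turned into None.
    return json.loads(read_conf())


def update_wiki_index(read_conf, status):
    conf = get_wiki_conf(read_conf)
    # ... indexing of the pages described by conf would happen here ...
    status['updated'] = True  # reached only when the config was loaded
    return conf


def broken_read():
    raise IOError('cannot read wiki config')


failed_status = {'updated': False}
try:
    update_wiki_index(broken_read, failed_status)
except IOError:
    # The failure is visible to the caller, so the update can be retried.
    pass

ok_status = {'updated': False}
update_wiki_index(lambda: '{"pages": []}', ok_status)
```

Because the failure aborts `update_wiki_index` before the status is written, a later run still sees the wiki as needing an update.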
seasearch/index_store/wiki_index.py
Outdated
if deleted_doc_uuids:
delete_documents(deleted_doc_uuids)

def normal_search(self, index_name, dsl):
This isn't used, is it?
seasearch/index_store/wiki_index.py
Outdated
title_match.append(r_t)

# Search in wiki name
name_match = []
Remove this part.
seasearch/index_store/wiki_index.py
Outdated
# Search in wiki title
title_match = []
for doc_uuid, title, wiki_id in title_info:
Do we need to add wiki_title? Besides, adding it this way produces two identical wiki pages in the search results.
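If title matches were kept alongside content matches, one way to avoid the duplicate results mentioned above would be to merge hits by doc_uuid. This is only an illustrative sketch, not the fix the reviewer asks for; `merge_hits_by_doc_uuid` and the shape of the hit dictionaries (`doc_uuid`, `score`) are assumptions.

```python
def merge_hits_by_doc_uuid(hits):
    """Keep the highest-scoring hit per doc_uuid, ordered by score descending."""
    best = {}
    for hit in hits:
        current = best.get(hit['doc_uuid'])
        if current is None or hit['score'] > current['score']:
            best[hit['doc_uuid']] = hit
    return sorted(best.values(), key=lambda h: h['score'], reverse=True)


# Hypothetical hits: the same page matched once by title and once by content.
hits = [
    {'doc_uuid': 'uuid-1', 'score': 1.5, 'matched': 'title'},
    {'doc_uuid': 'uuid-1', 'score': 2.0, 'matched': 'content'},
    {'doc_uuid': 'uuid-2', 'score': 0.7, 'matched': 'content'},
]
merged = merge_hits_by_doc_uuid(hits)
```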
seasearch/index_store/wiki_index.py
Outdated
title_info.append((page_uuid, page["name"], wiki))

# Get wiki name
wiki = seafile_api.get_repo(wiki)
Is this still needed?
seasearch/index_store/wiki_index.py
Outdated
if bulk_add_params:
self.seasearch_api.bulk(index_name, bulk_add_params)

def delete_files(self, index_name, files, deleted_doc_uuids):
See how this is implemented in the filename index; there is no need to look up the id again.
seasearch/index_store/wiki_index.py
Outdated
doc_uuids, deleted_doc_uuids = self.extract_doc_uuids(conf)

need_deleted_files = deleted_files + modified_files
self.delete_files(index_name, need_deleted_files, deleted_doc_uuids)
There's no need to delete modified_files, is there?
seasearch/index_store/wiki_index.py
Outdated
else:
continue

index_info = {'index': {'_index': index_name, '_id': md5(path)}}
Use doc_uuid as the _id instead.
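A minimal sketch of what keying documents by doc_uuid could look like. The `index` action shape is taken from the diff line above. The `delete` action assumes an Elasticsearch-style bulk format, and both helper names are hypothetical rather than part of the PR.

```python
def build_index_action(index_name, doc_uuid):
    """Bulk 'index' action header keyed by the page's doc_uuid."""
    return {'index': {'_index': index_name, '_id': doc_uuid}}


def build_delete_action(index_name, doc_uuid):
    """Bulk 'delete' action header (assumed Elasticsearch-style format)."""
    return {'delete': {'_index': index_name, '_id': doc_uuid}}


# Hypothetical index name and doc UUID.
index_action = build_index_action('wiki_example', 'doc-uuid-1')
delete_action = build_delete_action('wiki_example', 'doc-uuid-1')
```

With the document id equal to the doc_uuid, a deleted page can be addressed directly by its uuid, and re-indexing a modified page overwrites the existing document instead of requiring a separate delete.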
seasearch/index_store/wiki_index.py
Outdated
'doc_uuid':{
'type': 'keyword',
},
'type':{
Is this useful? As far as I can see, you only ever set one type, content.
seasearch/index_store/wiki_index.py
Outdated
if highlight_content := hit.get('highlight').get('content', [None])[0]:
r.update(content=highlight_content)
content_match.append(r)
content_match = sorted(content_match, key=lambda row: row['score'], reverse=True)[:size]
Is sorting still needed here? Also, rename the content_match variable.
cba51d5 to 7e34e24
seasearch/index_store/wiki_index.py
Outdated
need_added_files = added_files + modified_files

recently_restore_uuid_path = {
Rename this to recently_restore_uuid_to_path.
seasearch/index_store/wiki_index.py
Outdated
old_cfg = self.get_wiki_conf(wiki_id, old_commit_id)
new_cfg = self.get_wiki_conf(wiki_id, new_commit_id)
prev_path, prev_recycled = self.get_uuid_path_mapping(old_cfg)
curr_path, curr_recycled = self.get_uuid_path_mapping(new_cfg)
A variable name should at least be a noun, shouldn't it? These should all be plural.
0dc90b3 to 99320a3
4a225ae to 736a84f
No description provided.