Feat/seasearch add wiki search sup #366

Merged
merged 1 commit into from
Nov 4, 2024


cir9no (Contributor) commented Aug 18, 2024

No description provided.

cir9no force-pushed the feat/seasearch-add-wiki-search-sup branch from 09f88bf to 9f1909d on August 26, 2024 07:34
cir9no force-pushed the feat/seasearch-add-wiki-search-sup branch from 254f4eb to 444663d on September 25, 2024 07:16
@@ -159,6 +159,8 @@ def search_files(self, repos, keyword, start=0, size=10, suffixes=None, search_p
bulk_search_params.append(data)
search_path = None


logger.debug('search in repo_filename_index params: %s', json.dumps(bulk_search_params))
Review comment (Contributor): Remove this.

wiki_index.delete_index_by_index_name(wiki_index_name)
wiki_status_index.delete_documents_by_repo(wiki_id)

def keyword_search(self, query, repos, repo_filename_index, count, suffixes=None):
Review comment (Contributor): There is already a keyword_search above.

def check_index(self, index_name):
return self.seasearch_api.check_index_mapping(index_name).get('is_exist')

def query_data_by_doc_uuid(self, index_name, doc_uuids_list, start, size):
Review comment (Contributor): Rename this to query_data_by_doc_uuids.



SEASEARCH_WIKI_BULK_OPETATE_LIMIT = 25
SEASEARCH_WIKI_QUERY_DOC_UUID_STEP = 10
Review comment (Contributor): Aren't these values set a bit too small?
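To put the concern in numbers, here is a minimal sketch that assumes each batch is sent to SeaSearch as one request. The chunked and request_count helpers are hypothetical and do not come from the PR.

```python
from itertools import islice

SEASEARCH_WIKI_BULK_OPETATE_LIMIT = 25
SEASEARCH_WIKI_QUERY_DOC_UUID_STEP = 10


def chunked(items, size):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def request_count(total_docs, batch_size):
    """Number of round trips when each batch is sent as one request."""
    return sum(1 for _ in chunked(range(total_docs), batch_size))


# 1,000 page updates at 25 per bulk call -> 40 bulk requests;
# 1,000 doc_uuid lookups at 10 per query -> 100 queries.
print(request_count(1000, SEASEARCH_WIKI_BULK_OPETATE_LIMIT))
print(request_count(1000, SEASEARCH_WIKI_QUERY_DOC_UUID_STEP))
```

Under this assumption, the number of round trips grows inversely with the batch size. Larger limits trade fewer requests for bigger request bodies.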

app/app.py (outdated)
@@ -8,8 +8,9 @@
from seafevents.repo_metadata.index_worker import RepoMetadataIndexWorker
from seafevents.repo_metadata.slow_task_handler import SlowTaskHandler
from seafevents.seafevent_server.seafevent_server import SeafEventServer
-from seafevents.app.config import ENABLE_METADATA_MANAGEMENT
+from seafevents.app.config import ENABLE_METADATA_MANAGEMENT, ENABLE_WIKI
Review comment (Contributor): This is the setting for the old version of the wiki.

WIKI_INDEX_PREFIX = 'wiki_'

SEASEARCH_QUERY_DOC_UUID_STEP = 20
SEASEARCH_BULK_OPETATE_LIMIT = 25
Review comment (Contributor): These two constants aren't used, are they?

@@ -99,3 +100,26 @@ def need_index_metadata_info(repo_id, session):
return False

return True


def is_wiki(path):
Review comment (Contributor): Please rename this. It checks whether a file is inside a wiki, not whether the file itself is a wiki.

doc_uuids = [page['docUuid'] for page in config['pages'] if page['id'] in navigation_ids]
return doc_uuids

def extract_deleted_doc_uuids(self, config):
Review comment (Contributor): Wouldn't it be better to handle this together with extract_doc_uuids above in a single method?

def get_wiki_conf(self, wiki_id):
# Get wiki config dict
conf_path = posixpath.join(WIKI_CONFIG_PATH, WIKI_CONFIG_FILE_NAME)
conf_id = seafile_api.get_file_id_by_path(wiki_id, conf_path)
Review comment (Contributor): This shouldn't be called conf_id, should it?


if wiki_status.need_recovery():
logger.warning('%s: wiki index inrecovery', wiki_id)
wiki_index.update(index_name, wiki_id, commit_id, to_commit)
Review comment (Contributor): With the current logic, can recovery still work correctly here?

@@ -35,9 +39,6 @@ class RepoStatusIndex(object):
'updatingto': {
'type': 'keyword'
},
'metadata_updated_time': {
'type': 'keyword'
},
Review comment (Contributor): This deletes a previously added field, so other features will stop working.

cir9no force-pushed the feat/seasearch-add-wiki-search-sup branch 3 times, most recently from 08447c9 to 7e21984 on October 16, 2024 02:15
index_local.run()

logger.info('\n\nWiki index updated, statistic report:\n')
logger.info('[commit read] %s', commit_mgr.read_count())
Review comment (Contributor): Why were the other statistics removed?

conf = self.get_wiki_conf(wiki_id)

doc_uuids = self.extract_doc_uuids(conf)
deleted_doc_uuids = self.extract_doc_uuids(conf, deleted=True)
Review comment (Contributor): doc_uuids and deleted_doc_uuids can both be obtained from a single extract_doc_uuids call. Otherwise the same extraction runs twice.
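Here is a minimal sketch of a single pass that returns both lists. It rests on two assumptions that the PR does not confirm. The first is that config['navigation'] is a tree of nodes with an id and optional children. The second is that any page missing from that tree counts as deleted. The return shape matches the later call doc_uuids, deleted_doc_uuids = self.extract_doc_uuids(conf).

```python
def extract_doc_uuids(config):
    """Split wiki pages into live and deleted doc_uuids in one pass.

    Assumption: pages reachable from config['navigation'] are live; every
    other page in config['pages'] is treated as deleted.
    """
    navigation_ids = set()
    stack = list(config.get('navigation', []))
    while stack:  # iterative walk over the (assumed) navigation tree
        node = stack.pop()
        navigation_ids.add(node['id'])
        stack.extend(node.get('children', []))

    doc_uuids, deleted_doc_uuids = [], []
    for page in config.get('pages', []):
        target = doc_uuids if page['id'] in navigation_ids else deleted_doc_uuids
        target.append(page['docUuid'])
    return doc_uuids, deleted_doc_uuids


conf = {
    'navigation': [{'id': 'p1', 'children': [{'id': 'p2'}]}],
    'pages': [
        {'id': 'p1', 'docUuid': 'u1'},
        {'id': 'p2', 'docUuid': 'u2'},
        {'id': 'p3', 'docUuid': 'u3'},
    ],
}
print(extract_doc_uuids(conf))  # (['u1', 'u2'], ['u3'])
```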

Review comment (Contributor): Delete this file.


return content.strip()

def get_wiki_conf(self, wiki_id):
Review comment (Contributor): This needs to load the config by commit_id. Otherwise the logic is wrong.
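Here is a minimal sketch of reading the config as of a specific commit. The read_file_at_commit function is a hypothetical accessor. It stands in for whichever Seafile call reads a file's content at a given commit. The two path constants are placeholders for the PR's real WIKI_CONFIG_PATH and WIKI_CONFIG_FILE_NAME.

```python
import json
import posixpath

# Placeholder values; the PR defines the real constants elsewhere.
WIKI_CONFIG_PATH = '/_Internal/Wiki'
WIKI_CONFIG_FILE_NAME = 'index.json'


def get_wiki_conf(read_file_at_commit, wiki_id, commit_id):
    """Return the wiki config dict as it existed at `commit_id`.

    read_file_at_commit(repo_id, commit_id, path) -> bytes is hypothetical.
    Reading at the indexed commit keeps the config consistent with the
    file diff computed for that same commit.
    """
    conf_path = posixpath.join(WIKI_CONFIG_PATH, WIKI_CONFIG_FILE_NAME)
    return json.loads(read_file_at_commit(wiki_id, commit_id, conf_path))


# In-memory stand-in for the repository history.
store = {
    ('wiki-1', 'c1', '/_Internal/Wiki/index.json'): b'{"pages": []}',
    ('wiki-1', 'c2', '/_Internal/Wiki/index.json'): b'{"pages": [{"id": "p1"}]}',
}


def fake_reader(repo_id, commit_id, path):
    return store[(repo_id, commit_id, path)]


print(get_wiki_conf(fake_reader, 'wiki-1', 'c1'))
print(get_wiki_conf(fake_reader, 'wiki-1', 'c2'))
```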



@app.route('/wiki-search', methods=['POST'])
def search_wikis():
Review comment (Contributor): This should use the singular form. Rename the related names accordingly.

get_library_diff_files(wiki_id, old_commit_id, new_commit_id)

conf = self.get_wiki_conf(wiki_id, new_commit_id)
if conf is None:
Review comment (Contributor): This None comes from catching an exception. The exception should not be caught here. Otherwise the program will assume this index update has already succeeded.
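Here is a minimal sketch of the failure mode the comment describes. The load_conf and save_indexed_commit callables are hypothetical. If loading the config raises, the exception propagates and the recorded commit is never advanced. The next run therefore retries the same commit range instead of silently skipping it.

```python
class WikiIndexUpdater:
    """Hypothetical updater; only the error-handling shape matters here."""

    def __init__(self, load_conf, save_indexed_commit):
        self.load_conf = load_conf
        self.save_indexed_commit = save_indexed_commit

    def update(self, wiki_id, old_commit_id, new_commit_id):
        # No try/except: a failure to read the config reaches the caller
        # instead of being turned into None.
        conf = self.load_conf(wiki_id, new_commit_id)
        # ... index the pages described by conf ...
        self.save_indexed_commit(wiki_id, new_commit_id)  # reached only on success


saved = {}


def record(wiki_id, commit_id):
    saved[wiki_id] = commit_id


def failing_load(wiki_id, commit_id):
    raise OSError('config unreadable')


updater = WikiIndexUpdater(failing_load, record)
try:
    updater.update('wiki-1', 'c1', 'c2')
except OSError:
    pass
print(saved)  # {} : the status still points at the old commit
```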

if deleted_doc_uuids:
delete_documents(deleted_doc_uuids)

def normal_search(self, index_name, dsl):
Review comment (Contributor): This isn't needed, is it?

title_match.append(r_t)

# Search in wiki name
name_match = []
Review comment (Contributor): Remove this part.


# Search in wiki title
title_match = []
for doc_uuid, title, wiki_id in title_info:
Review comment (Contributor): Do we need to add wiki_title at all? Adding it this way would also produce two identical wiki pages in the search results.
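If title matches and content matches end up in one result list, deduplicating by doc_uuid keeps a page from appearing twice. Here is a minimal sketch of that. It assumes each result row is a dict with a doc_uuid key. The merge_unique helper is hypothetical.

```python
def merge_unique(*result_lists):
    """Concatenate result lists, keeping the first row for each doc_uuid."""
    seen = set()
    merged = []
    for results in result_lists:
        for row in results:
            if row['doc_uuid'] in seen:
                continue  # already contributed by an earlier list
            seen.add(row['doc_uuid'])
            merged.append(row)
    return merged


title_matches = [{'doc_uuid': 'u1', 'source': 'title'}]
content_matches = [
    {'doc_uuid': 'u1', 'source': 'content'},
    {'doc_uuid': 'u2', 'source': 'content'},
]
print(merge_unique(title_matches, content_matches))
```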

title_info.append((page_uuid, page["name"], wiki))

# Get wiki name
wiki = seafile_api.get_repo(wiki)
Review comment (Contributor): Is this still used?

if bulk_add_params:
self.seasearch_api.bulk(index_name, bulk_add_params)

def delete_files(self, index_name, files, deleted_doc_uuids):
Review comment (Contributor): Look at how the filename index implements this. There's no need to query the ids first.
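When _id is derived deterministically from the path (the code here uses md5(path)), the ids of documents to delete can be computed locally instead of being queried first. Here is a minimal sketch that builds Elasticsearch-style bulk delete actions. It makes two assumptions: that SeaSearch accepts the delete action in its bulk API, and that the PR's md5 helper returns the hex digest of the UTF-8 path.

```python
import hashlib


def md5(path):
    """Assumed equivalent of the PR's md5 helper: hex digest of the UTF-8 path."""
    return hashlib.md5(path.encode('utf-8')).hexdigest()


def build_delete_actions(index_name, paths):
    """Bulk delete actions whose _id is computed from the path; no lookup needed."""
    return [{'delete': {'_index': index_name, '_id': md5(path)}} for path in paths]


actions = build_delete_actions('wiki_abc', ['/page-1'])
print(actions)
```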

doc_uuids, deleted_doc_uuids = self.extract_doc_uuids(conf)

need_deleted_files = deleted_files + modified_files
self.delete_files(index_name, need_deleted_files, deleted_doc_uuids)
Review comment (Contributor): There's no need to delete modified_files, is there?

else:
continue

index_info = {'index': {'_index': index_name, '_id': md5(path)}}
Review comment (Contributor): Use doc_uuid as the _id instead.
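In an Elasticsearch-style bulk request, an index action with an existing _id replaces the stored document. Keying documents by doc_uuid would therefore also make it unnecessary to delete modified pages before re-adding them. Here is a minimal sketch of building the bulk body that way. The page dict shape (doc_uuid, content) is an assumption.

```python
def build_index_actions(index_name, pages):
    """Pair each action line with its document, keyed by doc_uuid.

    `pages` is a list of dicts with 'doc_uuid' and 'content' (assumed shape).
    Re-sending a page with the same doc_uuid overwrites the stored document
    rather than adding a duplicate.
    """
    actions = []
    for page in pages:
        actions.append({'index': {'_index': index_name, '_id': page['doc_uuid']}})
        actions.append({'doc_uuid': page['doc_uuid'], 'content': page['content']})
    return actions


print(build_index_actions('wiki_abc', [{'doc_uuid': 'u1', 'content': 'Hello'}]))
```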

'doc_uuid':{
'type': 'keyword',
},
'type':{
Review comment (Contributor): Is this field useful? As far as I can tell, you only ever set one type, content.

if highlight_content := hit.get('highlight').get('content', [None])[0]:
r.update(content=highlight_content)
content_match.append(r)
content_match = sorted(content_match, key=lambda row: row['score'], reverse=True)[:size]
Review comment (Contributor): Is sorting still needed here? Please also rename the content_match variable.
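When a query sets no explicit sort, Elasticsearch-compatible engines return hits in descending _score order. Re-sorting may therefore be redundant, though this is worth confirming for SeaSearch. The minimal sketch below keeps the engine's order and uses a noun-style name (page_results) instead of content_match. It also guards against hits without a highlight key, which hit.get('highlight').get(...) would not survive.

```python
def build_page_results(hits, size):
    """Convert raw hits into result rows, trusting the engine's score order."""
    page_results = []
    for hit in hits[:size]:
        row = {'doc_uuid': hit['_source']['doc_uuid'], 'score': hit['_score']}
        # `highlight` may be absent when nothing in `content` was highlighted.
        highlights = (hit.get('highlight') or {}).get('content') or []
        if highlights:
            row['content'] = highlights[0]
        page_results.append(row)
    return page_results


hits = [
    {'_source': {'doc_uuid': 'u2'}, '_score': 3.1,
     'highlight': {'content': ['<mark>wiki</mark> page']}},
    {'_source': {'doc_uuid': 'u1'}, '_score': 1.4},
]
print(build_page_results(hits, 10))
```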

cir9no force-pushed the feat/seasearch-add-wiki-search-sup branch 3 times, most recently from cba51d5 to 7e34e24 on October 24, 2024 05:53

need_added_files = added_files + modified_files

recently_restore_uuid_path = {
Review comment (Contributor): Rename this to recently_restore_uuid_to_path.

old_cfg = self.get_wiki_conf(wiki_id, old_commit_id)
new_cfg = self.get_wiki_conf(wiki_id, new_commit_id)
prev_path, prev_recycled = self.get_uuid_path_mapping(old_cfg)
curr_path, curr_recycled = self.get_uuid_path_mapping(new_cfg)
Review comment (Contributor): Variable names should at least be nouns, shouldn't they? These should all be plural.
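Here is a minimal sketch of the old/new comparison using plural, noun-style names. It assumes, hypothetically, that get_uuid_path_mapping returns a {doc_uuid: path} dict of live pages plus a set of recycled doc_uuids. Under that assumption, a page counts as recently restored when it was recycled in the old config and has a live path in the new one.

```python
def find_recently_restored(prev_recycled_uuids, curr_uuid_to_path):
    """Return {doc_uuid: path} for pages restored between two configs."""
    return {
        doc_uuid: path
        for doc_uuid, path in curr_uuid_to_path.items()
        if doc_uuid in prev_recycled_uuids
    }


prev_recycled_uuids = {'u3', 'u4'}
curr_uuid_to_path = {'u1': '/a', 'u3': '/c'}
print(find_recently_restored(prev_recycled_uuids, curr_uuid_to_path))  # {'u3': '/c'}
```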

cir9no force-pushed the feat/seasearch-add-wiki-search-sup branch 7 times, most recently from 0dc90b3 to 99320a3 on November 4, 2024 02:54
cir9no force-pushed the feat/seasearch-add-wiki-search-sup branch from 4a225ae to 736a84f on November 4, 2024 05:48
freeplant merged commit 59250c3 into master on Nov 4, 2024
1 check passed
3 participants