add_summary_index #379

shenzheng-1 · 2024-09-03T10:18:07Z

No description provided.

JoinTyang · 2024-09-11T06:00:06Z

seasearch/index_store/repo_file_name_index.py

@@ -29,6 +30,10 @@ class RepoFileNameIndex(object):
                    },
                },
            },
+            'description': {
+                'type': 'text',
+                'analyzer': 'standard'


No need to specify this; `standard` is already the default analyzer.
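A minimal sketch of the suggested mapping, assuming the index properties are a plain dict like the one in the diff. The helper `add_description_field` is hypothetical. It adds `description` as a `text` field with no `analyzer` key, so the default analyzer applies.

```python
# Hypothetical sketch: add a 'description' field to the filename index mapping.
# No 'analyzer' key is set, so the engine's default (standard) analyzer applies.
DESCRIPTION_FIELD = {'type': 'text'}


def add_description_field(mapping_properties):
    """Return a copy of an index's properties dict with a 'description' field.

    The layout of mapping_properties is assumed, not taken from the PR.
    """
    properties = dict(mapping_properties)
    properties['description'] = dict(DESCRIPTION_FIELD)
    return properties


props = add_description_field({'filename': {'type': 'text'}})
print(props['description'])  # {'type': 'text'}
```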

JoinTyang · 2024-09-11T08:10:07Z

seasearch/script/repo_filename_index_local.py

@@ -104,7 +104,7 @@ def thread_task(self, repos_queue):
                repo_id = queue_data[0]
                commit_id = queue_data[1]
                try:
-                    self.index_manager.update_library_filename_index(repo_id, commit_id, self.repo_filename_index, self.repo_status_filename_index)
+                    self.index_manager.update_library_filename_index(repo_id, commit_id, self.repo_filename_index, self.repo_status_filename_index, 3600 * 24 * 365)


Does the script only update the past year?
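To make the concern concrete, here is a sketch of how an `interval` argument like `3600 * 24 * 365` turns into an `_mtime` cutoff. `mtime_cutoff` is a hypothetical helper, and treating a falsy interval as "full rescan" is an assumption based on the `if interval:` branch in `index_manager.py`. With a one-year interval, anything last modified before the cutoff is never picked up by the script.

```python
from datetime import datetime, timedelta, timezone

ONE_YEAR_SECONDS = 3600 * 24 * 365  # the value passed by the script in the diff


def mtime_cutoff(interval, now=None):
    """Return the earliest _mtime that will be reindexed, or None for a full scan.

    Hypothetical helper: a falsy interval is assumed to mean "scan everything".
    """
    if not interval:
        return None
    now = now or datetime.now(timezone.utc)
    return now - timedelta(seconds=interval)


fixed_now = datetime(2024, 9, 11, tzinfo=timezone.utc)
print(mtime_cutoff(ONE_YEAR_SECONDS, fixed_now))  # 2023-09-12 00:00:00+00:00 (2024 is a leap year)
print(mtime_cutoff(None, fixed_now))              # None
```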

JoinTyang · 2024-09-11T08:31:42Z

seasearch/index_store/repo_file_name_index.py

@@ -98,6 +103,8 @@ def _make_match_query(field, key_word, **kw):
                }
            }
        })
+        if need_index_description(repo_id, session, metadata_server_api):


Can we drop this check? Otherwise every global search queries the database and the metadata server many times.
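One way to drop the per-repo check is to always match against both fields. Repos without descriptions just have an empty `description` field, so they are unaffected. The sketch below assumes an Elasticsearch-compatible `bool`/`match` query DSL. `make_filename_query` is a hypothetical stand-in, not the PR's `_make_match_query`.

```python
def make_filename_query(keyword):
    """Hypothetical sketch: match the keyword on 'filename' and 'description'.

    No need_index_description() lookup is done, so a global search makes no
    extra database or metadata-server calls per repo.
    """
    return {
        'bool': {
            'should': [
                {'match': {'filename': {'query': keyword}}},
                {'match': {'description': {'query': keyword}}},
            ],
            # At least one of the two fields must match.
            'minimum_should_match': 1,
        }
    }
```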

JoinTyang · 2024-09-12T06:59:35Z

seasearch/index_store/repo_file_name_index.py

+        per_size = SEASEARCH_BULK_OPETATE_LIMIT
+        start = 0
+        while True:
+            hits, total = self.query_data_by_paths(index_name, paths, start, per_size)


Change this to loop over `paths` instead.
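A sketch of the suggested loop. Instead of paging through hits with a moving `start` offset, split `paths` into batches of at most the bulk limit and query each batch. `query_data_by_paths` here is a hypothetical stand-in that takes `(index_name, path_batch)` and returns a list of hits. Its real signature in the PR differs.

```python
def iter_path_batches(paths, batch_size):
    """Yield consecutive slices of paths, each at most batch_size long."""
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


def query_all_by_paths(query_data_by_paths, index_name, paths, batch_size):
    """Hypothetical sketch: query hits for all paths, one batch at a time.

    query_data_by_paths(index_name, batch) is an assumed callable that
    returns the list of hits for the given batch of paths.
    """
    hits = []
    for batch in iter_path_batches(paths, batch_size):
        hits.extend(query_data_by_paths(index_name, batch))
    return hits
```

Looping over the input keeps each request bounded by the bulk limit (e.g. `SEASEARCH_BULK_OPETATE_LIMIT`) and avoids relying on `total` to decide when to stop.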

JoinTyang · 2024-09-12T07:22:01Z

seasearch/index_store/index_manager.py

+                if interval:
+                    last_update_time = datetime.now() - timedelta(seconds=interval)
+                    last_update_time = timestamp_to_isoformat_timestr(last_update_time.timestamp())
+                    sql = f"SELECT `_id`, `_mtime`, `_description`, `_parent_dir`, `_name` FROM `{METADATA_TABLE.name}` WHERE `_mtime` >= '{last_update_time}'"


Add a filter on file type as well.
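A sketch of the query with a type filter, using the `_is_dir` = False condition that a later revision of this PR adopts to exclude folders. `build_description_sql` is hypothetical, and the table name is passed in rather than read from `METADATA_TABLE.name`.

```python
def build_description_sql(table_name, last_update_time=None):
    """Hypothetical sketch: select description rows for files only.

    `_is_dir` = False excludes folders; the optional `_mtime` bound limits the
    query to rows changed since last_update_time (an ISO-format string).
    """
    columns = '`_id`, `_mtime`, `_description`, `_parent_dir`, `_name`'
    sql = f"SELECT {columns} FROM `{table_name}` WHERE `_is_dir` = False"
    if last_update_time:
        sql += f" AND `_mtime` >= '{last_update_time}'"
    return sql
```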

JoinTyang · 2024-09-19T08:53:26Z

seasearch/index_store/repo_file_name_index.py


        need_added_files = added_files + modified_files
-        self.add_files(index_name, repo_id, need_added_files)
+        update_paths = []
+        add_rows = {}


Variables like these should all be renamed to nouns.

JoinTyang · 2024-09-19T09:01:48Z

seasearch/index_store/index_manager.py

-
-            if new_commit_id == from_commit:
-                return
+            description_updated_time = repo_status.description_updated_time


This needs to handle the case where the value is empty when the index is created for the first time.
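A sketch of the empty-value fallback. `description_updated_time` is assumed to be a POSIX timestamp (string or float), or empty/None on first index creation. `description_since` is a hypothetical helper. Falling back to `0.0` with an explicit UTC timezone avoids the local-time interpretation that a naive `datetime(1970, 1, 1).timestamp()` is subject to.

```python
from datetime import datetime, timezone


def description_since(description_updated_time):
    """Return the lower bound for `_mtime` as an aware UTC datetime.

    On the first run description_updated_time is empty, so fall back to the
    epoch and index every file's description.
    """
    ts = float(description_updated_time) if description_updated_time else 0.0
    return datetime.fromtimestamp(ts, tz=timezone.utc)
```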

JoinTyang · 2024-09-19T09:07:58Z

seasearch/index_store/index_manager.py

+            if need_index_description(repo_id, self.session, self.metadata_server_api):
+                if description_updated_time:
+                    last_update_time = timestamp_to_isoformat_timestr(float(description_updated_time))
+                    sql = f"SELECT `_id`, `_mtime`, `_description`, `_parent_dir`, `_name`, `_obj_id` FROM `{METADATA_TABLE.name}` WHERE `_is_dir` = False AND `_mtime` >= '{last_update_time}'"


Change this to a paginated query, since some libraries may contain a large number of files.
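A sketch of the paginated query. `query_rows(repo_id, sql, params)` stands in for the metadata server call in the diff and is assumed to return `{'results': [...]}`. Support for `LIMIT ... OFFSET ...` in the metadata SQL dialect is also an assumption. For offset paging to be reliable, `base_sql` should impose a stable order.

```python
def iter_rows_paged(query_rows, repo_id, base_sql, page_size=1000):
    """Hypothetical sketch: yield metadata rows one page at a time.

    Stops when a page comes back shorter than page_size, so a large library
    is never loaded in a single request.
    """
    offset = 0
    while True:
        sql = f"{base_sql} LIMIT {page_size} OFFSET {offset}"
        rows = query_rows(repo_id, sql, []).get('results', [])
        yield from rows
        if len(rows) < page_size:
            break
        offset += page_size
```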

JoinTyang · 2024-09-19T09:17:33Z

seasearch/index_store/index_manager.py

+                else:
+                    sql = f"SELECT `_id`, `_mtime`, `_description`, `_parent_dir`, `_name`, `_obj_id` FROM `{METADATA_TABLE.name}` WHERE `_is_dir` = False"
+                query_timestamp = time.time()
+                rows = self.metadata_server_api.query_rows(repo_id, sql, []).get('results', [])


Add a check here: if `rows` is empty and the commit hasn't changed, return early.
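The requested guard, written as a hypothetical helper. The names mirror `new_commit_id` and `from_commit` from the removed lines in the earlier hunk, but the helper itself is not part of the PR. The update can be skipped only when both conditions hold.

```python
def should_skip_update(rows, new_commit_id, from_commit):
    """Return True when there is nothing to index.

    No description rows changed since the last run, and the library's
    commit is unchanged, so neither descriptions nor filenames need updating.
    """
    return not rows and new_commit_id == from_commit
```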

JoinTyang · 2024-09-20T05:37:46Z

seasearch/index_store/repo_file_name_index.py

@@ -213,6 +217,7 @@ def add_files(self, index_name, repo_id, files):
                'path': path,
                'suffix': suffix,
                'filename': filename,
+                'description': rows.get(obj_id, ''),


You can't look up the description by `obj_id`, because newly created files all have the same obj_id.
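A sketch of keying descriptions by path instead. New files presumably share one `obj_id` because they all have the same (empty) content, so an `obj_id`-keyed dict collapses them into one entry. `_parent_dir` plus `_name` identifies each file uniquely. `descriptions_by_path` is hypothetical. `add_files` would then look up `descriptions.get(path, '')`.

```python
import posixpath


def descriptions_by_path(rows):
    """Hypothetical sketch: map each file's full path to its description.

    rows are metadata rows with `_parent_dir`, `_name` and `_description`
    keys; a missing or None description becomes ''.
    """
    result = {}
    for row in rows:
        path = posixpath.join(row['_parent_dir'], row['_name'])
        result[path] = row.get('_description') or ''
    return result
```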

JoinTyang · 2024-09-20T05:47:49Z

seasearch/index_store/index_manager.py

+                    description_updated_time = datetime(1970, 1, 1).timestamp()
+                last_update_time = timestamp_to_isoformat_timestr(float(description_updated_time))
+                sql = f"SELECT `_id`, `_mtime`, `_description`, `_parent_dir`, `_name`, `_obj_id` FROM `{METADATA_TABLE.name}` WHERE `_is_dir` = False AND `_mtime` >= '{last_update_time}'"
+                query_timestamp = time.time()


Get `query_timestamp` each time this scheduled task starts; otherwise, descriptions added while the index is being updated will be missed.
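A sketch of one scheduled run with the checkpoint taken at task start. `fetch_rows_since` and `index_rows` are hypothetical callables, and `state` stands in for wherever `description_updated_time` is persisted. The timestamp is captured before any work and saved only after indexing finishes. A description edited mid-run therefore has `_mtime >= start_ts` and is picked up by the next run.

```python
import time


def run_description_update(state, fetch_rows_since, index_rows, clock=time.time):
    """Hypothetical sketch of one run of the periodic description indexer.

    state: dict holding 'description_updated_time' between runs.
    fetch_rows_since(ts): returns rows whose _mtime >= ts.
    index_rows(rows): writes the rows to the search index.
    """
    start_ts = clock()  # taken when the task starts, before any querying
    since = state.get('description_updated_time', 0.0)
    rows = fetch_rows_since(since)
    index_rows(rows)
    # Checkpoint only after indexing succeeds; edits made during this run
    # have _mtime >= start_ts and are fetched again next time.
    state['description_updated_time'] = start_ts
    return rows
```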

shenzheng-1 force-pushed the add_summary_index branch from d49901f to 613bbee September 10, 2024 08:11

JoinTyang reviewed Sep 11, 2024

JoinTyang reviewed Sep 12, 2024

zheng.shen added 19 commits September 19, 2024 17:03

add_summary_index

5619e04

update

f5c2e26

update

c87d9a0

update

e7ede7f

update

06f5ab0

update

5f97a2d

update

e47de7a

update

a5e6394

update

b189d0f

update

b0c92a7

update

19bd40c

update

2d79660

update

0d2b35b

update

01ddf5d

update

b11eecb

update

d210f18

update query time

910238f

update

d25b1f5

update

37605eb

shenzheng-1 force-pushed the add_summary_index branch from 272023e to 37605eb September 19, 2024 09:03

JoinTyang reviewed Sep 19, 2024

zheng.shen added 3 commits September 19, 2024 17:47

update

f498491

update

4eced07

update

add4733

JoinTyang reviewed Sep 20, 2024

update

1efd869

JoinTyang force-pushed the add_summary_index branch from 91ff1ae to 452bf35 September 21, 2024 07:09

optimize code

ac32b77

JoinTyang force-pushed the add_summary_index branch from 452bf35 to ac32b77 September 21, 2024 07:20

freeplant merged commit f340bd3 into master Sep 21, 2024
1 check passed