fix: document truncation and loss in notion document sync #5631
+15
−16
Description
The Notion extractor retrieves only the first page of blocks when a Notion Page contains many blocks; all subsequent blocks are lost. According to the Pagination section of the Notion Developers documentation, when a Notion Page contains more than 100 blocks, they must be fetched page by page to get the complete content.
However, the retrieval logic in notion_extractor.py only fetches the first page of blocks (up to 100). The Notion Developers documentation makes the cause clear: when calling https://api.notion.com/v1/blocks/{block_id}/children, the start_cursor of the next page is mistakenly passed as block_id, whereas start_cursor should be passed in the query parameters of the GET request. In addition, the query parameters were handed to the GET request through the wrong keyword argument; this PR changes it from json to params.
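The pagination pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in notion_extractor.py: the names iter_block_children and http_get_json are hypothetical, and the Notion-Version value and 30-second timeout are assumptions chosen for the example. The key points are that block_id stays fixed in the URL for every request and that start_cursor travels in the query string via params.

```python
from typing import Any, Callable, Dict, Iterator, Optional

NOTION_BLOCK_CHILDREN_URL = "https://api.notion.com/v1/blocks/{block_id}/children"


def http_get_json(url: str, headers: Dict[str, str], params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical default transport: a GET request with query parameters."""
    # Imported lazily so the sketch can be exercised with a fake transport.
    import requests

    # `params` becomes the URL query string; `json` would send a request body instead.
    response = requests.get(url, headers=headers, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def iter_block_children(
    block_id: str,
    token: str,
    get_json: Callable[[str, Dict[str, str], Dict[str, Any]], Dict[str, Any]] = http_get_json,
) -> Iterator[Dict[str, Any]]:
    """Yield every child block of `block_id`, following Notion's cursor pagination."""
    # The URL is built once: block_id never changes between pages.
    url = NOTION_BLOCK_CHILDREN_URL.format(block_id=block_id)
    headers = {
        "Authorization": f"Bearer {token}",
        "Notion-Version": "2022-06-28",  # assumed API version for this sketch
    }
    start_cursor: Optional[str] = None
    while True:
        params: Dict[str, Any] = {"page_size": 100}
        if start_cursor is not None:
            # The cursor goes in the query string, not in the URL path.
            params["start_cursor"] = start_cursor
        data = get_json(url, headers, params)
        yield from data.get("results", [])
        if not data.get("has_more"):
            break
        start_cursor = data.get("next_cursor")
        if start_cursor is None:
            # Defensive stop: has_more was set but no cursor was returned.
            break
```

Keeping the transport injectable makes it possible to check the pagination loop against canned responses without calling the Notion API.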
Fixes # (issue)
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Find a long Notion Page (more than 100 blocks) and run the Sync from Notion operation in Knowledge. With this PR, the complete Notion Page content is synchronized; the previous version retrieves only the first 100 blocks and loses the rest.
Suggested Checklist:
dev/reformat (backend) and cd web && npx lint-staged (frontend) to appease the lint gods
(optional) I have made corresponding changes to the documentation
(optional) I have added tests that prove my fix is effective or that my feature works
(optional) New and existing unit tests pass locally with my changes