Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use uuid5 associated with file SHA1 instead of randomly generated uuid4 #11

Merged
merged 12 commits into from
Dec 31, 2023

Conversation

pk5ls20
Copy link
Collaborator

@pk5ls20 pk5ls20 commented Dec 30, 2023

No description provided.

@pk5ls20 pk5ls20 requested a review from hv0905 December 30, 2023 13:52
@hv0905
Copy link
Owner

hv0905 commented Dec 30, 2023

Using the uuid5 directly will rewrite the old entry in database with the new entry directly, if two images have a same SHA1.

Maybe we should query the database to make sure the image doesn't exist in the db before copying and inserting?

@pk5ls20
Copy link
Collaborator Author

pk5ls20 commented Dec 30, 2023

Using the uuid5 directly will rewrite the old entry in database with the new entry directly, if two images have a same SHA1.

Yes, but if the SHA1 of both images is the same, we can assume that both images are the same, i.e. only necessary to index once

Maybe we should query the database before copying and inserting to make sure the images don't exist in the database?

In a sense, there will be no difference in the final database contents between the two, whether you query the database for the same id entry already existing before insertion or just insert it. But maybe the former will save indexing time?

I'll test the time difference between the two later

Due to considering the statefulness of the content in the payload, direct overwriting is not a good option, and the best way to do this is to perform appropriate checks to ensure that any data is not accidentally overwritten

@pk5ls20 pk5ls20 marked this pull request as draft December 30, 2023 16:15
@pk5ls20 pk5ls20 marked this pull request as ready for review December 30, 2023 21:36
scripts/local_indexing.py Outdated Show resolved Hide resolved
scripts/local_utility.py Show resolved Hide resolved
@pk5ls20 pk5ls20 self-assigned this Dec 31, 2023
@pk5ls20 pk5ls20 marked this pull request as draft December 31, 2023 13:53
@pk5ls20 pk5ls20 requested a review from hv0905 December 31, 2023 13:54
@pk5ls20
Copy link
Collaborator Author

pk5ls20 commented Dec 31, 2023

@hv0905 I think this PR is now ready to be merged into master (:з」∠)

@pk5ls20 pk5ls20 marked this pull request as ready for review December 31, 2023 15:15
@hv0905 hv0905 merged commit a230014 into master Dec 31, 2023
4 checks passed
@hv0905 hv0905 deleted the uuid5 branch December 31, 2023 17:57
@pk5ls20 pk5ls20 removed their assignment Dec 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants