AI-GCS: Implement efficient hamming distance fingerprint matching, e.g., the core search #5

Open
pombredanne opened this issue Oct 17, 2024 · 2 comments
@pombredanne
Member

This issue is to implement efficient search and matching of fingerprints by Hamming distance. The work should implement the selected design, together with a basic search ranking procedure, as the code of an efficient search and matching engine.
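
For illustration only, here is a minimal sketch of what Hamming distance matching with a basic ranking could look like. It is a brute-force linear scan, not the selected design referred to above, and the names `hamming_distance`, `search`, `index`, and `threshold` are hypothetical. It assumes fingerprints are stored as Python integers of equal bit width.

```python
from typing import Iterable, List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two equal-width fingerprints differ."""
    # XOR leaves a 1 bit wherever the two fingerprints disagree.
    return bin(a ^ b).count("1")


def search(
    query: int,
    index: Iterable[Tuple[str, int]],
    threshold: int,
) -> List[Tuple[int, str]]:
    """Return (distance, identifier) pairs within threshold, closest first."""
    matches = []
    for identifier, fingerprint in index:
        distance = hamming_distance(query, fingerprint)
        if distance <= threshold:
            matches.append((distance, identifier))
    # Basic ranking: smaller distance first, ties broken by identifier.
    return sorted(matches)


# Hypothetical 8-bit fingerprints; real fingerprints would be much wider.
index = [
    ("a.py", 0b10110010),
    ("b.py", 0b10110011),
    ("c.py", 0b01001101),
]
results = search(0b10110010, index, threshold=2)
print(results)
```

A linear scan like this costs one XOR and one bit count per indexed fingerprint, so it is only a correctness baseline. An efficient engine would need an index structure that avoids comparing the query against every fingerprint.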

@pombredanne pombredanne converted this from a draft issue Oct 17, 2024
@pombredanne pombredanne added this to the 2-Initial development milestone Oct 17, 2024
@JonoYang
Member

We have implemented the halohash algorithm for approximate file matching at https://github.com/aboutcode-org/matchcode-toolkit/blob/main/src/matchcode_toolkit/halohash.py

This is the core hashing algorithm that we use to create the halo1 file fingerprint, the snippet hashes, and the directory matching hashes.

@JonoYang
Member

JonoYang commented Nov 8, 2024

This is available at https://pypi.org/project/matchcode-toolkit/

@pombredanne pombredanne moved this from In Progress to Validated in 06-AI-generated Code Search Dec 23, 2024
@pombredanne pombredanne moved this from Validated to In Progress in 06-AI-generated Code Search Dec 23, 2024
@pombredanne pombredanne self-assigned this Dec 23, 2024
@pombredanne pombredanne moved this from In Progress to Validated in 06-AI-generated Code Search Dec 26, 2024
@pombredanne pombredanne moved this from Validated to In Review in 06-AI-generated Code Search Dec 26, 2024
Projects
Status: In Review