Make Zingg More Usable - Part 2 #928

Open
sonalgoyal opened this issue Oct 28, 2024 · 2 comments

@sonalgoyal
Member

The current findTrainingData phase could be optimized to show more positive samples than negative ones, so that we converge to the correct model faster. Currently, each round samples 10 pos and 10 neg pairs. In the earlier rounds, even the pairs we sample as pos are mostly neg, which lengthens training because ftd and label have to be run more times. What if we changed this to 15 pos and 5 neg?

Introduce a new phase, findtrainingDataV2, and let us see whether it helps build models faster. If it works better based on our own testing and user feedback, we can make it the default going forward.
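
To make the proposed split concrete, here is a minimal Python sketch of a pair selector with configurable positive and negative counts. It is not Zingg's implementation: the function name `select_pairs_for_labeling`, the `(id_a, id_b, score)` tuple layout, the 0.5 threshold, and the choice to pick the least confident pairs on each side of the threshold are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical layout: (id_a, id_b, predicted match score in [0, 1]).
Pair = Tuple[str, str, float]


def select_pairs_for_labeling(
    scored_pairs: List[Pair],
    n_pos: int = 15,
    n_neg: int = 5,
    threshold: float = 0.5,
) -> List[Pair]:
    """Pick up to n_pos predicted matches and n_neg predicted non-matches.

    Assumption: on each side of the threshold, the pairs closest to it
    (the least confident predictions) are chosen first.
    """
    predicted_pos = [p for p in scored_pairs if p[2] >= threshold]
    predicted_neg = [p for p in scored_pairs if p[2] < threshold]

    # Sort each side by distance from the threshold, nearest first.
    predicted_pos.sort(key=lambda p: p[2] - threshold)
    predicted_neg.sort(key=lambda p: threshold - p[2])

    return predicted_pos[:n_pos] + predicted_neg[:n_neg]


if __name__ == "__main__":
    # Synthetic scores only, to show the shape of the output.
    candidates = [(f"a{i}", f"b{i}", i / 100) for i in range(100)]
    batch = select_pairs_for_labeling(candidates)
    n_predicted_pos = sum(1 for p in batch if p[2] >= 0.5)
    print(len(batch), n_predicted_pos, len(batch) - n_predicted_pos)
```

Calling it with `n_pos=10, n_neg=10` reproduces the current 10/10 split, so both settings can be compared from the same entry point.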

@sonalgoyal sonalgoyal added this to the 0.5.0 milestone Oct 28, 2024
@sonalgoyal sonalgoyal assigned sania-16 and unassigned Nitish1814 Nov 5, 2024
@sonalgoyal sonalgoyal moved this to Todo in 0.5.0 Nov 5, 2024
@sonalgoyal sonalgoyal added this to 0.5.0 Nov 5, 2024
@sania-16
Contributor

sania-16 commented Nov 6, 2024

Running FTD with 10 pos, 10 neg on febrl120k on a new model:
first round of ftd and label: start with 0 matches and 22 pairs for labeling
second round of ftd and label: start with 0 matches and 20 pairs for labeling
third round of ftd and label: start with 20 matches and 20 pairs for labeling
fourth round of ftd and label: start with 33 matches and 20 pairs for labeling
fifth round of ftd and label: start with 39 matches and 20 pairs for labeling

In the fifth round we get 40 matching pairs.
Trained the model on 41 pos and 41 neg pairs.
cc converged in 3 iterations.
Match took 5:05 mins to run.

@sania-16
Contributor

sania-16 commented Nov 6, 2024

Running FTD with 15 pos, 5 neg on febrl120k on a new model:
first round of ftd and label: start with 0 matches and 18 pairs for labeling
second round of ftd and label: start with 0 matches and 10 pairs for labeling
third round of ftd and label: start with 10 matches and 20 pairs for labeling
fourth round of ftd and label: start with 26 matches and 20 pairs for labeling
fifth round of ftd and label: start with 30 matches and 20 pairs for labeling

In the fifth round we get 40 matching pairs.
Trained the model on 41 pos and 47 neg pairs.
cc converged in 3 iterations.
Match took 5:07 mins to run.
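
For a side-by-side view of the two runs reported in this thread, the following Python sketch tallies the numbers from the comments. It assumes that "pairs for labeling" is the number of pairs shown in each round and uses the reported 40 matching pairs in the fifth round as the match count; the dictionary layout and the `summarize` helper are hypothetical. Each setting was run only once, so the comparison is indicative at best.

```python
# Figures copied from the two benchmark comments (febrl120k, new model).
RUNS = {
    "10 pos / 10 neg": {
        "pairs_per_round": [22, 20, 20, 20, 20],
        "matches_in_round_5": 40,
        "match_seconds": 5 * 60 + 5,  # 5:05
    },
    "15 pos / 5 neg": {
        "pairs_per_round": [18, 10, 20, 20, 20],
        "matches_in_round_5": 40,
        "match_seconds": 5 * 60 + 7,  # 5:07
    },
}


def summarize(run: dict) -> tuple:
    """Return (total pairs shown, matches per pair shown) for one run."""
    total_pairs = sum(run["pairs_per_round"])
    return total_pairs, run["matches_in_round_5"] / total_pairs


if __name__ == "__main__":
    for name, run in RUNS.items():
        total, match_rate = summarize(run)
        print(f"{name}: {total} pairs shown, {match_rate:.1%} matches, "
              f"match phase {run['match_seconds']} s")
```

Under these assumptions, the 10/10 run showed 102 pairs and the 15/5 run showed 88 pairs to reach the same 40 matches, with near-identical match times.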
