The current findTrainingData phase could show more positive samples than negative ones so that models converge faster. Currently, each round shows 10 positive and 10 negative pairs. In the early rounds, even the pairs sampled as positive are mostly non-matches, so it takes more rounds of findTrainingData (ftd) and label to collect enough matches. What if we changed this to 15 positive and 5 negative?
Proposal: introduce a new phase, findtrainingDataV2, and see whether it helps build models faster. If it works better based on our own testing and user feedback, we can make it the default going forward.
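To make the proposal concrete, here is a minimal sketch of a labeling batch with a configurable positive/negative split. It is illustrative only and is not the actual findTrainingData implementation: the function name `sample_for_labeling` and the inputs `predicted_pos` and `predicted_neg` (pairs the current model predicts as matches and non-matches) are hypothetical.

```python
import random


def sample_for_labeling(predicted_pos, predicted_neg, n_pos=10, n_neg=10, seed=0):
    """Pick up to n_pos predicted matches and n_neg predicted non-matches.

    Hypothetical helper: the real phase may select pairs differently.
    """
    rng = random.Random(seed)
    # Cap each side at what is available, so a sparse side yields a smaller batch.
    pos = rng.sample(predicted_pos, min(n_pos, len(predicted_pos)))
    neg = rng.sample(predicted_neg, min(n_neg, len(predicted_neg)))
    return pos + neg


# Current split: 10 + 10. Proposed split: 15 + 5. Both give 20 pairs per round
# when enough candidates exist on each side.
current_batch = sample_for_labeling(list(range(100)), list(range(100, 200)))
proposed_batch = sample_for_labeling(
    list(range(100)), list(range(100, 200)), n_pos=15, n_neg=5
)
```

The batch size stays at 20 in both cases; only the mix changes, which is the point of the proposed V2 phase.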
Running FTD with 10 pos, 10 neg on febrl120k on a new model:
First round of ftd and label - start with 0 matches and 22 pairs for labeling
Second round of ftd and label - start with 0 matches and 20 pairs for labeling
Third round of ftd and label - start with 20 matches and 20 pairs for labeling
Fourth round of ftd and label - start with 33 matches and 20 pairs for labeling
Fifth round of ftd and label - start with 39 matches and 20 pairs for labeling
In the fifth round we get 40 matching pairs
Trained the model on 41 pos and 41 neg pairs
cc converged in 3 iterations; match took 5:05 mins to run
Running FTD with 15 pos, 5 neg on febrl120k on a new model:
First round of ftd and label - start with 0 matches and 18 pairs for labeling
Second round of ftd and label - start with 0 matches and 10 pairs for labeling
Third round of ftd and label - start with 10 matches and 20 pairs for labeling
Fourth round of ftd and label - start with 26 matches and 20 pairs for labeling
Fifth round of ftd and label - start with 30 matches and 20 pairs for labeling
In the fifth round we get 40 matching pairs
Trained the model on 41 pos and 47 neg pairs
cc converged in 3 iterations; match took 5:07 mins to run
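To compare the two runs side by side, the figures reported above can be tallied with a small script. This is only a convenience sketch: the dictionary layout and the helper names `to_seconds` and `summarize` are made up for illustration, and the numbers are copied from the two run logs.

```python
def to_seconds(mmss):
    """Convert a 'M:SS' duration string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(run):
    """Aggregate one run's per-round figures into totals."""
    return {
        "rounds": len(run["pairs_per_round"]),
        "pairs_shown": sum(run["pairs_per_round"]),
        "trained_pairs": run["trained_pos"] + run["trained_neg"],
        "match_seconds": to_seconds(run["match_time"]),
    }


# Figures taken from the two febrl120k runs reported in this thread.
RUNS = {
    "10 pos / 10 neg": {
        "pairs_per_round": [22, 20, 20, 20, 20],
        "trained_pos": 41,
        "trained_neg": 41,
        "match_time": "5:05",
    },
    "15 pos / 5 neg": {
        "pairs_per_round": [18, 10, 20, 20, 20],
        "trained_pos": 41,
        "trained_neg": 47,
        "match_time": "5:07",
    },
}

for name, run in RUNS.items():
    print(name, summarize(run))
```

On these numbers, both splits needed five rounds to reach 40 matching pairs, and the match times differ by two seconds. With a single run per split, this does not yet show a clear advantage for 15 pos / 5 neg.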