Currently, many open issues in some projects have a GFI probability of 99.99%, and some of these issues clearly should not be marked as GFI.
The model's performance metrics are also unusually high.
I examined the code and found two features that may be problematic. The first is 'created_at_timestamp', which is not one of the features and should not be included in X (see get_x_y() in gfibot/model/utils.py). The second is 'rpt_gfi_ratio': when I drop this feature, the model performance metrics drop significantly.
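To illustrate the first point, here is a minimal sketch of building X while leaving out non-feature columns. It is not the project's get_x_y(); the row layout, the label name 'is_gfi', and the feature 'len_title' are assumptions made up for the example. Only 'created_at_timestamp' and 'rpt_gfi_ratio' come from the issue.

```python
from typing import Dict, List, Sequence, Tuple

# Columns that are metadata rather than model features.
# 'created_at_timestamp' is the one named in this issue.
NON_FEATURE_COLUMNS = ("created_at_timestamp",)


def split_x_y(
    rows: List[Dict[str, float]],
    label: str = "is_gfi",  # hypothetical label column name
    exclude: Sequence[str] = NON_FEATURE_COLUMNS,
) -> Tuple[List[str], List[List[float]], List[float]]:
    """Return (feature_names, X, y) without the excluded columns or the label."""
    skip = set(exclude) | {label}
    feature_names = sorted(name for name in rows[0] if name not in skip)
    X = [[row[name] for name in feature_names] for row in rows]
    y = [row[label] for row in rows]
    return feature_names, X, y


# Toy rows with made-up values.
rows = [
    {"created_at_timestamp": 1650000000, "rpt_gfi_ratio": 0.5, "len_title": 12, "is_gfi": 1},
    {"created_at_timestamp": 1650003600, "rpt_gfi_ratio": 0.0, "len_title": 40, "is_gfi": 0},
]
feature_names, X, y = split_x_y(rows)
```

Passing a longer tuple as exclude (for example, adding 'rpt_gfi_ratio') gives a quick way to retrain without a suspect feature and compare the metrics.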
The problems can be solved by the following steps:
gfi-bot/gfibot/model/utils.py, line 135 in 7ed0761
gfi-bot/gfibot/model/dataloader.py, line 112 in 7ed0761
gfi-bot/gfibot/model/dataloader.py, line 118 in 7ed0761
Now we use the following list. A new issues list such as
issues = [i for i in user.issues if i.closed_at <= t]
should be created for calculating gfi_ratio and gfi_num later.
gfi-bot/gfibot/data/dataset.py, line 205 in 7ed0761
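The time-filtered list can be sketched as follows. This is an illustration, not the code in dataset.py: the Issue record and the function name are hypothetical, gfi_ratio is assumed to mean "GFIs divided by all issues closed by t", and the check for closed_at being None is an added assumption so that still-open issues do not break the comparison.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Issue:
    """Hypothetical stand-in for one of a user's issues."""
    closed_at: Optional[datetime]  # None while the issue is still open
    is_gfi: bool


def gfi_stats_before(issues: List[Issue], t: datetime) -> Tuple[int, float]:
    """Compute (gfi_num, gfi_ratio) using only issues closed at or before t."""
    # Issues closed after t (or still open) were not resolved yet at time t,
    # so using them would leak future information into the feature.
    visible = [i for i in issues if i.closed_at is not None and i.closed_at <= t]
    gfi_num = sum(1 for i in visible if i.is_gfi)
    gfi_ratio = gfi_num / len(visible) if visible else 0.0
    return gfi_num, gfi_ratio


# Toy data: only the first and last issues are closed by t.
issues = [
    Issue(closed_at=datetime(2022, 1, 1), is_gfi=True),
    Issue(closed_at=datetime(2022, 3, 1), is_gfi=True),
    Issue(closed_at=None, is_gfi=False),
    Issue(closed_at=datetime(2022, 1, 15), is_gfi=False),
]
t = datetime(2022, 2, 1)
gfi_num, gfi_ratio = gfi_stats_before(issues, t)
```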
After the above features are corrected, most of the prediction probabilities may end up close to 0 because of the imbalance between positive and negative instances in the training data. This can be addressed by balancing the training dataset with oversampling methods such as SMOTE and ADASYN. Then we can check whether the '99.99% probabilities' problem is solved.
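For reference, the idea behind SMOTE can be sketched in a few lines of standard-library Python: each synthetic positive is an interpolation between a minority sample and one of its nearest minority neighbours. This is a simplified illustration with a hypothetical function name and toy data, not the reference algorithm or the project's code. In practice a library such as imbalanced-learn, which provides SMOTE and ADASYN classes with a fit_resample method, would likely be used, and only the training split should be resampled.

```python
import math
import random
from typing import List


def smote_like_oversample(
    minority: List[List[float]], n_new: int, k: int = 2, seed: int = 0
) -> List[List[float]]:
    """Create n_new synthetic minority samples by neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base (Euclidean), excluding base itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data: 3 positives versus 6 negatives (values are made up).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
majority = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0], [7.0, 5.0], [5.0, 7.0]]
new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```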