
accuracy degraded when using latest sklearn version #9

Open
bobbui opened this issue Aug 23, 2019 · 2 comments

bobbui commented Aug 23, 2019

I tried to retrain using train.py and exactly the same training data file, but when I run the accuracy test side by side between the model generated by sklearn 0.22 and the existing one (sklearn 0.20), the one from 0.22 performs significantly worse than the one from 0.20.
Any idea why this happens?
Thanks.
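To make the side-by-side comparison concrete, here is a minimal, stdlib-only sketch of the kind of accuracy check being described. The prediction lists are placeholders taken from the failing test reported further down in this thread; the `accuracy` helper is hypothetical, not part of profanity-check:

```python
# Hypothetical side-by-side accuracy check: score predictions from the model
# shipped with sklearn 0.20 against a model retrained under sklearn 0.22,
# using the same labeled test set.
def accuracy(predictions, labels):
    """Fraction of predictions that match the expected labels."""
    assert len(predictions) == len(labels)
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

labels    = [0, 0, 0, 1, 1, 1]  # expected labels from the test suite
preds_020 = [0, 0, 0, 1, 1, 1]  # existing model (sklearn 0.20): all correct
preds_022 = [0, 0, 0, 0, 0, 0]  # retrained model (sklearn 0.22): misses all profanity

print(accuracy(preds_020, labels))  # 1.0
print(accuracy(preds_022, labels))  # 0.5
```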

@dimitrismistriotis (Contributor)

Can confirm that this is what happens:
https://github.com/dimitrismistriotis/profanity-check/blob/create_models_from_clean_data/profanity_check/train_models.py

with the following pytest output:

================================================================ test session starts ================================================================
platform linux -- Python 3.8.0, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /home/dimitry/projects/profanity-check
collected 2 items                                                                                                                                   

tests/test_profanity_check.py F.                                                                                                              [100%]

===================================================================== FAILURES ======================================================================
___________________________________________________________________ test_accuracy ___________________________________________________________________

    def test_accuracy():
      texts = [
        'Hello there, how are you',
        'Lorem Ipsum is simply dummy text of the printing and typesetting industry.',
        '!!!! Click this now!!! -> https://example.com',
        'fuck you',
        'fUcK u',
        'GO TO hElL, you dirty scum',
      ]
>     assert list(predict(texts)) == [0, 0, 0, 1, 1, 1]
E     assert [0, 0, 0, 0, 0, 0] == [0, 0, 0, 1, 1, 1]
E       At index 3 diff: 0 != 1
E       Use -v to get the full diff

@dimitrismistriotis (Contributor)

Copy+paste of default values from the current version (https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html):

 class sklearn.svm.LinearSVC(penalty='l2', loss='squared_hinge', dual=True, tol=0.0001, C=1.0, multi_class='ovr', fit_intercept=True, intercept_scaling=1, class_weight=None, verbose=0, random_state=None, max_iter=1000)

Copy+paste from 0.20.4, the closest version to the one used in this library (https://scikit-learn.org/0.20/modules/generated/sklearn.svm.LinearSVC.html):

class sklearn.svm.LinearSVC(penalty='l2', loss='squared_hinge', dual=True, tol=0.0001, C=1.0, multi_class='ovr', fit_intercept=True, intercept_scaling=1, class_weight=None, verbose=0, random_state=None, max_iter=1000)

They are the same, unless I am missing something.
Perhaps we need to check whether the implementation has changed. Another possible cause is that the blog post code is not the code that was actually used to generate the shipped models.
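For what it's worth, the two pasted signatures can be diffed programmatically rather than by eye. A quick stdlib sketch, with the documented defaults from both pages copied into dicts (the dict values are taken verbatim from the two documentation links above):

```python
# Documented LinearSVC defaults from the current docs page.
defaults_current = dict(
    penalty='l2', loss='squared_hinge', dual=True, tol=0.0001, C=1.0,
    multi_class='ovr', fit_intercept=True, intercept_scaling=1,
    class_weight=None, verbose=0, random_state=None, max_iter=1000,
)

# Documented LinearSVC defaults from the 0.20.4 docs page.
defaults_0_20_4 = dict(
    penalty='l2', loss='squared_hinge', dual=True, tol=0.0001, C=1.0,
    multi_class='ovr', fit_intercept=True, intercept_scaling=1,
    class_weight=None, verbose=0, random_state=None, max_iter=1000,
)

# Collect any parameter whose documented default differs between versions.
diff = {k: (defaults_0_20_4[k], v)
        for k, v in defaults_current.items()
        if defaults_0_20_4[k] != v}
print(diff)  # {} -- the documented defaults match, so the regression lies elsewhere
```

An empty diff is consistent with the conclusion above: if the constructor defaults have not changed, the behavior change must come from the implementation itself or from how the shipped models were trained.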
