
accuracy degraded when using latest sklearn version #9

Open
bobbui opened this issue Aug 23, 2019 · 2 comments

bobbui commented Aug 23, 2019

I tried to retrain using train.py and exactly the same training data file, but when I run the accuracy test side by side between the model generated by sklearn 0.22 and the existing one (sklearn 0.20), the one from 0.22 performs significantly worse than the one from 0.20.
Any idea why this happens?
Thanks.
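To make the side-by-side comparison concrete, here is a minimal, stdlib-only sketch of the kind of accuracy check being described. The prediction lists are placeholders taken from the failing test reported further down in this thread; the `accuracy` helper is hypothetical, not part of profanity-check:

```python
# Hypothetical side-by-side accuracy check: score predictions from the model
# shipped with sklearn 0.20 against a model retrained under sklearn 0.22,
# using the same labeled test set.
def accuracy(predictions, labels):
    """Fraction of predictions that match the expected labels."""
    assert len(predictions) == len(labels)
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

labels    = [0, 0, 0, 1, 1, 1]  # expected labels from the test suite
preds_020 = [0, 0, 0, 1, 1, 1]  # existing model (sklearn 0.20): all correct
preds_022 = [0, 0, 0, 0, 0, 0]  # retrained model (sklearn 0.22): misses all profanity

print(accuracy(preds_020, labels))  # 1.0
print(accuracy(preds_022, labels))  # 0.5
```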

@dimitrismistriotis (Contributor)

Can confirm that this is what happens:
https://github.com/dimitrismistriotis/profanity-check/blob/create_models_from_clean_data/profanity_check/train_models.py

with the following pytest output:

================================================================ test session starts ================================================================
platform linux -- Python 3.8.0, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /home/dimitry/projects/profanity-check
collected 2 items                                                                                                                                   

tests/test_profanity_check.py F.                                                                                                              [100%]

===================================================================== FAILURES ======================================================================
___________________________________________________________________ test_accuracy ___________________________________________________________________

    def test_accuracy():
      texts = [
        'Hello there, how are you',
        'Lorem Ipsum is simply dummy text of the printing and typesetting industry.',
        '!!!! Click this now!!! -> https://example.com',
        'fuck you',
        'fUcK u',
        'GO TO hElL, you dirty scum',
      ]
>     assert list(predict(texts)) == [0, 0, 0, 1, 1, 1]
E     assert [0, 0, 0, 0, 0, 0] == [0, 0, 0, 1, 1, 1]
E       At index 3 diff: 0 != 1
E       Use -v to get the full diff

@dimitrismistriotis (Contributor)

Copy+paste of default values from the current version (https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html):

 class sklearn.svm.LinearSVC(penalty='l2', loss='squared_hinge', dual=True, tol=0.0001, C=1.0, multi_class='ovr', fit_intercept=True, intercept_scaling=1, class_weight=None, verbose=0, random_state=None, max_iter=1000)

Copy+paste from 0.20.4, the closest version to the one used in this library (https://scikit-learn.org/0.20/modules/generated/sklearn.svm.LinearSVC.html):

class sklearn.svm.LinearSVC(penalty='l2', loss='squared_hinge', dual=True, tol=0.0001, C=1.0, multi_class='ovr', fit_intercept=True, intercept_scaling=1, class_weight=None, verbose=0, random_state=None, max_iter=1000)

They are the same, unless I am missing something.
Perhaps we need to check whether the implementation has changed. Another possible cause is that the blog post code is not the code that was actually used to generate the shipped models.
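For what it's worth, the two pasted signatures can be diffed programmatically rather than by eye. A quick stdlib sketch, with the documented defaults from both pages copied into dicts (the dict values are taken verbatim from the two documentation links above):

```python
# Documented LinearSVC defaults from the current docs page.
defaults_current = dict(
    penalty='l2', loss='squared_hinge', dual=True, tol=0.0001, C=1.0,
    multi_class='ovr', fit_intercept=True, intercept_scaling=1,
    class_weight=None, verbose=0, random_state=None, max_iter=1000,
)

# Documented LinearSVC defaults from the 0.20.4 docs page.
defaults_0_20_4 = dict(
    penalty='l2', loss='squared_hinge', dual=True, tol=0.0001, C=1.0,
    multi_class='ovr', fit_intercept=True, intercept_scaling=1,
    class_weight=None, verbose=0, random_state=None, max_iter=1000,
)

# Collect any parameter whose documented default differs between versions.
diff = {k: (defaults_0_20_4[k], v)
        for k, v in defaults_current.items()
        if defaults_0_20_4[k] != v}
print(diff)  # {} -- the documented defaults match, so the regression lies elsewhere
```

An empty diff is consistent with the conclusion above: if the constructor defaults have not changed, the behavior change must come from the implementation itself or from how the shipped models were trained.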
