
Inconsistent values of KS-statistics #3

Open
vfilimonov opened this issue Jun 12, 2014 · 4 comments

Comments

@vfilimonov

It seems that the KS-statistic values stored in _xmin_kstest sometimes differ from those calculated when plfit.plfit is called with a fixed xmin.

For example, for the data from https://gist.github.com/vfilimonov/1072e402e922712ad980#file-tst-csv:

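```python
import numpy as np
import plfit

y = np.genfromtxt('tst.csv', delimiter=',')
tst_fit = plfit.plfit(y)

def func(xmin):
    # Refit with xmin held fixed and return the KS statistic of that fit
    ff = plfit.plfit(y, xmin=xmin, quiet=True, silent=True)
    return ff._ks

tst_KS_plfit = [func(xmin) for xmin in tst_fit._xmins]
```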

The values in tst_fit._xmin_kstest and tst_KS_plfit differ:

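```python
import matplotlib.pyplot as plt

# Red: KS values stored during the xmin scan; green: KS values from refitting at each fixed xmin
plt.plot(tst_fit._xmins, tst_fit._xmin_kstest, 'r.-')
plt.plot(tst_fit._xmins, tst_KS_plfit, 'g.-')
plt.show()
```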

[Image: pvals]
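For cross-checking either number independently, here is a minimal sketch of the KS distance for a continuous power law at a fixed xmin. It assumes the standard maximum-likelihood exponent and a two-sided comparison between the empirical and fitted CDFs. The function name powerlaw_ks and the synthetic data are hypothetical and are not plfit's internal code; plfit's own conventions (for example, which side of each empirical-CDF step it compares against) may differ, which is one way two implementations can disagree slightly.

```python
import numpy as np

def powerlaw_ks(x, xmin):
    """Fit a continuous power law to the tail x >= xmin and return (alpha, KS distance)."""
    x = np.asarray(x, dtype=float)
    tail = np.sort(x[x >= xmin])
    n = tail.size
    # Maximum-likelihood exponent for a continuous power law
    alpha = 1.0 + n / np.sum(np.log(tail / xmin))
    # CDF of the fitted power law at each sorted data point
    model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
    # Empirical CDF just after (i/n) and just before ((i-1)/n) the i-th point
    upper = np.arange(1, n + 1) / n
    lower = np.arange(0, n) / n
    ks = max(np.max(upper - model_cdf), np.max(model_cdf - lower))
    return alpha, ks

# Hypothetical synthetic data: continuous power law with alpha = 2.5, xmin = 1
rng = np.random.default_rng(42)
sample = (1.0 - rng.random(5000)) ** (-1.0 / 1.5)
print(powerlaw_ks(sample, 1.0))
print(powerlaw_ks(sample, 2.0))
```

Running the same function over tst_fit._xmins would give a third curve to set against the two plotted ones.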

@keflavich
Owner

That's a small but significant difference... thanks for finding it. I'll try to pin down the problem...

@keflavich
Owner

I can reproduce the error but haven't yet figured out the source of the problem. I thought it might be a failure to use only the unique data in the kstest_ function, but that's not it.
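To illustrate the hypothesis ruled out here: when the data contain ties, computing the KS distance on the full sample and on only its unique values gives different answers, because the empirical CDF changes. The helper ks_against_cdf, the toy Uniform(0, 4) model and the sample values are all hypothetical and are not plfit's kstest_ function.

```python
import numpy as np

def ks_against_cdf(sample, cdf):
    """Two-sided KS distance between the empirical CDF of `sample` and a model CDF."""
    s = np.sort(np.asarray(sample, dtype=float))
    n = s.size
    f = cdf(s)
    # Empirical CDF just after and just before each sorted point (handles ties correctly)
    upper = np.arange(1, n + 1) / n
    lower = np.arange(0, n) / n
    return max(np.max(upper - f), np.max(f - lower))

def uniform_cdf(v):
    # Toy model CDF: Uniform(0, 4), used only for illustration
    return np.clip(np.asarray(v, dtype=float) / 4.0, 0.0, 1.0)

sample = np.array([1.0, 1.0, 1.0, 2.0, 3.0])  # hypothetical data with ties
print(ks_against_cdf(sample, uniform_cdf))             # all points
print(ks_against_cdf(np.unique(sample), uniform_cdf))  # unique points only
```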

@keflavich
Owner

Part of the problem is that your data are continuous, but the discrete/continuous "guesser" is identifying them as discrete. Pass discrete=False and you'll get a better result... but there is still some underlying inconsistency I haven't resolved.
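Why a misclassification matters: the discrete and continuous fits use different exponent estimators, so the fitted model (and hence its KS distance) changes. The sketch below assumes the textbook continuous MLE and the common "shift xmin by 1/2" approximation for discrete data; both function names are hypothetical and this is not plfit's code. In plfit itself, the suggested remedy is simply passing discrete=False.

```python
import numpy as np

def alpha_continuous(x, xmin):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x_i / xmin))."""
    x = np.asarray(x, dtype=float)
    tail = x[x >= xmin]
    return 1.0 + tail.size / np.sum(np.log(tail / xmin))

def alpha_discrete_approx(x, xmin):
    """Approximate discrete MLE: same formula with xmin shifted down by 1/2."""
    x = np.asarray(x, dtype=float)
    tail = x[x >= xmin]
    return 1.0 + tail.size / np.sum(np.log(tail / (xmin - 0.5)))

# Hypothetical continuous power-law data (alpha = 2.5, xmin = 1) via inverse-transform sampling
rng = np.random.default_rng(1)
y = (1.0 - rng.random(5000)) ** (-1.0 / 1.5)

print(alpha_continuous(y, 1.0))       # close to the true 2.5
print(alpha_discrete_approx(y, 1.0))  # noticeably lower: the wrong model for these data
```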

@vfilimonov
Author

Adam, thanks.

As a temporary workaround, one can sweep over the xmin values manually; in that case the KS statistic is correct.
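A manual sweep amounts to refitting independently at each candidate xmin and keeping the one with the smallest KS distance, the usual Clauset-style selection rule. This is a minimal pure-NumPy sketch assuming continuous data; ks_at_xmin, sweep_xmin, the min_tail cutoff and the synthetic data are hypothetical and are not plfit's API (with plfit, the equivalent is the func loop from the first comment).

```python
import numpy as np

def ks_at_xmin(x, xmin):
    """Fit a continuous power law to x >= xmin; return (alpha, KS distance)."""
    tail = np.sort(x[x >= xmin])
    n = tail.size
    alpha = 1.0 + n / np.sum(np.log(tail / xmin))
    model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
    upper = np.arange(1, n + 1) / n
    lower = np.arange(0, n) / n
    return alpha, max(np.max(upper - model_cdf), np.max(model_cdf - lower))

def sweep_xmin(x, min_tail=10):
    """Refit at every candidate xmin and keep the fit with the smallest KS distance."""
    x = np.asarray(x, dtype=float)
    candidates = np.unique(x)
    # Drop candidates that would leave fewer than min_tail points in the tail
    n_above = x.size - np.searchsorted(np.sort(x), candidates, side="left")
    candidates = candidates[n_above >= min_tail]
    results = [(xm,) + ks_at_xmin(x, xm) for xm in candidates]
    ks_values = np.array([r[2] for r in results])
    return results[int(np.argmin(ks_values))]  # (xmin, alpha, ks)

# Hypothetical data: uniform bulk below 10 plus a power-law tail (alpha = 2.5) above 10
rng = np.random.default_rng(0)
body = rng.uniform(1.0, 10.0, 1000)
tail = 10.0 * (1.0 - rng.random(2000)) ** (-1.0 / 1.5)
data = np.concatenate([body, tail])
print(sweep_xmin(data))
```

Because every KS value comes from an independent fit, this sidesteps whatever bookkeeping differs between the scan and the fixed-xmin path.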

On 13 Jun 2014, at 17:15, Adam Ginsburg wrote:

> Part of the problem is that your data are continuous, but the discrete/continuous "guesser" is identifying them as discrete. Pass discrete=False and you'll get a better result... but there is still some underlying inconsistency I haven't resolved.


