-
Notifications
You must be signed in to change notification settings - Fork 221
Fix sampling rate issue when aggregating #4024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Different versions of Kilosort output different sampling rate (sometimes rounding it to 2 decimal places, sometimes not) This makes a crash when trying to aggregate both sortings together This PR fixes this
for more information, see https://pre-commit.ci
I don't know why the tests fail, it has nothing to do with my changes |
updates in probeinterface that I think should be fixed now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me this seems fine. Recordings these days are between 20kHz and 30Khz so we are looking for a tolerance of 0.01 or 30_000.01 which is a fraction of a percent off from each other. @DradeAW you think this is just a float storage issue and why choose 0.01 vs 1e-1 or 1e-3?
Floating point errors can happen, but I'm not talking about that In my case, I have multiple sortings for a single recording, and because of how some spike sorting algorithms work, the sampling frequency is sometimes rounded and sometimes not, creating a difference |
Can we fix these in the output of the sorting algorithms at least here? This would be safer that the proposed solution. For neuropixels I have seen variations of 0.01 that do add up with long recordings so I don't think that we should omit those differences by default. |
But we don't control the algorithms for getting the sorting sampling rate. So you want our wrappers to change them? |
Yes, that's what I am proposing. if we are so sure that they are equal that we want to relax the criteria here then why not just change them there? This seems safer than increasing the false positives here by relaxing the criteria. |
For me I think the problem is that we allow fractional sampling rates (ie floats) in general. If sampling rates were always ints then rounding wouldn't be an issue. Could you explain this more:
I am actually curious how sampling rate issues across long recordings get condensed into the singular sampling rate post a sorting. For example if you sampling rate varies in a long recording between 30_000.01, 30_000.02, 30_000.03, the sorting has to pick one value to use for its sampling rate so how do we dealing with that at the recording level. |
[EDIT] Ok, we discussed in slack Zach meant that all known acquisition systems have a nominal sampling rate that is always an integer. |
One minor edit. I assume most are nominally an integer. I'm sure there are always exceptions to this rule. |
This is just to show that some systems don't have nominal sampling rate, right @h-mayorquin ? I think we can merge this one |
Actually it is two fold. One it shows that some do have truly float. But the actually convo (partially on slack) was Heberto is against this PR because it could make it easier to aggregate incorrectly. I was in favor of this because I think float issues with rounding is so common. Heberto pulled me a little closer to his side but I’ve moved more to neutral. So I think it would be helpful for the debate for @alejoe91 to comment why you are in favor. |
In device this is not the sample rate which is an integer but the sampling period (and more often multiple of a shorter sampling period). So the sampling rate should in fact be very rarely a interger |
That's a great point @samuelgarcia. My point was machines often have you select a nominal int (which more often for the various machines I use these days are integer sampling rates--> float periods), but the other way would be to in general have int sample periods which would often convert to float sampling rate. I guess where are you getting the idea that machines are period based rather than sample rate based? You could do either right? I'm definitely willing to be convinced that my sample rate is actually a bias from what I have worked on myself. But if this is true wouldn't it be better for us to store info as a sample period int rather than a rate. Because the comparison of the int period won't have float issues? |
Yes, to show that the sampling rate is not always an integer which I think is a requirement for this PR to be sound. Unless I am misunderstanding. |
I don't think sampling frequency being ints is a requirement for the PR, since the
This holds with float sampling frequencies too, no? |
Different versions of Kilosort output different sampling rate (sometimes rounding it to 2 decimal places, sometimes not)
This makes a crash when trying to aggregate both sortings together This PR fixes this