Skip to content

Fix sampling rate issue when aggregating #4024

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

DradeAW
Copy link
Contributor

@DradeAW DradeAW commented Jul 2, 2025

Different versions of Kilosort output different sampling rate (sometimes rounding it to 2 decimal places, sometimes not)

This makes a crash when trying to aggregate both sortings together This PR fixes this

DradeAW and others added 2 commits July 2, 2025 16:13
Different versions of Kilosort output different sampling rate (sometimes rounding it to 2 decimal places, sometimes not)

This makes a crash when trying to aggregate both sortings together
This PR fixes this
@DradeAW
Copy link
Contributor Author

DradeAW commented Jul 2, 2025

I don't know why the tests fail, it has nothing to do with my changes

@zm711
Copy link
Member

zm711 commented Jul 2, 2025

updates in probeinterface that I think should be fixed now.

Copy link
Member

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me this seems fine. Recordings these days are between 20kHz and 30Khz so we are looking for a tolerance of 0.01 or 30_000.01 which is a fraction of a percent off from each other. @DradeAW you think this is just a float storage issue and why choose 0.01 vs 1e-1 or 1e-3?

@DradeAW
Copy link
Contributor Author

DradeAW commented Jul 2, 2025

Floating point errors can happen, but I'm not talking about that

In my case, I have multiple sortings for a single recording, and because of how some spike sorting algorithms work, the sampling frequency is sometimes rounded and sometimes not, creating a difference

@h-mayorquin
Copy link
Collaborator

Can we fix these in the output of the sorting algorithms at least here? This would be safer that the proposed solution.

For neuropixels I have seen variations of 0.01 that do add up with long recordings so I don't think that we should omit those differences by default.

@zm711
Copy link
Member

zm711 commented Jul 2, 2025

Can we fix these in the output of the sorting algorithms at least here? This would be safer that the proposed solution.

But we don't control the algorithms for getting the sorting sampling rate. So you want our wrappers to change them?

@h-mayorquin
Copy link
Collaborator

Can we fix these in the output of the sorting algorithms at least here? This would be safer that the proposed solution.

But we don't control the algorithms for getting the sorting sampling rate. So you want our wrappers to change them?

Yes, that's what I am proposing. if we are so sure that they are equal that we want to relax the criteria here then why not just change them there? This seems safer than increasing the false positives here by relaxing the criteria.

@zm711
Copy link
Member

zm711 commented Jul 2, 2025

For me I think the problem is that we allow fractional sampling rates (ie floats) in general. If sampling rates were always ints then rounding wouldn't be an issue.

Could you explain this more:

For neuropixels I have seen variations of 0.01 that do add up with long recordings so I don't think that we should omit those differences by default.

I am actually curious how sampling rate issues across long recordings get condensed into the singular sampling rate post a sorting. For example if you sampling rate varies in a long recording between 30_000.01, 30_000.02, 30_000.03, the sorting has to pick one value to use for its sampling rate so how do we dealing with that at the recording level.

@h-mayorquin
Copy link
Collaborator

h-mayorquin commented Jul 2, 2025

[EDIT] Ok, we discussed in slack

Zach meant that all known acquisition systems have a nominal sampling rate that is always an integer.
I then mentioned that sometimes the rate needs to be adjusted by really small numbers. For example, a sampling rate of 30_000 vs 30_000.001 means that in an hour those two difference process will be off by three samples. You can set precise timestamps or correct the sampling rate but the point is that sometimes float sampling rates do matter for fine adjustment.

@zm711
Copy link
Member

zm711 commented Jul 2, 2025

One minor edit. I assume most are nominally an integer. I'm sure there are always exceptions to this rule.

@alejoe91 alejoe91 added the core Changes to core module label Jul 3, 2025
@alejoe91 alejoe91 added this to the 0.103.0 milestone Jul 3, 2025
@h-mayorquin
Copy link
Collaborator

The sampling rate of some Axon files that I got today:

image

@alejoe91
Copy link
Member

alejoe91 commented Jul 4, 2025

The sampling rate of some Axon files that I got today:

image

This is just to show that some systems don't have nominal sampling rate, right @h-mayorquin ?

I think we can merge this one

@zm711
Copy link
Member

zm711 commented Jul 4, 2025

Actually it is two fold. One it shows that some do have truly float. But the actually convo (partially on slack) was Heberto is against this PR because it could make it easier to aggregate incorrectly. I was in favor of this because I think float issues with rounding is so common. Heberto pulled me a little closer to his side but I’ve moved more to neutral. So I think it would be helpful for the debate for @alejoe91 to comment why you are in favor.

@samuelgarcia
Copy link
Member

Zach meant that all known acquisition systems have a nominal sampling rate that is always an integer. I then mentioned that sometimes the rate needs to be adjusted by really small numbers. For example, a sampling rate of 30_000 vs 30_000.001 means that in an hour those two difference process will be off by three samples. You can set precise timestamps or correct the sampling rate but the point is that sometimes float sampling rates do matter for fine adjustment.

In device this is not the sample rate which is an integer but the sampling period (and more often multiple of a shorter sampling period). So the sampling rate should in fact be very rarely a interger

@zm711
Copy link
Member

zm711 commented Jul 9, 2025

That's a great point @samuelgarcia. My point was machines often have you select a nominal int (which more often for the various machines I use these days are integer sampling rates--> float periods), but the other way would be to in general have int sample periods which would often convert to float sampling rate.

I guess where are you getting the idea that machines are period based rather than sample rate based? You could do either right? I'm definitely willing to be convinced that my sample rate is actually a bias from what I have worked on myself.

But if this is true wouldn't it be better for us to store info as a sample period int rather than a rate. Because the comparison of the int period won't have float issues?

@h-mayorquin
Copy link
Collaborator

This is just to show that some systems don't have nominal sampling rate, right @h-mayorquin ?

I think we can merge this one

Yes, to show that the sampling rate is not always an integer which I think is a requirement for this PR to be sound. Unless I am misunderstanding.

@alejoe91
Copy link
Member

This is just to show that some systems don't have nominal sampling rate, right @h-mayorquin ?
I think we can merge this one

Yes, to show that the sampling rate is not always an integer which I think is a requirement for this PR to be sound. Unless I am misunderstanding.

I don't think sampling frequency being ints is a requirement for the PR, since the math.isclose would work for floats too.

You can set precise timestamps or correct the sampling rate but the point is that sometimes float sampling rates do matter for fine adjustment.

This holds with float sampling frequencies too, no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Changes to core module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants