Skip to content

2. How it all works

carsen-stringer edited this page Mar 3, 2024 · 5 revisions

NOTE: This Wiki is kept here for reference for previous versions of Kilosort. For updated information about Kilosort4, please refer to kilosort.readthedocs.io instead.


See the Kilosort4 paper for all methods details.

The pipeline is divided into four steps, which we explain separately. The newest feature of Kilosort2 is drift correction (step 2), which we go into more detail in the next wiki entry. We also have a detailed wiki entry on template matching (step 3), which Kilosort2 inherits from Kilosort1, but is different from the more classical workflows you might be used to, which start with threshold crossings, then PCA feature computations, then clustering.

1) Data preprocessing and channel selection.

First we determine which channels contain significant numbers of negative threshold crossings (default 0.1Hz). In our experience, channels with very few threshold crossings are not useful and can only confuse the sorting. We high-pass filter the data (default 150Hz), and "whiten" the channels. Whitening is the process of removing the correlations between a channel and its neighboring channels by subtracting a weighted sum of the neighbors. This increases the discriminability of spikes that are localized on the probe and thus sortable into single units, as contrasted with faraway spikes that are seen over a large number of channels and are typically not sortable.

2) Drift calculation and batch reordering.

Kilosort2 tracks waveforms as they change as a function of time, or as a function of drift. Because changes as a function of time are mostly due to probe drift, it is advantageous to "re-order time" so that Kilosort2 can traverse the data approximately in the order of vertical probe location. The preprocessed data is already broken up into small batches (default is 2s). The goal of this step is to determine the dissimilarity matrix between all batches, and re-order the batches so that the dissimilarity matrix becomes more diagonal, or at least the dissimilarities along the diagonal are smallest. To achieve this, spikes are extracted via threshold crossings from each batch, and then clustered via scaled k-means. To compare two batches, we compute the average minimum distance from each cluster in one batch to the clusters in the other batch, then symmetrize this matrix and z-score it.

Re-ordering the dissimilarity matrix is related to the traveling salesman problem. We implemented two efficient approximate solutions. The default solution assigns a phase variable to each batch, and optimizes these phases by gradient descent so that the similarity between all batches x and y is approximated by cos(phase_x - phase_y). The alternative solution uses the rastermap algorithm. If the re-ordering is successful, the re-ordered matrix should clearly contain most high-similarity values along the diagonal.

3) The main optimization.

To cluster neurons, Kilosort2 traverses the data twice, in the order computed at the previous step. On the first pass, a template of the waveforms is determined for each cluster. This template is allowed to change from batch to batch, retaining a memory of the previous ~N spikes (default N starts at 20 and increases to 400 during optimization). Because the batches have been sorted in order of electrode depth, as the depth changes slowly, the template will similarly change to track the moving waveforms. The iterative template matching procedure is similar to Kilosort1: spikes are detected from the raw data based on peak correlation with the set of templates, and a scaled version of this template is subtracted at that location, thus "peeling away" that spike. This allows other nearby spikes to be extracted even if they have a large overlap with the spikes that were peeled away.

Unlike Kilosort, Kilosort2 automatically determines the right number of templates. The algorithm starts with no templates, and continually scans the residual voltage data after template matching to find any spikes that were not matched to existing templates. Such spikes are introduced as new templates. Conversely, templates which do not capture spikes (default minFR rate is 1/50 Hz) are dropped during the optimization. This results in a continual turnover of spikes which Kilosort2 "tries on" to see if they can account for any of the residual voltage data. This procedure results in automated determination of the number of templates, and more importantly controls the degree of over-splitting. Because templates are introduced gradually, local minima are avoided in which two templates can co-exist that account for the same neuron, unless that neuron has a very large degree of amplitude variability. These relatively few cases are handled by periodically merging clusters which have very correlated waveforms. If any oversplits survive through the entire, they are handled in the final step below.

Note that both the first "optimization" pass, and the final "extraction" pass use the exact same template matching and subtraction algorithm. In contrast, Kilosort1 did not do template subtraction in the optimization phase, and was thus unable to determine whether the residual voltage contains waveforms unaccounted for by existing templates.

4) Final merges and splits.

These steps are meant to mimic the actions which a user would take refining the automated output of Kilosort in the Phy GUI. They result in a set of "good" neurons that are similar to that produced by a human curator in the Phy GUI.

Merges: on rare occasions (~ 5 per 300 neurons), the amplitude variability is so high that the neuron is oversplit and returned as a few different clusters. In these cases, we perform an additional merging step after the main optimization, where a merge goes through if the waveforms are correlated >0.5 AND the cross-correlogram has very few refractory violations, as would be expected if the clusters belonged to the same neuron.

Splits: by design, Kilosort2 over-merges units, which has several advantageous consequences. For example, similar units also "drift together". Therefore, changes to the template for one neuron will generalize to track the template from another neuron. Since sparsely-firing neurons are hard to track, grouping them together with a nearby unit can be beneficial. Another advantageous consequence is that the post-optimization steps are more restricted and independent: each cluster is split into multiple units completely independently of the other clusters, and the pieces are never re-merged. To determine directions of splits for each cluster, we developed a pursuit algorithm similar to independent components analysis (ICA), where we pursue bimodality rather than sparseness. The optimized projection contains a local maximum of bimodality, such that the one-dimensional distribution of spike projections on that dimension are well modelled by a mixture of Gaussians. The inferred overlap between the Gaussians in the mixture is taken as a measure of the contamination of the two putative clusters, and the AUC is used as a score. Only splits with an AUC score above splitAUC (default 0.9) can be made. The splits are veto-ed by the shape of the spike times cross-correlogram (CCG) after the putative split. If the CCG appears to be refractory, the split is veto-ed and the clusters are merged back together.

Splits (1/2 and 2/2): two sets of splits are performed by default to account for local minima of the pursuit procedure. The first pass is initialized with the top principal component of the cluster. The second pass is initialized with the mean template as the putative bimodal projection axis. The spikes are high-pass filtered at this step, to account for the slow changes induced by drift.

Threshold detection: finally, we found that some neurons at the noise floor can benefit from adaptive setting of the threshold for each neuron. We start this procedure with ops.Th(1), and independently for each neuron lower the threshold in 0.5 decrements, as long as the number of refractory violations in the autocorrelogram does not increase beyond the acceptable quality threshold (of 20% estimated contamination). For this step to work well, ops.Th(2) used in the final pass should be relatively small, so that all potential spikes are collected.