In [1]:

import numpy, scipy, matplotlib.pyplot as plt, sklearn, librosa, mir_eval, IPython.display, urllib
plt.rcParams['figure.figsize'] = (14, 4)

K-Means Clustering¶

Sometimes, an unsupervised learning technique is preferred. Perhaps you do not have access to adequate training data, or perhaps the training data's labels are not completely clear. Maybe you just want to quickly sort real-world, unseen, data into groups based on its feature similarity.

In such cases, clustering is a great option!

Download an audio file:

In [2]:

filename = '125_bounce.wav'
urllib.urlretrieve('http://audio.musicinformationretrieval.com/' + filename,
                   filename=filename)

Out[2]:

('125_bounce.wav', <httplib.HTTPMessage instance at 0x1138a6c20>)

Play the audio file:

In [3]:

IPython.display.Audio(filename)

Out[3]:

Your browser does not support the audio element.

Load the audio file into an array.

In [4]:

x, fs = librosa.load(filename)
print fs

22050

Plot audio signal:

In [5]:

librosa.display.waveplot(x, fs)

Out[5]:

<matplotlib.collections.PolyCollection at 0x115623690>

Onset Detection¶

Detect onsets:

In [6]:

onset_frames = librosa.onset.onset_detect(x, sr=fs, delta=0.04, wait=4)
onset_times = librosa.frames_to_time(onset_frames, sr=fs)
onset_samples = librosa.frames_to_samples(onset_frames)

Listen to detected onsets:

In [7]:

x_with_beeps = mir_eval.sonify.clicks(onset_times, fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)

Out[7]:

Your browser does not support the audio element.

Feature Extraction¶

Let's compute the zero crossing rate and energy for each detected onset.

Plot the zero crossing rate:

In [8]:

def extract_features(x, fs):
    zcr = librosa.zero_crossings(x).sum()
    energy = scipy.linalg.norm(x)
    return [zcr, energy]

In [9]:

frame_sz = fs*0.090
features = numpy.array([extract_features(x[i:i+frame_sz], fs) for i in onset_samples])
print features.shape

(36, 2)

Feature Scaling¶

Scale the features (using the scale function) from -1 to 1.

In [10]:

min_max_scaler = sklearn.preprocessing.MinMaxScaler(feature_range=(-1, 1))
features_scaled = min_max_scaler.fit_transform(features)
print features_scaled.shape
print features_scaled.min(axis=0)
print features_scaled.max(axis=0)

(36, 2)
[-1. -1.]
[ 1.  1.]

Plot the features.

In [11]:

plt.scatter(features_scaled[:,0], features_scaled[:,1])
plt.xlabel('Zero Crossing Rate (scaled)')
plt.ylabel('Spectral Centroid (scaled)')

Out[11]:

<matplotlib.text.Text at 0x115603410>

Using K-Means¶

Time to cluster! Let's initialize the algorithm to find three clusters.

In [12]:

model = sklearn.cluster.KMeans(n_clusters=2)
labels = model.fit_predict(features_scaled)
print labels

[0 0 1 1 1 0 1 0 1 1 0 1 1 1 1 0 1 0 1 1 0 1 1 1 1 0 1 0 1 0 1 1 1 1 0 0]

Plot the results.

In [13]:

plt.scatter(features_scaled[labels==0,0], features_scaled[labels==0,1], c='b')
plt.scatter(features_scaled[labels==1,0], features_scaled[labels==1,1], c='r')
plt.xlabel('Zero Crossing Rate (scaled)')
plt.ylabel('Energy (scaled)')
plt.legend(('Class 0', 'Class 1'))

Out[13]:

<matplotlib.legend.Legend at 0x115c3ded0>

Listen to onsets assigned to Class 0:

In [14]:

x_with_beeps = mir_eval.sonify.clicks(onset_times[labels==0], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)

Out[14]:

Your browser does not support the audio element.

Class 1:

In [15]:

x_with_beeps = mir_eval.sonify.clicks(onset_times[labels==1], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)

Out[15]:

Your browser does not support the audio element.

Affinity Propagation¶

In scikit-learn, other clustering algorithms such as affinity propagation can cluster without defining the number of clusters beforehand.

All we need to do is swap out KMeans for AffinityPropagation:

In [16]:

model = sklearn.cluster.AffinityPropagation()
labels = model.fit_predict(features_scaled)
print labels

[0 0 2 1 2 0 1 0 2 2 0 1 2 1 2 0 1 0 2 2 0 1 2 1 2 0 1 0 2 0 2 1 2 2 0 0]

Plot features:

In [17]:

plt.scatter(features_scaled[labels==0,0], features_scaled[labels==0,1], c='b')
plt.scatter(features_scaled[labels==1,0], features_scaled[labels==1,1], c='r')
plt.scatter(features_scaled[labels==2,0], features_scaled[labels==2,1], c='y')
plt.xlabel('Zero Crossing Rate (scaled)')
plt.ylabel('Energy (scaled)')
plt.legend(('Class 0', 'Class 1', 'Class 2'))

Out[17]:

<matplotlib.legend.Legend at 0x115c81590>

Play a beep upon each frame in the same cluster:

Class 0:

In [18]:

x_with_beeps = mir_eval.sonify.clicks(onset_times[labels==0], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)

Out[18]:

Your browser does not support the audio element.

Class 1:

In [19]:

x_with_beeps = mir_eval.sonify.clicks(onset_times[labels==1], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)

Out[19]:

Your browser does not support the audio element.

Class 2:

In [20]:

x_with_beeps = mir_eval.sonify.clicks(onset_times[labels==2], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)

Out[20]:

Your browser does not support the audio element.

← Back to Index

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kmeans.md

kmeans.md

K-Means Clustering¶

Onset Detection¶

Feature Extraction¶

Feature Scaling¶

Using K-Means¶

Affinity Propagation¶

Files

kmeans.md

Latest commit

History

kmeans.md

File metadata and controls

K-Means Clustering¶

Onset Detection¶

Feature Extraction¶

Feature Scaling¶

Using K-Means¶

Affinity Propagation¶