clustering

4. Clustering

-- Yang Eric Li

Once we have normalized the data and perfromed the differential expression analysis, we can cluster the samples relevant to the biological questions. It is a hard problem to do the unsupervised clustering without prior knowledge. That is, we need to identify groups of samples based on the similarities of the transcriptomes. Moreover, in most situations we do not even know the number of clusters a priori. The problem is made even more challenging due to the high level of noise (both technical and biological) and the large number of dimensions (i.e. genes).