
How to generate embedding

Suhas Srinivasan edited this page Dec 29, 2018 · 3 revisions

An additional script, visualizer.py, generates a 2D embedding of either the scRNA-seq data or the latent features from DAWN. The steps to create the visualizations are listed below.

For scRNA-seq data

  1. The scRNA-seq data should be in a Cell x Gene matrix, where Cells are the rows and Genes are the columns.
  2. The matrix values can be counts or one of the four RNA-seq expression units (RPM, TPM, FPKM or RPKM).
    Note: Log-normalized values should not be used.
  3. Perform any necessary filtering of cells based on your quality criteria.
  4. Remove all row and column labels so that only the numerical matrix remains, then save it as a comma-separated values (CSV) file.
  5. Generate the embedding: python visualizer.py <path to data csv file>
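
Step 4 above can be sketched in Python as follows. This is a minimal illustration and not part of DAWN: the function name strip_labels, its parameters and the file names in the usage note are hypothetical. It assumes the labeled input is a CSV whose first row holds gene names and whose first column holds cell identifiers, and it uses only the standard library.

```python
import csv


def strip_labels(src_path, dst_path, has_header=True, has_row_names=True):
    """Copy a labeled Cell x Gene CSV to dst_path without its labels.

    Returns (number of cells, number of genes) as a quick sanity check.
    Assumes (hypothetically) one header row and one leading label column.
    """
    n_cells = 0
    n_genes = None
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        if has_header:
            next(reader, None)  # drop the row of gene names
        for row in reader:
            if not row:
                continue  # ignore blank lines
            values = row[1:] if has_row_names else row
            for value in values:
                float(value)  # raises ValueError if a label slipped through
            if n_genes is None:
                n_genes = len(values)
            elif len(values) != n_genes:
                raise ValueError("rows have different numbers of genes")
            writer.writerow(values)
            n_cells += 1
    return n_cells, n_genes or 0
```

For example, strip_labels("labeled.csv", "data.csv") would write a numeric-only data.csv, which could then be passed to visualizer.py. The values are copied unchanged, so any filtering or unit choice from the earlier steps still has to be done beforehand.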

For DAWN features

  1. Unlike the scRNA-seq data, the latent features require no additional preparation.
  2. Generate the embedding: python visualizer.py <path to latent features csv file>

Outputs for the embedding

Two files are created when the visualizer completes.

  1. A CSV file that contains the (X, Y) coordinates of the samples. Its name is similar to that of the input file, with the suffix 2d_coord.
  2. A TIF image that contains the plot of the embedding. Its name is also similar to that of the input file, with the suffix 2d_viz.
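
The coordinates file can be read back for further analysis with a few lines of Python. The sketch below is only an illustration: the function name load_coordinates is hypothetical, and it assumes that the first two columns of the file hold the X and Y values. Because the exact layout of the file is not documented here, rows that do not parse as two numbers (such as a possible header) are skipped.

```python
import csv


def load_coordinates(path):
    """Return a list of (x, y) float tuples read from a coordinates CSV.

    Assumes (hypothetically) that X and Y are the first two columns.
    """
    coords = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            try:
                x, y = float(row[0]), float(row[1])
            except (ValueError, IndexError):
                continue  # skip a header row or blank line, if present
            coords.append((x, y))
    return coords
```

The path passed in is whatever file the visualizer produced with the 2d_coord suffix; the returned list keeps the samples in file order, so it should line up with the rows of the input matrix if the visualizer preserves that order.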

Using an embedding to generate EM clustering

Typically, a 2D embedding contains some well-separated cell clusters. The number of such clusters in the embedding can be used as numClusters for EM clustering.