Embeddings visualization
In this documentation, all panels are marked in bold italic, and all components of the tool (buttons, sliders, menus) are marked with `code format`. As before, file names are also marked with `code format`.
Let's first get familiar with the tools on the Embedding visualization panel:
- `Pan`: when activated, dragging the mouse moves the coordinates of the plot.
- `Lasso select`: when activated, dragging the mouse draws a lasso that selects the enclosed dots.
- `Box select`: when activated, dragging the mouse draws a rectangular box that selects the enclosed dots.
- `Zoom`: when activated, the mouse scroll wheel zooms in and out of the scatter plot.
- `Click select`: when activated, the user can click on individual dots to select them.
- `Reset`: when clicked, the scatter plot is restored to the default scale and range.
- `Save figure`: when clicked, the user can save the scatter plot.
- `Display info when hover`: when activated, hovering the mouse over a dot shows detailed information about the corresponding cell.
By default, Higashi-vis uses the first two principal components (referred to as PC1 and PC2) to visualize the embeddings. However, in some cases, PC1 largely corresponds to read depth, batch effects, or even outliers. In such cases, use the `x-axis/y-axis` dropdown menu to set the x or y axis to the third principal component instead.
Note: when using visualization methods such as TSNE/UMAP, if both the x and y axes are set to 1 or 2, the embeddings are projected to a 2-dimensional space. If either the x or y axis is set to 3, the embeddings are projected to a 3-dimensional space, and the two selected dimensions are shown in the scatter plot.
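The rule in the note above can be sketched as a small helper. This is an illustration only: `projection_dims` and `columns_to_plot` are hypothetical names that are not part of Higashi-vis, and the axis choices are assumed to be the 1-based integers 1, 2, or 3.

```python
from typing import Tuple


def projection_dims(x_axis: int, y_axis: int) -> int:
    """Return how many dimensions the embeddings should be projected to."""
    for axis in (x_axis, y_axis):
        if axis not in (1, 2, 3):
            raise ValueError(f"axis must be 1, 2, or 3, got {axis}")
    # A 3-D projection is only needed when the third dimension is requested.
    return 3 if 3 in (x_axis, y_axis) else 2


def columns_to_plot(x_axis: int, y_axis: int) -> Tuple[int, int]:
    """Convert the 1-based axis choices into 0-based column indices."""
    projection_dims(x_axis, y_axis)  # reuse the validation above
    return x_axis - 1, y_axis - 1


print(projection_dims(1, 2), columns_to_plot(1, 2))
print(projection_dims(1, 3), columns_to_plot(1, 3))
```

One consequence worth keeping in mind: a 2-dimensional and a 3-dimensional TSNE/UMAP projection are computed separately, so dimensions 1 and 2 may look different after switching one axis to 3.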
For scHi-C datasets with a large number of cells, PCA might not be the appropriate way to visualize the embeddings. In Higashi-vis, one can instead choose TSNE, UMAP, or other methods that are commonly used for visualizing single-cell datasets. Use the `Vis method` dropdown menu to choose the appropriate visualization method.
Use the `color scheme` dropdown menu to decide how to color the scatter plot. Higashi-vis automatically treats the information stored in the `label_info.pickle` file, which is part of the input dataset, as potential color schemes. Higashi-vis supports both discrete and continuous color schemes. When a discrete color scheme is selected, the scatter plot is colored accordingly and the Statistics visualization panel shows a bar plot of the distribution of the labels. When the color scheme is continuous, the Statistics visualization panel shows a histogram of the distribution instead.
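As a rough illustration of the kind of input involved, the sketch below writes and reads back a small label file and summarizes each label the way a bar plot or histogram would. It assumes that `label_info.pickle` holds a dictionary mapping each label name to a list with one entry per cell. The label names `cell type` and `depth`, the example values, and the helper `is_continuous` are hypothetical, and the discrete/continuous heuristic shown here is not necessarily the one Higashi-vis applies.

```python
import os
import pickle
import tempfile
from collections import Counter

# Hypothetical labels for five cells (one entry per cell, in dataset order).
label_info = {
    "cell type": ["GM12878", "K562", "K562", "HeLa", "GM12878"],
    "depth": [10500.0, 8200.0, 9100.0, 12050.0, 7600.0],
}

# Write the file to a temporary directory so the example has no side effects.
path = os.path.join(tempfile.mkdtemp(), "label_info.pickle")
with open(path, "wb") as f:
    pickle.dump(label_info, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)


def is_continuous(values):
    """Heuristic (an assumption): float-valued labels are treated as continuous."""
    return all(isinstance(v, float) for v in values)


for name, values in loaded.items():
    if is_continuous(values):
        # A continuous label would be shown as a histogram; report its range.
        summary = (min(values), max(values))
    else:
        # A discrete label would be shown as a bar plot of counts per group.
        summary = dict(Counter(values))
    print(name, summary)
```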
Besides the label information provided in `label_info.pickle`, Higashi-vis also provides three extra color schemes:
- kde: log pdf value from kernel density estimation
- kde_ratio: log difference of pdf from kernel density estimation with different kernel bandwidth (which can be regarded as local density)
- read_count: log10 of the read counts of the selected chromosome
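The three quantities listed above can be approximated as follows. This is a sketch under stated assumptions, not the implementation used by Higashi-vis: it uses a hand-written isotropic Gaussian kernel on 2-D points, and the bandwidths (0.5 and 2.0), the sign convention of the ratio (narrow minus wide), the example coordinates, and all function names are hypothetical.

```python
import math


def log_kde(points, bandwidth):
    """Log of a Gaussian kernel density estimate evaluated at each 2-D point."""
    n = len(points)
    # Normalizing constant of an isotropic 2-D Gaussian kernel, averaged over n.
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)
    out = []
    for px, py in points:
        total = 0.0
        for qx, qy in points:
            sq_dist = (px - qx) ** 2 + (py - qy) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        out.append(math.log(norm * total))
    return out


def kde_ratio(points, narrow=0.5, wide=2.0):
    """Difference of the log densities estimated with two kernel bandwidths."""
    fine = log_kde(points, narrow)
    coarse = log_kde(points, wide)
    return [a - b for a, b in zip(fine, coarse)]


def log_read_count(read_counts):
    """log10 of the per-cell read counts."""
    return [math.log10(c) for c in read_counts]


# Three tightly clustered cells and one isolated cell (made-up coordinates).
embedding = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
kde = log_kde(embedding, bandwidth=0.5)
ratio = kde_ratio(embedding)
depth = log_read_count([1000, 10000, 100000, 500])
print(kde, ratio, depth)
```

In this toy example the clustered cells receive a higher `kde` value than the isolated one, which is the behavior that makes the scheme useful for spotting outliers.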
Use the `scatter size` slider to choose the size of the dots in the scatter plot.
When the mouse hovers over elements such as the dots in the scatter plot or the bars in the Statistics visualization panel, more detailed information is displayed, such as the cell index or the number of cells in a specific group.
Update after 2021-04-01: If `cell_name_higashi` is provided in `label_info.pickle`, the cell name is also displayed when the mouse hovers over a dot in the scatter plot.
The scatter plot in Higashi-vis supports dragging and zooming in and out. Click the `zoom` button to activate zooming with the mouse scroll wheel. Click the `reset` button to restore the default scale of the scatter plot.
Note: The `reset` button in the scatter plot and the green `Reload` button in the Control panel do different things. The former only changes the scale of the scatter plot without reloading the embeddings from disk, while the latter reloads the embeddings first. The green `Reload` button is helpful when the user visualizes the embeddings in the middle of the training process to inspect whether the model has converged.