ctwpy

Generate and upload Cell Type Worksheets to the UCSC Cell Atlas from a scanpy AnnData object or from tsv files. To generate a worksheet from a Seurat object, use ctwseurat instead.

What is a Cell Type Worksheet?

A Cell Type Worksheet is an application designed to ease the burden of manual cell type annotation in single-cell mRNA sequencing experiments. It lets you explore the specificity of markers across clusters and label each cluster with a cell type annotation.

The web application provides three interactive components for this goal:

An editable dot plot visualizing marker specificity and cell type annotation across all clusters.
A scatter plot visualizing gene expression across all cells.
A table of gene metric rankings per cluster.

A rough visual of the application's layout is provided in cell_atlas_layout.png in the repository. The gene metrics are explored via the table at the bottom.

This python package manipulates a scanpy object or tsv files into the ctw format and provides an avenue for uploading a worksheet to the UCSC Cell Atlas.

Install

Requirements: python3.4+, git, pip, and virtualenv

If you haven't done so already, head over to the Cell Atlas registry and make an account. Remember to answer the confirmation email. You'll use your email and password to upload data to the server.

Clone the repository and make a virtual environment.

git clone https://github.com/Stuartlab-UCSC/ctwpy.git
cd ctwpy
# Create a python3 virtual environment to install the package in.
virtualenv -p $(which python3) env

Once inside your virtual environment, use pip3 to install the dependencies.

# Enter virtual environment
source env/bin/activate
# Use pip to install the ctwingest dependency
# (https is used here because GitHub no longer serves the unauthenticated git:// protocol)
pip3 install --editable git+https://github.com/Stuartlab-UCSC/ctwingest.git#egg=ctwingest
# Use pip to install this package
pip3 install --editable .

Now you'll be able to access the application's command line interface. It is available any time you enter the virtual environment.

Command Line Interface

# Enter virtual environment
source env/bin/activate

# Check out the help documentation:
ctw-from-scanpy --help

ctw-from-tsv --help

ctw-upload --help

# Create a Cell Type Worksheet formatted file from a scanpy object.
ctw-from-scanpy worksheet-name dataset-filename.h5ad

# Or create a Cell Type Worksheet formatted file from tsv files.
ctw-from-tsv worksheet-name myTsvFileDir

# Send the created Cell Type Worksheet to the UCSC Cell Atlas.
ctw-upload worksheet-name.ctw.tgz credentials.json

Upload Data

Uploading a worksheet to the server requires a credentials.json file, as shown in the ctw-upload command above. Use the example credentials.json in the repository as a starting point.

Prepare TSV Files

If you want to create a worksheet from tsv files rather than a scanpy object, the file formats are described below.

The Cell Type Worksheet ingest expects a directory with three required tab-delimited files and two optional ones:

Expression Matrix

gene	AAACCTGCAAACTGTC	AAACCTGCAAGGGTCA	AAACCTGCAAGTAATG	...
TP53	0	0	0	...
ALKBH6	1	0	1	...
MYLH1	2	1	3	...
TMNT2	0	4.5	0	...
TTN	3.4	0	2	...

 + File name is "exp.tsv"
 + Gene names are rows, Cell IDs are columns
 + Can be filtered down to genes of interest
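
As a rough illustration of filtering the matrix down to genes of interest, the sketch below reads rows in the exp.tsv layout shown above and keeps only the requested genes. The function name `filter_expression` and the in-memory example data are hypothetical and not part of ctwpy; only the Python standard library is used.

```python
import csv
import io


def filter_expression(lines, genes_of_interest, out_handle):
    """Copy an exp.tsv-style matrix, keeping only rows for the given genes.

    lines: iterable of text lines (an open file works).
    genes_of_interest: set of gene names to keep.
    out_handle: writable text handle for the filtered matrix.
    Returns the number of gene rows written.
    """
    reader = csv.reader(lines, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(next(reader))  # header: "gene" followed by cell IDs
    kept = 0
    for row in reader:
        if row and row[0] in genes_of_interest:
            writer.writerow(row)
            kept += 1
    return kept


# Hypothetical example data in the layout shown above.
example = (
    "gene\tAAACCTGCAAACTGTC\tAAACCTGCAAGGGTCA\n"
    "TP53\t0\t0\n"
    "ALKBH6\t1\t0\n"
    "TTN\t3.4\t0\n"
)
filtered = io.StringIO()
n_kept = filter_expression(example.splitlines(), {"TP53", "TTN"}, filtered)
filtered_text = filtered.getvalue()
```

With real data, pass an open exp.tsv file as `lines` and a second open file as `out_handle`.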

Cell to Cluster Assignment

cellids	cluster
AAACCTGCAAACTGTC	1
AAACCTGCAAGGGTCA	1
AAACCTGCAAGTAATG	2
AAACCTGCACATAACC	3
AAACCTGCAGACGCCT	3

 + File name is "clustering.tsv"
 + First column contains cell IDs
 + Second column contains cluster assignment
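
Before building a worksheet it can be useful to confirm that every cell in the expression matrix has a cluster assignment. The sketch below assumes the two layouts shown above; whether ctwpy itself requires an exact match is not stated here, so treat this as an optional sanity check. The helper names are hypothetical.

```python
import csv


def read_clustering(lines):
    """Parse clustering.tsv-style lines into a dict of cell ID -> cluster."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # skip the "cellids  cluster" header
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def unassigned_cells(expression_header, clustering):
    """Return cell IDs from the exp.tsv header that have no cluster."""
    cell_ids = expression_header.rstrip("\n").split("\t")[1:]
    return [cell for cell in cell_ids if cell not in clustering]


# Hypothetical example data.
clustering_lines = [
    "cellids\tcluster",
    "AAACCTGCAAACTGTC\t1",
    "AAACCTGCAAGGGTCA\t1",
]
exp_header = "gene\tAAACCTGCAAACTGTC\tAAACCTGCAAGGGTCA\tAAACCTGCAAGTAATG"

assignments = read_clustering(clustering_lines)
missing = unassigned_cells(exp_header, assignments)
```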

XY Coordinates

cellids	x	y
AAACCTGCAAACTGTC	1.1	0.4
AAACCTGCAAGGGTCA	1.5	0.8
AAACCTGCAAGTAATG	2.2	3.2
AAACCTGCACATAACC	3.3	4.5
AAACCTGCAGACGCCT	3.4	4.7

 + File name is "xys.tsv"
 + First column contains cell IDs
 + Second column contains x coordinates
 + Third column contains y coordinates
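
If your coordinates come from somewhere other than a scanpy object, a file in this layout can be written with the standard library. This is a minimal sketch; `write_xys` and the example coordinates are hypothetical.

```python
import csv
import io


def write_xys(handle, coordinates):
    """Write cell coordinates in the xys.tsv layout described above.

    coordinates: dict mapping cell ID -> (x, y) pair.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["cellids", "x", "y"])
    for cell_id, (x, y) in coordinates.items():
        # float() rejects non-numeric values early instead of writing them out.
        writer.writerow([cell_id, float(x), float(y)])


# Hypothetical example data.
buffer = io.StringIO()
write_xys(buffer, {"AAACCTGCAAACTGTC": (1.1, 0.4), "AAACCTGCAAGGGTCA": (1.5, 0.8)})
xys_text = buffer.getvalue()
```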

Gene Metrics Per Cluster (optional)

gene	t-statistic	pct.exp	avg.exp.scaled	...	cluster
TP53	3.4	46	2.2	...	1
ALKBH6	-0.86	0	-0.1	...	1
TP53	-0.1	15.2	-0.01	...	2
ALKBH6	1.2	35	0.95	...	2
TP53	3.8	88.2	2.5	...	3
ALKBH6	3.4	100	2.5	...	3

 + File name is "markers.tsv"
 + First column contains genes
 + Last column contains cluster IDs
 + At least 2 columns in-between "gene" and "cluster", e.g. "avg.exp" and "pct.exp"
 + If this file is omitted, gene metrics will be calculated from your data
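
The column rules above can be checked on the header line before running ctw-from-tsv. The sketch below assumes the first and last columns are literally named "gene" and "cluster", as in the example; the function name is hypothetical and the check is not part of ctwpy.

```python
def check_markers_header(header_line):
    """Check a markers.tsv header against the layout rules listed above.

    Returns (metric_columns, problems), where problems is a list of
    human-readable messages and is empty when the header looks fine.
    """
    columns = header_line.rstrip("\n").split("\t")
    problems = []
    if columns[0] != "gene":
        problems.append("first column should be 'gene'")
    if columns[-1] != "cluster":
        problems.append("last column should be 'cluster'")
    metric_columns = columns[1:-1]
    if len(metric_columns) < 2:
        problems.append("need at least 2 metric columns between gene and cluster")
    return metric_columns, problems


good_metrics, good_problems = check_markers_header(
    "gene\tt-statistic\tpct.exp\tavg.exp.scaled\tcluster"
)
bad_metrics, bad_problems = check_markers_header("gene\tpct.exp\tcluster")
```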

Cluster cell counts and cell types (optional)

cluster	cell_count	cell_type
1	5313	T-cell
2	2562
...

 + File name is "clusters.tsv"
 + First column contains cluster IDs
 + Second column contains the cell counts
 + Last column contains cell types
 + cell_type values are optional
 + If this file is omitted, cell counts will be summed for you and clusters will have no cell_types
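
If you would rather supply clusters.tsv yourself, for example to pre-fill a few cell types, the counts can be derived from clustering.tsv. The sketch below is one way to do that with the standard library; it is not how ctwpy computes the counts internally, and the helper names are hypothetical.

```python
import csv
from collections import Counter


def count_cells_per_cluster(clustering_lines):
    """Count cells per cluster from clustering.tsv-style lines."""
    reader = csv.reader(clustering_lines, delimiter="\t")
    next(reader)  # skip the "cellids  cluster" header
    return Counter(row[1] for row in reader if len(row) >= 2)


def clusters_rows(counts, cell_types=None):
    """Build clusters.tsv rows; cell_type is left empty when unknown."""
    cell_types = cell_types or {}
    rows = [["cluster", "cell_count", "cell_type"]]
    for cluster, n_cells in counts.items():
        rows.append([cluster, str(n_cells), cell_types.get(cluster, "")])
    return rows


# Hypothetical example data in the clustering.tsv layout.
clustering_lines = [
    "cellids\tcluster",
    "AAACCTGCAAACTGTC\t1",
    "AAACCTGCAAGGGTCA\t1",
    "AAACCTGCAAGTAATG\t2",
    "AAACCTGCACATAACC\t3",
    "AAACCTGCAGACGCCT\t3",
]
counts = count_cells_per_cluster(clustering_lines)
rows = clusters_rows(counts, cell_types={"1": "T-cell"})
```

Each row can then be written with `csv.writer(handle, delimiter="\t")` to produce clusters.tsv.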
