POLII_pausing

This git repository contains the code accompanying the manuscript

Predictive model of transcriptional elongation control identifies trans regulatory factors from chromatin signatures

Toray S. Akcan, Matthias Heinig

Script Overview

Transcriptional_Pausing.R: Main script to perform analyses; Uses all other R-script files.

data_preprocessing.R: Functions for preprocessing (parsing) of all data sets for usage.

data_processing.R: Functions for processing of all data sets for downstream analyses.

data_analyses.R: Actual analyses of processed datasets.

helper_functions.R: Generic helper functions.

Dependencies

R-version: 4.0.3

Bioconductor-Version: 3.12 (BiocManager 1.30.15), R 4.0.3 (2020-10-10)

CRAN-Packages: readr, fastcluster, pvclust, scales, reticulate", parallel, foreach, doParallel, LSD, plyr, feather, msigdbr, log4r, plyr, ggplot2, optparse, tools, DBI, VennDiagram, bedr, Rcpp, rlang, tidyr, stringi, rlang, magrittr, tidyverse, viridis, dplyr, magrittr, circlize, dynamicTreeCut, h2o, reshape2, protr, gridExtra, caret, log4r, optparse, fitdistrplus, ROCR, reticulate, xgboost, data.table, here, jsonlite, mclust, igraph, quantreg, stringr, SHAPforxgboost, anchors, seqinr, ape, ggpubr, ggforce, reconPlots, cowplot, ggsci, gridGraphics, WriteXLS

Bioconductor-Packages: msa, GenomicFeatures,CAGEr, GenomicRanges, biomaRt, Biostrings, topGO, goSTAG, tracktables, GenomicAlignments, Rsamtools, BSgenome.Hsapiens.UCSC.hg19, groHMM, HiTC, rtracklayer, RCAS, TFutils, universalmotif, org.Hs.eg.db, GOSim

External-Packages: reconPlots

All neccessary packages will be installed automatically upon sourcing init.R, however system-level dependencies required by specific packages might exist.

Data Availability

The data folder structure has been replicated in the data folder and each subfolder contains a README.txt with detailed steps to obtain relevant data sets pertaining to each subfolder. This repository is accompanied by a Zenodo repository that hosts relevant data sets available at: Please refer to resources/folder-structure.txt for an overview of the data folder structure.

All necessary data sets with associated accession numbers are also listed in individual xlsx-sheets in file data-accessions.xlsx located in the resources folder.

Computational Requirements

The following computational resources are recommended to execute the whole script

360 GB RAM
16 Cores

Script Execution

Clone this repository with git clone https://github.com/heiniglab/POLII_pausing.git
Specify working directory in Transcriptional_Pausing.R (line 2); Specify number of available cores for low, average and high load computations (line 9); Defaults to 6, 12, 18 cores respectively
Obtain all relevant data, see section Data Availability
Run/Source Transcriptional_Pausing.R
All plots will be available under src/plots and resulting R-data structures under the results folder

Key Data Structures

File /results/Predictions/model_data/feature.vectors.RDS contains feature matrices for each cell line
File /results/Predictions/model_data/model.matrices.RDS contains matrices with features (and feature sub-spaces) and targets for each cell line to train predictive models
File /results/Predictions/model_evaluation/model.training.results.RDS contains all model results (incl. model performance table) obtained by training XGB tree models
File /results/chipseq.peaks.RDS contains a genomic ranges object of protein coding transcripts bound by specific proteins on the DNA
File /results/eclipseq.peaks.RDS contains a genomic ranges object of protein coding transcripts bound by specific proteins on the RNA
File /results/rn7sk.binders.RDS contains a genomic ranges object of 7SK transcript variants bound by proteins for each cell line identified from eCLIP-seq data
File /results/traveling_ratio.RDS contains a genomic ranges object with pausing indices/traveling ratios for protein coding transcripts of each cell line

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

POLII_pausing

Table of Contents

Script Overview

Dependencies

Data Availability

Computational Requirements

Script Execution

Key Data Structures

R Session Info

About

Releases 1

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 50 Commits
data		data
resources		resources
results		results
src		src
README.md		README.md

heiniglab/POLII_pausing

Folders and files

Latest commit

History

Repository files navigation

POLII_pausing

Table of Contents

Script Overview

Dependencies

Data Availability

Computational Requirements

Script Execution

Key Data Structures

R Session Info

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages