- Cross-correlation Analysis: Using phantompeakqualtools for strand cross-correlation (NSC/RSC metrics)
- Peak Statistics: Using bedtools to analyze peak width distributions, distances between peaks, and genomic feature overlaps
- Coverage Comparisons: Using deepTools multiBigwigSummary and plotCorrelation to compare signal profiles
- Enrichment Analysis: Using GREAT or similar tools for genomic region enrichment
- Motif Analysis: Using MEME-ChIP or HOMER for motif discovery in peaks
- Peak Conservation: Analyzing conservation scores within peaks using phyloP/phastCons
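As a rough illustration of the peak statistics item, the sketch below summarizes peak widths from a BED-like file. It uses plain awk rather than bedtools so that it runs without extra dependencies. The function name `peak_width_summary` is hypothetical and not part of this repository, and the sketch assumes the first three columns are chromosome, start, and end (as in BED and MACS2 narrowPeak files).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): summarize peak widths.
# Assumes a tab- or space-delimited file whose first three columns are
# chrom, start, end. BED intervals are 0-based and half-open, so the
# width of a peak is simply end - start.
peak_width_summary() {
    local bed_file="$1"
    awk '
        # Skip comment, track, and browser header lines.
        !/^(#|track|browser)/ && NF >= 3 {
            width = $3 - $2
            sum += width
            if (n == 0 || width < min) min = width
            if (n == 0 || width > max) max = width
            n++
        }
        END {
            if (n == 0) { print "n=0"; exit }
            printf "n=%d\tmin=%d\tmax=%d\tmean=%.1f\n", n, min, max, sum / n
        }
    ' "$bed_file"
}
```

Calling `peak_width_summary peaks.narrowPeak` (a placeholder file name) prints one tab-separated line with the peak count and the minimum, maximum, and mean width.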
Code used for laboratory analysis. The most important scripts are in the core_scripts/ directory.
Because I perform the next-generation sequencing analysis on my institution's cluster, I mostly have to use the tool versions installed there. For this reason, I use R 4.2.0 for the analyses.
The analyses are run locally or on a Linux computing cluster. The cluster dictates which dependencies are used, especially for the next-generation sequencing analysis.
- R 4.2.0
- Command line utils
- bowtie2/2.3.5.1
- fastp/0.20.0
- fastqc/0.11.5
- deeptools/3.0.1
- gatk
- python/2.7
- miniforge
- macs2
- picard
- java
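Before running the scripts, it can help to confirm that the dependencies are visible in the current shell. The sketch below is one way to do that. The function name `check_tools` is hypothetical, and the executable names in the usage comment are assumptions about how each tool is exposed on the PATH once the cluster environment is set up.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): report which of the
# given executables can be found on the PATH.
# Returns 0 if all are found and 1 if any is missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'found    %s\n' "$tool"
        else
            printf 'missing  %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (executable names are assumptions):
#   check_tools Rscript bowtie2 fastp fastqc macs2 java
```

The non-zero return status makes it easy to stop a job script early, for example with `check_tools bowtie2 fastp || exit 1`.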
git clone https://github.com/luised94/lab_utils.git
Most scripts can be used by running the script from the command line.
./script.sh <args>
Rscript script.R <args>
Most scripts write a log file (stdout and stderr) that can be inspected with a text editor. A log file can usually be checked with vim ~/data/
/logs/9004526_1.out.
I use a set of tags to mark code for future reference. The form of the tags is . Recursive grep (grep -r) can be used to find the tags.
- TODO: Tasks that I have to complete for that particular code file.
- HOWTO: Designates different code snippets for reference when I want to see how to do a particular thing.
- FIXME: Highlight areas that need fixing.
- NOTE: Add important notes or explanations.
- BUG: Mark known bugs or issues.
- OPTIMIZE: Indicate areas that could be optimized for better performance.
- REFACTOR: For code that needs refactoring.
- TEST: For testing purposes.
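A recursive search for these tags could look like the sketch below. The function name `find_tags` is hypothetical. Because the exact tag form is not shown here, the pattern assumes each tag is the uppercase keyword followed by a colon, so adjust the regular expression if the tags are written differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): list tagged lines.
# Assumption: a tag is an uppercase keyword followed by a colon,
# for example "# TODO: ...". Searches the given directory, or the
# current directory when no argument is passed.
find_tags() {
    local search_dir="${1:-.}"
    # -r: recurse into subdirectories
    # -n: print the line number of each match
    # -E: use extended regular expressions for the alternation
    grep -rnE '(TODO|HOWTO|FIXME|NOTE|BUG|OPTIMIZE|REFACTOR|TEST):' "$search_dir"
}
```

Each match is printed as `file:line:text`, so the output can be piped to another grep to narrow it to a single tag, for example `find_tags core_scripts | grep 'FIXME:'`.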
Each documentation section includes a troubleshooting section that covers common errors, such as scripts that depend on specific file names.
Notes I take while developing the scripts.