- Load the DNA sequence `fishes.fna.gz` using functions from the `seqinr` package and the `Biostrings` package. Note the differences between the created variables.
- Next, focus on the `Biostrings` package. Practice working with loaded data:
  - Check the number of loaded sequences: `length(seq)`
  - Determine the length of a sequence, here the first one (use `width(seq)` to get the lengths of all sequences): `width(seq[1])`
  - View the sequence names (FASTA headers): `names(seq)`
  - Assign the first sequence including the name to the variable `seq1`: `seq1 <- seq[1]`
  - Assign the first sequence without the name to the variable `seq1_sequence`: `seq1_sequence <- seq[[1]]`
  - Assign the first sequence as a character string to the variable `seq1_string`: `seq1_string <- toString(seq[1])`
  - Learn more about the `XStringSet` class and the `Biostrings` package: `help(XStringSet)`
- Translate and globally align the two selected sequences using the BLOSUM62 matrix, a gap opening cost of -1 and a gap extension cost of 1.
- Practice working with regular expressions:
  - Create a list of names, e.g.: `names_list <- c("anna", "jana", "kamil", "norbert", "pavel", "petr", "stanislav", "zuzana")`
  - Search for the name `jana`: `grep("jana", names_list, perl = TRUE)`
  - Search for all names containing the letter `n` at least once: `grep("n+", names_list, perl = TRUE)`
  - Search for all names containing the letters `nn`: `grep("n{2}", names_list, perl = TRUE)`
  - Search for all names starting with `n`: `grep("^n", names_list, perl = TRUE)`
  - Search for the names `Anna` or `Jana`: `grep("Anna|Jana", names_list, perl = TRUE)`
  - Search for names starting with `z` and ending with `a`: `grep("^z.*a$", names_list, perl = TRUE)`
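The regular-expression searches above can be explored a little further with a minimal base R sketch. It uses the same `names_list` and shows two things the exercise does not spell out: `grep()` returns indices unless `value = TRUE` is set, and matching is case-sensitive unless `ignore.case = TRUE` is set. The variable names (`idx_n`, `hits_ci`, and so on) are made up for illustration.

```r
names_list <- c("anna", "jana", "kamil", "norbert", "pavel", "petr", "stanislav", "zuzana")

# grep() returns the indices of matching elements by default ...
idx_n <- grep("n+", names_list, perl = TRUE)
# ... and the matching strings themselves when value = TRUE
names_n <- grep("n+", names_list, perl = TRUE, value = TRUE)

# Matching is case-sensitive, so the capitalised pattern finds nothing
# in a list of lowercase names
hits_cs <- grep("Anna|Jana", names_list, perl = TRUE, value = TRUE)

# ignore.case = TRUE lets the same pattern match the lowercase names
hits_ci <- grep("Anna|Jana", names_list, perl = TRUE, value = TRUE,
                ignore.case = TRUE)

# grepl() returns one logical per element, which is handy for subsetting
starts_z_ends_a <- names_list[grepl("^z.*a$", names_list, perl = TRUE)]

print(idx_n)
print(hits_ci)
print(starts_z_ends_a)
```

Logical vectors from `grepl()` are also a convenient way to select elements without writing any `if` statements.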
- Load an amplicon sequencing run from the 454 Junior machine, `fishes.fna.gz`.
  - Get the sequences of a sample (avoid conditional statements) that is tagged by the forward and reverse MID `ACGAGTGCGT`.
  - How many sequences are there in the sample?
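One possible way to select a sample without conditional statements is a single anchored regular expression combined with logical subsetting. The base R sketch below works on made-up toy reads rather than on `fishes.fna.gz`. It also assumes that the reverse MID appears at the 3' end of a read as its reverse complement, which the task does not state and which should be checked against the real data. `rev_comp()` is a hypothetical helper, not a function from `Biostrings` or `seqinr`.

```r
# Reverse complement of a single DNA string (hypothetical helper)
rev_comp <- function(x) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", x), "")[[1]]), collapse = "")
}

mid <- "ACGAGTGCGT"

# Assumption: forward MID at the 5' end, reverse complement of the
# reverse MID at the 3' end of each read
pattern <- paste0("^", mid, ".*", rev_comp(mid), "$")

# Toy reads invented for illustration (not taken from fishes.fna.gz)
reads <- c(
  r1 = "ACGAGTGCGTTTGACCAAACGCACTCGT",  # both tags present
  r2 = "ACGAGTGCGTTTGACCAA",            # reverse tag missing
  r3 = "GGGAGTGCGTTTGACCAAACGCACTCGT"   # forward tag does not match
)

# Vectorised matching replaces any explicit if/else
is_sample <- grepl(pattern, reads, perl = TRUE)
sample_reads <- reads[is_sample]
n_sample <- sum(is_sample)

print(names(sample_reads))
print(n_sample)
```

With a `DNAStringSet` loaded by `Biostrings`, the same pattern could be applied after converting the sequences with `as.character()`.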
- Create a function `Demultiplexer()` for demultiplexing of sequencing data.
  - Input:
    - a string with the path to a FASTA file
    - a list of forward MIDs
    - a list of reverse MIDs
    - a list of sample labels
  - Output:
    - FASTA files that are named after the samples and contain the sequences of each sample without MIDs (perform MID trimming)
    - a table named `report.txt` containing the samples' names and the number of sequences each sample has
- Check the functionality again on the `fishes.fna.gz` file; the list of samples and MIDs can be found in the corresponding table `fishes_MIDs.csv`.
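The demultiplexing task could be approached roughly as in the following base R sketch. It is only one possible outline, not the reference solution. It assumes that the reverse MID occurs as its reverse complement at the 3' end of each read, that MIDs contain only the letters A, C, G and T, and that output files get a `.fasta` extension. The names `demultiplex_sketch()`, `read_fasta()`, `rev_comp()` and the `out_dir` argument are hypothetical.

```r
# Reverse complement of a single DNA string (hypothetical helper)
rev_comp <- function(x) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", x), "")[[1]]), collapse = "")
}

# Read a FASTA file into a named character vector; wrapped sequence
# lines are joined into one string per record
read_fasta <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)
  seqs <- vapply(split(lines[!is_header], record[!is_header]),
                 paste, character(1), collapse = "")
  headers <- sub("^>", "", lines[is_header])
  names(seqs) <- headers[as.integer(names(seqs))]
  seqs
}

demultiplex_sketch <- function(fasta_path, forward_mids, reverse_mids,
                               sample_labels, out_dir = ".") {
  reads <- read_fasta(fasta_path)
  counts <- integer(length(sample_labels))
  for (i in seq_along(sample_labels)) {
    # Assumption: reverse MID is seen as its reverse complement at the 3' end
    pattern <- paste0("^", forward_mids[i], "(.*)", rev_comp(reverse_mids[i]), "$")
    hit <- grepl(pattern, reads, perl = TRUE)
    # Keep only the captured insert, i.e. trim both MIDs
    trimmed <- sub(pattern, "\\1", reads[hit], perl = TRUE)
    # Interleave header and sequence lines; sprintf() yields character(0)
    # for samples without reads, so an empty file is written
    fasta_lines <- as.vector(rbind(sprintf(">%s", names(trimmed)), trimmed))
    writeLines(fasta_lines, file.path(out_dir, paste0(sample_labels[i], ".fasta")))
    counts[i] <- sum(hit)
  }
  report <- data.frame(sample = sample_labels, sequences = counts)
  write.table(report, file.path(out_dir, "report.txt"),
              sep = "\t", quote = FALSE, row.names = FALSE)
  report
}

# Usage on a toy FASTA file written to a temporary directory
tmp_dir <- tempfile("demux_")
dir.create(tmp_dir)
toy_fasta <- file.path(tmp_dir, "toy.fna")
writeLines(c(">r1", "ACGAGTGCGTTTGA", "CCAAACGCACTCGT",
             ">r2", "ACGAGTGCGTTTGACCAA"), toy_fasta)
toy_report <- demultiplex_sketch(toy_fasta, "ACGAGTGCGT", "ACGAGTGCGT",
                                 "sample1", out_dir = tmp_dir)
print(toy_report)
```

A `Biostrings`-based version would more likely build on `readDNAStringSet()`, `subseq()` and `writeXStringSet()`. The MIDs and labels for the real run would come from `fishes_MIDs.csv`, whose column names are not given here.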
Basic Git settings
- Configure the Git editor: `git config --global core.editor notepad`
- Configure your name and email address: `git config --global user.name "Zuzana Nova"` and `git config --global user.email [email protected]`
- Check current settings: `git config --global --list`
- Create a fork on your GitHub account. On the GitHub page of this repository, find the Fork button in the upper right corner.
- Clone the forked repository from your GitHub page to your computer: `git clone <fork repository address>`
- In the local repository, set a new remote for the project repository: `git remote add upstream https://github.com/mpa-prg/exercise_02.git`

Create a new commit and send the new changes to your remote repository.
- Add a file to a new commit: `git add <file_name>`
- Create a new commit, enter a commit message, save the file and close it: `git commit`
- Send the new commit to your GitHub repository: `git push origin main`