Introduction to (R and) R/Bioconductor and Regular Expressions

Introduction to (R and) R/Bioconductor

Task 1

Load the DNA sequences from fishes.fna.gz twice: once using functions from the seqinr package and once using functions from the Biostrings package. Note the differences between the resulting objects.
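
As a minimal sketch of this comparison, the snippet below writes a small made-up FASTA file to a temporary location and reads it with both packages. The file name toy.fasta and the two toy sequences are assumptions for illustration only; for the exercise itself, pass the path to fishes.fna.gz instead. The requireNamespace() guard just skips the example when either package is not installed.

```r
# Toy FASTA content (hypothetical data, not taken from fishes.fna.gz)
toy_path <- file.path(tempdir(), "toy.fasta")
writeLines(c(">seq_a", "ACGTACGT", ">seq_b", "GGTTAACC"), toy_path)

have_packages <- requireNamespace("seqinr", quietly = TRUE) &&
  requireNamespace("Biostrings", quietly = TRUE)

if (have_packages) {
  # seqinr returns an ordinary list with one element per sequence
  seqinr_seqs <- seqinr::read.fasta(file = toy_path, seqtype = "DNA")
  # Biostrings returns a single DNAStringSet object
  biostrings_seqs <- Biostrings::readDNAStringSet(toy_path)

  print(class(seqinr_seqs))
  print(class(biostrings_seqs))

  # Look at how the first sequence is stored in each case
  str(seqinr_seqs[[1]])
  print(biostrings_seqs[[1]])
}
```

Comparing the printed classes and the structure of the first element is one way to see how differently the two packages represent the same data.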

Task 2

Next, focus on the Biostrings package. Practice working with loaded data:
- Check the number of loaded sequences:
```
length(seq)
```
- Determine the length of each sequence:
```
width(seq)
```
- View the sequence names (FASTA headers):
```
names(seq)
```
- Assign the first sequence including the name to the variable seq1:
```
seq1 <- seq[1]
```
- Assign the first sequence without the name to the variable seq1_sequence:
```
seq1_sequence <- seq[[1]]
```
- Assign the first sequence as a plain character string to the variable seq1_string:
```
seq1_string <- toString(seq[1])
```
- Learn more about the XStringSet class and the Biostrings package:
```
help(XStringSet)
```
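
The difference between the three ways of extracting the first sequence can be tried on a tiny in-memory set. This is a sketch under assumptions: the object toy and its two sequences are made up for illustration, and as.character() is used here as one possible way to obtain a plain string. The guard skips the example when Biostrings is not installed.

```r
if (requireNamespace("Biostrings", quietly = TRUE)) {
  suppressPackageStartupMessages(library(Biostrings))

  # Hypothetical two-sequence set standing in for the loaded data
  toy <- DNAStringSet(c(first = "ACGTAC", second = "GGTTAACC"))

  print(length(toy))  # number of sequences
  print(width(toy))   # one length per sequence
  print(names(toy))   # sequence names

  toy_subset  <- toy[1]    # still a DNAStringSet, the name is kept
  toy_element <- toy[[1]]  # a single DNAString, the name is dropped
  toy_string  <- as.character(toy_element)  # ordinary R character string

  print(class(toy_subset))
  print(class(toy_element))
  print(class(toy_string))
}
```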

Task 3

Translate and globally align the two selected sequences using the BLOSUM62 matrix, a gap opening cost of -1 and a gap extension cost of 1.

Regular Expressions

Task 4

Practice working with regular expressions:

Create a character vector of names, e.g.:
```
names_list <- c("anna", "jana", "kamil", "norbert", "pavel", "petr", "stanislav", "zuzana")
```

Search for the name jana:
```
grep("jana", names_list, perl = TRUE)
```
Search for all names containing the letter n at least once:
```
grep("n+", names_list, perl = TRUE)
```
Search for all names containing a double n (nn):
```
grep("n{2}", names_list, perl = TRUE)
```
Search for all names starting with n:
```
grep("^n", names_list, perl = TRUE)
```

Search for the names anna or jana (the names in names_list are lowercase, so the pattern must be lowercase too):
```
grep("anna|jana", names_list, perl = TRUE)
```

Search for names starting with z and ending with a:
```
grep("^z.*a$", names_list, perl = TRUE)
```
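
The calls above return positions in names_list. As a small sketch in base R, the same searches can also return the matching names themselves or a logical vector; the variable names below are made up for illustration.

```r
names_list <- c("anna", "jana", "kamil", "norbert", "pavel", "petr", "stanislav", "zuzana")

# grep() returns the positions of matching elements by default
n_positions <- grep("n+", names_list, perl = TRUE)

# value = TRUE returns the matching elements instead of their positions
n_names <- grep("n+", names_list, perl = TRUE, value = TRUE)

# grepl() returns one TRUE/FALSE per element, which is handy for indexing
starts_with_p <- grepl("^p", names_list, perl = TRUE)
p_names <- names_list[starts_with_p]

# ignore.case = TRUE lets a capitalised pattern match the lowercase names
anna_or_jana <- grep("Anna|Jana", names_list, perl = TRUE,
                     ignore.case = TRUE, value = TRUE)

print(n_positions)   # 1 2 4 7 8
print(n_names)       # "anna" "jana" "norbert" "stanislav" "zuzana"
print(p_names)       # "pavel" "petr"
print(anna_or_jana)  # "anna" "jana"
```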

Task 5

Load the amplicon sequencing run from a 454 Junior machine stored in fishes.fna.gz.
Without using conditional statements, extract the sequences of the sample that is tagged with the forward and reverse MID ACGAGTGCGT.
How many sequences does the sample contain?
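
One way to avoid conditional statements is to build a regular expression and index with the logical vector returned by grepl(). The sketch below uses base R on made-up reads. It assumes that the reverse MID appears at the end of a read as the reverse complement of the MID, which may not match how the real data are tagged; reverse_complement() and toy_reads are hypothetical names introduced for illustration.

```r
# Hypothetical helper: reverse complement of a plain DNA string
reverse_complement <- function(x) {
  complemented <- chartr("ACGT", "TGCA", x)
  paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
}

mid <- "ACGAGTGCGT"

# Assumption: forward MID at the start, reverse-complemented MID at the end
pattern <- paste0("^", mid, ".*", reverse_complement(mid), "$")

# Made-up reads, not taken from fishes.fna.gz
toy_reads <- c(
  r1 = "ACGAGTGCGTTTGACCAACGCACTCGT",  # both MIDs present
  r2 = "ACGAGTGCGTTTGACCA",            # forward MID only
  r3 = "TTGACCAACGCACTCGT",            # reverse MID only
  r4 = "ACGAGTGCGTGGCCACGCACTCGT"      # both MIDs present
)

# Logical indexing selects the sample without any if statement
is_sample <- grepl(pattern, toy_reads, perl = TRUE)
sample_reads <- toy_reads[is_sample]

print(names(sample_reads))  # "r1" "r4"
print(length(sample_reads)) # 2
```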

Task 6

Create a function Demultiplexer() that demultiplexes sequencing data.
Input:
- a string with the path to a FASTA file
- a list of forward MIDs
- a list of reverse MIDs
- a list of sample labels
Output:
- FASTA files that are named after the samples and contain the sequences of each sample without MIDs (perform MID trimming)
- a table named report.txt containing the sample names and the number of sequences in each sample
Check the function on the fishes.fna.gz file. The list of samples and MIDs can be found in the corresponding table fishes_MIDs.csv.
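
The core of such a function, matching reads to samples, trimming the MIDs and counting, can be sketched in base R on plain character strings. This is not the expected Demultiplexer() itself: demultiplex_reads(), the toy MIDs and the toy reads are hypothetical, reading and writing FASTA files is left out, and the reverse MIDs are assumed to be given exactly as they appear at the end of the reads. The report is written to a temporary directory here.

```r
# Hypothetical helper: returns a list with the trimmed reads of each sample
demultiplex_reads <- function(reads, forward_mids, reverse_mids, sample_labels) {
  result <- Map(function(fwd, rev) {
    # Capture everything between the forward and the reverse MID
    pattern <- paste0("^", fwd, "(.*)", rev, "$")
    hits <- reads[grepl(pattern, reads, perl = TRUE)]
    # Keep only the captured part, which trims both MIDs
    sub(pattern, "\\1", hits, perl = TRUE)
  }, forward_mids, reverse_mids)
  names(result) <- sample_labels
  result
}

# Made-up reads and MIDs for illustration
toy_reads <- c("AAAATTGGCCCC", "AAAAGGACCCCC", "GGGGACGTTTTT", "AAAATT")
demultiplexed <- demultiplex_reads(
  reads         = toy_reads,
  forward_mids  = c("AAAA", "GGGG"),
  reverse_mids  = c("CCCC", "TTTT"),
  sample_labels = c("A", "B")
)

# One row per sample with the number of assigned sequences
report <- data.frame(
  sample      = names(demultiplexed),
  n_sequences = unname(lengths(demultiplexed))
)
report_path <- file.path(tempdir(), "report.txt")
write.table(report, file = report_path, sep = "\t", quote = FALSE, row.names = FALSE)

print(demultiplexed)
print(report)
```

The last toy read carries only a forward MID, so it is assigned to no sample and does not appear in the report counts.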

Download files from GitHub

Basic Git settings

Configure the Git editor:
```
git config --global core.editor notepad
```
Configure your name and email address:
```
git config --global user.name "Zuzana Nova"
git config --global user.email "<your email address>"
```
Check the current settings:
```
git config --global --list
```

Create a fork on your GitHub account: on the GitHub page of this repository, click the Fork button in the upper right corner.
Clone the forked repository from your GitHub page to your computer:
```
git clone <fork repository address>
```

In the local repository, add the original project repository as a new remote named upstream:
```
git remote add upstream https://github.com/mpa-prg/exercise_02.git
```

Send files to GitHub

Create a new commit and send the changes to your remote repository.

Add a file to the new commit:
```
git add <file_name>
```
Create the commit. Enter the commit message in the editor that opens, then save the file and close it:
```
git commit
```
Send the new commit to your GitHub repository:
```
git push origin main
```
