Working with RSEM #2

Open
rbadmi opened this issue May 5, 2023 · 0 comments
rbadmi commented May 5, 2023

Here I will describe how I installed RSEM and got it working. This might be obvious to some, but I am sure there are many beginners who would appreciate the minor details. This is intended for beginners who are just learning how to analyze RNA-seq data, so I have left out all the optional arguments to avoid overwhelming newcomers. It is easier to understand the different options once you know the basics.

Step 1: Install RSEM:

  1. In your terminal (assuming Ubuntu Linux), run this command to download the RSEM source archive. It will be saved in your pwd (present working directory): wget https://github.com/deweylab/RSEM/archive/refs/tags/v1.3.2.tar.gz
  2. You should now see a file named v1.3.2.tar.gz in your pwd.
  3. Extract it with this command: tar -zxvf v1.3.2.tar.gz (note the single dash before zxvf; the file name must match the archive you downloaded).
  4. You will now see the folder RSEM-1.3.2. Go into it with the cd RSEM-1.3.2 command.
  5. Then run sudo make install to install. Please note that plain make install (as explained on the RSEM GitHub page: https://github.com/deweylab/RSEM) did not completely install samtools in my case. Such an incomplete installation may be fine for the rsem-prepare-reference command but not for rsem-calculate-expression. So beware! Alternatively, you can try adding the RSEM directory to your environment's PATH variable, but that didn't work for me.
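The install steps above can be collected into a small script. This is only a sketch of what I described, not an official installer: the variable names and the install_rsem function are my own, and it assumes wget, tar, make, and sudo are available. Nothing is downloaded until you call the function yourself.

```shell
# Version to install; the archive and folder names are derived from it
# so that the download, extract, and cd steps always agree.
RSEM_VERSION="1.3.2"
ARCHIVE="v${RSEM_VERSION}.tar.gz"
SRC_DIR="RSEM-${RSEM_VERSION}"
URL="https://github.com/deweylab/RSEM/archive/refs/tags/${ARCHIVE}"

# Hypothetical helper: download, extract, and install RSEM.
install_rsem() {
    wget -O "$ARCHIVE" "$URL" || return 1
    tar -zxvf "$ARCHIVE" || return 1
    # Run the install in a subshell so your terminal stays in the pwd.
    ( cd "$SRC_DIR" && sudo make install )
}

# When you are ready, run:
#   install_rsem
```

Deriving the file and folder names from one version variable avoids the easy mistake of downloading one version and trying to extract another.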

Step 2: Prepare reference files: (transcriptome)

  1. Choose a reference transcriptome file in FASTA format that contains all your sequences. Put it in a folder and navigate to that folder in the terminal so that it becomes your pwd.
  2. Now run the command rsem-prepare-reference your_transcriptome_ref_file.fasta myreferencefiles --bowtie --bowtie2. Here myreferencefiles is the name prefix for the output files. A backslash (\) is only needed to continue a command onto the next line; if you type the command on a single line, leave the backslashes out, otherwise you may get errors.
  3. Please note that the above command will create multiple files (6 to 12, depending on which aligner indices are built), all of which are reference files. All of these file names start with the prefix myreferencefiles. Once you see that these files have been generated, the reference is prepared.
  4. You can now use these reference files to map your RNA-seq reads and calculate expression.
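Step 2 can be sketched as a script like the one below. The prepare_reference and count_reference_files functions are hypothetical helpers of my own, not part of RSEM, and the sketch assumes RSEM, bowtie, and bowtie2 are installed and on your PATH. It also shows where the backslashes belong when the command is split over several lines: at the very end of each line except the last.

```shell
REF_FASTA="your_transcriptome_ref_file.fasta"
REF_NAME="myreferencefiles"   # name prefix for the reference files

# Hypothetical helper: build the reference with both aligner indices.
# Each backslash must be the last character on its line.
prepare_reference() {
    rsem-prepare-reference "$REF_FASTA" "$REF_NAME" \
        --bowtie \
        --bowtie2
}

# Hypothetical helper: count the files whose names start with "<prefix>."
# so you can confirm the reference files were generated.
count_reference_files() {
    ref_prefix="$1"
    ref_count=0
    for ref_file in "$ref_prefix".*; do
        if [ -e "$ref_file" ]; then
            ref_count=$((ref_count + 1))
        fi
    done
    echo "$ref_count"
}

# When you are ready, run:
#   prepare_reference
#   count_reference_files "$REF_NAME"
```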

Step 3: Calculate expression:

  1. It is easier if you have your RNA-seq read files in the same folder as your reference files from above. Otherwise you just have to tweak your command a little to include the address (path) of your RNA-seq read files, which I will show below.
  2. Before that, you should have your read files in .fq or .fasta format. It is more likely that the raw data you received is in .fq.gz or .fasta.gz format. If so, you first have to unzip your files. Navigate to the folder containing them and run the command gunzip yourRNAseqdata.fq.gz, then wait for some time. You should then see a file named yourRNAseqdata.fq, which is the unzipped version. If you have two files for the same sample, as with paired-end reads, you can unzip both at the same time using gunzip yourRNAseqdata1.fq.gz yourRNAseqdata2.fq.gz
  3. Now navigate to the folder where your reference files are, so that it becomes your pwd.
  4. With your files unzipped, you can calculate expression using the command rsem-calculate-expression --bowtie2 --paired-end /workspace/rawdata/sample1/yourRNAseqdata1.fq /workspace/rawdata/sample1/yourRNAseqdata2.fq myreferencefiles /workspace/resultfiles/sample1mapped. The last two arguments are the reference name prefix from Step 2 and the name prefix for the result files.
  5. If you can keep all the files in one folder (when you have a small number of samples), you can remove the addresses from each file name above. If you have a large number of samples, it is easier to organize your raw data, result files, etc. in different folders. In that case, find the address of each sample by navigating into its folder and running the pwd command in the terminal. This prints the address of that directory/folder, which you can then use in the command above.
  6. This command will take several hours to complete, depending on the size of your transcriptome, the processing power of your computer, etc. It can run for around 20 hours if you have larger sample files and less processing power, so do not be concerned if you don't see anything happening on the screen. Alternatively, you can navigate to the result folder from the desktop (not the terminal) and refresh it to see different folders being created and changing size over time. This confirms that things are happening.
  7. Once the analysis is complete, you will see at least four different files and a folder created in your result files folder. Among them, the file ending with .genes.results contains the expected read counts (along with TPM and FPKM values) for all your genes in that sample. You can use these counts to proceed to calculating differential expression.
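Step 3 can be sketched as the script below. The run_sample and extract_counts functions are hypothetical helpers of my own, the paths are the example addresses used above, and the sketch assumes RSEM and bowtie2 are on your PATH. extract_counts relies on the .genes.results file being tab-separated with gene_id in column 1 and expected_count in column 5.

```shell
READS_DIR="/workspace/rawdata/sample1"
RESULT_PREFIX="/workspace/resultfiles/sample1mapped"
REF_NAME="myreferencefiles"   # reference prefix from Step 2

# Hypothetical helper: unzip the paired-end reads if they are still
# gzipped, then map them and calculate expression.
run_sample() {
    for reads in "$READS_DIR/yourRNAseqdata1.fq" "$READS_DIR/yourRNAseqdata2.fq"; do
        if [ -f "$reads.gz" ]; then
            gunzip "$reads.gz" || return 1
        fi
    done
    rsem-calculate-expression --bowtie2 --paired-end \
        "$READS_DIR/yourRNAseqdata1.fq" \
        "$READS_DIR/yourRNAseqdata2.fq" \
        "$REF_NAME" \
        "$RESULT_PREFIX"
}

# Hypothetical helper: print only the gene_id and expected_count
# columns (1 and 5) of a .genes.results file.
extract_counts() {
    cut -f1,5 "$1"
}

# When you are ready, run:
#   run_sample
#   extract_counts "$RESULT_PREFIX.genes.results" > sample1_counts.tsv
```

The two-column output is a convenient starting point for building a count table across samples before differential expression analysis.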
@rbadmi rbadmi converted this from a draft issue May 5, 2023
@rbadmi rbadmi self-assigned this May 5, 2023