Recommendation for eukaryotic species #16

Open
kullrich opened this issue Sep 7, 2022 · 2 comments

kullrich commented Sep 7, 2022

Hi,

Are there any recommendations for eukaryotic species?

I am currently comparing two highly similar eukaryotic genome sequences, but I get no synteny and no rearrangements at all.

wget http://ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
wget http://ftp.ensembl.org/pub/current_fasta/pan_troglodytes/dna/Pan_troglodytes.Pan_tro_3.0.dna.toplevel.fa.gz
gunzip Pan_troglodytes.Pan_tro_3.0.dna.toplevel.fa.gz
smashpp -n 32 -m 5000 -f 10000 -fs L -r Homo_sapiens.GRCh38.dna.primary_assembly.fa -t Pan_troglodytes.Pan_tro_3.0.dna.toplevel.fa

The results are empty; however, I would expect to see some differences between human and chimp.

====[ PREPARE DATA ]==================================
[+] Homo_sapiens.GRCh38.dna.primary_assembly.fa (FASTA) -> Homo_sapiens.GRCh38.dna.primary_assembly.seq (seq) finished.
[+] Pan_troglodytes.Pan_tro_3.0.dna.toplevel.fa (FASTA) -> Pan_troglodytes.Pan_tro_3.0.dna.toplevel.seq (seq) finished.

====[ REGULAR MODE ]==================================
[+] Creating model of Homo_sapiens.GRCh38.dna.primary_assembly.fa done.
[+] Filtering Pan_troglodytes.Pan_tro_3.0.dna.toplevel.fa done => 0 segments

====[ INVERTED MODE ]=================================
[+] Creating model of Homo_sapiens.GRCh38.dna.primary_assembly.fa done.
[+] Filtering Pan_troglodytes.Pan_tro_3.0.dna.toplevel.fa done => 0 segments

Thank you in advance.

Best regards

Kristian

smortezah added the "question" label (Further information is requested) on Sep 7, 2022
Collaborator

pratas commented Sep 16, 2022

Dear Kristian,

First, let's understand the characteristics of the data.
I followed your instructions and got this:

-rw-rw-r-- 1 x x 504569856 set 16 10:47 Homo_sapiens.GRCh38.dna.primary_assembly.fa
-rw-rw-r-- 1 x x 3151425857 jun 4 09:50 Homo_sapiens.GRCh38.dna.primary_assembly.fa_bk

It seems that you are using a Homo_sapiens.GRCh38.dna.primary_assembly.fa file of only about 500 MB (504569856 bytes), while Homo_sapiens.GRCh38.dna.primary_assembly.fa_bk (3151425857 bytes) seems to have all the information.

Is that intended? What does this smaller sequence represent?
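
File size alone does not say what a FASTA file contains, so one way to compare the two files is to count their records and bases. The following is a minimal sketch, not part of smashpp: `summarize_fasta` is a hypothetical helper name, and it assumes an uncompressed, plain-text FASTA file where every header line starts with `>`.

```python
import sys
from pathlib import Path


def summarize_fasta(path):
    """Return (file_bytes, n_records, n_bases) for an uncompressed FASTA file.

    Hypothetical helper for illustration; it only counts, it does not
    validate the alphabet or the header format.
    """
    n_records = 0
    n_bases = 0
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                # Header line: one per sequence record.
                n_records += 1
            else:
                # Sequence line: count characters without the line ending.
                n_bases += len(line.strip())
    return Path(path).stat().st_size, n_records, n_bases


if __name__ == "__main__":
    # Usage: python summarize_fasta.py file1.fa file2.fa ...
    for name in sys.argv[1:]:
        size, records, bases = summarize_fasta(name)
        print(f"{name}: {size} bytes, {records} records, {bases} bases")
```

Because a FASTA file consists almost entirely of sequence characters, the base count should be close to the byte count. A file that is much smaller than expected will show either fewer records or far fewer bases than the complete one.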

Best regards,
Diogo

@kullrich
Author

Hi,
When I run the download myself, I get the following: about 840 MB for the .gz file and 3006 MB for the unzipped file, so in my case the full reference genome appears to be present.

-bash-4.2$ wget http://ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
--2022-09-17 10:01:48--  http://ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.193.139
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.139|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 881211416 (840M) [application/x-gzip]
Saving to: ‘Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz’

100%[======================================================>] 881,211,416 43.6MB/s   in 20s    

2022-09-17 10:02:11 (41.8 MB/s) - ‘Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz’ saved [881211416/881211416]

-bash-4.2$ gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
-bash-4.2$ ls -al --block-size=M Homo_sapiens.GRCh38.dna.primary_assembly.fa
3006M Jun  4 10:50 Homo_sapiens.GRCh38.dna.primary_assembly.fa
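
The wget log above already shows that all 881211416 bytes were saved. As an additional check against a truncated archive, the .gz file can be decompressed in a stream before running gunzip (which removes the archive). This is a minimal sketch using only the Python standard library; `gzip_payload_size` is a hypothetical helper name, not something provided by smashpp or Ensembl.

```python
import gzip
import sys


def gzip_payload_size(path, chunk_size=1 << 20):
    """Stream through a .gz file and return its uncompressed size in bytes.

    Reading to the end forces full decompression, so a truncated archive
    raises EOFError instead of silently yielding a short file.
    """
    total = 0
    with gzip.open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    # Usage: python check_gz.py Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
    for name in sys.argv[1:]:
        try:
            print(f"{name}: {gzip_payload_size(name)} uncompressed bytes")
        except EOFError:
            print(f"{name}: archive is truncated")
```

The reported size can then be compared with the size of the file that gunzip produces.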

Best regards
Kristian
