read.bismark doesn't finish with new R. #142

Open
desmodus1984 opened this issue Sep 23, 2024 · 1 comment

Comments

@desmodus1984

Hello,

I use RStudio, and I was told that it is best to run the latest version of the software, so I updated to the latest RStudio (2024 B 735) with R 4.4.1.
I am trying to reanalyze a dataset of 44 samples with dmrseq, which uses read.bismark to create the dataset. I first tried reading a subset of the samples from files created with MethylDackel. They loaded relatively fast in the previous version, but now it has been almost 5 hours (2 days now), and the log is frozen at the "Parsing files and constructing 'M' and 'Cov' matrices ..." step.

[read.bismark] Parsing files and constructing valid loci ...
Done in 62.1 secs
[read.bismark] Parsing files and constructing 'M' and 'Cov' matrices ...

I am working on a workstation with a lot of memory (400 GB). After starting with a small set of 9 files, I realized that it was not my full dataset, so I tried loading the full 44 samples.
It has been almost 5 hours (2 days now) and the read.bismark step has not finished.
To increase speed, I set the nThread parameter to 8L for the 9-sample set and to 10L for the full 44-sample dataset.

Do you have any idea why this step is taking so much longer to finish with the new R?
I should mention that with the new R I had to install dmrseq again, which reinstalled all of its dependencies, so I don't know what might be slowing it down so much.

Thanks,

Juan Pablo

@PeteHaitch
Contributor

I can't think of anything off-hand that has changed in the last 5-7 years with read.bismark().

How many methylation loci are in these files?
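
One rough way to answer this is to count the lines in each input file, since these formats typically store one locus per line. The sketch below uses only base R; `count_lines` is a hypothetical helper (not part of bsseq), and the result is an approximation, since any header line is counted too and read.bismark() combines loci across all files.

```r
# Hypothetical helper: count lines in a (possibly gzipped) text file
# without loading the whole file into memory.
count_lines <- function(path, chunk_size = 1e6) {
  con <- if (grepl("\\.gz$", path)) gzfile(path, "r") else file(path, "r")
  on.exit(close(con))
  n <- 0
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0) break  # end of file reached
    n <- n + length(chunk)
  }
  n
}

# Usage (assumes `files` is a character vector of your input paths):
# loci_per_file <- vapply(files, count_lines, numeric(1))
# summary(loci_per_file)
```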

In general, I suggest sticking with nThread = 1, particularly if you are using other parallelisation via the BPPARAM parameter.
I would test how long it takes to read 1 file, then 2, then 4, etc. to narrow down when it's breaking down on your system.
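
A minimal sketch of that timing test, assuming your input paths are in a character vector called `files`. `time_reader` is a hypothetical helper written for this thread, not a bsseq function; it takes the reader as an argument so the same harness can be tried with any function.

```r
# Hypothetical helper: time a reader function on growing subsets of files
# (1, 2, 4, ... files). `reader` is any function that accepts a character
# vector of file paths.
time_reader <- function(files, reader, sizes = NULL) {
  stopifnot(length(files) >= 1)
  if (is.null(sizes)) {
    # Doubling subset sizes, capped at the number of files.
    sizes <- 2^(0:floor(log2(length(files))))
  }
  sizes <- sizes[sizes <= length(files)]
  elapsed <- vapply(sizes, function(n) {
    system.time(reader(files[seq_len(n)]))[["elapsed"]]
  }, numeric(1))
  data.frame(n_files = sizes, elapsed_secs = elapsed)
}

# Usage with bsseq, single-threaded and with serial BPPARAM:
# timings <- time_reader(files, function(f) {
#   bsseq::read.bismark(f, nThread = 1L,
#                       BPPARAM = BiocParallel::SerialParam(),
#                       verbose = TRUE)
# })
# print(timings)
```

If the elapsed time grows roughly in proportion to the number of files, the reader is probably just slow on this data; a sudden jump at some subset size would point to where it breaks down.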
