-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update HUNePi models #539
Comments
I increased the number of models to include all combinations of four summary statistics but the structure of the RDS object seems to be the same: # Current version
> mods <- readRDS("infections_increasing_model_comparisons.rds")
> length(mods)
[1] 8
> sapply(mods, function(x) class(x))
H U Ne pi HUNe HU HUNePi HUPi
[1,] "glm" "glm" "glm" "glm" "glm" "glm" "glm" "glm"
[2,] "lm" "lm" "lm" "lm" "lm" "lm" "lm" "lm"
# Revised version
> length(increasing_mods)
[1] 15
> sapply(increasing_mods, function(x) class(x))
H U Ne pi HU HNe HPi UNe UPi NePi HUNe HUPi HNePi
[1,] "glm" "glm" "glm" "glm" "glm" "glm" "glm" "glm" "glm" "glm" "glm" "glm" "glm"
[2,] "lm" "lm" "lm" "lm" "lm" "lm" "lm" "lm" "lm" "lm" "lm" "lm" "lm"
UNePi HUNePi
[1,] "glm" "glm"
[2,] "lm" "lm" |
@GopiGugan reports that |
|
Got trees and label CSV files from @GopiGugan, attempted to run but got a (venv) art@aziraphale:~/git/HUNePI$ python scripts/simulation/get_summary_stats.py data/BA.5.2.6.nwk data/labels.BA.5.2.6.csv
Traceback (most recent call last):
File "/home/art/git/HUNePI/scripts/simulation/get_summary_stats.py", line 194, in <module>
label_dict = parse_labels(args.labels)
File "/home/art/git/HUNePI/scripts/simulation/get_summary_stats.py", line 74, in parse_labels
if row['index'] not in results:
KeyError: 'index'
(venv) art@aziraphale:~/git/HUNePI$ head data/labels.BA.5.2.6.csv
0,['hCoV-19/Ireland/D-BH-070014231/2023|Europe / Ireland / Dublin|EPI_ISL_18377214|2023-10-06']
1,['hCoV-19/Denmark/DCGC-660914/2023|Europe / Denmark|EPI_ISL_18360507|2023-09-18']
2,['hCoV-19/Canada/SK-RRPL-641967/2023|North America / Canada / Saskatchewan|EPI_ISL_18235362|2023-08-21']
3,['hCoV-19/Ireland/D-CS0380/2023|Europe / Ireland / Dublin|EPI_ISL_18146885|2023-08-04']
4,['hCoV-19/USA/NY-Wadsworth-23046175-01/2023|North America / USA / New York / Albany County|EPI_ISL_18134984|2023-08-03']
5,['hCoV-19/Canada/SK-RRPL-639662/2023|North America / Canada / Saskatchewan|EPI_ISL_18111086|2023-07-28']
6,['hCoV-19/Italy/CAM-TIGEM-IZSM-COLLI-48166/2023|Europe / Italy / Campania|EPI_ISL_18281259|2023-07-20']
7,['hCoV-19/Italy/CAM-TIGEM-IZSM-COLLI-48202/2023|Europe / Italy / Campania|EPI_ISL_18281287|2023-07-17']
8,['hCoV-19/Italy/CAM-TIGEM-IZSM-COLLI-48203/2023|Europe / Italy / Campania|EPI_ISL_18281288|2023-07-07']
9,['hCoV-19/Ireland/D-SJH-CS0365/2023|Europe / Ireland / Dublin|EPI_ISL_18037744|2023-07-04'] Looks like CSV files are not in the correct format, doing a quick patch now: Python 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> handle = open("data/labels.BA.5.2.6.csv")
>>> outfile = open("data/fixed.BA.5.2.6.csv", 'w')
>>> outfile.write("name,index\n")
11
>>> for line in handle:
...
KeyboardInterrupt
>>> import csv
>>> rows = csv.reader(handle)
>>> for row in rows:
... idx, names = row
... names = eval(names)
... for nm in names:
... _ = outfile.write(f"{nm},{idx}\n")
...
>>> outfile.close()
>>> |
Working through one of the test files, this part of > for (tip_place_in_table in 1:nrow(add_tip_count)){
+ tip_name = add_tip_count[tip_place_in_table,1]
+ freq = add_tip_count[tip_place_in_table,2]
+ if(freq != 0){
+ for (counter in 1:freq){
+ tree <- phytools::bind.tip(tree, paste0(tip_name,"_", counter), edge.length = 0,
+ where=which(tree$tip.label == tip_name))
+ }
+ }
+ } |
I wrote a Python script that is much faster than the above R code at adding tips to the tree based on the labels CSV file. I tried running the resulting tree through > require(LambdaSkyline)
Loading required package: LambdaSkyline
> require(ape)
Loading required package: ape
> phy <- read.tree("test.nwk")
> phy
Phylogenetic tree with 7573 tips and 5912 internal nodes.
Tip labels:
4273, 3941, 4341_0, 4341_1, 4949, 4343, ...
Rooted; includes branch lengths.
> alpha = betacoal.maxlik(phy)
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In lbeta(x - alpha, n - x + alpha) : NaNs produced
2: In lbeta(2 - alpha, alpha) : NaNs produced
> alpha
p1 value fevals gevals niter convcode kkt1 kkt2 xtime
bobyqa 1.86258 29798.75 26 NA NA 0 TRUE NA 154.697
> skyline <- (skyline.multi.phylo(phy, alpha$pi))
Error in lbeta(2 - alpha, alpha) :
non-numeric argument to mathematical function |
Tried this with another tree of similar size and got a different error message: > require(LambdaSkyline)
> phy <- read.tree("~/git/HUNePI/data/EG.5.1.expand.nwk")
> phy
Phylogenetic tree with 6376 tips and 5642 internal nodes.
Tip labels:
2152, 4298_0, 4298_1, 3605, 4771, 4371, ...
Rooted; includes branch lengths.
> alpha <- betacoal.maxlik(phy)
Error in optimx.check(par, optcfg$ufn, optcfg$ugr, optcfg$uhess, lower, :
Function returned is infinite or NA (non-computable)
In addition: Warning message:
In lbeta(Involved[i] - alpha, Inter$lineages[i] - Involved[i] + :
NaNs produced |
I wrote a Python script to randomly downsample the tree to 100 tips, and I still got warnings from LambdaSkyline: > phy <- read.tree("~/git/HUNePI/data/EG.5.1.n100.nwk")
> alpha <- betacoal.maxlik(phy)
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In lbeta(x - alpha, n - x + alpha) : NaNs produced
2: In lbeta(2 - alpha, alpha) : NaNs produced |
Even a random coalescent tree fails: > phy <- rcoal(100)
> alpha <- betacoal.maxlik(phy)
There were 50 or more warnings (use warnings() to see the first 50) |
Talked to @pmagbor. Using |
Now when I launch renv 1.0.11 was loaded from project library, but this project is configured to use renv 1.0.7.
- Use `renv::record("[email protected]")` to record renv 1.0.11 in the lockfile.
- Use `renv::restore(packages = "renv")` to install renv 1.0.7 into the project library.
ℹ Using R 4.3.2 (lockfile was generated with R 4.2.2)
- Project '~/git/HUNePI' loaded. [renv 1.0.11]
- One or more packages recorded in the lockfile are not installed.
- Use `renv::status()` for more details. |
> renv::restore(packages='renv')
The following package(s) will be updated:
# CRAN -----------------------------------------------------------------------
- renv [1.0.11 -> 1.0.7]
Do you want to proceed? [Y/n]: y
# Downloading packages -------------------------------------------------------
- Downloading renv from CRAN ... ERROR [error code 22]
- Downloading renv from CRAN ... ERROR [error code 22]
- Downloading renv from CRAN ... ERROR [error code 22]
- Downloading renv from CRAN ... ERROR [error code 22]
- Downloading renv from CRAN ... ERROR [error code 22]
- Downloading renv from CRAN ... OK [file is up to date]
Successfully downloaded 1 package in 4.1 seconds.
# Installing packages --------------------------------------------------------
- Installing renv ... OK [built from source and cached in 8.2s]
The following loaded package(s) have been updated:
- renv
Restart your R session to use the new versions. |
ℹ Using R 4.3.2 (lockfile was generated with R 4.2.2)
- Project '~/git/HUNePI' loaded. [renv 1.0.7]
- One or more packages recorded in the lockfile are not installed.
- Use `renv::status()` for more details.
> renv::status()
...
The following package(s) are out of sync [lockfile != library]:
# CRAN -----------------------------------------------------------------------
- BiocManager [1.30.23 != 1.30.25]
- boot [1.3-28 != 1.3-28.1]
- class [7.3-20 != 7.3-22]
- codetools [0.2-18 != 0.2-19]
- foreign [0.8-83 != 0.8-85]
- KernSmooth [2.23-20 != 2.23-22]
- lattice [0.20-45 != 0.21-9]
- MASS [7.3-59 != 7.3-60]
- Matrix [1.5-1 != 1.6-1.1]
- mgcv [1.8-41 != 1.9-0]
- nlme [3.1-160 != 3.1-163]
- nnet [7.3-18 != 7.3-19]
- rpart [4.1.19 != 4.1.21]
- spatial [7.3-15 != 7.3-17]
- survival [3.4-0 != 3.5-7]
The lockfile was generated with R 4.2.2, but you're using R 4.3.2.
|
I'll do a local install of 4.2.2 binaries and see if I can restore the environment completely. |
I compiled R-4.2.2 from source into a local directory and ran it: # Bootstrapping renv 1.0.7 ---------------------------------------------------
- Downloading renv ... OK
- Installing renv ... OK
- Project '~/git/HUNePI' loaded. [renv 1.0.7]
- One or more packages recorded in the lockfile are not installed.
- Use `renv::status()` for more details.
> renv::status()
A large number of files (9799 in total) have been discovered.
It may take renv a long time to crawl these files for dependencies.
Consider using .renvignore to ignore irrelevant files.
See `?renv::dependencies` for more information.
Set `options(renv.config.dependencies.limit = Inf)` to disable this warning.
The following package(s) are in an inconsistent state:
package installed recorded used
abind n y ?
ade4 n y ?
...
The following package(s) are out of sync [lockfile != library]:
# CRAN -----------------------------------------------------------------------
- BiocManager [1.30.23 != 1.30.25]
- MASS [7.3-59 != 7.3-58.1]
See ?renv::status() for advice on resolving these issues. |
> renv::restore()
The following package(s) will be updated:
# Bioconductor ---------------------------------------------------------------
- BiocVersion [* -> 3.16.0]
# CRAN -----------------------------------------------------------------------
- BiocManager [1.30.25 -> 1.30.23]
- MASS [7.3-58.1 -> 7.3-59]
- abind [* -> 1.4-5]
- ade4 [* -> 1.7-22]
- alphavantager [* -> 0.1.3]
- zip [* -> 2.3.1]
- zoo [* -> 1.8-12]
# GitHub ---------------------------------------------------------------------
- LambdaSkyline [* -> phoscheit/LambdaSkyline@HEAD]
Do you want to proceed? [Y/n]: Y
# Downloading packages -------------------------------------------------------
- Downloading MASS from CRAN ... OK [file is up to date]
- Downloading BiocManager from CRAN ... OK [file is up to date]
- Downloading BH from CRAN ... OK [13.4 Mb in 1.3s]
- Downloading BiocVersion from Bioconductor ... OK [file is up to date]
- Downloading DBI from CRAN ... OK [file is up to date]
- Downloading LambdaSkyline from GitHub ... - GitHub authentication credentials are not available.
- Please set GITHUB_PAT, or ensure the 'gitcreds' package is installed.
- See https://usethis.r-lib.org/articles/git-credentials.html for more details.
OK [7.5 Kb in 0.21s]
- Downloading ape from CRAN ... OK [1.5 Mb in 0.25s]
...
...
|
The The following loaded package(s) have been updated:
- BiocManager
Restart your R session to use the new versions.
>
Save workspace image? [y/n/c]: n
(venv) art@aziraphale:~/git/HUNePI$ R
R version 4.3.2 (2023-10-31) -- "Eye Holes"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
ℹ Using R 4.3.2 (lockfile was generated with R 4.2.2)
- Project '~/git/HUNePI' loaded. [renv 1.0.7]
- One or more packages recorded in the lockfile are not installed.
- Use `renv::status()` for more details.
> As a result, I had to rerun - Installing class ... FAILED
Error: Error installing package 'class':
=================================
* installing *source* package ‘class’ ...
** package ‘class’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
using C compiler: ‘gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0’
gcc -I"/opt/R/4.3.2/lib/R/include" -DNDEBUG -I/usr/local/include -fpic -g -O2 -c class.c -o class.o
class.c:31:9: error: unknown type name ‘Sint’; did you mean ‘uint’?
31 | VR_knn1(Sint *pntr, Sint *pnte, Sint *p, double *train, Sint *class,
| ^~~~
| uint
class.c:31:21: error: unknown type name ‘Sint’; did you mean ‘uint’?
31 | VR_knn1(Sint *pntr, Sint *pnte, Sint *p, double *train, Sint *class,
| ^~~~
| uint
class.c:31:33: error: unknown type name ‘Sint’; did you mean ‘uint’?
31 | VR_knn1(Sint *pntr, Sint *pnte, Sint *p, double *train, Sint *class,
| ^~~~
| uint
class.c:31:57: error: unknown type name ‘Sint’; did you mean ‘uint’?
31 | VR_knn1(Sint *pntr, Sint *pnte, Sint *p, double *train, Sint *class,
| ^~~~
| uint
class.c:32:23: error: unknown type name ‘Sint’; did you mean ‘uint’?
32 | double *test, Sint *res, Sint *votes, Sint *nc, double *dists)
| ^~~~
| uint
class.c:32:34: error: unknown type name ‘Sint’; did you mean ‘uint’?
32 | double *test, Sint *res, Sint *votes, Sint *nc, double *dists)
| ^~~~
| uint
class.c:32:47: error: unknown type name ‘Sint’; did you mean ‘uint’?
32 | double *test, Sint *res, Sint *votes, Sint *nc, double *dists)
| ^~~~
| uint
class.c:95:8: error: unknown type name ‘Sint’; did you mean ‘uint’?
95 | VR_knn(Sint *kin, Sint *lin, Sint *pntr, Sint *pnte, Sint *p,
| ^~~~
| uint
class.c:95:19: error: unknown type name ‘Sint’; did you mean ‘uint’?
95 | VR_knn(Sint *kin, Sint *lin, Sint *pntr, Sint *pnte, Sint *p,
| ^~~~
| uint
class.c:95:30: error: unknown type name ‘Sint’; did you mean ‘uint’?
95 | VR_knn(Sint *kin, Sint *lin, Sint *pntr, Sint *pnte, Sint *p,
| ^~~~
| uint
class.c:95:42: error: unknown type name ‘Sint’; did you mean ‘uint’?
95 | VR_knn(Sint *kin, Sint *lin, Sint *pntr, Sint *pnte, Sint *p,
| ^~~~
| uint
class.c:95:54: error: unknown type name ‘Sint’; did you mean ‘uint’?
95 | VR_knn(Sint *kin, Sint *lin, Sint *pntr, Sint *pnte, Sint *p,
| ^~~~
| uint
class.c:96:23: error: unknown type name ‘Sint’; did you mean ‘uint’?
96 | double *train, Sint *class, double *test, Sint *res, double *pr,
| ^~~~
| uint
class.c:96:50: error: unknown type name ‘Sint’; did you mean ‘uint’?
96 | double *train, Sint *class, double *test, Sint *res, double *pr,
| ^~~~
| uint
class.c:97:8: error: unknown type name ‘Sint’; did you mean ‘uint’?
97 | Sint *votes, Sint *nc, Sint *cv, Sint *use_all)
| ^~~~
| uint
class.c:97:21: error: unknown type name ‘Sint’; did you mean ‘uint’?
97 | Sint *votes, Sint *nc, Sint *cv, Sint *use_all)
| ^~~~
| uint
class.c:97:31: error: unknown type name ‘Sint’; did you mean ‘uint’?
97 | Sint *votes, Sint *nc, Sint *cv, Sint *use_all)
| ^~~~
| uint
class.c:97:41: error: unknown type name ‘Sint’; did you mean ‘uint’?
97 | Sint *votes, Sint *nc, Sint *cv, Sint *use_all)
| ^~~~
| uint
class.c:210:24: error: unknown type name ‘Sint’; did you mean ‘uint’?
210 | VR_olvq(double *alpha, Sint *pn, Sint *p, double *x, Sint *cl,
| ^~~~
| uint
class.c:210:34: error: unknown type name ‘Sint’; did you mean ‘uint’?
210 | VR_olvq(double *alpha, Sint *pn, Sint *p, double *x, Sint *cl,
| ^~~~
| uint
class.c:210:54: error: unknown type name ‘Sint’; did you mean ‘uint’?
210 | VR_olvq(double *alpha, Sint *pn, Sint *p, double *x, Sint *cl,
| ^~~~
| uint
class.c:211:9: error: unknown type name ‘Sint’; did you mean ‘uint’?
211 | Sint *pncodes, double *xc, Sint *clc, Sint *niter,
| ^~~~
| uint
class.c:211:36: error: unknown type name ‘Sint’; did you mean ‘uint’?
211 | Sint *pncodes, double *xc, Sint *clc, Sint *niter,
| ^~~~
| uint
class.c:211:47: error: unknown type name ‘Sint’; did you mean ‘uint’?
211 | Sint *pncodes, double *xc, Sint *clc, Sint *niter,
| ^~~~
| uint
class.c:212:9: error: unknown type name ‘Sint’; did you mean ‘uint’?
212 | Sint *iters)
| ^~~~
| uint
class.c:245:24: error: unknown type name ‘Sint’; did you mean ‘uint’?
245 | VR_lvq1(double *alpha, Sint *pn, Sint *p, double *x, Sint *cl,
| ^~~~
| uint
class.c:245:34: error: unknown type name ‘Sint’; did you mean ‘uint’?
245 | VR_lvq1(double *alpha, Sint *pn, Sint *p, double *x, Sint *cl,
| ^~~~
| uint
class.c:245:54: error: unknown type name ‘Sint’; did you mean ‘uint’?
245 | VR_lvq1(double *alpha, Sint *pn, Sint *p, double *x, Sint *cl,
| ^~~~
| uint
class.c:246:9: error: unknown type name ‘Sint’; did you mean ‘uint’?
246 | Sint *pncodes, double *xc, Sint *clc, Sint *niter,
| ^~~~
| uint
class.c:246:36: error: unknown type name ‘Sint’; did you mean ‘uint’?
246 | Sint *pncodes, double *xc, Sint *clc, Sint *niter,
| ^~~~
| uint
class.c:246:47: error: unknown type name ‘Sint’; did you mean ‘uint’?
246 | Sint *pncodes, double *xc, Sint *clc, Sint *niter,
| ^~~~
| uint
class.c:247:9: error: unknown type name ‘Sint’; did you mean ‘uint’?
247 | Sint *iters)
| ^~~~
| uint
class.c:276:37: error: unknown type name ‘Sint’; did you mean ‘uint’?
276 | VR_lvq2(double *alpha, double *win, Sint *pn, Sint *p, double *x,
| ^~~~
| uint
class.c:276:47: error: unknown type name ‘Sint’; did you mean ‘uint’?
276 | VR_lvq2(double *alpha, double *win, Sint *pn, Sint *p, double *x,
| ^~~~
| uint
class.c:277:9: error: unknown type name ‘Sint’; did you mean ‘uint’?
277 | Sint *cl, Sint *pncodes, double *xc, Sint *clc,
| ^~~~
| uint
class.c:277:19: error: unknown type name ‘Sint’; did you mean ‘uint’?
277 | Sint *cl, Sint *pncodes, double *xc, Sint *clc,
| ^~~~
| uint
class.c:277:46: error: unknown type name ‘Sint’; did you mean ‘uint’?
277 | Sint *cl, Sint *pncodes, double *xc, Sint *clc,
| ^~~~
| uint
class.c:278:9: error: unknown type name ‘Sint’; did you mean ‘uint’?
278 | Sint *niter, Sint *iters)
| ^~~~
| uint
class.c:278:22: error: unknown type name ‘Sint’; did you mean ‘uint’?
278 | Sint *niter, Sint *iters)
| ^~~~
| uint
class.c:327:54: error: unknown type name ‘Sint’; did you mean ‘uint’?
327 | VR_lvq3(double *alpha, double *win, double *epsilon, Sint *pn, Sint *p,
| ^~~~
| uint
class.c:327:64: error: unknown type name ‘Sint’; did you mean ‘uint’?
327 | VR_lvq3(double *alpha, double *win, double *epsilon, Sint *pn, Sint *p,
| ^~~~
| uint
class.c:328:20: error: unknown type name ‘Sint’; did you mean ‘uint’?
328 | double *x, Sint *cl, Sint *pncodes, double *xc, Sint *clc,
| ^~~~
| uint
class.c:328:30: error: unknown type name ‘Sint’; did you mean ‘uint’?
328 | double *x, Sint *cl, Sint *pncodes, double *xc, Sint *clc,
| ^~~~
| uint
class.c:328:57: error: unknown type name ‘Sint’; did you mean ‘uint’?
328 | double *x, Sint *cl, Sint *pncodes, double *xc, Sint *clc,
| ^~~~
| uint
class.c:329:9: error: unknown type name ‘Sint’; did you mean ‘uint’?
329 | Sint *niter, Sint *iters)
| ^~~~
| uint
class.c:329:22: error: unknown type name ‘Sint’; did you mean ‘uint’?
329 | Sint *niter, Sint *iters)
| ^~~~
| uint
class.c:386:14: error: unknown type name ‘Sint’; did you mean ‘uint’?
386 | Sint *pn, Sint *pp, Sint *pncodes, Sint *rlen)
| ^~~~
| uint
class.c:386:24: error: unknown type name ‘Sint’; did you mean ‘uint’?
386 | Sint *pn, Sint *pp, Sint *pncodes, Sint *rlen)
| ^~~~
| uint
class.c:386:34: error: unknown type name ‘Sint’; did you mean ‘uint’?
386 | Sint *pn, Sint *pp, Sint *pncodes, Sint *rlen)
| ^~~~
| uint
class.c:386:49: error: unknown type name ‘Sint’; did you mean ‘uint’?
386 | Sint *pn, Sint *pp, Sint *pncodes, Sint *rlen)
| ^~~~
| uint
class.c:431:27: error: ‘VR_knn’ undeclared here (not in a function)
431 | {"VR_knn", (DL_FUNC) &VR_knn, 14},
| ^~~~~~
class.c:432:28: error: ‘VR_knn1’ undeclared here (not in a function)
432 | {"VR_knn1", (DL_FUNC) &VR_knn1, 10},
| ^~~~~~~
class.c:433:28: error: ‘VR_lvq1’ undeclared here (not in a function)
433 | {"VR_lvq1", (DL_FUNC) &VR_lvq1, 10},
| ^~~~~~~
class.c:434:28: error: ‘VR_lvq2’ undeclared here (not in a function)
434 | {"VR_lvq2", (DL_FUNC) &VR_lvq2, 11},
| ^~~~~~~
class.c:435:28: error: ‘VR_lvq3’ undeclared here (not in a function)
435 | {"VR_lvq3", (DL_FUNC) &VR_lvq3, 12},
| ^~~~~~~
class.c:436:28: error: ‘VR_olvq’ undeclared here (not in a function)
436 | {"VR_olvq", (DL_FUNC) &VR_olvq, 10},
| ^~~~~~~
class.c:437:33: error: ‘VR_onlineSOM’ undeclared here (not in a function)
437 | {"VR_onlineSOM", (DL_FUNC) &VR_onlineSOM, 9},
| ^~~~~~~~~~~~
make: *** [/opt/R/4.3.2/lib/R/etc/Makeconf:191: class.o] Error 1
ERROR: compilation failed for package ‘class’
* removing ‘/home/art/git/HUNePI/renv/staging/1/class’
install of package 'class' failed [error code 1]
Traceback (most recent calls last):
13: renv::restore()
12: records <- renv_restore_run_actions(project, diff, current, lockfile, rebuild) at restore.R#158
11: renv_install_impl(records) at restore.R#186
10: if (staged)
renv_install_staged(records)
else
renv_install_default(records) at install.R#275
9: renv_install_default(records) at install.R#295
8: handler(package, renv_install_package(record)) at install.R#398
7: handler(package, renv_install_package(record)) at install.R#398
6: withCallingHandlers(
renv_install_package_impl(record),
error = function(e) writef("FAILED")
) at install.R#431
5: withCallingHandlers(
renv_install_package_impl(record),
error = function(e) writef("FAILED")
) at install.R#431
4: if (copyable)
renv_file_copy(path, installpath, overwrite = TRUE)
else
r_cmd_install(package, path) at install.R#641
3: if (!identical(status, 0L))
r_exec_error(package, output, "install", status) at r.R#234
2: abort(all) at r.R#52
1: stop(fallback) at abort.R#44 I went back through the console messages associated with my previous attempt to run |
I'm dumb and forgot to run my local install of R version - Project '~/git/HUNePI' loaded. [renv 1.0.7]
- The project is out-of-sync -- use `renv::status()` for details. |
Still not working: > require(LambdaSkyline)
Loading required package: LambdaSkyline
> require(ape)
Loading required package: ape
> phy <- rcoal(100)
> alpha <- betacoal.maxlik(phy)
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In lbeta(x - alpha, n - x + alpha) : NaNs produced
2: In lbeta(2 - alpha, alpha) : NaNs produced
> alpha
p1 value fevals gevals niter convcode kkt1 kkt2 xtime
bobyqa 1.999 33.81239 13 NA NA 0 FALSE NA 0.119
> sky <- skyline.multi.phylo(phy, alpha$pi)
Error in lbeta(2 - alpha, alpha) :
non-numeric argument to mathematical function |
Ooof. I'm dumb. I've been calling |
Ok this works: > phy <- rcoal(100)
> alpha <- betacoal.maxlik(phy)
There were 50 or more warnings (use warnings() to see the first 50)
> alpha
p1 value fevals gevals niter convcode kkt1 kkt2 xtime
bobyqa 1.999 37.0464 13 NA NA 0 FALSE NA 0.109
> str(sky)
List of 6
$ time : num [1:102] 0.00 2.22e-16 4.44e-16 6.70e-04 8.08e-04 ...
$ interval.length: num [1:102] 0.00 2.22e-16 2.22e-16 6.70e-04 1.38e-04 ...
$ population.size: num [1:102] 3.304 3.304 3.304 3.304 0.668 ...
$ parameter.count: int 102
$ epsilon : num 0
$ logL : num -37
- attr(*, "class")= chr "skyline" But this is still running into problems: > tree <- read.tree("data/EG.5.1.expand.nwk")
> tree
Phylogenetic tree with 6376 tips and 5642 internal nodes.
Tip labels:
2152, 4298_0, 4298_1, 3605, 4771, 4371, ...
Rooted; includes branch lengths.
> alpha <- betacoal.maxlik(tree)
Error in optimx.check(par, optcfg$ufn, optcfg$ugr, optcfg$uhess, lower, :
Function returned is infinite or NA (non-computable)
In addition: Warning message:
In lbeta(Involved[i] - alpha, Inter$lineages[i] - Involved[i] + :
NaNs produced
> alpha
Error: object 'alpha' not found Looks like it has something to do with tree size: > tree <- read.tree("data/EG.5.1.n100.nwk")
> tree
Phylogenetic tree with 100 tips and 99 internal nodes.
Tip labels:
2280, 2688, 1668_2, 2383_6, 778, 2749, ...
Rooted; includes branch lengths.
> alpha <- betacoal.maxlik(tree)
There were 50 or more warnings (use warnings() to see the first 50)
> alpha
p1 value fevals gevals niter convcode kkt1 kkt2 xtime
bobyqa 1.999 365.5645 13 NA NA 0 FALSE NA 0.19
> sky <- skyline.multi.phylo(tree, alpha$p1)
> str(sky)
List of 6
$ time : num [1:197] 0 0.0662 2.7766 3.0424 5.1165 ...
$ interval.length: num [1:197] 0 0.0662 2.7104 0.2658 2.0741 ...
$ population.size: num [1:197] 21.9 21.9 21.9 21.9 21.9 ...
$ parameter.count: int 197
$ epsilon : num 0
$ logL : num -366
- attr(*, "class")= chr "skyline" |
@GopiGugan can you please try to reproduce the |
Getting an error when trying to find Ne: > str(tree)
List of 6
$ edge : int [1:6909, 1:2] 4915 4916 4917 4917 4915 4918 4919 4919 4919 4915 ...
$ edge.length: num [1:6909] 2.847 0.891 0 0 1.16 ...
$ Nnode : int 1996
$ node.label : chr [1:1996] "3442|3061|4381|2924|2223|69|4901|540|2038|389|1440|873|2624|3678|1150|623|1470|1445|3216|1471|3875|3345|397|123"| __truncated__ "14330.96" "" "29480.64" ...
$ tip.label : chr [1:4914] "709_0" "709_1" "634_0" "634_1" ...
$ root.edge : num 0
- attr(*, "class")= chr "phylo"
- attr(*, "order")= chr "cladewise"
> alpha <- betacoal.maxlik(tree)
Error in Edges[, 1] : incorrect number of dimensions covizu/covizu/utils/batch_utils.py Lines 476 to 484 in 2e932f6
|
Confirmed that I can reproduce this error in my R environment (customized for HUNePI): - Project '~/git/HUNePI' loaded. [renv 1.0.7]
> require(ape)
Loading required package: ape
> require(LambdaSkyline)
Loading required package: LambdaSkyline
> tree <- read.tree("~/Downloads/temp_tree.nwk")
> tree
Phylogenetic tree with 4914 tips and 1996 internal nodes.
Tip labels:
709_0, 709_1, 634_0, 634_1, 634_2, 2095, ...
Node labels:
3442|3061|4381|2924|2223|69|4901|540|2038|389|1440|873|2624|3678|1150|623|1470|1445|3216|1471|3875|3345|397|1239|2117|4426|1056|3041|1912|612|2984|4437|2925|3695|3521|1200|4543|1883|2927|942|152|3892|752|3519|4546|4827|2701|4227|3650|1408|2695|3861|3248|4607|1581|2559|714|1515|3669|1334|4454|4080|2362|3567|1509|3666|244|989|904|678|3557|415|4510|484|1189|4356|1879|763|1577|4491|3674|4036|2721|4245|1234|3838|2017|3999|3239|3383|1978|4278|4720|4113|3605|720|2908|3396|37651.00, 14330.96, , 29480.64, , 30670.91, ...
Rooted; includes branch lengths.
> alpha <- betacoal.maxlik(tree)
Error in Edges[, 1] : incorrect number of dimensions |
@GopiGugan can you please send me a bunch of trees, for which some run and others fail? |
I am stepping through the function(phylo,epsilon=0.0) {
Inter <- coalescent.intervals.multi(phylo)
optimx::optimx(par=1.5, fn=function(x) -skyline.multi.coalescentIntervals(Inter,x,epsilon)$logL,
lower=0.001,upper=1.999, method="bobyqa")
} so I'm stepping through > x <- read.tree("~/Downloads/temp_tree.nwk")
> class(x)
[1] "phylo"
> t <- sort(branching.times.samp(x))
> summary(t)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 38.81 41.06 40.37 42.78 45.76
> t <- signif(t,digits=10)
> Dist <- unique(t)
> lt <- length(Dist)
> lt
[1] 5649
> # interval widths
w <- numeric(lt)
w[1] <- Dist[1]
for (i in 2:lt) w[i] <- Dist[i] - Dist[i - 1] # why not use diff()?
> summary(w)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000000 0.000510 0.001280 0.008101 0.002850 5.075050
> # Logs if a coalescence happens or not
Coal <- logical(lt)
> table(Coal)
Coal
FALSE
5649 This last result is weird because it indicates that there are no coalescent events in the tree.. Maybe we are just initializing a vector of the required length before populating it? |
Continuing on: > # Logs the number of lineages involved in coalescences (breaks down in the case of simultaneous coalescences)
NumInvolved <- integer(lt)
> str(NumInvolved)
int [1:5649] 0 0 0 0 0 0 0 0 0 0 ...
l<-numeric(lt)
l[1] <- 0 # Number of lineages at present time I know that the following loop crashes at for(i in 2:lt){
Nodes <- as.integer(names(t[t==Dist[i-1]])) # Identifies the nodes (leaves or internal) with timestamp Dist[i-1]
CoalNodes <- Nodes[Nodes>length(x$tip.label)] # Finds all internal nodes, corresponding to a coalescence event
SampNodes <- Nodes[Nodes<=length(x$tip.label)] # Finds all leaves, corresponding to a sampling event
Involved <- 0 # Will record the total number of lineages coalescing at that time
if(length(CoalNodes) >0){
# if(length(CoalNodes)>1) print("Simultaneous Coalescences detected! Results meaningless!")
Coal[i-1] <- TRUE
for(j in 1:length(CoalNodes)){
Edges <- x$edge[x$edge[,1]==CoalNodes[j],] # Finds how many lineages coalesce at node CoalNodes[j]
Involved <- Involved + length(Edges[,1]) # Updates the count of coalescing lineages
NumInvolved[i-1] <- length(Edges[,1])
}
}
l[i]<-l[i-1]-Involved+length(CoalNodes)+length(SampNodes)
} Stepping through > i <- 2
> Nodes <- as.integer(names(t[t==Dist[i-1]])) # Identifies the nodes (leaves or internal) with timestamp Dist[i-1]
CoalNodes <- Nodes[Nodes>length(x$tip.label)] # Finds all internal nodes, corresponding to a coalescence event
SampNodes <- Nodes[Nodes<=length(x$tip.label)] # Finds all leaves, corresponding to a sampling event
Involved <- 0 # Will record the total number of lineages coalescing at that time
> length(CoalNodes)
[1] 0
> l[i]<-l[i-1]-Involved+length(CoalNodes)+length(SampNodes)
> l[2]
[1] 1
> i <- 3
> Nodes <- as.integer(names(t[t==Dist[i-1]]))
CoalNodes <- Nodes[Nodes>length(x$tip.label)]
SampNodes <- Nodes[Nodes<=length(x$tip.label)]
Involved <- 0
> length(CoalNodes)
[1] 0
|
> i <- 8
> Nodes <- as.integer(names(t[t==Dist[i-1]]))
CoalNodes <- Nodes[Nodes>length(x$tip.label)]
SampNodes <- Nodes[Nodes<=length(x$tip.label)]
Involved <- 0
length(CoalNodes)
[1] 1
> Coal[i-1] <- TRUE
> Edges <- x$edge[x$edge[,1]==CoalNodes[j],]
> Involved <- Involved + length(Edges[,1])
Error in Edges[, 1] : incorrect number of dimensions
> Edges
[1] 5406 1159 R is expecting |
I modified the function to handle cases where only a single branch descends from an internal node: coalescent.intervals.multi <- function(x)
{
if (class(x) != "phylo") stop("object \"x\" is not of class \"phylo\"")
# ordered branching times
t <- sort(branching.times.samp(x))
t <- signif(t,digits=10) # Required to eliminate numerical instabilities
Dist <- unique(t) # We group all identical times together
lt <- length(Dist) # Total number of intervals
# interval widths
w <- numeric(lt)
w[1] <- Dist[1]
for (i in 2:lt) w[i] <- Dist[i] - Dist[i - 1]
# Logs if a coalescence happens or not
Coal <- logical(lt)
# Logs the number of lineages involved in coalescences (breaks down in the case of simultaneous coalescences)
NumInvolved <- integer(lt)
l<-numeric(lt)
l[1] <- 0 # Number of lineages at present time
for(i in 2:lt) {
# nodes (leaves or internal) with timestamp Dist[i-1]
Nodes <- as.integer(names(t[t==Dist[i-1]]))
# all internal nodes, corresponding to a coalescence event
CoalNodes <- Nodes[Nodes>length(x$tip.label)]
# all leaves, corresponding to a sampling event
SampNodes <- Nodes[Nodes<=length(x$tip.label)]
Involved <- 0 # total number of lineages coalescing at that time
if(length(CoalNodes) >0){
# if(length(CoalNodes)>1)
# print("Simultaneous Coalescences detected! Results meaningless!")
Coal[i-1] <- TRUE
for(j in 1:length(CoalNodes)) {
# Finds how many lineages coalesce at node CoalNodes[j]
Edges <- x$edge[x$edge[,1]==CoalNodes[j],]
if (is.null(dim(Edges))) {
next
}
# Updates the count of coalescing lineages
Involved <- Involved + length(Edges[,1])
NumInvolved[i-1] <- length(Edges[,1]) # this seems irrelevant
}
}
if (Involved==0) {
Coal[i-1] <- FALSE
}
l[i]<-l[i-1]-Involved+length(CoalNodes)+length(SampNodes)
}
NumInvolved[lt] <- l[lt]
Coal[lt] <- TRUE
obj <- list(
lineages=l,
interval.length=w,
interval.count=lt,
total.depth =sum(w),
coalescences = Coal,
NumInvolved=NumInvolved)
class(obj) <- "coalescentIntervals"
return(obj)
} This seems to be running without error, although it is taking a long time: > Inter <- coalescent.intervals.multi(tree)
> res <- optimx::optimx(par=1.5, fn=function(x) -skyline.multi.coalescentIntervals(Inter, x, 0.0)$logL, lower=0.001, upper=1.999, method="bobyqa") I think a faster method would be to simply remove such nodes from the input tree: > phy <- read.tree("~/Downloads/temp_tree.nwk")
> phy
Phylogenetic tree with 4914 tips and 1996 internal nodes.
Tip labels:
709_0, 709_1, 634_0, 634_1, 634_2, 2095, ...
Node labels:
3442|3061|4381|2924|2223|69|4901|540|2038|389|1440|873|2624|3678|1150|623|1470|1445|3216|1471|3875|3345|397|1239|2117|4426|1056|3041|1912|612|2984|4437|2925|3695|3521|1200|4543|1883|2927|942|152|3892|752|3519|4546|4827|2701|4227|3650|1408|2695|3861|3248|4607|1581|2559|714|1515|3669|1334|4454|4080|2362|3567|1509|3666|244|989|904|678|3557|415|4510|484|1189|4356|1879|763|1577|4491|3674|4036|2721|4245|1234|3838|2017|3999|3239|3383|1978|4278|4720|4113|3605|720|2908|3396|37651.00, 14330.96, , 29480.64, , 30670.91, ...
Rooted; includes branch lengths.
> has.singles(phy)
[1] TRUE
> phy2 <- collapse.singles(phy)
> phy2
Phylogenetic tree with 4914 tips and 1479 internal nodes.
Tip labels:
709_0, 709_1, 634_0, 634_1, 634_2, 2095, ...
Node labels:
3442|3061|4381|2924|2223|69|4901|540|2038|389|1440|873|2624|3678|1150|623|1470|1445|3216|1471|3875|3345|397|1239|2117|4426|1056|3041|1912|612|2984|4437|2925|3695|3521|1200|4543|1883|2927|942|152|3892|752|3519|4546|4827|2701|4227|3650|1408|2695|3861|3248|4607|1581|2559|714|1515|3669|1334|4454|4080|2362|3567|1509|3666|244|989|904|678|3557|415|4510|484|1189|4356|1879|763|1577|4491|3674|4036|2721|4245|1234|3838|2017|3999|3239|3383|1978|4278|4720|4113|3605|720|2908|3396|37651.00, , , , 0.78, 36080.88, ...
Rooted; includes branch lengths. |
The optimization finished but there was an issue at a later step: > res
p1 value fevals gevals niter convcode kkt1 kkt2 xtime
bobyqa 1.596305 19503.13 18 NA NA 0 TRUE TRUE 226.902
> sky <- skyline.multi.phylo(tree, res$pi)
Error in Edges[, 1] : incorrect number of dimensions At this point I think the easier route is to remove single nodes. |
Okay that works: - Project '~/git/HUNePI' loaded. [renv 1.0.7]
> require(ape)
Loading required package: ape
> require(LambdaSkyline)
Loading required package: LambdaSkyline
> tree <- read.tree("~/Downloads/temp_tree.nwk")
> tree <- collapse.singles(tree)
> alpha <- betacoal.maxlik(tree)
> alpha
p1 value fevals gevals niter convcode kkt1 kkt2 xtime
bobyqa 1.596361 19038.5 18 NA NA 0 TRUE TRUE 181.989
> sky <- skyline.multi.phylo(tree, alpha$p1)
> str(sky)
List of 6
$ time : num [1:5134] 0 5.08 5.64 6.22 6.67 ...
$ interval.length: num [1:5134] 0 5.075 0.566 0.583 0.447 ...
$ population.size: num [1:5134] 49.1 49.1 49.1 49.1 49.1 ...
$ parameter.count: int 5134
$ epsilon : num 0
$ logL : num -19039
- attr(*, "class")= chr "skyline" |
@GopiGugan I pushed this fix to branch |
@ArtPoon are you able to share the versions of The error message is gone now, but we're getting NA values for the population size: > require(ape)
Loading required package: ape
> require(LambdaSkyline)
Loading required package: LambdaSkyline
> tree <- read.tree('temp_tree.nwk')
> tree <- collapse.singles(tree)
> alpha <- betacoal.maxlik(tree)
> alpha
p1 value fevals gevals niter convcode kkt1 kkt2 xtime
bobyqa NA 8.988466e+307 NA NA NA 9999 NA NA 0.001
> sky <- skyline.multi.phylo(tree, alpha$p1)
> str(sky)
List of 6
$ time : num [1:5134] 0 5.08 5.64 6.22 6.67 ...
$ interval.length: num [1:5134] 0 5.075 0.566 0.583 0.447 ...
$ population.size: num [1:5134] NA NA NA NA NA NA NA NA NA NA ...
$ parameter.count: int 5134
$ epsilon : num 0
$ logL : num NA
- attr(*, "class")= chr "skyline" session info: > sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora Linux 37 (Server Edition)
Matrix products: default
BLAS/LAPACK: /usr/lib64/libflexiblas.so.3.3
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] LambdaSkyline_0.1 ape_5.7-1
loaded via a namespace (and not attached):
[1] compiler_4.2.2 parallel_4.2.2 Rcpp_1.0.10
[4] nlme_3.1-160 grid_4.2.2 digest_0.6.31
[7] optimx_2022-4.30 numDeriv_2016.8-1.1 lattice_0.20-45 |
> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.5 LTS
Matrix products: default
BLAS: /home/art/R-4.2.2/lib/R/lib/libRblas.so
LAPACK: /home/art/R-4.2.2/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] LambdaSkyline_0.1 ape_5.7-1
loaded via a namespace (and not attached):
[1] minqa_1.2.5 compiler_4.2.2 BiocManager_1.30.23
[4] parallel_4.2.2 tools_4.2.2 Rcpp_1.0.10
[7] nlme_3.1-160 grid_4.2.2 digest_0.6.35
[10] nloptr_2.0.3 optimx_2023-10.21 numDeriv_2016.8-1.1
[13] renv_1.0.7 pracma_2.4.4 lattice_0.20-45 |
On BEVi: > sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /usr/local/lib64/R/lib/libRblas.so
LAPACK: /usr/local/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.2.2 |
|
We are using MPI within the same function in covizu/covizu/utils/batch_utils.py Lines 404 to 423 in ee5a68d
We could use the same approach of using |
This script will randomly downsample a tree - we could incorporate this into the part of the import argparse
import random
from Bio import Phylo
import sys
description = """
Remove tips at random from a tree until a target size is reached.
Writes resulting Newick string to stdout.
"""
parser = argparse.ArgumentParser(description=description)
parser.add_argument("nwkfile", type=argparse.FileType('r'),
help="Path to file containing Newick tree.")
parser.add_argument("target", type=int, help="Target number of tips.")
args = parser.parse_args()
phy = Phylo.read(args.nwkfile, 'newick')
tips = phy.get_terminals() # returns a list of Clade objects
if args.target >= len(tips):
sys.stderr.write(f"Error: requirement already met ({len(tips)}<={target})\n")
sys.exit()
# random permutation
random.shuffle(tips)
while len(tips) > args.target:
tip = tips[0]
_ = phy.prune(tip)
tips = tips[1:]
Phylo.write(phy, sys.stdout, 'newick') The |
Here is a simple example of # calculate the pairwise Wasserstein distance matrix
n <- length(wpps)
if (require(parallel, quietly=TRUE)) {
res <- mclapply(0:(n*n-1), # some iterable to distribute among threads as `k`
function(k) { # anonymous function
i <- k %/% n + 1 # do stuff here
j <- k %% n + 1
if (i < j) {
transport::wasserstein(wpps[[i]], wpps[[j]], p=p, prob=TRUE) # return value
} else {
0 # other return value
}
}, mc.cores = mc.cores) # the number of threads
# results are assigned to `res` as a list
wdist <- matrix(unlist(res), nrow=n, ncol=n, byrow=T) |
I wrote a python script with import rpy2.robjects as robjects
from rpy2.robjects import r, ListVector
from rpy2.robjects.packages import importr
import rpy2.robjects.vectors as rvectors
from rpy2.rinterface_lib.sexp import NALogicalType
import glob
import os
import re
import csv
nwk_files = glob.glob('ctree/*.nwk')
nwk_files_abs = [os.path.abspath(file) for file in nwk_files]
lineages = [os.path.basename(file).split('.nwk')[0] for file in nwk_files]
parallel = importr("parallel")
robjects.r('set.seed(123456)')
r_lineages = rvectors.StrVector(lineages)
r_data = rvectors.StrVector(nwk_files_abs)
r('''
find_ne <- function(file) {
library(ape)
library(LambdaSkyline)
tryCatch({
tree <- read.tree(file)
tree <- collapse.singles(tree)
alpha <- betacoal.maxlik(tree)
sky <- skyline.multi.phylo(tree, alpha$p1)
pop_sizes <- head(sky$population.size, n = 5)
return(mean(pop_sizes, na.rm = TRUE))
}, error = function(e) {
return(NA)
})
}
''')
r_results = parallel.mclapply(r_data, r('find_ne'), mc_cores=20)
r_named_results = robjects.r['setNames'](r_results, r_lineages)
results_dict = dict(zip(lineages, r_named_results))
res = {k: 'NA' if isinstance(v[0], NALogicalType) else float(v[0]) for k, v in results_dict.items()}
with open("hunepi_ne.csv", "w", newline='') as csv_file:
writer = csv.writer(csv_file)
writer.writerow(['Lineage', 'Ne'])
for lineage, ne in res.items():
writer.writerow([lineage, ne]) After downsampling to 500 tips max, we still get |
@GopiGugan reports that downsampling and running the resulting trees on 20 cores took about ten minutes |
@GopiGugan can you please rerun the same trees with the same method to see if the same subset of trees fail ( |
I tried a random subset of trees. They all fail at the following step: > alpha <- betacoal.maxlik(tree)
Error in optimx.check(par, optcfg$ufn, optcfg$ugr, optcfg$uhess, lower, :
Function returned is infinite or NA (non-computable)
In addition: Warning message:
In lbeta(Involved[i] - alpha, Inter$lineages[i] - Involved[i] + :
NaNs produced This seems to be the same error you were getting before:
|
The same trees tend to fail: > tab <- table(is.na(full$Ne), is.na(full$pruned.ne))
> tab
FALSE TRUE
FALSE 3063 321
TRUE 30 392
> fisher.test(tab)
Fisher's Exact Test for Count Data
data: tab
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
83.83524 189.96729
sample estimates:
odds ratio
124.5724 |
It looks like part of the problem is that you are not expanding the trees into binary (dichotomous) versions: > phy <- read.tree("~/git/covizu/iss539/ctree/AY.17.nwk")
> alpha <- betacoal.maxlik(phy)
Error in optimx.check(par, optcfg$ufn, optcfg$ugr, optcfg$uhess, lower, :
Function returned is infinite or NA (non-computable)
In addition: Warning message:
In lbeta(Involved[i] - alpha, Inter$lineages[i] - Involved[i] + :
NaNs produced
> phy2 <- multi2di(phy)
> phy2
Phylogenetic tree with 62 tips and 61 internal nodes.
Tip labels:
28, 29, 20, 4, 32_0, 32_1, ...
Node labels:
1.00, , , 0.85, 0.53, 0.82, ...
Rooted; includes branch lengths.
> alpha <- betacoal.maxlik(phy2)
> alpha
p1 value fevals gevals niter convcode kkt1 kkt2 xtime
bobyqa 1.797876 62.88628 16 NA NA 0 TRUE TRUE 0.022
> sky <- skyline.multi.phylo(phy2, alpha$p1)
> sky
$time
[1] 0.00000 2.50874 2.62527 3.07890 3.32507 3.40158 3.41762 3.61494 3.84003
[10] 3.84527 3.86048 3.86503 3.99503 4.74003 4.85762 4.87048 4.95503 5.25344
[19] 5.49606 5.57126 5.64003 5.96745 6.24890 6.39282 6.41757 6.71126 6.76671
[28] 6.89565 7.21962 7.29040 7.65332 8.01444 8.03655 8.06745 8.15524 8.25016
[37] 8.70444 8.75444 8.75888 8.80009 9.88444
$interval.length
[1] 0.00000 2.50874 0.11653 0.45363 0.24617 0.07651 0.01604 0.19732 0.22509
[10] 0.00524 0.01521 0.00455 0.13000 0.74500 0.11759 0.01286 0.08455 0.29841
[19] 0.24262 0.07520 0.06877 0.32742 0.28145 0.14392 0.02475 0.29369 0.05545
[28] 0.12894 0.32397 0.07078 0.36292 0.36112 0.02211 0.03090 0.08779 0.09492
[37] 0.45428 0.05000 0.00444 0.04121 1.08435
$population.size
[1] 0.1165300 0.1165300 0.1165300 3.2272074 3.2272074 3.2272074
[7] 4.6218444 4.6218444 4.6218444 4.6218444 34.2015934 34.2015934
[13] 34.2015934 34.2015934 34.2015934 34.2015934 3.3980938 22.5725486
[19] 22.5725486 22.5725486 22.5725486 26.5043770 26.5043770 8.1894774
[25] 8.1894774 13.9256263 9.7423137 9.7423137 72.9813053 72.9813053
[31] 72.9813053 72.9813053 72.9813053 72.9813053 8.0100758 7.7336799
[37] 36.8743607 36.8743607 0.4506327 4.1825619 23.5979175
$parameter.count
[1] 41
$epsilon
[1] 0
$logL
[1] -62.88628
attr(,"class")
[1] "skyline"
|
@GopiGugan what branch are you working on? Can I jump in? |
There seem to be tips without a label (None) after pruning the tree. @ArtPoon - I tried using >>> from Bio import Phylo
>>> from covizu import clustering
>>> from Bioplus import prunetree
>>> trees = Phylo.parse('2024-08-13/BA.5.5.nwk', 'newick')
>>> tree = clustering.consensus(trees, cutoff=0.5)
>>> [i.name for i in tree.get_terminals() if i.name is None]
[]
>>> tree = prunetree.prune_tips(tree,500)
>>> len(tree.get_terminals())
501
>>> [i.name for i in tree.get_terminals() if i.name is None]
[None] |
Reproduced the problem on my machine: (venv) art@aziraphale:~/git/covizu/iss539/2024-08-13$ python
Python 3.10.12 (main, Nov 6 2024, 20:22:13) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import Phylo
>>> from covizu import clustering
>>> trees = Phylo.parse("BA.5.5.nwk", 'newick')
>>> tree = clustering.consensus(trees, cutoff=0.5)
>>> tree
Clade(branch_length=0.0, confidence=1.0)
>>> Phylo.write(tree, "consensus.nwk", 'newick')
1
>>> from Bioplus import prunetree
>>> tree = prunetree.prune_tips(tree, 500)
>>> len(tree.get_terminals())
501
>>> tips = tree.get_terminals()
>>> tips[0]
Clade(branch_length=6.917200357142859, name='4179')
>>> bad = [tip for tip in tips if tip.name is None]
>>> bad
[Clade(branch_length=7.270655824742268, confidence=0.5)]
>>> bad[0].clades
[] Looks like there is a problem with Biopython.Phylo's |
Something strange happened here: >>> phy = Phylo.read("consensus.nwk", 'newick')
>>> tips = phy.get_terminals()
>>> target = 500
>>> while len(tips) > target:
... tips = sorted(tips, key=lambda x: x.branch_length)
... tip = tips[0]
... parent = phy.prune(tip)
... tips = tips[1:]
... bad = [tip for tip in tips if tip.name is None]
... if bad:
... break
...
>>> bad
[]
>>> len(tips)
500
>>> tips = phy.get_terminals()
>>> len(tips)
501
>>> bad = [tip for tip in tips if tip.name is None]
>>> bad
[Clade(branch_length=7.27065, confidence=0.5)]
>>> parent
Clade(branch_length=0.0, confidence=1.0)
>>> tip
Clade(branch_length=6.09403, name='1483') |
This seems to work: >>> phy = Phylo.read("consensus.nwk", 'newick')
>>> tips = phy.get_terminals()
>>> while len(tips) > target:
... tips = sorted(tips, key=lambda x: x.branch_length)
... tip = tips[0]
... parent = phy.prune(tip)
... tips = phy.get_terminals()
...
>>> tip
Clade(branch_length=6.09498, name='3592')
>>> parent
Clade(branch_length=0.0, confidence=1.0)
>>> len(tips)
500
>>> bad = [tip for tip in tips if tip.name is None]
>>> bad
[]
>>> bad = [tip for tip in phy.get_terminals() if tip.name is None]
>>> bad
[] |
I've updated the Bioplus repo, @GopiGugan can you please re-run with the new |
For downsampled trees (to 500 tips), we should only run these for estimating Ne and retain the original trees for the other three statistics |
@GopiGugan to submit PR |
The text was updated successfully, but these errors were encountered: