Range of bins in bed file needs to exactly match all bins in matrix #21
I'm a bit puzzled by the error you are seeing. Edit: Is that the complete error you are seeing, or is it followed by something else?
I am using tadtool v0.81 installed using conda with python3. There were a few more lines of error. Here is the complete error message. Keep in mind that I was able to circumvent this error by subsetting the BED file so that it only contains the bins used in the matrix file.
Ah, that makes a lot more sense now. In general, the BED entries need to match the matrix bins exactly, just as you said. Because we know that people will most often work with matrix subsets, we have included the …
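To illustrate the exact-match requirement, here is a minimal Python sketch of a pre-flight check one could run before calling TADtool. It is not TADtool's own code. It assumes a HiC-Pro-style sparse matrix with whitespace-separated `bin_i bin_j value` lines, and it treats the BED bins as matching only if they form exactly the contiguous range from the smallest to the largest bin ID referenced in the matrix. The function names `matrix_bin_ids` and `check_bins_match` are hypothetical.

```python
import warnings


def matrix_bin_ids(matrix_lines):
    """Collect every bin ID referenced in a sparse 'bin_i bin_j value' matrix."""
    ids = set()
    for line in matrix_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ids.add(int(fields[0]))
        ids.add(int(fields[1]))
    return ids


def check_bins_match(bed_ids, matrix_lines):
    """Warn and return False unless the BED bin IDs are exactly the matrix bin range.

    Assumption (not from TADtool): "match" means the BED covers every bin from
    the smallest to the largest matrix bin ID, and nothing else.
    """
    used = matrix_bin_ids(matrix_lines)
    if not used:
        return True  # nothing to compare against
    expected = set(range(min(used), max(used) + 1))
    bed_set = set(bed_ids)
    if bed_set != expected:
        warnings.warn(
            f"BED covers {len(bed_set)} bins, but the matrix spans bins "
            f"{min(used)}-{max(used)} ({len(expected)} bins)"
        )
        return False
    return True
```

A check like this only reports the mismatch; fixing it still requires subsetting the BED file as described in this thread.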
I tried tadtool subset, but again it gave me an error message unless I started with the exact number of bins. I will subset using awk for now, but it would be nice to have this functionality built into the tool. Unrelated question: is there support for multi-threading or any other kind of parallelization? I noticed that the tool uses very few resources, which results in long processing times for large files.
If you can provide me with (small) sample files that throw the error when the BED and matrix files are mismatched, I will see how simple it would be to implement this in TADtool. The requirement for matching region and matrix files also serves as an input sanity check, so we would probably issue a warning when a mismatch is detected. We are currently not planning to implement a multi-threaded version of the insulation score calculation. Our recommendation, as outlined in the README, is to run TADtool on individual chromosomes (you can easily parallelise manually with …
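The tool named in the reply is cut off, so as one possible way to follow the per-chromosome recommendation from Python, here is a minimal sketch. It assumes a HiC-Pro-style four-column BED (`chrom start end bin_id`) and groups bins by chromosome, then calls a user-supplied `worker(chrom, first_bin, last_bin)` for each chromosome concurrently. Both `chromosome_bin_ranges` and `run_per_chromosome` are hypothetical helpers, and `worker` is a placeholder: it is not a TADtool API, and no TADtool command-line arguments are implied.

```python
from concurrent.futures import ThreadPoolExecutor


def chromosome_bin_ranges(bed_lines):
    """Map each chromosome to the (first, last) bin ID it covers, in file order.

    Assumes HiC-Pro style BED lines: chrom, start, end, bin_id.
    """
    ranges = {}
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip lines without a bin ID column
        chrom, bin_id = fields[0], int(fields[3])
        first, last = ranges.get(chrom, (bin_id, bin_id))
        ranges[chrom] = (min(first, bin_id), max(last, bin_id))
    return ranges


def run_per_chromosome(bed_lines, worker, max_workers=4):
    """Call worker(chrom, first_bin, last_bin) for every chromosome concurrently.

    `worker` is a placeholder. If it launches a separate process per chromosome
    (e.g. via subprocess), threads are enough here, because the heavy work then
    runs in those child processes rather than in this Python interpreter.
    """
    ranges = chromosome_bin_ranges(bed_lines)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {chrom: pool.submit(worker, chrom, lo, hi)
                   for chrom, (lo, hi) in ranges.items()}
        # .result() re-raises any exception from the corresponding worker
        return {chrom: fut.result() for chrom, fut in futures.items()}
```

For CPU-bound work done directly in Python, a `ProcessPoolExecutor` would be the more appropriate choice, at the cost of requiring a picklable, module-level `worker`.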
When running TADtool with an iced matrix file and its corresponding BED file, both generated by the HiC-Pro pipeline, I get the following error. However, when I use the sparse matrix and BED file from the examples folder, TADtool works as expected.
Upon closer inspection, I see that the matrices from the examples folder and the ones I got from HiC-Pro are similar. The BED files, however, differ: HiC-Pro creates a BED file with 4 columns, the last of which contains the bin number.
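To make the column difference concrete, here is a minimal Python sketch of a reader that accepts both layouts. It assumes that a fourth column, when present, is an integer bin ID as in HiC-Pro's output, and that a three-column file gets IDs assigned by line order starting at `first_id`. The default of 0 is an assumption to be checked against the matrix's numbering. `read_regions` is a hypothetical helper, not part of TADtool or HiC-Pro.

```python
def read_regions(bed_lines, first_id=0):
    """Return (chrom, start, end, bin_id) tuples from a 3- or 4-column BED.

    If a 4th column exists (as in HiC-Pro's output), it is parsed as an integer
    bin ID; otherwise IDs are assigned by line order, starting at `first_id`.
    """
    regions = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue  # skip blank, header and comment lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if len(fields) >= 4:
            bin_id = int(fields[3])  # assumption: numeric bin ID, not a BED "name"
        else:
            bin_id = first_id + len(regions)
        regions.append((chrom, start, end, bin_id))
    return regions
```

Note that in the generic BED format the fourth column is a free-text `name`, so `int()` would fail on files that use it for something other than a numeric bin ID.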
So when I use the HiC-Pro matrix, which almost always contains only a subset of all possible bins, TADtool fails, unless I also subset the BED file so that it begins and ends at exactly the first and last bins used in the matrix.
I am using tadtool v0.81 installed using conda with python3.
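The workaround described in this thread (subsetting the BED file, e.g. with awk) could look roughly like the following Python sketch. It assumes a HiC-Pro-style sparse matrix (`bin_i bin_j value` per line) and a four-column BED whose last column is the bin ID. It keeps the BED lines whose bin ID lies between the smallest and largest bin ID referenced in the matrix, inclusive. `subset_bed_to_matrix` is a hypothetical helper, not a TADtool or HiC-Pro function.

```python
def subset_bed_to_matrix(bed_lines, matrix_lines):
    """Keep BED lines whose bin ID (4th column) lies within the matrix's bin range.

    The range runs from the smallest to the largest bin ID found in the first
    two columns of the sparse matrix, inclusive. Original lines are returned
    unchanged, so they can be written back out with writelines().
    """
    used = set()
    for line in matrix_lines:
        fields = line.split()
        if len(fields) >= 3:
            used.update((int(fields[0]), int(fields[1])))
    if not used:
        return []  # empty matrix: no bins to keep
    lo, hi = min(used), max(used)
    kept = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) >= 4 and lo <= int(fields[3]) <= hi:
            kept.append(line)
    return kept
```

Usage would be to pass open file objects for the matrix and BED files (both are iterables of lines) and write the result to a new BED file with `writelines()`; the file paths are up to you.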