error during HashCounting.sh with large files (w/o LSFS) #11

Open
noriko-cassman opened this issue Dec 17, 2015 · 0 comments

Hello! Thanks for the nice paper.

The test data ran fine on our cluster using the bash scripts, and so did 10K subsets of my 18 samples using modified versions of the same scripts. However, when I ran LSA on the full files (18 samples, each about 500 MB), I got errors during HashCounting.sh, even when running with up to 40 threads. I am not using the LSFS system.

Here is the error message:

parallel: This job failed:
echo $(date) writing k-mer corpus for file 2;
python LSA/kmer_corpus.py -r 2 -i Vhashed_reads/ -o Vcluster_vectors/ >> VLogs/KmerCorpus.log 2>&1
printing end of last log file...
hashobject.kmer_corpus_to_disk(Kmer_Hash_Count_Files[fr],mask=M)
IndexError: list index out of range
Traceback (most recent call last):
  File "LSA/kmer_corpus.py", line 33, in <module>
    hashobject.kmer_corpus_to_disk(Kmer_Hash_Count_Files[fr],mask=M)
IndexError: list index out of range
Traceback (most recent call last):
  File "LSA/kmer_corpus.py", line 33, in <module>
    hashobject.kmer_corpus_to_disk(Kmer_Hash_Count_Files[fr],mask=M)
IndexError: list index out of range
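
The IndexError suggests that the index used for `Kmer_Hash_Count_Files` is larger than the number of files found in the input directory. Below is a minimal sketch of how that could be checked by hand. It assumes the index comes from the `-r` argument, and that the list is built by globbing the `-i` directory. The pattern `*.count.hash` and the helper name `check_task_index` are hypothetical and not taken from the LSA code, so the pattern would need adjusting to whatever the hashing step actually writes.

```python
import glob
import os


def check_task_index(input_dir, task_index, pattern="*.count.hash"):
    """Return (number of matching files, whether task_index is a valid index).

    `pattern` is a hypothetical placeholder; replace it with the real
    file suffix produced by the earlier hashing step.
    """
    files = sorted(glob.glob(os.path.join(input_dir, pattern)))
    in_range = 0 <= task_index < len(files)
    return len(files), in_range


if __name__ == "__main__":
    # Directory and index taken from the failing command (-i Vhashed_reads/ -r 2).
    n_files, ok = check_task_index("Vhashed_reads/", 2)
    print("{0} matching files; index 2 valid: {1}".format(n_files, ok))
```

If the count is lower than the number of parallel jobs launched, an earlier step probably wrote fewer files than expected, which would produce this error for the higher job numbers.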

Oddly, when I looked at the log files for the test data and my subset data, I found errors similar to those from the full data (logs attached below). After looking them up, I thought they might be related to this: http://stackoverflow.com/questions/4964101/pep-3118-warning-when-using-ctypes-array-as-numpy-array.

Here are the outputs you requested in other issues, from the run with the full dataset:
HashReads.log
KmerCorpus.log
CombineFractions.log
MergeHash.log

Note: GlobalWeights.log and CreateHash.log were empty.

ls -l Vcluster_vectors.txt
ls -l Vhashed_reads.txt
ls -l Voriginal_reads.txt

Here are the log files and outputs from the run with your test data:
CombineFractions.log
CreateHash.log
HashReads.log
KmerClusterIndex.log
KmerCorpus.log
KmerLSI.log
MergeIntermediatePartitions.log
ReadPartitions.log

Note: these were empty
GlobalWeights.log
KmerClusterCols.log
KmerClusterMerge.log
KmerClusterParts.log
MergeHash.log

ls -l cluster_vectors
ls -l hashed_reads
ls -l original_reads

Thanks in advance,
Nori
