initial commit
benjamin-james committed Oct 22, 2018
0 parents commit 61e031e
Showing 112 changed files with 17,449 additions and 0 deletions.
19 changes: 19 additions & 0 deletions Makefile
@@ -0,0 +1,19 @@
all: bin/Red.o bin/meshclust2

bin/Red.o:
	mkdir -p bin
	mkdir -p bin/exception
	mkdir -p bin/nonltr
	mkdir -p bin/utility
	$(MAKE) -C src
bin/meshclust2: bin/Red.o
	$(MAKE) -C src/cluster
	cp src/cluster/meshclust2 bin

clean:
	$(MAKE) clean -C src
	$(MAKE) clean -C src/cluster
	$(RM) -r bin

rebuild: clean all
.PHONY: all clean rebuild
74 changes: 74 additions & 0 deletions README
@@ -0,0 +1,74 @@
MeShClust2
Release version

Requirements: g++ 4.9.1 or later (installed through Homebrew on Mac OS X)

Compilation using g++ (homebrew) and GNU Make on Mac OS X
CXX=g++-7 make

see: https://stackoverflow.com/questions/29057437/compile-openmp-programs-with-gcc-compiler-on-os-x-yosemite


Linux/Unix compilation:
make

Usage: bin/meshclust2 --id 0.x [OPTIONS] *.fasta

--id The most important parameter. It sets the identity cutoff for the sequences
and must be between 0 and 1.
If it is not specified, an identity of 0.9 is used.

--kmer sets the size of the kmers. By default it is chosen automatically from the average sequence length,
but if it is provided, MeShClust can run slightly faster by not having to find the largest sequence length.
Increasing the kmer size can increase accuracy but also increases memory consumption.

--mut-type {single, both, nonsingle-typical, nonsingle-all, all-but-reversion, all-but-translocation}
changes the mutation generation algorithm. By default, "single" is used, which applies only
single point mutations. On low-identity data sets, "both", which combines single mutations
and block mutations, is preferable. The option "nonsingle-typical" uses only block mutations
and disallows single point mutations. Other options include "all", which covers single and
block mutations as well as the nontypical mutations, translocation and reversion.

--feat determines the combinations of features to be used. The default, "fast", selects from 9 fast
combinations. "slow" adds 2 slower features, which include logarithm-based features,
and "extraslow" includes 33 total features used in a previous study.

--min-feat (default 3) sets the minimum feature pairs to be used. If set to 2, at least 2 feature pairs
will be used. Recall that features include pairwise combinations of the "feat" option.

--max-feat (default 5) sets the maximum feature pairs to be used. Diminishing returns appear quickly,
so a very large maximum is not advised.

--sample selects the total number of sequences used for both training and testing.
300 is the default value. Each sequence generates 10 synthetic mutants.
That is, --sample 300 provides 3000 training pairs and 3000 testing pairs.
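
As a quick check of the arithmetic above, the following shell sketch multiplies a --sample value by the 10 mutants per sequence stated in this README. The function name pairs_for_sample is made up for illustration and is not part of MeShClust2.

```shell
# Number of synthetic pairs for a given --sample value, using the
# README's figure of 10 mutants per sequence.
# pairs_for_sample is a hypothetical helper, not a MeShClust2 command.
pairs_for_sample() {
    sample=${1:-300}          # 300 is the documented default
    echo $(( sample * 10 ))
}

pairs_for_sample 300
```

According to the README's own figures, this count applies to the training pairs and to the testing pairs alike.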

--min-id (default 0.35) sets the lower bound for mutation identity scores to be calculated. It normally does not
need to be set, as lower identities take much longer, especially with single mutations only.

--threads sets the number of threads to be used. By default OpenMP uses the number of available cores
on your machine, but this parameter overrides that.

--output specifies the output file, in CD-HIT's CLSTR format, described below:
A '>Cluster ' followed by an increasing index starts a new cluster.
Every other line lists one sequence belonging to that cluster.
A '*' at the end of a sequence line marks the center of the cluster.
An example of a small data set:

>Cluster 0
0 993nt, >seq128 template_6... *
>Cluster 1
0 1043nt, >seq235 template_10...
1 1000nt, >seq216 template_10... *
2 1015nt, >seq237 template_10...


--delta sets how many clusters are examined in the final clustering stage.
Increasing it improves accuracy but takes more time. Default value is 5.

--iterations specifies how many iterations in the final stage of merging are done until convergence.
Default value is 15.



Any argument not listed here is interpreted as an input file in FASTA format.
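
The options above can be combined on one command line. The sketch below only prints a command rather than running it, after checking that the identity is a number between 0 and 1. Treating both endpoints as valid is an assumption, since the README does not say whether 0 and 1 themselves are accepted. meshclust2_cmd is a hypothetical wrapper name, and input.fasta and out.clstr are placeholder file names.

```shell
# Print (do not run) a meshclust2 command line after checking --id.
# The closed range [0, 1] is an assumption; the README only says "between 0 and 1".
# meshclust2_cmd is a hypothetical helper, not part of MeShClust2.
meshclust2_cmd() {
    id=${1:-0.9}                      # documented default identity
    [ "$#" -gt 0 ] && shift           # remaining arguments are passed through
    # awk handles the floating-point comparison that plain sh cannot do.
    if ! awk -v id="$id" 'BEGIN {
            ok = (id ~ /^[0-9]*\.?[0-9]+$/ && id + 0 >= 0 && id + 0 <= 1)
            exit !ok
        }'; then
        echo "identity must be between 0 and 1: $id" >&2
        return 1
    fi
    echo "bin/meshclust2 --id $id $*"
}

meshclust2_cmd 0.8 --threads 4 --output out.clstr input.fasta
```

Printing the command first makes it easy to review the options before a long clustering run; piping the printed line to sh would execute it.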
175 changes: 175 additions & 0 deletions src/Makefile
@@ -0,0 +1,175 @@
# CXX = /usr/bin/c++
CXX ?= g++

CXXFLAGS = -O3 -g -fmessage-length=0 -Wall -march=native -std=c++11

#
# Objects
#

ORed = ../bin/Red.o

# Exception
OInvalidInputException = ../bin/exception/InvalidInputException.o
OInvalidStateException = ../bin/exception/InvalidStateException.o
OFileDoesNotExistException = ../bin/exception/FileDoesNotExistException.o
OInvalidOrderOfOperationsException = ../bin/exception/InvalidOrderOfOperationsException.o
OInvalidScoreException = ../bin/exception/InvalidScoreException.o
OInvalidOperationException = ../bin/exception/InvalidOperationException.o

# Utility
OUtil = ../bin/utility/Util.o
OLocation = ../bin/utility/Location.o
OEmptyLocation = ../bin/utility/EmptyLocation.o
OLCSLen = ../bin/utility/LCSLen.o
OAffineId = ../bin/utility/AffineId.o
OGlobAlignE = ../bin/utility/GlobAlignE.o

# Non LTR
OChromosome = ../bin/nonltr/Chromosome.o
OChromosomeOneDigit = ../bin/nonltr/ChromosomeOneDigit.o
OChromosomeRandom = ../bin/nonltr/ChromosomeRandom.o
OChromListMaker = ../bin/nonltr/ChromListMaker.o
OTableBuilder = ../bin/nonltr/TableBuilder.o
OScorer = ../bin/nonltr/Scorer.o
ODetectorMaxima = ../bin/nonltr/DetectorMaxima.o
OChromDetectorMaxima = ../bin/nonltr/ChromDetectorMaxima.o
OHMM = ../bin/nonltr/HMM.o
OScanner = ../bin/nonltr/Scanner.o
OTrainer = ../bin/nonltr/Trainer.o
OLocationList = ../bin/nonltr/LocationList.o
OLocationListCollection = ../bin/nonltr/LocationListCollection.o

OBJS = $(ORed) $(OInvalidInputException) $(OInvalidStateException) $(OFileDoesNotExistException) $(OInvalidOrderOfOperationsException) $(OInvalidOperationException) $(OInvalidScoreException) $(OUtil) $(OLocation) $(OEmptyLocation) $(OChromosome) $(OChromosomeOneDigit) $(OChromosomeRandom) $(OChromListMaker) $(OTableBuilder) $(OScorer) $(ODetectorMaxima) $(OChromDetector) $(OChromDetectorMaxima) $(OHMM) $(OScanner) $(OTrainer) $(OLocationList) $(OLocationListCollection) $(OLCSLen) $(OAffineId) $(OGlobAlignE)

#
# Target
#

TRed = ../bin/Red

#
# Make RepeatsDetector
#

$(TRed): $(OBJS)
	$(CXX) -o $(TRed) $(OBJS)

#
# RepeatsDetector
#

$(ORed): RepeatsDetector.cpp nonltr/KmerHashTable.h nonltr/KmerHashTable.cpp nonltr/TableBuilder.h nonltr/HMM.h nonltr/Scanner.h nonltr/Trainer.h utility/Util.h
	$(CXX) $(CXXFLAGS) -c RepeatsDetector.cpp -o $(ORed)

#
# Exception
#
$(OInvalidInputException): exception/InvalidInputException.cpp exception/InvalidInputException.h
	$(CXX) $(CXXFLAGS) -c exception/InvalidInputException.cpp -o $(OInvalidInputException)

$(OInvalidStateException): exception/InvalidStateException.cpp exception/InvalidStateException.h
	$(CXX) $(CXXFLAGS) -c exception/InvalidStateException.cpp -o $(OInvalidStateException)

$(OFileDoesNotExistException): exception/FileDoesNotExistException.cpp exception/FileDoesNotExistException.h
	$(CXX) $(CXXFLAGS) -c exception/FileDoesNotExistException.cpp -o $(OFileDoesNotExistException)

$(OInvalidOrderOfOperationsException): exception/InvalidOrderOfOperationsException.cpp exception/InvalidOrderOfOperationsException.h
	$(CXX) $(CXXFLAGS) -c exception/InvalidOrderOfOperationsException.cpp -o $(OInvalidOrderOfOperationsException)

$(OInvalidScoreException): exception/InvalidScoreException.cpp exception/InvalidScoreException.h
	$(CXX) $(CXXFLAGS) -c exception/InvalidScoreException.cpp -o $(OInvalidScoreException)

$(OInvalidOperationException): exception/InvalidOperationException.cpp exception/InvalidOperationException.h
	$(CXX) $(CXXFLAGS) -c exception/InvalidOperationException.cpp -o $(OInvalidOperationException)

#
# Utility
#

$(OUtil): utility/Util.cpp utility/Util.h utility/Location.h exception/FileDoesNotExistException.h
	$(CXX) $(CXXFLAGS) -c utility/Util.cpp -o $(OUtil)

$(OLocation): utility/Location.cpp utility/Location.h utility/ILocation.h exception/InvalidInputException.h utility/Util.h
	$(CXX) $(CXXFLAGS) -c utility/Location.cpp -o $(OLocation)

$(OEmptyLocation): utility/EmptyLocation.cpp utility/EmptyLocation.h utility/ILocation.h exception/InvalidOperationException.h
	$(CXX) $(CXXFLAGS) -c utility/EmptyLocation.cpp -o $(OEmptyLocation)

$(OLCSLen): utility/LCSLen.cpp utility/LCSLen.h
	$(CXX) $(CXXFLAGS) -c utility/LCSLen.cpp -o $(OLCSLen)

$(OAffineId): utility/AffineId.cpp utility/AffineId.h
	$(CXX) $(CXXFLAGS) -c utility/AffineId.cpp -o $(OAffineId)

$(OGlobAlignE): utility/GlobAlignE.cpp utility/GlobAlignE.h
	$(CXX) $(CXXFLAGS) -c utility/GlobAlignE.cpp -o $(OGlobAlignE)
#
# Non LTR
#

$(OChromosome): nonltr/Chromosome.cpp nonltr/Chromosome.h nonltr/IChromosome.h utility/Util.h exception/InvalidInputException.h exception/InvalidOperationException.h
	$(CXX) $(CXXFLAGS) -c nonltr/Chromosome.cpp -o $(OChromosome)

$(OChromosomeOneDigit): nonltr/ChromosomeOneDigit.cpp nonltr/ChromosomeOneDigit.h nonltr/Chromosome.h exception/InvalidInputException.h
	$(CXX) $(CXXFLAGS) -c nonltr/ChromosomeOneDigit.cpp -o $(OChromosomeOneDigit)

$(OChromosomeRandom): nonltr/ChromosomeRandom.cpp nonltr/ChromosomeRandom.h nonltr/IChromosome.h exception/InvalidInputException.h exception/InvalidStateException.h utility/Util.h
	$(CXX) $(CXXFLAGS) -c nonltr/ChromosomeRandom.cpp -o $(OChromosomeRandom)

$(OTableBuilder): nonltr/TableBuilder.cpp nonltr/TableBuilder.h utility/Util.h nonltr/ChromosomeOneDigit.h nonltr/ITableView.h nonltr/KmerHashTable.h nonltr/KmerHashTable.cpp nonltr/EnrichmentMarkovView.h nonltr/EnrichmentMarkovView.cpp exception/InvalidStateException.h nonltr/ChromListMaker.h nonltr/IChromosome.h
	$(CXX) $(CXXFLAGS) -c nonltr/TableBuilder.cpp -o $(OTableBuilder)

$(OScorer): nonltr/Scorer.cpp nonltr/Scorer.h nonltr/ChromosomeOneDigit.h utility/Util.h exception/InvalidStateException.h
	$(CXX) $(CXXFLAGS) -c nonltr/Scorer.cpp -o $(OScorer)

$(ODetectorMaxima): nonltr/DetectorMaxima.cpp nonltr/DetectorMaxima.h utility/ILocation.h exception/InvalidStateException.h
	$(CXX) $(CXXFLAGS) -c nonltr/DetectorMaxima.cpp -o $(ODetectorMaxima)

$(OChromDetectorMaxima): nonltr/ChromDetectorMaxima.cpp nonltr/ChromDetectorMaxima.h nonltr/DetectorMaxima.h nonltr/ChromosomeOneDigit.h utility/Util.h utility/ILocation.h utility/Location.h
	$(CXX) $(CXXFLAGS) -c nonltr/ChromDetectorMaxima.cpp -o $(OChromDetectorMaxima)

$(OHMM): nonltr/HMM.cpp nonltr/HMM.h utility/ILocation.h exception/InvalidStateException.h exception/InvalidInputException.h exception/FileDoesNotExistException.h exception/InvalidOperationException.h
	$(CXX) $(CXXFLAGS) -c nonltr/HMM.cpp -o $(OHMM)

$(OScanner): nonltr/Scanner.cpp nonltr/Scanner.h nonltr/Chromosome.h nonltr/ChromosomeOneDigit.h nonltr/HMM.h nonltr/ITableView.h nonltr/Scorer.h utility/Util.h utility/ILocation.h exception/InvalidInputException.h exception/InvalidStateException.h exception/FileDoesNotExistException.h exception/InvalidOperationException.h
	$(CXX) $(CXXFLAGS) -c nonltr/Scanner.cpp -o $(OScanner)

$(OTrainer): nonltr/Trainer.cpp nonltr/Trainer.h nonltr/TableBuilder.h nonltr/KmerHashTable.h nonltr/KmerHashTable.cpp nonltr/HMM.h nonltr/ChromDetectorMaxima.h nonltr/Scorer.h nonltr/ChromListMaker.h utility/Util.h nonltr/LocationListCollection.h
	$(CXX) $(CXXFLAGS) -c nonltr/Trainer.cpp -o $(OTrainer)

$(OChromListMaker): nonltr/ChromListMaker.cpp nonltr/ChromListMaker.h nonltr/Chromosome.h nonltr/ChromosomeOneDigit.h utility/Util.h
	$(CXX) $(CXXFLAGS) -c nonltr/ChromListMaker.cpp -o $(OChromListMaker)

$(OCluster): nonltr/Cluster.cpp nonltr/Cluster.h utility/Util.h exception/InvalidStateException.h exception/InvalidInputException.h
	$(CXX) $(CXXFLAGS) -c nonltr/Cluster.cpp -o $(OCluster)

$(OLocationList): nonltr/LocationList.cpp nonltr/LocationList.h utility/ILocation.h utility/Location.h exception/InvalidStateException.h
	$(CXX) $(CXXFLAGS) -c nonltr/LocationList.cpp -o $(OLocationList)

$(OLocationListCollection): nonltr/LocationListCollection.cpp nonltr/LocationListCollection.h utility/Location.h exception/InvalidStateException.h
	$(CXX) $(CXXFLAGS) -c nonltr/LocationListCollection.cpp -o $(OLocationListCollection)


#
# Make Red
#

red: $(TRed)

#
# Make binary directories
#

bin:
	mkdir -p ../bin
	mkdir -p ../bin/exception
	mkdir -p ../bin/utility
	mkdir -p ../bin/nonltr

#
# Make clean
#

clean:
	rm -f ../bin/*.o ../bin/exception/*.o ../bin/ms/*.o ../bin/nonltr/*.o ../bin/test/*.o ../bin/utility/*.o ../bin/tr/*.o *.o $(TRed)