
MathFeature

Feature Extraction Package for Biological Sequences Based on Mathematical Descriptors

Home • Key Features • List of files • Dependencies • Installing • How To Use • Citation

List of Descriptors

Descriptors calculated by MathFeature for DNA, RNA, and Protein sequences.

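| Descriptor groups | Descriptor | Dimension | Sequence | Example (Study with Application or Theory) |
|---|---|---|---|---|
| Numerical Mapping | Binary | L * 4 | DNA/RNA | Ref 1 - Ref 2 |
| Numerical Mapping | Z-curve | L * 3 | DNA/RNA | Ref 1 - Ref 2 |
| Numerical Mapping | Real | L | DNA/RNA | Ref 1 - Ref 2 |
| Numerical Mapping | Integer | L | DNA/RNA/Protein | Ref 1 - Ref 2 |
| Numerical Mapping | EIIP | L | DNA/RNA/Protein | Ref 1 - Ref 2 |
| Numerical Mapping | Complex Number | L | DNA/RNA | Ref 1 - Ref 2 |
| Numerical Mapping | Atomic Number | L | DNA/RNA | Ref 1 - Ref 2 |
| Chaos Game | Chaos Game Representation | L * 2 | DNA/RNA | Ref 1 |
| Chaos Game | Frequency Chaos Game Representation | L - k + 1 | DNA/RNA | |
| Chaos Game | Chaos Game Signal (with Fourier) | 19 | DNA/RNA | Ref 1 |
| Fourier Transform | Numerical Mapping + Fourier | 19 | DNA/RNA/Protein | Ref 1 |
| Entropy | Shannon | k | DNA/RNA/Protein | Ref 1 |
| Entropy | Tsallis | k | DNA/RNA/Protein | Ref 1 |
| Graphs | Complex Networks (with threshold) | 12 * t | DNA/RNA/Protein | Ref 1 - Ref 2 |
| Graphs | Complex Networks (without threshold - v2) | 27 * k | DNA/RNA/Protein | Ref 1 - Ref 2 |
| Other techniques | Basic k-mer | 4^k | DNA/RNA | Ref 1 |
| Other techniques | Customizable k-mer | 4^k | DNA/RNA | |
| Other techniques | Nucleic acid composition (NAC) | 4 | DNA/RNA | Ref 1 |
| Other techniques | Di-nucleotide composition (DNC) | 16 | DNA/RNA | Ref 1 |
| Other techniques | Tri-nucleotide composition (TNC) | 64 | DNA/RNA | Ref 1 |
| Other techniques | ORF Features or Coding Features | 10 | DNA/RNA | Ref 1 - Ref 2 |
| Other techniques | Fickett score | 2 | DNA/RNA | Ref 1 |
| Other techniques | Pseudo K-tuple nucleotide composition | - | DNA/RNA | Ref 1 |
| Other techniques | Accumulated Nucleotide Frequency (ANF) | L | DNA/RNA/Protein | Ref 1 |
| Other techniques | ANF with Fourier | 19 | DNA/RNA/Protein | |
| Other techniques | Xmer k-Spaced Ymer Composition Frequency (kGap) | 4^X * 4^Y or 20^X * 20^Y | DNA/RNA/Protein | Ref 1 - Ref 2 |
| Other techniques | Amino acid composition (AAC) | 20 | Protein | Ref 1 |
| Other techniques | Dipeptide composition (DPC) | 400 | Protein | Ref 1 |
| Other techniques | Tripeptide composition (TPC) | 8000 | Protein | Ref 1 |
| Other techniques | Basic k-mer | 20^k | Protein | |
| Other techniques | Customizable k-mer | 20^k | Protein | |
| Other techniques | Kmer Frequency Mapping | L - k + 1 | Protein | |
| Other techniques | Kmer Frequency Mapping with Fourier | 19 | Protein | |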

To use any descriptor, see our documentation.
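As a rough illustration of where a dimension such as L * 4 comes from, the sketch below one-hot encodes each nucleotide and flattens the result, with L taken as the length of the longest sequence. It is not MathFeature's implementation: the function name `binary_mapping`, the A/C/G/T column order, and the zero-padding of shorter sequences are all assumptions made for this example.

```python
# Hypothetical one-hot table; the column order A, C, G, T is an assumption.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def binary_mapping(seq, length):
    """Return a flat vector of size length * 4 for one DNA/RNA sequence.

    Positions past the end of the sequence, and unknown symbols,
    are encoded as four zeros (an assumption for this sketch).
    """
    seq = seq.upper().replace("U", "T")  # treat RNA like DNA
    vector = []
    for i in range(length):
        base = seq[i] if i < len(seq) else None
        vector.extend(ONE_HOT.get(base, [0, 0, 0, 0]))
    return vector


sequences = ["ACGT", "GGA"]
L = max(len(s) for s in sequences)  # length of the longest sequence
features = [binary_mapping(s, L) for s in sequences]
print(len(features[0]))  # 16, i.e. L * 4
```

Every sequence ends up with the same number of columns, which is what lets the mapped sequences be stacked into a single feature matrix.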

Note 1: L = length of the longest sequence.

Note 2: k = k-mer size (the length of the subsequences whose frequencies are counted).

Note 3: t = threshold: number of subgraphs.

Note 4: The Example column lists some studies that apply the descriptor or a similar approach. Other references are cited in our article.
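To make the k-dependent dimensions concrete, here is a minimal sketch of k-mer frequency counting and Shannon entropy using only the Python standard library. It is an illustration, not MathFeature's code: the helper names `kmer_frequencies` and `shannon_entropy` are hypothetical, and computing one entropy value per k-mer size from 1 to k is this example's reading of the "k" dimension in the table.

```python
from collections import Counter
from itertools import product
from math import log2


def kmer_frequencies(seq, k, alphabet="ACGT"):
    """Relative frequency of every possible k-mer, in lexicographic order."""
    windows = len(seq) - k + 1
    if windows <= 0:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(windows))
    # One entry per possible k-mer: len(alphabet) ** k values in total.
    return [counts["".join(p)] / windows for p in product(alphabet, repeat=k)]


def shannon_entropy(freqs):
    """Shannon entropy in bits, skipping zero-frequency k-mers."""
    return -sum(p * log2(p) for p in freqs if p > 0)


seq = "ACGTACGT"
freqs = kmer_frequencies(seq, 2)
print(len(freqs))  # 16, i.e. 4^k with k = 2

# Assumption: one entropy value for each k-mer size 1..k gives k features.
k = 3
entropies = [shannon_entropy(kmer_frequencies(seq, j)) for j in range(1, k + 1)]
print(len(entropies))  # 3
```

Passing a 20-letter amino acid alphabet instead of `"ACGT"` yields 20^k entries, matching the protein rows of the table.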