
Add Amount of identifying information (Amount of Information / Entropy) characteristic calculator #51

Open
ChainsManipulator opened this issue Oct 14, 2024 · 0 comments

ChainsManipulator commented Oct 14, 2024

Add a calculator function for the Identifying Information (Amount of Information / Entropy) characteristic of a given intervals array, using the formula:
$$H=\sum_{j=1}^{m}\frac{n_j}{n}\cdot\log_2 \Delta_{a j}$$
where $n_j$ is the number of intervals in the congeneric sequence of the $j$-th element, $n$ is the number of intervals in the whole sequence, $m$ is the number of distinct elements in the sequence (the alphabet size), and $\Delta_{a j}$ is the arithmetic mean of the intervals of the $j$-th element of the alphabet.
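A minimal Python sketch of the formula is shown here. It assumes the intervals arrive already grouped by element (one list per congeneric sequence); that input layout is a hypothetical choice for illustration, not the documented return type of `intervals`. It returns 1 when there are no intervals at all, following the "No intervals" examples in this issue, and skips elements that have no intervals, since their weight $n_j / n$ is 0.

```python
import math
from typing import Sequence


def identifying_information(grouped_intervals: Sequence[Sequence[float]]) -> float:
    """Amount of identifying information H = sum_j (n_j / n) * log2(mean_j).

    Assumed (hypothetical) input: one list of intervals per alphabet
    element, i.e. per congeneric sequence.
    """
    n = sum(len(group) for group in grouped_intervals)  # total number of intervals
    if n == 0:
        # Convention from the "No intervals" examples in this issue.
        return 1.0
    h = 0.0
    for group in grouped_intervals:
        n_j = len(group)
        if n_j == 0:
            # Weight n_j / n is 0, so skip instead of averaging an empty group.
            continue
        delta_a = sum(group) / n_j  # arithmetic mean of this element's intervals
        h += n_j / n * math.log2(delta_a)
    return h
```

For `[2, 4, 2, 2, 4]` with `Start` binding and `Normal` mode, assuming each element's first interval is counted from the start of the sequence, the grouped intervals are `[1, 2, 1]` for `2` and `[2, 3]` for `4`, and `identifying_information([[1, 2, 1], [2, 3]])` evaluates to about 0.7777937375.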

Examples

```python
X = [2, 4, 2, 2, 4]
x_intervals = intervals(X, 'Start', 'Normal')
result = identifying_information(x_intervals)
print(result)
> 0.77779373752225
```
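As a hand check of the formula on this first example, assume `Start`/`Normal` counts each element's first interval from the beginning of the sequence. Element `2` (positions 1, 3, 4) then has intervals $1, 2, 1$ and element `4` (positions 2, 5) has intervals $2, 3$, so $n = 5$:

$$H = \frac{3}{5}\log_2\frac{4}{3} + \frac{2}{5}\log_2\frac{5}{2} \approx 0.24902 + 0.52877 \approx 0.77779$$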
```python
X = ['B','B','A','A','C','B','A','C','C','B']
x_intervals = intervals(X, 'Start', 'Lossy')
result = identifying_information(x_intervals)
print(result)
> 1.25069821459
X = ['B','B','A','A','C','B','A','C','C','B']
x_intervals = intervals(X, 'Start', 'Normal')
result = identifying_information(x_intervals)
print(result)
> 1.3709777
X = ['B','B','A','A','C','B','A','C','C','B']
x_intervals = intervals(X, 'End', 'Normal')
result = identifying_information(x_intervals)
print(result)
> 1.2532824857
X = ['B','B','A','A','C','B','A','C','C','B']
x_intervals = intervals(X, 'Start', 'Redundant')
result = identifying_information(x_intervals)
print(result)
> 1.335618955
X = ['B','B','A','A','C','B','A','C','C','B']
x_intervals = intervals(X, 'Start', 'Cycle')
result = identifying_information(x_intervals)
print(result)
> 1.571
```
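The examples list only results, so the interval rules behind each binding and mode are not spelled out. The Python sketch below reconstructs them from the expected values; it is an inference, not FOApy's implementation. Under it, `Lossy` keeps only the intervals between successive occurrences, `Normal` adds the interval from the sequence start (`Start`) or to the sequence end (`End`), `Redundant` adds both, and `Cycle` adds one wrap-around interval. The helper `grouped_intervals` and its string arguments are hypothetical, and the block carries its own copy of `identifying_information` so that it runs on its own.

```python
import math
from typing import Hashable, Sequence


def grouped_intervals(seq: Sequence[Hashable], binding: str, mode: str) -> list[list[int]]:
    """Hypothetical helper: the intervals of each element's congeneric sequence.

    The binding/mode rules are inferred from the expected values in this issue,
    not taken from FOApy's `intervals` implementation.
    """
    if binding not in ("Start", "End"):
        raise ValueError(f"unknown binding: {binding}")
    n_total = len(seq)
    positions: dict[Hashable, list[int]] = {}
    for index, element in enumerate(seq, start=1):  # 1-based positions
        positions.setdefault(element, []).append(index)

    groups = []
    for pos in positions.values():
        internal = [b - a for a, b in zip(pos, pos[1:])]  # between occurrences
        to_start = pos[0]               # from the sequence start to the first occurrence
        to_end = n_total + 1 - pos[-1]  # from the last occurrence past the sequence end
        if mode == "Lossy":
            group = internal
        elif mode == "Normal":
            group = ([to_start] + internal) if binding == "Start" else (internal + [to_end])
        elif mode == "Redundant":
            group = [to_start] + internal + [to_end]
        elif mode == "Cycle":
            # Wrap around from the last occurrence back to the first one.
            group = internal + [pos[0] + n_total - pos[-1]]
        else:
            raise ValueError(f"unknown mode: {mode}")
        groups.append(group)
    return groups


def identifying_information(groups: Sequence[Sequence[int]]) -> float:
    """H = sum_j (n_j / n) * log2(arithmetic mean of group j)."""
    n = sum(len(g) for g in groups)
    if n == 0:
        return 1.0  # "No intervals" convention from this issue
    return sum(len(g) / n * math.log2(sum(g) / len(g)) for g in groups if g)
```

For example, `identifying_information(grouped_intervals(X, 'Start', 'Normal'))` on `['B','B','A','A','C','B','A','C','C','B']` evaluates to about 1.3709777, and the `Lossy`, `End`/`Normal` and `Cycle` cases above come out the same way. Under these assumed rules the `Start`/`Redundant` case evaluates to about 1.3356131 rather than the listed 1.335618955, so either that expected value or the inferred `Redundant` rule may need a second look.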

```python
X = ['A','C','T','T','G','A','T','A','C','G']
x_intervals = intervals(X, 'Start', 'Lossy')
result = identifying_information(x_intervals)
print(result)
> 1.7906654768
X = ['A','C','T','T','G','A','T','A','C','G']
x_intervals = intervals(X, 'Start', 'Normal')
result = identifying_information(x_intervals)
print(result)
> 1.6895995955
X = ['A','C','T','T','G','A','T','A','C','G']
x_intervals = intervals(X, 'End', 'Normal')
result = identifying_information(x_intervals)
print(result)
> 1.6965784285
X = ['A','C','T','T','G','A','T','A','C','G']
x_intervals = intervals(X, 'End', 'Redundant')
result = identifying_information(x_intervals)
print(result)
> 1.6373048326
X = ['A','C','T','T','G','A','T','A','C','G']
x_intervals = intervals(X, 'End', 'Cycle')
result = identifying_information(x_intervals)
print(result)
> 1.9709505945
```

```python
X = ['C','C','A','C','G','C','T','T','A','C']
x_intervals = intervals(X, 'Start', 'Lossy')
result = identifying_information(x_intervals)
print(result)
> 1.210777084415
X = ['C','C','A','C','G','C','T','T','A','C']
x_intervals = intervals(X, 'Start', 'Normal')
result = identifying_information(x_intervals)
print(result)
> 1.5661778097771987
X = ['C','C','A','C','G','C','T','T','A','C']
x_intervals = intervals(X, 'End', 'Normal')
result = identifying_information(x_intervals)
print(result)
> 1.35849625
X = ['C','C','A','C','G','C','T','T','A','C']
x_intervals = intervals(X, 'End', 'Redundant')
result = identifying_information(x_intervals)
print(result)
> 1.5294637608763
X = ['C','C','A','C','G','C','T','T','A','C']
x_intervals = intervals(X, 'End', 'Cycle')
result = identifying_information(x_intervals)
print(result)
> 1.76096404744368
```
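The `Start`/`Lossy` case for this sequence shows how an element with no intervals drops out of the sum. `G` occurs only once, so under `Lossy` (assumed here to keep only the intervals between successive occurrences) it has $n_G = 0$ and contributes nothing. `C` has intervals $1, 2, 2, 4$, `A` has $6$ and `T` has $1$, giving $n = 6$:

$$H = \frac{4}{6}\log_2\frac{9}{4} + \frac{1}{6}\log_2 6 + \frac{1}{6}\log_2 1 \approx 0.77995 + 0.43083 + 0 \approx 1.21078$$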

```python
X = ['C','G']
x_intervals = intervals(X, 'Start', 'Normal')
result = identifying_information(x_intervals)
print(result)
> 0.5
X = ['C','G']
x_intervals = intervals(X, 'End', 'Normal')
result = identifying_information(x_intervals)
print(result)
> 0.5
X = ['C','G']
x_intervals = intervals(X, 'End', 'Redundant')
result = identifying_information(x_intervals)
print(result)
> 0.5849625007
X = ['C','G']
x_intervals = intervals(X, 'End', 'Cycle')
result = identifying_information(x_intervals)
print(result)
> 1
```

```python
X = ['C','C','C','C']
x_intervals = intervals(X, 'Start', 'Lossy')
result = identifying_information(x_intervals)
print(result)
> 0
X = ['C','C','C','C']
x_intervals = intervals(X, 'Start', 'Normal')
result = identifying_information(x_intervals)
print(result)
> 0
X = ['C','C','C','C']
x_intervals = intervals(X, 'End', 'Normal')
result = identifying_information(x_intervals)
print(result)
> 0
X = ['C','C','C','C']
x_intervals = intervals(X, 'End', 'Redundant')
result = identifying_information(x_intervals)
print(result)
> 0
X = ['C','C','C','C']
x_intervals = intervals(X, 'End', 'Cycle')
result = identifying_information(x_intervals)
print(result)
> 0
```
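The zeros follow directly from the formula. A sequence made of one repeated element has every interval equal to 1 in all of these bindings and modes (assuming boundary intervals are measured from the sequence start to the first occurrence, from the last occurrence past the end, or wrapping around). So $m = 1$, $\Delta_{a 1} = 1$ and

$$H = \frac{n_1}{n}\log_2 1 = 0.$$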

```python
X = ['A','C','G','T']
x_intervals = intervals(X, 'Start', 'Normal')
result = identifying_information(x_intervals)
print(result)
> 1.1462406252
X = ['A','C','G','T']
x_intervals = intervals(X, 'End', 'Normal')
result = identifying_information(x_intervals)
print(result)
> 1.1462406252
X = ['A','C','G','T']
x_intervals = intervals(X, 'End', 'Redundant')
result = identifying_information(x_intervals)
print(result)
> 1.3219280949
X = ['A','C','G','T']
x_intervals = intervals(X, 'End', 'Cycle')
result = identifying_information(x_intervals)
print(result)
> 2
```
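When every element occurs exactly once in a sequence of length $N$, the `Cycle` and `Redundant` results have closed forms, assuming the boundary-interval rules that reproduce the values above. In `Cycle` mode each of the $m$ elements has a single wrap-around interval equal to $N$, so $n = m$ and

$$H = \sum_{j=1}^{m}\frac{1}{m}\log_2 N = \log_2 N,$$

which gives $\log_2 2 = 1$ for `['C','G']` and $\log_2 4 = 2$ for `['A','C','G','T']`. In `Redundant` mode an element at position $p$ has a start interval $p$ and an end interval $N + 1 - p$, whose mean is $(N+1)/2$ for every element, so

$$H = \log_2\frac{N+1}{2},$$

which gives $\log_2 1.5 \approx 0.58496$ and $\log_2 2.5 \approx 1.32193$ respectively.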

```python
X = ['A','A','A','A','C','G','T']
x_intervals = intervals(X, 'Start', 'Lossy')
result = identifying_information(x_intervals)
print(result)
> 0
X = ['A','A','A','A','C','G','T']
x_intervals = intervals(X, 'Start', 'Normal')
result = identifying_information(x_intervals)
print(result)
> 1.102035074
X = ['A','A','A','A','C','G','T']
x_intervals = intervals(X, 'End', 'Normal')
result = identifying_information(x_intervals)
print(result)
> 0.830626027
X = ['A','A','A','A','C','G','T']
x_intervals = intervals(X, 'End', 'Redundant')
result = identifying_information(x_intervals)
print(result)
> 1.3991235932
X = ['A','A','A','A','C','G','T']
x_intervals = intervals(X, 'End', 'Cycle')
result = identifying_information(x_intervals)
print(result)
> 1.6644977792
```

No intervals (each element occurs only once, so `Lossy` mode yields no intervals at all)

```python
X = ['C','G']
x_intervals = intervals(X, 'End', 'Lossy')
result = identifying_information(x_intervals)
print(result)
> 1
X = ['A','C','G','T']
x_intervals = intervals(X, 'End', 'Lossy')
result = identifying_information(x_intervals)
print(result)
> 1
X = [2, 1]
x_intervals = intervals(X, 'End', 'Lossy')
result = identifying_information(x_intervals)
print(result)
> 1
```
ChainsManipulator converted this from a draft issue Oct 14, 2024
ChainsManipulator changed the title from "Add Entropy characteristic calculator" to "Add Entropy (Amount of Information) characteristic calculator" Oct 14, 2024
ChainsManipulator changed the title from "Add Entropy (Amount of Information) characteristic calculator" to "Add Entropy (Amount of Information / Amount of identifying information) characteristic calculator" Oct 14, 2024
ChainsManipulator changed the title from "Add Entropy (Amount of Information / Amount of identifying information) characteristic calculator" to "Add Amount of identifying information (Amount of Information / Entropy) characteristic calculator" Dec 8, 2024
ChainsManipulator moved this from In Progress to Pending review in FOApy V1 - Batman begins Dec 11, 2024