create_windowed_protein_data

Takes protein sequence data as input and generates windowed data as output.

win_1_array = input array
window_size = number of consecutive rows combined into each windowed row
number_of_features = number of features in each row of win_1_array, not counting the class ID and row number
terminal_ID_col = column holding the terminal ID, which marks where sequences start and end

This function reads in a CSV file in which each row contains a class ID in
column #0 followed by number_of_features additional features.
It creates an array with the class ID in column #0 and "windowed" rows: each
output row holds the features of the rows in the window centered on it.

The value 9999.9999 represents NaN (not a number) and fills window positions that fall before the first row or after the last row of the data.
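
If downstream code expects real NaN values instead of the placeholder, the conversion could look like the sketch below. The helper name placeholder_to_nan is hypothetical and is not part of create_windowed_data.py; it only assumes that rows are plain Python lists of numbers.

```python
import math

NAN_PLACEHOLDER = 9999.9999  # sentinel value stated in this README


def placeholder_to_nan(row, placeholder=NAN_PLACEHOLDER):
    """Hypothetical helper: return a copy of row with the sentinel replaced by NaN."""
    return [float("nan") if value == placeholder else value for value in row]


# Usage: the second entry becomes NaN, the others are left untouched.
cleaned = placeholder_to_nan([1, 9999.9999, 2.0])
has_nan = any(isinstance(v, float) and math.isnan(v) for v in cleaned)
```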

For example, with number_of_features = 3, window_size = 3, and 12 rows of data:

 1,1,2,3
 1,2,3,4
-1,3,4,5
 1,4,5,6
-1,5,6,7
-1,6,7,8
 1,7,8,9
 1,8,9,10
 1,9,10,11
-1,10,11,12
 1,11,12,13
-1,12,13,14

Windowed data:

 1,9999.9999,9999.9999,9999.9999,1,2,3,2,3,4
 1,1,2,3,2,3,4,3,4,5
-1,2,3,4,3,4,5,4,5,6
 1,3,4,5,4,5,6,5,6,7
-1,4,5,6,5,6,7,6,7,8
-1,5,6,7,6,7,8,7,8,9
 1,6,7,8,7,8,9,8,9,10
 1,7,8,9,8,9,10,9,10,11
 1,8,9,10,9,10,11,10,11,12
-1,9,10,11,10,11,12,11,12,13
 1,10,11,12,11,12,13,12,13,14
-1,11,12,13,12,13,14,9999.9999,9999.9999,9999.9999
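
The windowing in the example above can be sketched in plain Python as follows. This is a minimal illustration, not the code in create_windowed_data.py: the function name window_rows is hypothetical, the sketch assumes an odd window_size, and it treats the whole input as a single sequence (it ignores terminal_ID_col).

```python
NAN_PLACEHOLDER = 9999.9999  # sentinel the README uses in place of NaN


def window_rows(rows, window_size, number_of_features):
    """Hypothetical helper: build windowed rows from rows of [class_id, f1, f2, ...].

    Each output row is the class ID of the centre row followed by the
    features of every row in the window centred on it.
    """
    if window_size % 2 == 0:
        raise ValueError("this sketch assumes an odd window_size")
    half = window_size // 2
    padding = [NAN_PLACEHOLDER] * number_of_features
    windowed = []
    for i, row in enumerate(rows):
        out = [row[0]]  # class ID stays in column 0
        for j in range(i - half, i + half + 1):
            if 0 <= j < len(rows):
                out.extend(rows[j][1:1 + number_of_features])
            else:
                out.extend(padding)  # neighbour lies outside the data
        windowed.append(out)
    return windowed


# Usage with the first three rows of the example input.
sample = [[1, 1, 2, 3], [1, 2, 3, 4], [-1, 3, 4, 5]]
sample_windowed = window_rows(sample, window_size=3, number_of_features=3)
```

With the full 12-row input, each output row has 1 + window_size * number_of_features = 10 columns, as in the listing above.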

