Edit Data Table Then Load
Show how a Cytoscape table can be loaded from a data file that needs editing before loading.
Cytoscape can load table data from a number of file formats, provided that the data file is already formatted for Cytoscape consumption. If this is true, see the Load Data Table From File recipe. If not, Python can easily make such edits, as described below.
The data table must be in a table-oriented format that Pandas' CSV reader can load directly, and it must have a column whose values can be used as a key into the Cytoscape table (i.e., they correspond to values in the Cytoscape table).
Data tables that don't quite match the standard Cytoscape table formats can be loaded into Cytoscape tables anyway.
Because the data table is first loaded by Python and manipulated in Python's memory space, it must be transferred to Cytoscape via an API call. The transfer costs both Python memory and wall-clock time. Contrast this with Cytoscape loading the table file directly (see Load Data Table From File), which requires no Python memory or transfer time.
Suppose the data is a tab-separated table in Barabasi/supplementary_tablesS2.txt with the column names as the second line of the file, and data in subsequent lines:
Supporting Information Table 2. Network characteristics of diseases.
Disease ID Name Disorder class Size (s) Degree (k)
1 "17,20-lyase_deficiency" Endocrine 1 0
3 2-methyl-3-hydroxybutyryl-CoA_dehydrogenase_deficiency Metabolic 1 0
...
Suppose, too, that there is a Cytoscape node table into which this table should be merged:
shared name name
1 1
2 2
3 3
Assume that the Cytoscape node table's Name column values correspond to Disease ID column values in the new table. There are three issues that need solving before loading the new table into Cytoscape's node table:
- The first line is meaningless, and should be discarded.
- The new table's Disease ID column appears to be a number, but it will be used as a key to match Name values in the Cytoscape node table. Cytoscape Name values are already of type String.
- The new table's Name column (in the second line) conflicts with the Name column already present in Cytoscape's node table.
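The second issue is easy to overlook, so here is a minimal sketch of why the key type matters. The two-row excerpt and the `node_names` list are hypothetical stand-ins for the file and the Cytoscape node table, and the set intersection only approximates the key matching for illustration.

```python
import io
import pandas as pd

# Hypothetical two-row excerpt of the data file, without the title line.
excerpt = "Disease ID\tDisorder class\n1\tEndocrine\n3\tMetabolic\n"

# Hypothetical stand-in for the Cytoscape name values, which are strings.
node_names = ['1', '2', '3']

# Without dtype=, pandas infers Disease ID as an integer column.
as_numbers = pd.read_csv(io.StringIO(excerpt), sep='\t')
# With dtype=, Disease ID is kept as text.
as_strings = pd.read_csv(io.StringIO(excerpt), sep='\t', dtype={'Disease ID': str})

# Count how many keys are also node names under each interpretation.
numeric_matches = len(set(as_numbers['Disease ID'].tolist()) & set(node_names))
string_matches = len(set(as_strings['Disease ID'].tolist()) & set(node_names))
print(numeric_matches, string_matches)
```

In this sketch the integer keys match none of the string names, while the string keys match both rows.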
The following code achieves all three objectives, and then transfers the table to Cytoscape, where it is merged into the node table:
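import pandas as pd
# Take column names from the second line (header=1) and read Disease ID as a string
disease_table = pd.read_csv('Barabasi/supplementary_tablesS2.txt', sep='\t', header=1, dtype={'Disease ID': str})
# Rename Name so it does not conflict with the Name column already in the node table
disease_table.rename({'Name': 'Disease Name'}, axis=1, inplace=True)
disease_table
import py4cytoscape as p4c
# Transfer the table to Cytoscape, using Disease ID as the key column
p4c.load_table_data(disease_table, data_key_column='Disease ID')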
- The sep='\t' parameter recognizes the file as tab-separated, and the header=1 parameter takes the column names from the second line (i.e., line 1), which causes the first line (i.e., line 0) to be skipped.
- The dtype= parameter defines Disease ID as a string instead of a number.
- The .rename() function renames the Name column as Disease Name.
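The effect of these parameters can be checked without the original file or a running Cytoscape. The sketch below is only an illustration: `sample_text` is a hypothetical in-memory stand-in built from the sample lines shown earlier, assuming tab separators between the fields.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for the tab-separated file (title line, header line, two data rows).
sample_text = (
    "Supporting Information Table 2. Network characteristics of diseases.\n"
    "Disease ID\tName\tDisorder class\tSize (s)\tDegree (k)\n"
    '1\t"17,20-lyase_deficiency"\tEndocrine\t1\t0\n'
    "3\t2-methyl-3-hydroxybutyryl-CoA_dehydrogenase_deficiency\tMetabolic\t1\t0\n"
)

# Same read_csv parameters as in the recipe, applied to the in-memory text.
sample_table = pd.read_csv(io.StringIO(sample_text), sep='\t', header=1, dtype={'Disease ID': str})
# Same rename as in the recipe, written with the columns= keyword.
sample_table = sample_table.rename(columns={'Name': 'Disease Name'})

print(list(sample_table.columns))
print(sample_table['Disease ID'].tolist())
```

The printed column list should no longer contain Name, and the Disease ID values should be strings rather than integers.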
Finally, the load_table_data() function transfers the new disease_table to Cytoscape, and matches its Disease ID column values with the Cytoscape node table's Name values (per the data_key_column= parameter). The result is a node table containing the new data values:
shared name name Disease ID Disease Name Disorder class Size (s) Degree (k)
1 1 1 "17,20-lyase_deficiency" Endocrine 1 0
2 2
3 3 3 2-methyl-3-hydroxybutyryl-CoA_dehydrogenase_deficiency Metabolic 1 0
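The row for node 2 stays empty because no Disease ID matches it. Before transferring a large table, it can help to preview which nodes will receive data. The sketch below approximates the match with a pandas left merge. It is not how Cytoscape or py4cytoscape performs the merge, and both `node_table` and `edited_table` are hypothetical stand-ins for the tables shown above.

```python
import pandas as pd

# Hypothetical stand-in for the Cytoscape node table.
node_table = pd.DataFrame({'shared name': ['1', '2', '3'], 'name': ['1', '2', '3']})

# Hypothetical stand-in for the edited disease table (two columns kept for brevity).
edited_table = pd.DataFrame({
    'Disease ID': ['1', '3'],
    'Disorder class': ['Endocrine', 'Metabolic'],
})

# A left merge keeps every node row; indicator=True adds a _merge column that flags unmatched rows.
preview = node_table.merge(
    edited_table, how='left', left_on='name', right_on='Disease ID', indicator=True
)

# Nodes marked 'left_only' have no matching Disease ID.
unmatched_nodes = preview.loc[preview['_merge'] == 'left_only', 'name'].tolist()
print(unmatched_nodes)
```

A long list of unmatched nodes usually points to a key problem, such as a numeric key column or a mismatched identifier scheme.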