
Edit Data Table Then Load

Barry Demchak edited this page Dec 15, 2020 · 7 revisions
  • Recipe Name
  • Intent Show how to load a data file as a Cytoscape table when the file needs a few changes before it can be loaded.
  • Motivation Cytoscape can load table data from a number of file formats, provided that the data file is already formatted for Cytoscape consumption. If it is, see the Load Data Table From File recipe. If it is not, Python can easily make the needed edits first.
  • Applicability The data table must be in a table-oriented format that Pandas' CSV reader can load directly, and must be mappable as a Cytoscape table.
  • Consequences Because the data table is first loaded by Python and manipulated in Python's memory space, it must then be transferred to Cytoscape via an API call. This costs Python memory and transfer time. Contrast this with Cytoscape loading the table file directly, which requires no Python memory or transfer time.
  • Implications
  • Sample Code Suppose the data is a tab-separated table in Barabasi/supplementary_tablesS2.txt, with the column names on the second line of the file:

Supporting Information Table 2. Network characteristics of diseases.
Disease ID    Name    Disorder class    Size (s)    Degree (k)
1    "17,20-lyase_deficiency"    Endocrine    1    0
3    2-methyl-3-hydroxybutyryl-CoA_dehydrogenase_deficiency    Metabolic    1    0

Assume that there is already a Cytoscape node table whose Name column corresponds to the table's Disease ID. We want each row in the table to be added to the Cytoscape node table row whose Name value matches the table's Disease ID.
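The matching described above behaves much like a left join on the two key columns. The following is a minimal sketch of that idea in pandas alone. The node_table frame and its three Name values are made up for illustration, and Disease ID is written as strings so that the keys are comparable. Cytoscape does the real matching itself when the table is loaded.

```python
import pandas as pd

# Hypothetical stand-in for the Cytoscape node table (Name values are strings).
node_table = pd.DataFrame({'Name': ['1', '3', '5']})

# Two rows taken from the sample file, with Disease ID held as strings.
disease_rows = pd.DataFrame({
    'Disease ID': ['1', '3'],
    'Disorder class': ['Endocrine', 'Metabolic'],
})

# Left join: every node row is kept, and rows whose Name matches a
# Disease ID gain the extra column; unmatched rows are left empty.
merged = node_table.merge(disease_rows, how='left',
                          left_on='Name', right_on='Disease ID')
print(merged[['Name', 'Disorder class']])
```

In this sketch the node named 5 has no matching Disease ID, so its Disorder class stays empty.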

There are three issues that need solving before loading the table into Cytoscape's node table:

  1. The first line is a title rather than data, so it should be discarded.
  2. The table's Disease ID column appears to be a number, but it will be used as a key to match Name values in the Cytoscape node table. Cytoscape Name values are of type String, so Disease ID must be read as a string as well.
  3. The table's Name column (in the second line) conflicts with the Name column already present in the node table, so we must rename the table's Name column.

The following code achieves all three objectives, and then sends the table to Cytoscape, where it is merged into the node table:

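    import pandas as pd

    # Skip the title line, take column names from the second line, keep Disease ID as text
    disease_table = pd.read_csv('Barabasi/supplementary_tablesS2.txt', sep='\t', header=1, dtype={'Disease ID': str})
    # Avoid a clash with the node table's existing Name column
    disease_table.rename({'Name': 'Disease Name'}, axis=1, inplace=True)
    disease_table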

    import py4cytoscape as p4c
    p4c.load_table_data(disease_table, data_key_column='Disease ID')

  1. The sep='\t' parameter tells the reader that the file is tab-separated, and the header=1 parameter takes the column names from the second line (line 1, counting from 0), which causes the first line (line 0) to be skipped.
  2. The dtype= parameter reads Disease ID as a string instead of a number.
  3. The .rename() method renames the Name column to Disease Name.
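To check the pandas steps without the data file or a running Cytoscape, the same parameters can be tried on an in-memory copy of the four sample lines shown earlier. This is only a sketch: the sample string is an assumption that reproduces those lines with tab separators, and io.StringIO stands in for the real file.

```python
import io
import pandas as pd

# In-memory copy of the four sample lines (assumed to be tab-separated).
sample = (
    'Supporting Information Table 2. Network characteristics of diseases.\n'
    'Disease ID\tName\tDisorder class\tSize (s)\tDegree (k)\n'
    '1\t"17,20-lyase_deficiency"\tEndocrine\t1\t0\n'
    '3\t2-methyl-3-hydroxybutyryl-CoA_dehydrogenase_deficiency\tMetabolic\t1\t0\n'
)

# Same parameters as the recipe: skip the title line, keep Disease ID as text.
disease_table = pd.read_csv(io.StringIO(sample), sep='\t', header=1,
                            dtype={'Disease ID': str})
# Rename the file's Name column so it cannot clash with the node table's Name.
disease_table = disease_table.rename(columns={'Name': 'Disease Name'})

print(list(disease_table.columns))
print(disease_table['Disease ID'].tolist())
```

With this sample, the key column comes back as the strings '1' and '3' rather than as integers, and the column list contains Disease Name instead of Name.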
  • Related Recipes Load Data Table From File