Adding Header Lines #142

Merged
merged 29 commits into from
Jul 10, 2024
2964db1
almost done with adding header files
ntalluri Dec 22, 2023
1b340bb
precommit
ntalluri Dec 22, 2023
5edf8a7
updated summary.py code
ntalluri Dec 23, 2023
733b736
precommit
ntalluri Dec 23, 2023
3964789
added changes to cytoscape
ntalluri Jan 19, 2024
7fc351b
Merge branch 'master' into header
agitter Jan 19, 2024
9d122e1
precommit
ntalluri Jan 24, 2024
ada6cde
update config
ntalluri Feb 9, 2024
0f1cba7
review
ntalluri Feb 9, 2024
a9de3ed
update ml
ntalluri Feb 9, 2024
6a9ea4a
update ml and test cases
ntalluri Feb 9, 2024
f9e989e
update contributing guide
ntalluri Feb 9, 2024
278b761
ml changes
ntalluri Mar 12, 2024
4dfd018
precommit to ml
ntalluri Mar 12, 2024
c6da91a
attempting error checking for empty df read from rpw
ntalluri Mar 12, 2024
c183abf
cleaned up code
ntalluri Mar 13, 2024
01ad342
precommit
ntalluri Mar 13, 2024
129532c
clean up new util func, add new test, add to contributing guide
ntalluri Mar 18, 2024
bbd9075
trying to fix error
ntalluri Mar 18, 2024
0f4510c
testing mcf tester with macos-latest
ntalluri Mar 21, 2024
36d556b
revert
ntalluri Mar 21, 2024
6945db6
Merge branch 'master' into header
agitter Jun 14, 2024
f5b880b
updated contributing guide
ntalluri Jun 14, 2024
322bfa5
updated new ML test files to include headers
ntalluri Jun 14, 2024
1ffe1d7
output docs
ntalluri Jun 17, 2024
7db6ea0
Code review and formatting updates
agitter Jul 4, 2024
cac0638
Resolve merge conflicts
agitter Jul 4, 2024
026e7e0
Bump version to 0.2.0
agitter Jul 4, 2024
d6b019a
Fix post_domino_id_transform
agitter Jul 4, 2024
6 changes: 3 additions & 3 deletions .github/workflows/test-spras.yml
Original file line number Diff line number Diff line change
@@ -83,7 +83,7 @@ jobs:
docker pull reedcompbio/mincostflow:latest
docker pull reedcompbio/allpairs:v2
docker pull reedcompbio/domino:latest
docker pull reedcompbio/py4cytoscape:v2
docker pull reedcompbio/py4cytoscape:v3
docker pull reedcompbio/spras:v0.1.0
- name: Build Omics Integrator 1 Docker image
uses: docker/build-push-action@v1
@@ -154,8 +154,8 @@ jobs:
path: docker-wrappers/Cytoscape/.
dockerfile: docker-wrappers/Cytoscape/Dockerfile
repository: reedcompbio/py4cytoscape
tags: v2
cache_froms: reedcompbio/py4cytoscape:latest
tags: v3
cache_froms: reedcompbio/py4cytoscape:v3
push: false
- name: Build SPRAS Docker image
uses: docker/build-push-action@v1
7 changes: 4 additions & 3 deletions CONTRIBUTING.md
@@ -154,9 +154,10 @@ Use the `run_container` utility function to run the command in the container `<u

Implement the `parse_output` function.
The edges in the Local Neighborhood output have the same format as the input, `<vertex1>|<vertex2>`.
Convert these to be tab-separated vertex pairs followed by a tab and a `1` at the end of every line, which indicates all edges have the same rank.
See the `add_rank_column` function in `src.util.py`.
The output should have the format `<vertex1> <vertex2> 1`.
Convert these to be tab-separated vertex pairs followed by a tab and a `1`, then a tab and a `U`, at the end of every line, which indicates that all edges have the same rank and are undirected.
See the `add_rank_column` and `raw_pathway_df` functions in `src.util.py` and the `reinsert_direction_col_undirected` function in `src.interactome.py`.
Make sure `header=True` is set and the column names are `['Node1', 'Node2', 'Rank', 'Direction']` when the file is created.
The output should have the format `<vertex1> <vertex2> 1 U`.
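
A minimal sketch of the conversion described above, assuming the raw Local Neighborhood output has one `<vertex1>|<vertex2>` edge per line. The function name `standardize_local_neighborhood` is hypothetical, and the sketch uses only the standard library instead of the `add_rank_column`, `raw_pathway_df`, and `reinsert_direction_col_undirected` helpers, whose signatures are not shown here.

```python
import csv
import io

HEADER = ['Node1', 'Node2', 'Rank', 'Direction']


def standardize_local_neighborhood(raw_text: str) -> str:
    """Convert `<vertex1>|<vertex2>` lines to tab-separated rows with a header.

    Hypothetical helper for illustration; not part of SPRAS.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    writer.writerow(HEADER)
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the raw output
        vertex1, vertex2 = line.split('|')
        # Every edge gets rank 1 and is marked undirected (U)
        writer.writerow([vertex1, vertex2, 1, 'U'])
    return out.getvalue()


print(standardize_local_neighborhood('A|B\nB|C\n'))
```

An empty raw file still produces the header row, which matches the way the wrappers in this pull request write a header even when no edges are found.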

### Step 4: Make the Local Neighborhood wrapper accessible through SPRAS
Import the new class `LocalNeighborhood` in `src/runner.py` so the wrapper functions can be accessed.
17 changes: 17 additions & 0 deletions doc/output.md
@@ -0,0 +1,17 @@
## File formats

### Pathway output format
Output pathway files in the standard SPRAS format include a header row and rows providing attributes for each edge.
The header row is `Node1 Node2 Rank Direction`, with columns separated by tabs.
Each row lists the two nodes that are connected with an edge, the rank for that edge, and a directionality column to indicate whether the edge is directed or undirected.
The directionality values are either a 'U' for an undirected edge or a 'D' for a directed edge, where the direction is from Node1 to Node2.
Pathways that do not contain ranked edges can output all 1s in the Rank column.

For example:
```
Node1 Node2 Rank Direction
A B 1 D
B C 1 D
B D 2 U
D A 3 U
```
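
As a rough illustration of how a consumer might read this format, the sketch below parses a pathway file with the standard library. It assumes the columns are tab-separated, as in the wrapper code, and the function name `read_pathway` is hypothetical.

```python
import csv
import io

EXPECTED_HEADER = ['Node1', 'Node2', 'Rank', 'Direction']


def read_pathway(text: str) -> list:
    """Parse a standard pathway file into (node1, node2, rank, direction) tuples.

    Hypothetical helper for illustration; not part of SPRAS.
    """
    reader = csv.reader(io.StringIO(text), delimiter='\t')
    header = next(reader, None)
    if header is None:
        return []  # completely empty file
    if header != EXPECTED_HEADER:
        raise ValueError(f'unexpected header: {header}')
    edges = []
    for row in reader:
        node1, node2, rank, direction = row
        if direction not in ('U', 'D'):
            raise ValueError(f'direction is {direction}, rather than U or D')
        edges.append((node1, node2, int(rank), direction))
    return edges


example = 'Node1\tNode2\tRank\tDirection\nA\tB\t1\tD\nB\tD\t2\tU\n'
print(read_pathway(example))
```

A file that contains only the header row yields an empty list, so downstream code does not need a separate branch for pathways with no edges.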
1 change: 1 addition & 0 deletions docker-wrappers/Cytoscape/README.md
@@ -20,6 +20,7 @@ The Docker wrapper can be tested with `pytest`.
## Versions:
- v1: Use supervisord to launch Cytoscape from a Python subprocess, then connect to Cytoscape with py4cytoscape. Only loads undirected pathways. Compatible with Singularity in local testing (Apptainer version 1.2.2-1.el7) but fails in GitHub Actions.
- v2: Add support for edge direction column.
- v3: Add support for header lines in files.

## TODO
- Add an auth file for `xvfb-run`
4 changes: 3 additions & 1 deletion docker-wrappers/Cytoscape/cytoscape_util.py
@@ -116,7 +116,9 @@ def load_pathways(pathways: List[str], output: str) -> None:
suid = p4c.networks.import_network_from_tabular_file(
file=path,
column_type_list='s,t,x,ea',
delimiters='\t'
delimiters='\t',
first_row_as_column_names=True,

)
p4c.networks.rename_network(name, network=suid)

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "spras"
version = "0.1.0"
version = "0.2.0"
description = "Signaling Pathway Reconstruction Analysis Streamliner"
authors = [
{ name = "Anthony Gitter", email = "[email protected]" },
13 changes: 7 additions & 6 deletions spras/allpairs.py
@@ -1,14 +1,13 @@
import warnings
from pathlib import Path

import pandas as pd

from spras.containers import prepare_volume, run_container
from spras.interactome import (
convert_directed_to_undirected,
reinsert_direction_col_undirected,
)
from spras.prm import PRM
from spras.util import add_rank_column, raw_pathway_df

__all__ = ['AllPairs']

@@ -110,7 +109,9 @@ def parse_output(raw_pathway_file, standardized_pathway_file):
@param raw_pathway_file: pathway file produced by an algorithm's run function
@param standardized_pathway_file: the same pathway written in the universal format
"""
df = pd.read_csv(raw_pathway_file, sep='\t', header=None)
df['Rank'] = 1 # add a rank column of 1s since the edges are not ranked.
df = reinsert_direction_col_undirected(df)
df.to_csv(standardized_pathway_file, header=False, index=False, sep='\t')
df = raw_pathway_df(raw_pathway_file, sep='\t', header=None)
if not df.empty:
df = add_rank_column(df)
df = reinsert_direction_col_undirected(df)
df.columns = ['Node1', 'Node2', 'Rank', 'Direction']
df.to_csv(standardized_pathway_file, header=True, index=False, sep='\t')
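
The `raw_pathway_df` helper in `spras/util.py` is not part of the diff shown here. One plausible shape for such a helper, offered as an assumption and not as the actual implementation, is a thin wrapper around `pd.read_csv` that returns an empty data frame with the standard columns when the raw file is empty. The name `read_raw_pathway` is hypothetical.

```python
import os
import tempfile

import pandas as pd

STANDARD_COLUMNS = ['Node1', 'Node2', 'Rank', 'Direction']


def read_raw_pathway(path, sep='\t', header=None):
    """Read a raw pathway file, or return an empty frame if the file has no data.

    Hypothetical stand-in for raw_pathway_df; the real helper may differ.
    """
    try:
        return pd.read_csv(path, sep=sep, header=header)
    except pd.errors.EmptyDataError:
        # An empty file still yields a frame whose to_csv output is just the header row
        return pd.DataFrame(columns=STANDARD_COLUMNS)


with tempfile.TemporaryDirectory() as tmp:
    empty_file = os.path.join(tmp, 'empty.txt')
    open(empty_file, 'w').close()  # create a zero-byte raw pathway file
    df = read_raw_pathway(empty_file)
    print(df.empty, list(df.columns))
```

With a helper like this, each `parse_output` can guard its transformations with `if not df.empty:` and then call `to_csv` unconditionally, which is the pattern the wrappers in this pull request follow.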
2 changes: 1 addition & 1 deletion spras/analysis/cytoscape.py
@@ -48,7 +48,7 @@ def run_cytoscape(pathways: List[Union[str, PurePath]], output_file: str, contai

print('Running Cytoscape with arguments: {}'.format(' '.join(command)), flush=True)

container_suffix = "py4cytoscape:v2"
container_suffix = "py4cytoscape:v3"
out = run_container(container_framework,
container_suffix,
command,
8 changes: 4 additions & 4 deletions spras/analysis/graphspace.py
@@ -77,21 +77,21 @@ def load_graph(path: str) -> Tuple[Union[nx.Graph, nx.DiGraph], bool]:
directed = False

try:
pathways = pd.read_csv(path, sep="\t", header=None)
pathways = pd.read_csv(path, sep="\t", header=0)
except pd.errors.EmptyDataError:
print(f"The file {path} is empty.")
return G, directed
pathways.columns = ["Interactor1", "Interactor2", "Rank", "Direction"]

mask_u = pathways['Direction'] == 'U'
mask_d = pathways['Direction'] == 'D'
pathways.drop(columns=["Direction"])

if mask_u.all():
G = nx.from_pandas_edgelist(pathways, "Interactor1", "Interactor2", ["Rank"])
G = nx.from_pandas_edgelist(pathways, "Node1", "Node2", ["Rank"])
directed = False

elif mask_d.all():
G = nx.from_pandas_edgelist(pathways, "Interactor1", "Interactor2", ["Rank"], create_using=nx.DiGraph())
G = nx.from_pandas_edgelist(pathways, "Node1", "Node2", ["Rank"], create_using=nx.DiGraph())
directed = True
else:
print(f"{path} could not be visualized. GraphSpace does not support mixed direction type graphs currently")
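
The branching in `load_graph` depends on whether every edge shares the same direction value. A minimal standard-library sketch of that decision, using a hypothetical `classify_direction` function in place of the pandas masks, might look like this:

```python
def classify_direction(directions):
    """Return 'undirected', 'directed', or 'mixed' for a list of U/D values.

    Hypothetical helper for illustration; load_graph uses pandas masks instead.
    """
    if all(d == 'U' for d in directions):
        return 'undirected'
    if all(d == 'D' for d in directions):
        return 'directed'
    # A mix of U and D edges is the case GraphSpace cannot visualize
    return 'mixed'


print(classify_direction(['U', 'U']))
print(classify_direction(['D', 'D']))
print(classify_direction(['U', 'D']))
```

Note that an empty list is classified as undirected because `all` over an empty sequence is true, which parallels how `mask_u.all()` behaves on an empty column.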
11 changes: 8 additions & 3 deletions spras/analysis/ml.py
@@ -41,10 +41,13 @@ def summarize_networks(file_paths: Iterable[Union[str, PathLike]]) -> pd.DataFra
with open(file, 'r') as f:
lines = f.readlines()

if len(lines) > 0:
lines.pop(0) # skip header line

edges = []
for line in lines:
parts = line.split('\t')
if len(parts) > 0: # in case of empty line in file
if len(parts) == 4: # empty lines not allowed but empty files are allowed
node1 = parts[0]
node2 = parts[1]
direction = str(parts[3]).strip()
@@ -54,8 +57,10 @@ def summarize_networks(file_paths: Iterable[Union[str, PathLike]]) -> pd.DataFra
elif direction == "D":
# node order does matter for directed edges
edges.append(DIR_CONST.join([node1, node2]))
else:
ValueError(f"direction is {direction}, rather than U or D")
elif direction != 'Direction':
raise ValueError(f"direction is {direction}, rather than U or D")
elif len(parts) != 0:
raise ValueError(f"In file {file}, expected line {line} to have 4 values, but found {len(parts)} values.")

# getting the algorithm name
p = PurePath(file)
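
The header-skipping and validation logic above can be sketched in isolation as follows. This is an illustrative reconstruction, not the code in `spras/analysis/ml.py`: the separators `DIR_SEP` and `UNDIR_SEP` are hypothetical stand-ins for the module's constants, and sorting the node pair for undirected edges is an assumption inferred from the comment that node order matters only for directed edges.

```python
DIR_SEP = '-->'    # hypothetical stand-in for DIR_CONST
UNDIR_SEP = '---'  # hypothetical separator for undirected edges


def edge_keys(lines):
    """Turn pathway file lines into edge identifiers, skipping the header line."""
    edges = []
    for line in lines[1:]:  # skip header line; an empty file yields no edges
        parts = line.rstrip('\n').split('\t')
        if len(parts) != 4:
            raise ValueError(f'expected 4 values in line {line!r}, found {len(parts)}')
        node1, node2, _rank, direction = parts
        if direction == 'U':
            # Assumption: node order does not matter for undirected edges, so sort the pair
            edges.append(UNDIR_SEP.join(sorted([node1, node2])))
        elif direction == 'D':
            # Node order does matter for directed edges
            edges.append(DIR_SEP.join([node1, node2]))
        else:
            raise ValueError(f'direction is {direction}, rather than U or D')
    return edges


sample = ['Node1\tNode2\tRank\tDirection\n', 'B\tA\t1\tU\n', 'B\tA\t1\tD\n']
print(edge_keys(sample))
```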
8 changes: 6 additions & 2 deletions spras/analysis/summary.py
@@ -33,8 +33,12 @@ def summarize_networks(file_paths: Iterable[Path], node_table: pd.DataFrame) ->

# Iterate through each network file path
for file_path in sorted(file_paths):
# Load in the network
nw = nx.read_edgelist(file_path, data=(('weight', float), ('Direction',str)))

with open(file_path, 'r') as f:
lines = f.readlines()[1:] # skip the header line

nw = nx.read_edgelist(lines, data=(('weight', float), ('Direction', str)))

# Save the network name, number of nodes, number edges, and number of connected components
nw_name = str(file_path)
number_nodes = nw.number_of_nodes()
12 changes: 5 additions & 7 deletions spras/domino.py
@@ -205,8 +205,11 @@ def parse_output(raw_pathway_file, standardized_pathway_file):
edges_df['source'] = edges_df['source'].apply(post_domino_id_transform)
edges_df['target'] = edges_df['target'].apply(post_domino_id_transform)
edges_df = reinsert_direction_col_undirected(edges_df)
edges_df.columns = ['Node1', 'Node2', 'Rank', 'Direction']
else:
edges_df = pd.DataFrame(columns=['Node1', 'Node2', 'Rank', 'Direction'])

edges_df.to_csv(standardized_pathway_file, sep='\t', header=False, index=False)
edges_df.to_csv(standardized_pathway_file, sep='\t', header=True, index=False)


def pre_domino_id_transform(node_id):
@@ -225,9 +228,4 @@ def post_domino_id_transform(node_id):
@param node_id: the node id to transform
@return the node id without the prefix, if it was present, otherwise the original node id
"""
# Use removeprefix if SPRAS ever requires Python >= 3.9
# https://docs.python.org/3/library/stdtypes.html#str.removeprefix
if node_id.startswith(ID_PREFIX):
return node_id[ID_PREFIX_LEN:]
else:
return node_id
return node_id.removeprefix(ID_PREFIX)
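
The switch to `str.removeprefix` (available in Python 3.9 and later) should be behavior-preserving. The short check below compares the two approaches; the value of `ID_PREFIX` used here is hypothetical, since the real constant is not shown in this diff.

```python
ID_PREFIX = 'ENSG0'  # hypothetical value; the real constant is not shown in this diff
ID_PREFIX_LEN = len(ID_PREFIX)


def strip_prefix_old(node_id: str) -> str:
    """Pre-3.9 approach: check for the prefix and slice it off."""
    if node_id.startswith(ID_PREFIX):
        return node_id[ID_PREFIX_LEN:]
    return node_id


def strip_prefix_new(node_id: str) -> str:
    """Python >= 3.9 approach using str.removeprefix."""
    return node_id.removeprefix(ID_PREFIX)


# Both versions strip at most one leading occurrence of the prefix
for node_id in ['ENSG0A', 'A', 'ENSG0ENSG0B', '']:
    assert strip_prefix_old(node_id) == strip_prefix_new(node_id)
```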
25 changes: 12 additions & 13 deletions spras/meo.py
@@ -1,14 +1,12 @@
from pathlib import Path

import pandas as pd

from spras.containers import prepare_volume, run_container
from spras.interactome import (
add_directionality_constant,
reinsert_direction_col_directed,
)
from spras.prm import PRM
from spras.util import add_rank_column
from spras.util import add_rank_column, raw_pathway_df

__all__ = ['MEO', 'write_properties']

@@ -181,13 +179,14 @@ def parse_output(raw_pathway_file, standardized_pathway_file):
@param standardized_pathway_file: the same pathway written in the universal format
"""
# Columns Source Type Target Oriented Weight
df = pd.read_csv(raw_pathway_file, sep='\t')
# Keep only edges that were assigned an orientation (direction)
df = df.loc[df['Oriented']]
# TODO what should be the edge rank?
# Would need to load the paths output file to rank edges correctly
df = add_rank_column(df)
df = reinsert_direction_col_directed(df)

df.to_csv(standardized_pathway_file, columns=['Source', 'Target', 'Rank', "Direction"], header=False,
index=False, sep='\t')
df = raw_pathway_df(raw_pathway_file, sep='\t', header=0)
if not df.empty:
# Keep only edges that were assigned an orientation (direction)
df = df.loc[df['Oriented']]
# TODO what should be the edge rank?
# Would need to load the paths output file to rank edges correctly
df = add_rank_column(df)
df = reinsert_direction_col_directed(df)
df.drop(columns=['Type', 'Oriented', 'Weight'], inplace=True)
df.columns = ['Node1', 'Node2', 'Rank', "Direction"]
df.to_csv(standardized_pathway_file, index=False, sep='\t', header=True)
19 changes: 9 additions & 10 deletions spras/mincostflow.py
@@ -1,14 +1,12 @@
from pathlib import Path

import pandas as pd

from spras.containers import prepare_volume, run_container
from spras.interactome import (
convert_undirected_to_directed,
reinsert_direction_col_undirected,
)
from spras.prm import PRM
from spras.util import add_rank_column
from spras.util import add_rank_column, raw_pathway_df

__all__ = ['MinCostFlow']

@@ -150,10 +148,11 @@ def parse_output(raw_pathway_file, standardized_pathway_file):
@param standardized_pathway_file: the same pathway written in the universal format
"""

df = pd.read_csv(raw_pathway_file, sep='\t', header=None)
df = add_rank_column(df)
# TODO update MinCostFlow version to support mixed graphs
# Currently directed edges in the input will be converted to undirected edges in the output
df = reinsert_direction_col_undirected(df)
df.to_csv(standardized_pathway_file, header=False, index=False, sep='\t')

df = raw_pathway_df(raw_pathway_file, sep='\t', header=None)
if not df.empty:
df = add_rank_column(df)
# TODO update MinCostFlow version to support mixed graphs
# Currently directed edges in the input will be converted to undirected edges in the output
df = reinsert_direction_col_undirected(df)
df.columns = ['Node1', 'Node2', 'Rank', "Direction"]
df.to_csv(standardized_pathway_file, header=True, index=False, sep='\t')
26 changes: 10 additions & 16 deletions spras/omicsintegrator1.py
@@ -1,11 +1,9 @@
from pathlib import Path

import pandas as pd

from spras.containers import prepare_volume, run_container
from spras.interactome import reinsert_direction_col_mixed
from spras.prm import PRM
from spras.util import add_rank_column
from spras.util import add_rank_column, raw_pathway_df

__all__ = ['OmicsIntegrator1', 'write_conf']

@@ -191,16 +189,12 @@ def parse_output(raw_pathway_file, standardized_pathway_file):
# I'm assuming from having read the documentation that we will be passing in optimalForest.sif
# as raw_pathway_file, in which case the format should be edge1 interactiontype edge2.
# if that assumption is wrong we will need to tweak things
try:
df = pd.read_csv(raw_pathway_file, sep='\t', header=None)
except pd.errors.EmptyDataError:
with open(standardized_pathway_file, 'w'):
pass
return

df.columns = ["Edge1", "InteractionType", "Edge2"]
df = add_rank_column(df)
df = reinsert_direction_col_mixed(df, "InteractionType", "pd", "pp")

df.to_csv(standardized_pathway_file, columns=['Edge1', 'Edge2', 'Rank', "Direction"], header=False, index=False,
sep='\t')
df = raw_pathway_df(raw_pathway_file, sep='\t', header=None)
if not df.empty:
df.columns = ["Edge1", "InteractionType", "Edge2"]
df = add_rank_column(df)
df = reinsert_direction_col_mixed(df, "InteractionType", "pd", "pp")
df.drop(columns=['InteractionType'], inplace=True)
df.columns = ['Node1', 'Node2', 'Rank', 'Direction']

df.to_csv(standardized_pathway_file, header=True, index=False, sep='\t')
19 changes: 10 additions & 9 deletions spras/omicsintegrator2.py
@@ -149,12 +149,13 @@ def parse_output(raw_pathway_file, standardized_pathway_file):
# Omicsintegrator2 returns a single line file if no network is found
num_lines = sum(1 for line in open(raw_pathway_file))
if num_lines < 2:
with open(standardized_pathway_file, 'w'):
pass
return
df = pd.read_csv(raw_pathway_file, sep='\t')
df = df[df['in_solution'] == True] # Check whether this column can be empty before revising this line
df = df.take([0, 1], axis=1)
df = add_rank_column(df)
df = reinsert_direction_col_undirected(df)
df.to_csv(standardized_pathway_file, header=False, index=False, sep='\t')
df = pd.DataFrame(columns=['Node1', 'Node2', 'Rank', 'Direction'])
else:
df = pd.read_csv(raw_pathway_file, sep='\t', header=0)
df = df[df['in_solution'] == True] # Check whether this column can be empty before revising this line
df = df.take([0, 1], axis=1)
df = add_rank_column(df)
df = reinsert_direction_col_undirected(df)
df.columns = ['Node1', 'Node2', 'Rank', "Direction"]

df.to_csv(standardized_pathway_file, header=True, index=False, sep='\t')
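
To see what the empty-pathway branch produces, the sketch below writes an empty data frame with the standard columns to an in-memory buffer and reads it back. It is a small standalone demonstration, with an in-memory buffer standing in for `standardized_pathway_file`.

```python
import io

import pandas as pd

columns = ['Node1', 'Node2', 'Rank', 'Direction']
empty_df = pd.DataFrame(columns=columns)

# Write the empty pathway; only the header row is emitted
buffer = io.StringIO()
empty_df.to_csv(buffer, header=True, index=False, sep='\t')
lines = buffer.getvalue().splitlines()
print(lines)

# Reading the header-only file back gives an empty frame with the same columns
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()), sep='\t')
print(round_trip.empty, list(round_trip.columns))
```

Because the header survives the round trip, downstream readers that use `header=0` no longer hit `EmptyDataError` for pathways with no edges, as they would with a zero-byte file.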
14 changes: 8 additions & 6 deletions spras/pathlinker.py
@@ -1,14 +1,13 @@
import warnings
from pathlib import Path

import pandas as pd

from spras.containers import prepare_volume, run_container
from spras.interactome import (
convert_undirected_to_directed,
reinsert_direction_col_directed,
)
from spras.prm import PRM
from spras.util import raw_pathway_df

__all__ = ['PathLinker']

@@ -136,7 +135,10 @@ def parse_output(raw_pathway_file, standardized_pathway_file):
@param raw_pathway_file: pathway file produced by an algorithm's run function
@param standardized_pathway_file: the same pathway written in the universal format
"""
# What about multiple raw_pathway_files
df = pd.read_csv(raw_pathway_file, sep='\t').take([0, 1, 2], axis=1)
df = reinsert_direction_col_directed(df)
df.to_csv(standardized_pathway_file, header=False, index=False, sep='\t')
# What about multiple raw_pathway_files?
df = raw_pathway_df(raw_pathway_file, sep='\t', header=0)
if not df.empty:
df = df.take([0, 1, 2], axis=1)
df = reinsert_direction_col_directed(df)
df.columns = ['Node1', 'Node2', 'Rank', "Direction"]
df.to_csv(standardized_pathway_file, header=True, index=False, sep='\t')