Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RWR pathway reconstruction algorithm #148

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Lyce24
Copy link

@Lyce24 Lyce24 commented Feb 21, 2024

This pull request replaces the RWR part of pull request #92. Changes are made to adapt the updated version of SPRAS with directionality.

@annaritz annaritz self-requested a review March 6, 2024 19:18
Copy link
Contributor

@annaritz annaritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some clarifications will be helpful, especially about:

  1. The logic of sources/targets/prize files
  2. What happens if there are no sources (or if there are no targets in the source-target mode)
  3. Whether you want to have separate output files or keep all outputs in a single file.

I also will test these changes locally.

__all__ = ['RWR']

"""
RWR will construct a directed graph from the provided input file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to note that RWR assumes a directed graph. "From the provided input file" is vague, especially if the user can input a network file and a node prizes file.

- an edge is represented with a head and tail node, which represents the direction of the interation between two nodes
- uses networkx Digraph() object

Expected raw input format:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is edge flux for the input file requirements? And this is the network file, correct?

- the 'type' column should be 1 for edges, 2 for nodes, and 3 for pathways as we want to keep information about nodes, edges, and pathways.
- it can include repeated and bidirectional edges

Expected raw input format for prizes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the prizes file optional, or is it required and your code will generate a stub file with all nodes set to 1.0? Also, I think the description here is for the network file, not the node file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should be renamed to focus on sources/targets, with prizes as a secondary attribute. RWR requires at least sources to run, even if there are no targets or prizes, right?

sources_targets = data.request_node_columns(["sources", "targets"])
if sources_targets is None:
if data.contains_node_columns('prize'):
sources_targets = data.request_node_columns(['prize'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you are assuming that if there is NO sources file but there's a prize file, all the nodes with a prize are sources, right? Add comments to clearly descibe the logic here.

raise ValueError(f"{input_type} filename is missing")

sources_targets = data.request_node_columns(["sources", "targets"])
if sources_targets is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if there are targets but no sources?

node_df = data.request_node_columns(['prize'])
input_df = pd.merge(input_df, node_df, on='NODEID')
else:
#If there aren't prizes but are sources and targets, make prizes based on them
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor - rephrase to be "If there aren't prizes but there are sources and targets, set their prize to be 1.0"


# Skips parameter validation step
@staticmethod
def run(edges=None, prizes = None, output_file = None, single_source = None, df = None, w = None, f = None, threshold = None, container_framework="docker"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should single_source be a Boolean?

df_node = df_node.rename(columns={'Node1': 'Node', 'Node2': 'Pr', 'Edge Flux': 'R_Pr', 'Weight': 'Final_Pr', 'InNetwork' : 'InNetwork'})
df_node.to_csv(node_output_file, sep="\t", index=False, header=True)

df_pathway = df.loc[df['Type'] == 3]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about that Type 3 is. Is this the subgraph after filtering applying the selection function and threshold? A few more comments here (or at the top) about these output types would be helpful.


TEST_DIR = 'test/RWR/'
OUT_FILE_DEFAULT = TEST_DIR+'output/rwr-edges.txt'
OUT_FILE_OPTIONAL = TEST_DIR+'output/rwr-edges-optional.txt'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename OPTIONAL to OPTIONS or OPTIONAL_ARGS - I think that's what you mean (the optional arguments)



@staticmethod
def parse_output(raw_pathway_file, standardized_pathway_file):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now remember that we talked about how your RWR outputs everything as a single file, so you can parse pieces of it in parse_output(). Would you rather output three separate output files? That would be cleaner conceptually. We can ask the group about how to parse multiple output files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants