RWR and TieDIE Integration #92

Closed
wants to merge 26 commits into from
Changes from 14 commits
Commits
26 commits
ecec5b6
Create Docker Image for random-walk
Jun 2, 2023
34d3ef3
Implementing src/random_walk.py (questions about the output files)
Jun 3, 2023
aff0893
Updating Dockerfile and src/random_walk.py (new image and single raw_…
Jun 5, 2023
5755c62
Make RWWR working (change Dockerfile for current random-walk-with-res…
Jun 6, 2023
13dd12a
Make the program running (writing the tests); clean up the codes; upd…
Jun 7, 2023
2e6aaa5
Initiate dockerizing TieDIE repo
Jun 7, 2023
c214002
Dockerize TieDIE and integrate it into SPRAS
Jun 8, 2023
78036de
Fixing the format of TieDIE output pathway; need to consider RWwR
Jun 9, 2023
45d5733
Update RandomWalk and TieDIE (add user-defined threshold for RWwR and…
Jun 10, 2023
fafbff9
Complete the tests for RWR and TieDIE (test and GitHub actions); add …
Jun 16, 2023
c958aa3
Merge branch 'master' into RWR_and_TieDIE
agitter Jun 21, 2023
fe863e0
Fixing pre-commit hooks and updating the GitHub Action
Jun 21, 2023
62fc246
Fixing the GitHub Action
Jun 21, 2023
becf7bc
Updating the Docker Image for RWR and TieDIE
Jun 21, 2023
b7d2c94
Completing changes requested
Jun 22, 2023
8e94cf3
Renaming random_walk to rwr
Jun 23, 2023
f467db1
Fix a minor bug in workflows/test-spras.yaml (RandomWalk to RWR)
Jun 29, 2023
7833ea5
Resolved some issues suggested by Neha (updating configfile; adding p…
Jul 4, 2023
f7f2ba4
Merge branch 'Reed-CompBio:master' into RWR_and_TieDIE
Lyce24 Jul 9, 2023
53faeb0
Resolved some issues suggested by Tony (pushed the ReedCompBio Docker…
Jul 15, 2023
b7f007f
Merge branch 'RWR_and_TieDIE' of https://github.com/Lyce24/spras into…
Jul 15, 2023
269c14e
Pre-commit check
Jul 15, 2023
dd75af8
Merge branch 'Reed-CompBio:master' into RWR_and_TieDIE
Lyce24 Jul 19, 2023
1f5a233
Added single source mode to RWR, updated the input file format for RW…
Jul 21, 2023
b2d6a8e
Updated tests for RWR
Jul 21, 2023
4e25538
Minor updates for RWR (param updates; test updates)
Jul 22, 2023
20 changes: 20 additions & 0 deletions .github/workflows/test-spras.yml
@@ -81,6 +81,8 @@ jobs:
docker pull reedcompbio/pathlinker:latest
docker pull reedcompbio/meo:latest
docker pull reedcompbio/mincostflow:latest
docker pull erikliu24/rwwr:latest
docker pull erikliu24/tiedie:latest
Collaborator

Adding a reminder to change these to reedcompbio after @annaritz pushes the containers to the organization account. Same goes for the steps below.

Author

Yes, I will change this file once Anna pushes the images to reedcompbio. I will make the same change in src/random_walk.py and src/tiedie.py, since the run functions in those two files also pull images from my personal account.

Collaborator

I built and pushed both of these to DockerHub so you can make the test workflow changes now:

- name: Build Omics Integrator 1 Docker image
uses: docker/build-push-action@v1
with:
@@ -126,6 +128,24 @@ jobs:
tags: latest
cache_froms: reedcompbio/mincostflow:latest
push: false
- name: Build RWR Docker image
uses: docker/build-push-action@v1
with:
path: docker-wrappers/RandomWalk/.
dockerfile: docker-wrappers/RandomWalk/Dockerfile
repository: erikliu24/rwwr
tags: latest
cache_froms: erikliu24/rwwr:latest
push: false
- name: Build TieDIE Docker image
uses: docker/build-push-action@v1
with:
path: docker-wrappers/TieDIE/.
dockerfile: docker-wrappers/TieDIE/Dockerfile
repository: erikliu24/tiedie
tags: latest
cache_froms: erikliu24/tiedie:latest
push: false

# Run pre-commit checks on source files
pre-commit:
28 changes: 23 additions & 5 deletions config/config.yaml
@@ -28,14 +28,32 @@
algorithms:
- name: "pathlinker"
params:
include: true
include: false
directed: true
run1:
k: range(100,201,100)

- name: "omicsintegrator1"
- name: "local_neighborhood"
params:
include : false
directed : false

- name: "random_walk"
params:
include : true
directed : true

- name: "tiedie"
params:
include: true
directed: true
run1:
pagerank: [true]
s: [1.1]

- name: "omicsintegrator1"
params:
include: false
directed: false
run1:
r: [5]
@@ -46,7 +64,7 @@

- name: "omicsintegrator2"
params:
include: true
include: false
directed: false
run1:
b: [4]
@@ -56,7 +74,7 @@
g: [3]
- name: "meo"
params:
include: true
include: false
directed: true
run1:
max_path_length: [3]
@@ -65,7 +83,7 @@

- name: "mincostflow"
params:
include: true
include: false
directed: false
run1:
flow: [1] # The flow must be an int
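
The `run1` blocks above list candidate values per parameter, for example `k: range(100,201,100)` for PathLinker and `pagerank: [true]`, `s: [1.1]` for TieDIE. A minimal sketch of how such lists could be expanded into individual runs, assuming the range string behaves like Python's built-in `range` and that values are combined as a Cartesian product (both are assumptions about SPRAS, and `expand_run` is a hypothetical helper, not part of this PR):

```python
from itertools import product


def expand_run(params):
    """Return one dict per combination of the listed parameter values.

    Hypothetical helper: illustrates a Cartesian-product expansion only.
    """
    names = sorted(params)
    value_lists = [params[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Assumption: "range(100,201,100)" in config.yaml is read like Python's range.
pathlinker_run1 = {"k": list(range(100, 201, 100))}
tiedie_run1 = {"pagerank": [True], "s": [1.1]}

print(expand_run(pathlinker_run1))
print(expand_run(tiedie_run1))
```

Under these assumptions PathLinker would run twice (k = 100 and k = 200) and TieDIE once.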
10 changes: 10 additions & 0 deletions docker-wrappers/RandomWalk/Dockerfile
@@ -0,0 +1,10 @@
FROM python:3.10.7

WORKDIR /RandomWalk

# installing essential packages
RUN pip install networkx==2.8
RUN pip install numpy==1.24.3
RUN pip install scipy==1.10.1

RUN wget https://raw.githubusercontent.com/Reed-CompBio/random-walk-with-restart/ef6bd61e0c866c13205ae94c1301827817dc1abb/random_walk.py
29 changes: 29 additions & 0 deletions docker-wrappers/RandomWalk/README.md
@@ -0,0 +1,29 @@
# RWwR Docker image

A Docker image for the random-walk-with-restart algorithm that is available on [DockerHub](https://hub.docker.com/repository/docker/erikliu24/rwwr).

To create the Docker image run:
```
docker build -t erikliu24/rwwr -f Dockerfile .
```
from this directory.

To inspect the installed Python packages:
```
winpty docker run erikliu24/rwwr pip list
```
The `winpty` prefix is only needed on Windows.

## Testing
Test code is located in `test/RandomWalk`.
The `input` subdirectory contains test files `source_nodes.txt`, `target_nodes.txt` and `edges.txt`.
The Docker wrapper can be tested with `pytest`.

Alternatively, to test the Docker image directly, run the following command from the root of the `spras` repository
```
docker run -w /data --mount type=bind,source=/${PWD},target=/data erikliu24/rwwr python /RandomWalk/random_walk.py \
  --edges_file /data/test/RandomWalk/input/edges.txt --sources_file /data/test/RandomWalk/input/source_nodes.txt --targets_file /data/test/RandomWalk/input/target_nodes.txt \
  --damping_factor 0.85 --selection_function min --threshold 0.001 --output_file /data/test/RandomWalk/output/output.txt
```
This will run RWR on the test input files and write the output to `test/RandomWalk/output/output.txt`; the `output` directory must already exist.
Windows users may need to escape the absolute paths so that `/data` becomes `//data`, etc.
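
For readers unfamiliar with the algorithm, the following is a minimal pure-Python sketch of random walk with restart by power iteration. It is not the code in the Reed-CompBio repository. It assumes `--damping_factor` plays the same role as the PageRank damping factor (the probability of following an edge rather than restarting), and `random_walk_with_restart` and the toy graph are illustrative names and data only.

```python
def random_walk_with_restart(edges, restart_nodes, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for RWR on a weighted directed graph.

    edges: list of (tail, head, weight) tuples.
    restart_nodes: nodes the walker jumps back to when it restarts.
    """
    restart_nodes = set(restart_nodes)
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    out_weight = {n: 0.0 for n in nodes}
    for u, _, w in edges:
        out_weight[u] += w

    # Uniform restart distribution over the restart nodes
    restart = {n: (1.0 / len(restart_nodes) if n in restart_nodes else 0.0) for n in nodes}
    prob = dict(restart)

    for _ in range(max_iter):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        # Follow outgoing edges in proportion to their weights
        for u, v, w in edges:
            nxt[v] += damping * prob[u] * w / out_weight[u]
        # Probability on nodes without outgoing edges returns to the restart set
        dangling = sum(prob[n] for n in nodes if out_weight[n] == 0.0)
        for n in nodes:
            nxt[n] += damping * dangling * restart[n]
        delta = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if delta < tol:
            break
    return prob


# Toy graph: three sources feed one hub, which feeds three targets.
edges = [("A", "D", 1.0), ("B", "D", 1.0), ("C", "D", 1.0),
         ("D", "E", 1.0), ("D", "F", 1.0), ("D", "G", 1.0)]
scores = random_walk_with_restart(edges, ["A", "B", "C"])
for node in sorted(scores):
    print(node, round(scores[node], 4))
```

On this toy graph the hub `D` receives the highest stationary probability, since every walk from a source passes through it.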

11 changes: 11 additions & 0 deletions docker-wrappers/TieDIE/Dockerfile
@@ -0,0 +1,11 @@
FROM python:2.7.15

WORKDIR /TieDIE

COPY requirements.txt .
RUN pip install -r requirements.txt && \
commit=c64ab5c4b4e0f6cfac4b5151c7d9f1d7ea331e65 && \
wget https://github.com/Reed-CompBio/TieDIE/tarball/$commit && \
tar -zxvf $commit && \
rm $commit && \
mv Reed-CompBio-TieDIE-*/* .
21 changes: 21 additions & 0 deletions docker-wrappers/TieDIE/README.md
@@ -0,0 +1,21 @@
# TieDIE Docker image

A Docker image for [TieDIE](https://github.com/epaull/TieDIE) that is available on [DockerHub](https://hub.docker.com/r/erikliu24/tiedie).

To create the Docker image run:
```
docker build -t erikliu24/tiedie -f Dockerfile .
```
from this directory.

To inspect the installed Python packages:
```
winpty docker run erikliu24/tiedie pip list
```
The `winpty` prefix is only needed on Windows.

## Testing
Test code is located in `test/TieDIE`.
The `input` subdirectory contains test files `pathway.txt`, `target.txt` and `source.txt`.
The Docker wrapper can be tested with `pytest`.

Binary file added docker-wrappers/TieDIE/Tutorial.pdf
Binary file not shown.
3 changes: 3 additions & 0 deletions docker-wrappers/TieDIE/downstream.txt
@@ -0,0 +1,3 @@
E 1 +
F 1 +
G 1 +
6 changes: 6 additions & 0 deletions docker-wrappers/TieDIE/pathway.txt
@@ -0,0 +1,6 @@
A -a> D
B -a> D
C -a> D
D -a> E
D -a> F
D -a> G
3 changes: 3 additions & 0 deletions docker-wrappers/TieDIE/requirements.txt
@@ -0,0 +1,3 @@
networkx==1.11
numpy==1.11.3
scipy==0.18.1
3 changes: 3 additions & 0 deletions docker-wrappers/TieDIE/upstream.txt
@@ -0,0 +1,3 @@
A 1 +
B 1 +
C 1 +
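
The three example files added above (`upstream.txt`, `pathway.txt`, `downstream.txt`) can be sanity-checked with a short script. This is a sketch only: it assumes the node files hold `node heat sign` per line and the pathway file holds `source interaction target` per line, which matches the files in this diff but is not confirmed here against the TieDIE documentation. The helper names are hypothetical.

```python
from collections import deque

PATHWAY = "A -a> D\nB -a> D\nC -a> D\nD -a> E\nD -a> F\nD -a> G\n"
UPSTREAM = "A 1 +\nB 1 +\nC 1 +\n"
DOWNSTREAM = "E 1 +\nF 1 +\nG 1 +\n"


def parse_pathway(text):
    """Parse 'source interaction target' lines into tuples."""
    return [tuple(line.split()) for line in text.splitlines() if line.strip()]


def parse_heats(text):
    """Parse 'node heat sign' lines into {node: (heat, sign)}."""
    heats = {}
    for line in text.splitlines():
        if line.strip():
            node, heat, sign = line.split()
            heats[node] = (float(heat), sign)
    return heats


def reachable(edges, start):
    """Return every node reachable from start by following directed edges."""
    adjacency = {}
    for source, _, target in edges:
        adjacency.setdefault(source, []).append(target)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adjacency.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


pathway = parse_pathway(PATHWAY)
upstream = parse_heats(UPSTREAM)
downstream = parse_heats(DOWNSTREAM)

# Every upstream node should be able to reach at least one downstream node.
connected = all(reachable(pathway, node) & set(downstream) for node in upstream)
print(connected)
```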
155 changes: 155 additions & 0 deletions src/random_walk.py
@@ -0,0 +1,155 @@
import warnings
from pathlib import Path

import pandas as pd

from src.prm import PRM
from src.util import add_rank_column, prepare_volume, run_container

__all__ = ['RandomWalk']

class RandomWalk(PRM):
    # RWR needs weighted edges, a source set (with prizes), and a target set (with prizes).
    required_inputs = ['edges', 'sources', 'targets']

    @staticmethod
    def generate_inputs(data, filename_map):
        """
        Access fields from the dataset and write the required input files
        @param data: dataset
        @param filename_map: a dict mapping file types in the required_inputs to the filename for that type
        @return:
        """
        # Ensure every required input is present in the filename_map
        for input_type in RandomWalk.required_inputs:
            if input_type not in filename_map:
                raise ValueError(f"{input_type} filename is missing")

        # Write the sources to a file, then repeat for the targets
        for node_type in ['sources', 'targets']:
            nodes = data.request_node_columns([node_type])
            # Check whether the dataset provides prizes
            if data.contains_node_columns('prize'):
                node_df = data.request_node_columns(['prize'])
                nodes = pd.merge(nodes, node_df, on='NODEID')
            else:
                # Without prizes, give every source/target node a prize of 1
                nodes['prize'] = 1.0
            # Write the NODEID and prize columns, space-separated
            # (to_csv keeps the header row unless header=False is passed)
            nodes.to_csv(filename_map[node_type], index=False, sep=" ", columns=['NODEID', 'prize'])

        # Get the network of edges
        edges = data.get_interactome()

        # Write the edges file: head node, tail node, then the edge weight
        edges.to_csv(filename_map['edges'], sep=" ", index=False, columns=["Interactor1","Interactor2","Weight"])
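
To make the file layout produced by `generate_inputs` concrete, here is a standard-library sketch of the node files it writes: a space-separated `NODEID prize` table where the prize defaults to 1.0 when the dataset has none. `write_node_file` is a hypothetical stand-in for the pandas calls above, and the node names are made up.

```python
import csv
import io


def write_node_file(node_ids, prizes=None):
    """Return the text of a space-separated NODEID/prize file.

    Hypothetical helper mirroring the pandas logic: missing prizes become 1.0.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=" ", lineterminator="\n")
    writer.writerow(["NODEID", "prize"])
    for node in node_ids:
        prize = 1.0 if prizes is None else prizes[node]
        writer.writerow([node, prize])
    return buffer.getvalue()


without_prizes = write_node_file(["A", "B"])
with_prizes = write_node_file(["A", "B"], {"A": 2.5, "B": 0.3})
print(without_prizes)
print(with_prizes)
```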


    # Skips parameter validation step
    @staticmethod
    def run(edges=None, sources=None, targets=None, output_file=None, df: float = 0.85, f: str = 'min', threshold: float = 0.0001, singularity=False):
        """
        Run RandomWalk with Docker
        @param edges: input network file (required)
        @param sources: input file of source nodes (required)
        @param targets: input file of target nodes (required)
        @param output_file: path to the output pathway file (required)
        @param df: damping factor for restarting (default 0.85) (optional)
        @param f: selection function (default 'min') (optional)
        @param threshold: threshold for constructing the final pathway (default 0.0001) (optional)
        @param singularity: if True, run using the Singularity container instead of the Docker container
        """

        if not edges or not sources or not targets or not output_file:
            raise ValueError('Required RandomWalk arguments are missing')

        work_dir = '/spras'

        # Each volume is a tuple (src, dest) - data generated by Docker
        volumes = list()

        bind_path, edges_file = prepare_volume(edges, work_dir)
        volumes.append(bind_path)

        bind_path, sources_file = prepare_volume(sources, work_dir)
        volumes.append(bind_path)

        bind_path, targets_file = prepare_volume(targets, work_dir)
        volumes.append(bind_path)

        out_dir = Path(output_file).parent
        # RandomWalk requires that the output directory exist
        out_dir.mkdir(parents=True, exist_ok=True)
        bind_path, mapped_out_dir = prepare_volume(str(out_dir), work_dir)
        volumes.append(bind_path)
        mapped_out_prefix = mapped_out_dir + '/out'  # Use posix path inside the container

        command = ['python',
                   '/RandomWalk/random_walk.py',
                   '--edges_file', edges_file,
                   '--sources_file', sources_file,
                   '--targets_file', targets_file,
                   '--damping_factor', str(df),
                   '--selection_function', f,
                   '--threshold', str(threshold),
                   '--output_file', mapped_out_prefix]

        print('Running RandomWalk with arguments: {}'.format(' '.join(command)), flush=True)

        container_framework = 'singularity' if singularity else 'docker'
        out = run_container(container_framework,
                            'erikliu24/rwwr',
                            command,
                            volumes,
                            work_dir)
        print(out)

        # The container writes to <out_dir>/out; move that file to the requested output path
        output = Path(out_dir, 'out')
        output.rename(output_file)

        # From edge_output_file, construct a pathway file in the universal format
        # 1. Stop when the source and targets are connected.
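
The command assembled in `run` can be exercised without Docker. The sketch below isolates just that step; `build_rwr_command` is a hypothetical helper (the PR builds the list inline), and the file paths are placeholders. Every element has to be a string because the list is joined with `' '.join(command)` before being printed.

```python
def build_rwr_command(edges_file, sources_file, targets_file, output_file,
                      df=0.85, f="min", threshold=0.0001):
    """Build the argument list passed to the RWR container.

    Hypothetical helper; flag names are copied from the run function above.
    """
    if not edges_file or not sources_file or not targets_file or not output_file:
        raise ValueError("Required RandomWalk arguments are missing")
    return ["python",
            "/RandomWalk/random_walk.py",
            "--edges_file", edges_file,
            "--sources_file", sources_file,
            "--targets_file", targets_file,
            "--damping_factor", str(df),          # numbers are converted to strings
            "--selection_function", f,
            "--threshold", str(threshold),
            "--output_file", output_file]


command = build_rwr_command("/spras/edges.txt", "/spras/sources.txt",
                            "/spras/targets.txt", "/spras/out")
print(" ".join(command))
```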

    @staticmethod
    def parse_output(raw_pathway_file, standardized_pathway_file):
        """
        Convert a predicted pathway into the universal format
        @param raw_pathway_file: pathway file produced by an algorithm's run function
        @param standardized_pathway_file: the same pathway written in the universal format
        """
        print('Parsing random-walk-with-restart output')

        df = pd.read_csv(raw_pathway_file, sep="\t")

        pathway_output_file = standardized_pathway_file
        edge_output_file = standardized_pathway_file.replace('.txt', '') + '_edges.txt'
        node_output_file = standardized_pathway_file.replace('.txt', '') + '_nodes.txt'

        # Rows with Type 1 describe edges
        df_edge = df.loc[df["Type"] == 1]

        # Drop the Placeholder and Type columns and write the edges to a file
        df_edge = df_edge.drop(columns=['Placeholder'])
        df_edge = df_edge.drop(columns=['Type'])
        df_edge.to_csv(edge_output_file, sep="\t", index=False, header=True)

        # Rows with Type 2 describe nodes
        df_node = df.loc[df['Type'] == 2]
        # Rename the columns to Node, Pr, R_Pr, Final_Pr
        df_node = df_node.drop(columns=['Type'])
        df_node = df_node.rename(columns={'Node1': 'Node', 'Node2': 'Pr', 'Weight': 'R_Pr', 'Placeholder': 'Final_Pr'})
        df_node.to_csv(node_output_file, sep="\t", index=False, header=True)

        # Rows with Type 3 describe the final pathway
        df_pathway = df.loc[df['Type'] == 3]
        df_pathway = df_pathway.drop(columns=['Placeholder'])
        df_pathway = df_pathway.drop(columns=['Type'])
        df_pathway = df_pathway.drop(columns=['Weight'])
        # Add a column of 1s to represent the rank
        df_pathway = add_rank_column(df_pathway)
        df_pathway.to_csv(pathway_output_file, sep="\t", index=False, header=False)
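
`parse_output` relies on a single raw file whose `Type` column distinguishes edge rows (1), node rows (2), and pathway rows (3). The sketch below shows the same split with the standard library on made-up data. The column names come from the code above, but the sample values and the helper name `split_raw_output` are assumptions for illustration.

```python
import csv
import io

# Made-up raw output: one edge row, one node row, one pathway row.
RAW = (
    "Node1\tNode2\tWeight\tPlaceholder\tType\n"
    "A\tD\t0.5\t\t1\n"
    "D\t0.3\t0.2\t0.06\t2\n"
    "A\tD\t0.5\t\t3\n"
)


def split_raw_output(raw_text):
    """Split the combined raw file into edge, node, and pathway records."""
    edges, nodes, pathway = [], [], []
    for row in csv.DictReader(io.StringIO(raw_text), delimiter="\t"):
        if row["Type"] == "1":
            edges.append((row["Node1"], row["Node2"], row["Weight"]))
        elif row["Type"] == "2":
            # Node rows reuse the edge columns under different meanings
            nodes.append({"Node": row["Node1"], "Pr": row["Node2"],
                          "R_Pr": row["Weight"], "Final_Pr": row["Placeholder"]})
        elif row["Type"] == "3":
            # Pathway rows keep the two endpoints plus a rank of 1
            pathway.append((row["Node1"], row["Node2"], "1"))
    return edges, nodes, pathway


edges, nodes, pathway = split_raw_output(RAW)
print("\n".join("\t".join(record) for record in pathway))
```

Reusing `Node2`, `Weight`, and `Placeholder` for node scores keeps the raw output in one file, at the cost of column names that only make sense once the `Type` value is known.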
2 changes: 2 additions & 0 deletions src/runner.py
@@ -6,6 +6,8 @@
from src.omicsintegrator1 import OmicsIntegrator1 as omicsintegrator1
from src.omicsintegrator2 import OmicsIntegrator2 as omicsintegrator2
from src.pathlinker import PathLinker as pathlinker
from src.random_walk import RandomWalk as random_walk
from src.tiedie import TieDIE as tiedie


def run(algorithm, params):