RWR and TieDIE Integration #92

Closed · wants to merge 26 commits

Commits
- ecec5b6 Create Docker Image for random-walk (Jun 2, 2023)
- 34d3ef3 Implementing src/random_walk.py (questions about the output files) (Jun 3, 2023)
- aff0893 Updating Dockerfile and src/random_walk.py (new image and single raw_… (Jun 5, 2023)
- 5755c62 Make RWWR working (change Dockerfile for current random-walk-with-res… (Jun 6, 2023)
- 13dd12a Make the program running (writing the tests); clean up the codes; upd… (Jun 7, 2023)
- 2e6aaa5 Initiate dockerizing TieDIE repo (Jun 7, 2023)
- c214002 Dockerize TieDIE and integrate it into SPRAS (Jun 8, 2023)
- 78036de Fixing the format of TieDIE output pathway; need to consider RWwR (Jun 9, 2023)
- 45d5733 Update RandomWalk and TieDIE (add user-defined threshold for RWwR and… (Jun 10, 2023)
- fafbff9 Complete the tests for RWR and TieDIE (test and GitHub actions); add … (Jun 16, 2023)
- c958aa3 Merge branch 'master' into RWR_and_TieDIE (agitter, Jun 21, 2023)
- fe863e0 Fixing pre-commit hooks and updating the GitHub Action (Jun 21, 2023)
- 62fc246 Fixing the GitHub Action (Jun 21, 2023)
- becf7bc Updating the Docker Image for RWR and TieDIE (Jun 21, 2023)
- b7d2c94 Completing changes requested (Jun 22, 2023)
- 8e94cf3 Renaming random_walk to rwr (Jun 23, 2023)
- f467db1 Fix a minor bug in workflows/test-spras.yaml (RandomWalk to RWR) (Jun 29, 2023)
- 7833ea5 Resolved some issues suggested by Neha (updating configfile; adding p… (Jul 4, 2023)
- f7f2ba4 Merge branch 'Reed-CompBio:master' into RWR_and_TieDIE (Lyce24, Jul 9, 2023)
- 53faeb0 Resolved some issues suggested by Tony (pushed the ReedCompBio Docker… (Jul 15, 2023)
- b7f007f Merge branch 'RWR_and_TieDIE' of https://github.com/Lyce24/spras into… (Jul 15, 2023)
- 269c14e Pre-commit check (Jul 15, 2023)
- dd75af8 Merge branch 'Reed-CompBio:master' into RWR_and_TieDIE (Lyce24, Jul 19, 2023)
- 1f5a233 Added single source mode to RWR, updated the input file format for RW… (Jul 21, 2023)
- b2d6a8e Updated tests for RWR (Jul 21, 2023)
- 4e25538 Minor updates for RWR (param updates; test updates) (Jul 22, 2023)
20 changes: 20 additions & 0 deletions .github/workflows/test-spras.yml
@@ -81,6 +81,8 @@ jobs:
docker pull reedcompbio/pathlinker:latest
docker pull reedcompbio/meo:latest
docker pull reedcompbio/mincostflow:latest
docker pull reedcompbio/random-walk-with-restart:latest
docker pull reedcompbio/tiedie:latest
- name: Build Omics Integrator 1 Docker image
uses: docker/build-push-action@v1
with:
@@ -126,6 +128,24 @@ jobs:
tags: latest
cache_froms: reedcompbio/mincostflow:latest
push: false
- name: Build RWR Docker image
uses: docker/build-push-action@v1
with:
path: docker-wrappers/RWR/.
dockerfile: docker-wrappers/RWR/Dockerfile
repository: reedcompbio/random-walk-with-restart
tags: latest
cache_froms: reedcompbio/random-walk-with-restart:latest
push: false
- name: Build TieDIE Docker image
uses: docker/build-push-action@v1
with:
path: docker-wrappers/TieDIE/.
dockerfile: docker-wrappers/TieDIE/Dockerfile
repository: reedcompbio/tiedie
tags: latest
cache_froms: reedcompbio/tiedie:latest
push: false

# Run pre-commit checks on source files
pre-commit:
18 changes: 18 additions & 0 deletions config/config.yaml
@@ -33,6 +33,24 @@
run1:
k: range(100,201,100)

- name: "rwr"
params:
include : true
directed : true
run1:
single_source: [1, 0]
df: [0.85, 0.75]
threshold: [0.01, 0.05]
w : [0.02, 0.05]

- name: "tiedie"
params:
include: true
directed: true
run1:
pagerank: [true]
s: [1.1]

- name: "omicsintegrator1"
params:
include: true
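Assuming SPRAS expands each parameter list in a run block into a full cross product of combinations (an assumption about the config semantics; it is not stated in this diff), the `rwr` `run1` block above would yield 2 × 2 × 2 × 2 = 16 parameter sets. A sketch of that expansion:

```python
from itertools import product

# Hypothetical expansion of the rwr run1 parameter lists above into
# individual parameter combinations, assuming cross-product semantics
run1 = {
    "single_source": [1, 0],
    "df": [0.85, 0.75],
    "threshold": [0.01, 0.05],
    "w": [0.02, 0.05],
}

# Pair each value tuple from the cross product with the parameter names
keys = list(run1)
combos = [dict(zip(keys, values)) for values in product(*run1.values())]

print(len(combos))   # 16
print(combos[0])     # {'single_source': 1, 'df': 0.85, 'threshold': 0.01, 'w': 0.02}
```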
8 changes: 8 additions & 0 deletions docker-wrappers/RWR/Dockerfile
@@ -0,0 +1,8 @@
FROM python:3.10.7

WORKDIR /RWR

# installing essential packages
RUN pip install networkx==2.8 numpy==1.24.3 scipy==1.10.1

RUN wget https://raw.githubusercontent.com/Reed-CompBio/random-walk-with-restart/8ca6969fb2fc744edd544535e2ebd67217b0606c/random_walk.py
29 changes: 29 additions & 0 deletions docker-wrappers/RWR/README.md
@@ -0,0 +1,29 @@
# RWR Docker image

A Docker image for the random-walk-with-restart algorithm that is available on [DockerHub](https://hub.docker.com/repository/docker/reedcompbio/random-walk-with-restart).

To create the Docker image run:
```
docker build -t reedcompbio/random-walk-with-restart -f Dockerfile .
```
from this directory.

To inspect the installed Python packages:
```
winpty docker run reedcompbio/random-walk-with-restart pip list
```
The `winpty` prefix is only needed on Windows.

## Testing
Test code is located in `test/RWR`.
The `input` subdirectory contains test files `source_nodes.txt`, `target_nodes.txt` and `edges.txt`.
The Docker wrapper can be tested with `pytest`, or only this algorithm's tests can be run with `pytest -k test_rwr.py`.

Alternatively, to test the Docker image directly, run the following command from the root of the `spras` repository
```
docker run -w /data --mount type=bind,source=/${PWD},target=/data reedcompbio/random-walk-with-restart python random_walk.py \
/data/test/RWR/input/edges.txt /data/test/RWR/input/source_nodes.txt /data/test/RWR/input/target_nodes.txt --damping_factor 0.85 --selection_function min --threshold 0.001 --w 0.0001 --output_file /data/test/RWR/output/output.txt
```
This will run RWR on the test input files and write the output file to `test/RWR/output` in the `spras` repository.
Windows users may need to escape the absolute paths so that `/data` becomes `//data`, etc.

11 changes: 11 additions & 0 deletions docker-wrappers/TieDIE/Dockerfile
@@ -0,0 +1,11 @@
FROM python:2.7.15

WORKDIR /TieDIE

COPY requirements.txt .
RUN pip install -r requirements.txt && \
commit=c64ab5c4b4e0f6cfac4b5151c7d9f1d7ea331e65 && \
wget https://github.com/Reed-CompBio/TieDIE/tarball/$commit && \
tar -zxvf $commit && \
rm $commit && \
mv Reed-CompBio-TieDIE-*/* .
20 changes: 20 additions & 0 deletions docker-wrappers/TieDIE/README.md
@@ -0,0 +1,20 @@
# TieDIE Docker image

A Docker image for [TieDIE](https://github.com/Reed-CompBio/TieDIE) that is available on [DockerHub](https://hub.docker.com/r/reedcompbio/tiedie).

To create the Docker image run:
```
docker build -t reedcompbio/tiedie -f Dockerfile .
```
from this directory.

To inspect the installed Python packages:
```
winpty docker run reedcompbio/tiedie pip list
```
The `winpty` prefix is only needed on Windows.

## Testing
Test code is located in `test/TieDIE`.
The `input` subdirectory contains test files `pathway.txt`, `target.txt` and `source.txt`.
The Docker wrapper can be tested with `pytest`, or only this algorithm's tests can be run with `pytest -k test_tiedie.py`.
3 changes: 3 additions & 0 deletions docker-wrappers/TieDIE/requirements.txt
@@ -0,0 +1,3 @@
networkx==1.11
numpy==1.11.3
scipy==0.18.1
2 changes: 2 additions & 0 deletions src/runner.py
@@ -6,6 +6,8 @@
from src.omicsintegrator1 import OmicsIntegrator1 as omicsintegrator1
from src.omicsintegrator2 import OmicsIntegrator2 as omicsintegrator2
from src.pathlinker import PathLinker as pathlinker
from src.rwr import RWR as rwr
from src.tiedie import TieDIE as tiedie


def run(algorithm, params):
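The lowercase aliases added above suggest a name-based dispatch inside `run`. A hypothetical standalone sketch of that pattern (the registry, `FakeAlg`, and error handling are illustrative stand-ins, not the actual `src/runner.py` implementation):

```python
# A minimal sketch of name-based dispatch: each lowercase alias maps an
# algorithm name from the config to its wrapper class (illustrative only)
class FakeAlg:
    @staticmethod
    def run(**params):
        return params

# Stand-in registry; the real keys would be names like 'rwr' and 'tiedie'
ALGORITHMS = {'fakealg': FakeAlg}

def run(algorithm, params):
    try:
        alg = ALGORITHMS[algorithm.lower()]
    except KeyError as exc:
        raise ValueError(f'Unknown algorithm: {algorithm}') from exc
    return alg.run(**params)

print(run('fakealg', {'k': 10}))  # {'k': 10}
```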
170 changes: 170 additions & 0 deletions src/rwr.py
@@ -0,0 +1,170 @@
import warnings
from pathlib import Path

import pandas as pd

from src.prm import PRM
from src.util import add_rank_column, prepare_volume, run_container

__all__ = ['RWR']

class RWR(PRM):
# RWR needs a weighted edges file and a prizes file that labels each node as a source or target
required_inputs = ['edges', 'prizes']

@staticmethod
def generate_inputs(data, filename_map):
"""
Access fields from the dataset and write the required input files
@param data: dataset
@param filename_map: a dict mapping file types in the required_inputs to the filename for that type
"""
# Ensure the required inputs are in the filename_map
for input_type in RWR.required_inputs:
if input_type not in filename_map:
raise ValueError(f"{input_type} filename is missing")

sources_targets = data.request_node_columns(["sources", "targets"])
if sources_targets is None:
if data.contains_node_columns('prize'):
sources_targets = data.request_node_columns(['prize'])
input_df = sources_targets[["NODEID"]].copy()
input_df["Node type"] = "source"
else:
raise ValueError("No sources, targets, or prizes found in dataset")
else:
both_series = sources_targets.sources & sources_targets.targets
for _index, row in sources_targets[both_series].iterrows():
warn_msg = row.NODEID + " has been labeled as both a source and a target."
# Only use stacklevel 1 because this is due to the data not the code context
warnings.warn(warn_msg, stacklevel=1)

# Create the node type file
input_df = sources_targets[["NODEID"]].copy()
input_df.loc[sources_targets["sources"] == True, "Node type"] = "source"
input_df.loc[sources_targets["targets"] == True, "Node type"] = "target"

if data.contains_node_columns('prize'):
node_df = data.request_node_columns(['prize'])
input_df = pd.merge(input_df, node_df, on='NODEID')
else:
# If there are no prizes but there are sources and targets, assign a default prize of 1.0
input_df['prize'] = 1.0

input_df.to_csv(filename_map["prizes"], sep="\t", index=False, columns=["NODEID", "prize", "Node type"])

# create the network of edges
edges = data.get_interactome()

# Create the edges file containing the head node, tail node, and edge weight
edges.to_csv(filename_map['edges'], sep="\t", index=False, columns=["Interactor1","Interactor2","Weight"])


# RWR does not perform a parameter validation step
@staticmethod
def run(edges=None, prizes=None, output_file=None, single_source=None, df=None, w=None, f=None, threshold=None, singularity=False):
"""
Run RandomWalk with Docker
@param edges: input network file (required)
@param prizes: input node prizes with sources and targets (required)
@param output_file: path to the output pathway file (required)
@param df: damping factor for restarting (default 0.85) (optional)
@param single_source: 1 for single source, 0 for source-target (default 1) (optional)
@param w: lower bound to filter the edges based on the edge confidence (default 0.00) (optional)
@param f: selection function (default 'min') (optional)
@param threshold: threshold for constructing the final pathway (default 0.0001) (optional)
@param singularity: if True, run using the Singularity container instead of the Docker container
"""

if not edges or not prizes or not output_file:
raise ValueError('Required RWR arguments are missing')

work_dir = '/spras'

# Each volume is a (src, dest) tuple that is mounted into the container
volumes = list()

bind_path, edges_file = prepare_volume(edges, work_dir)
volumes.append(bind_path)

bind_path, prizes_file = prepare_volume(prizes, work_dir)
volumes.append(bind_path)


out_dir = Path(output_file).parent

# RWR requires that the output directory exist
out_dir.mkdir(parents=True, exist_ok=True)
bind_path, mapped_out_dir = prepare_volume(str(out_dir), work_dir)
volumes.append(bind_path)
mapped_out_prefix = mapped_out_dir + '/out'  # Use POSIX path inside the container


command = ['python',
'/RWR/random_walk.py',
'--edges_file', edges_file,
'--prizes_file', prizes_file,
'--output_file', mapped_out_prefix]

if single_source is not None:
command.extend(['--single_source', str(single_source)])
if df is not None:
command.extend(['--damping_factor', str(df)])
if f is not None:
command.extend(['--selection_function', str(f)])
if w is not None:
command.extend(['--w', str(w)])
if threshold is not None:
command.extend(['--threshold', str(threshold)])

print('Running RWR with arguments: {}'.format(' '.join(command)), flush=True)


container_framework = 'singularity' if singularity else 'docker'
out = run_container(container_framework,
'reedcompbio/random-walk-with-restart',
command,
volumes,
work_dir)
print(out)

output = Path(out_dir, 'out')
output.rename(output_file)


@staticmethod
def parse_output(raw_pathway_file, standardized_pathway_file):
"""
Convert a predicted pathway into the universal format
@param raw_pathway_file: pathway file produced by an algorithm's run function
@param standardized_pathway_file: the same pathway written in the universal format
"""

df = pd.read_csv(raw_pathway_file, sep="\t")

pathway_output_file = standardized_pathway_file
edge_output_file = standardized_pathway_file.replace('.txt', '') + '_edges.txt'
node_output_file = standardized_pathway_file.replace('.txt', '') + '_nodes.txt'

# Select the edge rows (Type 1)
df_edge = df.loc[df["Type"] == 1]

# Drop the Type indicator column and write the edges to a file
df_edge = df_edge.drop(columns=['Type'])
df_edge.to_csv(edge_output_file, sep="\t", index=False, header=True)

# Select the node rows (Type 2)
df_node = df.loc[df['Type'] == 2]
# Rename the columns to Node, Pr, R_Pr, Final_Pr
df_node = df_node.drop(columns=['Type'])
df_node = df_node.rename(columns={'Node1': 'Node', 'Node2': 'Pr', 'Edge Flux': 'R_Pr', 'Weight': 'Final_Pr'})
df_node.to_csv(node_output_file, sep="\t", index=False, header=True)

# Select the final pathway rows (Type 3) and keep only the node pair columns
df_pathway = df.loc[df['Type'] == 3]
df_pathway = df_pathway.drop(columns=['Type', 'InNetwork', 'Weight', 'Edge Flux'])
# Add a column of 1s to represent the rank
df_pathway = add_rank_column(df_pathway)
df_pathway.to_csv(pathway_output_file, sep="\t", index=False, header=False)
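The `Type`-based splitting in `parse_output` can be illustrated with a small in-memory frame (a sketch; the toy values are made up, and the real column layout comes from the RWR container's raw output file):

```python
import pandas as pd

# Toy raw RWR output: Type 1 rows are edges, Type 2 rows are node scores,
# and Type 3 rows are the final pathway edges (all values illustrative)
df = pd.DataFrame({
    "Node1": ["A", "B", "A", "A"],
    "Node2": ["B", "C", "0.5", "B"],
    "Edge Flux": [0.1, 0.2, 0.3, None],
    "Weight": [1.0, 0.8, 0.9, None],
    "InNetwork": [1, 1, 1, None],
    "Type": [1, 1, 2, 3],
})

# Split the single raw file into edge, node, and pathway tables by Type
edges = df.loc[df["Type"] == 1].drop(columns=["Type"])
nodes = (df.loc[df["Type"] == 2]
           .drop(columns=["Type"])
           .rename(columns={"Node1": "Node", "Node2": "Pr",
                            "Edge Flux": "R_Pr", "Weight": "Final_Pr"}))
pathway = df.loc[df["Type"] == 3].drop(columns=["Type", "InNetwork", "Weight", "Edge Flux"])

print(len(edges), len(nodes), len(pathway))  # 2 1 1
```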