[WIP] Integration of ProteoRE galaxy components in Galaxy-P #535

Open
wants to merge 1,061 commits into base: master
Commits
3c43825
Update add_expression_data.xml
yvandenb Mar 7, 2019
b07edb1
modif for the make_dotplot function
combesf Mar 7, 2019
be4872f
Merge branch 'master' of https://github.com/ifb-git/ProteoRE
combesf Mar 7, 2019
29742cd
typo
combesf Mar 7, 2019
fd41122
add expression data: copy paste ids fix
davidchristiany Mar 7, 2019
70623e3
typo again
combesf Mar 7, 2019
77365da
Merge branch 'master' of https://github.com/ifb-git/ProteoRE
davidchristiany Mar 7, 2019
dc43865
add expression data: small fix
davidchristiany Mar 7, 2019
ef1fc6c
Update build_protein_interaction_maps.xml
yvandenb Mar 7, 2019
6a1da10
build protein interaction: humap dictionary modified
davidchristiany Mar 8, 2019
a6106dd
data manager: humap dictionary now incorporate interactant A and B as…
davidchristiany Mar 8, 2019
723ec8e
Merge branch 'master' of https://github.com/ifb-git/ProteoRE
davidchristiany Mar 8, 2019
fe64568
build protein interaction maps: add tool-data
davidchristiany Mar 8, 2019
c9adb94
data manager: nextprot ref file for add protein features
davidchristiany Mar 11, 2019
7e06dde
add protein features: handle data manager update
davidchristiany Mar 11, 2019
c0511de
add protein features: loc file corrected
davidchristiany Mar 11, 2019
7a401bd
filter tool: add error message for wrong column
davidchristiany Mar 11, 2019
f010543
add protein features: nextprotID removed fixed
davidchristiany Mar 11, 2019
8e7f43d
add protein features: nextprot ref of 2018
davidchristiany Mar 12, 2019
ca41f64
data manager: small fix
davidchristiany Mar 12, 2019
07862d0
Update build_protein_interaction_maps.xml
yvandenb Mar 12, 2019
a78dfe4
build protein interaction maps: protein name column for nodes file added
davidchristiany Mar 12, 2019
54c8cac
protein name dictionary added for humap dictionary
davidchristiany Mar 12, 2019
251507c
data manager documentation
davidchristiany Mar 13, 2019
b280b5a
get expression profiles: loc file corrected
davidchristiany Mar 13, 2019
fe9749e
version number
davidchristiany Mar 13, 2019
1befc97
filter tool: 'discarded' output in second position
davidchristiany Mar 15, 2019
b0d4004
venn diagram: force user to give list names
davidchristiany Mar 15, 2019
e841cdb
Bug correction of transmb domains nb
combesf Apr 4, 2019
55ac698
Update resource_building.xml
combesf Apr 4, 2019
1aee5b6
Open output file before loop
vloux Apr 17, 2019
dcefa21
Syntax
vloux Apr 17, 2019
4d4cd29
Xml galaxy version
vloux Apr 17, 2019
2846693
bugfix array reinitialisation
vloux Apr 17, 2019
94840df
List reinitialisation
vloux Apr 17, 2019
911b6d9
Add travis configuration file
vloux Apr 24, 2019
8addbd9
Travis yml
vloux Apr 24, 2019
d53895b
Travis yml
vloux Apr 24, 2019
e9368a5
get msms tool : handle multiple ids separated by ';'
davidchristiany May 2, 2019
d5c06a1
data manager : id key of nextprot data table corrected
davidchristiany May 2, 2019
958be32
data manager corrected
davidchristiany May 6, 2019
580a5d2
data table name corrected
davidchristiany May 6, 2019
371b7ef
id converter : one id per line in output
davidchristiany May 6, 2019
afe86b1
id converter : sort and data table new column
davidchristiany May 7, 2019
890e12c
data manager : reviewed uniprot-AC
davidchristiany May 7, 2019
986c196
id converter: remove comments tag in data tables
davidchristiany May 9, 2019
157ee4e
id converter update
davidchristiany May 9, 2019
37748da
data manager update
davidchristiany May 9, 2019
50ce41f
add protein data: new column in data table
davidchristiany May 10, 2019
a964b0c
id conv: multiple ids per line in output for some ids
davidchristiany May 10, 2019
9e9e86b
add protein features: ref file of May 7
davidchristiany May 10, 2019
5a25063
id converter: shed file corrected
davidchristiany May 10, 2019
521f7f6
venn diagram : handle multiple ids separated by ';' in a line
davidchristiany May 13, 2019
20d9e92
get expression profiles : handle
davidchristiany May 13, 2019
b942f46
goprofiles : handle multiple ids per line in input
davidchristiany May 14, 2019
1c4adb5
get msms obs : handle ';' separated ids in copy/paste
davidchristiany May 14, 2019
a45e12e
reactome : handle multiple ids per line (';' separated)
davidchristiany May 14, 2019
9274029
kegg visualisation : handle
davidchristiany May 14, 2019
8b760aa
cluster profiler : handle multiple ids per line
davidchristiany May 14, 2019
c865a3e
add protein features : handle ';' in copy/paste
davidchristiany May 14, 2019
bb99bd3
add protein features mouse : handle ';' in copy/paste
davidchristiany May 14, 2019
2c9ad87
id converter : back to multiple ids per line in output
davidchristiany May 16, 2019
1c92193
data tables update to sort (data manager tools)
davidchristiany May 16, 2019
e3c32fa
build protein interaction tool : date format modified for sorting
davidchristiany May 17, 2019
c98c8fb
get expression profiles : sort corrected
davidchristiany May 17, 2019
8ec2dc6
data manager : data tables modified for Build prot prot interaction
davidchristiany May 17, 2019
a4d07c9
add protein features : planemo test
davidchristiany May 23, 2019
3f97bce
build protein interaction : planemo tests
davidchristiany May 24, 2019
dc360cd
shed file corrected
davidchristiany May 24, 2019
09371da
id converter : loc.sample
davidchristiany May 27, 2019
b594c60
id converter : ref files update
davidchristiany May 27, 2019
9fc25a8
id converter : loc.sample corrected
davidchristiany May 27, 2019
d4c0e26
id converter : loc.sample corrected
davidchristiany May 27, 2019
c879e86
data manager update
davidchristiany Jun 3, 2019
9a91e89
cluster profiler : doc
davidchristiany Jun 3, 2019
c66cab5
small fix
davidchristiany Jun 3, 2019
900daaf
topGo : doc + planemo test
davidchristiany Jun 3, 2019
f31b001
topGo : shed file
davidchristiany Jun 3, 2019
433a63c
id converter : remove duplicated lines and NA in dictionary
davidchristiany Jun 4, 2019
3837521
goprofiles : duplicated ids
davidchristiany Jun 4, 2019
d512cd3
goprofiles : small fix
davidchristiany Jun 4, 2019
112e097
releases notes
davidchristiany Jun 11, 2019
0f3b58d
data manager : uniprot ID non reviewed deleted
davidchristiany Jun 13, 2019
fe869fa
id converter: releases modified
davidchristiany Jun 14, 2019
528e5a2
id conv
davidchristiany Jun 14, 2019
d9673b8
Update build_protein_interaction_maps.xml
yvandenb Jun 17, 2019
b6dcf12
Update get_expression_profiles.xml
yvandenb Jun 17, 2019
d712735
Update Get_ms-ms_observations.xml
yvandenb Jun 17, 2019
83dc1e1
Update add_expression_data.xml
yvandenb Jun 17, 2019
bc18a75
Update add_protein_features.xml
yvandenb Jun 17, 2019
23a20b9
id converter: doc update
davidchristiany Jun 18, 2019
1ca301b
Merge branch 'master' of https://github.com/ifb-git/ProteoRE
davidchristiany Jun 18, 2019
c6f432c
Update reactome_analysis.xml
yvandenb Jun 18, 2019
3f094b4
Update add_protein_features_mouse.xml
yvandenb Jun 18, 2019
5d40a17
Update cluster_profiler.xml
yvandenb Jun 18, 2019
b1605cc
id conv doc
davidchristiany Jun 19, 2019
a3539c2
Update goprofiles.xml
yvandenb Jun 20, 2019
d0cd422
Update build_protein_interaction_maps.xml
yvandenb Jun 24, 2019
e80cd83
Update Build_tissue-specific_expression_dataset.xml
yvandenb Jun 24, 2019
b1b2bde
Update get_expression_profiles.xml
yvandenb Jun 24, 2019
b7eee7b
Update Get_ms-ms_observations.xml
yvandenb Jun 24, 2019
11e0841
Update get_expression_profiles.xml
yvandenb Jun 24, 2019
24e8c54
Update add_expression_data.xml
yvandenb Jun 24, 2019
d400c4d
Update add_protein_features.xml
yvandenb Jun 24, 2019
f94a76b
Update add_protein_features_mouse.xml
yvandenb Jun 24, 2019
01b64d8
Update goprofiles.xml
yvandenb Jun 24, 2019
3fd1a38
Update cluster_profiler.xml
yvandenb Jun 24, 2019
9cf793b
Update filter_kw_val.xml
yvandenb Jun 24, 2019
3b6c711
Update heatmap.xml
yvandenb Jun 24, 2019
1993472
Update id_converter.xml
yvandenb Jun 24, 2019
81ec10e
Update kegg_identification.xml
yvandenb Jun 24, 2019
1b7cebd
Update kegg_maps_visualization.xml
yvandenb Jun 24, 2019
8d822c1
Update reactome_analysis.xml
yvandenb Jun 24, 2019
8656657
Update topGO.xml
yvandenb Jun 24, 2019
59aba70
Update venn_diagram.xml
yvandenb Jun 24, 2019
3aedff6
Merge branch 'master' of https://github.com/ifb-git/ProteoRE
davidchristiany Jun 25, 2019
3095ee6
tools doc update for release 2.0
davidchristiany Jun 27, 2019
ebfe0bb
kegg maps visu variable fixed
davidchristiany Jun 27, 2019
eccea9f
add protein features : variable fix
davidchristiany Jun 27, 2019
b1611d9
09-03-2019 release removed from add protein features
davidchristiany Jun 27, 2019
607ebe6
shed file corrected
davidchristiany Jun 27, 2019
c67c4ea
cluster profiler: working version
davidchristiany Jun 28, 2019
2ad0846
Update add_protein_features.xml
vloux Jul 11, 2019
a2664c9
venn diagram font-size
davidchristiany Sep 12, 2019
fab6515
venn diagram version number
davidchristiany Sep 12, 2019
48d77f6
Update goprofiles.xml
vloux Sep 25, 2019
85f9e58
Update goprofiles.xml
vloux Sep 25, 2019
cc15dd1
Update goprofiles.xml
vloux Sep 26, 2019
a0e3880
Update goprofiles.xml
vloux Sep 26, 2019
78ebc9b
Update topGO.xml
vloux Sep 26, 2019
8083bdf
Update topGO.xml
vloux Sep 26, 2019
8b7ae0a
update clusterprofiler
vloux Sep 26, 2019
6ffab3d
Update cluster_profiler.xml
vloux Sep 26, 2019
635d2e7
Update GO_prof_comp.xml
combesf Sep 27, 2019
064c33a
id converter ref file corrected
davidchristiany Oct 14, 2019
7fcf926
Add files for GO_terms_enrich
combesf Nov 7, 2019
47bbc47
new tool
combesf Nov 19, 2019
dd7dd7d
Delete GO_terms_enrich_comparison.R
combesf Nov 19, 2019
2953dc0
Delete GO_terms_enrich_comparison.xml
combesf Nov 19, 2019
611287f
Update GO_terms_enrich_comparison.xml
combesf Dec 10, 2019
0a5af6c
goprofiles requirements corrected
Dec 11, 2019
247be34
Update get_expression_profiles.xml
yvandenb Dec 12, 2019
1c4780d
id converter and data manager update
Dec 12, 2019
3c361af
addition of new tool: add-protein_features_3orga
combesf Dec 19, 2019
91d2110
id_converter xml correction
Dec 20, 2019
c89a9a7
Merge branch 'master' of https://github.com/ifb-git/ProteoRE
Jan 6, 2020
127a259
kegg visu: hsa04215 issue corrected
Jan 8, 2020
31d6d57
data manager update idmapping
Jan 8, 2020
d450aca
id converter update
Jan 9, 2020
0f51019
venn diagram tool fix
Jan 10, 2020
e3bde27
Update add_protein_features_3orga.xml
yvandenb Jan 20, 2020
854a081
Update GO_terms_enrich_comparison.xml
yvandenb Jan 20, 2020
1377193
Update GO_terms_enrich_comparison.xml
combesf Jan 21, 2020
a725350
Update in doc section
combesf Jan 21, 2020
348be64
Update GO_terms_enrich_comparison.xml
combesf Jan 21, 2020
c32be26
data manager HPA full atlas update
Jan 22, 2020
fb593ae
Update GO_terms_enrich_comparison.xml
yvandenb Jan 22, 2020
69d8fd8
build tissue specific exp dataset: use of protein_atlas loc files
Jan 22, 2020
b5179ca
build specific tissue: update tool
Jan 22, 2020
627d230
Update GO_terms_enrich_comparison.xml
combesf Jan 23, 2020
92cdc75
build tissue specific: small changes
Jan 23, 2020
bb8bb81
add expression data update
Jan 23, 2020
eb5d4d3
add exp data fix
Jan 23, 2020
81a58fe
add exp data fix
Jan 23, 2020
24b74a1
Update heatmap.xml
combesf Jan 23, 2020
038fed4
Update add_protein_features_3orga.xml
combesf Jan 23, 2020
bcd94a1
Update add_protein_features.xml
combesf Jan 23, 2020
c309bbe
Update Build_tissue-specific_expression_dataset.xml
combesf Jan 23, 2020
bcaef81
Update Get_ms-ms_observations.xml
combesf Jan 23, 2020
35e955e
Update goprofiles.xml
combesf Jan 23, 2020
011b3bc
Update cluster_profiler.xml
combesf Jan 23, 2020
8d18064
Update GO_terms_enrich_comparison.xml
combesf Jan 23, 2020
3a3cf43
Update topGO.xml
combesf Jan 23, 2020
4e22761
Update reactome_analysis.xml
combesf Jan 23, 2020
84eb5aa
Update kegg_identification.xml
combesf Jan 23, 2020
50b6e09
Update kegg_maps_visualization.xml
combesf Jan 23, 2020
a18326c
Update build_protein_interaction_maps.xml
combesf Jan 23, 2020
c3cdc8f
Update heatmap.xml
combesf Jan 23, 2020
8a630e2
Update add_protein_features_3orga.xml
combesf Jan 23, 2020
caddf10
Update add_protein_features.xml
combesf Jan 23, 2020
fcbb3a8
Update Get_ms-ms_observations.xml
combesf Jan 23, 2020
3bd2203
Update goprofiles.xml
combesf Jan 23, 2020
d0a3bc4
Update add_expression_data.xml
yvandenb Jan 23, 2020
907e633
Update Build_tissue-specific_expression_dataset.xml
yvandenb Jan 23, 2020
103c33f
number version update
Jan 24, 2020
7a0651c
add get_unique_peptide_srm_method
Jan 24, 2020
f02a283
Update build_protein_interaction_maps.xml
combesf Jan 28, 2020
9259776
Update GO_prof_comp.xml
combesf Jan 28, 2020
0b437fc
Update prot_prot_interaction.xml
combesf Jan 28, 2020
f3abfb3
Update proteore_get_unique_peptide_SRM-MRM_method.xml
combesf Jan 30, 2020
b237562
Update proteore_get_unique_peptide_SRM-MRM_method.xml
combesf Jan 30, 2020
395dcb3
data manager and build tissue update
Feb 3, 2020
6260b31
fix build tissue tool
Feb 3, 2020
98692ce
fix build tissue tool
Feb 3, 2020
7b007e4
Update Build_tissue-specific_expression_dataset.xml
yvandenb Feb 3, 2020
84b17e7
fix data manager
Feb 3, 2020
b7fa2d3
fix data manager
Feb 3, 2020
905cb6c
fix cervix, uterine issue on build tissue dataset tool
Feb 3, 2020
16a79f9
remove old tool GO_terms_profiles_comparison
Feb 4, 2020
c37517f
.shed.yml addition
combesf Feb 4, 2020
53b6136
go terms enrich comparison shed file fixed
Feb 4, 2020
e9967e8
Build tissue specific dataset version number
Feb 5, 2020
b511d6d
build tissue specific update
Feb 5, 2020
261d2fb
Build tissue tool version number
Feb 6, 2020
b99330f
number version data manager
Feb 6, 2020
f82c8e7
Update build_protein_interaction_maps.xml
yvandenb Feb 6, 2020
b60ec08
Update build_protein_interaction_maps.xml
yvandenb Feb 6, 2020
841f83f
build protein interaction maps number version
Feb 6, 2020
6efc8e7
kegg visu: remove mmu04723 kegg id from input
Mar 2, 2020
8a6d90b
kegg maps, handle NA in values columns
Mar 5, 2020
8cd6910
add expression data Gene description column fix
Apr 6, 2020
9f67f93
update r-base version for add expression data
Apr 6, 2020
3e6ba07
update get MSMS observations
Apr 30, 2020
05669d1
add r-base to get expression profiles requirements
May 7, 2020
bae4319
# This is a combination of 2 commits.
Jun 5, 2020
84973d4
data manager fix for nextprot
Jun 9, 2020
16f9f9e
rebase
Jun 9, 2020
9ef90f8
data manager version
Jun 10, 2020
5991c63
data manager update nextprot
Jul 29, 2020
70f90c7
add_protein_features update
Jul 30, 2020
b596e90
protein features handle old nextprot releases
Jul 30, 2020
1224faf
update data manager
Jul 30, 2020
f857181
Update add_protein_features.xml
yvandenb Jul 30, 2020
eff96b9
data manager update
Jul 31, 2020
abef519
add protein features ref files update
Aug 17, 2020
7dbce5d
add prot features fix shed file
Aug 17, 2020
5a8f492
rename proteore_id_converter
vloux Jan 6, 2021
f7eb46b
rename proteore_filter_keywords_values
vloux Jan 6, 2021
9068514
rename proteore_venn_diagram
vloux Jan 6, 2021
71648db
rename proteore_heatmap_visualization
vloux Jan 6, 2021
59fe543
rename proteore_prot_features_3orga
vloux Jan 6, 2021
9f06aa7
rename proteore_prot_features
vloux Jan 6, 2021
13c8332
rename proteore_expression_levels_by_tissue
vloux Jan 6, 2021
f17ade3
rename proteore_expression_rnaseq_abbased
vloux Jan 6, 2021
68b80a2
rename proteore_tissue_specific_expression_data
vloux Jan 6, 2021
2f6ad8d
rename proteore_ms_observation_pepatlas
vloux Jan 6, 2021
2f9b4e7
rename proteore_get_unique_peptide_srm_method
vloux Jan 6, 2021
ecae379
rename proteore_goprofiles
vloux Jan 6, 2021
c3948d5
rename proteore_clusterprofiler
vloux Jan 6, 2021
245b5e6
rename proteore_go_terms_enrich_comparison
vloux Jan 6, 2021
bba4f4a
rename proteore_reactome
vloux Jan 6, 2021
1fee028
rename proteore_maps_visualization
vloux Jan 6, 2021
2b3f2c9
rename proteore_kegg_pathways_coverage
vloux Jan 6, 2021
e1aafc0
rename proteore_build_protein_interaction_maps
vloux Jan 6, 2021
b47b48d
rename proteore_data_manager
vloux Jan 6, 2021
0864679
rename proteore_topgo
vloux Jan 6, 2021
fc91623
cleaning
vloux Jan 6, 2021
69df8ca
Merge branch 'ProteoRE'
vloux Jan 6, 2021
9463d32
merge ProteoRE
vloux Jan 6, 2021
0e700e5
Update get_expression_profiles.xml
vloux Jan 7, 2021
33 changes: 33 additions & 0 deletions tools/proteore_build_protein_interaction_maps/.shed.yml
@@ -0,0 +1,33 @@
categories: [Proteomics]
description: Build protein interaction maps
name: proteore_build_protein_interaction_maps
owner: proteore
include:
- build_protein_interaction_maps.py
- build_protein_interaction_maps.xml
- tool_data_table_conf.xml.sample
- tool_data_table_conf.xml.test
- proteore_humap_dictionaries.loc.sample
- proteore_bioplex_dictionaries.loc.sample
- proteore_biogrid_dictionaries.loc.sample
- tool-data/Human_biogrid_2019-03-01
- tool-data/Human_bioplex_2019-03-01
- tool-data/Human_humap_2019-03-01
- tool-data/Mouse_biogrid_2019-03-01
- tool-data/Rat_biogrid_2019-03-01
- test-data/Lacombe_geneID.tsv
- test-data/Network_biogrid_from_Lacombe_geneID.tsv
- test-data/Network_biogrid_from_Rattus_Hameza_dataset_geneID.tsv
- test-data/Network_biogrid_from_Wilson-foie-souris_geneID.tsv
- test-data/Network_bioplex_from_Lacombe_geneID.tsv
- test-data/Network_humap_from_Lacombe_geneID.tsv
- test-data/Nodes_biogrid_from_Lacombe_geneID.tsv
- test-data/Nodes_biogrid_from_Rattus_Hameza_dataset_geneID.tsv
- test-data/Nodes_biogrid_from_Wilson-foie-souris_geneID.tsv
- test-data/Nodes_bioplex_from_Lacombe_geneID.tsv
- test-data/Nodes_humap_from_Lacombe_geneID.tsv
- test-data/Rattus_Hameza_dataset_geneID.tsv
- test-data/Wilson-foie-souris-up_geneID.tsv
- test-data/cached_locally/proteore_biogrid_dictionaries.loc
- test-data/cached_locally/proteore_bioplex_dictionaries.loc
- test-data/cached_locally/proteore_humap_dictionaries.loc
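The script in this pull request reads its interaction dictionary with `json.load` from `--dict_path`, which presumably points at one of the dictionaries listed above. The sketch below shows the layout that `humap_output_files` appears to expect, inferred only from how the script indexes `ppi_dict`. The miniature dictionary, its score value, the pathway label and the `neighbours` helper are hypothetical and are not taken from the real data files.

```python
import json

# Hypothetical miniature of a Hu.MAP-style dictionary (assumed layout).
# Each "network" row follows the network header used by the script:
# [id A, id B, gene symbol A, gene symbol B, interaction score]
example = {
    "network": {"7157": [["7157", "4193", "TP53", "MDM2", "0.9"]]},
    "nodes": {"7157": ["p53 signaling"]},
    "gene_name": {"7157": "TP53", "4193": "MDM2"},
    "protein_name": {"7157": "Cellular tumor antigen p53"},
}


def neighbours(ppi, gene_id):
    """Return the sorted partner ids (column 1 of each network row) of gene_id."""
    rows = ppi.get("network", {}).get(gene_id, [])
    return sorted({row[1] for row in rows})


# Round-trip through JSON to mimic loading the dictionary from a file
ppi = json.loads(json.dumps(example))
print(neighbours(ppi, "7157"))
print(neighbours(ppi, "0"))
```

Keys are strings because JSON object keys are always strings, so ids read from the input must be compared as text rather than as integers.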
@@ -0,0 +1,282 @@
# -*- coding: utf-8 -*-
import csv, json, argparse, re, sys

def get_args() :
    parser = argparse.ArgumentParser()
    parser.add_argument("--species")
    parser.add_argument("--database", help="Humap, Bioplex or Biogrid", required=True)
    parser.add_argument("--dict_path", required=True)
    parser.add_argument("--input_type", help="type of input (list of id or filename)",required=True)
    parser.add_argument("--input", required=True)
    parser.add_argument("--header")
    parser.add_argument("--ncol")
    parser.add_argument("--id_type")
    parser.add_argument("--network_output")
    parser.add_argument("--nodes_output")
    args = parser.parse_args()

    if args.input_type=="file" :
        args.ncol = nb_col_to_int(args.ncol)
        args.header = str2bool(args.header)

    return args

#Turn string into boolean
def str2bool(v):
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Boolean value expected.')

#return the column number in int format
def nb_col_to_int(nb_col):
    try :
        nb_col = int(nb_col.replace("c", "")) - 1
        return nb_col
    except :
        sys.exit("Please specify the column where you would like to apply the filter with valid format")

#return list of (unique) ids from string
def get_input_ids_from_string(input) :
    ids_list = list(set(re.split(r'\s+',input.replace(";"," ").replace("\r","").replace("\n"," ").replace("\t"," "))))
    if "" in ids_list : ids_list.remove("")
    #if "NA" in ids_list : ids_list.remove("NA")
    return ids_list

#return input_file and list of unique ids from input file path
def get_input_ids_from_file(input,nb_col,header) :
    with open(input, "r") as csv_file :
        input_file= list(csv.reader(csv_file, delimiter='\t'))

    input_file, ids_list = one_id_one_line(input_file,nb_col,header)
    if "" in ids_list : ids_list.remove("")
    #if "NA" in ids_list : ids_list.remove("NA")

    return input_file, ids_list

#return input file by adding lines when there are more than one id per line
def one_id_one_line(input_file,nb_col,header) :

    if header :
        new_file = [input_file[0]]
        input_file = input_file[1:]
    else :
        new_file=[]
    ids_list=[]

    for line in input_file :
        if line != [] and set(line) != {''}:
            line[nb_col] = re.sub(r"\s+","",line[nb_col])
            if ";" in line[nb_col] :
                ids = line[nb_col].split(";")
                for id in ids :
                    new_file.append(line[:nb_col]+[id]+line[nb_col+1:])
                    ids_list.append(id)
            else :
                new_file.append(line)
                ids_list.append(line[nb_col])

    ids_list= list(set(ids_list))

    return new_file, ids_list

#replace all blank cells with NA
def blank_to_NA(csv_file) :
    tmp=[]
    for line in csv_file :
        line = ["NA" if cell=="" or cell==" " or cell=="NaN" or cell=="-" else cell for cell in line]
        tmp.append(line)

    return tmp

def biogrid_output_files(ids,species) :
    network_file=[["Entrez Gene Interactor A","Entrez Gene Interactor B","Gene symbol Interactor A","Gene symbol Interactor B","Experimental System","Experimental Type","Pubmed ID","Interaction Score","Phenotypes"]]
    ids_set= set(ids)
    ids_not_found=set([])
    for id in ids :
        if id in ppi_dict['network'] :
            network_file.extend(ppi_dict['network'][id])
            ids_set.update([interact[1] for interact in ppi_dict['network'][id]])
        else :
            ids_not_found.add(id)

    nodes_file = [["Entrez gene ID","Official Symbol Interactor","Present in user input ids","ID present in Biogrid "+species,"Pathway"]]
    for id in ids_set:
        #get pathway
        if id in ppi_dict['nodes']:
            description_pathway=";".join(ppi_dict['nodes'][id])
        else :
            description_pathway="NA"

        #get gene name
        if id in ppi_dict['network']: gene_name = ppi_dict['network'][id][0][2]
        else : gene_name="NA"

        #make line
        nodes_file.append([id]+[gene_name]+[id in ids]+[id not in ids_not_found]+[description_pathway])

    return network_file,nodes_file

def bioplex_output_files(ids,id_type,species) :
    network_file=[[id_type+" Interactor A",id_type+" Interactor B","Gene symbol Interactor A","Gene symbol Interactor B","Interaction Score"]]
    ids_set= set(ids)
    ids_not_found=set([])
    for id in ids :
        if id in ppi_dict['network'][id_type] :
            network_file.extend(ppi_dict['network'][id_type][id])
            ids_set.update([interact[1] for interact in ppi_dict['network'][id_type][id]])
        else :
            ids_not_found.add(id)

    if id_type=="UniProt-AC" : nodes_file=[[id_type,"Present in user input ids","ID present in Human Bioplex","Pathway"]]
    else: nodes_file=[[id_type,"Official symbol Interactor","Present in user input ids","Present in interactome","Pathway"]]
    for id in ids_set:

        if id in ppi_dict['nodes'][id_type]:
            description_pathway=";".join(ppi_dict['nodes'][id_type][id])
        else :
            description_pathway="NA"

        #make line
        if id_type=="UniProt-AC":
            nodes_file.append([id]+[id in ids]+[id not in ids_not_found]+[description_pathway])
        elif id_type=="GeneID":
            #get gene_name
            if id in ppi_dict['network'][id_type]: gene_name = ppi_dict['network'][id_type][id][0][2]
            else : gene_name="NA"
            nodes_file.append([id]+[gene_name]+[id in ids]+[id not in ids_not_found]+[description_pathway])

    return network_file,nodes_file

def humap_output_files(ids,species) :
    network_file=[["Entrez Gene Interactor A","Entrez Gene Interactor B","Gene symbol Interactor A","Gene symbol Interactor B","Interaction Score"]]
    ids_set= set(ids)
    ids_not_found=set([])
    for id in ids :
        if id in ppi_dict['network'] :
            network_file.extend(ppi_dict['network'][id])
            ids_set.update([interact[1] for interact in ppi_dict['network'][id]])
        else :
            ids_not_found.add(id)

    nodes_file = [["Entrez gene ID","Official Symbol Interactor","Present in user input ids","ID present in Hu.MAP","Protein name","Pathway"]]
    for id in ids_set:
        if id in ppi_dict['nodes']:
            description_pathway=";".join(ppi_dict['nodes'][id])
        else :
            description_pathway="NA"

        #get gene name
        if id in ppi_dict['gene_name']:
            gene_name = ppi_dict['gene_name'][id]
        else :
            gene_name = "NA"

        #get protein name
        if id in ppi_dict['protein_name']:
            protein_name = ppi_dict['protein_name'][id]
        else :
            protein_name = "NA"

        #make line
        nodes_file.append([id]+[gene_name]+[id in ids]+[id not in ids_not_found]+[protein_name]+[description_pathway])

    return network_file,nodes_file

#function to sort the csv_file by value in a specific column
def sort_by_column(tab,sort_col,reverse,header):

    if len(tab) > 1 : #if there's more than just a header or 1 row
        if header :
            head=tab[0]
            tab=tab[1:]

        #list of empty cells in the column to sort
        unsortable_lines = [i for i,line in enumerate(tab) if (line[sort_col]=='' or line[sort_col] == 'NA')]
        unsorted_tab=[ tab[i] for i in unsortable_lines]
        tab= [line for i,line in enumerate(tab) if i not in unsortable_lines]

        if only_number(tab,sort_col) and any_float(tab,sort_col) :
            #accept a decimal comma, as only_number and any_float do
            tab = sorted(tab, key=lambda row: float(row[sort_col].replace(",",".")), reverse=reverse)
        elif only_number(tab,sort_col):
            tab = sorted(tab, key=lambda row: int(row[sort_col]), reverse=reverse)
        else :
            tab = sorted(tab, key=lambda row: row[sort_col], reverse=reverse)

        tab.extend(unsorted_tab)
        if header is True : tab = [head]+tab

    return tab

def only_number(tab,col) :

    for line in tab :
        if not (is_number("float",line[col].replace(",",".")) or is_number("int",line[col].replace(",","."))) :
            return False
    return True

#Check if a variable is a float or an integer
def is_number(number_format, n):
    #the dot is escaped so that it only matches a decimal separator
    float_format = re.compile(r"^[-]?[0-9][0-9]*\.?[0-9]+$")
    int_format = re.compile(r"^[-]?[0-9][0-9]*$")
    test = ""
    if number_format == "int":
        test = re.match(int_format, n)
    elif number_format == "float":
        test = re.match(float_format, n)
    return bool(test)

#return True if there is at least one float in the column
def any_float(tab,col) :

    for line in tab :
        if is_number("float",line[col].replace(",",".")) :
            return True

    return False

def main() :

    #Get args from command line
    global args
    args = get_args()

    #get PPI dictionary
    with open(args.dict_path, 'r') as handle:
        global ppi_dict
        ppi_dict = json.load(handle)

    #Get file and/or ids from input
    if args.input_type == "text" :
        ids = get_input_ids_from_string(args.input)
    elif args.input_type == "file" :
        input_file, ids = get_input_ids_from_file(args.input,args.ncol,args.header)

    #create output files
    if args.database=="biogrid":
        network_file, nodes_file = biogrid_output_files(ids,args.species)
    elif args.database=="bioplex":
        network_file, nodes_file = bioplex_output_files(ids,args.id_type,args.species)
    elif args.database=="humap":
        network_file, nodes_file = humap_output_files(ids,args.species)

    #convert blank to NA and sort files
    network_file = blank_to_NA(network_file)
    network_file = sort_by_column(network_file,0,False,True)
    nodes_file = sort_by_column(nodes_file,0,False,True)

    #write output files
    with open(args.network_output,"w") as output :
        writer = csv.writer(output,delimiter="\t")
        writer.writerows(network_file)

    with open(args.nodes_output,"w") as output :
        writer = csv.writer(output,delimiter="\t")
        for row in nodes_file:
            writer.writerow([unicode(s).encode("utf-8") for s in row])

if __name__ == "__main__":
    main()
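Several commits in this pull request mention handling multiple ids separated by ';', which the script does in `get_input_ids_from_string` and `one_id_one_line`. The following is a minimal, self-contained sketch of that behaviour, not the code from the diff. The helper names `split_ids` and `expand_rows` and the sample rows are hypothetical, and ids are sorted here only to make the result deterministic.

```python
import re


def split_ids(text):
    """Split pasted ids on ';' and any whitespace; return sorted unique ids."""
    tokens = re.split(r"\s+", text.replace(";", " ").strip())
    return sorted({token for token in tokens if token})


def expand_rows(rows, col):
    """Emit one row per ';'-separated id found in column `col`."""
    expanded = []
    for row in rows:
        # Skip empty rows and rows made only of empty cells
        if not row or set(row) == {""}:
            continue
        cell = re.sub(r"\s+", "", row[col])
        for one_id in cell.split(";"):
            expanded.append(row[:col] + [one_id] + row[col + 1:])
    return expanded


# Hypothetical sample input: the second column holds the ids
rows = [["P1", "7157; 672"], ["P2", "1956"], [""]]
print(split_ids("7157; 672\n7157\t1956"))
print(expand_rows(rows, 1))
```

Duplicating the whole row for each id keeps the other columns attached to every id, so later lookups can treat each line as carrying exactly one identifier.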