2024.1 docs

bioexcel · May 14, 2024 · commit 646e79b · 1 parent 7d8f101

Showing 14 changed files with 16,857 additions and 36,293 deletions.
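
Both tutorial diffs below add a Colab initialization cell. The cell checks `'google.colab' in sys.modules`, then probes for conda by running `conda -V`, and installs **condacolab** only if that call raises `FileNotFoundError`. The following sketch expresses the same two checks as small helpers and assumes only the Python standard library. The helper names are hypothetical, and `shutil.which` replaces the subprocess probe:

```python
import shutil
import sys


def running_in_colab() -> bool:
    """True when google.colab has been imported (the same test the cells use)."""
    return "google.colab" in sys.modules


def conda_available() -> bool:
    """True when a `conda` executable can be found on PATH.

    Hypothetical alternative to running `conda -V` and catching
    FileNotFoundError: shutil.which only looks the executable up,
    it does not start a process.
    """
    return shutil.which("conda") is not None


if __name__ == "__main__":
    print(f"Colab: {running_in_colab()}, conda on PATH: {conda_available()}")
```

The two approaches behave differently in one respect. `shutil.which` only reports whether a `conda` executable is on `PATH`. The cell's `subprocess.run(["conda", "-V"], check=True)` actually starts conda. Because of `check=True`, it would also raise `subprocess.CalledProcessError` if conda exited with a non-zero status, and the cell does not catch that exception.
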
diff --git a/README.md b/README.md
diff --git a/biobb_wf_virtual-screening/docs/source/cluster_bs_tutorial.md b/biobb_wf_virtual-screening/docs/source/cluster_bs_tutorial.md
@@ -1,5 +1,5 @@
 # Protein-ligand Docking tutorial using BioExcel Building Blocks (biobb)
-**-- *PDB Cluster90 Binding Site Version* --**
+### -- *PDB Cluster90 Binding Site Version* --
 
 ***
 This tutorial aims to illustrate the process of **protein-ligand docking**, step by step, using the **BioExcel Building Blocks library (biobb)**. The particular example used is the **Mitogen-activated protein kinase 14** (p38-α) protein (PDB code [3HEC](https://www.rcsb.org/structure/3HEC), [https://doi.org/10.2210/pdb3HEC/pdb](https://doi.org/10.2210/pdb3HEC/pdb)), a well-known **Protein Kinase enzyme**, 
@@ -20,21 +20,21 @@ Please note that **docking algorithms**, and in particular, **AutoDock Vina** pr
  - [biobb_structure_utils](https://github.com/bioexcel/biobb_structure_utils): Tools to modify or extract information from a PDB structure file.
  - [biobb_chemistry](https://github.com/bioexcel/biobb_chemistry): Tools to perform chemoinformatics processes.
  - [biobb_vs](https://github.com/bioexcel/biobb_vs): Tools to perform virtual screening studies.
- 
+
 ### Auxiliary libraries used
 
 * [jupyter](https://jupyter.org/): Free software, open standards, and web services for interactive computing across all programming languages.
 * [nglview](http://nglviewer.org/#nglview): Jupyter/IPython widget to interactively view molecular structures and trajectories in notebooks.
 
-### Conda Installation and Launch
+### Conda Installation
 
 ```console
 git clone https://github.com/bioexcel/biobb_wf_virtual-screening.git
 cd biobb_wf_virtual-screening
 conda env create -f conda_env/environment.yml
 conda activate biobb_VS_tutorial
-jupyter-notebook biobb_wf_virtual-screening/notebooks/ebi_api/wf_vs_ebi_api.ipynb
-``` 
+jupyter-notebook biobb_wf_virtual-screening/notebooks/ebi_api/wf_vs_clusterBindingSite.ipynb
+```
 
 ***
 ## Pipeline steps
@@ -62,14 +62,58 @@ jupyter-notebook biobb_wf_virtual-screening/notebooks/ebi_api/wf_vs_ebi_api.ipyn
 ***
 
 
+## Initializing colab
+The two cells below are used only in case this notebook is executed via **Google Colab**. Take into account that, for running conda on **Google Colab**, the **condacolab** library must be installed. As [explained here](https://pypi.org/project/condacolab/), the installation requires a **kernel restart**, so when running this notebook in **Google Colab**, don't run all cells until this **installation** is properly **finished** and the **kernel** has **restarted**.
+
+
+```python
+# Only executed when using google colab
+import sys
+if 'google.colab' in sys.modules:
+  import subprocess
+  from pathlib import Path
+  try:
+    subprocess.run(["conda", "-V"], check=True)
+  except FileNotFoundError:
+    subprocess.run([sys.executable, "-m", "pip", "install", "condacolab"], check=True)
+    import condacolab
+    condacolab.install()
+    # Clone repository
+    repo_URL = "https://github.com/bioexcel/biobb_wf_virtual-screening.git"
+    repo_name = Path(repo_URL).name.split('.')[0]
+    if not Path(repo_name).exists():
+      subprocess.run(["mamba", "install", "-y", "git"], check=True)
+      subprocess.run(["git", "clone", repo_URL], check=True)
+      print("⏬ Repository properly cloned.")
+    # Install environment
+    print("⏳ Creating environment...")
+    env_file_path = f"{repo_name}/conda_env/environment.yml"
+    subprocess.run(["mamba", "env", "update", "-n", "base", "-f", env_file_path], check=True)
+    print("🎨 Install NGLView dependencies...")
+    subprocess.run(["mamba", "install", "-y", "-c", "conda-forge", "nglview==3.0.8", "ipywidgets=7.7.2"], check=True)
+    print("👍 Conda environment successfully created and updated.")
+```
+
+
+```python
+# Enable widgets for colab
+if 'google.colab' in sys.modules:
+  from google.colab import output
+  output.enable_custom_widget_manager()
+  # Change working dir
+  import os
+  os.chdir("biobb_wf_virtual-screening/biobb_wf_virtual-screening/notebooks/clusterBindingSite")
+  print(f"📂 New working directory: {os.getcwd()}")
+```
+
 <a id="input"></a>
 ## Input parameters
 **Input parameters** needed:
 
  - **pdb_code**: PDB code of the experimental complex structure (if exists).<br>
 In this particular example, the **p38α** structure in complex with the **Imatinib drug** was experimentally solved and deposited in the **PDB database** under the **3HEC** PDB code ([https://doi.org/10.2210/pdb3HEC/pdb](https://doi.org/10.2210/pdb3HEC/pdb)). The protein structure from this PDB file will be used as a **target protein** for the **docking process**, after stripping the **small molecule**. An **APO structure**, or any other structure from the **p38α** [cluster 100](https://www.rcsb.org/search?request=%7B%22query%22%3A%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22sequence%22%2C%22parameters%22%3A%7B%22target%22%3A%22pdb_protein_sequence%22%2C%22value%22%3A%22RPTFYRQELNKTIWEVPERYQNLSPVGSGAYGSVCAAFDTKTGLRVAVKKLSRPFQSIIHAKRTYRELRLLKHMKHENVIGLLDVFTPARSLEEFNDVYLVTHLMGADLNNIVKCQKLTDDHVQFLIYQILRGLKYIHSADIIHRDLKPSNLAVNEDCELKILDFGLARHTDDEMTGYVATRWYRAPEIMLNWMHYNQTVDIWSVGCIMAELLTGRTLFPGTDHIDQLKLILRLVGTPGAELLKKISSESARNYIQSLTQMPKMNFANVFIGANPLAVDLLEKMLVLDSDKRITAAQALAHAYFAQYHDPDDEPVADPYDQSFESRDLLIDEWKSLTYDEVISFVPPP%22%2C%22identity_cutoff%22%3A1%2C%22evalue_cutoff%22%3A0.1%7D%2C%22node_id%22%3A0%7D%2C%22return_type%22%3A%22polymer_entity%22%2C%22request_options%22%3A%7B%22pager%22%3A%7B%22start%22%3A0%2C%22rows%22%3A25%7D%2C%22scoring_strategy%22%3A%22combined%22%2C%22sort%22%3A%5B%7B%22sort_by%22%3A%22score%22%2C%22direction%22%3A%22desc%22%7D%5D%7D%2C%22request_info%22%3A%7B%22src%22%3A%22ui%22%2C%22query_id%22%3A%22bea5861f8b38a9e25a3e626b39d6bcbf%22%7D%7D) (sharing a 100% of sequence similarity with the **p38α** structure) could also be used as a **target protein**. This structure of the **protein-ligand complex** will be also used in the last step of the tutorial to check **how close** the resulting **docking pose** is from the known **experimental structure**. 
  -----
- - **ligandCode**: Ligand PDB code (3-letter code) for the small molecule (e.g. STI), DrugBank Ligand Code [DB00619](https://go.drugbank.com/drugs/DB00619).<br>
+ - **ligandCode**: Ligand PDB code (3-letter code) for the small molecule (e.g. STI, DrugBank Ligand Code [DB00619](https://go.drugbank.com/drugs/DB00619)).<br>
 In this particular example, the small molecule chosen for the tutorial is the FDA-approved drug **Imatinib** (PDB Code STI, DrugBank Ligand Code [DB00619](https://go.drugbank.com/drugs/DB00619)), a type of cancer growth blocker, used in [diferent types of leukemia](https://go.drugbank.com/drugs/DB00619).
 
 
@@ -245,15 +289,15 @@ view[0].add_representation(repr_type='cartoon',
                           opacity=.2,
                           color='#cccccc')
 
-view.add_component(output_bindingsite, default=False)
+view.add_component(nglview.FileStructure(output_bindingsite), default=False)
 view[1].add_representation(repr_type='surface', 
                            selection='*', 
                            opacity = .3,
                            radius='1.5',
                            lowResolution= True,
                            # 0: low resolution 
                            smooth=1,
-                           useWorker= True,
+                           #useWorker= True,
                            wrap= True)
 view[1].add_representation(repr_type='licorice', 
                         selection='*')
@@ -301,9 +345,9 @@ Visualizing the **protein structure**, the **selected cavity**, and the **genera
 #view = nglview.show_structure_file(box, default=False)
 view = nglview.NGLWidget()
 #s = view.add_component(pdb_single_chain)
-s = view.add_component(download_pdb)
-b = view.add_component(output_box)
-s = view.add_component(output_bindingsite)
+s = view.add_component(nglview.FileStructure(download_pdb))
+b = view.add_component(nglview.FileStructure(output_box))
+s = view.add_component(nglview.FileStructure(output_bindingsite))
 
 atomPair = [
     [ "9999:Z.ZN1", "9999:Z.ZN2" ],
@@ -350,7 +394,7 @@ s.add_representation(repr_type='surface',
                         surfaceType= 'av', 
                         contour=True,
                         opacity=0.4,
-                        useWorker= True,
+                        #useWorker= True,
                         wrap= True)
 
 
@@ -665,15 +709,15 @@ Note that outputs from **AutoDock Vina** don't contain all the atoms, as the pro
 view = nglview.NGLWidget()
 
 # v1 = Experimental Structure
-v1 = view.add_component(download_pdb)
+v1 = view.add_component(nglview.FileStructure(download_pdb))
 
 v1.clear()
 v1.add_representation(repr_type='licorice', 
                      selection='STI',
                      radius=0.5)
 
 # v2 = Docking result
-v2 = view.add_component(output_structure)
+v2 = view.add_component(nglview.FileStructure(output_structure))
 v2.clear()
 v2.add_representation(repr_type='cartoon', colorScheme = 'sstruc')
 v2.add_representation(repr_type='licorice', radius=0.5, color= 'green', selection='UNL')
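
In this file, the updated launch command opens `biobb_wf_virtual-screening/notebooks/ebi_api/wf_vs_clusterBindingSite.ipynb`. The new Colab cell, however, changes into `biobb_wf_virtual-screening/notebooks/clusterBindingSite`, so the two paths appear to disagree about where the notebook lives. The sketch below locates the notebook by file name instead of a hard-coded directory. It assumes the file name is unique within the cloned repository; `find_notebook` is a hypothetical helper:

```python
from pathlib import Path
from typing import Optional


def find_notebook(root: str, filename: str) -> Optional[Path]:
    """Return the first file named `filename` found under `root`, or None.

    Hypothetical helper: it searches recursively instead of assuming
    which notebooks/ subdirectory holds the file. Results are sorted so
    the choice is deterministic if several matches exist.
    """
    for candidate in sorted(Path(root).rglob(filename)):
        if candidate.is_file():
            return candidate
    return None
```

Run from the repository root (after the `cd biobb_wf_virtual-screening` step), `find_notebook(".", "wf_vs_clusterBindingSite.ipynb")` returns the notebook's actual path, or `None` if it is not there.
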

diff --git a/biobb_wf_virtual-screening/docs/source/ebi_api_tutorial.md b/biobb_wf_virtual-screening/docs/source/ebi_api_tutorial.md
@@ -1,5 +1,5 @@
 # Protein-ligand Docking tutorial using BioExcel Building Blocks (biobb)
-**-- *PDBe REST-API Version* --**
+### -- *PDBe REST-API Version* --
 
 ***
 This tutorial aims to illustrate the process of **protein-ligand docking**, step by step, using the **BioExcel Building Blocks library (biobb)**. The particular example used is the **Mitogen-activated protein kinase 14** (p38-α) protein (PDB code [3LFA](https://www.rcsb.org/structure/3LFA), [https://doi.org/10.2210/pdb3LFA/pdb](https://doi.org/10.2210/pdb3LFA/pdb)), a well-known **Protein Kinase enzyme**, 
@@ -20,21 +20,21 @@ Please note that **docking algorithms**, and in particular, **AutoDock Vina** pr
  - [biobb_structure_utils](https://github.com/bioexcel/biobb_structure_utils): Tools to modify or extract information from a PDB structure file.
  - [biobb_chemistry](https://github.com/bioexcel/biobb_chemistry): Tools to perform chemoinformatics processes.
  - [biobb_vs](https://github.com/bioexcel/biobb_vs): Tools to perform virtual screening studies.
- 
+
 ### Auxiliary libraries used
 
 * [jupyter](https://jupyter.org/): Free software, open standards, and web services for interactive computing across all programming languages.
 * [nglview](http://nglviewer.org/#nglview): Jupyter/IPython widget to interactively view molecular structures and trajectories in notebooks.
 
-### Conda Installation and Launch
+### Conda Installation
 
 ```console
 git clone https://github.com/bioexcel/biobb_wf_virtual-screening.git
 cd biobb_wf_virtual-screening
 conda env create -f conda_env/environment.yml
 conda activate biobb_VS_tutorial
 jupyter-notebook biobb_wf_virtual-screening/notebooks/ebi_api/wf_vs_ebi_api.ipynb
-``` 
+```
 
 ***
 ## Pipeline steps
@@ -61,12 +61,56 @@ jupyter-notebook biobb_wf_virtual-screening/notebooks/ebi_api/wf_vs_ebi_api.ipyn
 ***
 
 
+## Initializing colab
+The two cells below are used only in case this notebook is executed via **Google Colab**. Take into account that, for running conda on **Google Colab**, the **condacolab** library must be installed. As [explained here](https://pypi.org/project/condacolab/), the installation requires a **kernel restart**, so when running this notebook in **Google Colab**, don't run all cells until this **installation** is properly **finished** and the **kernel** has **restarted**.
+
+
+```python
+# Only executed when using google colab
+import sys
+if 'google.colab' in sys.modules:
+  import subprocess
+  from pathlib import Path
+  try:
+    subprocess.run(["conda", "-V"], check=True)
+  except FileNotFoundError:
+    subprocess.run([sys.executable, "-m", "pip", "install", "condacolab"], check=True)
+    import condacolab
+    condacolab.install()
+    # Clone repository
+    repo_URL = "https://github.com/bioexcel/biobb_wf_virtual-screening.git"
+    repo_name = Path(repo_URL).name.split('.')[0]
+    if not Path(repo_name).exists():
+      subprocess.run(["mamba", "install", "-y", "git"], check=True)
+      subprocess.run(["git", "clone", repo_URL], check=True)
+      print("⏬ Repository properly cloned.")
+    # Install environment
+    print("⏳ Creating environment...")
+    env_file_path = f"{repo_name}/conda_env/environment.yml"
+    subprocess.run(["mamba", "env", "update", "-n", "base", "-f", env_file_path], check=True)
+    print("🎨 Install NGLView dependencies...")
+    subprocess.run(["mamba", "install", "-y", "-c", "conda-forge", "nglview==3.0.8", "ipywidgets=7.7.2"], check=True)
+    print("👍 Conda environment successfully created and updated.")
+```
+
+
+```python
+# Enable widgets for colab
+if 'google.colab' in sys.modules:
+  from google.colab import output
+  output.enable_custom_widget_manager()
+  # Change working dir
+  import os
+  os.chdir("biobb_wf_virtual-screening/biobb_wf_virtual-screening/notebooks/ebi_api")
+  print(f"📂 New working directory: {os.getcwd()}")
+```
+
 <a id="input"></a>
 ## Input parameters
 **Input parameters** needed:
 
  - **pdb_code**: PDB code of the experimental complex structure (if exists).<br>
-In this particular example, the **p38α** structure in complex with the **Dasatinib drug** was experimentally solved and deposited in the **PDB database** under the **3LFA** PDB code, [https://doi.org/10.2210/pdb3LFA/pdb](https://doi.org/10.2210/pdb3LFA/pdb). The protein structure from this PDB file will be used as a **target protein** for the **docking process**, after stripping the **small molecule**. An **APO structure**, or any other structure from the **p38α** [cluster 100](https://www.rcsb.org/search?request=%7B%22query%22%3A%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22sequence%22%2C%22parameters%22%3A%7B%22target%22%3A%22pdb_protein_sequence%22%2C%22value%22%3A%22RPTFYRQELNKTIWEVPERYQNLSPVGSGAYGSVCAAFDTKTGLRVAVKKLSRPFQSIIHAKRTYRELRLLKHMKHENVIGLLDVFTPARSLEEFNDVYLVTHLMGADLNNIVKCQKLTDDHVQFLIYQILRGLKYIHSADIIHRDLKPSNLAVNEDCELKILDFGLARHTDDEMTGYVATRWYRAPEIMLNWMHYNQTVDIWSVGCIMAELLTGRTLFPGTDHIDQLKLILRLVGTPGAELLKKISSESARNYIQSLTQMPKMNFANVFIGANPLAVDLLEKMLVLDSDKRITAAQALAHAYFAQYHDPDDEPVADPYDQSFESRDLLIDEWKSLTYDEVISFVPPP%22%2C%22identity_cutoff%22%3A1%2C%22evalue_cutoff%22%3A0.1%7D%2C%22node_id%22%3A0%7D%2C%22return_type%22%3A%22polymer_entity%22%2C%22request_options%22%3A%7B%22pager%22%3A%7B%22start%22%3A0%2C%22rows%22%3A25%7D%2C%22scoring_strategy%22%3A%22combined%22%2C%22sort%22%3A%5B%7B%22sort_by%22%3A%22score%22%2C%22direction%22%3A%22desc%22%7D%5D%7D%2C%22request_info%22%3A%7B%22src%22%3A%22ui%22%2C%22query_id%22%3A%22bea5861f8b38a9e25a3e626b39d6bcbf%22%7D%7D) (sharing a 100% of sequence similarity with the **p38α** structure) could also be used as a **target protein**. This structure of the **protein-ligand complex** will be also used in the last step of the tutorial to check **how close** the resulting **docking pose** is from the known **experimental structure**. 
+In this particular example, the **p38α** structure in complex with the **Dasatinib drug** was experimentally solved and deposited in the **PDB database** under the **3LFA** PDB code, , [https://doi.org/10.2210/pdb3LFA/pdb](https://doi.org/10.2210/pdb3LFA/pdb). The protein structure from this PDB file will be used as a **target protein** for the **docking process**, after stripping the **small molecule**. An **APO structure**, or any other structure from the **p38α** [cluster 100](https://www.rcsb.org/search?request=%7B%22query%22%3A%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22sequence%22%2C%22parameters%22%3A%7B%22target%22%3A%22pdb_protein_sequence%22%2C%22value%22%3A%22RPTFYRQELNKTIWEVPERYQNLSPVGSGAYGSVCAAFDTKTGLRVAVKKLSRPFQSIIHAKRTYRELRLLKHMKHENVIGLLDVFTPARSLEEFNDVYLVTHLMGADLNNIVKCQKLTDDHVQFLIYQILRGLKYIHSADIIHRDLKPSNLAVNEDCELKILDFGLARHTDDEMTGYVATRWYRAPEIMLNWMHYNQTVDIWSVGCIMAELLTGRTLFPGTDHIDQLKLILRLVGTPGAELLKKISSESARNYIQSLTQMPKMNFANVFIGANPLAVDLLEKMLVLDSDKRITAAQALAHAYFAQYHDPDDEPVADPYDQSFESRDLLIDEWKSLTYDEVISFVPPP%22%2C%22identity_cutoff%22%3A1%2C%22evalue_cutoff%22%3A0.1%7D%2C%22node_id%22%3A0%7D%2C%22return_type%22%3A%22polymer_entity%22%2C%22request_options%22%3A%7B%22pager%22%3A%7B%22start%22%3A0%2C%22rows%22%3A25%7D%2C%22scoring_strategy%22%3A%22combined%22%2C%22sort%22%3A%5B%7B%22sort_by%22%3A%22score%22%2C%22direction%22%3A%22desc%22%7D%5D%7D%2C%22request_info%22%3A%7B%22src%22%3A%22ui%22%2C%22query_id%22%3A%22bea5861f8b38a9e25a3e626b39d6bcbf%22%7D%7D) (sharing a 100% of sequence similarity with the **p38α** structure) could also be used as a **target protein**. This structure of the **protein-ligand complex** will be also used in the last step of the tutorial to check **how close** the resulting **docking pose** is from the known **experimental structure**. 
  -----
  - **ligandCode**: Ligand PDB code (3-letter code) for the small molecule (e.g. 1N1, DrugBank Ligand Code [DB01254](https://go.drugbank.com/drugs/DB01254)).<br>
 In this particular example, the small molecule chosen for the tutorial is the FDA-approved drug **Dasatinib** (PDB Code 1N1, DrugBank Ligand Code [DB01254](https://go.drugbank.com/drugs/DB01254)), a **tyrosine kinase inhibitor**, used in [lymphoblastic or chronic myeloid leukemia](https://go.drugbank.com/drugs/DB01254).
@@ -116,7 +160,6 @@ Note (and try to identify) the **Dasatinib small molecule (1N1)** and the **dete
 view = nglview.show_structure_file(download_pdb, default=True)
 view.center()
 view._remote_call('setSize', target='Widget', args=['','600px'])
-
 view
 ```
 
@@ -155,7 +198,6 @@ view.add_representation(repr_type='cartoon',
                        colorScheme = 'atomindex')
 view.center()
 view._remote_call('setSize', target='Widget', args=['','600px'])
-
 view
 ```
 
@@ -252,7 +294,6 @@ view.add_representation(repr_type='surface',
 
 view.center()
 view._remote_call('setSize', target='Widget', args=['','600px'])
-
 view
 ```
 
@@ -294,8 +335,8 @@ Visualizing the **protein structure**, the **selected cavity**, and the **genera
 ```python
 view = nglview.NGLWidget()
 
-s = view.add_component(download_pdb)
-b = view.add_component(output_box)
+s = view.add_component(nglview.FileStructure(download_pdb))
+b = view.add_component(nglview.FileStructure(output_box))
 
 atomPair = [
     [ "9999:Z.ZN1", "9999:Z.ZN2" ],
@@ -340,13 +381,12 @@ s.add_representation(repr_type='surface',
                         surfaceType= 'av', 
                         contour=True,
                         opacity=0.4,
-                        useWorker= True,
+                        #useWorker= True,
                         wrap= True)
 
 
 view.center()
 view._remote_call('setSize', target='Widget', args=['','600px'])
-
 view
 ```
 
@@ -480,7 +520,7 @@ prop = {
 str_check_add_hydrogens(
             input_structure_path = pdb_protein,
             output_structure_path = prep_receptor,
-            properties=prop)
+            properties = prop)
 ```
 
 <a id="docking"></a>
@@ -656,15 +696,15 @@ Note that outputs from **AutoDock Vina** don't contain all the atoms, as the pro
 view = nglview.NGLWidget()
 
 # v1 = Experimental Structure
-v1 = view.add_component(download_pdb)
+v1 = view.add_component(nglview.FileStructure(download_pdb))
 
 v1.clear()
 v1.add_representation(repr_type='licorice', 
                      selection='[1N1]',
                      radius=0.5)
 
 # v2 = Docking result
-v2 = view.add_component(output_structure)
+v2 = view.add_component(nglview.FileStructure(output_structure))
 v2.clear()
 v2.add_representation(repr_type='cartoon', colorScheme = 'sstruc')
 v2.add_representation(repr_type='licorice', radius=0.5, color= 'green', selection='UNL')
@@ -689,7 +729,6 @@ s[ 0 ].autoView()
 """
 
 view._execute_js_code(code)
-
 view
 ```
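
Both Colab cells derive the clone directory with `Path(repo_URL).name.split('.')[0]`. For this repository that yields `biobb_wf_virtual-screening`, but it would truncate any repository name that itself contains a dot. The following is an alternative sketch that assumes Python 3.9+ (for `str.removesuffix`). `repo_dir_from_url` is a hypothetical helper, and `PurePosixPath` is used so the URL path is split on `/` on every platform:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def repo_dir_from_url(repo_url: str) -> str:
    """Directory name a plain `git clone <repo_url>` would typically create.

    Takes the last segment of the URL path and strips a trailing ".git",
    so dots inside the repository name are preserved.
    """
    last_segment = PurePosixPath(urlparse(repo_url).path).name
    return last_segment.removesuffix(".git")


url = "https://github.com/bioexcel/biobb_wf_virtual-screening.git"
print(repo_dir_from_url(url))                    # biobb_wf_virtual-screening

dotted = "https://example.org/someone/my.repo.git"  # hypothetical URL
print(repo_dir_from_url(dotted))                 # my.repo
print(PurePosixPath(dotted).name.split('.')[0])  # my (split-based version)
```
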