Merge branch 'develop' into TASK-5964
pfurio authored Jul 11, 2024
2 parents 89fbe04 + 344c4aa commit 04be875
Showing 4 changed files with 63 additions and 73 deletions.
94 changes: 37 additions & 57 deletions opencga-client/src/main/python/README.rst
@@ -4,34 +4,28 @@ PyOpenCGA
==========

This Python client package makes use of the comprehensive RESTful web services API implemented for the `OpenCGA`_ platform.
OpenCGA is an open-source project that implements a high-performance, scalable and secure platform for Genomic data analysis and visualisation
OpenCGA is an open-source project that implements a high-performance, scalable and secure platform for Genomic data analysis and visualisation.

OpenCGA implements a secure and high performance platform for Big Data analysis and visualisation in current genomics.
OpenCGA uses the most modern and advanced technologies to scale to petabytes of data. OpenCGA is designed and implemented to work with
few million genomes. It is built on top of three main components: Catalog, Variant and Alignment Storage and Analysis.
OpenCGA uses the most modern and advanced technologies to scale to petabytes of data. OpenCGA is designed and implemented to work with a few million genomes. It is built on top of three main components: Catalog, Variant and Alignment Storage and Analysis.

More info about this project in the `OpenCGA Docs`_
More information about this project is available in the `OpenCGA Docs`_.

Installation
------------

Cloning
```````
PyOpenCGA can be cloned in your local machine by executing in your terminal::
PyOpenCGA can be installed from the PyPI repository. Make sure pip is available on your machine. You can check this by running::

$ git clone https://github.com/opencb/opencga.git
$ python3 -m pip --version

Once you have downloaded the project you can install the library. We recommend to install it inside a `virtual environment`_::

$ cd opencga/tree/develop/opencga-client/src/main/python/pyOpenCGA
$ python setup.py install
If you don't have Python or pip, please refer to https://packaging.python.org/en/latest/tutorials/installing-packages/

Pip install
```````````
Run the following command in the shell::
To install PyOpenCGA, run the following command in the shell::

$ pip install pyopencga


Usage
-----

@@ -48,14 +42,14 @@ The first step is to import the ClientConfiguration and OpenCGAClient from pyOpe
Setting up server host configuration
````````````````````````````````````

The second step is to generate a ClientConfiguration instance by passing a configuration dictionary containing the host to point to or a client-configuration.yml file:
The second step is to generate a ClientConfiguration instance by passing either a configuration dictionary containing the OpenCGA host or a client-configuration.yml file with that information:

.. code-block:: python
>>> config = ClientConfiguration('/opt/opencga/conf/client-configuration.yml')
>>> config = ClientConfiguration({
"rest": {
"host": "http://bioinfo.hpc.cam.ac.uk/opencga-demo"
"host": "https://demo.app.zettagenomics.com/opencga"
}
})
@@ -67,33 +61,22 @@ With this configuration you can initialize the OpenCGAClient, and log in:
.. code-block:: python
>>> oc = OpenCGAClient(config)
>>> oc.login('user')
For scripting or using Jupyter Notebooks is preferable to load user credentials from an external JSON file.

Once you are logged in, it is mandatory to use the token of the session to propagate the access of the clients to the host server:

.. code-block:: python
>>> token = oc.token
>>> print(token)
eyJhbGciOi...
>>> oc = OpenCGAClient(configuration=config_dict, token=token)
>>> oc.login(user='user', password='pass', organization='organization')
Examples
````````

The next step is to get an instance of the clients we may want to use:
The first step is to get an instance of the clients we may want to use:

.. code-block:: python
>>> projects = oc.projects # Project client
>>> studies = oc.studies # Study client
>>> samples = oc.samples # Sample client
>>> cohorts = oc.cohorts # Cohort client
>>> projects = oc.projects # Project client
>>> studies = oc.studies # Study client
>>> samples = oc.samples # Sample client
>>> individuals = oc.individuals # Individual client
>>> cohorts = oc.cohorts # Cohort client
Now you can start asking to the OpenCGA RESTful service with pyOpenCGA:
Now you can start querying with pyOpenCGA:

.. code-block:: python
@@ -103,41 +86,38 @@ Now you can start asking to the OpenCGA RESTful service with pyOpenCGA:
project2
[...]
There are two different ways to access to the query response data:
There are two different ways to access query response data:

.. code-block:: python
>>> foo_client.method().get_results() # Iterates over all the results of all the QueryResults
>>> foo_client.method().get_responses() # Iterates over all the responses
>>> foo_client.method().get_responses() # Iterates over all the responses
>>> foo_client.method().get_results() # Iterates over all the results of the first response
Data can be accessed specifying comma-separated IDs or a list of IDs:
Data can be accessed by specifying comma-separated IDs or a list of IDs.

.. code-block:: python
e.g. Retrieving individual karyotypic sex for a list of individuals:

>>> samples = 'NA12877,NA12878,NA12879'
>>> samples_list = ['NA12877','NA12878','NA12879']
>>> sc = oc.samples
.. code-block:: python
>>> for result in sc.info(query_id=samples, study='user@project1:study1').get_results():
... print(result['id'], result['attributes']['OPENCGA_INDIVIDUAL']['disorders'])
NA12877 [{'id': 'OMIM6500', 'name': "Chron's Disease"}]
NA12878 []
NA12879 [{'id': 'OMIM6500', 'name': "Chron's Disease"}]
>>> for result in oc.samples.info(samples='NA12877,NA12878,NA12889', study='platinum').get_results():
... print(result['id'], result['karyotypicSex'])
NA12877 XY
NA12878 XX
NA12889 XY
>>> for result in sc.info(query_id=samples_list, study='user@project1:study1').get_results():
... print(result['id'], result['attributes']['OPENCGA_INDIVIDUAL']['disorders'])
NA12877 [{'id': 'OMIM6500', 'name': "Chron's Disease"}]
NA12878 []
NA12879 [{'id': 'OMIM6500', 'name': "Chron's Disease"}]
>>> for result in oc.samples.info(samples=['NA12877', 'NA12878', 'NA12889'], study='platinum').get_results():
... print(result['id'], result['karyotypicSex'])
NA12877 XY
NA12878 XX
NA12889 XY
Optional filters and extra options can be added as key-value parameters (where the values can be a comma-separated string or a list).
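
Because identifiers may be given either as a comma-separated string or as a list, both forms can be reduced to the same query value. The following is a minimal sketch of that normalisation, assuming a hypothetical helper named ``normalize_ids``; it is not part of pyopencga, and the library may handle this differently internally.

```python
def normalize_ids(ids):
    """Return a comma-separated string from a string or a list of IDs.

    Hypothetical helper for illustration only; not a pyopencga function.
    """
    if isinstance(ids, str):
        parts = ids.split(',')
    else:
        parts = list(ids)
    # Strip surrounding whitespace and drop empty entries
    return ','.join(p.strip() for p in parts if p.strip())


as_string = normalize_ids('NA12877,NA12878,NA12889')
as_list = normalize_ids(['NA12877', 'NA12878', 'NA12889'])
```

Both calls produce the same value, so the two query styles shown above can be treated interchangeably by calling code.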

What can I ask for?
```````````````````
The best way to know which data can be retrieved for each client check `OpenCGA web services`_ swagger.

To find out which data can be retrieved for each client, log into `OpenCGA Demo`_ and check the **OpenCGA REST API** in the **About** section (at the top right corner of the screen).

.. _OpenCGA: https://github.com/opencb/opencga
.. _OpenCGA Docs: http://docs.opencb.org/display/opencga
.. _virtual environment: https://help.dreamhost.com/hc/en-us/articles/115000695551-Installing-and-using-virtualenv-with-Python-3
.. _OpenCGA web services: http://bioinfodev.hpc.cam.ac.uk/opencga/webservices/
.. _OpenCGA REST API: https://demo.app.zettagenomics.com/
.. _OpenCGA Demo: https://demo.app.zettagenomics.com/
2 changes: 0 additions & 2 deletions opencga-client/src/main/python/pyopencga/__init__.py
@@ -1,2 +0,0 @@
"refactoring from from fork pyCGA 7f2e3e404 branch release-1.4.0"
__author__="[email protected]"
35 changes: 24 additions & 11 deletions opencga-client/src/main/python/pyopencga/opencga_client.py
@@ -2,11 +2,15 @@
import time
import sys
import re
if sys.version_info >= (3, 8):
from importlib.metadata import version
else:
from importlib_metadata import version

from pyopencga.opencga_config import ClientConfiguration
from pyopencga.rest_clients.admin_client import Admin
from pyopencga.rest_clients.alignment_client import Alignment
from pyopencga.rest_clients.clinical_client import Clinical
from pyopencga.rest_clients.clinical_analysis_client import ClinicalAnalysis
from pyopencga.rest_clients.cohort_client import Cohort
from pyopencga.rest_clients.family_client import Family
from pyopencga.rest_clients.file_client import File
@@ -21,6 +25,7 @@
from pyopencga.rest_clients.variant_operation_client import VariantOperation
from pyopencga.rest_clients.user_client import User
from pyopencga.rest_clients.variant_client import Variant
from pyopencga.rest_clients.organization_client import Organization


class OpencgaClient(object):
@@ -50,8 +55,10 @@ def __exit__(self, exc_type, exc_val, exc_tb):
self.logout()

def _check_versions(self):
# Getting client and server versions
client_version = version("pyopencga")
server_version = self.meta.about().get_result(0)['Version'].split('-')[0]
client_version = re.findall(r'Client version: (.+)\n', str(self.meta.__doc__))[0]

ansi_reset = "\033[0m"
ansi_red = "\033[31m"
ansi_yellow = "\033[33m"
@@ -60,10 +67,12 @@ def _check_versions(self):
' Some client features may not be implemented in the server.\n'.format(client_version, server_version)
sys.stdout.write('{}{}{}'.format(ansi_red, msg, ansi_reset))
elif tuple(server_version.split('.')[:2]) > tuple(client_version.split('.')[:2]):
msg = '[INFO]: Client version ({}) is lower than server version ({}).\n'.format(client_version, server_version)
msg = '[INFO]: Client version ({}) is lower than server version ({}).' \
' Some client features may not work as intended.\n'.format(client_version, server_version)
sys.stdout.write('{}{}{}'.format(ansi_yellow, msg, ansi_reset))

def _create_clients(self):
self.organizations = Organization(self.configuration, self.token, self._login_handler, auto_refresh=self.auto_refresh)
self.users = User(self.configuration, self.token, self._login_handler, auto_refresh=self.auto_refresh)
self.projects = Project(self.configuration, self.token, self._login_handler, auto_refresh=self.auto_refresh)
self.studies = Study(self.configuration, self.token, self._login_handler, auto_refresh=self.auto_refresh)
@@ -76,15 +85,15 @@ def _create_clients(self):
self.disease_panels = DiseasePanel(self.configuration, self.token, self._login_handler, auto_refresh=self.auto_refresh)
self.alignments = Alignment(self.configuration, self.token, self._login_handler, auto_refresh=self.auto_refresh)
self.variants = Variant(self.configuration, self.token, self._login_handler, auto_refresh=self.auto_refresh)
self.clinical = Clinical(self.configuration, self.token, self._login_handler, auto_refresh=self.auto_refresh)
self.clinical = ClinicalAnalysis(self.configuration, self.token, self._login_handler, auto_refresh=self.auto_refresh)
self.operations = VariantOperation(self.configuration, self.token, self._login_handler, auto_refresh=self.auto_refresh)
self.variant_operations = self.operations # DEPRECATED: use 'self.operations'
self.meta = Meta(self.configuration, self.token, self._login_handler, auto_refresh=self.auto_refresh)
self.ga4gh = GA4GH(self.configuration, self.token, self._login_handler, auto_refresh=self.auto_refresh)
self.admin = Admin(self.configuration, self.token, self._login_handler, auto_refresh=self.auto_refresh)

self.clients = [
self.users, self.projects, self.studies, self.files, self.jobs,
self.organizations, self.users, self.projects, self.studies, self.files, self.jobs,
self.samples, self.individuals, self.families, self.cohorts,
self.disease_panels, self.alignments, self.variants, self.clinical,
self.variant_operations, self.meta, self.ga4gh, self.admin
@@ -93,7 +102,7 @@ def _create_clients(self):
for client in self.clients:
client.on_retry = self.on_retry

def _make_login_handler(self, user, password):
def _make_login_handler(self, user, password, organization):
"""
Returns a closure that performs the log-in. This will be called on retries
if the current session ever expires.
@@ -109,13 +118,15 @@ def login_handler(refresh=False):
data = {'user': user, 'password': password}
else:
data = {'user': user, 'password': password}
if organization:
data.update({'organization': organization})
tokens = User(self.configuration).login(data=data).get_result(0)
self.token = tokens['token']
self.refresh_token = tokens['refreshToken']
return self.token
return login_handler

def login(self, user=None, password=None):
def login(self, user=None, password=None, organization=None):
if user is not None:
if password is None:
password = getpass.getpass()
@@ -125,7 +136,7 @@ def login(self, user=None, password=None):
except AssertionError:
raise ValueError("User and password required")

self._login_handler = self._make_login_handler(user, password)
self._login_handler = self._make_login_handler(user, password, organization)
self._login_handler()
for client in self.clients:
client.token = self.token
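
The login handler in this hunk adds the organization to the request body only when one is supplied. The following is a minimal standalone sketch of that payload logic, assuming a hypothetical function name ``build_login_payload``; the actual code builds the dictionary inline and passes it to ``User(...).login``.

```python
def build_login_payload(user, password, organization=None):
    """Build the login request body.

    Hypothetical helper for illustration; it mirrors the inline logic
    of login_handler but is not a pyopencga function.
    """
    data = {'user': user, 'password': password}
    # The organization key is only sent when a value was provided
    if organization:
        data['organization'] = organization
    return data


without_org = build_login_payload('user', 'pass')
with_org = build_login_payload('user', 'pass', organization='organization')
```

Keeping the key optional means callers that never pass ``organization`` send the same request body as before this change.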
@@ -174,15 +185,14 @@ def _get_help_info(self, client_name=None, parameters=False):

# Description and path
class_docstring = client.__doc__
cls_desc = re.findall('(.+)\n +Client version', class_docstring)[0]
cls_desc = cls_desc.strip().replace('This class contains methods', 'Client')
cls_desc = re.findall('(This class contains methods .+)\n', class_docstring)[0]
cls_path = re.findall('PATH: (.+)\n', class_docstring)[0]

# Methods
methods = []
method_names = [method_name for method_name in dir(client)
if callable(getattr(client, method_name))
and not method_name.startswith('_')]
and not method_name.startswith('_') and method_name != 'login_handler']
for method_name in method_names:
if client_name is None:
continue
@@ -251,6 +261,9 @@ def help(self, client_name=None, show_parameters=False):
)]
sys.stdout.write('\n'.join(help_txt) + '\n')

def get_organization_client(self):
return self.organizations

def get_user_client(self):
return self.users
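
A note on ``_check_versions`` in this file: it compares the major and minor components as tuples of strings, which orders them lexicographically, so ``'10'`` sorts before ``'9'``. The following is a minimal sketch of a numeric comparison, assuming version strings of the form ``major.minor.patch`` with an optional ``-SUFFIX``; the helper name ``major_minor`` is hypothetical and not part of pyopencga.

```python
def major_minor(version_string):
    """Return (major, minor) as integers, ignoring any '-SUFFIX' part.

    Hypothetical helper; assumes at least two numeric dot-separated parts.
    """
    core = version_string.split('-')[0]
    major, minor = core.split('.')[:2]
    return int(major), int(minor)


# String tuples compare lexicographically, so '10' sorts before '9'
string_result = ('2', '10') > ('2', '9')
# Integer tuples compare numerically
numeric_result = major_minor('2.10.0') > major_minor('2.9.1-SNAPSHOT')
```

With integer tuples, a 2.10 server is correctly reported as newer than a 2.9 client, whereas the string comparison reports the opposite.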

5 changes: 2 additions & 3 deletions opencga-client/src/main/python/setup.py
@@ -21,14 +21,13 @@
url='https://github.com/opencb/opencga/tree/develop/opencga-client/src/main/python/pyopencga',
packages=['pyopencga', 'pyopencga.rest_clients'],
license='Apache Software License',
author='David Gomez-Peregrina, Pablo Marin-Garcia, Daniel Perez-Gil',
author_email='[email protected], pmarin@kanteron.com, [email protected]',
author='Pablo Marin-Garcia, Daniel Perez-Gil',
author_email='pablo.marin@zettagenomics.com, [email protected]',
classifiers=[
'Development Status :: 5 - Production/Stable',
'Intended Audience :: Developers',
'Topic :: Scientific/Engineering :: Bio-Informatics',
'License :: OSI Approved :: Apache Software License',
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.5',
],