Update README.md
uwcdc authored Sep 18, 2023

1 parent f75856e commit a8c66ba
Showing 1 changed file with 82 additions and 79 deletions.
161 changes: 82 additions & 79 deletions README.md
@@ -1,94 +1,95 @@
# Braingeneers Python Utilities

[This package][github] is supposed to collect, as well as make installable
through Pip, all of the Python code and utilities that we develop as
part of the Braingeneers project. There are five subpackages:
* `braingeneers.analysis` code for data analysis.
[![ssec](https://img.shields.io/badge/SSEC-Project-purple?logo=&style=plastic)](https://escience.washington.edu/wetai/)
[![BSD License](https://badgen.net/badge/license/BSD-3-Clause/blue)](LICENSE)

* `braingeneers.data` all code for basic data access .
* `braingeneers.data.datasets_electrophysiology` contains methods which load and manipulate ephys data.
* `braingeneers.data.datasets_fluidics` contains methods which load and manipulate fluidics data.
* `braingeneers.data.datasets_imaging` contains methods which load and manipulate imaging data.

* `braingeneers.iot` all code for IOT (internet of things) communication.
* `braingeneers.iot.messaging` a single interface for all messaging and inter-device data transfer functions (MQTT, redis, device state, etc.). A wetAI tutorial on this package exists.

* `braingeneers.ml` all code related to ML (machine learning).
* `braingeneers.ml.ephys_dataloader` a high performance pytorch data loader for ephys data.
Welcome to the **Braingeneers Python Utilities** repository! This package collects the Python code and utilities developed as part of the Braingeneers project and makes them installable through `pip`. Its structure and organization follow Python Packaging Authority (PyPA) standards.

* `braigeneers.utils`
* `braingeneers.utils.s3wrangler` a wrapper of `awswrangler.s3` for accessing PRP/S3. See section below for the documentation and examples.
* `braingeneers.utils.smart_open_braingeneers` a wrapper of `smart_open` for opening files on PRP/S3. See section below for the documentation and examples.
## Installation

[github]: https://www.github.com/braingeneers/braingeneerspy
You can install `braingeneerspy` using `pip` with the following commands:

## Installation / upgrade

Most dependencies are optional installations for this package.
Below are examples of various installation configurations.
### Install from GitHub (Recommended)

```bash
pip install --force-reinstall git+https://github.com/braingeneers/braingeneerspy.git
```
# Typical install (includes `iot`, `analysis`, and `data` access functions, skips `ml`, and lab-specific dependencies):
python -m pip install --force-reinstall git+https://github.com/braingeneers/braingeneerspy.git#egg=braingeneerspy[iot,analysis,data]

# Full install (all optional dependencies included).
python -m pip install --force-reinstall git+https://github.com/braingeneers/braingeneerspy.git#egg=braingeneerspy[all]
### Install from a Wheel (PyPI)

If you prefer to install a pre-built wheel, you can find the latest release on [PyPI](https://pypi.org/project/braingeneerspy/). Please replace `<version>` with the specific version you want to install.

# Minimum install (no optional dependencies, good for Raspberry PI builds).
python -m pip install --force-reinstall git+https://github.com/braingeneers/braingeneerspy.git
```bash
pip install braingeneerspy==<version>
```

### macOS installation note:
if install fails with ```no matches found: git+https://github.com/braingeneers/braingeneerspy.git#egg=braingeneerspy[all]```
wrap quotes around the github address like so
### Install with Optional Dependencies

You can install `braingeneerspy` with specific optional dependencies based on your needs. Use the following command examples:

- Install with IoT, analysis, and data access functions (skips machine learning and lab-specific dependencies):

```bash
pip install --force-reinstall git+https://github.com/braingeneers/braingeneerspy.git#egg=braingeneerspy[iot,analysis,data]
```
# Typical install (includes `iot`, `analysis`, and `data` access functions, skips `ml`, and lab-specific dependencies):
python -m pip install --force-reinstall 'git+https://github.com/braingeneers/braingeneerspy.git#egg=braingeneerspy[iot,analysis]'

# Full install (all optional dependencies included).
python -m pip install --force-reinstall 'git+https://github.com/braingeneers/braingeneerspy.git#egg=braingeneerspy[all]'
- Install with all optional dependencies:

```bash
pip install --force-reinstall git+https://github.com/braingeneers/braingeneerspy.git#egg=braingeneerspy[all]
```

### Optional dependency organization
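Please note that macOS users may need to wrap the GitHub URL in quotes, for example `'git+https://github.com/braingeneers/braingeneerspy.git#egg=braingeneerspy[all]'`. The default macOS shell (zsh) otherwise treats the square brackets as a glob pattern and fails with `no matches found`.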
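
The quoting issue comes from the shell, not from `pip`: square brackets are glob characters, so the extras list has to be protected. As a minimal sketch (the helper name `pip_install_command` is hypothetical and not part of `braingeneerspy`), the standard library's `shlex.quote` can build a command string that is safe to paste into a POSIX shell:

```python
import shlex

# Repository URL taken from the install commands in this README.
REPO_URL = "git+https://github.com/braingeneers/braingeneerspy.git"


def pip_install_command(extras=()):
    """Return a shell-safe pip command for the given optional groups.

    Hypothetical helper for illustration; not part of braingeneerspy.
    """
    target = REPO_URL
    if extras:
        # Extras are appended in the same form the README uses.
        target += "#egg=braingeneerspy[" + ",".join(extras) + "]"
    # shlex.quote wraps the argument in single quotes when it contains
    # characters the shell could interpret, such as '[' and ']'.
    return "python -m pip install --force-reinstall " + shlex.quote(target)


print(pip_install_command(["iot", "analysis", "data"]))
```

When no extras are requested the URL contains no special characters, so `shlex.quote` leaves it unquoted.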

## Optional Dependency Groups

Dependencies in `braingeneerspy` are organized into optional groups of requirements. You can install all dependencies with `all`, or you can install a specific set of dependencies. Here are the optional dependency groups:

- *Unspecified*: Minimal packages for data access will be installed.
- `all`: All optional dependencies will be included.
- `iot`: IoT dependencies such as AWS and Redis packages will be installed.
- `analysis`: Dependencies for data analysis routines, plotting tools, math libraries, etc.
- `ml`: Machine learning dependencies such as `torch` will be installed.
- `hengenlab`: Hengenlab data loader-specific packages such as `neuraltoolkit` will be installed.
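
To see which optional groups are usable in the current environment, one option is to probe for a representative module from each group. This is a rough sketch, not a feature of `braingeneerspy`: the function name `available_groups` is hypothetical, and the group-to-module mapping is an assumption (only `torch` and `neuraltoolkit` are named above; `redis` stands in for the "Redis packages" mentioned for `iot`).

```python
import importlib.util

# Assumed mapping from optional group to one representative module.
# Only `torch` and `neuraltoolkit` are named in this README; `redis` is an
# assumption standing in for the "Redis packages" mentioned for `iot`.
REPRESENTATIVE_MODULES = {
    "iot": "redis",
    "ml": "torch",
    "hengenlab": "neuraltoolkit",
}


def available_groups(modules=REPRESENTATIVE_MODULES):
    """Report, per group, whether its representative module is importable."""
    # find_spec returns None for a missing top-level module and does not
    # import the module itself, so the check stays cheap.
    return {group: importlib.util.find_spec(name) is not None
            for group, name in modules.items()}


for group, present in available_groups().items():
    print(f"{group}: {'installed' if present else 'missing'}")
```

A missing module only suggests that the group was not installed; it does not prove every dependency of an installed group is present.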

## Committing Changes to the Repo

If you plan to make changes to the `braingeneerspy` package and publish them on GitHub, please follow these steps:

1. Update the `version` variable in `setup.py`.
2. To receive the updated `braingeneerspy` package on your local machine, run one of the pip install commands mentioned earlier.
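
After reinstalling, it can help to confirm that the environment actually picked up the new version. The sketch below uses the standard library's `importlib.metadata`; the function name `installed_version` is hypothetical, and the distribution name `braingeneerspy` is assumed from the PyPI link above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="braingeneerspy"):
    """Return the installed version string, or None if not installed.

    The default distribution name is an assumption based on the PyPI URL.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


print(installed_version())
```

The printed value should match the `version` variable you set in `setup.py`.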

Dependencies are organized into optional groups of requirements. You can install all dependencies with `all`,
or install the minimum dependencies (by not specifying optional groups),
or some combination of dependencies you will use. Optional dependency groups are:
## Modules and Subpackages

- *Unspecified*: Minimal packages for data access will be installed.
- `all`: All optional dependencies will be included.
- `iot`: IOT dependencies such as AWS, Redis packages will be installed.
- `analysis`: Dependencies for data analysis routines, plotting tools, math libraries, etc.
- `ml`: Machine Learning dependencies such as `torch` will be installed.
- `hengenlab`: Hengenlab data loader specific packages such as `neuraltoolkit` will be installed.
`braingeneerspy` includes several subpackages and modules, each serving a specific purpose within the Braingeneers project:

### Committing changes to the repo
- `braingeneers.analysis`: Contains code for data analysis.
- `braingeneers.data`: Provides code for basic data access, including subpackages for handling electrophysiology, fluidics, and imaging data.
- `braingeneers.iot`: Offers code for Internet of Things (IoT) communication, including a messaging interface.
- `braingeneers.ml`: Contains code related to machine learning, such as a high-performance PyTorch data loader for electrophysiology data.
- `braingeneers.utils`: Provides utility functions, including S3 access and smart file opening.

To publish changes made to the `braingeneerspy` package on github, please follow these steps.
1. Update the `version` variable in `setup.py`.
2. To then receive the updated `braingeneerspy` package on your personal computer
3. Run one of the pip install commands listed above.
## S3 Access and Configuration

## braingeneers.utils.s3wrangler
Extends the `awswrangler.s3 package` for Braingeneers/PRP access.
See API documentation: https://aws-data-wrangler.readthedocs.io/en/2.4.0-docs/api.html#amazon-s3
### `braingeneers.utils.s3wrangler`

This module extends the `awswrangler.s3` package for Braingeneers/PRP access. For the API reference, see the [awswrangler S3 documentation](https://aws-data-wrangler.readthedocs.io/en/2.4.0-docs/api.html#amazon-s3).

Here's a basic usage example:

Usage examples:
```python
import braingeneers.utils.s3wrangler as wr

# get all UUIDs from s3://braingeneers/ephys/
# Get all UUIDs from s3://braingeneers/ephys/
uuids = wr.list_directories('s3://braingeneers/ephys/')
print(uuids)
```

## braingeneers.utils.smart_open_braingeneers
Configures smart_open for braingeneers use on PRP/S3. When importing this version of `smart_open`
braingeneers defaults will be autoconfigured. Note that `smart_open` supports both local and S3 files,
so it can be used for all files, not just S3 file access.
### `braingeneers.utils.smart_open_braingeneers`

This module configures `smart_open` for Braingeneers use on PRP/S3. Importing this version of `smart_open` automatically applies the Braingeneers defaults. Because `smart_open` supports both local and S3 files, it can be used for all file access, not just S3.

Basic usage example (copy/paste this to test your setup), if it works you will see a helpful bit of advice printed to the screen:
Here's a basic usage example:

```python
import braingeneers.utils.smart_open_braingeneers as smart_open
@@ -97,32 +98,31 @@ with smart_open.open('s3://braingeneersdev/test_file.txt', 'r') as f:
    print(f.read())
```

You may also safely replace Python's default `open` function with `smart_open.open`,
`smart_open` supports both local and remote files:
You can also safely replace Python's default `open` function with `smart_open.open`:

```python
import braingeneers.utils.smart_open_braingeneers as smart_open

open = smart_open.open
```
### Non-standard S3 endpoints:

`smart_open` and `s3wrangler` are pre-configured by default to the standard braingeneers S3 endpoint,
no configuration is necessary. If you would like to utilize a different S3 service you can specify a
new custom `ENDPOINT`, this can be a local path or an endpoint URL for another S3 service (s3wrangler
only supports S3 services, not local paths, `smart_open` supports local paths).
## Customizing S3 Endpoints

By default, `smart_open` and `s3wrangler` are pre-configured for the standard Braingeneers S3 endpoint. However, you can specify a custom `ENDPOINT` if you'd like to use a different S3 service. This can be a local path or an endpoint URL for another S3 service (note that `s3wrangler` only supports S3 services, not local paths, while `smart_open` supports local paths).

To set a custom endpoint, use either of the following options:

- Set an environment variable `ENDPOINT` with the new endpoint. Unix based example:`export ENDPOINT="https://s3-west.nrp-nautilus.io"`
- Call `braingeneers.set_default_endpoint(endpoint: str)` and `braingeneers.get_default_endpoint()`.
These functions will update both `smart_open` and `s3wrangler` (if it's an S3 endpoint,
local path endpoints are ignored by s3wrangler)
1. Set an environment variable `ENDPOINT` with the new endpoint. For example, on Unix-based systems:

When running a job on the PRP you can use the PRP internal S3 endpoint,
which is faster than the default external endpoint (this will only work on jobs run in the PRP
environment). Add the following environment variable to your job YAML file.
This will set the environment variable ENDPOINT_URL which overrides the
default external PPR/S3 endpoint, which is used if you don't set this variable.
Setting this environment variable can also be used to set an endpoint other than the PRP/S3.
```bash
export ENDPOINT="https://s3-west.nrp-nautilus.io"
```

2. Call `braingeneers.set_default_endpoint(endpoint: str)` to change the endpoint at runtime, and `braingeneers.get_default_endpoint()` to check the current value. The change applies to both `smart_open` and `s3wrangler`, except that `s3wrangler` ignores local path endpoints.
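
The distinction between an S3 service URL and a local path can be illustrated with a small standalone sketch. This is not the `braingeneerspy` implementation: `resolve_endpoint`, `is_s3_endpoint`, and `ASSUMED_DEFAULT` are hypothetical names, and the default value is a placeholder because the real default endpoint is defined inside the package.

```python
import os
from urllib.parse import urlparse

# Placeholder only; the real default endpoint is defined inside braingeneerspy.
ASSUMED_DEFAULT = "https://example-default-endpoint.invalid"


def resolve_endpoint(env=os.environ, default=ASSUMED_DEFAULT):
    """Pick the endpoint from the ENDPOINT variable, else fall back to default."""
    return env.get("ENDPOINT") or default


def is_s3_endpoint(endpoint):
    """Treat http(s) URLs as S3 services and anything else as a local path."""
    return urlparse(endpoint).scheme in ("http", "https")


endpoint = resolve_endpoint()
kind = "S3 service" if is_s3_endpoint(endpoint) else "local path"
print(f"{endpoint} -> {kind}")
```

Under this assumed rule, a value such as `/tmp/local_data` would be classified as a local path, which matches the case the README says `s3wrangler` ignores.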

### Using the PRP Internal S3 Endpoint

When running a job on the PRP, you can use the PRP internal S3 endpoint, which is faster than the default external endpoint. To do this, add the following environment variable to your job YAML file:

```yaml
spec:
@@ -137,5 +137,8 @@ spec:
value: "http://rook-ceph-rgw-nautiluss3.rook"
```
Notes:
- There were version conflicts between 4.2.0 and 5.1.0 of smart_open. This configuration has been tested to work with 5.1.0.
Please note that this will only work on jobs run in the PRP environment. Setting the `ENDPOINT` environment variable can also be used to specify an endpoint other than the PRP/S3.

### Notes

- `braingeneerspy` has been tested with `smart_open` version 5.1.0. Version conflicts were seen between 4.2.0 and 5.1.0, so if you encounter issues, install 5.1.0.
