Merge pull request #68 from alxndrkalinin/v0.4.2
v0.4.2
johnarevalo authored Oct 22, 2024
2 parents fc829c0 + 7d47818 commit 44378ed
Showing 22 changed files with 3,600 additions and 157 deletions.
15 changes: 3 additions & 12 deletions .github/workflows/python-package.yml
@@ -16,7 +16,7 @@ jobs:
 strategy:
 fail-fast: false
 matrix:
-python-version: ["3.8", "3.9", "3.10"]
+python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
 
 steps:
 - uses: actions/checkout@v3
@@ -26,17 +26,8 @@ jobs:
 python-version: ${{ matrix.python-version }}
 - name: Install dependencies
 run: |
-python -m pip install --upgrade pip build
-python -m pip install flake8 pytest
-python -m build
-pip install -e .
-- name: Lint with flake8
-run: |
-# stop the build if there are Python syntax errors or undefined names
-flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
-# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
-flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+python -m pip install --upgrade pip
+pip install -e .[test]
 - name: Test with pytest
 run: |
-python -m pip install scikit-learn
 pytest
11 changes: 11 additions & 0 deletions .github/workflows/ruff.yml
@@ -0,0 +1,11 @@
+name: Ruff
+on: [push, pull_request]
+jobs:
+ruff:
+runs-on: ubuntu-latest
+steps:
+- uses: actions/checkout@v4
+- uses: astral-sh/ruff-action@v1
+- uses: astral-sh/ruff-action@v1
+with:
+args: "format --check"
137 changes: 39 additions & 98 deletions README.md
@@ -1,119 +1,60 @@
 # copairs
 
-Find pairs and compute metrics between them.
+`copairs` is a Python package for finding groups of profiles based on metadata and calculating mean Average Precision to assess intra- vs inter-group similarities.
 
-## Installation
+## Getting started
 
-```bash
-pip install git+https://github.com/cytomining/[email protected]
-```
 
-## Usage
+### System requirements
+copairs supports Python 3.8+ and should work with all modern operating systems (tested with MacOS 13.5, Ubuntu 18.04, Windows 10).
 
-### Data
+### Dependencies
+copairs depends on widely used Python packages:
+* numpy
+* pandas
+* tqdm
+* statsmodels
+* [optional] plotly
 
-Say you have a dataset with 20 samples taken in 3 plates `p1, p2, p3`,
-each plate is composed of 5 wells `w1, w2, w3, w4, w5`, and each well
-has one or more labels (`t1, t2, t3, t4`) assigned.
+### Installation
 
-```python
-import pandas as pd
-import random
-
-random.seed(0)
-n_samples = 20
-dframe = pd.DataFrame({
-'plate': [random.choice(['p1', 'p2', 'p3']) for _ in range(n_samples)],
-'well': [random.choice(['w1', 'w2', 'w3', 'w4', 'w5']) for _ in range(n_samples)],
-'label': [random.choice(['t1', 't2', 't3', 't4']) for _ in range(n_samples)]
-})
-dframe = dframe.drop_duplicates()
-dframe = dframe.sort_values(by=['plate', 'well', 'label'])
-dframe = dframe.reset_index(drop=True)
+To install copairs and dependencies, run:
+```bash
+pip install copairs
 ```
 
-| | plate | well | label |
-|---:|:--------|:-------|:--------|
-| 0 | p1 | w2 | t4 |
-| 1 | p1 | w3 | t2 |
-| 2 | p1 | w3 | t4 |
-| 3 | p1 | w4 | t1 |
-| 4 | p1 | w4 | t3 |
-| 5 | p2 | w1 | t1 |
-| 6 | p2 | w2 | t1 |
-| 7 | p2 | w3 | t1 |
-| 8 | p2 | w3 | t2 |
-| 9 | p2 | w3 | t3 |
-| 10 | p2 | w4 | t2 |
-| 11 | p2 | w5 | t1 |
-| 12 | p2 | w5 | t3 |
-| 13 | p3 | w1 | t3 |
-| 14 | p3 | w1 | t4 |
-| 15 | p3 | w4 | t2 |
-| 16 | p3 | w5 | t2 |
-| 17 | p3 | w5 | t4 |
-
-### Getting valid pairs
-
-To get pairs of samples that share the same `label` but come from different
-`plate`s at different `well` positions:
-
-```python
-from copairs import Matcher
-matcher = Matcher(dframe, ['plate', 'well', 'label'], seed=0)
-pairs_dict = matcher.get_all_pairs(sameby=['label'], diffby=['plate', 'well'])
+To also install dependencies for running examples, run:
+```bash
+pip install copairs[demo]
 ```
 
-`pairs_dict` is a `label_id: pairs` dictionary containing the list of valid
-pairs for every unique value of `labels`
+### Testing
 
-```
-{'t4': [(0, 17), (0, 14), (17, 2), (2, 14)],
- 't2': [(1, 16), (1, 10), (1, 15), (8, 16), (8, 15), (10, 16)],
- 't1': [(3, 11), (3, 5), (3, 6), (3, 7)],
- 't3': [(9, 4), (9, 13), (13, 4), (13, 12), (4, 12)]}
+To run tests, run:
+```bash
+pip install -e .[test]
+pytest
 ```
 
-### Getting valid pairs from a multilabel column
-
-For efficiency reasons, you may not want to have duplicated rows. You can
-group all the labels in a single row and use `MatcherMultilabel` to find the
-corresponding pairs:
+## Usage
 
-```python
-dframe_multi = dframe.groupby(['plate', 'well'])['label'].unique().reset_index()
-```
+We provide examples demonstrating how to use copairs for:
+- [grouping profiles based on their metadata](./examples/finding_pairs.ipynb)
+- [calculating mAP to assess phenotypic activity and consistency of perturbation using real data](./examples/mAP_demo.ipynb)
 
-| | plate | well | label |
-|---:|:--------|:-------|:-------------------|
-| 0 | p1 | w2 | ['t4'] |
-| 1 | p1 | w3 | ['t2', 't4'] |
-| 2 | p1 | w4 | ['t1', 't3'] |
-| 3 | p2 | w1 | ['t1'] |
-| 4 | p2 | w2 | ['t1'] |
-| 5 | p2 | w3 | ['t1', 't2', 't3'] |
-| 6 | p2 | w4 | ['t2'] |
-| 7 | p2 | w5 | ['t1', 't3'] |
-| 8 | p3 | w1 | ['t3', 't4'] |
-| 9 | p3 | w4 | ['t2'] |
-| 10 | p3 | w5 | ['t2', 't4'] |
-
-```python
-from copairs import MatcherMultilabel
-matcher_multi = MatcherMultilabel(dframe_multi,
-columns=['plate', 'well', 'label'],
-multilabel_col='label',
-seed=0)
-pairs_multi = matcher_multi.get_all_pairs(sameby=['label'],
-diffby=['plate', 'well'])
-```
+## Citation
+If you find this work useful for your research, please cite our [pre-print](https://doi.org/10.1101/2024.04.01.587631):
 
-`pairs_multi` is also a `label_id: pairs` dictionary with the same
-structure discussed before:
+Kalinin, A.A., Arevalo, J., Vulliard, L., Serrano, E., Tsang, H., Bornholdt, M., Rajwa, B., Carpenter, A.E., Way, G.P. and Singh, S., 2024. A versatile information retrieval framework for evaluating profile strength and similarity. bioRxiv, pp.2024-04. doi:10.1101/2024.04.01.587631
 
+BibTeX:
 ```
-{'t4': [(0, 10), (0, 8), (10, 1), (1, 8)],
- 't2': [(1, 10), (1, 6), (1, 9), (5, 10), (5, 9), (6, 10)],
- 't1': [(2, 7), (2, 3), (2, 4), (2, 5)],
- 't3': [(5, 2), (5, 8), (8, 2), (8, 7), (2, 7)]}
+@article{kalinin2024versatile,
+  title={A versatile information retrieval framework for evaluating profile strength and similarity},
+  author={Kalinin, Alexandr A and Arevalo, John and Vulliard, Loan and Serrano, Erik and Tsang, Hillary and Bornholdt, Michael and Rajwa, Bartek and Carpenter, Anne E and Way, Gregory P and Singh, Shantanu},
+  journal={bioRxiv},
+  pages={2024--04},
+  year={2024},
+  doi={10.1101/2024.04.01.587631}
+}
 ```
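The README section removed in this commit explained `Matcher.get_all_pairs(sameby=['label'], diffby=['plate', 'well'])` only through its output. The rule that output follows can be sketched without copairs: keep every pair of rows that agrees on all `sameby` columns and differs on all `diffby` columns, grouped by the shared `sameby` value. This is a minimal stdlib sketch, not copairs' implementation or API; `find_pairs`, `ROWS`, and `COLUMNS` are hypothetical names, and `ROWS` copies the 18-row example table from the removed section.

```python
from itertools import combinations

# Hypothetical copy of the removed README's example table; list position = row index.
COLUMNS = ("plate", "well", "label")
ROWS = [
    ("p1", "w2", "t4"), ("p1", "w3", "t2"), ("p1", "w3", "t4"),
    ("p1", "w4", "t1"), ("p1", "w4", "t3"), ("p2", "w1", "t1"),
    ("p2", "w2", "t1"), ("p2", "w3", "t1"), ("p2", "w3", "t2"),
    ("p2", "w3", "t3"), ("p2", "w4", "t2"), ("p2", "w5", "t1"),
    ("p2", "w5", "t3"), ("p3", "w1", "t3"), ("p3", "w1", "t4"),
    ("p3", "w4", "t2"), ("p3", "w5", "t2"), ("p3", "w5", "t4"),
]


def find_pairs(rows, columns, sameby, diffby):
    """Hypothetical helper: group index pairs (i, j), i < j, by shared `sameby` values.

    A pair is kept only if the two rows agree on every `sameby` column
    and differ on every `diffby` column.
    """
    pos = {name: k for k, name in enumerate(columns)}
    pairs = {}
    # Brute force over all n*(n-1)/2 pairs; fine for a toy table.
    for i, j in combinations(range(len(rows)), 2):
        a, b = rows[i], rows[j]
        same = all(a[pos[c]] == b[pos[c]] for c in sameby)
        diff = all(a[pos[c]] != b[pos[c]] for c in diffby)
        if same and diff:
            key = tuple(a[pos[c]] for c in sameby)
            # Use a plain value as the key when grouping by a single column.
            key = key[0] if len(key) == 1 else key
            pairs.setdefault(key, []).append((i, j))
    return pairs


pairs_dict = find_pairs(ROWS, COLUMNS, sameby=["label"], diffby=["plate", "well"])
for label in sorted(pairs_dict):
    print(label, pairs_dict[label])
```

Run as-is, it lists the same index pairs per label as the removed `pairs_dict` output (for example `t4` gives `(0, 14)`, `(0, 17)`, `(2, 14)`, `(2, 17)`), with each tuple ordered `i < j` instead of in `Matcher`'s order. A more scalable approach would bucket rows by their `sameby` values before comparing.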
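The updated README describes copairs as calculating mean Average Precision (mAP) to assess intra- vs inter-group similarity. Under the standard information-retrieval definition, a query profile's candidates are ranked by similarity, average precision (AP) is the mean of the precision values at the ranks where same-group candidates appear, and mAP averages AP over queries. The sketch below implements that textbook definition with hypothetical function names; it is not copairs' API, and copairs may differ in details such as tie handling and how significance is assessed.

```python
def average_precision(scores, labels):
    """Hypothetical helper: AP of one query under the textbook definition.

    scores: similarity of the query to each candidate (higher = more similar).
    labels: 1 if the candidate is in the query's group, else 0.
    Ties keep input order because sorted() is stable (a simplifying assumption).
    """
    ranked = sorted(range(len(scores)), key=lambda k: -scores[k])
    hits, precisions = 0, []
    for rank, k in enumerate(ranked, start=1):
        if labels[k]:
            hits += 1
            precisions.append(hits / rank)  # precision at this positive's rank
    return sum(precisions) / hits if hits else 0.0


def mean_average_precision(queries):
    """Hypothetical helper: mean AP over (scores, labels) pairs."""
    aps = [average_precision(s, l) for s, l in queries]
    return sum(aps) / len(aps)


queries = [
    ([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]),  # positives at ranks 1 and 3: AP = 5/6
    ([0.2, 0.9, 0.5], [1, 0, 0]),          # single positive at rank 3: AP = 1/3
]
print(round(mean_average_precision(queries), 4))  # 0.5833
```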