Merge pull request #68 from alxndrkalinin/v0.4.2
v0.4.2
Showing 22 changed files with 3,600 additions and 157 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
name: Ruff | ||
on: [push, pull_request] | ||
jobs: | ||
ruff: | ||
runs-on: ubuntu-latest | ||
steps: | ||
- uses: actions/checkout@v4 | ||
- uses: astral-sh/ruff-action@v1 | ||
- uses: astral-sh/ruff-action@v1 | ||
with: | ||
args: "format --check" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,119 +1,60 @@ | ||
# copairs | ||
|
||
Find pairs and compute metrics between them. | ||
`copairs` is a Python package for finding groups of profiles based on metadata and calculating mean Average Precision to assess intra- vs. inter-group similarities. | ||
|
||
## Installation | ||
## Getting started | ||
|
||
```bash | ||
pip install git+https://github.com/cytomining/[email protected] | ||
``` | ||
|
||
## Usage | ||
### System requirements | ||
copairs supports Python 3.8+ and should work with all modern operating systems (tested with macOS 13.5, Ubuntu 18.04, Windows 10). | ||
|
||
### Data | ||
### Dependencies | ||
copairs depends on widely used Python packages: | ||
* numpy | ||
* pandas | ||
* tqdm | ||
* statsmodels | ||
* [optional] plotly | ||
|
||
Say you have a dataset with 20 samples taken from 3 plates `p1, p2, p3`. | ||
Each plate is composed of 5 wells `w1, w2, w3, w4, w5`, and each well | ||
has one or more labels (`t1, t2, t3, t4`) assigned. | ||
### Installation | ||
|
||
```python | ||
import pandas as pd | ||
import random | ||
|
||
random.seed(0) | ||
n_samples = 20 | ||
dframe = pd.DataFrame({ | ||
'plate': [random.choice(['p1', 'p2', 'p3']) for _ in range(n_samples)], | ||
'well': [random.choice(['w1', 'w2', 'w3', 'w4', 'w5']) for _ in range(n_samples)], | ||
'label': [random.choice(['t1', 't2', 't3', 't4']) for _ in range(n_samples)] | ||
}) | ||
dframe = dframe.drop_duplicates() | ||
dframe = dframe.sort_values(by=['plate', 'well', 'label']) | ||
dframe = dframe.reset_index(drop=True) | ||
To install copairs and dependencies, run: | ||
```bash | ||
pip install copairs | ||
``` | ||
|
||
| | plate | well | label | | ||
|---:|:--------|:-------|:--------| | ||
| 0 | p1 | w2 | t4 | | ||
| 1 | p1 | w3 | t2 | | ||
| 2 | p1 | w3 | t4 | | ||
| 3 | p1 | w4 | t1 | | ||
| 4 | p1 | w4 | t3 | | ||
| 5 | p2 | w1 | t1 | | ||
| 6 | p2 | w2 | t1 | | ||
| 7 | p2 | w3 | t1 | | ||
| 8 | p2 | w3 | t2 | | ||
| 9 | p2 | w3 | t3 | | ||
| 10 | p2 | w4 | t2 | | ||
| 11 | p2 | w5 | t1 | | ||
| 12 | p2 | w5 | t3 | | ||
| 13 | p3 | w1 | t3 | | ||
| 14 | p3 | w1 | t4 | | ||
| 15 | p3 | w4 | t2 | | ||
| 16 | p3 | w5 | t2 | | ||
| 17 | p3 | w5 | t4 | | ||
|
||
### Getting valid pairs | ||
|
||
To get pairs of samples that share the same `label` but come from different | ||
`plate`s at different `well` positions: | ||
|
||
```python | ||
from copairs import Matcher | ||
matcher = Matcher(dframe, ['plate', 'well', 'label'], seed=0) | ||
pairs_dict = matcher.get_all_pairs(sameby=['label'], diffby=['plate', 'well']) | ||
To also install dependencies for running examples, run: | ||
```bash | ||
pip install copairs[demo] | ||
``` | ||
|
||
`pairs_dict` is a `label_id: pairs` dictionary containing the list of valid | ||
pairs for every unique value of `label`. | ||
### Testing | ||
|
||
``` | ||
{'t4': [(0, 17), (0, 14), (17, 2), (2, 14)], | ||
't2': [(1, 16), (1, 10), (1, 15), (8, 16), (8, 15), (10, 16)], | ||
't1': [(3, 11), (3, 5), (3, 6), (3, 7)], | ||
't3': [(9, 4), (9, 13), (13, 4), (13, 12), (4, 12)]} | ||
To run tests, run: | ||
```bash | ||
pip install -e .[test] | ||
pytest | ||
``` | ||
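As a rough illustration of what `sameby` and `diffby` select, the sketch below brute-forces the pairs for the four `t4` rows of the example table using only the standard library. `find_pairs` is a hypothetical helper written for this note, not part of the copairs API, and it keys results by a tuple of `sameby` values rather than by the bare label.

```python
from collections import defaultdict
from itertools import combinations

# The four `t4` rows from the example table, keyed by their row index.
rows = {
    0: {"plate": "p1", "well": "w2", "label": "t4"},
    2: {"plate": "p1", "well": "w3", "label": "t4"},
    14: {"plate": "p3", "well": "w1", "label": "t4"},
    17: {"plate": "p3", "well": "w5", "label": "t4"},
}


def find_pairs(rows, sameby, diffby):
    """Hypothetical brute-force pair search (not the copairs implementation).

    A pair (i, j) is valid when the two rows agree on every column in
    `sameby` and disagree on every column in `diffby`.
    """
    pairs = defaultdict(list)
    for i, j in combinations(sorted(rows), 2):
        a, b = rows[i], rows[j]
        same = all(a[col] == b[col] for col in sameby)
        diff = all(a[col] != b[col] for col in diffby)
        if same and diff:
            key = tuple(a[col] for col in sameby)
            pairs[key].append((i, j))
    return dict(pairs)


pairs = find_pairs(rows, sameby=["label"], diffby=["plate", "well"])
print(pairs)
```

The pairs found this way agree with the `t4` entry in the output above, up to ordering within and between pairs. The quadratic loop is only meant to make the matching rule explicit; it says nothing about how copairs computes pairs internally.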
|
||
### Getting valid pairs from a multilabel column | ||
|
||
For efficiency reasons, you may not want to have duplicated rows. You can | ||
group all the labels in a single row and use `MatcherMultilabel` to find the | ||
corresponding pairs: | ||
## Usage | ||
|
||
```python | ||
dframe_multi = dframe.groupby(['plate', 'well'])['label'].unique().reset_index() | ||
``` | ||
We provide examples demonstrating how to use copairs for: | ||
- [grouping profiles based on their metadata](./examples/finding_pairs.ipynb) | ||
- [calculating mAP to assess phenotypic activity and consistency of perturbation using real data](./examples/mAP_demo.ipynb) | ||
|
||
| | plate | well | label | | ||
|---:|:--------|:-------|:-------------------| | ||
| 0 | p1 | w2 | ['t4'] | | ||
| 1 | p1 | w3 | ['t2', 't4'] | | ||
| 2 | p1 | w4 | ['t1', 't3'] | | ||
| 3 | p2 | w1 | ['t1'] | | ||
| 4 | p2 | w2 | ['t1'] | | ||
| 5 | p2 | w3 | ['t1', 't2', 't3'] | | ||
| 6 | p2 | w4 | ['t2'] | | ||
| 7 | p2 | w5 | ['t1', 't3'] | | ||
| 8 | p3 | w1 | ['t3', 't4'] | | ||
| 9 | p3 | w4 | ['t2'] | | ||
| 10 | p3 | w5 | ['t2', 't4'] | | ||
|
||
```python | ||
from copairs import MatcherMultilabel | ||
matcher_multi = MatcherMultilabel(dframe_multi, | ||
columns=['plate', 'well', 'label'], | ||
multilabel_col='label', | ||
seed=0) | ||
pairs_multi = matcher_multi.get_all_pairs(sameby=['label'], | ||
diffby=['plate', 'well']) | ||
``` | ||
## Citation | ||
If you find this work useful for your research, please cite our [pre-print](https://doi.org/10.1101/2024.04.01.587631): | ||
|
||
`pairs_multi` is also a `label_id: pairs` dictionary with the same | ||
structure discussed before: | ||
Kalinin, A.A., Arevalo, J., Vulliard, L., Serrano, E., Tsang, H., Bornholdt, M., Rajwa, B., Carpenter, A.E., Way, G.P. and Singh, S., 2024. A versatile information retrieval framework for evaluating profile strength and similarity. bioRxiv, pp.2024-04. doi:10.1101/2024.04.01.587631 | ||
|
||
BibTeX: | ||
``` | ||
{'t4': [(0, 10), (0, 8), (10, 1), (1, 8)], | ||
't2': [(1, 10), (1, 6), (1, 9), (5, 10), (5, 9), (6, 10)], | ||
't1': [(2, 7), (2, 3), (2, 4), (2, 5)], | ||
't3': [(5, 2), (5, 8), (8, 2), (8, 7), (2, 7)]} | ||
@article{kalinin2024versatile, | ||
title={A versatile information retrieval framework for evaluating profile strength and similarity}, | ||
author={Kalinin, Alexandr A and Arevalo, John and Vulliard, Loan and Serrano, Erik and Tsang, Hillary and Bornholdt, Michael and Rajwa, Bartek and Carpenter, Anne E and Way, Gregory P and Singh, Shantanu}, | ||
journal={bioRxiv}, | ||
pages={2024--04}, | ||
year={2024}, | ||
doi={10.1101/2024.04.01.587631} | ||
} | ||
``` |