better docs
Valentin Zulkower committed Dec 17, 2024
1 parent f7828c5 commit 2ff9531
Showing 3 changed files with 54 additions and 11 deletions.
36 changes: 27 additions & 9 deletions README.md
@@ -32,20 +32,30 @@ from ginkgo_ai_client import GinkgoAIClient, MaskedInferenceQuery
client = GinkgoAIClient()
model = "ginkgo-aa0-650M"

# SINGLE QUERY

query = MaskedInferenceQuery(sequence="MPK<mask><mask>RRL", model=model)
prediction = client.send_request(query)
# prediction.sequence == "MPKRRRRL"
```

# BATCH QUERY
Sending multiple queries at once is also possible, and is recommended in most cases: the queries are processed in parallel, with our servers scaling accordingly. The `send_batch_request` method returns a list of results in the same order as the queries:

```python
sequences = ["MPK<mask><mask>RRL", "M<mask>RL", "MLLM<mask><mask>R"]
queries = [MaskedInferenceQuery(sequence=seq, model=model) for seq in sequences]
predictions = client.send_batch_request(queries)
# predictions[0].sequence == "MPKRRRRL"
```
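
Because the results come back in the same order as the queries, each input can be paired with its prediction by position. The sketch below is only an illustration: `FakePrediction` and `pair_with_inputs` are hypothetical names that are not part of the client, and the stand-in objects replace a real `send_batch_request` call so the snippet runs without an API key.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FakePrediction:
    """Hypothetical stand-in for a prediction object exposing `sequence`."""

    sequence: str


def pair_with_inputs(
    sequences: Sequence[str], predictions: Sequence[FakePrediction]
) -> List[Tuple[str, str]]:
    """Pair each input with its prediction, relying on identical ordering."""
    if len(sequences) != len(predictions):
        raise ValueError("expected exactly one prediction per query")
    return [(seq, pred.sequence) for seq, pred in zip(sequences, predictions)]


sequences = ["MPK<mask><mask>RRL", "M<mask>RL", "MLLM<mask><mask>R"]
# Only the first value comes from the README; the others are placeholders,
# not real model output.
predictions = [
    FakePrediction("MPKRRRRL"),
    FakePrediction("placeholder-2"),
    FakePrediction("placeholder-3"),
]
pairs = pair_with_inputs(sequences, predictions)
```

With the real client, `predictions` would be the list returned by `client.send_batch_request(queries)`.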

For large datasets (say, 100,000 queries), you can also send multiple batches of requests and iterate over the results as they become ready. The results are not guaranteed to come back in the same order as the queries, so make sure each query has a `query_name` attribute that identifies its result.

```python
from ginkgo_ai_client import MeanEmbeddingQuery
queries = MeanEmbeddingQuery.iter_from_fasta("sequences.fasta", model=model)
for batch_results in client.send_requests_by_batches(queries, batch_size=1000):
    for result in batch_results:
        print(result.query_name, result.embedding)
```
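
Since batched results may arrive out of order, one way to re-associate them with their inputs is to index them by `query_name`. This is a minimal sketch under stated assumptions: `FakeResult` and `index_by_name` are hypothetical names, and the hand-written batches stand in for what `send_requests_by_batches` would yield. Only the `query_name` and `embedding` attribute names come from the example above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class FakeResult:
    """Hypothetical stand-in for a result exposing `query_name` and `embedding`."""

    query_name: str
    embedding: List[float]


def index_by_name(batches: Iterable[List[FakeResult]]) -> Dict[str, List[float]]:
    """Collect embeddings from batches arriving in any order, keyed by query name."""
    embeddings: Dict[str, List[float]] = {}
    for batch_results in batches:
        for result in batch_results:
            if result.query_name in embeddings:
                raise ValueError(f"duplicate query_name: {result.query_name}")
            embeddings[result.query_name] = result.embedding
    return embeddings


# Batches deliberately listed out of order; the numbers are placeholders,
# not real embeddings.
batches = [
    [FakeResult("seq3", [0.3]), FakeResult("seq1", [0.1])],
    [FakeResult("seq2", [0.2])],
]
embeddings = index_by_name(batches)
# Restore the original query order once everything has arrived.
ordered = [embeddings[name] for name in ["seq1", "seq2", "seq3"]]
```

Raising on a duplicate name is a design choice made for this sketch: if two queries share a `query_name`, their results cannot be told apart afterwards.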

Changing the `model` parameter to `esm2-650M` or `esm2-3b` in this example will perform
masked inference with the ESM2 model.

@@ -75,16 +85,24 @@ predictions = client.send_batch_request(queries)

See the [example folder](examples/) and [reference docs](https://ginkgobioworks.github.io/ginkgo-ai-client/) for more details on usage and parameters.

| Model | Description | Reference | Supported queries | Versions |
| ----- | ------------------------------------------- | -------------------------------------------------------------------------------------------- | ---------------------------- | -------- |
| ESM2 | Large protein language model from Meta | [GitHub](https://github.com/facebookresearch/esm?tab=readme-ov-file#esmfold) | Embeddings, masked inference | 3B, 650M |
| AA0 | Ginkgo's proprietary protein language model | [Announcement](https://www.ginkgobioworks.com/2024/09/17/aa-0-protein-llm-technical-review/) | Embeddings, masked inference | 650M |
| 3UTR | Ginkgo's proprietary 3'UTR language model | [Preprint](https://www.biorxiv.org/content/10.1101/2024.10.07.616676v1) | Embeddings, masked inference | v1 |
| Model | Description | Reference | Supported queries | Versions |
| ----------- | -------------------------------------- | -------------------------------------------------------------------------------------------- | --------------------------------- | -------- |
| ESM2 | Large protein language model from Meta | [GitHub](https://github.com/facebookresearch/esm?tab=readme-ov-file#esmfold) | Embeddings, masked inference | 3B, 650M |
| AA0 | Ginkgo's protein language model | [Announcement](https://www.ginkgobioworks.com/2024/09/17/aa-0-protein-llm-technical-review/) | Embeddings, masked inference | 650M |
| 3UTR | Ginkgo's 3'UTR language model | [Preprint](https://www.biorxiv.org/content/10.1101/2024.10.07.616676v1) | Embeddings, masked inference | v1 |
| Promoter-0 | Ginkgo's promoter activity model | Coming soon | Promoter activity across tissues | v1 |
| Boltz | Protein structure prediction model | [GitHub](https://github.com/jwohlwend/boltz) | Protein structure prediction | v1 |
| ABdiffusion | Antibody diffusion model | Coming soon | Unmasking | v1 |
| LCDNA | Long-context DNA diffusion model | Coming soon | Unmasking | v1 |

## License

This project is licensed under the MIT License. See the `LICENSE` file for details.

## Releases

Make sure the changelog is up to date and the top section reads `Unreleased`. Increment the version with the `bumpversion` workflow in Actions - it will update the version everywhere in the repo and create a tag. If all looks good, create a release for the tag, it will automatically publish to PyPI.
To release a new version to PyPI:

- Make sure the changelog is up to date and the top section reads `Unreleased`.
- Increment the version with the `bumpversion` workflow in Actions. It updates the version everywhere in the repo and creates a tag.
- If all looks good, create a release for the tag. The release is then published to PyPI automatically.
27 changes: 26 additions & 1 deletion docs/source/examples.rst
@@ -1,6 +1,17 @@
Examples
========

Handling large batches
----------------------

.. literalinclude:: ../../examples/handling_large_batches.py
   :language: python
   :linenos:


Examples by application
-----------------------

ESM model
~~~~~~~~~

@@ -22,11 +33,25 @@ AA0 model
   :language: python
   :linenos:

Promoter activity with Promoter-0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: ../../examples/promoter_activity.py
   :language: python
   :linenos:


Boltz structure inference
~~~~~~~~~~~~~~~~~~~~~~~~~

Structure inference with a simple (single-chain) protein sequence:

.. literalinclude:: ../../examples/boltz_structure_inference/gfp.py
   :language: python
   :linenos:

Structure inference with a multimer protein sequence and ligand(s):

.. literalinclude:: ../../examples/boltz_structure_inference/with_ligand.py
   :language: python
   :linenos:
2 changes: 1 addition & 1 deletion examples/boltz_structure_inference/with_ligand.py
@@ -1,4 +1,4 @@
"""Simple example where we predict the 3D structure of the GFP protein."""
"""Example where we predict the structure of a multimer protein with ligand(s)."""

from ginkgo_ai_client import GinkgoAIClient, BoltzStructurePredictionQuery
