Update docs #40

Merged 6 commits on Jan 22, 2025
2 changes: 1 addition & 1 deletion .github/workflows/docs_build.yml
@@ -31,5 +31,5 @@ jobs:
python3 -m pip install --upgrade pip && python3 -m pip install poetry
poetry env use '3.10'
source $(poetry env info --path)/bin/activate
-poetry install --with docs,test
+poetry install --with docs,test,dev,peft
cd docs && rm -rf source/reference/api && make html
60 changes: 30 additions & 30 deletions README.md
@@ -1,23 +1,30 @@
# mmlearn

[![code checks](https://github.com/VectorInstitute/mmlearn/actions/workflows/code_checks.yml/badge.svg)](https://github.com/VectorInstitute/mmlearn/actions/workflows/code_checks.yml)
[![integration tests](https://github.com/VectorInstitute/mmlearn/actions/workflows/integration_tests.yml/badge.svg)](https://github.com/VectorInstitute/mmlearn/actions/workflows/integration_tests.yml)
[![license](https://img.shields.io/github/license/VectorInstitute/mmlearn.svg)](https://github.com/VectorInstitute/mmlearn/blob/main/LICENSE)

-This project aims at enabling the evaluation of existing multimodal representation learning methods, as well as facilitating
+*mmlearn* aims at enabling the evaluation of existing multimodal representation learning methods, as well as facilitating
experimentation and research for new techniques.

## Quick Start

### Installation

#### Prerequisites

The library requires Python 3.10 or later. We recommend using a virtual environment to manage dependencies. You can create
a virtual environment using the following command:

```bash
python3 -m venv /path/to/new/virtual/environment
source /path/to/new/virtual/environment/bin/activate
```

#### Installing binaries

To install the pre-built binaries, run:

```bash
python3 -m pip install mmlearn
```
@@ -73,13 +80,15 @@ Uses the <a href=https://huggingface.co/docs/peft/index>PEFT</a> library to enab
</table>

For example, to install the library with the `vision` and `audio` extras, run:

```bash
python3 -m pip install mmlearn[vision,audio]
```

</details>

#### Building from source

To install the library from source, run:

```bash
@@ -89,6 +98,7 @@ python3 -m pip install -e .
```

### Running Experiments

We use [Hydra](https://hydra.cc/docs/intro/) and [hydra-zen](https://mit-ll-responsible-ai.github.io/hydra-zen/) to manage configurations
in the library.

@@ -97,9 +107,11 @@ have an `__init__.py` file to make it a Python package and an `experiment` folder
This format allows the use of `.yaml` configuration files as well as Python modules (using [structured configs](https://hydra.cc/docs/tutorials/structured_config/intro/) or [hydra-zen](https://mit-ll-responsible-ai.github.io/hydra-zen/)) to define the experiment configurations.
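
As an illustration only, a hypothetical configuration package (here called `my_project.configs`) might register an experiment config with hydra-zen roughly as follows; the package name and config fields are placeholders rather than part of mmlearn, and a real experiment would compose the library's task, dataset, and module configs:

```python
# Hypothetical my_project/configs/__init__.py -- a minimal sketch of registering an
# experiment config with hydra-zen. All names and fields here are placeholders.
from hydra_zen import make_config, store

# Build a simple structured config (a dynamically generated dataclass).
MyExperimentConf = make_config(seed=42, max_epochs=10)

# Register it under the "experiment" group so that `+experiment=my_experiment` can
# resolve it once `pkg://my_project.configs` is added to `hydra.searchpath`.
store(MyExperimentConf, group="experiment", name="my_experiment")
store.add_to_hydra_store()
```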

To run an experiment, use the following command:

```bash
mmlearn_run 'hydra.searchpath=[pkg://path.to.config.directory]' +experiment=<name_of_experiment_yaml_file> experiment=your_experiment_name
```

Hydra will compose the experiment configuration from all the configurations in the specified directory as well as all the
configurations in the `mmlearn` package. *Note the dot-separated path to the directory containing the experiment configuration
files.*
@@ -109,23 +121,38 @@ One can add a path to `hydra.searchpath` either as a package (`pkg://path.to.con
Hence, please refrain from using the `file://` notation.

Hydra also allows for overriding configuration parameters from the command line. To see the available options and other information, run:

```bash
mmlearn_run 'hydra.searchpath=[pkg://path.to.config.directory]' +experiment=<name_of_experiment_yaml_file> --help
```

By default, the `mmlearn_run` command will run the experiment locally. To run the experiment on a SLURM cluster, we use
the [submitit launcher](https://hydra.cc/docs/plugins/submitit_launcher/) plugin built into Hydra. The following is an example
of how to run an experiment on a SLURM cluster:

```bash
-mmlearn_run --multirun hydra.launcher.mem_gb=32 hydra.launcher.qos=your_qos hydra.launcher.partition=your_partition hydra.launcher.gres=gpu:4 hydra.launcher.cpus_per_task=8 hydra.launcher.tasks_per_node=4 hydra.launcher.nodes=1 hydra.launcher.stderr_to_stdout=true hydra.launcher.timeout_min=60 '+hydra.launcher.additional_parameters={export: ALL}' 'hydra.searchpath=[pkg://path.to.config.directory]' +experiment=<name_of_experiment_yaml_file> experiment=your_experiment_name
+mmlearn_run --multirun \
+hydra.launcher.mem_per_cpu=5G \
+hydra.launcher.qos=your_qos \
+hydra.launcher.partition=your_partition \
+hydra.launcher.gres=gpu:4 \
+hydra.launcher.cpus_per_task=8 \
+hydra.launcher.tasks_per_node=4 \
+hydra.launcher.nodes=1 \
+hydra.launcher.stderr_to_stdout=true \
+hydra.launcher.timeout_min=720 \
+'hydra.searchpath=[pkg://path.to.my_project.configs]' \
++experiment=my_experiment \
+experiment_name=my_experiment_name
```

This will submit a job to the SLURM cluster with the specified resources.

**Note**: After the job is submitted, it is okay to cancel the program with `Ctrl+C`. The job will continue running on
the cluster. You can also add `&` at the end of the command to run it in the background.


## Summary of Implemented Methods

<table>
<tr>
<th style="text-align: left; width: 250px"> Pretraining Methods </th>
@@ -181,33 +208,6 @@ Binary and multi-class classification tasks are supported.
</tr>
</table>

## Components
### Datasets
Every dataset object must return an instance of `Example` with one or more keys/attributes corresponding to a modality name
as specified in the `Modalities` registry. The `Example` object must also include an `example_index` attribute/key, which
is used, in addition to the dataset index, to uniquely identify the example.
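
The following is a minimal, illustrative sketch of a map-style dataset that returns `Example` objects; the import path for `Example` is an assumption and may differ from the actual package layout:

```python
# A minimal sketch only; the import path for `Example` is assumed, not verified.
import torch
from torch.utils.data import Dataset

from mmlearn.datasets.core import Example  # assumed import path


class ToyTextDataset(Dataset):
    """Toy dataset whose items carry a single, already-tokenized 'text' modality."""

    def __init__(self, token_ids: torch.Tensor) -> None:
        self.token_ids = token_ids  # shape: (num_examples, seq_len)

    def __len__(self) -> int:
        return self.token_ids.size(0)

    def __getitem__(self, idx: int) -> Example:
        # "text" should match a modality name registered in the `Modalities` registry;
        # `example_index` uniquely identifies the example within the dataset.
        return Example({"text": self.token_ids[idx], "example_index": idx})
```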

<details>
<summary><b>CombinedDataset</b></summary>

The `CombinedDataset` object is used to combine multiple datasets into one. It accepts an iterable of `torch.utils.data.Dataset`
and/or `torch.utils.data.IterableDataset` objects and returns an `Example` object from one of the datasets, given an index.
Conceptually, the `CombinedDataset` object is a concatenation of the datasets in the input iterable, so the given index
can be mapped to a specific dataset based on the size of the datasets. As iterable-style datasets do not support random access,
the examples from these datasets are returned in order as they are iterated over.

The `CombinedDataset` object also adds a `dataset_index` attribute to the `Example` object, corresponding to the index of
the dataset in the input iterable. Every example returned by the `CombinedDataset` also has an `example_ids` attribute,
an instance of `Example` containing the same keys/attributes as the original example (excluding `example_index` and
`dataset_index`), where each value is a tensor built from the `dataset_index` and `example_index`.
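
Reusing the `ToyTextDataset` sketch from the Datasets section above, a hedged usage example (the `CombinedDataset` import path is an assumption) might look like:

```python
import torch

from mmlearn.datasets.core import CombinedDataset  # assumed import path

ds_a = ToyTextDataset(torch.randint(0, 100, (4, 8)))  # 4 examples
ds_b = ToyTextDataset(torch.randint(0, 100, (2, 8)))  # 2 examples
combined = CombinedDataset([ds_a, ds_b])

example = combined[5]          # indices 0-3 map to ds_a, 4-5 map to ds_b
print(example.dataset_index)   # 1: index of the source dataset in the input iterable
print(example.example_ids)     # Example holding tensors of dataset_index/example_index
```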
</details>

### Dataloading
When dealing with multiple datasets with different modalities, the default `collate_fn` of `torch.utils.data.DataLoader`
may not work, as it assumes that all examples have the same keys/attributes. In that case, the `collate_example_list`
function can be used as the `collate_fn` argument of `torch.utils.data.DataLoader`. This function takes a list of `Example`
objects and returns a dictionary of tensors, with all the keys/attributes of the `Example` objects.
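
A hedged sketch of plugging `collate_example_list` into a `DataLoader` (the import path is an assumption, and `combined` is the `CombinedDataset` from the sketch above):

```python
from torch.utils.data import DataLoader

from mmlearn.datasets.core import collate_example_list  # assumed import path

loader = DataLoader(combined, batch_size=3, collate_fn=collate_example_list)

batch = next(iter(loader))
# The batch gathers the keys/attributes of the individual Example objects.
print(batch.keys())
```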

## Contributing

If you are interested in contributing to the library, please see [CONTRIBUTING.MD](CONTRIBUTING.MD). This file contains
20 changes: 11 additions & 9 deletions docs/source/conf.py
@@ -31,8 +31,14 @@
"sphinx_copybutton",
"sphinx_design",
"sphinxcontrib.apidoc",
"myst_parser",
]
add_module_names = False
apidoc_module_dir = "../../mmlearn"
apidoc_output_dir = "reference/api"
apidoc_excluded_paths = ["tests"]
apidoc_separate_modules = True
apidoc_module_first = True
autoclass_content = "class"
autodoc_default_options = {
"members": True,
@@ -47,13 +53,6 @@
autosummary_generate = True
copybutton_prompt_text = r">>> |\.\.\. "
copybutton_prompt_is_regexp = True
napoleon_google_docstring = False
napoleon_numpy_docstring = True
napoleon_include_init_with_doc = True
napoleon_attr_annotations = True
set_type_checking_flag = True


intersphinx_mapping = {
"python": ("https://docs.python.org/3.10/", None),
"numpy": ("http://docs.scipy.org/doc/numpy/", None),
@@ -67,9 +66,12 @@
"torchmetrics": ("https://lightning.ai/docs/torchmetrics/stable/", None),
"Pillow": ("https://pillow.readthedocs.io/en/latest/", None),
"transformers": ("https://huggingface.co/docs/transformers/en/", None),
"peft": ("https://huggingface.co/docs/peft/en/", None),
}

napoleon_google_docstring = False
napoleon_numpy_docstring = True
napoleon_include_init_with_doc = True
napoleon_attr_annotations = True
set_type_checking_flag = True
templates_path = ["_templates"]

# -- Options for HTML output -------------------------------------------------
2 changes: 2 additions & 0 deletions docs/source/contributing.rst
@@ -0,0 +1,2 @@
.. include:: ../../CONTRIBUTING.md
:parser: myst_parser.sphinx_
3 changes: 2 additions & 1 deletion docs/source/index.rst
@@ -12,5 +12,6 @@ Contents
:maxdepth: 2

installation
getting_started
user_guide
contributing
api