Skip to content

Commit

Permalink
Update contribution guide and improved unittest logic (#343)
Browse files Browse the repository at this point in the history
* Update contribution guide and fixed pytest open file descriptor issue

* Improved test logic
  • Loading branch information
karan6181 authored Jul 26, 2023
1 parent e9a74a6 commit f36318c
Show file tree
Hide file tree
Showing 5 changed files with 64 additions and 27 deletions.
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ _Put an `x` without space in the boxes that apply. If you are unsure about any c
### Tests
- [ ] I ran `pre-commit` on my change. (check out the `pre-commit` section of [prerequisites](https://github.com/mosaicml/streaming/blob/main/CONTRIBUTING.md#prerequisites))
- [ ] I have added tests that prove my fix is effective or that my feature works (if appropriate).
- [ ] I ran the tests locally to make sure it pass. (check out [testing](https://github.com/mosaicml/streaming/blob/main/CONTRIBUTING.md#running-tests))
- [ ] I ran the tests locally to make sure it pass. (check out [testing](https://github.com/mosaicml/streaming/blob/main/CONTRIBUTING.md#submitting-a-contribution))
- [ ] I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes.

<!--
Expand Down
50 changes: 42 additions & 8 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,9 @@ To set up the development environment in your local box, run the commands below.
<!--pytest.mark.skip-->
```bash
pip install -e '.[dev]'

# Optional: If you would like to install all the dependencies
pip install -e '.[all]'
```

2\. Configure [pre-commit](https://pre-commit.com/), which automatically formats code before
Expand Down Expand Up @@ -47,22 +50,53 @@ git remote add upstream https://github.com/mosaicml/streaming.git

<!--pytest.mark.skip-->
```bash
cd streaming
git checkout -b cool-new-feature
```

4\. When you are ready, submit a pull request into the streaming repository! If merged, we'll reach out to send you some free swag :)
4\. Run linting as part of `pre-commit`.

## Configuring README Code Snippets
<!--pytest.mark.skip-->
```bash
git add <file1> <file2>
pre-commit run

Streaming uses [pytest-codeblocks](https://github.com/nschloe/pytest-codeblocks) to test all example code snippets. The pytest-codeblocks repository explains how to annotate code snippets, which supports most `pytest` configurations. For example, if a test requires model training, the GPU mark (`<!--pytest.mark.skip-->`) should be applied.
# Optional: Run pre-commit for all files
pre-commit run --all-files
```

5\. Run the unit test to ensure it passes locally.

<!--pytest.mark.skip-->
```bash
ulimit -n unlimited # Workaround: To overcome 'Too many open files' issues since streaming uses atexit handler to close file descriptor at the end.

## Running Tests
pytest -vv -s . # run all the unittests
cd docs && make clean && make doctest # run doctests
```

To test your changes locally, run:
6\. [Optional] Compile and visualize the documentation locally. If you have a documentation changes, running the below commands is mandatory.

1. `pytest .` # run all the unittests
1. `cd docs && make doctest` # run doctests
<!--pytest.mark.skip-->
```bash
cd docs
pip install -e '.[docs]'
make clean && make html
make host # open the output link in a browser.
```

See the [Makefile](/Makefile) for more information.

If you want to run pre-commit hooks manually, which check for code formatting and type annotations, run `pre-commit run --all-files`

7\. When you are ready, submit a pull request into the streaming repository!
<!--pytest.mark.skip-->
```bash
git commit -m "cool feature" # Add relevant commit message
git push origin cool-new-feature
```

Create a pull request to propose changes you've made to a fork of an upstream repository by following this [guide](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork).

## Configuring README Code Snippets

Streaming uses [pytest-codeblocks](https://github.com/nschloe/pytest-codeblocks) to test all example code snippets. The pytest-codeblocks repository explains how to annotate code snippets, which supports most `pytest` configurations. For example, if a test requires model training, the GPU mark (`<!--pytest.mark.skip-->`) should be applied.
5 changes: 3 additions & 2 deletions streaming/base/partition/orig.py
Original file line number Diff line number Diff line change
Expand Up @@ -61,8 +61,9 @@ def get_partitions_orig(num_samples: int,

# If drop_first isn't a multiple of num_physical_nodes, round down to make it divisible.
if drop_first % num_physical_nodes:
logger.warn('`drop_first` was not divisible by `num_physical_nodes`. Rounding it down ' +
'to make it divisible.')
logger.warning(
'`drop_first` was not divisible by `num_physical_nodes`. Rounding it down ' +
'to make it divisible.')
drop_first -= drop_first % num_physical_nodes

# Divide the full dataset sample range into a sample range per canonical node.
Expand Down
26 changes: 14 additions & 12 deletions tests/test_eviction.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
import operator
import os
from shutil import rmtree
from typing import Tuple
from typing import Any, Tuple

import pytest
from torch.utils.data import DataLoader
Expand Down Expand Up @@ -119,12 +119,15 @@ def cache_limit_too_low(remote: str, local: str, keep_zip: bool):
rmtree(local, ignore_errors=False)


funcs = shard_eviction_disabled, shard_eviction_too_high, shard_eviction, manual_shard_eviction,
cache_limit_too_low,
funcs = [
shard_eviction_disabled, shard_eviction_too_high, shard_eviction, manual_shard_eviction,
cache_limit_too_low
]


@pytest.mark.usefixtures('local_remote_dir')
def test_eviction_nozip(local_remote_dir: Tuple[str, str]):
@pytest.mark.parametrize('func', [f for f in funcs])
def test_eviction_nozip(local_remote_dir: Tuple[str, str], func: Any):
num_samples = 5_000
local, remote = local_remote_dir
columns = {'data': 'bytes'}
Expand All @@ -141,12 +144,12 @@ def test_eviction_nozip(local_remote_dir: Tuple[str, str]):
sample = {'data': b'\0'}
out.write(sample)

for func in funcs:
func(remote, local, False)
func(remote, local, False)


@pytest.mark.usefixtures('local_remote_dir')
def test_eviction_zip_nokeep(local_remote_dir: Tuple[str, str]):
@pytest.mark.parametrize('func', [f for f in funcs])
def test_eviction_zip_nokeep(local_remote_dir: Tuple[str, str], func: Any):
num_samples = 5_000
local, remote = local_remote_dir
columns = {'data': 'bytes'}
Expand All @@ -163,12 +166,12 @@ def test_eviction_zip_nokeep(local_remote_dir: Tuple[str, str]):
sample = {'data': b'\0'}
out.write(sample)

for func in funcs:
func(remote, local, False)
func(remote, local, False)


@pytest.mark.usefixtures('local_remote_dir')
def test_eviction_zip_keep(local_remote_dir: Tuple[str, str]):
@pytest.mark.parametrize('func', [f for f in funcs])
def test_eviction_zip_keep(local_remote_dir: Tuple[str, str], func: Any):
num_samples = 5_000
local, remote = local_remote_dir
columns = {'data': 'bytes'}
Expand All @@ -185,5 +188,4 @@ def test_eviction_zip_keep(local_remote_dir: Tuple[str, str]):
sample = {'data': b'\0'}
out.write(sample)

for func in funcs:
func(remote, local, True)
func(remote, local, True)
8 changes: 4 additions & 4 deletions tests/test_laziness.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
# SPDX-License-Identifier: Apache-2.0

from shutil import rmtree
from typing import Tuple
from typing import Any, Tuple

import pytest

Expand Down Expand Up @@ -60,7 +60,8 @@ def five(remote: str, local: str):


@pytest.mark.usefixtures('local_remote_dir')
def test_laziness(local_remote_dir: Tuple[str, str]):
@pytest.mark.parametrize('func', [one, two, three, four, five])
def test_laziness(local_remote_dir: Tuple[str, str], func: Any):
num_samples = 10_000
local, remote = local_remote_dir
columns = {'value': 'int'}
Expand All @@ -77,5 +78,4 @@ def test_laziness(local_remote_dir: Tuple[str, str]):
sample = {'value': i}
out.write(sample)

for func in [one, two, three, four, five]:
func(remote, local)
func(remote, local)

0 comments on commit f36318c

Please sign in to comment.