Guidelines for open source enthusiasts to contribute to our open-source data format.
Deep Lake relies on feedback and contributions from our wonderful community. Let's make it amazing with your help! Any and all contributions are appreciated, including code profiling, refactoring, and tests.
We love feedback! Please join our Slack Community or raise an issue on GitHub.
Clone the repository:
git clone https://github.com/activeloopai/deeplake
cd deeplake
If you are using Linux, install environment dependencies:
apt-get -y update
apt-get -y install git wget build-essential python-setuptools python3-dev libjpeg-dev libpng-dev zlib1g-dev
If you are planning to work on videos, install codecs:
apt-get install -y ffmpeg libavcodec-dev libavformat-dev libswscale-dev
Install the package locally with plugins and development dependencies:
pip install -r deeplake/requirements/plugins.txt
pip install -r deeplake/requirements/tests.txt
pip install -e .
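To confirm that the editable install points at your checkout (an optional sanity check, not part of the required steps), you can print where Python imports the package from:
python3 -c "import deeplake; print(deeplake.__file__)"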
Run local tests to ensure everything is correct:
pytest -x --local .
You can use docker-compose to run the tests:
docker-compose -f ./bin/docker-compose.yaml up --build local
You can also work inside Docker by building the image and opening a bash shell in the container:
docker build -t activeloop-deeplake:latest -f ./bin/Dockerfile.dev .
docker run -it -v $(pwd):/app activeloop-deeplake:latest bash
$ python3 -c "import deeplake"
Changes made to your local files will now be reflected directly in the package running inside the container.
Deep Lake uses the black Python code formatter. You can auto-format your code by installing it with pip install black and then running black . inside the directory you want to format.
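If you only want to see which files black would reformat without modifying them (an optional check, assuming black is already installed), you can run it in check mode:
black --check .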
Deep Lake uses Google Docstrings. Please refer to this example to learn more.
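As a rough illustration of the Google docstring style (the function below is a hypothetical example, not part of the Deep Lake API):

def add_sample(tensor_name: str, sample: bytes) -> int:
    """Appends a sample to a tensor.

    Args:
        tensor_name (str): Name of the tensor to append to.
        sample (bytes): Raw bytes of the sample to store.

    Returns:
        int: Index of the newly appended sample.

    Raises:
        ValueError: If ``tensor_name`` does not exist.
    """
    ...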
Deep Lake uses static typing for function arguments/variables for better code readability. Deep Lake has a GitHub action that runs mypy (similar to how pytest is run) to check for valid static typing. You can refer to the mypy documentation for more information.
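For example, a fully annotated function that mypy can verify statically (a generic illustration, not code from the repository):

from typing import Dict, List, Optional

def chunk_sizes(chunks: List[bytes], max_size: Optional[int] = None) -> Dict[int, int]:
    """Maps each chunk index to its size, skipping chunks larger than max_size."""
    sizes: Dict[int, int] = {}
    for i, chunk in enumerate(chunks):
        if max_size is not None and len(chunk) > max_size:
            continue
        sizes[i] = len(chunk)
    return sizes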
Deep Lake uses pytest for tests. In order to make it easier to contribute, Deep Lake also has a set of custom options defined here.
- Understand how to write pytest tests.
- Understand what a pytest fixture is.
- Understand what pytest parametrizations are.
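If you are new to these concepts, here is a minimal, generic sketch of a fixture and a parametrization (it does not use Deep Lake's custom fixtures, which are described below):

import pytest

@pytest.fixture
def numbers():
    # A fixture provides reusable setup data to any test that requests it by name.
    return [1, 2, 3]

@pytest.mark.parametrize("factor", [1, 2, 10])
def test_scaling(numbers, factor):
    # Parametrization runs this test once per value of `factor`.
    assert [n * factor for n in numbers] == [factor, 2 * factor, 3 * factor]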
To see a list of Deep Lake's custom pytest options, run:
pytest -h | sed -En '/custom options:/,/\[pytest\] ini\-options/p'
You can find more information on pytest fixtures here.
- `memory_storage`: If `--memory-skip` is provided, tests with this fixture will be skipped. Otherwise, the test will run with only a `MemoryProvider`.
- `local_storage`: If `--local` is not provided, tests with this fixture will be skipped. Otherwise, the test will run with only a `LocalProvider`.
- `s3_storage`: If `--s3` is not provided, tests with this fixture will be skipped. Otherwise, the test will run with only an `S3Provider`.
- `storage`: All tests that use the `storage` fixture will be parametrized with the enabled `StorageProvider`s (enabled via the options described above). If `--cache-chains` is provided, `storage` may also be a cache chain. Cache chains have the same interface as a `StorageProvider`, but instead of just a single provider, they are multiple providers chained in a sequence, where the last provider in the chain is considered the actual storage.
- `ds`: The same as the `storage` fixture, but the storages that are parametrized are wrapped with a `Dataset`.
Each `StorageProvider`/`Dataset` that is created for a test via a fixture will automatically have a root created, and it will be destroyed after the test. If you want to keep this data after the test run, you can use the `--keep-storage` option.
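For example, to run the local tests and keep the data they create (assuming a local run like the one shown earlier):
pytest -x --local --keep-storage .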
Single storage provider fixture:
def test_memory(memory_storage):
    # Test will skip if `--memory-skip` is provided
    memory_storage["key"] = b"1234"  # This data will only be stored in memory

def test_local(local_storage):
    # Test will skip if `--local` is not provided
    local_storage["key"] = b"1234"  # This data will only be stored locally

def test_s3(s3_storage):
    # Test will skip if `--s3` is not provided
    # Test will fail if credentials are not provided
    s3_storage["key"] = b"1234"  # This data will only be stored in s3
Multiple storage providers/cache chains:
from deeplake.core.tests.common import parametrize_all_storages, parametrize_all_caches, parametrize_all_storages_and_caches
@parametrize_all_storages
def test_storage(storage):
    # Storage will be parametrized with all enabled `StorageProvider`s
    pass

@parametrize_all_caches
def test_caches(storage):
    # Storage will be parametrized with all common caches containing enabled `StorageProvider`s
    pass

@parametrize_all_storages_and_caches
def test_storages_and_caches(storage):
    # Storage will be parametrized with all enabled `StorageProvider`s and common caches containing enabled `StorageProvider`s
    pass
Dataset storage providers/cache chains:
from deeplake.core.tests.common import parametrize_all_dataset_storages, parametrize_all_dataset_storages_and_caches
@parametrize_all_dataset_storages
def test_dataset(ds):
    # `ds` will be parametrized with 1 `Dataset` object per enabled `StorageProvider`
    pass

@parametrize_all_dataset_storages_and_caches
def test_dataset_storages_and_caches(ds):
    # `ds` will be parametrized with 1 `Dataset` object per enabled `StorageProvider` and all cache chains containing enabled `StorageProvider`s
    pass
Deep Lake uses pytest-benchmark, a pytest plugin, for benchmarking.
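As a rough sketch of how a pytest-benchmark test looks (the workload being timed below is a hypothetical placeholder, not an actual Deep Lake benchmark):

def build_payload(n: int) -> bytes:
    # Placeholder workload to time; real benchmarks would exercise Deep Lake APIs.
    return b"".join(bytes([i % 256]) for i in range(n))

def test_build_payload_speed(benchmark):
    # `benchmark` is the fixture provided by the pytest-benchmark plugin; it calls
    # the given function repeatedly and reports timing statistics.
    result = benchmark(build_payload, 10_000)
    assert len(result) == 10_000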
Deep Lake would not be possible without the work of our community.