Commit

Merge pull request #68 from luxonis/dev
DataDreamer - v0.2.0
sokovninn authored Nov 12, 2024
2 parents ef5c55a + 0ebcd68 commit 23e18d4
Showing 57 changed files with 5,226 additions and 1,187 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/gar-publish-dev.yaml
@@ -15,7 +15,7 @@ jobs:

steps:
- name: 'Checkout GitHub Action'
uses: actions/checkout@main
uses: actions/checkout@v4

- id: 'auth'
name: 'Authenticate to Google Cloud'
@@ -34,5 +34,5 @@ jobs:
- name: 'Build Inventory Image'
working-directory: .
run: |
docker build --build-arg GITHUB_TOKEN=${{secrets.GHCR_PAT}} . --tag $GAR_LOCATION-docker.pkg.dev/$PROJECT_ID/internal/datadreamer:dev
docker push $GAR_LOCATION-docker.pkg.dev/$PROJECT_ID/internal/datadreamer --all-tags
docker build --build-arg GITHUB_TOKEN=${{secrets.GHCR_PAT}} --build-arg BRANCH=${{ github.ref_name }} . --tag $GAR_LOCATION-docker.pkg.dev/$PROJECT_ID/internal/datadreamer:dev
docker push $GAR_LOCATION-docker.pkg.dev/$PROJECT_ID/internal/datadreamer --all-tags
2 changes: 0 additions & 2 deletions .github/workflows/gar-publish.yaml
@@ -4,8 +4,6 @@ name: Deploy single image to GAR (Google Artifact Registry)

on:
workflow_dispatch:
release:
types: [published]
env:
PROJECT_ID: easyml-394818
GAR_LOCATION: us-central1
41 changes: 41 additions & 0 deletions .github/workflows/ghcr-publish-manual.yaml
@@ -0,0 +1,41 @@
name: Manually deploy image to GHCR

on:
workflow_dispatch:
inputs:
branch:
description: 'Branch to deploy'
required: true
default: 'dev'

env:
GHCR_REGISTRY: ghcr.io
IMAGE_NAME: datadreamer

jobs:
push-store:
name: Push the image to GHCR
runs-on: ubuntu-latest

steps:
- name: 'Checkout GitHub Action'
uses: actions/checkout@v2
with:
ref: ${{ inputs.branch }} # Checkout the selected branch

- name: 'Extract short commit hash'
id: commit_hash
run: echo "short_hash=$(git rev-parse --short HEAD)" >> $GITHUB_ENV

- name: Docker login to GHCR
uses: docker/login-action@v3
with:
registry: ghcr.io
username: luxonis-ml
password: ${{ secrets.GHCR_PAT }}

- name: 'Build and Push Image to GHCR'
run: |
docker build --build-arg GITHUB_TOKEN=${{secrets.GHCR_PAT}} --build-arg BRANCH=${{ inputs.branch }} . \
--tag ghcr.io/luxonis/datadreamer:${{ steps.commit_hash.outputs.short_hash }}
docker push ghcr.io/luxonis/datadreamer --all-tags
2 changes: 1 addition & 1 deletion .github/workflows/ghcr-publish.yaml
@@ -1,4 +1,4 @@
name: Docker Build and Publish
name: Deploy latest image to GHCR on release

on:
workflow_dispatch:
34 changes: 17 additions & 17 deletions .github/workflows/tests.yaml
@@ -2,19 +2,23 @@ name: Tests

on:
pull_request:
branches: [ dev, main ]
branches: [ main ]
paths:
- 'datadreamer/**/**.py'
- 'tests/**/**.py'
- 'tests/core_tests/**/**.py'
- .github/workflows/tests.yaml
workflow_dispatch:

jobs:
run_tests:
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest, macOS-latest]
os: [buildjet-8vcpu-ubuntu-2204, windows-latest, macOS-latest]
version: ['3.10', '3.11']
exclude:
- os: buildjet-8vcpu-ubuntu-2204
version: '3.11'

runs-on: ${{ matrix.os }}

@@ -31,56 +35,52 @@ jobs:
cache: pip

- name: Install dependencies [Ubuntu]
if: matrix.os == 'ubuntu-latest'
if: matrix.os == 'buildjet-8vcpu-ubuntu-2204'
run: |
sudo apt update
sudo apt install -y pandoc
pip install -e .[dev]
pip install coverage-badge>=1.1.0 pytest-cov>=4.1.0
- name: Install dependencies [Windows]
if: matrix.os == 'windows-latest'
run: |
pip install -e .[dev]
pip install coverage-badge>=1.1.0 pytest-cov>=4.1.0
- name: Install dependencies [macOS]
if: matrix.os == 'macOS-latest'
run: |
pip install -e .[dev]
pip install coverage-badge>=1.1.0 pytest-cov>=4.1.0
- name: Run tests with coverage [Ubuntu]
if: matrix.os == 'ubuntu-latest' && matrix.version == '3.10'
run: pytest tests --cov=datadreamer --cov-report xml --junit-xml pytest.xml
if: matrix.os == 'buildjet-8vcpu-ubuntu-2204' && matrix.version == '3.10'
run: pytest tests/core_tests --cov=datadreamer --cov-report xml --junit-xml pytest.xml

- name: Run tests [Windows, macOS]
if: matrix.os != 'ubuntu-latest' || matrix.version != '3.10'
run: pytest tests --junit-xml pytest.xml
if: matrix.os != 'buildjet-8vcpu-ubuntu-2204'
run: pytest tests/core_tests --junit-xml pytest.xml

- name: Generate coverage badge [Ubuntu]
if: matrix.os == 'ubuntu-latest' && matrix.version == '3.10'
if: matrix.os == 'buildjet-8vcpu-ubuntu-2204' && matrix.version == '3.10'
run: coverage-badge -o media/coverage_badge.svg -f

- name: Generate coverage report [Ubuntu]
if: matrix.os == 'ubuntu-latest' && matrix.version == '3.10'
if: matrix.os == 'buildjet-8vcpu-ubuntu-2204' && matrix.version == '3.10'
uses: orgoro/[email protected]
with:
coverageFile: coverage.xml
token: ${{ secrets.GITHUB_TOKEN }}

- name: Commit coverage badge [Ubuntu]
if: matrix.os == 'ubuntu-latest' && matrix.version == '3.10'
if: matrix.os == 'buildjet-8vcpu-ubuntu-2204' && matrix.version == '3.10'
run: |
git config --global user.name 'GitHub Actions'
git config --global user.email '[email protected]'
git diff --quiet media/coverage_badge.svg || {
git add media/coverage_badge.svg
git commit -m "[Automated] Updated coverage badge"
}
- name: Push changes [Ubuntu]
if: matrix.os == 'ubuntu-latest' && matrix.version == '3.10'
if: matrix.os == 'buildjet-8vcpu-ubuntu-2204' && matrix.version == '3.10'
uses: ad-m/github-push-action@master
with:
branch: ${{ github.head_ref }}
@@ -117,4 +117,4 @@ jobs:
- name: Publish Test Results
uses: EnricoMi/publish-unit-test-result-action@v2
with:
files: "artifacts/**/*.xml"
files: "artifacts/**/*.xml"
116 changes: 116 additions & 0 deletions .github/workflows/unit-tests.yaml
@@ -0,0 +1,116 @@
name: Unit tests

on:
pull_request:
branches: [ dev ]
paths:
- 'datadreamer/**/**.py'
- 'tests/core_tests/unittests/**.py'
- .github/workflows/unit-tests.yaml

jobs:
run_tests:
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest, macOS-latest]
version: ['3.10', '3.11']

runs-on: ${{ matrix.os }}

steps:
- name: Checkout
uses: actions/checkout@v4
with:
ref: ${{ github.head_ref }}

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.version }}
cache: pip

- name: Install dependencies [Ubuntu]
if: matrix.os == 'ubuntu-latest'
run: |
sudo apt update
sudo apt install -y pandoc
pip install -e .[dev]
pip install coverage-badge>=1.1.0 pytest-cov>=4.1.0
- name: Install dependencies [Windows]
if: matrix.os == 'windows-latest'
run: |
pip install -e .[dev]
pip install coverage-badge>=1.1.0 pytest-cov>=4.1.0
- name: Install dependencies [macOS]
if: matrix.os == 'macOS-latest'
run: |
pip install -e .[dev]
pip install coverage-badge>=1.1.0 pytest-cov>=4.1.0
- name: Run tests with coverage [Ubuntu]
if: matrix.os == 'ubuntu-latest' && matrix.version == '3.10'
run: pytest tests/core_tests/unittests --cov=datadreamer --cov-report xml --junit-xml pytest.xml

- name: Run tests [Windows, macOS]
if: matrix.os != 'ubuntu-latest' || matrix.version != '3.10'
run: pytest tests/core_tests/unittests --junit-xml pytest.xml

- name: Generate coverage badge [Ubuntu]
if: matrix.os == 'ubuntu-latest' && matrix.version == '3.10'
run: coverage-badge -o media/coverage_badge.svg -f

- name: Generate coverage report [Ubuntu]
if: matrix.os == 'ubuntu-latest' && matrix.version == '3.10'
uses: orgoro/[email protected]
with:
coverageFile: coverage.xml
token: ${{ secrets.GITHUB_TOKEN }}

- name: Commit coverage badge [Ubuntu]
if: matrix.os == 'ubuntu-latest' && matrix.version == '3.10'
run: |
git config --global user.name 'GitHub Actions'
git config --global user.email '[email protected]'
git diff --quiet media/coverage_badge.svg || {
git add media/coverage_badge.svg
git commit -m "[Automated] Updated coverage badge"
}
- name: Push changes [Ubuntu]
if: matrix.os == 'ubuntu-latest' && matrix.version == '3.10'
uses: ad-m/github-push-action@master
with:
branch: ${{ github.head_ref }}

- name: Upload Test Results
if: always()
uses: actions/upload-artifact@v4
with:
name: Test Results [${{ matrix.os }}] (Python ${{ matrix.version }})
path: pytest.xml
retention-days: 10
if-no-files-found: error

publish-test-results:
name: "Publish Tests Results"
needs: run_tests
runs-on: ubuntu-latest
permissions:
checks: write
pull-requests: write
if: always()

steps:
- name: Checkout
uses: actions/checkout@v4
with:
ref: ${{ github.head_ref }}

- name: Download Artifacts
uses: actions/download-artifact@v4
with:
path: artifacts

- name: Publish Test Results
uses: EnricoMi/publish-unit-test-result-action@v2
with:
files: "artifacts/**/*.xml"
9 changes: 7 additions & 2 deletions Dockerfile
@@ -7,9 +7,14 @@ WORKDIR /app
## instal
RUN apt-get update && apt-get install ffmpeg libsm6 libxext6 -y
RUN apt-get install -y git
RUN git clone https://github.com/luxonis/datadreamer.git -b main

## Define a build argument for the branch, defaulting to "main"
ARG BRANCH=main

## Clone the repository with the specified branch
RUN git clone --branch ${BRANCH} https://github.com/luxonis/datadreamer.git

RUN cd datadreamer && pip install .

## define image execution
ENTRYPOINT ["datadreamer"]
ENTRYPOINT ["datadreamer"]
30 changes: 26 additions & 4 deletions README.md
@@ -157,13 +157,13 @@ datadreamer --config <path-to-config>

### 🔧 Additional Parameters

- `--task`: Choose between detection and classification. Default is `detection`.
- `--task`: Choose between detection, classification and instance segmentation. Default is `detection`.
- `--dataset_format`: Format of the dataset. Defaults to `raw`. Supported values: `raw`, `yolo`, `coco`, `luxonis-dataset`, `cls-single`.
- `--split_ratios`: Split ratios for train, validation, and test sets. Defaults to `[0.8, 0.1, 0.1]`.
- `--num_objects_range`: Range of objects in a prompt. Default is 1 to 3.
- `--prompt_generator`: Choose between `simple`, `lm` (language model) and `tiny` (tiny LM). Default is `simple`.
- `--prompt_generator`: Choose between `simple`, `lm` (Mistral-7B), `tiny` (tiny LM), and `qwen2` (Qwen2.5 LM). Default is `qwen2`.
- `--image_generator`: Choose image generator, e.g., `sdxl`, `sdxl-turbo` or `sdxl-lightning`. Default is `sdxl-turbo`.
- `--image_annotator`: Specify the image annotator, like `owlv2` for object detection or `clip` for image classification. Default is `owlv2`.
- `--image_annotator`: Specify the image annotator, like `owlv2` for object detection or `clip` for image classification or `owlv2-slimsam` for instance segmentation. Default is `owlv2`.
- `--conf_threshold`: Confidence threshold for annotation. Default is `0.15`.
- `--annotation_iou_threshold`: Intersection over Union (IoU) threshold for annotation. Default is `0.2`.
- `--prompt_prefix`: Prefix to add to every image generation prompt. Default is `""`.
@@ -175,6 +175,8 @@ datadreamer --config <path-to-config>
- `--image_tester_patience`: Patience level for image tester. Default is `1`.
- `--lm_quantization`: Quantization to use for Mistral language model. Choose between `none` and `4bit`. Default is `none`.
- `--annotator_size`: Size of the annotator model to use. Choose between `base` and `large`. Default is `base`.
- `--disable_lm_filter`: Use only a bad word list for profanity filtering. Default is `False`.
- `--keep_unlabeled_images`: Whether to keep images without any annotations. Default is `False`.
- `--batch_size_prompt`: Batch size for prompt generation. Default is 64.
- `--batch_size_annotation`: Batch size for annotation. Default is `1`.
- `--batch_size_image`: Batch size for image generation. Default is `1`.
@@ -190,12 +192,15 @@ datadreamer --config <path-to-config>
| ----------------- | ------------------------------------------------------------------------------------- | --------------------------------------- |
| Prompt Generation | [Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) | Semantically rich prompts |
| | [TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) | Tiny LM |
| | [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) | Qwen2.5 LM |
| | Simple random generator | Joins randomly chosen object names |
| Profanity Filter | [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) | Fast and accurate LM profanity filter |
| Image Generation | [SDXL-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) | Slow and accurate (1024x1024 images) |
| | [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo) | Fast and less accurate (512x512 images) |
| | [SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning) | Fast and accurate (1024x1024 images) |
| Image Annotation | [OWLv2](https://huggingface.co/google/owlv2-base-patch16-ensemble) | Open-Vocabulary object detector |
| | [CLIP](https://huggingface.co/openai/clip-vit-base-patch32) | Zero-shot-image-classification |
| | [SlimSAM](https://huggingface.co/Zigeng/SlimSAM-uniform-50) | Zero-shot-instance-segmentation |

<a name="example"></a>

@@ -268,6 +273,23 @@ save_dir/
}
```

3. Instance Segmentation Annotations (instance_segmentation_annotations.json):

- Each entry corresponds to an image and contains bounding boxes, masks and labels for objects in the image.
- Format:

```bash
{
"image_path": {
"boxes": [[x_min, y_min, x_max, y_max], ...],
"masks": [[[x0, y0],[x1, y1],...], [[x0, y0],[x1, y1],...], ....]
"labels": [label_index, ...]
},
...
"class_names": ["class1", "class2", ...]
}
```
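
The format above can be read with a few lines of Python. The following is a minimal sketch, not part of DataDreamer itself: the helper name `summarize_annotations` and the sample values are hypothetical, and it assumes only the layout described above (one entry per image plus a top-level `class_names` list).

```python
import json
from typing import Any, Dict, List


def summarize_annotations(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the annotation dict into one record per annotated object.

    Hypothetical helper: assumes the layout shown above, where every key
    except "class_names" is an image path.
    """
    class_names = data["class_names"]
    records = []
    for image_path, entry in data.items():
        if image_path == "class_names":  # top-level metadata, not an image
            continue
        # boxes, masks and labels are parallel lists: index i describes object i
        for box, mask, label in zip(entry["boxes"], entry["masks"], entry["labels"]):
            records.append(
                {
                    "image": image_path,
                    "class": class_names[label],
                    "box": box,  # [x_min, y_min, x_max, y_max]
                    "num_polygon_points": len(mask),  # mask is a list of [x, y] points
                }
            )
    return records


# Made-up sample data in the documented layout (values are illustrative only).
sample = json.loads(
    """
    {
        "images/0.jpg": {
            "boxes": [[10, 20, 110, 220], [5, 5, 50, 50]],
            "masks": [
                [[10, 20], [110, 20], [110, 220], [10, 220]],
                [[5, 5], [50, 5], [50, 50]]
            ],
            "labels": [0, 1]
        },
        "class_names": ["cat", "dog"]
    }
    """
)

records = summarize_annotations(sample)
for record in records:
    print(record)
```

In practice the dictionary would come from `json.load` on the saved `instance_segmentation_annotations.json` file rather than from an inline string.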

<a name="limitations"></a>

## ⚠️ Limitations
@@ -292,7 +314,7 @@ The above license does not cover the models. Please see the license of each mode

## 🙏 Acknowledgements

This library was made possible by the use of several open-source projects, including Transformers, Diffusers, and others listed in the requirements.txt.
This library was made possible by the use of several open-source projects, including Transformers, Diffusers, and others listed in the requirements.txt. Furthermore, we utilized a bad words list from [`@coffeeandfun/google-profanity-words`](https://github.com/coffee-and-fun/google-profanity-words) Node.js module created by Robert James Gabriel from Coffee & Fun LLC.

[SD-XL 1.0 License](https://github.com/Stability-AI/generative-models/blob/main/model_licenses/LICENSE-SDXL1.0)
[SDXL-Turbo License](https://github.com/Stability-AI/generative-models/blob/main/model_licenses/LICENSE-SDXL-Turbo)
9 changes: 8 additions & 1 deletion datadreamer/dataset_annotation/__init__.py
@@ -3,5 +3,12 @@
from .clip_annotator import CLIPAnnotator
from .image_annotator import BaseAnnotator, TaskList
from .owlv2_annotator import OWLv2Annotator
from .slimsam_annotator import SlimSAMAnnotator

__all__ = ["BaseAnnotator", "TaskList", "OWLv2Annotator", "CLIPAnnotator"]
__all__ = [
"BaseAnnotator",
"TaskList",
"OWLv2Annotator",
"CLIPAnnotator",
"SlimSAMAnnotator",
]