feat: add open-clip #130

Open. Wants to merge 1 commit into base: main
153 changes: 153 additions & 0 deletions open_clip/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,153 @@
logs/
wandb/
models/
features/
results/

tests/data/
*.pt

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
sync.sh
gpu1sync.sh
.idea
*.pdf
**/._*
**/*DS_*
**.jsonl
src/sbatch
src/misc
.vscode
src/debug
core.*

# Allow
!src/evaluation/misc/results_dbs/*
33 changes: 33 additions & 0 deletions open_clip/CITATION.cff
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
cff-version: 1.1.0
message: If you use this software, please cite it as below.
authors:
  - family-names: Ilharco
    given-names: Gabriel
  - family-names: Wortsman
    given-names: Mitchell
  - family-names: Wightman
    given-names: Ross
  - family-names: Gordon
    given-names: Cade
  - family-names: Carlini
    given-names: Nicholas
  - family-names: Taori
    given-names: Rohan
  - family-names: Dave
    given-names: Achal
  - family-names: Shankar
    given-names: Vaishaal
  - family-names: Namkoong
    given-names: Hongseok
  - family-names: Miller
    given-names: John
  - family-names: Hajishirzi
    given-names: Hannaneh
  - family-names: Farhadi
    given-names: Ali
  - family-names: Schmidt
    given-names: Ludwig
title: OpenCLIP
version: v0.1
doi: 10.5281/zenodo.5143773
date-released: 2021-07-28
180 changes: 180 additions & 0 deletions open_clip/HISTORY.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,180 @@
## 2.19.0

* Add DataComp models

## 2.18.0

* Enable int8 inference without `.weight` attribute

## 2.17.2

* Update push_to_hf_hub

## 2.17.0

* Add int8 support
* Update notebook demo
* Refactor zero-shot classification code

## 2.16.2

* Fixes for context_length and vocab_size attributes

## 2.16.1

* Fixes for context_length and vocab_size attributes
* Fix --train-num-samples logic
* Add HF BERT configs for PubMed CLIP model

## 2.16.0

* Add improved g-14 weights
* Update protobuf version

## 2.15.0

* Add convnext_xxlarge weights
* Fixed import in readme
* Add samples per second per gpu logging
* Fix slurm example

## 2.14.0

* Move dataset mixtures logic to shard level
* Fix CoCa accum-grad training
* Safer transformers import guard
* get_labels refactoring

## 2.13.0

* Add support for dataset mixtures with different sampling weights
* Make transformers optional again

## 2.12.0

* Updated convnext configs for consistency
* Added input_patchnorm option
* Clean and improve CoCa generation
* Support model distillation
* Add ConvNeXt-Large 320x320 fine-tune weights

## 2.11.1

* Make transformers optional
* Add MSCOCO CoCa finetunes to pretrained models

## 2.11.0

* coca support and weights
* ConvNeXt-Large weights

## 2.10.1

* `hf-hub:org/model_id` support for loading models w/ config and weights in Hugging Face Hub

## 2.10.0

* Added a ViT-bigG-14 model.
* Added an up-to-date example slurm script for large training jobs.
* Added an option to sync logs and checkpoints to S3 during training.
* New options for LR schedulers, constant and constant with cooldown
* Fix wandb autoresuming when resume is not set
* ConvNeXt `base` & `base_w` pretrained models added
* `timm-` model prefix removed from configs
* `timm` augmentation + regularization (dropout / drop-path) supported

## 2.9.3

* Fix wandb collapsing multiple parallel runs into a single one

## 2.9.2

* Fix braceexpand memory explosion for complex webdataset urls

## 2.9.1

* Fix release

## 2.9.0

* Add training feature to auto-resume from the latest checkpoint on restart via `--resume latest`
* Allow webp in webdataset
* Fix logging for number of samples when using gradient accumulation
* Add model configs for convnext xxlarge
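
The `--resume latest` entry above can be pictured as picking the newest checkpoint in a directory. The following is a minimal sketch only: the `epoch_<N>.pt` naming scheme and the helper name `find_latest_checkpoint` are assumptions made for illustration, not OpenCLIP's actual implementation.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical naming scheme for this sketch: epoch_1.pt, epoch_2.pt, ...
_EPOCH_RE = re.compile(r"epoch_(\d+)\.pt$")


def find_latest_checkpoint(checkpoint_dir: str) -> Optional[Path]:
    """Return the checkpoint with the highest epoch number, or None if there is none."""
    candidates = []
    for path in Path(checkpoint_dir).glob("epoch_*.pt"):
        match = _EPOCH_RE.search(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare numerically so that epoch_10 sorts after epoch_9.
    return max(candidates, key=lambda item: item[0])[1]
```

Comparing the parsed epoch numbers rather than the file names avoids the lexicographic trap where `epoch_9.pt` would sort after `epoch_10.pt`.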

## 2.8.2

* wrapped patchdropout in a torch.nn.Module

## 2.8.1

* relax protobuf dependency
* override the default patch dropout value in 'vision_cfg'

## 2.8.0

* better support for HF models
* add support for gradient accumulation
* CI fixes
* add support for patch dropout
* add convnext configs


## 2.7.0

* add multilingual H/14 xlm roberta large

## 2.6.1

* fix setup.py _read_reqs

## 2.6.0

* Make openclip training usable from pypi.
* Add xlm roberta large vit h 14 config.

## 2.5.0

* pretrained B/32 xlm roberta base: first multilingual clip trained on laion5B
* pretrained B/32 roberta base: first clip trained using an HF text encoder

## 2.4.1

* Add missing hf_tokenizer_name in CLIPTextCfg.

## 2.4.0

* Fix #211, missing RN50x64 config. Fix type of dropout param for ResNet models
* Bring back LayerNorm impl that casts to input for non bf16/fp16
* zero_shot.py: set correct tokenizer based on args
* training/params.py: remove hf params and get them from model config

## 2.3.1

* Implement grad checkpointing for hf model.
* custom_text: True if hf_model_name is set
* Disable hf tokenizer parallelism

## 2.3.0

* Generalizable Text Transformer with HuggingFace Models (@iejMac)

## 2.2.0

* Support for custom text tower
* Add checksum verification for pretrained model weights
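
Checksum verification of downloaded weights, as mentioned in the entry above, usually means hashing the file and comparing against a known digest. A minimal sketch under stated assumptions: SHA-256 as the hash and the helper names `sha256_of_file` and `verify_checksum` are illustrative choices, not confirmed details of OpenCLIP's code.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until f.read() returns the empty-bytes sentinel at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: str, expected_sha256: str) -> bool:
    """Return True if the file's SHA-256 hex digest matches the expected value."""
    return sha256_of_file(path) == expected_sha256.lower()
```

A caller would typically delete and re-download the file when `verify_checksum` returns `False`.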

## 2.1.0

* Many changes, including SOTA models, a bfloat16 option, better loading, and better metrics

## 1.2.0

* ViT-B/32 trained on Laion2B-en
* add missing openai RN50x64 model

## 1.1.1

* ViT-B/16+
* Add grad checkpointing support
* more robust data loader
23 changes: 23 additions & 0 deletions open_clip/LICENSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
Copyright (c) 2012-2021 Gabriel Ilharco, Mitchell Wortsman,
Nicholas Carlini, Rohan Taori, Achal Dave, Vaishaal Shankar,
John Miller, Hongseok Namkoong, Hannaneh Hajishirzi, Ali Farhadi,
Ludwig Schmidt

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
3 changes: 3 additions & 0 deletions open_clip/MANIFEST.in
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
include src/open_clip/bpe_simple_vocab_16e6.txt.gz
include src/open_clip/model_configs/*.json
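
Reviewer note: the two `include` lines above ship the BPE vocabulary and the model config JSON files with the source distribution. The sketch below approximates which paths those patterns cover. It matches component by component with `fnmatch`, which is an assumption-level stand-in for the packaging tool's own glob handling, and the helper name `is_included` is hypothetical.

```python
from fnmatch import fnmatchcase

# The two include patterns from MANIFEST.in above.
INCLUDE_PATTERNS = [
    "src/open_clip/bpe_simple_vocab_16e6.txt.gz",
    "src/open_clip/model_configs/*.json",
]


def is_included(path: str) -> bool:
    """Check a POSIX-style relative path against the include patterns.

    Patterns are matched component by component so that `*` does not
    cross a `/`.
    """
    parts = path.split("/")
    for pattern in INCLUDE_PATTERNS:
        pattern_parts = pattern.split("/")
        if len(parts) == len(pattern_parts) and all(
            fnmatchcase(part, pat) for part, pat in zip(parts, pattern_parts)
        ):
            return True
    return False
```

Under this approximation, a JSON file placed in a subdirectory of `model_configs/` would not be picked up, so nested configs would need their own `include` or a `recursive-include` line.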
