Feature brainstorming #4
let's not boil the ocean. goals for MVP:
MVP is basically just a `git clone --recurse-submodules` with maybe a few bells and whistles.
imagining usage...
One small issue people had when they were adding SLIP to many different text-to-image notebooks and codebases was that the input resolution wasn't part of the model, so you see it hard-coded per model in notebooks like Disco Diffusion, for example.
I feel having a default but user-changeable input resolution per model, used when the model itself doesn't expose one, could be part of the feature list.
100%, I've already encountered this issue with other CLIP providers too. I tracked down the code snippet in the original openai release that calculates this, but I like the idea of a default attribute too.
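A minimal sketch of how that default-but-overridable resolution could look. The class and dictionary names below are illustrative assumptions rather than the actual mmc API; the `model.visual.input_resolution` lookup is the attribute the original openai release exposes:
```
# Sketch only: names here are hypothetical, not mmc's real API.
DEFAULT_INPUT_RESOLUTIONS = {
    "SLIP": 224,  # SLIP checkpoints don't report their own input size
}

class LoadedPerceptor:
    def __init__(self, model, architecture, input_resolution=None):
        self.model = model
        # Prefer an explicit user override, then whatever the checkpoint
        # reports (openai/CLIP exposes model.visual.input_resolution),
        # then fall back to a per-architecture default.
        self.input_resolution = (
            input_resolution
            or getattr(getattr(model, "visual", None), "input_resolution", None)
            or DEFAULT_INPUT_RESOLUTIONS.get(architecture)
        )
```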
Another point in reference to usage: someone could just replace the `from CLIP import clip` with `from mmc import clip` and everything would keep working, with a bunch more perceptors available out of the box.
This is a great idea. I've noticed that there seem to be two "families" of CLIP implementations: codebases based on openai/CLIP, and codebases based on huggingface's CLIP.
Rather than changing the classes we have now, maybe we could add a wrapper class or decorator for specifying that a user wants an interface resembling a common model family. This way, we could keep the modality-agnostic system and use similar wrappers to make drop-in tools for tasks beyond TTI.
Is that contrived? Here's how it might look:
```
my_mmc = ...  # loading code goes here
my_mmc = mmc.api_wrappers.openai_clip(my_mmc)
```
Or actually... I guess there's no reason we couldn't go a step further and wrap the multi-mmc to make convenience classes that are pinned to specific modalities and emulate the desired APIs. I think this is closer to what you originally had in mind.
The more I think about this, the more I like it.
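For what it's worth, here's a rough sketch of what such a wrapper might look like. The `project_image`/`project_text` methods on the wrapped object are assumptions standing in for whatever modality-agnostic interface the multi-mmc ends up exposing:
```
# Hypothetical sketch, not mmc's actual wrapper implementation.
class OpenAIClipInterface:
    """Pin a multi-mmc to the image+text modalities and mimic the
    method names that openai/CLIP-style codebases expect."""

    def __init__(self, mmc_model):
        self._mmc = mmc_model

    def encode_image(self, images):
        # Delegate to the modality-agnostic projection assumed on the mmc object.
        return self._mmc.project_image(images)

    def encode_text(self, tokens):
        return self._mmc.project_text(tokens)


def openai_clip(mmc_model):
    # Factory matching the mmc.api_wrappers.openai_clip(...) usage above.
    return OpenAIClipInterface(mmc_model)
```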
On Tue, Apr 19, 2022, 07:31 apolinario wrote:
Another point in reference to usage:
I feel there could be two ways of using it. One is very similar to how you wrote it under "imagining usage", but the other could be identical to OpenAI's CLIP. It may be that this wouldn't allow some of the fancy combinations of perceptors (although I feel this could be bridged), but on the other hand it would allow for snappy adoption. Someone could just replace `from CLIP import clip` with `from mmc import clip` and everything would work automatically, with a bunch more perceptors out of the box. It could be an entry point to then say "hey, now that you are using this library, why not replace your custom multi-perceptor code with this one?"
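A drop-in shim along those lines might look something like this; deferring back to openai's `clip` package is just one way to sketch it, and the mmc-specific routing is left as an assumption:
```
# mmc/clip.py -- hypothetical shim mirroring openai/CLIP's module-level API.
import clip as _openai_clip  # reuse openai's model registry and tokenizer

def available_models():
    # Could later be extended with the extra perceptors mmc provides.
    return _openai_clip.available_models()

def tokenize(texts, context_length=77, truncate=False):
    # Same signature as openai/CLIP, so existing notebooks don't change.
    return _openai_clip.tokenize(texts, context_length=context_length,
                                 truncate=truncate)

def load(name, device="cuda", jit=False, download_root=None):
    if name in _openai_clip.available_models():
        # Defer to openai/CLIP for the models it already knows about.
        return _openai_clip.load(name, device=device, jit=jit,
                                 download_root=download_root)
    # A real implementation would route other names through mmc loaders here.
    raise ValueError(f"'{name}' is not wired up in this sketch")
```
With something like that in place, `clip.load("ViT-B/32")` and `clip.tokenize(...)` keep behaving as before, and extra perceptors could later be slotted in behind the same calls.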