paraphernalia.torch.clip module

Evaluate images with CLIP.

class CLIP(prompt, anti_prompt=None, detail=None, use_tiling=True, macro=0.5, chops=64, model='ViT-B/32', device=None)[source]

A CLIP-based perceptor that evaluates how well an image matches one or more target text prompts.

The underlying model is limited to (224, 224) resolution, so this class presents it with multiple perspectives on an image:

  • Macro: random crops of 90-100% of the image, used to counteract aliasing

  • Micro: small, near-pixel-perfect random crops, plus an optional tiling that lets the fine details of high-resolution images be processed.

A lot of internals are exposed via methods to facilitate debugging and experimentation.

Parameters
  • prompt (Union[str, List[str]]) – the text prompt to use in general

  • anti_prompt (Optional[Union[str, List[str]]]) – a description to avoid

  • detail (Optional[Union[str, List[str]]]) – a text prompt to use for micro-perception, defaults to “A detail from a picture of {prompt}”

  • use_tiling (bool) – if true, add a covering of near-pixel-perfect perceptors into the mix

  • macro (float) –

  • chops (int) – the number of augmentation operations, split 50-50 between macro and micro

  • model (str) – the name of the CLIP model to load

  • device (Optional[str]) – the torch device to run on

use_tiling

If true, adds a covering of near-pixel-perfect perceptors into the mix.

Type

bool

chops

The number of augmentation operations; these are split 50-50 between macro and micro.

Type

int

Initializes internal Module state, shared by both nn.Module and ScriptModule.
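As an illustration of the multi-view scheme described above, here is a minimal, self-contained sketch of how `chops` crops might be split between macro and micro views and resized to the model's 224×224 input. The crop policy is deliberately simplified and is not the class's actual implementation:

```python
import torch

# Toy sketch of the multi-view idea (not the real CLIP class): an image is
# presented to a 224x224 perceptor as several large "macro" crops plus
# smaller near-pixel-perfect "micro" crops.
SIZE = 224

def random_crop(img, crop):
    """Take one random (crop x crop) slice from a (c, h, w) tensor."""
    _, h, w = img.shape
    top = torch.randint(0, h - crop + 1, (1,)).item()
    left = torch.randint(0, w - crop + 1, (1,)).item()
    return img[:, top:top + crop, left:left + crop]

def views(img, chops=8, macro=0.5):
    """Split `chops` crops (by the `macro` fraction) between large and small views."""
    n_macro = int(chops * macro)
    out = []
    for i in range(chops):
        if i < n_macro:
            crop = int(0.9 * min(img.shape[1:]))  # macro: ~90-100% of the image
        else:
            crop = SIZE                           # micro: near-pixel-perfect
        v = random_crop(img, crop)
        out.append(torch.nn.functional.interpolate(v[None], size=SIZE)[0])
    return torch.stack(out)  # (chops, c, SIZE, SIZE)

img = torch.rand(3, 512, 512)
v = views(img)
print(v.shape)  # torch.Size([8, 3, 224, 224])
```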

encode_text(text_or_texts)[source]

Encode text.

Returns a detached tensor.

Parameters

text_or_texts (str) – the text or texts to encode

Return type

torch.Tensor

encode_image(batch)[source]

Encode a batch of images.

Does not detach, so gradients can flow back to the input.

Parameters

batch (torch.Tensor) –

Return type

torch.Tensor
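The asymmetry between the two methods above (text detached, image not) is what lets an image be optimised against fixed prompts. A toy demonstration with stand-in linear encoders (hypothetical; the real class wraps a CLIP model):

```python
import torch

# Stand-in encoders (hypothetical; not the real CLIP model).
text_encoder = torch.nn.Linear(10, 512)
image_encoder = torch.nn.Linear(10, 512)

# encode_text-style: prompts are constants, so their embedding is detached.
text_emb = text_encoder(torch.rand(1, 10)).detach()

# encode_image-style: not detached, so gradients can flow back to the image.
img = torch.rand(1, 10, requires_grad=True)
img_emb = image_encoder(img)

sim = torch.cosine_similarity(img_emb, text_emb).sum()
sim.backward()

print(text_emb.requires_grad)  # False
print(img.grad is not None)    # True
```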

get_macro(img)[source]

Get a set of high-level views on an image batch.

Parameters

img (Tensor) – A (b, c, h, w) image batch

Returns

an expanded (b, c, h, w) image batch

Return type

Tensor

get_micro(img)[source]

Get a set of detailed (near pixel-perfect) views on an image batch.

Parameters

img (Tensor) – A (b, c, h, w) image batch

Returns

an expanded (b, c, h, w) image batch

Return type

Tensor
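The two methods above turn a (b, c, h, w) batch into the "combined but contiguous" layout that get_similarity expects: all views of image i sit next to each other in the expanded batch. A simplified sketch using centre crops of varying size (the real methods use random macro/micro crops):

```python
import torch

# Sketch: expand a (b, c, h, w) batch into (b * t, c, size, size), with the
# t views of each image stored contiguously. Centre crops stand in for the
# real random crop policy.
def expand_views(img, t=4, size=224):
    b, c, h, w = img.shape
    out = []
    for i in range(b):
        for j in range(t):
            crop = size + j * 8          # a few different view sizes
            top = (h - crop) // 2
            left = (w - crop) // 2
            v = img[i:i + 1, :, top:top + crop, left:left + crop]
            out.append(torch.nn.functional.interpolate(v, size=size))
    return torch.cat(out)                # (b * t, c, size, size)

batch = torch.rand(2, 3, 256, 256)
views = expand_views(batch)
print(views.shape)  # torch.Size([8, 3, 224, 224])
```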

get_similarity(img, prompts, batch_size, match='all')[source]

Compute the average similarity between a combined-but-contiguous batch of images and a set of prompts.

Parameters
  • img (Tensor) – A combined-but-contiguous image batch with shape (batch_size * t, c, h, w)

  • prompts (Tensor) – A tensor of prompt embeddings with shape (n, 512)

  • batch_size (int) – The size of the original image batch

  • match (str) – Policy for handling multiple prompts: “any”, “all” or (in future) “one”

Returns

A tensor of average similarities with shape (batch_size,)

Return type

Tensor
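A hypothetical sketch of the averaging step: cosine similarities between normalised view and prompt embeddings are reshaped so the t views of each original image are averaged back to one score per image. The exact “any”/“all” semantics shown here are assumptions, not the class's verified behaviour:

```python
import torch

# Hypothetical implementation sketch of the averaging step.
def average_similarity(view_emb, prompt_emb, batch_size, match="all"):
    v = torch.nn.functional.normalize(view_emb, dim=-1)    # (batch_size * t, d)
    p = torch.nn.functional.normalize(prompt_emb, dim=-1)  # (n, d)
    sim = v @ p.T                                          # (batch_size * t, n)
    sim = sim.reshape(batch_size, -1, sim.shape[-1])       # (batch_size, t, n)
    if match == "all":
        return sim.mean(dim=(1, 2))                # average over views and prompts
    elif match == "any":
        return sim.max(dim=2).values.mean(dim=1)   # best prompt per view
    raise ValueError(match)

emb = torch.rand(6, 512)       # 2 images x 3 views
prompts = torch.rand(4, 512)   # 4 prompt embeddings
out = average_similarity(emb, prompts, batch_size=2)
print(out.shape)  # torch.Size([2])
```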

forward(img)[source]

Returns a similarity in (0, 1) for each image in the provided batch.

TODO:

- Enable micro/macro weighting beyond what we get naturally from chops
- Add some kind of masking
Parameters

img (Tensor) – A (b, c, h, w) image tensor

Returns

A vector of size b

Return type

Tensor
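A typical use of forward is as a loss for gradient ascent on the image itself. A toy loop in which a fixed random target embedding and a linear projection stand in for the CLIP perceptor (illustrative only):

```python
import torch

# Sketch: maximise a per-image similarity by optimising the image pixels.
# A fixed random target and linear "encoder" stand in for the perceptor.
torch.manual_seed(0)
target = torch.nn.functional.normalize(torch.rand(1, 512), dim=-1)
proj = torch.nn.Linear(3 * 32 * 32, 512)  # stand-in encoder

def forward(img):
    emb = torch.nn.functional.normalize(proj(img.flatten(1)), dim=-1)
    return (emb * target).sum(dim=1)      # one similarity per image

img = torch.rand(1, 3, 32, 32, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.05)
before = forward(img).item()
for _ in range(50):
    opt.zero_grad()
    loss = -forward(img).sum()  # negate: maximise similarity
    loss.backward()
    opt.step()
after = forward(img).item()
print(before, "->", after)  # similarity increases
```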