Image-Text Retrieval Embedding with SLIP

author: David Wang

Description

This operator extracts image or text embeddings with SLIP, a multi-task learning framework that combines self-supervised learning with CLIP pre-training. It is adapted from facebookresearch/SLIP.
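SLIP's pre-training objective adds a self-supervised term to the CLIP image-text contrastive loss. The plain-Python sketch below illustrates that idea under stated assumptions: a symmetric InfoNCE loss over matched image-text pairs stands in for the CLIP term, a SimCLR-style NT-Xent loss over two augmented views of each image stands in for the self-supervised term, and the two are summed with a weight. The function names, the fixed temperatures (CLIP actually learns its temperature) and the `ssl_scale` weight are illustrative assumptions; nothing here is part of the operator's interface.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """L2-normalize a vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_entropy(logits: List[float], target: int) -> float:
    """Softmax cross-entropy of one row of logits against a target index."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]


def clip_loss(images: List[Vector], texts: List[Vector], temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: image i should match text i, and text i should match image i."""
    imgs = [normalize(v) for v in images]
    txts = [normalize(v) for v in texts]
    n = len(imgs)
    logits = [[dot(imgs[i], txts[j]) / temperature for j in range(n)] for i in range(n)]
    img_to_txt = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    txt_to_img = sum(cross_entropy([logits[j][i] for j in range(n)], i) for i in range(n)) / n
    return (img_to_txt + txt_to_img) / 2


def simclr_loss(view_a: List[Vector], view_b: List[Vector], temperature: float = 0.1) -> float:
    """NT-Xent over 2n views: each view's positive is the other view of the same image."""
    views = [normalize(v) for v in view_a + view_b]
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)
        # A view is never compared with itself: its logit is -inf, i.e. zero probability.
        logits = [
            -math.inf if j == i else dot(views[i], views[j]) / temperature
            for j in range(2 * n)
        ]
        total += cross_entropy(logits, positive)
    return total / (2 * n)


def slip_loss(images, texts, view_a, view_b, ssl_scale: float = 1.0) -> float:
    """Assumed combined objective: CLIP loss plus a weighted self-supervised loss."""
    return clip_loss(images, texts) + ssl_scale * simclr_loss(view_a, view_b)


# Toy batch of two image-text pairs (hypothetical values).
images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[0.9, 0.1], [0.1, 0.9]]
# Two augmented views of each image (also toy values).
view_a = [[1.0, 0.1], [0.1, 1.0]]
view_b = [[0.9, 0.0], [0.0, 0.9]]
print(round(slip_loss(images, texts, view_a, view_b), 4))
```

Setting `ssl_scale=0.0` recovers the CLIP-only loss, which makes the contribution of the self-supervised term easy to isolate.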

Code Example

Load an image from path './moon.jpg' to generate an image embedding.

Read the text 'moon in the night.' to generate a text embedding.

Write a pipeline with explicit input and output names:

from towhee import pipe, ops, DataCollection

# Image pipeline: decode the file at 'url' into an RGB image, then embed it with SLIP.
img_pipe = (
    pipe.input('url')
    .map('url', 'img', ops.image_decode.cv2_rgb())
    .map('img', 'vec', ops.image_text_embedding.slip(model_name='slip_vit_small', modality='image'))
    .output('img', 'vec')
)

# Text pipeline: embed the input string directly with SLIP.
text_pipe = (
    pipe.input('text')
    .map('text', 'vec', ops.image_text_embedding.slip(model_name='slip_vit_small', modality='text'))
    .output('text', 'vec')
)

# Display each input alongside its embedding.
DataCollection(img_pipe('./moon.jpg')).show()
DataCollection(text_pipe('moon in the night.')).show()
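Because both pipelines produce vectors in a shared image-text embedding space, text-to-image retrieval can be done by ranking image embeddings by their similarity to a text embedding. The pure-Python sketch below uses cosine similarity, one common choice, and assumes the vectors have already been taken from the pipelines' 'vec' outputs and converted to lists of floats. The helper names and the three-dimensional toy vectors are hypothetical stand-ins; real embeddings have many more dimensions.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(query_vec: List[float], image_vecs: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (image name, score) pairs sorted from most to least similar."""
    scores = [(name, cosine_similarity(query_vec, vec)) for name, vec in image_vecs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings standing in for real 'vec' outputs.
image_vecs = {
    "moon.jpg": [0.9, 0.1, 0.0],
    "beach.jpg": [0.1, 0.8, 0.3],
}
text_vec = [0.85, 0.15, 0.05]  # stand-in for the embedding of 'moon in the night.'
ranking = rank_images(text_vec, image_vecs)
print(ranking[0][0])  # prints moon.jpg
```

If the embeddings are L2-normalized beforehand, the cosine score reduces to a plain dot product.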

Factory Constructor

Create the operator via the following factory method:

slip(model_name, modality)

Parameters:

model_name: str

The name of the SLIP model. Supported model names:

slip_vit_small
slip_vit_base
slip_vit_large

modality: str

The modality ('image' or 'text') for which to generate the embedding.
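Each constructor argument accepts only the values listed above. The sketch below shows one way a caller might check arguments before building a pipeline. `validate_slip_args` is a hypothetical helper, not part of Towhee, and how the operator itself reacts to unsupported values is not documented here.

```python
# Values documented for the slip(model_name, modality) factory.
SUPPORTED_MODELS = ("slip_vit_small", "slip_vit_base", "slip_vit_large")
SUPPORTED_MODALITIES = ("image", "text")


def validate_slip_args(model_name: str, modality: str) -> dict:
    """Check arguments against the documented values and return them as keyword arguments.

    Hypothetical helper: the returned dict could be unpacked into
    ops.image_text_embedding.slip(**kwargs).
    """
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model_name {model_name!r}; expected one of {SUPPORTED_MODELS}")
    if modality not in SUPPORTED_MODALITIES:
        raise ValueError(f"unsupported modality {modality!r}; expected one of {SUPPORTED_MODALITIES}")
    return {"model_name": model_name, "modality": modality}


kwargs = validate_slip_args("slip_vit_base", "text")
print(kwargs)  # prints {'model_name': 'slip_vit_base', 'modality': 'text'}
```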

Interface

An image-text embedding operator takes a Towhee image or a string as input and generates an embedding as a numpy.ndarray.

Parameters:

data: towhee.types.Image (a sub-class of numpy.ndarray) or str

The data to embed: an image or a text string, depending on the specified modality.

Returns: numpy.ndarray

The embedding extracted by the model.
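The operator's behaviour is fixed by `modality`: image data goes to SLIP's image encoder and text to its text encoder. The sketch below mimics that dispatch with dummy encoders so the control flow can be run without the model. `SlipLikeOperator`, the fake encoders, and the use of plain lists of floats instead of numpy.ndarray are all assumptions for illustration, not the operator's actual implementation.

```python
from typing import Callable, List, Sequence, Union

Embedding = List[float]


class SlipLikeOperator:
    """Hypothetical stand-in that routes input to an encoder chosen by modality."""

    def __init__(self, modality: str,
                 encode_image: Callable[[Sequence[Sequence[float]]], Embedding],
                 encode_text: Callable[[str], Embedding]) -> None:
        if modality not in ("image", "text"):
            raise ValueError("modality must be 'image' or 'text'")
        self.modality = modality
        self.encode_image = encode_image
        self.encode_text = encode_text

    def __call__(self, data: Union[Sequence[Sequence[float]], str]) -> Embedding:
        # Dispatch on the modality fixed at construction time, not on the input type.
        if self.modality == "image":
            return self.encode_image(data)
        return self.encode_text(data)


def fake_image_encoder(pixels: Sequence[Sequence[float]]) -> Embedding:
    """Dummy image 'encoder': mean pixel value and pixel count."""
    flat = [p for row in pixels for p in row]
    return [sum(flat) / len(flat), float(len(flat))]


def fake_text_encoder(text: str) -> Embedding:
    """Dummy text 'encoder': character count and space count."""
    return [float(len(text)), float(text.count(" "))]


image_op = SlipLikeOperator("image", fake_image_encoder, fake_text_encoder)
text_op = SlipLikeOperator("text", fake_image_encoder, fake_text_encoder)
print(image_op([[0.0, 1.0], [1.0, 0.0]]))  # prints [0.5, 4.0]
print(text_op("moon in the night."))  # prints [18.0, 3.0]
```

Because the modality is fixed when the operator is constructed, one instance handles only one kind of input, which is why the Code Example builds separate image and text pipelines.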
