Text-Image Search

The main objective of Text-Image Search is to find relevant images from a textual description. Both the image and the textual caption can be embedded into the same embedding space, and their similarity can then be computed from the distance between the corresponding embedding vectors.
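As a minimal sketch of how a similarity score can be derived from two embedding vectors, the snippet below computes cosine similarity in plain Python. The toy 3-dimensional vectors and the file names are made up for illustration, and cosine similarity is only one common choice; the models listed here may be evaluated or served with a different metric.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy "embeddings" (assumed values); real embeddings have hundreds of dimensions.
text_vec = [1.0, 0.0, 1.0]
cat_image_vec = [0.9, 0.1, 0.8]
car_image_vec = [0.0, 1.0, 0.1]

scores = {
    "cat.png": cosine_similarity(text_vec, cat_image_vec),
    "car.png": cosine_similarity(text_vec, car_image_vec),
}
# The image whose embedding points in the most similar direction wins.
best_match = max(scores, key=scores.get)
```

A higher score means the image embedding is closer in direction to the text embedding, so that image is ranked first.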

Image credit: Comment of a cat object

Models

Model(s) | coco_1k_r1 | coco_1k_r5 | coco_1k_r10 | coco_5k_r1 | coco_5k_r5 | coco_5k_r10 | dim | eccv_map_at_r | eccv_rprecision | Model(s) from

Evaluation

For each text-image search model, we evaluate its performance on the MS COCO dataset using the method from ECCV Caption. For details, refer to its publication and project.
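To make the retrieval metrics more concrete, here is a minimal sketch of Recall@K, assuming that columns such as coco_1k_r1, coco_1k_r5 and coco_1k_r10 report Recall@1, Recall@5 and Recall@10. This is a generic illustration with hypothetical function names and toy data, not the ECCV Caption evaluation code.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Return 1.0 if any relevant item appears in the top-k results, else 0.0."""
    top_k = ranked_ids[:k]
    return 1.0 if any(item in relevant_ids for item in top_k) else 0.0

def mean_recall_at_k(all_ranked, all_relevant, k):
    """Average Recall@K over all queries."""
    scores = [recall_at_k(r, rel, k) for r, rel in zip(all_ranked, all_relevant)]
    return sum(scores) / len(scores)

# Two toy text queries; each list is the ranking returned by a search,
# and each set holds the image ids that are correct for that query.
all_ranked = [
    ["img3", "img1", "img2"],
    ["img2", "img3", "img1"],
]
all_relevant = [{"img1"}, {"img1"}]

r1 = mean_recall_at_k(all_ranked, all_relevant, 1)
r2 = mean_recall_at_k(all_ranked, all_relevant, 2)
r3 = mean_recall_at_k(all_ranked, all_relevant, 3)
```

Recall@K can only grow as K increases, which is why r10 values are never lower than r1 values for the same model and split.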

Built-in Pipeline

We can use the built-in pipelines to generate text and image embeddings for each modality, insert image embeddings into the vector database, and search the vector database for related images by text content. For more details, refer to the Text Image Search Pipeline Example.

Example

Generate Image Modality Embeddings

We can use the built-in text_image_embedding pipeline to get image modality embeddings. By default it uses the clip_vit_base_patch16 model, and it can generate an embedding for one image or batch-generate embeddings for multiple images.

from towhee import AutoPipes

# get the built-in text_image_embedding pipeline
image_embedding = AutoPipes.pipeline('text_image_embedding')

# generate image embedding
embedding = image_embedding('./test1.png').get()

# batch generate image embeddings
embeddings = image_embedding.batch(['./test1.png', './test2.png'])
embeddings = [e.get() for e in embeddings]

The model in the pipeline can be set to any model in the Models list above using the AutoConfig interface; refer to the TextImageEmbeddingConfig Interface. The modality configuration of this pipeline defaults to 'image'.

Generate Text Modality Embeddings

We can also set modality to 'text' to get text modality embedding with the default clip_vit_base_patch16 model.

from towhee import AutoPipes, AutoConfig

# set TextImageEmbeddingConfig for the pipeline
text_conf = AutoConfig.load_config('text_image_embedding')
text_conf.modality = 'text'

text_pipe = AutoPipes.pipeline('text_image_embedding', text_conf)

# generate text embedding
embedding = text_pipe('A running dog.').get()

# batch generate text embeddings
embeddings = text_pipe.batch(['A running dog.', 'Puppy Corgi.'])
embeddings = [e.get() for e in embeddings]

Insert Image into Milvus

We can use the built-in insert_milvus pipeline to insert image modality embeddings into the Milvus vector database. The pipeline requires the name of the collection.

Before running the following code, make sure you have created a collection (for example, named text_image_search) whose vector dimension (512) matches the model, with the fields id (auto_id), url (DataType.VARCHAR) and embedding (FLOAT_VECTOR).

from towhee import AutoPipes, AutoConfig

# set MilvusInsertConfig for the built-in insert_milvus pipeline
insert_conf = AutoConfig.load_config('insert_milvus')
insert_conf.collection_name = 'text_image_search'

insert_pipe = AutoPipes.pipeline('insert_milvus', insert_conf)

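# generate image embedding with the text_image_embedding pipeline
image_embedding = AutoPipes.pipeline('text_image_embedding')
embedding = image_embedding('./test1.png').get()[0]

# insert the image path and its embedding into Milvus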
insert_pipe(['./test1.png', embedding])

You can also set the host and port parameters for Milvus. If you are a Cloud user, there are also user and password parameters; refer to the MilvusInsertConfig Interface.
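Because the collection must match the model's dimension, it can help to check each row before handing it to the insert pipeline. The helper below is a hypothetical sketch in plain Python (validate_row and EXPECTED_DIM are not part of Towhee or Milvus); it assumes the url and 512-dimensional embedding fields described in the collection requirements.

```python
EXPECTED_DIM = 512  # dimension stated in the collection requirements

def validate_row(url, embedding, dim=EXPECTED_DIM):
    """Hypothetical helper: check a (url, embedding) row before inserting it."""
    if not isinstance(url, str) or not url:
        raise ValueError("url must be a non-empty string (VARCHAR field)")
    if len(embedding) != dim:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, expected {dim}"
        )
    # Return the row in the [url, embedding] order used by the insert example.
    return [url, [float(x) for x in embedding]]

# A placeholder all-zero vector stands in for a real model embedding.
row = validate_row('./test1.png', [0.0] * EXPECTED_DIM)
```

If you switch to a model with a different output dimension, the collection has to be created with that dimension, and the dim argument here would change accordingly.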

Search Text in Milvus

After inserting image modality embeddings into Milvus, we can search by text and get the related images with the built-in search_milvus pipeline, which requires the name of the collection. Set search_params = {'output_fields': ['url']} to return the 'url' field.

Before searching in Milvus, you need to load the collection first.

from towhee import AutoPipes, AutoConfig

# set MilvusSearchConfig for the built-in search_milvus pipeline
search_conf = AutoConfig.load_config('search_milvus')
search_conf.collection_name = 'text_image_search'
search_conf.search_params = {'output_fields': ['url']}

search_pipe = AutoPipes.pipeline('search_milvus', search_conf)

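# generate text embedding with the text-modality pipeline
text_conf = AutoConfig.load_config('text_image_embedding')
text_conf.modality = 'text'
text_embedding = AutoPipes.pipeline('text_image_embedding', text_conf)
embedding = text_embedding('A running dog').get()[0]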

# search embedding and get results in Milvus
search_pipe(embedding).get_dict()

You can also set the host and port parameters for Milvus. If you are a Cloud user, there are also user and password parameters; refer to the MilvusSearchConfig Interface.
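Conceptually, the search step returns the stored rows whose embeddings lie closest to the query embedding. The sketch below shows that idea as a brute-force search in plain Python. It assumes L2 distance and uses hypothetical function names and toy 2-dimensional vectors; Milvus itself relies on indexes and on the metric configured for the collection.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search_top_k(query, rows, k=2):
    """rows is a list of (url, embedding); return the k nearest (url, distance)."""
    scored = [(url, l2_distance(query, emb)) for url, emb in rows]
    scored.sort(key=lambda pair: pair[1])  # nearest first
    return scored[:k]

# Toy stand-in for the stored image rows (assumed values).
rows = [
    ("./dog.png", [0.9, 0.1]),
    ("./cat.png", [0.1, 0.9]),
    ("./puppy.png", [0.8, 0.3]),
]

# Toy stand-in for the text embedding of a query such as 'A running dog'.
results = search_top_k([1.0, 0.0], rows, k=2)
urls = [url for url, _ in results]
```

Returning the url alongside the distance mirrors the role of output_fields in the search configuration: the vector decides the ranking, and the extra field tells you which image was matched.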

Interface

AutoPipes.pipeline(name, **kwargs)

name: str The name of the built-in pipeline, such as 'text_image_embedding', 'insert_milvus' and 'search_milvus'.

config: REGISTERED_CONFIG

The AutoConfig registered with the pipeline name. It defaults to AutoConfig.load_config(name); for example, if the name is 'text_image_embedding', config defaults to AutoConfig.load_config('text_image_embedding').

TextImageEmbeddingConfig

device

You can set these parameters for the text-image embedding pipeline with AutoConfig. For example, set model to 'clip_vit_base_patch32' and device to 0 (GPU 0):

from towhee import AutoPipes, AutoConfig

config = AutoConfig.load_config('text_image_embedding')
config.model = 'clip_vit_base_patch32'
config.device = 0

image_embedding = AutoPipes.pipeline('text_image_embedding', config=config)
embedding = image_embedding('./test.png').get()

MilvusInsertConfig

The code AutoConfig.load_config('insert_milvus') returns a MilvusInsertConfig object that automatically configures some parameters of the Milvus insert pipeline.

MilvusSearchConfig

The code AutoConfig.load_config('search_milvus') returns a MilvusSearchConfig object that automatically configures some parameters of the Milvus search pipeline.