The main objective of Text-Image Search is to find relevant images from a textual description. Both images and text captions can be embedded into the same embedding space, and their similarity can then be computed from the distance between the corresponding embedding vectors.
Image credit: Comment of a cat object
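As a rough illustration of comparing embeddings in a shared space, the sketch below computes cosine similarity between a text vector and two image vectors. The three-dimensional vectors are made up for the example (real embeddings from the models listed here have hundreds of dimensions), and cosine similarity is only one common choice of measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy "embeddings", not the output of any real model.
text_vec = [0.9, 0.1, 0.0]        # e.g. the caption "a cat"
cat_image_vec = [0.8, 0.2, 0.1]   # an image that matches the caption
car_image_vec = [0.0, 0.1, 0.9]   # an unrelated image

sim_cat = cosine_similarity(text_vec, cat_image_vec)
sim_car = cosine_similarity(text_vec, car_image_vec)
print(sim_cat > sim_car)  # the matching image scores higher
```

A higher score means the image embedding points in nearly the same direction as the text embedding, which is what a text-image search ranks by.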
| Model(s) | coco_1k_r1 | coco_1k_r5 | coco_1k_r10 | coco_5k_r1 | coco_5k_r5 | coco_5k_r10 | dim | eccv_map_at_r | eccv_rprecision | Model(s) from |
|---|---|---|---|---|---|---|---|---|---|---|
For each text-image search model, we evaluate its performance on the MS COCO dataset using the method from ECCV Caption. For details, refer to its publication and project.
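The column names in the table suggest Recall@K (r1, r5, r10) and R-Precision metrics. The sketch below shows how such retrieval metrics are commonly computed. It is a minimal illustration with made-up rankings, assuming the usual definitions, and is not the evaluation code used to produce the table.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any relevant item appears in the top-k results, else 0.0."""
    return 1.0 if any(i in relevant_ids for i in ranked_ids[:k]) else 0.0

def mean_recall_at_k(all_rankings, all_relevant, k):
    """Average recall@k over all queries."""
    scores = [recall_at_k(r, rel, k) for r, rel in zip(all_rankings, all_relevant)]
    return sum(scores) / len(scores)

def r_precision(ranked_ids, relevant_ids):
    """Fraction of the top-R results that are relevant, where R = len(relevant_ids)."""
    r = len(relevant_ids)
    return sum(1 for i in ranked_ids[:r] if i in relevant_ids) / r

# Made-up search results for three text queries (best match first).
rankings = [
    ['img3', 'img1', 'img7'],
    ['img2', 'img9', 'img4'],
    ['img5', 'img6', 'img8'],
]
# Made-up ground truth: the images that actually match each query.
relevant = [{'img1'}, {'img2'}, {'img0'}]

r1 = mean_recall_at_k(rankings, relevant, k=1)
r2 = mean_recall_at_k(rankings, relevant, k=2)
rp = r_precision(['a', 'b', 'c', 'd'], {'a', 'c', 'x'})
```

Here only the second query has its match ranked first, so `r1` is 1/3, while widening the cutoff to two results also catches the first query and raises `r2` to 2/3.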
We can use the built-in pipelines to generate text and image embeddings for each modality, insert the image embeddings into the vector database, and search the vector database for related images by text content. For more details, refer to the Text Image Search Pipeline Example.
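To make the insert-then-search flow concrete, here is a toy in-memory stand-in for the vector database. `ToyVectorStore` is a hypothetical class written for this illustration only; it is not part of towhee or Milvus, and the two-dimensional vectors are made up.

```python
import math

class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self._rows = []  # list of (url, embedding) pairs

    def insert(self, url, embedding):
        """Store an image url together with its embedding."""
        self._rows.append((url, list(embedding)))

    def search(self, query, top_k=1):
        """Return the top_k (url, distance) pairs closest to query by L2 distance."""
        scored = [(url, math.dist(query, emb)) for url, emb in self._rows]
        return sorted(scored, key=lambda pair: pair[1])[:top_k]

store = ToyVectorStore()
# Step 1 and 2: embed images (made-up vectors here) and insert them.
store.insert('./dog.png', [0.9, 0.1])
store.insert('./cat.png', [0.1, 0.9])
# Step 3: embed the text query (made-up vector) and search.
results = store.search([0.8, 0.2], top_k=1)
print(results[0][0])
```

A real vector database replaces the exhaustive scan in `search` with an index so that lookups stay fast over millions of embeddings.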
We can use the built-in text_image_embedding pipeline to get image-modality embeddings. By default it uses the clip_vit_base_patch16 model, and it can generate an embedding for a single image or batch-generate embeddings for multiple images.
from towhee import AutoPipes
# get the built-in text_image_embedding pipeline
image_embedding = AutoPipes.pipeline('text_image_embedding')
# generate image embedding
embedding = image_embedding('./test1.png').get()
# batch generate image embeddings
embeddings = image_embedding.batch(['./test1.png', './test2.png'])
embeddings = [e.get() for e in embeddings]
The model in the pipeline can be set to any model in the Models list above through the AutoConfig interface; refer to the TextImageEmbeddingConfig Interface. The modality configuration of this pipeline defaults to 'image'.
We can also set modality to 'text' to get text modality embedding with the default clip_vit_base_patch16 model.
from towhee import AutoPipes, AutoConfig
# set TextImageEmbeddingConfig for the pipeline
text_conf = AutoConfig.load_config('text_image_embedding')
text_conf.modality = 'text'
text_pipe = AutoPipes.pipeline('text_image_embedding', text_conf)
# generate text embedding
embedding = text_pipe('A running dog.').get()
# batch generate text embeddings
embeddings = text_pipe.batch(['A running dog.', 'Puppy Corgi.'])
embeddings = [e.get() for e in embeddings]
We can use the built-in insert_milvus pipeline to insert the image-modality embeddings into the Milvus vector database; it requires the name of the collection.
Before running the following code, make sure you have created a collection, for example one named text_image_search, with the same dimension (512) as the model and with the fields id (auto_id), url (DataType.VARCHAR), and embedding (FLOAT_VECTOR).
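One way to create a collection with the fields described above is sketched below using the pymilvus client. The `max_length` of the url field, the index type, and the index parameters are assumptions chosen for illustration, and `create_text_image_collection` is a hypothetical helper name; adjust them to your deployment.

```python
COLLECTION_NAME = 'text_image_search'
EMBEDDING_DIM = 512  # dimension stated above for the default model

def create_text_image_collection(host='127.0.0.1', port='19530'):
    """Create (or reuse) the collection; requires pymilvus and a running Milvus server."""
    # Imported lazily so the constants above can be reused without pymilvus installed.
    from pymilvus import (Collection, CollectionSchema, DataType,
                          FieldSchema, connections, utility)

    connections.connect(host=host, port=port)
    if utility.has_collection(COLLECTION_NAME):
        return Collection(COLLECTION_NAME)

    fields = [
        FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=True),
        # max_length=500 is an assumed limit for image paths or urls.
        FieldSchema(name='url', dtype=DataType.VARCHAR, max_length=500),
        FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM),
    ]
    schema = CollectionSchema(fields=fields, description='text image search')
    collection = Collection(name=COLLECTION_NAME, schema=schema)

    # Assumed index settings; other index types and metrics are also possible.
    index_params = {'metric_type': 'L2', 'index_type': 'IVF_FLAT', 'params': {'nlist': 2048}}
    collection.create_index(field_name='embedding', index_params=index_params)
    return collection
```

The returned `Collection` object also exposes `load()`, which loads the collection into memory so that it can be searched.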
from towhee import AutoPipes, AutoConfig
# set MilvusInsertConfig for the built-in insert_milvus pipeline
insert_conf = AutoConfig.load_config('insert_milvus')
insert_conf.collection_name = 'text_image_search'
insert_pipe = AutoPipes.pipeline('insert_milvus', insert_conf)
# generate embedding with the image_embedding pipeline created earlier
embedding = image_embedding('./test1.png').get()[0]
# insert the image url and its embedding into Milvus
insert_pipe(['./test1.png', embedding])
You can also set the host and port parameters for Milvus. If you are a Cloud user, there are also user and password parameters; refer to the MilvusInsertConfig Interface.
After inserting image-modality embeddings into Milvus, we can search by text and get the related images with the built-in search_milvus pipeline, which requires the name of the collection. Set search_params = {'output_fields': ['url']} to return the 'url' field.
Before searching in Milvus, you need to load the collection first.
from towhee import AutoPipes, AutoConfig
# set MilvusSearchConfig for the built-in search_milvus pipeline
search_conf = AutoConfig.load_config('search_milvus')
search_conf.collection_name = 'text_image_search'
search_conf.search_params = {'output_fields': ['url']}
search_pipe = AutoPipes.pipeline('search_milvus', search_conf)
# generate text embedding with the text-modality pipeline (text_pipe) created earlier
embedding = text_pipe('A running dog').get()[0]
# search embedding and get results in Milvus
search_pipe(embedding).get_dict()
You can also set the host and port parameters for Milvus. If you are a Cloud user, there are also user and password parameters; refer to the MilvusSearchConfig Interface.
name: str
The name of the built-in pipeline, such as 'text_image_embedding', 'insert_milvus', or 'search_milvus'.
config: REGISTERED_CONFIG
The AutoConfig registered under the pipeline name. It defaults to AutoConfig.load_config(name); for example, if name is 'text_image_embedding', config defaults to AutoConfig.load_config('text_image_embedding').
model: str
The model name in the text image embedding pipeline, defaults to 'clip_vit_base_patch16'. You can refer to the Model(s) list above to set the model.
modality: str
The modality of the input to embed, defaults to 'image'; it can also be set to 'text'.
normalize_vec: bool
Whether to normalize the embedding vectors, defaults to True.
customize_embedding_op: str
The name of the custom embedding operator, defaults to None.
device: int
The device ID, defaults to -1, which means using the CPU.
If the setting is not -1, the GPU device with the specified ID will be used.
You can also set the parameters above for the text image embedding pipeline. For example, you can use AutoConfig to set model to 'clip_vit_base_patch32' and device to GPU 0:
from towhee import AutoPipes, AutoConfig
config = AutoConfig.load_config('text_image_embedding')
config.model = 'clip_vit_base_patch32'
config.device = 0
image_embedding = AutoPipes.pipeline('text_image_embedding', config=config)
embedding = image_embedding('./test.png').get()
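To show why normalize_vec matters for search, the sketch below assumes that normalization means scaling each vector to unit L2 length (an assumption about the option's behavior, not confirmed here). For unit-length vectors, squared L2 distance and dot product are directly related, so ranking by either gives the same order.

```python
import math

def l2_normalize(vec):
    """Scale vec to unit L2 length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

# Made-up vectors for illustration.
a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])

dot = sum(x * y for x, y in zip(a, b))
squared_l2 = sum((x - y) ** 2 for x, y in zip(a, b))

# For unit vectors: ||a - b||^2 = 2 - 2 * (a . b)
print(abs(squared_l2 - (2 - 2 * dot)) < 1e-9)
```

This is why a vector database configured with an L2 metric can still return results ordered by cosine similarity when the stored embeddings are normalized.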
The code AutoConfig.load_config('insert_milvus') returns a MilvusInsertConfig object with default values for the parameters of the insert_milvus pipeline:
host: str
Host of Milvus vector database, default is '127.0.0.1'.
port: str
Port of Milvus vector database, default is '19530'.
collection_name: str
The collection name in the Milvus vector database; required when inserting data into Milvus.
user: str
The user name for Cloud user, defaults to None.
password: str
The user password for Cloud user, defaults to None.
The code AutoConfig.load_config('search_milvus') returns a MilvusSearchConfig object with default values for the parameters of the search_milvus pipeline:
host: str
Host of Milvus vector database, default is '127.0.0.1'.
port: str
Port of Milvus vector database, default is '19530'.
collection_name: str
The collection name in the Milvus vector database; required when searching in Milvus.
search_params: dict
The search parameters for the Milvus vector database, defaults to None. For more details, refer to the Milvus search documentation.
user: str
The user name for Cloud user, defaults to None.
password: str
The user password for Cloud user, defaults to None.