This operator extracts features from video or text with [DRL (Disentangled Representation Learning for Text-Video Retrieval)](https://arxiv.org/pdf/2203.07111v1.pdf), and then computes video-text similarity with the Weighted Token-wise Interaction (WTI) module.
<br/>

## Code Example
Read the text 'kids feeding and playing with the horse' to generate a text embedding.
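Below is a minimal pipeline sketch, assuming the operator is published on the Towhee hub under the `video_text_embedding` namespace and that Towhee's `pipe`/`ops` pipeline API is available:

```python
from towhee import pipe, ops, DataCollection

# Build a text-embedding pipeline that maps raw text to a DRL embedding.
# The namespace `ops.video_text_embedding.drl` is an assumption here.
text_pipe = (
    pipe.input('text')
        .map('text', 'vec', ops.video_text_embedding.drl(base_encoder='clip_vit_b32', modality='text'))
        .output('text', 'vec')
)

# Run the pipeline on the example sentence and display the result.
DataCollection(text_pipe('kids feeding and playing with the horse')).show()
```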
## Factory Constructor

Create the operator via the following factory method:
***drl(base_encoder, modality)***
**Parameters:**
***base_encoder:*** *str*
The name of the base CLIP encoder used in the DRL model. Supported model names:
- clip_vit_b32
***modality:*** *str*
Which modality (*video* or *text*) is used to generate the embedding.
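As an illustration, constructing one operator instance per modality might look like the following sketch (it assumes the `towhee.ops` accessor returns a callable operator wrapper):

```python
from towhee import ops

# Text-modality operator; `base_encoder` is currently limited to 'clip_vit_b32'.
text_op = ops.video_text_embedding.drl(base_encoder='clip_vit_b32', modality='text')

# Video-modality operator built from the same base encoder.
video_op = ops.video_text_embedding.drl(base_encoder='clip_vit_b32', modality='video')
```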
<br/>
## Interface
A video-text embedding operator takes a list of [towhee VideoFrame](link/to/towhee/image/api/doc) or a string as input and generates an embedding in ndarray.
**Parameters:**
***data:*** *List[towhee.types.VideoFrame]* or *str*
The data to generate an embedding from: either a list of VideoFrame (uniformly subsampled from a video) or a text string, depending on the specified modality.
**Returns:** *numpy.ndarray*
The embedding extracted by the model. For text input, the shape is (text_token_num, model_dim); for video input, the shape is (video_token_num, model_dim).
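For a sense of the returned types and shapes, a short sketch that reuses the hypothetical `text_op` and `video_op` instances from the factory sketch above; `frames` stands in for a `List[towhee.types.VideoFrame]` (e.g. uniformly subsampled by a video-decode operator), so the video call is left commented:

```python
import numpy as np

# Text input -> ndarray of shape (text_token_num, model_dim).
text_vec = text_op('kids feeding and playing with the horse')
assert isinstance(text_vec, np.ndarray)
print(text_vec.shape)

# Video input -> ndarray of shape (video_token_num, model_dim).
# `frames` would be a List[towhee.types.VideoFrame] subsampled from a video.
# video_vec = video_op(frames)
# print(video_vec.shape)
```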