This operator extracts features for video or text with [Frozen In Time](https://arxiv.org/abs/2104.00650), which generates embeddings for text and video by jointly training a video encoder and a text encoder to maximize the cosine similarity between matching video-text pairs.
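
Because the text and video embeddings share one space, they can be compared directly with cosine similarity, for example to rank videos against a text query. The snippet below is a minimal sketch of that comparison in plain Python. It does not call this operator: the vectors are made-up toy values standing in for real embeddings, and the helper names `cosine_similarity` and `rank_videos` are hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_videos(text_embedding, video_embeddings):
    """Return video indices sorted from most to least similar to the text."""
    scores = [cosine_similarity(text_embedding, v) for v in video_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy 4-dimensional vectors (assumption: real embeddings are much larger
# and would come from the video and text encoders, not be typed by hand).
video_embeddings = [
    [0.9, 0.1, 0.0, 0.2],
    [0.0, 1.0, 0.3, 0.0],
    [0.1, 0.0, 0.8, 0.6],
]
text_embedding = [0.1, 0.9, 0.2, 0.0]

ranking = rank_videos(text_embedding, video_embeddings)
print(ranking)
```

With these toy values the second video (index 1) ranks first, since its vector points in nearly the same direction as the text vector.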