Video Alignment with Temporal Network
author: David Wang
Description
This operator compares two ordered sequences of feature vectors and detects the ranges in which features from the two sequences are similar and occur in the same order.
Code Example
from towhee import pipe, ops, DataCollection
import numpy as np

# Simulate video features: 10 frames of 512-d vectors, L2-normalized per frame.
videos_embeddings = np.random.randn(10, 512)
videos_embeddings = videos_embeddings / np.linalg.norm(videos_embeddings, axis=1).reshape(10, -1)

p = (
    pipe.input('src', 'dest')
        .map(('src', 'dest'), ('range', 'range_score'), ops.video_copy_detection.temporal_network())
        .output('src', 'dest', 'range', 'range_score')
)

# Compare the sequence with itself and display the detected ranges and scores.
DataCollection(p(videos_embeddings, videos_embeddings)).show()
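The example above L2-normalizes every frame vector, so the dot product of two frames equals their cosine similarity. The sketch below shows the frame-to-frame similarity matrix that an alignment step would typically start from. This is an assumption about the general approach, not code taken from the operator, and the names `emb` and `sim` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same shape as the example above: 10 frames of 512-d features.
emb = rng.standard_normal((10, 512))
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)

# For unit-length rows, the dot product equals the cosine similarity.
# sim[i, j] compares frame i of the source with frame j of the destination.
sim = emb @ emb.T

print(sim.shape)
print(np.allclose(np.diag(sim), 1.0))
```

Comparing a sequence with itself, as here, gives a symmetric matrix with ones on the diagonal, which is why the example is a convenient sanity check.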
Factory Constructor
Create the operator via the following factory method:
temporal_network(tn_max_step, tn_top_k, max_path, min_sim, min_length, max_iou)
Parameters:
tn_max_step: str
Maximum step range in the Temporal Network (TN).
tn_top_k: str
Number of top-k frame similarities selected in the TN.
max_path: str
Maximum number of iterations for detecting multiple segments.
min_sim: str
Minimum average similarity score for each aligned segment.
min_length: str
Minimum segment length.
max_iou: str
Maximum IoU allowed when filtering overlapping segments (bbox).
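To make the roles of min_sim, min_length, and max_iou more concrete, here is a minimal sketch of how such thresholds could filter candidate segments. It is not the operator's implementation. The segment layout `(src_start, dst_start, src_end, dst_end)`, the helper names `box_iou` and `filter_segments`, and the threshold values are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: (src_start, dst_start, src_end, dst_end), in frame indices.
Segment = Tuple[int, int, int, int]


def box_iou(a: Segment, b: Segment) -> float:
    """IoU of two axis-aligned boxes in the (source frame, destination frame) plane."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_segments(
    segments: Sequence[Segment],
    scores: Sequence[float],
    min_sim: float,
    min_length: int,
    max_iou: float,
) -> Tuple[List[Segment], List[float]]:
    """Keep segments that are long enough, similar enough, and not
    overlapping a higher-scoring segment by more than max_iou."""
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        seg = segments[i]
        length = min(seg[2] - seg[0], seg[3] - seg[1])
        if scores[i] < min_sim or length < min_length:
            continue
        if all(box_iou(seg, segments[j]) <= max_iou for j in kept):
            kept.append(i)
    return [segments[i] for i in kept], [scores[i] for i in kept]


# Made-up candidates, purely to exercise the filter.
segments = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 30, 23, 33), (40, 40, 50, 52)]
scores = [0.9, 0.8, 0.95, 0.1]

# Illustrative thresholds, not the operator's defaults.
kept_segments, kept_scores = filter_segments(
    segments, scores, min_sim=0.2, min_length=5, max_iou=0.3
)
print(kept_segments, kept_scores)
```

In this toy input, the third segment is too short, the fourth scores below min_sim, and the second overlaps the first too heavily, so only the first segment survives.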
Interface
A Temporal Network operator takes two numpy.ndarray inputs, each of shape (N, D), where N is the number of features and D is the feature dimension, and returns the duplicated ranges and their scores.
Parameters:
src_video_vec: numpy.ndarray
Source video feature vectors.
dst_video_vec: numpy.ndarray
Destination video feature vectors.
Returns:
aligned_ranges: List[List[int]]
The aligned ranges.
aligned_scores: List[float]
The similarity score of each aligned range (same length as aligned_ranges).
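The input contract described above can be checked before calling the operator. The sketch below is a hypothetical helper, not part of Towhee. It assumes the two sequences may differ in length N but must share the same feature dimension D.

```python
import numpy as np


def check_feature_pair(src: np.ndarray, dst: np.ndarray) -> None:
    """Raise ValueError unless both inputs are 2-D arrays with the same feature dimension D."""
    for name, arr in (("src_video_vec", src), ("dst_video_vec", dst)):
        if not isinstance(arr, np.ndarray) or arr.ndim != 2:
            raise ValueError(f"{name} must be a 2-D numpy.ndarray of shape (N, D)")
    if src.shape[1] != dst.shape[1]:
        raise ValueError(
            f"feature dimensions differ: {src.shape[1]} vs {dst.shape[1]}"
        )


src = np.zeros((10, 512))
dst = np.zeros((25, 512))
check_feature_pair(src, dst)  # passes: same D, different N

# A mismatched feature dimension is rejected.
try:
    check_feature_pair(src, np.zeros((25, 256)))
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(mismatch_detected)
```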
More Resources
- DNA Sequence Classification based on Milvus - Zilliz blog: Use Milvus, an open-source vector database, to recognize gene families of DNA sequences. Less space but higher accuracy.
- Vector Database Use Cases: Video Similarity Search - Zilliz: Experience a 10x performance boost and unparalleled precision when your video similarity search system is powered by Zilliz Cloud.
- How to Get the Right Vector Embeddings - Zilliz blog: A comprehensive introduction to vector embeddings and how to generate them with popular open-source models.
- What is a Convolutional Neural Network? An Engineer's Guide: Convolutional Neural Network is a type of deep neural network that processes images, speeches, and videos. Let's find out more about CNN.
- The guide to clip-vit-base-patch32 | OpenAI: clip-vit-base-patch32: a CLIP multimodal model variant by OpenAI for image and text embedding.
- Understanding ImageNet: A Key Resource for Computer Vision and AI Research: The large-scale image database with over 14 million annotated images. Learn how this dataset supports advancements in computer vision.
- Build a Multimodal Search System with Milvus - Zilliz blog: Implementing a Multimodal Similarity Search System Using Milvus, Radient, ImageBind, and Meta-Chameleon-7b
- Unlock Advanced Recommendation Engines with Milvus' New Range Search - Zilliz blog: Exploring Milvus's newly released range search feature, how it differs from the traditional KNN search, and when to use it.
- Similarity Metrics for Vector Search - Zilliz blog: Exploring five similarity metrics for vector search: L2 or Euclidean distance, cosine distance, inner product, and hamming distance.