Video Alignment with Temporal Network

author: David Wang

Description

This operator compares two ordered sequences of feature vectors and detects the ranges in which features from the two sequences are similar and appear in the same order.
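To make "similar in order" concrete, here is a minimal NumPy sketch. It is not the operator's implementation; it only builds the frame-to-frame cosine similarity matrix that this kind of temporal alignment typically works from. The data, the copied frame offsets, and the helper name `l2_normalize` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Hypothetical data: a 10-frame source video and a 12-frame destination video
# whose frames 3..8 are copies of source frames 2..7.
src = l2_normalize(rng.standard_normal((10, 512)))
dst = l2_normalize(rng.standard_normal((12, 512)))
dst[3:9] = src[2:8]

# Frame-to-frame cosine similarity, shape (10, 12).
sim = src @ dst.T

# The copied frames show up as a diagonal run of scores close to 1.0,
# while unrelated frame pairs score close to 0.
copied = np.array([sim[i, i + 1] for i in range(2, 8)])
```

A temporally aligned segment corresponds to such a run of high scores along a diagonal of `sim`; the operator's job is to find these runs and report their extent and average score.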

Code Example

from towhee import pipe, ops, DataCollection
import numpy as np

# Simulate one video as 10 frames of 512-d feature vectors, then L2-normalize each frame.
videos_embeddings = np.random.randn(10,512)
videos_embeddings = videos_embeddings / np.linalg.norm(videos_embeddings,axis=1).reshape(10,-1)

p = (
    pipe.input('src', 'dest')
        .map(('src', 'dest'), ('range', 'range_score'), ops.video_copy_detection.temporal_network())
        .output('src', 'dest', 'range', 'range_score')
)

DataCollection(p(videos_embeddings, videos_embeddings)).show()

Factory Constructor

Create the operator via the following factory method:

temporal_network(tn_max_step, tn_top_k, max_path, min_sim, min_length, max_iou)

Parameters:

tn_max_step: int

Maximum step range in the temporal network (TN).

tn_top_k: int

Number of top-k most similar frames selected per frame in the TN.

max_path: int

Maximum number of iterations when detecting multiple segments.

min_sim: float

Minimum average similarity score for each aligned segment.

min_length: int

Minimum segment length.

max_iou: float

Maximum IoU allowed when filtering overlapping segments (each segment treated as a bounding box).
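The role of max_iou can be illustrated with a short sketch. It assumes each aligned segment is a box `[x1, y1, x2, y2]` in the frame-similarity matrix and filters overlaps with a greedy, score-ordered non-maximum suppression. Both the box layout and the greedy strategy are assumptions made for illustration; the filtering in tn.py may differ. The names `segment_iou` and `filter_overlaps` are hypothetical.

```python
def segment_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes [x1, y1, x2, y2]."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def filter_overlaps(boxes, scores, max_iou):
    """Return indices of kept boxes, visiting boxes from highest to lowest score.

    A box is dropped when its IoU with any already kept box exceeds max_iou.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(segment_iou(boxes[i], boxes[j]) <= max_iou for j in kept):
            kept.append(i)
    return kept


# Hypothetical segments: the second overlaps the first heavily, the third is separate.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
kept = filter_overlaps(boxes, scores, max_iou=0.3)
```

With a lower max_iou, fewer overlapping segments survive; with a higher one, near-duplicate segments are more likely to be reported side by side.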

Interface

A Temporal Network operator takes two numpy.ndarray inputs, each of shape (N, D), where N is the number of features and D is the feature dimension, and returns the duplicated ranges and their scores.

Parameters:

src_video_vec: numpy.ndarray

Source video feature vectors.

dst_video_vec: numpy.ndarray

Destination video feature vectors.

Returns:

aligned_ranges: List[List[int]]

The aligned ranges detected between the two videos.

aligned_scores: List[float]

The similarity score of each aligned range (same length as aligned_ranges).
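The input contract above can be checked before calling the operator. The helper below is a minimal sketch, not part of the operator: the name `prepare_features` is hypothetical, and L2-normalizing each row is an assumption that mirrors what the code example does with its simulated features.

```python
import numpy as np

def prepare_features(src, dst, eps=1e-12):
    """Validate two (N, D) feature arrays and L2-normalize their rows."""
    src = np.asarray(src, dtype=np.float32)
    dst = np.asarray(dst, dtype=np.float32)
    if src.ndim != 2 or dst.ndim != 2:
        raise ValueError("each input must be a 2-D array of shape (N, D)")
    if src.shape[1] != dst.shape[1]:
        raise ValueError(
            f"feature dimensions differ: {src.shape[1]} vs {dst.shape[1]}"
        )
    # Guard against all-zero rows so the division never produces NaN.
    src = src / np.maximum(np.linalg.norm(src, axis=1, keepdims=True), eps)
    dst = dst / np.maximum(np.linalg.norm(dst, axis=1, keepdims=True), eps)
    return src, dst


# The two videos may have different frame counts but must share D.
src_vec, dst_vec = prepare_features(np.ones((10, 512)), np.ones((7, 512)))
```

The two arrays returned here can then be passed as the 'src' and 'dest' inputs of the pipeline shown in the code example.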

More Resources

DNA Sequence Classification based on Milvus - Zilliz blog: Use Milvus, an open-source vector database, to recognize gene families of DNA sequences. Less space but higher accuracy.
Vector Database Use Cases: Video Similarity Search - Zilliz: Experience a 10x performance boost and unparalleled precision when your video similarity search system is powered by Zilliz Cloud.
How to Get the Right Vector Embeddings - Zilliz blog: A comprehensive introduction to vector embeddings and how to generate them with popular open-source models.
What is a Convolutional Neural Network? An Engineer's Guide: Convolutional Neural Network is a type of deep neural network that processes images, speeches, and videos. Let's find out more about CNN.
The guide to clip-vit-base-patch32 | OpenAI: clip-vit-base-patch32: a CLIP multimodal model variant by OpenAI for image and text embedding.
Understanding ImageNet: A Key Resource for Computer Vision and AI Research: The large-scale image database with over 14 million annotated images. Learn how this dataset supports advancements in computer vision.
Build a Multimodal Search System with Milvus - Zilliz blog: Implementing a Multimodal Similarity Search System Using Milvus, Radient, ImageBind, and Meta-Chameleon-7b
Unlock Advanced Recommendation Engines with Milvus' New Range Search - Zilliz blog: Exploring Milvus's newly released range search feature, how it differs from the traditional KNN search, and when to use it.
Similarity Metrics for Vector Search - Zilliz blog: Exploring five similarity metrics for vector search: L2 or Euclidean distance, cosine distance, inner product, and hamming distance.
