# Video Classification with TSM

*Author: [Xinyu Ge](https://github.com/gexy185)*

<br />

## Description

A video classification operator generates labels (with corresponding scores) and extracts features for the input video. It decodes the video into frames and loads a pre-trained model selected by name. This operator provides pre-trained models from [TSM](https://arxiv.org/abs/1811.08383) and maps the model's output vectors to the labels of the datasets used for pre-training.
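
The temporal shift at the heart of TSM moves a fraction of each frame's feature channels to the neighboring frames, so that ordinary 2D convolutions can exchange information across time at no extra computational cost. Below is a minimal NumPy sketch of the bidirectional (offline) shift described in the paper. It is an illustration only, not the code this operator executes; the function name `temporal_shift` and the `fold_div=8` default (1/8 of the channels shifted in each direction) are assumptions modeled on the paper's setting.

```python
import numpy as np


def temporal_shift(x: np.ndarray, n_segment: int, fold_div: int = 8) -> np.ndarray:
    """Illustrative TSM-style bidirectional shift (not this operator's code).

    x has shape (N * T, C, H, W), where T == n_segment frames per clip.
    """
    nt, c, h, w = x.shape
    if nt % n_segment != 0:
        raise ValueError("first dimension must be a multiple of n_segment")
    x = x.reshape(nt // n_segment, n_segment, c, h, w)
    fold = c // fold_div  # assumed: 1/fold_div of the channels per direction
    out = np.zeros_like(x)
    # First fold: each frame takes these channels from the next frame
    # (the last frame receives zeros).
    out[:, :-1, :fold] = x[:, 1:, :fold]
    # Second fold: each frame takes these channels from the previous frame
    # (the first frame receives zeros).
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]
    # Remaining channels are left unchanged.
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]
    return out.reshape(nt, c, h, w)
```

With `fold_div=8`, a quarter of the channels move in total while the rest stay in place, which preserves the spatial features each frame already carries.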

<br />

## Code Example

Use the pre-trained TSM model to classify the video at './archery.mp4' ([download](https://dl.fbaipublicfiles.com/pytorchvideo/projects/archery.mp4)) and generate its feature vector.

*Write a pipeline with explicit input/output name specifications:*

```python
from towhee import pipe, ops, DataCollection

p = (
    pipe.input('path')
        .map('path', 'frames', ops.video_decode.ffmpeg())
        .map('frames', ('labels', 'scores', 'features'),
             ops.action_classification.tsm(model_name='tsm_k400_r50_seg8'))
        .output('path', 'labels', 'scores', 'features')
)

DataCollection(p('./archery.mp4')).show()
```

<img src="./result.png" height="px"/>

<br />

## Factory Constructor

Create the operator via the following factory method:

***action_classification.tsm(model_name='tsm_k400_r50_seg8', skip_preprocess=False, classmap=None, topk=5)***

**Parameters:**

***model_name***: *str*

The name of the pre-trained TSM model. Supported model names:

- tsm_k400_r50_seg8
- tsm_k400_r50_seg16
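
The supported names appear to follow a `tsm_<dataset>_<backbone>_seg<N>` pattern (presumably Kinetics-400, ResNet-50, and 8 or 16 temporal segments). That reading is an assumption inferred from the names themselves, not a documented contract. The hypothetical helper `parse_model_name` below validates a name against the list above and splits it into those parts, for example to know how many frames a custom preprocessing step should sample.

```python
import re

# Names copied from the supported list; the naming pattern is an assumption.
SUPPORTED_MODELS = {'tsm_k400_r50_seg8', 'tsm_k400_r50_seg16'}
_NAME_PATTERN = re.compile(
    r'tsm_(?P<dataset>[a-z0-9]+)_(?P<backbone>[a-z0-9]+)_seg(?P<segments>\d+)'
)


def parse_model_name(name: str) -> dict:
    """Hypothetical helper: validate a model name and split it into parts."""
    if name not in SUPPORTED_MODELS:
        raise ValueError(f'Unsupported model name: {name!r}')
    match = _NAME_PATTERN.fullmatch(name)
    return {
        'dataset': match['dataset'],             # e.g. 'k400'
        'backbone': match['backbone'],           # e.g. 'r50'
        'num_segments': int(match['segments']),  # e.g. 8
    }
```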

***skip_preprocess***: *bool*

Flag to control whether to skip video transforms, defaults to False. If set to True, the video transform step is skipped. In this case, the user must guarantee that all input video frames are already preprocessed properly and can therefore be fed to the model directly.
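
When `skip_preprocess=True`, the frames must already match what the model expects. The operator's exact transforms are not documented here, so the sketch below assumes a typical TSM/TSN test-time recipe: sample the middle frame of each of `num_segments` equal segments, center-crop to 224x224, scale to [0, 1], normalize with the ImageNet mean and standard deviation, and reorder to channels-first. These values and the helper names are assumptions to check against the operator's source; the sketch also skips resizing and requires frames of at least 224 pixels in each dimension.

```python
import numpy as np

# Assumed normalization constants (standard ImageNet statistics, RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def sample_segment_indices(num_frames: int, num_segments: int) -> np.ndarray:
    """Pick the middle frame of each of `num_segments` equal-length segments."""
    seg_len = num_frames / num_segments
    return (seg_len * np.arange(num_segments) + seg_len / 2).astype(int)


def preprocess(frames: np.ndarray, num_segments: int = 8, crop: int = 224) -> np.ndarray:
    """Hypothetical preprocessing; frames is uint8 RGB of shape (T, H, W, 3)."""
    clip = frames[sample_segment_indices(len(frames), num_segments)]
    _, h, w, _ = clip.shape
    if h < crop or w < crop:
        raise ValueError('frames are smaller than the crop size')
    top, left = (h - crop) // 2, (w - crop) // 2
    clip = clip[:, top:top + crop, left:left + crop]  # center crop
    clip = (clip.astype(np.float32) / 255.0 - IMAGENET_MEAN) / IMAGENET_STD
    return clip.transpose(0, 3, 1, 2)  # -> (num_segments, 3, crop, crop)
```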

***classmap***: *Dict[str, int]*

Dictionary that maps class names to class indices, i.e. the position of each class in the one-hot label vector. If not given, the operator loads the default class map dictionary.
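
A custom `classmap` can be built from an ordered list of class names, where each name's index is the position of its 1 in the one-hot vector. The helper names and example labels below are hypothetical, and the index order must match the output order of the model's classification head.

```python
from typing import Dict, List

import numpy as np


def build_classmap(class_names: List[str]) -> Dict[str, int]:
    """Hypothetical helper: map each class name to its index."""
    if len(set(class_names)) != len(class_names):
        raise ValueError('class names must be unique')
    return {name: idx for idx, name in enumerate(class_names)}


def one_hot(name: str, classmap: Dict[str, int]) -> np.ndarray:
    """One-hot vector for `name`; assumes indices are 0..len(classmap) - 1."""
    vec = np.zeros(len(classmap), dtype=np.float32)
    vec[classmap[name]] = 1.0
    return vec


classmap = build_classmap(['archery', 'bowling', 'juggling balls'])
```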

***topk***: *int*

The number of top labels and scores to include in the result. The default value is 5.
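
The `topk` parameter limits how many (label, score) pairs are returned. Assuming the operator applies a softmax to the model's logits and then keeps the highest-scoring classes (the post-processing is not spelled out in this document), the selection can be sketched as follows. `topk_predictions` is a hypothetical helper, and its inverse lookup assumes a `classmap` that maps class names to indices.

```python
from typing import Dict, List, Tuple

import numpy as np


def topk_predictions(logits: np.ndarray, classmap: Dict[str, int],
                     topk: int = 5) -> Tuple[List[str], List[float]]:
    """Hypothetical post-processing: softmax over logits, then keep the top-k."""
    logits = np.asarray(logits, dtype=np.float64).ravel()
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs = exp / exp.sum()
    idx_to_name = {idx: name for name, idx in classmap.items()}
    top = np.argsort(probs)[::-1][:topk]  # indices sorted by descending probability
    labels = [idx_to_name[int(i)] for i in top]
    scores = [float(probs[i]) for i in top]
    return labels, scores
```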

## Interface

A video classification operator generates a list of class labels with their corresponding scores, and a feature vector in numpy.ndarray, given input video data.

**Parameters:**

***video***: *Union[str, numpy.ndarray]*

Input video data, given either as a local file path (str) or as video frames (numpy.ndarray).

**Returns**: *(list, list, torch.Tensor)*

A tuple of (labels, scores, features), containing the list of predicted class names, the list of their corresponding scores, and the extracted video features.
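
The returned `features` can also be used outside classification, for example to compare two videos. The sketch below computes the cosine similarity of two feature vectors. It assumes the features are already in CPU memory (a detached CPU `torch.Tensor` or a `numpy.ndarray`, both of which convert with `np.asarray`), and the choice of cosine similarity as the metric is an assumption rather than something the operator prescribes.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two feature vectors (any array-like, flattened)."""
    a = np.asarray(a, dtype=np.float32).ravel()
    b = np.asarray(b, dtype=np.float32).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        raise ValueError('cosine similarity is undefined for a zero vector')
    return float(a @ b / denom)
```

For instance, after unpacking `labels, scores, features` for two videos, values close to 1.0 indicate similar content under this metric.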