|
|
|
# Video Classification with Omnivore
|
|
|
|
|
|
|
|
*Author: [Xinyu Ge](https://github.com/gexy185)*
|
|
|
|
|
|
|
|
<br />
|
|
|
|
|
|
|
|
## Description
|
|
|
|
|
|
|
|
A video classification operator generates labels (and corresponding scores) and extracts features for the input video.
|
|
|
|
It decodes the video into frames and loads a pre-trained model by model name.
|
|
|
|
This operator implements pre-trained models from [Omnivore](https://arxiv.org/abs/2201.08377)
and maps the output vectors to labels provided by the datasets used for pre-training.
|
|
|
|
|
|
|
|
<br />
|
|
|
|
|
|
|
|
## Code Example
|
|
|
|
|
|
|
|
Use the pretrained Omnivore model to classify and generate a vector for the given video path './archery.mp4'
|
|
|
|
([download](https://dl.fbaipublicfiles.com/pytorchvideo/projects/archery.mp4)).
|
|
|
|
|
|
|
|
*Write the pipeline in simplified style*:
|
|
|
|
|
|
|
|
- Predict labels (default):
|
|
|
|
```python
|
|
|
|
import towhee
|
|
|
|
|
|
|
|
(
|
|
|
|
towhee.glob('./archery.mp4')
|
|
|
|
.video_decode.ffmpeg()
|
|
|
|
.action_classification.omnivore(
|
|
|
|
model_name='omnivore_swinT', topk=5)
|
|
|
|
.show()
|
|
|
|
)
|
|
|
|
```
|
|
|
|
<img src="./result1.png" height="px"/>
|
|
|
|
|
|
|
|
*Write the same pipeline with explicit input/output name specifications*:
|
|
|
|
|
|
|
|
```python
|
|
|
|
import towhee
|
|
|
|
|
|
|
|
(
|
|
|
|
towhee.glob['path']('./archery.mp4')
|
|
|
|
.video_decode.ffmpeg['path', 'frames']()
|
|
|
|
.action_classification.omnivore['frames', ('labels', 'scores', 'features')](
|
|
|
|
model_name='omnivore_swinT')
|
|
|
|
.select['path', 'labels', 'scores', 'features']()
|
|
|
|
.show(formatter={'path': 'video_path'})
|
|
|
|
)
|
|
|
|
```
|
|
|
|
|
|
|
|
<img src="./result2.png" height="px"/>
|
|
|
|
|
|
|
|
<br />
|
|
|
|
|
|
|
|
## Factory Constructor
|
|
|
|
|
|
|
|
Create the operator via the following factory method
|
|
|
|
|
|
|
|
***action_classification.omnivore(model_name='omnivore_swinT', skip_preprocess=False, classmap=None, topk=5)***
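The same factory can also be invoked outside a pipeline; a minimal sketch, assuming the standard `towhee.ops` interface (the operator path mirrors the pipeline calls above):

```python
import towhee

# Build the operator directly; an assumed use of the towhee.ops
# interface, mirroring the factory signature above.
op = towhee.ops.action_classification.omnivore(
    model_name='omnivore_swinT', skip_preprocess=False, classmap=None, topk=5)
```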
|
|
|
|
|
|
|
|
**Parameters:**
|
|
|
|
|
|
|
|
***model_name***: *str*
|
|
|
|
|
|
|
|
The name of the pre-trained Omnivore model.
|
|
|
|
|
|
|
|
Supported model names:
|
|
|
|
- omnivore_swinT
|
|
|
|
- omnivore_swinS
|
|
|
|
- omnivore_swinB
|
|
|
|
- omnivore_swinB_in21k
|
|
|
|
- omnivore_swinL_in21k
|
|
|
|
- omnivore_swinB_epic
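
Switching checkpoints only requires changing `model_name`; for example, the simplified pipeline above can be rerun with the Swin-B backbone:

```python
import towhee

(
    towhee.glob('./archery.mp4')
          .video_decode.ffmpeg()
          .action_classification.omnivore(
              model_name='omnivore_swinB', topk=5)  # any name from the list above
          .show()
)
```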
|
|
|
|
|
|
|
|
***skip_preprocess***: *bool*
|
|
|
|
|
|
|
|
Flag to control whether to skip video transforms, defaults to False.
If set to True, the video transform step will be skipped.
In this case, the user should guarantee that all input video frames are already preprocessed properly
and thus can be fed to the model directly.
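
For example, when frames are already resized and normalized upstream, the built-in transforms can be bypassed; a minimal sketch, assuming the decoded frames already match the model's expected input:

```python
import towhee

(
    towhee.glob('./archery.mp4')
          .video_decode.ffmpeg()
          # skip_preprocess=True assumes these frames are already
          # resized and normalized for the model.
          .action_classification.omnivore(
              model_name='omnivore_swinT', skip_preprocess=True)
          .show()
)
```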
|
|
|
|
|
|
|
|
***classmap***: *Dict[str, int]*
|
|
|
|
|
|
|
|
Dictionary that maps class names to class indices.
|
|
|
|
If not given, the operator will load the default class map dictionary.
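
A custom class map can be supplied as a plain dictionary; a sketch with hypothetical labels and indices (not a real dataset mapping):

```python
import towhee

# Hypothetical mapping for illustration: class name -> class index
custom_classmap = {'archery': 0, 'bowling': 1, 'fencing': 2}

(
    towhee.glob('./archery.mp4')
          .video_decode.ffmpeg()
          .action_classification.omnivore(
              model_name='omnivore_swinT', classmap=custom_classmap, topk=3)
          .show()
)
```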
|
|
|
|
|
|
|
|
***topk***: *int*
|
|
|
|
|
|
|
|
The number of top labels and scores to return. The default value is 5.
|
|
|
|
|
|
|
|
## Interface
|
|
|
|
|
|
|
|
A video classification operator generates a list of class labels, a list of corresponding scores,
and a feature vector in numpy.ndarray for the given input video.
|
|
|
|
|
|
|
|
**Parameters:**
|
|
|
|
|
|
|
|
***video***: *Union[str, numpy.ndarray]*
|
|
|
|
|
|
|
|
Input video data, either a local path as a string or video frames as a numpy.ndarray.
|
|
|
|
|
|
|
|
|
|
|
|
**Returns**: *(list, list, numpy.ndarray)*
|
|
|
|
|
|
|
|
A tuple of (labels, scores, features),
which contains a list of predicted class names, a list of corresponding scores, and the extracted video features.
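
To consume these outputs programmatically rather than rendering them with `.show()`, the explicit-schema pipeline above can be materialized into a list; a minimal sketch, assuming the DataCollection `to_list()` API:

```python
import towhee

res = (
    towhee.glob['path']('./archery.mp4')
          .video_decode.ffmpeg['path', 'frames']()
          .action_classification.omnivore['frames', ('labels', 'scores', 'features')](
              model_name='omnivore_swinT')
          .select['labels', 'scores', 'features']()
          .to_list()
)

# Each entity carries the selected fields as attributes.
labels, scores, features = res[0].labels, res[0].scores, res[0].features
```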
|