add omnivore

Signed-off-by: gexy5 <xinyu.ge@zilliz.com>
3 years ago · 4a3f19cd49
5 changed files with 108 additions and 2 deletions
--- a/README.md
+++ b/README.md
@@ -1,2 +1,108 @@
-# omnivore
+# Video Classification with Omnivore

+*Author: [Xinyu Ge](https://github.com/gexy185)*
+
+<br />
+
+## Description
+
+This video classification operator generates labels (with corresponding scores) and extracts features for the input video.
+It transforms the video into frames and loads a pre-trained model by model name.
+The operator implements pre-trained models from [Omnivore](https://arxiv.org/abs/2201.08377)
+and maps output vectors to the labels provided by the datasets used for pre-training.
+
+<br />
+
+## Code Example
+
+Use a pre-trained Omnivore model to classify the video at './archery.mp4' and generate a feature vector for it
+([download](https://dl.fbaipublicfiles.com/pytorchvideo/projects/archery.mp4)).
+
+ *Write the pipeline in simplified style*:
+
+- Predict labels (default):
+```python
+import towhee
+
+(
+    towhee.glob('./archery.mp4') 
+          .video_decode.ffmpeg()
+          .action_classification.omnivore(
+            model_name='omnivore_swinT', topk=5)
+          .show()
+)
+```
+<img src="./result1.png" height="px"/>
+
+*Write the same pipeline with explicit input/output name specifications*:
+
+```python
+import towhee
+
+(
+    towhee.glob['path']('./archery.mp4')
+          .video_decode.ffmpeg['path', 'frames']()
+          .action_classification.omnivore['frames', ('labels', 'scores', 'features')](
+                model_name='omnivore_swinT')
+          .select['path', 'labels', 'scores', 'features']()
+          .show(formatter={'path': 'video_path'})
+)
+```
+
+<img src="./result2.png" height="px"/>
+
+<br />
+
+## Factory Constructor
+
+Create the operator via the following factory method:
+
+***action_classification.omnivore(
+model_name='tsm_k400_r50_seg8', skip_preprocess=False, classmap=None, topk=5)***
+
+**Parameters:**
+
+	***model_name***: *str*
+
+	The name of the pre-trained Omnivore model.
+
+    Supported model names:
+- omnivore_swinT
+- omnivore_swinS
+- omnivore_swinB
+- omnivore_swinB_in21k
+- omnivore_swinL_in21k
+- omnivore_swinB_epic
+
+	***skip_preprocess***: *bool*
+
+	Flag to control whether to skip video transforms. Defaults to False.
+If set to True, the video transform step is skipped.
+In this case, the user must ensure that all input video frames are already preprocessed properly
+and can therefore be fed to the model directly.
+
+	***classmap***: *Dict[str, int]*
+
+	Dictionary that maps class names to one-hot vectors.
+If not given, the operator loads the default class map dictionary.
+
+	***topk***: *int*
+
+	The number of top labels and scores to include in the result. The default value is 5.
+
+## Interface
+
+A video classification operator generates a list of class labels
+and a corresponding vector in numpy.ndarray for the given input video data.
+
+**Parameters:**
+
+	***video***: *Union[str, numpy.ndarray]*
+
+	Input video data, given either as a local path (string) or as video frames (ndarray).
+
+
+**Returns**: *(list, list, torch.Tensor)*
+
+	A tuple of (labels, scores, features),
+which contains lists of predicted class names and corresponding scores, along with the extracted features.
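The `topk` and `classmap` parameters documented in the README above suggest a post-processing step along the following lines. This is a minimal sketch, not the operator's actual code: `topk_predictions` is a hypothetical helper, the class names are made up, and it assumes `classmap` maps class names to integer indices into the model's output logits.

```python
import math
from typing import Dict, List, Sequence, Tuple


def topk_predictions(
    logits: Sequence[float], classmap: Dict[str, int], topk: int = 5
) -> Tuple[List[str], List[float]]:
    """Hypothetical helper: return the top-k class names and softmax scores."""
    # Invert the assumed name -> index mapping so indices can be turned into names.
    index_to_name = {index: name for name, index in classmap.items()}

    # Numerically stable softmax over the raw logits.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]

    # Sort class indices by descending probability and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:topk]
    labels = [index_to_name[i] for i in ranked]
    scores = [probs[i] for i in ranked]
    return labels, scores


if __name__ == "__main__":
    demo_classmap = {"archery": 0, "bowling": 1, "juggling": 2}  # made-up classes
    labels, scores = topk_predictions([2.0, 0.5, 1.0], demo_classmap, topk=2)
    print(labels, scores)
```

Passing a custom `classmap` in this sketch only changes the names attached to each index; the ranking itself depends solely on the logits.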
--- a/archery.mp4
+++ b/archery.mp4
--- a/omnivore.py
+++ b/omnivore.py
@@ -66,7 +66,7 @@ class Omnivore(NNOperator):
        self.input_mean=[0.485, 0.456, 0.406]
        self.input_std=[0.229, 0.224, 0.225]
        self.transform_cfgs = get_configs(
-                side_size=256,
+                side_size=224,
                crop_size=224,
                num_frames=24,
                mean=self.input_mean,
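To see what changing `side_size` from 256 to 224 may do, the sketch below works through the geometry under one assumption: that `side_size` is the length the shorter frame side is scaled to (aspect ratio preserved) and `crop_size` is the side of a square center crop taken afterwards. `get_configs` is not shown in this diff, so `short_side_scale` and `center_crop_box` are hypothetical helpers, and the 360x640 frame is an arbitrary example.

```python
from typing import Tuple


def short_side_scale(height: int, width: int, side_size: int) -> Tuple[int, int]:
    """Scale (height, width) so the shorter side equals side_size."""
    if height <= width:
        return side_size, int(round(width * side_size / height))
    return int(round(height * side_size / width)), side_size


def center_crop_box(height: int, width: int, crop_size: int) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of a centered crop_size x crop_size box."""
    top = (height - crop_size) // 2
    left = (width - crop_size) // 2
    return top, left, top + crop_size, left + crop_size


if __name__ == "__main__":
    for side_size in (256, 224):
        h, w = short_side_scale(360, 640, side_size)  # example 360x640 frame
        print(side_size, (h, w), center_crop_box(h, w, crop_size=224))
```

Under this assumption, `side_size=224` makes the crop span the full short side of the frame, whereas `side_size=256` trims 16 pixels from both the top and the bottom before the model sees the frame.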
--- a/result1.png
+++ b/result1.png
--- a/result2.png
+++ b/result2.png