diff --git a/README.md b/README.md
index 88d0b48..675b016 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,108 @@
-# omnivore
+# Video Classification with Omnivore
+*Author: [Xinyu Ge](https://github.com/gexy185)*
+
+<br />
+
+## Description
+
+A video classification operator generates labels (and corresponding scores) and extracts features for the input video.
+It transforms the video into frames and loads a pre-trained model by model name.
+This operator implements pre-trained models from [Omnivore](https://arxiv.org/abs/2201.08377)
+and maps the output vectors to labels provided by the datasets used for pre-training.
+
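+To make the input/output contract above concrete, the snippet below constructs and calls the operator
+directly through `towhee.ops`. This is a minimal sketch assuming towhee 0.x direct-call semantics
+(`ops.<namespace>.<name>(...)` returning a callable operator); the pipeline style in the next section
+is the documented usage.
+
+```python
+from towhee import ops
+
+# Assumed direct-call usage: decode a local video into frames,
+# then classify the frames with the Omnivore operator.
+decode = ops.video_decode.ffmpeg()
+classify = ops.action_classification.omnivore(model_name='omnivore_swinT', topk=5)
+
+frames = list(decode('./archery.mp4'))       # decoded video frames
+labels, scores, features = classify(frames)  # see the Interface section below
+print(labels, scores)
+```
+
+<br />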
+
+## Code Example
+
+Use the pre-trained Omnivore model to classify the video and generate a vector for the given video path './archery.mp4'
+([download](https://dl.fbaipublicfiles.com/pytorchvideo/projects/archery.mp4)).
+
+*Write the pipeline in simplified style*:
+
+- Predict labels (default):
+```python
+import towhee
+
+(
+    towhee.glob('./archery.mp4')
+        .video_decode.ffmpeg()
+        .action_classification.omnivore(
+            model_name='omnivore_swinT', topk=5)
+        .show()
+)
+```
+
+*Write the same pipeline with explicit input/output name specifications*:
+
+```python
+import towhee
+
+(
+    towhee.glob['path']('./archery.mp4')
+        .video_decode.ffmpeg['path', 'frames']()
+        .action_classification.omnivore['frames', ('labels', 'scores', 'features')](
+            model_name='omnivore_swinT')
+        .select['path', 'labels', 'scores', 'features']()
+        .show(formatter={'path': 'video_path'})
+)
+```
+
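+Before inference, the operator applies video transforms to the decoded frames; the `skip_preprocess`
+flag described below disables that step. The sketch below is a rough torchvision-style equivalent of
+this preprocessing, assuming the configuration values visible in the `omnivore.py` hunk at the end of
+this diff (side_size=224, crop_size=224, num_frames=24, ImageNet mean/std); the composition itself is
+an illustrative assumption, not the operator's actual code path.
+
+```python
+import torch
+from torchvision import transforms
+
+# Values mirror the transform config in omnivore.py below;
+# the Compose itself is an assumption for illustration.
+preprocess = transforms.Compose([
+    transforms.Resize(224),      # side_size: scale the short side to 224
+    transforms.CenterCrop(224),  # crop_size: center-crop to 224x224
+    transforms.Normalize(mean=[0.485, 0.456, 0.406],
+                         std=[0.229, 0.224, 0.225]),
+])
+
+# One decoded frame as a float tensor in [0, 1], shaped (C, H, W);
+# a clip of num_frames=24 such frames is fed to the model.
+frame = torch.rand(3, 360, 640)
+print(preprocess(frame).shape)  # torch.Size([3, 224, 224])
+```
+
+<br />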
+
+## Factory Constructor
+
+Create the operator via the following factory method:
+
+***action_classification.omnivore(
+model_name='omnivore_swinT', skip_preprocess=False, classmap=None, topk=5)***
+
+**Parameters:**
+
+***model_name***: *str*
+
+The name of the pre-trained Omnivore model.
+
+Supported model names:
+- omnivore_swinT
+- omnivore_swinS
+- omnivore_swinB
+- omnivore_swinB_in21k
+- omnivore_swinL_in21k
+- omnivore_swinB_epic
+
+***skip_preprocess***: *bool*
+
+Flag to control whether to skip the video transforms, defaults to False.
+If set to True, the step that transforms videos (see the preprocessing sketch above) will be skipped.
+In this case, the user should guarantee that all input video frames are already preprocessed properly
+and can thus be fed to the model directly.
+
+***classmap***: *Dict[str, int]*
+
+Dictionary that maps class names to label indices.
+If not given, the operator will load the default class map dictionary.
+
+***topk***: *int*
+
+The number of top labels and scores to present in the result. The default value is 5.
+
+## Interface
+
+A video classification operator generates a list of class labels, a list of corresponding scores,
+and a feature vector given the input video data.
+
+**Parameters:**
+
+***video***: *Union[str, numpy.ndarray]*
+
+Input video data, either a local path as a string or video frames as an ndarray.
+
+**Returns**: *(list, list, torch.Tensor)*
+
+A tuple of (labels, scores, features), which contains a list of predicted class names,
+a list of corresponding scores, and a tensor of extracted video features.
diff --git a/archery.mp4 b/archery.mp4
deleted file mode 100644
index 4a724d6..0000000
Binary files a/archery.mp4 and /dev/null differ
diff --git a/omnivore.py b/omnivore.py
index ebebc72..5487a9b 100644
--- a/omnivore.py
+++ b/omnivore.py
@@ -66,7 +66,7 @@ class Omnivore(NNOperator):
         self.input_mean=[0.485, 0.456, 0.406]
         self.input_std=[0.229, 0.224, 0.225]
         self.transform_cfgs = get_configs(
-            side_size=256,
+            side_size=224,
             crop_size=224,
             num_frames=24,
             mean=self.input_mean,
diff --git a/result1.png b/result1.png
new file mode 100644
index 0000000..bee5a5c
Binary files /dev/null and b/result1.png differ
diff --git a/result2.png b/result2.png
new file mode 100644
index 0000000..f3689fa
Binary files /dev/null and b/result2.png differ