# Video Classification with Omnivore

*Author: [Xinyu Ge](https://github.com/gexy185)*

<br />

## Description

A video classification operator generates labels (and corresponding scores) and extracts features for the input video.
It transforms the input video into frames and loads a pre-trained model by model name.
This operator implements pre-trained models from [Omnivore](https://arxiv.org/abs/2201.08377)
and maps output vectors to the labels provided by the datasets used for pre-training.

<br />

## Code Example

Use the pretrained Omnivore model to classify and generate a vector for the given video path './archery.mp4'
([download](https://dl.fbaipublicfiles.com/pytorchvideo/projects/archery.mp4)).
					
						

*Write the pipeline in simplified style*:

- Predict labels (default):
```python
import towhee

(
    towhee.glob('./archery.mp4')
          .video_decode.ffmpeg()
          .action_classification.omnivore(
              model_name='omnivore_swinT', topk=5)
          .show()
)
```
					
						
<img src="./result1.png" />

*Write the same pipeline with explicit input/output name specifications*:

```python
import towhee

(
    towhee.glob['path']('./archery.mp4')
          .video_decode.ffmpeg['path', 'frames']()
          .action_classification.omnivore['frames', ('labels', 'scores', 'features')](
              model_name='omnivore_swinT')
          .select['path', 'labels', 'scores', 'features']()
          .show(formatter={'path': 'video_path'})
)
```
					
						

<img src="./result2.png" />

<br />

## Factory Constructor

Create the operator via the following factory method:
					
						

***action_classification.omnivore(
model_name='omnivore_swinT', skip_preprocess=False, classmap=None, topk=5)***

**Parameters:**

	***model_name***: *str*

	The name of the pre-trained Omnivore model.

	Supported model names:
					
						
- omnivore_swinT
- omnivore_swinS
- omnivore_swinB
- omnivore_swinB_in21k
- omnivore_swinL_in21k
- omnivore_swinB_epic
					
						

	***skip_preprocess***: *bool*

	Flag to control whether to skip video transforms, defaults to False.
If set to True, the step to transform videos will be skipped.
In this case, the user should guarantee that all the input video frames are already preprocessed properly,
and thus can be fed to the model directly.
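
For illustration only, here is a rough numpy sketch of the kind of preprocessing such transforms perform on frames (center-crop and channel-wise normalization); the crop size and normalization constants below are assumptions for the sketch, not the operator's exact transform:

```python
import numpy as np

def preprocess_frames(frames, size=224,
                      mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    # Illustrative only: center-crop each frame to `size` x `size`,
    # scale pixel values to [0, 1], and normalize per channel.
    out = []
    for frame in frames:  # each frame: (H, W, 3) uint8
        h, w, _ = frame.shape
        top, left = (h - size) // 2, (w - size) // 2
        crop = frame[top:top + size, left:left + size].astype(np.float32) / 255.0
        out.append((crop - np.array(mean)) / np.array(std))
    return np.stack(out)  # shape: (num_frames, size, size, 3)

frames = [np.zeros((256, 320, 3), dtype=np.uint8) for _ in range(4)]
batch = preprocess_frames(frames)
print(batch.shape)  # (4, 224, 224, 3)
```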
					
						

	***classmap***: *Dict[str, int]*

	Dictionary that maps class names to class indices.
If not given, the operator will load the default class map dictionary.
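
To make the role of the class map concrete, here is a sketch with a made-up mapping (the names and indices are illustrative, not the operator's real class map), showing how a name-to-index dictionary turns a predicted class index back into a label:

```python
# Hypothetical class map: label names -> class indices.
classmap = {'archery': 0, 'juggling': 1, 'surfing': 2}

# Invert it to look up a label from a predicted class index.
idx_to_name = {idx: name for name, idx in classmap.items()}

predicted_index = 0  # pretend the model's top score is at index 0
print(idx_to_name[predicted_index])  # archery
```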
					
						

	***topk***: *int*

	The number of top classification labels and corresponding scores to return. The default value is 5.
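
Conceptually, topk selection keeps the k highest-scoring classes; a minimal sketch of that selection (the labels and scores below are made up for illustration):

```python
import numpy as np

def top_k(labels, scores, k=5):
    # Return the k highest-scoring labels and their scores, best first.
    order = np.argsort(scores)[::-1][:k]
    return [labels[i] for i in order], [float(scores[i]) for i in order]

labels = ['archery', 'juggling', 'surfing', 'bowling']
scores = np.array([0.6, 0.1, 0.25, 0.05])
top_labels, top_scores = top_k(labels, scores, k=2)
print(top_labels)  # ['archery', 'surfing']
```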
					
						

## Interface

A video classification operator generates a list of class labels and corresponding scores,
together with a feature vector, given video input data.
					
						

**Parameters:**

	***video***: *Union[str, numpy.ndarray]*

	Input video data: either a local path as a string or video frames as an ndarray.
					
						

**Returns**: *(list, list, torch.Tensor)*

	A tuple of (labels, scores, features),
which contains a list of predicted class names, a list of corresponding scores, and a tensor of extracted video features.
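
As a usage sketch, downstream code typically pairs each returned label with its score; the values below are stand-ins fabricated for illustration, not real operator output:

```python
# Stand-in for the operator's (labels, scores) outputs; in practice a
# feature tensor is returned alongside them for similarity search.
labels = ['archery', 'throwing axe']
scores = [0.92, 0.03]

# Pick the highest-scoring prediction.
best_label, best_score = max(zip(labels, scores), key=lambda p: p[1])
print(best_label, best_score)  # archery 0.92
```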