# Audio Embedding with VGGish

*Author: Jael Gu*

## Description

The audio embedding operator converts an input audio clip into a dense vector that represents the clip's semantics.
This operator is built on top of the VGGish model with PyTorch.
The model was originally implemented in [TensorFlow](https://github.com/tensorflow/models/tree/master/research/audioset/vggish).
It is pre-trained on the large-scale audio dataset [AudioSet](https://research.google.com/audioset).
As suggested by its authors, it is suitable for extracting high-level features or for warming up a larger model.

```python
import numpy as np
from towhee import ops

audio_encoder = ops.audio_embedding.vggish()

# Path or URL as input
audio_embedding = audio_encoder("/audio/path/or/url/")

# Audio data as input
audio_data = np.zeros((2, 441344))
sample_rate = 44100
audio_embedding = audio_encoder(audio_data, sample_rate)
```

## Factory Constructor

Create the operator via the following factory method:

***ops.audio_embedding.vggish()***

## Interface

The audio embedding operator generates vectors as a numpy.ndarray given an audio file path.

**Parameters:**

None.

**Returns**: *numpy.ndarray*

Audio embeddings in shape (num_clips, 128).

## Code Example

Generate embeddings for the audio "test.wav".

*Write the pipeline in simplified style*:

```python
from towhee import dc

# Parentheses let the method chain span several lines
(
    dc.glob('test.wav')
      .audio_embedding.vggish()
      .show()
)
```

*Write the same pipeline with explicit input/output name specifications:*

```python
from towhee import dc

# Parentheses let the method chain span several lines
(
    dc.glob['path']('test.wav')
      .audio_embedding.vggish['path', 'vecs']()
      .select('vecs')
      .show()
)
```
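
Because the operator returns one 128-dimensional vector per clip, in shape (num_clips, 128), a typical next step is to pool the clip vectors into a single vector per audio file before comparing files. The following is a minimal sketch of that idea using only NumPy. It is not part of the operator: `pool_clips` and `cosine_similarity` are hypothetical helper names, mean pooling is just one possible choice, and the random arrays stand in for real operator output.

```python
import numpy as np


def pool_clips(clip_embeddings: np.ndarray) -> np.ndarray:
    """Average clip-level embeddings (num_clips, dim) into one (dim,) vector."""
    clip_embeddings = np.asarray(clip_embeddings, dtype=float)
    if clip_embeddings.ndim != 2:
        raise ValueError("expected a 2-D array of shape (num_clips, dim)")
    return clip_embeddings.mean(axis=0)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two 1-D vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


# Placeholder data standing in for two operator outputs (assumption:
# real outputs would have the documented shape (num_clips, 128)).
rng = np.random.default_rng(0)
clips_a = rng.normal(size=(4, 128))
clips_b = rng.normal(size=(6, 128))

vec_a = pool_clips(clips_a)
vec_b = pool_clips(clips_b)

print(vec_a.shape)  # (128,)
print(cosine_similarity(vec_a, vec_b))
```

Mean pooling discards the ordering of clips, so two files with the same content in a different order would map to the same vector. If temporal structure matters, keeping the per-clip vectors may be preferable.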