Pipeline: Audio Embedding using VGGish

Authors: Jael Gu

Overview

This pipeline extracts features from a given audio file using a VGGish model implemented in PyTorch. VGGish is a supervised model pre-trained on AudioSet, which contains over 2 million sound clips.

Interface

Input Arguments:

audio_path:
- the input audio file in .wav format
- supported types: str (path to the audio)
- the audio should be at least 1 second long
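
The input constraints above can be checked before calling the pipeline. The following is a minimal sketch using only the Python standard library; check_audio and wav_duration_seconds are hypothetical helper names, not part of Towhee, and the wave module only reads uncompressed PCM .wav files.

```python
import wave
from pathlib import Path


def wav_duration_seconds(audio_path: str) -> float:
    """Return the duration of an uncompressed PCM .wav file in seconds."""
    with wave.open(str(audio_path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_audio(audio_path: str, min_seconds: float = 1.0) -> float:
    """Hypothetical pre-check: raise ValueError if the file looks unsuitable."""
    path = Path(audio_path)
    # The pipeline expects .wav input, so reject other extensions early.
    if path.suffix.lower() != ".wav":
        raise ValueError(f"expected a .wav file, got {path.name!r}")
    duration = wav_duration_seconds(audio_path)
    # The README asks for at least 1 second of audio.
    if duration < min_seconds:
        raise ValueError(f"audio is {duration:.2f}s, need at least {min_seconds}s")
    return duration
```

Calling check_audio('/path/to/your/audio.wav') returns the duration in seconds, or raises ValueError when the file is not a .wav or is shorter than the minimum.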

Pipeline Output:

The operator returns a list of named tuples, [NamedTuple('AudioOutput', [('vec', 'ndarray')])]. Each item in the list holds the embedding(s) for one audio clip; the number of embeddings depends on the time-window settings in the YAML file. Each tuple contains the following field:

vec:
- embeddings of input audio
- data type: numpy.ndarray
- shape: (num_clips, 128)
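
To illustrate the output structure, the sketch below builds a stand-in for the returned list and reduces it to a single vector. The AudioOutput class here is a local mock that mirrors the documented field, the data is placeholder values, and mean pooling is only one common way to summarize clip embeddings; it is an assumption, not something the pipeline does itself.

```python
from typing import List, NamedTuple

import numpy as np


class AudioOutput(NamedTuple):
    """Local stand-in for the output tuple; only the 'vec' field is documented."""
    vec: np.ndarray


def pool_embeddings(outputs: List[AudioOutput]) -> np.ndarray:
    """Average all clip embeddings into a single 128-dimensional vector."""
    # Stack every (num_clips, 128) array into one (total_clips, 128) array.
    stacked = np.concatenate([item.vec for item in outputs], axis=0)
    return stacked.mean(axis=0)


# Placeholder output: two items with 3 and 2 clips respectively.
fake_outputs = [
    AudioOutput(vec=np.ones((3, 128))),
    AudioOutput(vec=np.zeros((2, 128))),
]
pooled = pool_embeddings(fake_outputs)
print(pooled.shape)  # (128,)
```

With a real pipeline result, the same function could be applied to the returned list, assuming each item exposes a vec array of shape (num_clips, 128) as described above.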

How to use

Install Towhee

$ pip3 install towhee

You can refer to Getting Started with Towhee for more details. If you have any questions, you can submit an issue to the towhee repository.

Run it with Towhee

>>> from towhee import pipeline

>>> embedding_pipeline = pipeline('towhee/audio-embedding-vggish')
>>> embedding = embedding_pipeline('/path/to/your/audio')

How it works

This pipeline includes one main operator of type audio-embedding (default: towhee/torch-vggish). The pipeline first decodes the input audio file into audio frames and then combines the frames according to the time-window configuration. The audio-embedding operator takes the combined frames as input and generates the corresponding audio embeddings.
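
The frame-combining step can be pictured as a sliding window over the decoded frames. The sketch below is an illustration under assumptions: combine_frames, window_size, and step are hypothetical names, and the actual time-window operator and its YAML parameters may behave differently (for example, in how a trailing partial window is handled).

```python
from typing import List, Sequence


def combine_frames(frames: Sequence[float], window_size: int, step: int) -> List[List[float]]:
    """Group consecutive frames into windows of window_size, advancing by step.

    A trailing partial window is dropped in this sketch.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    windows = []
    # Each start index yields one full window; stop when fewer frames remain.
    for start in range(0, len(frames) - window_size + 1, step):
        windows.append(list(frames[start:start + window_size]))
    return windows


frames = list(range(10))
print(combine_frames(frames, window_size=4, step=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

In this picture, each window would then be passed to the audio-embedding operator, which yields the embedding(s) for that window.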
