Pipeline: Audio Embedding using VGGish
Authors: Jael Gu
Overview
We recommend upgrading Towhee to >=1.1.1 and using https://towhee.io/towhee/audio-embedding instead.
This pipeline extracts features from a given audio file using a VGGish model implemented in PyTorch. This is a supervised model pre-trained on AudioSet, which contains over 2 million sound clips.
Interface
Input Arguments:
- audio_path:
  - the input audio in .wav
  - supported types: str (path to the audio)
  - the audio should be at least 1 second long
Pipeline Output:
The pipeline returns a list of named tuples NamedTuple('AudioOutput', [('vec', 'ndarray')]) containing the following field:
- vec:
  - embeddings of the input audio
  - data type: numpy.ndarray
  - shape: (num_clips, 128)

Each item in the output list represents the embedding(s) for an audio clip; the clip length and timestamps depend on the time-window settings in the YAML file (you can modify time_range_sec & time_step_sec to change how the audio is split).
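The output structure described above can be illustrated with a minimal sketch, using a stand-in named tuple and fake embeddings (the real values come from the pipeline; the shapes here assume an audio file split into 3 clips):

```python
from typing import NamedTuple
import numpy as np

# Stand-in for the pipeline output type described above.
AudioOutput = NamedTuple('AudioOutput', [('vec', np.ndarray)])

# Fake embeddings for illustration: 3 clips, 128 dimensions each.
outs = [AudioOutput(vec=np.random.rand(3, 128))]

embeds = outs[0].vec  # same as outs[0][0]
print(embeds.shape)   # (3, 128)
```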
How to use
- Install Towhee
$ pip3 install towhee
You can refer to Getting Started with Towhee for more details. If you have any questions, you can submit an issue to the towhee repository.
- Run it with Towhee
>>> from towhee import pipeline
>>> embedding_pipeline = pipeline('towhee/audio-embedding-vggish')
>>> outs = embedding_pipeline('/path/to/your/audio')
>>> embeds = outs[0][0]
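A common next step is comparing two clips by the cosine similarity of their embeddings. The sketch below uses placeholder vectors in place of real VGGish output (only the 128-dim shape is taken from this pipeline; the helper function is our own illustration, not part of Towhee):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder 128-dim embeddings standing in for two audio clips.
emb1 = np.ones(128)
emb2 = np.ones(128)
print(cosine_similarity(emb1, emb2))  # 1.0 for identical vectors
```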
How it works
This pipeline includes two main operator types:
audio-decode & audio-embedding.
By default, the pipeline uses towhee/audio-decoder to load the audio file as a list of audio frames in ndarray.
Then the time-window operator combines audio frames into a list of ndarrays, each of which represents an audio clip of fixed length.
Finally, the towhee/torch-vggish operator generates a list of audio embeddings, one per audio clip.
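The time-window step can be sketched as a sliding window over the decoded samples. This is a simplified illustration of the idea, not the operator's actual implementation; the parameter names time_range_sec and time_step_sec match the YAML settings mentioned above, while the sample rate and splitting logic are our assumptions:

```python
import numpy as np

def split_clips(samples: np.ndarray, sr: int,
                time_range_sec: float, time_step_sec: float):
    """Cut a 1-D sample array into fixed-length clips of
    `time_range_sec` seconds, advancing by `time_step_sec` seconds."""
    clip_len = int(time_range_sec * sr)
    step = int(time_step_sec * sr)
    clips = []
    for start in range(0, len(samples) - clip_len + 1, step):
        clips.append(samples[start:start + clip_len])
    return clips

# 3 seconds of silence at 16 kHz; 1 s window, 1 s step -> 3 clips
samples = np.zeros(3 * 16000, dtype=np.float32)
clips = split_clips(samples, 16000, 1.0, 1.0)
print(len(clips), clips[0].shape)  # 3 (16000,)
```

Each clip would then be fed to the VGGish operator, yielding the (num_clips, 128) embedding array described in the Interface section.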