Pipeline: Audio Embedding using VGGish

Authors: Jael Gu

Overview

We recommend upgrading Towhee to >=1.1.1 and using https://towhee.io/towhee/audio-embedding instead.

This pipeline extracts features from a given audio file using a VGGish model implemented in PyTorch. The model is supervised and pre-trained on AudioSet, which contains over 2 million sound clips.

Interface

Input Arguments:

  • audio_path:
    • the input audio file in .wav format
    • supported types: str (path to the audio file)
    • the audio should be at least 1 second long

Pipeline Output:

The pipeline returns a list of named tuples NamedTuple('AudioOutput', [('vec', 'ndarray')]) containing the following field:

  • each item in the output list holds the embedding(s) for one audio clip; the number of clips and their timestamps depend on the time-window settings in the pipeline yaml (modify time_range_sec and time_step_sec to change how the audio is split)

  • vec:

    • embeddings of input audio
    • data type: numpy.ndarray
    • shape: (num_clips, 128)
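The output structure can be illustrated with a small mock, using a random placeholder array in place of real VGGish features (num_clips is set to 4 here purely for illustration):

```python
from collections import namedtuple
import numpy as np

# Mirrors the pipeline's output type; the vectors are random
# placeholders, not real VGGish embeddings.
AudioOutput = namedtuple('AudioOutput', ['vec'])

num_clips = 4
outs = [AudioOutput(vec=np.random.rand(num_clips, 128).astype(np.float32))]

# Access the embeddings either by field name or by index,
# as in the usage example below.
embeds = outs[0].vec
print(embeds.shape)        # (4, 128)
```

Since `AudioOutput` is a named tuple, `outs[0][0]` and `outs[0].vec` refer to the same array.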

How to use

  1. Install Towhee
$ pip3 install towhee

You can refer to Getting Started with Towhee for more details. If you have any questions, you can submit an issue to the towhee repository.

  2. Run it with Towhee
>>> from towhee import pipeline

>>> embedding_pipeline = pipeline('towhee/audio-embedding-vggish')
>>> outs = embedding_pipeline('/path/to/your/audio')
>>> embeds = outs[0][0]

How it works

This pipeline includes two main operator types: audio-decode and audio-embedding. By default, the pipeline uses towhee/audio-decoder to decode the audio file at the given path into a list of audio frames as ndarrays. The time-window step then combines audio frames into a list of ndarrays, each representing a fixed-length audio clip. Finally, the towhee/torch-vggish operator generates an audio embedding for each clip.
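The time-window step above can be sketched as a sliding window over the decoded samples. The config values and the 16 kHz sample rate below are illustrative assumptions; the real defaults live in the pipeline's yaml:

```python
import numpy as np

# Hypothetical values mirroring the pipeline's yaml config.
time_range_sec = 2.0   # length of each clip in seconds
time_step_sec = 1.0    # stride between consecutive clips in seconds
sr = 16000             # assumed sample rate (VGGish uses 16 kHz mono)

# 5 seconds of silence as a stand-in for decoded audio frames.
audio = np.zeros(5 * sr, dtype=np.float32)

clip_len = int(time_range_sec * sr)
step = int(time_step_sec * sr)

# Slide a fixed-length window over the samples; each slice is one
# clip that the embedding operator would turn into a 128-d vector.
clips = [audio[start:start + clip_len]
         for start in range(0, len(audio) - clip_len + 1, step)]

print(len(clips))       # 4 clips for a 5 s input with these settings
print(clips[0].shape)   # (32000,) samples per clip
```

Shrinking time_step_sec below time_range_sec produces overlapping clips, which is why the output can contain more than one embedding per second of audio.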