Pipeline: Audio Embedding using VGGish

Authors: Jael Gu

Overview

Note: we recommend upgrading Towhee to >=1.1.1 and using https://towhee.io/towhee/audio-embedding instead of this pipeline.

This pipeline extracts features from a given audio file using a VGGish model implemented in PyTorch. VGGish is a supervised model pre-trained on AudioSet, which contains over 2 million sound clips.

Interface

Input Arguments:

  • audio_path:
    • the input audio in .wav
    • supported types: str (path to the audio)
    • the audio should be at least 1 second long
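
Because the input must be a .wav file of at least 1 second, it can be useful to check the duration before calling the pipeline. The following is a minimal sketch that uses only Python's standard `wave` module. The helper names `wav_duration_sec` and `is_long_enough` are hypothetical and not part of Towhee, and the sketch assumes an uncompressed PCM .wav file.

```python
import wave


def wav_duration_sec(audio_path: str) -> float:
    """Return the duration in seconds of a PCM .wav file."""
    with wave.open(audio_path, "rb") as wav_file:
        n_frames = wav_file.getnframes()
        sample_rate = wav_file.getframerate()
    return n_frames / float(sample_rate)


def is_long_enough(audio_path: str, min_sec: float = 1.0) -> bool:
    """Check the 1-second minimum stated above (hypothetical helper)."""
    return wav_duration_sec(audio_path) >= min_sec
```

Running `is_long_enough('/path/to/your/audio')` before the pipeline call lets you skip or pad clips that are too short.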

Pipeline Output:

The pipeline returns a list of named tuples, [NamedTuple('AudioOutput', [('vec', 'ndarray')])], with the following fields:

  • each item in the output list holds the embedding(s) for an audio clip, whose length and timestamps depend on the time-window settings in the yaml file (you can modify time_range_sec and time_step_sec to change how the audio is split)

  • vec:

    • embeddings of input audio
    • data type: numpy.ndarray
    • shape: (num_clips, 128)
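
Since `vec` has shape (num_clips, 128), downstream code often needs a single fixed-size vector per audio file. The sketch below shows one common option, mean pooling followed by L2 normalization. This is not something the pipeline does itself: the function name `pool_clip_embeddings` is hypothetical, and the random array only stands in for a real `vec`.

```python
import numpy as np


def pool_clip_embeddings(vec: np.ndarray) -> np.ndarray:
    """Average per-clip embeddings into one L2-normalized vector."""
    if vec.ndim != 2:
        raise ValueError("expected an array of shape (num_clips, dim)")
    pooled = vec.mean(axis=0)
    norm = np.linalg.norm(pooled)
    # Leave an all-zero vector untouched to avoid dividing by zero.
    return pooled / norm if norm > 0 else pooled


# Stand-in for a real output with 3 clips; not actual VGGish embeddings.
rng = np.random.default_rng(0)
fake_vec = rng.random((3, 128)).astype(np.float32)
audio_vector = pool_clip_embeddings(fake_vec)
```

Mean pooling discards the order of clips, so keep the full (num_clips, 128) array if timing matters for your use case.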

How to use

  1. Install Towhee
$ pip3 install towhee

You can refer to Getting Started with Towhee for more details. If you have any questions, you can submit an issue to the towhee repository.

  2. Run it with Towhee
>>> from towhee import pipeline

>>> embedding_pipeline = pipeline('towhee/audio-embedding-vggish')
>>> outs = embedding_pipeline('/path/to/your/audio')
>>> embeds = outs[0][0]

How it works

This pipeline includes two main operator types: audio-decode and audio-embedding. By default, the pipeline uses towhee/audio-decoder to load the audio at the given path as a list of audio frames, each stored as an ndarray. The time-window step then combines the audio frames into a list of ndarrays, each representing a fixed-length audio clip. Finally, the towhee/torch-vggish operator generates a list of audio embeddings for each audio clip.
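
To illustrate how time_range_sec and time_step_sec could split audio into fixed-length clips, here is a simplified sketch in plain Python. It is not Towhee's time-window implementation: the function name `split_into_clips` is hypothetical, the audio is modeled as a flat sequence of samples, and dropping a trailing partial window is an assumption made for this example.

```python
from typing import List, Sequence


def split_into_clips(
    samples: Sequence[float],
    sample_rate: int,
    time_range_sec: float,
    time_step_sec: float,
) -> List[Sequence[float]]:
    """Slice samples into fixed-length, possibly overlapping clips."""
    window = int(round(time_range_sec * sample_rate))
    step = int(round(time_step_sec * sample_rate))
    if window <= 0 or step <= 0:
        raise ValueError("window and step must cover at least one sample")
    clips = []
    # Assumption: a trailing window shorter than `window` is dropped.
    for start in range(0, len(samples) - window + 1, step):
        clips.append(samples[start:start + window])
    return clips
```

With a step equal to the range, the clips do not overlap; a smaller step yields overlapping clips and therefore more embeddings for the same audio.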

More Resources

- [Exploring Multimodal Embeddings with FiftyOne and Milvus - Zilliz blog](https://zilliz.com/blog/exploring-multimodal-embeddings-with-fiftyone-and-milvus): This post explored how multimodal embeddings work with Voxel51 and Milvus.