Pipeline: Audio Embedding using CLMR

Authors: Jael Gu

Overview

The pipeline uses a pre-trained CLMR model to extract embeddings from a given audio file. It first converts the input audio to a wave file with a sample rate of 22050 Hz. Then the model splits the audio data into shorter clips of a fixed length. Finally, it generates a vector for each clip; together, these vectors compose the fingerprint of the input audio.
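The clip-splitting step can be illustrated with a minimal sketch. This is not the pipeline's own code: the function name split_into_clips, the 3-second clip length, and the choice to drop the trailing remainder are all assumptions made for illustration. Only the 22050 sample rate comes from the description above.

```python
import numpy as np

SAMPLE_RATE = 22050  # sample rate stated in the overview


def split_into_clips(samples: np.ndarray, clip_len: int) -> np.ndarray:
    """Split a 1-D sample array into non-overlapping clips of clip_len samples.

    Hypothetical helper: trailing samples that do not fill a whole clip
    are dropped, which may differ from what the real pipeline does.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    num_clips = len(samples) // clip_len
    return samples[: num_clips * clip_len].reshape(num_clips, clip_len)


# Ten seconds of silence standing in for decoded audio.
audio = np.zeros(10 * SAMPLE_RATE, dtype=np.float32)

# Assumed clip length of 3 seconds, chosen only for this example.
clips = split_into_clips(audio, clip_len=3 * SAMPLE_RATE)
print(clips.shape)  # (3, 66150)
```

Each row of clips would then be passed to the model, giving one vector per clip.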

Interface

Input Arguments:

  • filepath:
    • the input audio in .wav (audio length > 3 seconds)
    • supported types: str (path to the audio)
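Because the input must be a .wav file longer than 3 seconds, it can help to check the duration before calling the pipeline. The sketch below uses only Python's standard wave module, which reads uncompressed PCM files. The helper names wav_duration_seconds and is_long_enough are hypothetical and are not part of Towhee.

```python
import wave

MIN_SECONDS = 3.0  # lower bound taken from the input description


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_long_enough(path: str, min_seconds: float = MIN_SECONDS) -> bool:
    """Check that the audio is strictly longer than min_seconds."""
    return wav_duration_seconds(path) > min_seconds


# Example usage (the path is a placeholder):
# if not is_long_enough('path/to/your/audio'):
#     raise ValueError('audio must be longer than 3 seconds')
```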

Pipeline Output:

The Operator returns a tuple Tuple[('embs', numpy.ndarray)] containing the following fields:

  • embs:
    • embeddings of input audio
    • data type: numpy.ndarray
    • shape: (num_clips,512)
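Since embs holds one 512-dimensional vector per clip, two audio files of different lengths produce arrays with different numbers of rows. One simple way to compare them, shown here as an assumption and not as something the pipeline does, is to average the clip vectors and take the cosine similarity. The arrays below are random stand-ins for real pipeline output, and mean_pool and cosine_similarity are hypothetical helpers.

```python
import numpy as np


def mean_pool(embs: np.ndarray) -> np.ndarray:
    """Average clip embeddings of shape (num_clips, 512) into one vector."""
    return embs.mean(axis=0)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0


# Random stand-ins for the 'embs' output of two audio files.
rng = np.random.default_rng(0)
embs_a = rng.normal(size=(4, 512))
embs_b = rng.normal(size=(6, 512))

vec_a = mean_pool(embs_a)
vec_b = mean_pool(embs_b)
similarity = cosine_similarity(vec_a, vec_b)
print(vec_a.shape)  # (512,)
```

Averaging discards the order of clips, so it suits coarse comparisons; clip-by-clip matching would need the full (num_clips, 512) array.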

How to use

  1. Install Towhee
$ pip3 install towhee

You can refer to Getting Started with Towhee for more details. If you have any questions, you can submit an issue to the towhee repository.

  2. Install ffmpeg
$ brew install ffmpeg # for Mac

OR

$ apt install ffmpeg # for Ubuntu

  3. Run it with Towhee
>>> from towhee import pipeline

>>> embedding_pipeline = pipeline('towhee/audio-embedding-clmr')
>>> embedding = embedding_pipeline('path/to/your/audio')

How it works

This pipeline includes one main operator: audio-embedding (implemented as towhee/clmr-magnatagatune). The audio embedding operator encodes the audio file and outputs a set of vectors for the given audio.
