
# Audio Embedding with CLMR
*Author: Jael Gu*
## Description
The audio embedding operator converts input audio into a dense vector that represents the semantics of the audio clip.
This operator is built on top of the original implementation of [CLMR](https://github.com/Spijkervet/CLMR).
The [default model weights](./checkpoints/clmr_checkpoint_10000.pt) are pretrained on the [Magnatagatune dataset](https://paperswithcode.com/dataset/magnatagatune) using [SampleCNN](./models/sample_cnn.py).
```python
import numpy as np
from towhee import ops
audio_encoder = ops.audio_embedding.clmr()
# A local file path or URL as input
audio_embedding = audio_encoder("/audio/path/or/url/")
# Raw audio data as input, together with its sample rate
audio_data = np.zeros((2, 441344))
sample_rate = 44100
audio_embedding = audio_encoder(audio_data, sample_rate)
```
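The quick start passes raw audio as a 2-D array, and the operator returns one embedding per clip. How the operator splits its input into clips is not documented here, so the sketch below shows one plausible approach under stated assumptions: the array is laid out as (channels, samples), channels are averaged to mono, and the signal is cut into non-overlapping windows of 59049 samples (the SampleCNN input length used in the CLMR reference code). The helper `split_into_clips` and the clip length are hypothetical, and resampling is skipped.

```python
import numpy as np

# Hypothetical clip length: 59049 samples (3**10), the SampleCNN input size
# in the CLMR reference code. The operator's actual value may differ.
CLIP_LENGTH = 59049

def split_into_clips(audio: np.ndarray, clip_length: int = CLIP_LENGTH) -> np.ndarray:
    """Downmix (channels, samples) audio to mono, then cut it into
    non-overlapping fixed-length clips, dropping the trailing remainder."""
    if audio.ndim == 2:
        audio = audio.mean(axis=0)  # assumed channels-first layout: average channels
    num_clips = audio.shape[0] // clip_length
    return audio[: num_clips * clip_length].reshape(num_clips, clip_length)

audio_data = np.zeros((2, 441344))  # same shape as the quick start
clips = split_into_clips(audio_data)
print(clips.shape)  # (7, 59049)
```

Under these assumptions, each row of `clips` would map to one row of the (num_clips, 512) output.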
## Factory Constructor
Create the operator via the following factory method:
***ops.audio_embedding.clmr()***
## Interface
An audio embedding operator generates embedding vectors as a numpy.ndarray, given either an audio file path (or URL) or audio data as a numpy.ndarray.
**Parameters:**

None.

**Returns:** *numpy.ndarray*

Audio embeddings in shape (num_clips, 512).
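Because the output holds one 512-dimensional vector per clip, a downstream task such as comparing two recordings needs a single vector per recording. The sketch below mean-pools the clip embeddings and compares two recordings with cosine similarity. Mean pooling is our choice for illustration, not something this operator prescribes; `pool_embeddings` and `cosine_similarity` are hypothetical helpers, and random arrays stand in for real operator outputs.

```python
import numpy as np

def pool_embeddings(clip_vecs: np.ndarray) -> np.ndarray:
    """Mean-pool (num_clips, 512) clip embeddings into one unit-length vector."""
    vec = clip_vecs.mean(axis=0)
    return vec / np.linalg.norm(vec)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Random stand-ins for two operator outputs of shape (num_clips, 512)
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(7, 512))
emb_b = rng.normal(size=(5, 512))

score = cosine_similarity(pool_embeddings(emb_a), pool_embeddings(emb_b))
print(round(score, 3))
```

Normalizing the pooled vector makes it directly usable with inner-product search in a vector database.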
## Code Example
Generate embeddings for the audio "test.wav".
*Write the pipeline in simplified style*:
```python
from towhee import dc

# Wrap the chained calls in parentheses so Python accepts the line breaks
(
    dc.glob('test.wav')
    .audio_embedding.clmr()
    .show()
)
```
*Write the same pipeline with explicit input/output name specifications:*
```python
from towhee import dc

# Read the file path into 'path', write embeddings to 'vecs', keep only 'vecs'
(
    dc.glob['path']('test.wav')
    .audio_embedding.clmr['path', 'vecs']()
    .select('vecs')
    .show()
)
```