
# Audio Embedding with CLMR

*Author: [Jael Gu](https://github.com/jaelgu)*

<br />

## Description

The audio embedding operator converts input audio into dense vectors that represent the semantics of the audio.
Each vector represents an audio clip with a fixed length of about 2.7s.
This operator is built on top of the original implementation of [CLMR](https://github.com/Spijkervet/CLMR).
The provided [default model weights](clmr_checkpoint_10000.pt) are pretrained on the [MagnaTagATune dataset](https://paperswithcode.com/dataset/magnatagatune) with [SampleCNN](sample_cnn.py).

<br />

## Code Example

Generate embeddings for the audio "test.wav".

*Write a pipeline with explicit input and output names:*

```python
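from towhee.dc2 import pipe, ops, DataCollection

p = (
    pipe.input('path')
        # Decode the audio file into towhee audio frames.
        .map('path', 'frame', ops.audio_decode.ffmpeg())
        # Generate embeddings from the decoded frames.
        .map('frame', 'vecs', ops.audio_embedding.clmr())
        .output('path', 'vecs')
)

# Run the pipeline on a local file and display the result.
DataCollection(p('./test.wav')).show()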
```

<img src="./result.png" width="800px"/>

<br />

## Factory Constructor

Create the operator via the following factory method:

***audio_embedding.clmr(framework="pytorch")***

**Parameters:**

*framework: str*

The framework of the model implementation.
The default value is "pytorch" since the model is implemented in PyTorch.

<br />

## Interface

The audio embedding operator generates vectors as a numpy.ndarray from towhee audio frames.

**Parameters:**

*data: List[towhee.types.audio_frame.AudioFrame]*

The input audio data is a list of towhee audio frames.
The input data should represent audio longer than 3s.
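
The minimum input length can be related to the per-clip length (2.7s, see Returns) with a little arithmetic. The sketch below is only an illustration: it assumes the audio is cut into consecutive, non-overlapping 2.7s windows and that a trailing partial window is discarded, which this README does not confirm. `estimate_num_clips` is a hypothetical helper, not part of towhee.

```python
CLIP_SECONDS = 2.7  # per-clip length stated in this README


def estimate_num_clips(duration_seconds: float, clip_seconds: float = CLIP_SECONDS) -> int:
    """Estimate how many full clips fit into audio of the given duration.

    Assumption (not confirmed by the README): windows do not overlap and a
    trailing partial window is dropped.
    """
    if duration_seconds < 0 or clip_seconds <= 0:
        raise ValueError("duration must be >= 0 and clip length must be > 0")
    # Floor division counts only the complete windows.
    return int(duration_seconds // clip_seconds)


for duration in (2.0, 3.0, 10.0, 30.0):
    print(f"{duration:>5.1f}s -> {estimate_num_clips(duration)} clip(s)")
```

Under these assumptions a 2s input contains no complete clip while a 3s input contains one, which is consistent with the 3s minimum stated above.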

**Returns:**

*numpy.ndarray*

Audio embeddings of shape (num_clips, 512).
Each embedding holds the features of an audio clip with a length of 2.7s.
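
Because the operator returns one vector per clip, downstream code often needs a single vector per audio file. The following is a minimal sketch of one common option, mean pooling followed by L2 normalization, using only NumPy. It is not part of the operator: `pool_clip_embeddings` and `cosine_similarity` are hypothetical helpers, and the random arrays merely stand in for real output of shape (num_clips, 512).

```python
import numpy as np


def pool_clip_embeddings(vecs: np.ndarray) -> np.ndarray:
    """Average per-clip embeddings (num_clips, dim) into one L2-normalized vector (dim,)."""
    if vecs.ndim != 2 or vecs.shape[0] == 0:
        raise ValueError("expected a non-empty array of shape (num_clips, dim)")
    pooled = vecs.mean(axis=0)
    norm = np.linalg.norm(pooled)
    # Leave an all-zero vector unchanged to avoid dividing by zero.
    return pooled / norm if norm > 0 else pooled


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


# Stand-in data with the documented shape (num_clips, 512); not real embeddings.
rng = np.random.default_rng(0)
vecs_a = rng.normal(size=(4, 512)).astype(np.float32)
vecs_b = rng.normal(size=(7, 512)).astype(np.float32)

track_a = pool_clip_embeddings(vecs_a)
track_b = pool_clip_embeddings(vecs_b)
print(track_a.shape, cosine_similarity(track_a, track_b))
```

Mean pooling discards the order of clips, so it suits whole-file similarity search; keep the per-clip vectors when the position within the audio matters.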