# Audio Embedding with CLMR

*Author: [Jael Gu](https://github.com/jaelgu)*

<br />

## Description

The audio embedding operator converts input audio into dense vectors that represent the semantics of the audio.
Each vector represents an audio clip with a fixed length of around 2.7s.
This operator is built on top of the original implementation of [CLMR](https://github.com/Spijkervet/CLMR).
The [default model weights](clmr_checkpoint_10000.pt) provided are pretrained on the [MagnaTagATune dataset](https://paperswithcode.com/dataset/magnatagatune) with [SampleCNN](sample_cnn.py).
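
To make the idea of "one vector per fixed-length clip" concrete, here is a minimal sketch of how a waveform could be cut into consecutive clips before embedding. It is not the operator's actual code: the function name `split_into_clips`, the `SAMPLE_RATE` value, and the choice to drop a trailing partial clip are all assumptions for illustration. Only the clip length of 2.7s comes from the Interface section of this README.

```python
from typing import List, Sequence

# Assumed value for illustration only; the operator's real sample rate may differ.
SAMPLE_RATE = 22050   # samples per second (assumption)
CLIP_SECONDS = 2.7    # clip length stated in the Interface section


def split_into_clips(samples: Sequence[float],
                     sample_rate: int = SAMPLE_RATE,
                     clip_seconds: float = CLIP_SECONDS) -> List[List[float]]:
    """Split a mono waveform into consecutive fixed-length clips.

    Trailing samples that do not fill a whole clip are dropped
    (an assumption; the operator may pad or handle them differently).
    """
    clip_len = round(sample_rate * clip_seconds)
    num_clips = len(samples) // clip_len
    return [list(samples[i * clip_len:(i + 1) * clip_len])
            for i in range(num_clips)]


# Stand-in input: 10 seconds of silence at the assumed sample rate.
waveform = [0.0] * (10 * SAMPLE_RATE)
clips = split_into_clips(waveform)
print(len(clips), len(clips[0]))  # 3 59535
```

Under these assumptions, a 10s input yields three full clips, so the operator would be expected to return three embeddings for it.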

<br />

## Code Example

Generate embeddings for the audio "test.wav".

*Write the pipeline in simplified style*:

```python
import towhee

(
    towhee.glob('test.wav')
          .audio_decode.ffmpeg()
          .runas_op(func=lambda x:[y[0] for y in x])
          .audio_embedding.clmr()
          .show()
)
```
    | [-2.1045141, 0.55381, 0.4537212, ...] shape=(6, 512) |

*Write the same pipeline with explicit input/output name specifications:*

```python
import towhee

(
    towhee.glob['path']('test.wav')
          .audio_decode.ffmpeg['path', 'frames']()
          .runas_op['frames', 'frames'](func=lambda x:[y[0] for y in x])
          .audio_embedding.clmr['frames', 'vecs']()
          .select['path', 'vecs']()
          .show()
)
```
    [array([[-2.1045141 ,  0.55381   ,  0.4537212 , ...,  0.18805158,
          0.3079657 , -1.216063  ],
        [-2.1045141 ,  0.55381036,  0.45372102, ...,  0.18805173,
          0.3079657 , -1.216063  ],
        [-2.0874703 ,  0.5511826 ,  0.46051833, ...,  0.18650496,
          0.33218473, -1.2182183 ],
        [-2.0874703 ,  0.55118287,  0.4605182 , ...,  0.18650502,
          0.3321851 , -1.2182183 ],
        [-2.0771544 ,  0.5641223 ,  0.43814823, ...,  0.18220925,
          0.33022994, -1.2070589 ],
        [-2.0771549 ,  0.5641221 ,  0.43814805, ...,  0.1822092 ,
          0.33022994, -1.2070588 ]], dtype=float32)]

<br />

## Factory Constructor

Create the operator via the following factory method:

***audio_embedding.clmr(framework="pytorch")***

**Parameters:**

*framework: str*

The framework of model implementation.
The default value is "pytorch" since the model is implemented in PyTorch.

<br />

## Interface

An audio embedding operator generates vectors as a numpy.ndarray given towhee audio frames.

**Parameters:**

*data: List[towhee.types.audio_frame.AudioFrame]*

The input audio data is a list of towhee audio frames.
The input data should represent audio longer than 3s.

**Returns**:

*numpy.ndarray*

Audio embeddings with shape (num_clips, 512).
Each embedding represents the features of an audio clip with a length of 2.7s.
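
Because the operator returns one row per clip, downstream code often needs a single vector per audio file. The sketch below shows one common option, mean pooling followed by L2 normalization, using plain NumPy. The helper names `pool_clip_embeddings` and `cosine_similarity` are hypothetical and not part of towhee, and the random array only stands in for real operator output.

```python
import numpy as np


def pool_clip_embeddings(vecs: np.ndarray) -> np.ndarray:
    """Average clip-level embeddings (num_clips, dim) into one unit-length vector."""
    pooled = vecs.mean(axis=0)
    norm = np.linalg.norm(pooled)
    # Guard against an all-zero vector to avoid division by zero.
    return pooled / norm if norm > 0 else pooled


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Stand-in for operator output: 6 clips, 512 dimensions each (random values).
rng = np.random.default_rng(0)
vecs = rng.standard_normal((6, 512)).astype(np.float32)

audio_vec = pool_clip_embeddings(vecs)
print(audio_vec.shape)  # (512,)
```

Mean pooling is only one choice. Keeping the per-clip vectors and searching over all of them may work better when different parts of the audio differ strongly.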