# Audio Classification with PANNS

*Author: [Jael Gu](https://github.com/jaelgu)*

<br />

## Description

The audio classification operator classifies the given audio data using the 527 labels of the large-scale [AudioSet dataset](https://research.google.com/audioset/).
The pre-trained model used here is from the paper **PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition** ([paper link](https://arxiv.org/abs/1912.10211)).

<br />

## Code Example

Predict labels and generate embeddings given the audio path "test.wav".

*Write a pipeline with explicit input/output name specifications:*

```python
from towhee import pipe, ops, DataCollection

p = (
    pipe.input('path')
        .map('path', 'frame', ops.audio_decode.ffmpeg())
        .map('frame', ('labels', 'scores', 'vec'), ops.audio_classification.panns())
        .output('path', 'labels', 'scores', 'vec')
)

DataCollection(p('./test.wav')).show()
```
<img src="./result.png" width="800px"/>

<br />

## Factory Constructor

Create the operator via the following factory method:

***audio_classification.panns(weights_path=None, framework='pytorch',
sample_rate=32000, topk=5)***

**Parameters:**

*weights_path: str*

The path to model weights. If None, it will load default model weights.

*framework: str*

The framework of the model implementation.
The default value is "pytorch" since the model is implemented in PyTorch.

*sample_rate: int*

The target sample rate of the audio data after conversion, defaults to 32000.

*topk: int*

The number of labels and corresponding scores to return, sorted by probability from high to low.
The default value is 5.
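
To illustrate what `topk` controls, here is a minimal sketch of top-k selection in plain Python. It is not the operator's actual implementation; the helper name `top_k` and the sample labels and scores are made up for illustration.

```python
from typing import List, Sequence, Tuple


def top_k(labels: Sequence[str], scores: Sequence[float], k: int = 5) -> Tuple[List[str], List[float]]:
    """Return the k highest-scoring labels and their scores, highest first.

    Hypothetical helper: it only mirrors the behaviour described for `topk`.
    """
    # Pair each label with its score, then sort by score in descending order.
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)[:k]
    return [label for label, _ in ranked], [score for _, score in ranked]


# Made-up labels and scores, for illustration only.
demo_labels = ["Speech", "Music", "Dog", "Silence"]
demo_scores = [0.10, 0.72, 0.05, 0.31]

top_labels, top_scores = top_k(demo_labels, demo_scores, k=2)
print(top_labels, top_scores)
```

With `k` larger than the number of available labels, this sketch simply returns all of them in sorted order.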

<br/>

## Interface

The operator classifies audio given as Towhee audio frames and returns the top-k labels, their scores, and an embedding vector as a numpy.ndarray.

**Parameters:**

*data: List[towhee.types.audio_frame.AudioFrame]*

The input audio data is a list of Towhee audio frames.
The input data should represent an audio clip longer than 2 seconds.
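
As a rough aid for the length requirement, the sketch below checks whether a clip is longer than 2 seconds from its sample count and sample rate. The helper names are hypothetical, and it assumes you already know the total number of samples per channel; how to obtain that from the decoded frames is not covered here.

```python
def duration_seconds(num_samples: int, sample_rate: int = 32000) -> float:
    """Duration of a clip in seconds, given samples per channel and sample rate."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


def is_long_enough(num_samples: int, sample_rate: int = 32000, min_seconds: float = 2.0) -> bool:
    """True if the clip is strictly longer than min_seconds (hypothetical check)."""
    return duration_seconds(num_samples, sample_rate) > min_seconds


# At 32000 Hz, exactly 64000 samples is 2.0 s, which is not "longer than 2s".
print(is_long_enough(64000))
print(is_long_enough(96000))
```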


**Returns:**

*labels, scores, vec: Tuple[List[str], List[float], numpy.ndarray]*

- labels: a list of the top-k labels predicted by the model.
- scores: a list of scores corresponding to the labels, representing their probabilities.
- vec: an audio embedding generated by the model, with shape (2048,).
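
The returned values can be post-processed in ordinary Python. The sketch below is one possible way to do that, not part of the operator: it keeps only labels above a score threshold and compares two embeddings with cosine similarity. The function names, the threshold, and all sample values are made up; the vectors are plain lists of length 2048 standing in for the `numpy.ndarray` output.

```python
import math
from typing import Dict, Sequence


def confident_labels(labels: Sequence[str], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Keep only the labels whose score reaches the threshold (hypothetical helper)."""
    return {label: score for label, score in zip(labels, scores) if score >= threshold}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up outputs shaped like (labels, scores, vec), for illustration only.
labels = ["Music", "Speech", "Guitar"]
scores = [0.81, 0.42, 0.12]
vec_a = [1.0, 0.0] * 1024  # length 2048, matching the documented shape
vec_b = [0.0, 1.0] * 1024

kept = confident_labels(labels, scores, threshold=0.4)
print(kept)
print(cosine_similarity(vec_a, vec_a), cosine_similarity(vec_a, vec_b))
```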