4 changed files with 119 additions and 86 deletions
@ -1,76 +1,96 @@ |
|||
# Audio Classification with PANNS |
|||
|
|||
*Author: Jael Gu* |
|||
*Author: [Jael Gu](https://github.com/jaelgu)* |
|||
|
|||
<br /> |
|||
|
|||
## Desription |
|||
## Description |
|||
|
|||
The audio classification operator classifies the given audio data with 527 labels from the large-scale [AudioSet dataset](https://research.google.com/audioset/). |
|||
The pre-trained model used here is from the paper **PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition** ([paper link](https://arxiv.org/abs/1912.10211)). |
|||
|
|||
```python |
|||
import numpy as np |
|||
from towhee import ops |
|||
<br /> |
|||
|
|||
## Code Example |
|||
|
|||
Predict labels and generate embeddings given the audio path "test.wav". |
|||
|
|||
*Write the pipeline in simplified style*: |
|||
|
|||
audio_classifier = ops.audio_classification.panns() |
|||
```python |
|||
import towhee |
|||
|
|||
( |
|||
towhee.glob('test.wav') |
|||
.audio_decode.ffmpeg() |
|||
.runas_op(func=lambda x:[y[0] for y in x]) |
|||
.audio_classification.panns() |
|||
.show() |
|||
) |
|||
``` |
|||
|
|||
# Path or url as input |
|||
tags, audio_embedding = audio_classifier("/audio/path/or/url/") |
|||
*Write the same pipeline with explicit input/output name specifications:* |
|||
|
|||
# Audio data as input |
|||
audio_data = np.zeros((2, 441344)) |
|||
sample_rate = 44100 |
|||
tags, audio_embedding = audio_classifier(audio_data, sample_rate) |
|||
```python |
|||
import towhee |
|||
|
|||
( |
|||
towhee.glob['path']('test.wav') |
|||
.audio_decode.ffmpeg['path', 'frames']() |
|||
.runas_op['frames', 'frames'](func=lambda x:[y[0] for y in x]) |
|||
.audio_classification.panns['frames', ('labels', 'scores', 'vec')]() |
|||
.show() |
|||
) |
|||
``` |
|||
|
|||
<br /> |
|||
|
|||
## Factory Constructor |
|||
|
|||
Create the operator via the following factory method |
|||
|
|||
***ops.audio_classification.panns()*** |
|||
|
|||
***audio_classification.panns(weights_path=None, framework='pytorch', |
|||
sample_rate=32000, topk=5)*** |
|||
|
|||
## Interface |
|||
**Parameters:** |
|||
|
|||
Given an audio (file path, link, or waveform), |
|||
the audio classification operator generates a list of labels |
|||
and a vector in numpy.ndarray. |
|||
*weights_path: str* |
|||
|
|||
The path to model weights. If None, it will load default model weights. |
|||
|
|||
**Parameters:** |
|||
*framework: str* |
|||
|
|||
None. |
|||
The framework of the model implementation. |
|||
Default value is "pytorch" since the model is implemented in PyTorch. |
|||
|
|||
*sample_rate: int* |
|||
|
|||
**Returns**: *numpy.ndarray* |
|||
The target sample rate of audio data after conversion, defaults to 32000. |
|||
|
|||
labels [(tag, score)], audio embedding in shape (2048,). |
|||
*topk: int* |
|||
|
|||
The number of labels and corresponding scores to return, sorted by probability from high to low. |
|||
Default value is 5. |
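
As a rough illustration of what `topk` controls, the sketch below picks the highest-scoring labels from a score vector and returns them in descending order. It is a minimal NumPy example, not the operator's actual implementation; the helper name `top_k_labels` and the four sample labels and scores are hypothetical.

```python
import numpy as np


def top_k_labels(scores, labels, topk=5):
    """Return the topk labels and their scores, highest score first.

    Hypothetical helper for illustration only; it is not part of towhee.
    """
    scores = np.asarray(scores, dtype=float)
    # Negating the scores makes argsort yield indices in descending order.
    order = np.argsort(-scores)[:topk]
    return [labels[i] for i in order], [float(scores[i]) for i in order]


# Made-up labels and scores standing in for the model's 527-class output.
sample_labels = ["Speech", "Music", "Dog", "Silence"]
sample_scores = [0.10, 0.70, 0.05, 0.15]

top_labels, top_scores = top_k_labels(sample_scores, sample_labels, topk=2)
print(top_labels, top_scores)
```

With `topk=2`, only the two best-scoring entries are kept, which mirrors how a smaller `topk` shortens the returned label and score lists.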
|||
|
|||
<br/> |
|||
|
|||
## Code Example |
|||
## Interface |
|||
|
|||
Generate embeddings for the audio "test.wav". |
|||
An audio embedding operator generates vectors in numpy.ndarray given towhee audio frames. |
|||
|
|||
*Write the pipeline in simplified style*: |
|||
**Parameters:** |
|||
|
|||
```python |
|||
from towhee import dc |
|||
*data: List[towhee.types.audio_frame.AudioFrame]* |
|||
|
|||
dc.glob('test.wav') |
|||
.audio_classification.panns() |
|||
.show() |
|||
``` |
|||
Input audio data is a list of towhee audio frames. |
|||
The input data should represent audio longer than 2 s. |
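
One way to check the length requirement before calling the operator is to compare the sample count against the sample rate. The sketch below assumes the audio is available as a raw NumPy waveform of shape `(channels, samples)` or `(samples,)` rather than as towhee audio frames; the helper name `is_long_enough` is hypothetical.

```python
import numpy as np


def is_long_enough(waveform, sample_rate, min_seconds=2.0):
    """Return True if the waveform lasts longer than min_seconds.

    Hypothetical helper: assumes the last axis of the array indexes samples.
    """
    n_samples = np.asarray(waveform).shape[-1]
    duration = n_samples / sample_rate  # duration in seconds
    return duration > min_seconds


# A stereo clip of 441344 samples at 44100 Hz lasts roughly 10 seconds.
long_clip = np.zeros((2, 441344))
# A mono clip of 32000 samples at 32000 Hz lasts exactly 1 second.
short_clip = np.zeros(32000)

print(is_long_enough(long_clip, 44100))
print(is_long_enough(short_clip, 32000))
```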
|||
|
|||
*Write the same pipeline with explicit input/output name specifications:* |
|||
|
|||
```python |
|||
from towhee import dc |
|||
**Returns**: |
|||
|
|||
dc.glob['path']('test.wav') |
|||
.audio_classification.panns['path', 'vecs']() |
|||
.select('vecs') |
|||
.show() |
|||
``` |
|||
*labels, scores, vec: Tuple[List[str], List[float], numpy.ndarray]* |
|||
|
|||
- labels: a list of the topk labels predicted by the model. |
|||
- scores: a list of scores corresponding to the labels, representing their probabilities. |
|||
- vec: an audio embedding generated by the model, with shape (2048,). |
|||
|
|||
|
@ -1,4 +1,4 @@ |
|||
panns_inference |
|||
torchaudio |
|||
resampy |
|||
torch |
|||
towhee |
|||
towhee>=0.7.0 |