Audio Classification with PANNs

Author: Jael Gu

Description

The audio classification operator classifies the given audio data using the 527 labels of the large-scale AudioSet dataset. The pre-trained model used here comes from the paper PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition (paper link).

import numpy as np
from towhee import ops

audio_classifier = ops.audio_classification.panns()

# Path or url as input
tags, audio_embedding = audio_classifier("/audio/path/or/url/")

# Audio data as input
audio_data = np.zeros((2, 441344))
sample_rate = 44100
tags, audio_embedding = audio_classifier(audio_data, sample_rate)
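The waveform input above is an array laid out as (channels, samples), paired with a sample rate. As a minimal sketch, assuming the operator accepts any float array in that layout (as the zeros example suggests), the snippet below builds a one-second stereo sine tone that is more useful than silence for a smoke test. The helper name `make_stereo_tone` is hypothetical and not part of towhee.

```python
import numpy as np


def make_stereo_tone(freq_hz=440.0, duration_s=1.0, sample_rate=44100):
    """Return a (2, n_samples) float array holding the same sine tone on both channels.

    Hypothetical helper for illustration; not part of towhee.
    """
    n_samples = int(round(duration_s * sample_rate))
    t = np.arange(n_samples) / sample_rate           # time axis in seconds
    mono = 0.5 * np.sin(2.0 * np.pi * freq_hz * t)   # amplitude kept within [-0.5, 0.5]
    return np.stack([mono, mono])                    # duplicate into two channels


sample_rate = 44100
audio_data = make_stereo_tone(sample_rate=sample_rate)
print(audio_data.shape)
```

The resulting `audio_data` and `sample_rate` can be passed to the operator in the same way as the zeros array in the example above.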

Factory Constructor

Create the operator via the following factory method:

ops.audio_classification.panns()

Interface

Given audio input (a file path, a URL, or waveform data), the audio classification operator returns a list of labels and an embedding vector as a numpy.ndarray.

Parameters:

None.

Returns: list, numpy.ndarray

Labels as a list of (tag, score) pairs, and an audio embedding of shape (2048,).
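Since the labels come back as (tag, score) pairs, a common next step is to keep only the highest-scoring tags. The following is a minimal sketch of that post-processing in plain Python; the helper `top_tags` is hypothetical, and the tag scores and the zero embedding are made-up placeholders rather than real operator output.

```python
import numpy as np


def top_tags(tags, k=3, min_score=0.0):
    """Return up to k (tag, score) pairs with score >= min_score, highest score first.

    Hypothetical helper for illustration; not part of towhee.
    """
    ranked = sorted(tags, key=lambda pair: pair[1], reverse=True)
    return [(tag, score) for tag, score in ranked if score >= min_score][:k]


# Made-up values standing in for the operator's output.
example_tags = [("Music", 0.12), ("Speech", 0.71), ("Dog", 0.05), ("Silence", 0.02)]
example_embedding = np.zeros(2048)

print(top_tags(example_tags, k=2))
print(example_embedding.shape)
```

Sorting before filtering keeps the helper independent of the order in which the operator returns its labels, which the text above does not specify.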

Code Example

Generate embeddings for the audio "test.wav".

Write the pipeline in simplified style:

from towhee import dc

# Parentheses let the method chain span several lines
(
    dc.glob('test.wav')
      .audio_classification.panns()
      .show()
)

Write the same pipeline with explicit input/output name specifications:

from towhee import dc

# Parentheses let the method chain span several lines
(
    dc.glob['path']('test.wav')
      .audio_classification.panns['path', 'vecs']()
      .select('vecs')
      .show()
)
