logo
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
Readme
Files and versions

99 lines
4.8 KiB

# Audio Embedding with data2vec
*author: David Wang*
<br />
## Description
This operator extracts features for audio with [data2vec](https://arxiv.org/abs/2202.03555). The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture.
<br />
## Code Example
Generate embeddings for the audio "test.wav".
*Write a pipeline with explicit inputs/outputs name specifications:*
```python
from towhee import pipe, ops, DataCollection
p = (
pipe.input('path')
.map('path', 'frame', ops.audio_decode.ffmpeg(sample_rate=16000))
.map('frame', 'vecs', ops.audio_embedding.data2vec(model_name='facebook/data2vec-audio-base-960h'))
.output('path', 'vecs')
)
DataCollection(p('test.wav')).show()
```
<img src="./result.png" width="800px"/>
<br />
## Factory Constructor
Create the operator via the following factory method
***data2vec(model_name='facebook/data2vec-audio-base')***
**Parameters:**
***model_name***: *str*
The model name in string.
The default value is "facebook/data2vec-audio-base-960h".
Supported model name:
-
- facebook/data2vec-audio-base-960h
- facebook/data2vec-audio-large-960h
- facebook/data2vec-audio-base
- facebook/data2vec-audio-base-100h
- facebook/data2vec-audio-base-10m
- facebook/data2vec-audio-large
- facebook/data2vec-audio-large-100h
- facebook/data2vec-audio-large-10m
<br />
## Interface
An audio embedding operator generates vectors in numpy.ndarray given an audio file path or towhee audio frames.
**Parameters:**
***data:*** *List[towhee.types.audio_frame.AudioFrame]*
​ Input audio data is a list of towhee audio frames. The input data should represent for an audio longer than 0.9s.
**Returns:** *numpy.ndarray*
​ The audio embedding extracted by model.
# More Resources
- [What is a Transformer Model? An Engineer's Guide](https://zilliz.com/glossary/transformer-models): A transformer model is a neural network architecture. It's proficient in converting a particular type of input into a distinct output. Its core strength lies in its ability to handle inputs and outputs of different sequence length. It does this through encoding the input into a matrix with predefined dimensions and then combining that with another attention matrix to decode. This transformation unfolds through a sequence of collaborative layers, which deconstruct words into their corresponding numerical representations.
At its heart, a transformer model is a bridge between disparate linguistic structures, employing sophisticated neural network configurations to decode and manipulate human language input. An example of a transformer model is GPT-3, which ingests human language and generates text output.
- [Exploring Multimodal Embeddings with FiftyOne and Milvus - Zilliz blog](https://zilliz.com/blog/exploring-multimodal-embeddings-with-fiftyone-and-milvus): This post explored how multimodal embeddings work with Voxel51 and Milvus.
- [How to Get the Right Vector Embeddings - Zilliz blog](https://zilliz.com/blog/how-to-get-the-right-vector-embeddings): A comprehensive introduction to vector embeddings and how to generate them with popular open-source models.
- [What is BERT (Bidirectional Encoder Representations from Transformers)? - Zilliz blog](https://zilliz.com/learn/what-is-bert): Learn what Bidirectional Encoder Representations from Transformers (BERT) is and how it uses pre-training and fine-tuning to achieve its remarkable performance.
- [Transforming Text: The Rise of Sentence Transformers in NLP - Zilliz blog](https://zilliz.com/learn/transforming-text-the-rise-of-sentence-transformers-in-nlp): Everything you need to know about the Transformers model, exploring its architecture, implementation, and limitations
- [What Are Vector Embeddings?](https://zilliz.com/glossary/vector-embeddings): Learn the definition of vector embeddings, how to create vector embeddings, and more.
- [Vector Database Use Case: Audio Similarity Search - Zilliz](https://zilliz.com/vector-database-use-cases/audio-similarity-search): Building agile and reliable audio similarity search with Zilliz vector database (fully managed Milvus).
- [Enhancing Information Retrieval with Sparse Embeddings | Zilliz Learn - Zilliz blog](https://zilliz.com/learn/enhancing-information-retrieval-learned-sparse-embeddings): Explore the inner workings, advantages, and practical applications of learned sparse embeddings with the Milvus vector database
- [An Introduction to Vector Embeddings: What They Are and How to Use Them - Zilliz blog](https://zilliz.com/learn/everything-you-should-know-about-vector-embeddings): In this blog post, we will understand the concept of vector embeddings and explore its applications, best practices, and tools for working with embeddings.