# Audio Embedding with data2vec

*author: David Wang*

<br />
							
## Description

This operator extracts features for audio with [data2vec](https://arxiv.org/abs/2202.03555). The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture.
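
To make the self-distillation idea more concrete, here is a toy sketch in plain Python. It is not the operator's code or the paper's implementation: `ema_update` and `masked_regression_loss` are names made up for illustration, representations are plain lists of floats, and a squared-error loss stands in for the real training objective. It assumes the teacher is an exponential moving average (EMA) of the student, as described in the data2vec paper.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float], tau: float) -> List[float]:
    """Move teacher weights toward the student: tau * teacher + (1 - tau) * student."""
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]


def masked_regression_loss(
    student_pred: List[List[float]],
    teacher_target: List[List[float]],
    mask: List[bool],
) -> float:
    """Squared error summed over dimensions, averaged over masked time steps only."""
    masked_steps = [i for i, is_masked in enumerate(mask) if is_masked]
    if not masked_steps:
        return 0.0
    total = 0.0
    for i in masked_steps:
        total += sum((p - t) ** 2 for p, t in zip(student_pred[i], teacher_target[i]))
    return total / len(masked_steps)


# Toy data (hypothetical): 3 time steps, 2-dimensional latent representations.
teacher_target = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # teacher encodes the full input
student_pred = [[1.0, 0.0], [0.0, 0.0], [0.5, 1.0]]    # student encodes the masked input
mask = [False, True, True]                              # steps hidden from the student

loss = masked_regression_loss(student_pred, teacher_target, mask)
new_teacher = ema_update([1.0, 2.0], [0.0, 0.0], tau=0.5)
```

Only the masked steps contribute to the loss, so the student is pushed to infer the teacher's view of the parts it could not see.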
							
<br />

## Code Example

Generate embeddings for the audio "test.wav".

*Write the pipeline in simplified style:*
							
```python
import towhee

(
    towhee.glob('test.wav')
          .audio_decode.ffmpeg()
          .runas_op(func=lambda x: [y[0] for y in x])
          .audio_embedding.data2vec()
          .show()
)
```
							
*Write the same pipeline with explicit input/output name specifications:*
							
```python
import towhee

(
    towhee.glob['path']('test.wav')
          .audio_decode.ffmpeg['path', 'frames']()
          .runas_op['frames', 'frames'](func=lambda x: [y[0] for y in x])
          .audio_embedding.data2vec['frames', 'vecs'](model_name="facebook/data2vec-audio-base-960h")
          .select['path', 'vecs']()
          .show()
)
```
							
<br />

## Factory Constructor

Create the operator via the following factory method:
							
***data2vec(model_name='facebook/data2vec-audio-base-960h')***
							
**Parameters:**

***model_name***: *str*

The model name, given as a string.
The default value is "facebook/data2vec-audio-base-960h".
							
Supported model names:

- facebook/data2vec-audio-base-960h
- facebook/data2vec-audio-large-960h
- facebook/data2vec-audio-base
- facebook/data2vec-audio-base-100h
- facebook/data2vec-audio-base-10m
- facebook/data2vec-audio-large
- facebook/data2vec-audio-large-100h
- facebook/data2vec-audio-large-10m
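
When the model name comes from user input or a configuration file, it can help to check it against the list above before constructing the operator. The helper below is a hypothetical sketch and not part of towhee: `SUPPORTED_MODELS`, `DEFAULT_MODEL`, and `check_model_name` are illustrative names, and the set simply mirrors the names listed in this README.

```python
# Names copied from the "Supported model names" list in this README.
SUPPORTED_MODELS = frozenset({
    "facebook/data2vec-audio-base-960h",
    "facebook/data2vec-audio-large-960h",
    "facebook/data2vec-audio-base",
    "facebook/data2vec-audio-base-100h",
    "facebook/data2vec-audio-base-10m",
    "facebook/data2vec-audio-large",
    "facebook/data2vec-audio-large-100h",
    "facebook/data2vec-audio-large-10m",
})

# Default value as documented in the parameter description.
DEFAULT_MODEL = "facebook/data2vec-audio-base-960h"


def check_model_name(model_name: str = DEFAULT_MODEL) -> str:
    """Return model_name unchanged if it is listed, otherwise raise ValueError."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unsupported model name: {model_name!r}. "
            f"Expected one of: {sorted(SUPPORTED_MODELS)}"
        )
    return model_name
```

The validated name could then be passed as `model_name` to the factory method.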
							
<br />

## Interface

The audio embedding operator generates vectors as numpy.ndarray, given an audio file path or towhee audio frames.
							
**Parameters:**

***data***: *List[towhee.types.audio_frame.AudioFrame]*

The input audio data is a list of towhee audio frames. Together, the frames should represent audio longer than 0.9 s.
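
The 0.9 s minimum can be checked before calling the operator. The sketch below assumes you already know the per-channel sample count of each frame and the sample rate. `total_duration_seconds` and `is_long_enough` are hypothetical helpers that work on plain integers rather than towhee `AudioFrame` objects, so how those numbers are read from your frames is left open.

```python
from typing import Sequence

MIN_DURATION_SECONDS = 0.9  # minimum length stated in this README


def total_duration_seconds(samples_per_frame: Sequence[int], sample_rate: int) -> float:
    """Total duration in seconds of frames with the given per-channel sample counts."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return sum(samples_per_frame) / sample_rate


def is_long_enough(samples_per_frame: Sequence[int], sample_rate: int) -> bool:
    """True if the frames together last strictly longer than the 0.9 s minimum."""
    return total_duration_seconds(samples_per_frame, sample_rate) > MIN_DURATION_SECONDS


# Example values (assumed): 1024 samples per frame at a 16 kHz sample rate.
long_clip = [1024] * 20   # 20480 samples / 16000 Hz = 1.28 s
short_clip = [1024] * 10  # 10240 samples / 16000 Hz = 0.64 s

long_ok = is_long_enough(long_clip, 16000)
short_ok = is_long_enough(short_clip, 16000)
```

A clip that fails this check can be skipped or padded upstream, depending on what the application needs.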
							
**Returns:** *numpy.ndarray*

The audio embedding extracted by the model.