update

Signed-off-by: Jael Gu <mengjia.gu@zilliz.com>
main
Jael Gu 3 years ago
parent
commit
0627c26aaf
  1. 43
      README.md

@@ -1,10 +1,18 @@
# Operator: nlp-longformer
# NLP embedding: Longformer Operator
Author: Kyle He, Jael Gu
Authors: Kyle He, Jael Gu
## Overview
This operator uses Longformer to convert long text to embeddings.
The Longformer model was presented in Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan[1].
**Longformer** models were proposed in “[Longformer: The Long-Document Transformer][2]”.
Transformer-based models are unable to process long sequences due to their self-attention
operation, which scales quadratically with the sequence length. To address this limitation,
we introduce the Longformer with an attention mechanism that scales linearly with sequence
length, making it easy to process documents of thousands of tokens or longer[2].
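To make the quadratic-versus-linear claim concrete, the sketch below counts how many query-key pairs are scored by full self-attention and by a plain sliding-window pattern. It is a simplified illustration, not the Longformer implementation: the function names are hypothetical, the window size of 512 is an assumed value chosen for the example, and the global-attention and dilated-window parts of the model are ignored.

```python
def full_attention_pairs(n: int) -> int:
    """Query-key pairs scored by full self-attention over n tokens."""
    return n * n


def sliding_window_pairs(n: int, window: int) -> int:
    """Pairs scored when each token attends to window // 2 neighbours per side.

    Positions that would fall outside the sequence are clipped, so tokens
    near either end attend to fewer neighbours.
    """
    half = window // 2
    total = 0
    for i in range(n):
        first = max(0, i - half)      # leftmost position token i can see
        last = min(n - 1, i + half)   # rightmost position token i can see
        total += last - first + 1
    return total


if __name__ == "__main__":
    window = 512  # assumed window size, for illustration only
    for n in (1024, 4096, 16384):
        full = full_attention_pairs(n)
        local = sliding_window_pairs(n, window)
        print(f"n={n:>6}  full={full:>12}  window={local:>10}  ratio={full / local:.1f}")
```

With a fixed window, quadrupling the sequence length roughly quadruples the sliding-window count, while the full-attention count grows sixteenfold.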
## Interface
@@ -12,40 +20,45 @@ Author: Kyle He, Jael Gu
__init__(self, model_name: str, framework: str = 'pytorch')
``` ```
Args:
**Args:**
- model_name:
- the model name for embedding
- supported types: str, for example 'xxx' or 'xxx'
- supported types: `str`, for example 'allenai/longformer-base-4096' or 'allenai/longformer-large-4096'
- framework:
- the framework of the model
- supported types: str, default is 'pytorch'
- supported types: `str`, default is 'pytorch'
```python
__call__(self, call_arg_1: xxx)
__call__(self, txt: str)
```
Args:
**Args:**
- txt:
- input text in words, sentences, or paragraphs
txt:
- the input text content
- supported types: str
Returns:
The Operator returns a tuple Tuple[('feature_vector', numpy.ndarray)] containing the following fields:
**Returns:**
The Operator returns a tuple `Tuple[('feature_vector', numpy.ndarray)]` containing the following fields:
- feature_vector:
- the embedding of the text
- data type: numpy.ndarray
- shape: (x, dim) where x is number of vectors and dim is dimension of vector depending on model_name
- data type: `numpy.ndarray`
- shape: (dim,)
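The call contract described above can be sketched with a stand-in class. This is not the `towhee/nlp-longformer` implementation and loads no model: `StandInLongformerOperator`, `Output`, and the `dim` parameter are hypothetical names introduced for illustration, the default of 768 is an assumed embedding size, and the vector is a deterministic pseudo-embedding derived from a hash of the text. Only the signatures and the `(dim,)` output shape follow the interface documented here.

```python
import hashlib
from typing import NamedTuple

import numpy as np


class Output(NamedTuple):
    """Hypothetical return type with the single documented field."""
    feature_vector: np.ndarray


class StandInLongformerOperator:
    """Stand-in that mimics the documented interface only (no real model)."""

    def __init__(self, model_name: str, framework: str = 'pytorch', dim: int = 768):
        # `dim` is an assumption for this sketch; the real operator derives
        # the embedding size from the model selected by `model_name`.
        self.model_name = model_name
        self.framework = framework
        self.dim = dim

    def __call__(self, txt: str) -> Output:
        if not isinstance(txt, str):
            raise TypeError("txt must be a str")
        # Seed a generator from the text so the same input always yields
        # the same pseudo-embedding.
        digest = hashlib.sha256(txt.encode('utf-8')).digest()
        rng = np.random.default_rng(int.from_bytes(digest[:8], 'big'))
        vector = rng.standard_normal(self.dim).astype(np.float32)
        return Output(feature_vector=vector)


if __name__ == "__main__":
    op = StandInLongformerOperator('allenai/longformer-base-4096')
    result = op('A long document would go here.')
    print(type(result.feature_vector).__name__, result.feature_vector.shape)
```

A caller written against this contract reads `result.feature_vector` and can rely on a one-dimensional array per input text.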
## Requirements
The required Python packages are listed in [requirements.txt](./requirements.txt).
## How it works
The `towhee/nlp-longformer` Operator converts text to an embedding and can be added to a pipeline.
## Reference
[1]. https://huggingface.co/docs/transformers/v4.16.2/en/model_doc/longformer#transformers.LongformerConfig
[2]. https://arxiv.org/pdf/2004.05150.pdf
