From 0627c26aaf2058d9403f18e14ad750a8e9b5c630 Mon Sep 17 00:00:00 2001
From: Jael Gu <mengjia.gu@zilliz.com>
Date: Wed, 2 Mar 2022 21:48:44 +0800
Subject: [PATCH] update

Signed-off-by: Jael Gu <mengjia.gu@zilliz.com>
---
 README.md | 43 ++++++++++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/README.md b/README.md
index ec3a213..b574033 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,18 @@
-# Operator: nlp-longformer
+# NLP embedding: Longformer Operator
 
-Author: Kyle He, Jael Gu
+Authors: Kyle He, Jael Gu
 
 ## Overview
+This operator uses Longformer to convert long text to embeddings.
 
+The Longformer model was proposed in "Longformer: The Long-Document Transformer" by Iz Beltagy, Matthew E. Peters, and Arman Cohan[2].
 
+The Hugging Face Transformers implementation of **Longformer** is documented in [1].
+
+Transformer-based models are unable to process long sequences because their self-attention
+operation scales quadratically with the sequence length. To address this limitation,
+the authors introduce Longformer, whose attention mechanism scales linearly with sequence
+length, making it easy to process documents of thousands of tokens or longer[2].
 
 ## Interface
 
@@ -12,40 +20,45 @@ Author: Kyle He, Jael Gu
 __init__(self, model_name: str, framework: str = 'pytorch')
 ```
 
-Args:
+**Args:**
 
 - model_name:
   - the model name for embedding
-  - supported types: str, for example 'xxx' or 'xxx'
+  - supported types: `str`, for example 'allenai/longformer-base-4096' or 'allenai/longformer-large-4096'
 - framework:
   - the framework of the model
-  - supported types: str, default is 'pytorch'
+  - supported types: `str`, default is 'pytorch'
 
 ```python
-__call__(self, call_arg_1: xxx)
+__call__(self, txt: str)
 ```
 
-Args:
+**Args:**
 
-- txt:
-  - input text in words, sentences, or paragraphs
+- txt:
+  - the input text content
   - supported types: str
 
-Returns:
 
-The Operator returns a tuple Tuple[('feature_vector', numpy.ndarray)] containing following fields:
+**Returns:**
+
+The Operator returns a tuple `Tuple[('feature_vector', numpy.ndarray)]` containing the following fields:
 
 - feature_vector:
   - the embedding of the text
-  - data type: numpy.ndarray
-  - shape: (x, dim) where x is number of vectors and dim is dimension of vector depending on model_name
+  - data type: `numpy.ndarray`
+  - shape: (dim,)
 
 ## Requirements
 
-
+Install the required Python packages listed in [requirements.txt](./requirements.txt).
 
 ## How it works
 
-
+The `towhee/nlp-longformer` Operator converts text to an embedding and can be added to a pipeline.
 
 ## Reference
+
+[1]. https://huggingface.co/docs/transformers/v4.16.2/en/model_doc/longformer#transformers.LongformerConfig
+
+[2]. https://arxiv.org/pdf/2004.05150.pdf