Add README

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
2 years ago · 473d585a48
1 changed files with 132 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -1,2 +1,133 @@
 # eqa-insert
 # Enhanced QA Insert
 ## Description
 **Enhanced question-answering** is the process of building a knowledge base and generating answers with LLMs (large language models), which helps reduce hallucinations. It involves inserting data into the knowledge base and querying it with questions; **eqa-insert** inserts document data into the knowledge base.
 <br />
 ## Code Example
 - Create Milvus Collection
 Before running the pipeline, create a Milvus collection first.
 > The `dim` is the dimensionality of the feature vector generated by the configured `model` in the `eqa-insert` pipeline.
 ```python
 from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility
 collection_name = 'chatbot'
 dim = 384
 connections.connect(host='127.0.0.1', port='19530')
 fields = [
   FieldSchema(name='id', dtype=DataType.INT64, description='ids', is_primary=True, auto_id=True),
   FieldSchema(name='text_id', dtype=DataType.VARCHAR, description='text', max_length=500),
   FieldSchema(name='text', dtype=DataType.VARCHAR, description='text', max_length=1000),
   FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, description='embedding vectors', dim=dim)
 ]
 schema = CollectionSchema(fields=fields, description='enhanced qa')
 collection = Collection(name=collection_name, schema=schema)
 index_params = {
    'metric_type':"IP",
    'index_type':"IVF_FLAT",
    'params':{"nlist":2048}
 }
 collection.create_index(field_name="embedding", index_params=index_params)
 ```
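The index above uses the inner-product (`IP`) metric. Assuming the embeddings are L2-normalized before insertion (the `normalize_vec` option defaults to `True`), the inner product of two vectors equals their cosine similarity. Below is a minimal pure-Python sketch of that relationship; the helper names are illustrative and are not part of pymilvus or Towhee.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)  # leave a zero vector unchanged
    return [x / norm for x in vec]


def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


a = [3.0, 4.0]
b = [4.0, 3.0]

# Inner product of the normalized vectors matches cosine similarity.
ip_of_normalized = inner_product(l2_normalize(a), l2_normalize(b))
print(round(ip_of_normalized, 6), round(cosine_similarity(a, b), 6))
```

If normalization is turned off, `IP` scores also depend on vector length, so they no longer read as cosine similarity.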
 - Create the eqa-insert pipeline and set the configuration.
 > For more parameters, refer to the Configuration section.
 ```python
 from towhee import AutoPipes, AutoConfig
 config = AutoConfig.load_config('eqa-insert')
 config.model = 'all-MiniLM-L6-v2'
 config.host = '127.0.0.1'
 config.port = '19530'
 config.collection_name = collection_name
 config.source_type = 'url'  # the document below is loaded from a URL
 p = AutoPipes.pipeline('eqa-insert', config=config)
 res = p('https://raw.githubusercontent.com/towhee-io/towhee/main/README.md')
 ```
 Then you can run `collection.num_entities` to check the number of entities stored in Milvus as the knowledge base.
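A small sketch of that check follows. It assumes `collection` is the pymilvus `Collection` created earlier; calling `flush()` first seals pending inserts so that `num_entities` reflects them. The helper name `count_entities` is illustrative only.

```python
def count_entities(collection):
    """Flush pending inserts, then return the entity count.

    `collection` is expected to behave like a pymilvus Collection,
    i.e. expose a `flush()` method and a `num_entities` attribute.
    """
    collection.flush()
    return collection.num_entities
```

With the collection from the first step, this would be called as `print(count_entities(collection))`.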
 <br />
 ## Configuration 
 **EnhancedQAInsertConfig**
 - Configuration for [Text Loader](https://towhee.io/towhee/text-loader)
 ***chunk_size: int***
 The size of each chunk, defaults to 300.
 ***source_type: str***
 The type of the source, defaults to `'file'`. You can also set it to `'url'` to load your documentation from a URL.
 - Configuration for Sentence Embedding
 ***model: str***
 The model name in the sentence embedding pipeline, defaults to `'all-MiniLM-L6-v2'`.
 You can refer to the [model list](https://towhee.io/tasks/detail/operator?field_name=Natural-Language-Processing&task_name=Sentence-Embedding) to choose a model. Some of these models are from [HuggingFace](https://huggingface.co/) (open source), and some are from [OpenAI](https://openai.com/) (not open source, API key required).
 ***openai_api_key: str***
 The OpenAI API key, defaults to `None`.
 This key is required if the model is from OpenAI. You can check the model provider in the [model list](https://towhee.io/sentence-embedding/openai).
 ***customize_embedding_op: str***
 The name of a custom embedding operator, defaults to `None`.
 ***normalize_vec: bool***
 Whether to normalize the embedding vectors, defaults to `True`.
 ***device: int***
 The device ID, defaults to `-1`, which means using the CPU.
 If set to a value other than `-1`, the GPU device with that ID is used.
 - Configuration for [Milvus](https://towhee.io/ann-insert/milvus-client)
 ***host: str***
 Host of the Milvus vector database, defaults to `'127.0.0.1'`.
 ***port: str***
 Port of the Milvus vector database, defaults to `'19530'`.
 ***collection_name: str***
 The collection name in the Milvus vector database. It is required when inserting data into Milvus.
 ***user: str***
 The user name for [Cloud user](https://zilliz.com/cloud), defaults to `None`.
 ***password: str***
 The user password for [Cloud user](https://zilliz.com/cloud), defaults to `None`.
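To illustrate what `chunk_size` controls, here is a minimal sketch that splits text into pieces of at most `chunk_size` characters. This is an assumption for illustration only: the actual Text Loader operator may split on different boundaries (for example, separators or tokens), and `split_into_chunks` is a hypothetical helper, not a Towhee API.

```python
def split_into_chunks(text, chunk_size=300):
    """Split `text` into consecutive pieces of at most `chunk_size` characters."""
    if chunk_size <= 0:
        raise ValueError('chunk_size must be positive')
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# A 650-character document yields two full chunks and one remainder.
chunks = split_into_chunks('a' * 650, chunk_size=300)
print([len(c) for c in chunks])  # [300, 300, 50]
```

Each chunk would then be embedded and inserted as its own entity, so a smaller `chunk_size` produces more entities per document.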
 <br />
 ## Interface
 Insert documentation into Milvus as a knowledge base.
 **Parameters:**
 ***doc***: str
 Path or URL of the document to be loaded.
 **Returns:** MutationResult
 The MutationResult returned after inserting the data into Milvus.
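As a sketch of how the result might be inspected, the helper below assumes the returned object exposes the `insert_count` and `primary_keys` attributes of pymilvus's `MutationResult`. The name `summarize_insert` is hypothetical, and depending on the Towhee version the pipeline output may need to be unwrapped before it can be passed in.

```python
def summarize_insert(result, preview=3):
    """Return a small summary of an insert result.

    `result` is assumed to look like a pymilvus MutationResult:
    it has `insert_count` and `primary_keys`.
    """
    return {
        'inserted': result.insert_count,
        'first_ids': list(result.primary_keys)[:preview],
    }
```

Comparing `inserted` against the expected number of chunks is a quick way to confirm that the whole document was written.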