diff --git a/README.md b/README.md
index 60fced5..bbc5efe 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,133 @@
-# eqa-insert
+# Enhanced QA Insert

+## Description
+
+**Enhanced question-answering** is the process of building a knowledge base and generating answers from it with LLMs (large language models), which helps reduce hallucinations. It involves two steps, inserting data into the knowledge base and querying it with questions; **eqa-insert** handles the first step, inserting document data into the knowledge base.
+
+
+
+
+
+## Code Example
+
+- Create Milvus Collection
+
+Before running the pipeline, create the Milvus collection.
+
+> The `dim` is the dimensionality of the feature vector generated by the configured `model` in the `eqa-insert` pipeline.
+
+```python
+from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
+
+collection_name = 'chatbot'
+dim = 384  # output dimension of 'all-MiniLM-L6-v2'
+
+connections.connect(host='127.0.0.1', port='19530')  # same host and port as the pipeline config below
+
+fields = [
+    FieldSchema(name='id', dtype=DataType.INT64, description='ids', is_primary=True, auto_id=True),
+    FieldSchema(name='text_id', dtype=DataType.VARCHAR, description='text id', max_length=500),
+    FieldSchema(name='text', dtype=DataType.VARCHAR, description='text', max_length=1000),
+    FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, description='embedding vectors', dim=dim)
+]
+schema = CollectionSchema(fields=fields, description='enhanced qa')
+collection = Collection(name=collection_name, schema=schema)
+
+index_params = {
+    'metric_type': "IP",
+    'index_type': "IVF_FLAT",
+    'params': {"nlist": 2048}
+}
+collection.create_index(field_name="embedding", index_params=index_params)
+```
+
+- Create the `eqa-insert` pipeline and set the configuration.
+
+> For more parameters, refer to the Configuration section.
+
+```python
+from towhee import AutoPipes, AutoConfig
+
+config = AutoConfig.load_config('eqa-insert')
+config.model = 'all-MiniLM-L6-v2'
+config.host = '127.0.0.1'
+config.port = '19530'
+config.collection_name = collection_name
+
+p = AutoPipes.pipeline('eqa-insert', config=config)
+res = p('https://raw.githubusercontent.com/towhee-io/towhee/main/README.md')
+```
+
+You can then run `collection.num_entities` to check how many entities have been inserted into the Milvus knowledge base.
+
+
+
+
+
+
+## Configuration
+
+**EnhancedQAInsertConfig**
+
+- Configuration for [Text Loader](https://towhee.io/towhee/text-loader)
+
+***chunk_size: int***
+The size of each chunk, defaults to 300.
+
+***source_type: str***
+The type of the source, defaults to `'file'`; set it to `'url'` to load your documentation from a URL.
+
+- Configuration for Sentence Embedding
+
+***model: str***
+The model name in the sentence embedding pipeline, defaults to `'all-MiniLM-L6-v2'`.
+Refer to the [Model(s) list](https://towhee.io/tasks/detail/operator?field_name=Natural-Language-Processing&task_name=Sentence-Embedding) to choose a model; some of these models are from [HuggingFace](https://huggingface.co/) (open source), and some are from [OpenAI](https://openai.com/) (not open source, API key required).
+
+***openai_api_key: str***
+The API key for OpenAI, defaults to `None`.
+This key is required if the model is from OpenAI; you can check the model provider in the [Model(s) list](https://towhee.io/sentence-embedding/openai).
+
+***customize_embedding_op: str***
+The name of the customized embedding operator, defaults to `None`.
+
+***normalize_vec: bool***
+Whether to normalize the embedding vectors, defaults to `True`.
+
+***device: int***
+The device ID, defaults to `-1`, which means using the CPU.
+If the setting is not `-1`, the specified GPU device will be used.
+
+- Configuration for [Milvus](https://towhee.io/ann-insert/milvus-client)
+
+***host: str***
+Host of the Milvus vector database, defaults to `'127.0.0.1'`.
+
+***port: str***
+Port of the Milvus vector database, defaults to `'19530'`.
+
+***collection_name: str***
+The collection name in the Milvus vector database; required when inserting data into Milvus.
+
+***user: str***
+The user name for [Cloud user](https://zilliz.com/cloud), defaults to `None`.
+
+***password: str***
+The user password for [Cloud user](https://zilliz.com/cloud), defaults to `None`.
+
+
+
+
+
+## Interface
+
+Insert documentation into Milvus as a knowledge base.
+
+**Parameters:**
+
+  ***doc***: str
+
+Path or URL of the document to be loaded.
+
+**Returns:** MutationResult
+
+The MutationResult returned after inserting the data into Milvus.
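
A note on how two settings in this README fit together: `normalize_vec` defaults to `True`, and the index in the Code Example uses `metric_type` `"IP"`. For unit-length vectors, the inner product equals cosine similarity, so the two choices plausibly go hand in hand. The following is a minimal pure-Python sketch of that relationship, not the pipeline's actual implementation; the helper names `l2_normalize`, `inner_product`, and `cosine_similarity`, as well as the sample vectors, are hypothetical and exist only for illustration.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm (hypothetical helper, illustration only)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)  # leave the zero vector unchanged
    return [x / norm for x in vec]


def inner_product(a: List[float], b: List[float]) -> float:
    """Plain dot product, the quantity an IP metric ranks by."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity computed directly from the unnormalized vectors."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


# Made-up sample vectors standing in for a stored embedding and a query embedding.
doc_vec = [3.0, 4.0, 0.0]
query_vec = [1.0, 2.0, 2.0]

# Inner product of the normalized vectors ...
ip_score = inner_product(l2_normalize(doc_vec), l2_normalize(query_vec))
# ... matches the cosine similarity of the original vectors.
cos_score = cosine_similarity(doc_vec, query_vec)

print(ip_score, cos_score)
```

If `normalize_vec` is switched off, inner-product scores also depend on vector length, so rankings may differ from cosine-based ones.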