shiyu22
2 years ago
# eqa-insert

# Enhanced QA Insert

## Description

**Enhanced question-answering** is the process of building a knowledge base and using it to generate answers with large language models (LLMs), which helps prevent hallucinations. It involves two steps: inserting data into the knowledge base and querying it with questions. **eqa-insert** handles the first step, inserting document data into the knowledge base.

<br />

## Code Example

- Create Milvus Collection

Before running the pipeline, create a Milvus collection.

> `dim` is the dimensionality of the feature vectors generated by the `model` configured in the `eqa-insert` pipeline (384 for `all-MiniLM-L6-v2`).

```python
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility

collection_name = 'chatbot'
dim = 384  # must match the output dimension of the embedding model

# Connect to the Milvus server; replace host and port with your own.
connections.connect(host='127.0.0.1', port='19530')

fields = [
    FieldSchema(name='id', dtype=DataType.INT64, description='ids', is_primary=True, auto_id=True),
    FieldSchema(name='text_id', dtype=DataType.VARCHAR, description='text', max_length=500),
    FieldSchema(name='text', dtype=DataType.VARCHAR, description='text', max_length=1000),
    FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, description='embedding vectors', dim=dim)
]
schema = CollectionSchema(fields=fields, description='enhanced qa')
collection = Collection(name=collection_name, schema=schema)

# Build an IVF_FLAT index on the embedding field, with inner product (IP) as the metric.
index_params = {
    'metric_type': "IP",
    'index_type': "IVF_FLAT",
    'params': {"nlist": 2048}
}
collection.create_index(field_name="embedding", index_params=index_params)
```

- Create the `eqa-insert` pipeline and set the configuration.

> For more parameters, refer to the Configuration section.

```python
from towhee import AutoPipes, AutoConfig

# Load the default configuration, then override the values you need.
config = AutoConfig.load_config('eqa-insert')
config.model = 'all-MiniLM-L6-v2'
config.host = '127.0.0.1'
config.port = '19530'
config.collection_name = collection_name  # the collection created above

p = AutoPipes.pipeline('eqa-insert', config=config)
# Insert the document at this URL into the knowledge base.
res = p('https://raw.githubusercontent.com/towhee-io/towhee/main/README.md')
```

You can then check `collection.num_entities` to see how many entries have been inserted into Milvus as the knowledge base. If the count looks stale, calling `collection.flush()` first may be needed for it to reflect the latest inserts.

<br />

## Configuration

**EnhancedQAInsertConfig**

- Configuration for [Text Loader](https://towhee.io/towhee/text-loader)

***chunk_size: int***
The size of each chunk, defaults to 300.

***source_type: str***
The type of the source, defaults to `'file'`. Set it to `'url'` if your document is given as a URL.

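To make `chunk_size` concrete, the sketch below splits a document into pieces of at most `chunk_size` characters. It is only an illustration: `split_into_chunks` is a hypothetical helper, and the actual Text Loader operator may split on other boundaries (tokens or separators, for example) rather than raw characters.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 300) -> List[str]:
    """Split text into consecutive pieces of at most chunk_size characters.

    Hypothetical helper for illustration; not the Text Loader implementation.
    """
    if chunk_size <= 0:
        raise ValueError('chunk_size must be positive')
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


document = 'Towhee is a framework for building data pipelines. ' * 20
chunks = split_into_chunks(document, chunk_size=300)

# Assumption: each chunk would be embedded and stored as one entry in Milvus.
print(len(chunks), [len(c) for c in chunks])
```

A smaller `chunk_size` yields more, shorter entries in the knowledge base; a larger one yields fewer entries that each carry more context.
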
- Configuration for Sentence Embedding

***model: str***
The model name in the sentence embedding pipeline, defaults to `'all-MiniLM-L6-v2'`.
Refer to the [model list](https://towhee.io/tasks/detail/operator?field_name=Natural-Language-Processing&task_name=Sentence-Embedding) to choose a model. Some of these models are from [HuggingFace](https://huggingface.co/) (open source), and some are from [OpenAI](https://openai.com/) (not open source, API key required).

***openai_api_key: str***
The OpenAI API key, defaults to `None`.
This key is required if the model is from OpenAI. You can check the model provider in the [model list](https://towhee.io/sentence-embedding/openai).

***customize_embedding_op: str***
The name of a custom embedding operator, defaults to `None`.

***normalize_vec: bool***
Whether to normalize the embedding vectors, defaults to `True`.

***device:*** ***int***
The device ID, defaults to `-1`, which means the CPU is used.
Any other value selects the GPU device with that ID.

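The `normalize_vec` option pairs naturally with the `IP` (inner product) metric used when creating the index: for unit-length vectors, the inner product equals cosine similarity. The sketch below demonstrates this with plain Python. The helper names and sample vectors are made up for illustration and are not part of Towhee.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)  # leave an all-zero vector unchanged
    return [x / norm for x in vec]


def inner_product(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: List[float], b: List[float]) -> float:
    return inner_product(a, b) / math.sqrt(inner_product(a, a) * inner_product(b, b))


a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

# Inner product of the normalized vectors vs. cosine similarity of the raw ones.
ip_normalized = inner_product(l2_normalize(a), l2_normalize(b))
print(ip_normalized, cosine_similarity(a, b))
```

If `normalize_vec` is set to `False`, inner-product scores also depend on vector length, so a different metric may be a better fit.
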
- Configuration for [Milvus](https://towhee.io/ann-insert/milvus-client)

***host: str***
Host of the Milvus vector database, defaults to `'127.0.0.1'`.

***port: str***
Port of the Milvus vector database, defaults to `'19530'`.

***collection_name: str***
The name of the Milvus collection. It is required when inserting data into Milvus.

***user: str***
The user name for a [Cloud user](https://zilliz.com/cloud), defaults to `None`.

***password: str***
The password for a [Cloud user](https://zilliz.com/cloud), defaults to `None`.

<br />

## Interface

Insert a document into Milvus as a knowledge base.

**Parameters:**

***doc***: str

Path or URL of the document to be loaded.

**Returns:** MutationResult

The MutationResult returned after inserting the data into Milvus.

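Since `doc` may be either a local path or a URL, a caller might want to pick the matching `source_type` before building the pipeline. The sketch below uses only the standard library. `guess_source_type` is a hypothetical helper, not part of Towhee, and treating only `http` and `https` as URLs is an assumption.

```python
from urllib.parse import urlparse


def guess_source_type(doc: str) -> str:
    """Return 'url' for http(s) links and 'file' otherwise (hypothetical helper)."""
    scheme = urlparse(doc).scheme.lower()
    return 'url' if scheme in ('http', 'https') else 'file'


print(guess_source_type('https://raw.githubusercontent.com/towhee-io/towhee/main/README.md'))
print(guess_source_type('./docs/README.md'))
```

The result could then be assigned to `config.source_type` before calling `AutoPipes.pipeline`.
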