shiyu22
2 years ago
1 changed files with 132 additions and 1 deletions
# Enhanced QA Insert

## Description

**Enhanced question answering** builds a knowledge base and uses it to ground the answers generated by a large language model (LLM), which helps prevent hallucinations. It involves two steps: inserting data into the knowledge base and querying it with questions. The **eqa-insert** pipeline covers the first step: it inserts document data into the knowledge base.

<br />

## Code Example

- Create Milvus Collection

Before running the pipeline, create the Milvus collection first.

> `dim` is the dimensionality of the feature vectors generated by the `model` configured in the `eqa-insert` pipeline.

```python
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility

collection_name = 'chatbot'
dim = 384  # output dimension of the default model, 'all-MiniLM-L6-v2'

# Use the same host and port as in the pipeline configuration.
connections.connect(host='127.0.0.1', port='19530')

fields = [
    FieldSchema(name='id', dtype=DataType.INT64, description='ids', is_primary=True, auto_id=True),
    FieldSchema(name='text_id', dtype=DataType.VARCHAR, description='text', max_length=500),
    FieldSchema(name='text', dtype=DataType.VARCHAR, description='text', max_length=1000),
    FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, description='embedding vectors', dim=dim)
]
schema = CollectionSchema(fields=fields, description='enhanced qa')
collection = Collection(name=collection_name, schema=schema)

# Build an IVF_FLAT index on the embedding field, with inner product (IP) as the metric.
index_params = {
    'metric_type': "IP",
    'index_type': "IVF_FLAT",
    'params': {"nlist": 2048}
}
collection.create_index(field_name="embedding", index_params=index_params)
```

- Create the `eqa-insert` pipeline and set the configuration.

> For more parameters, refer to the Configuration section.

```python
from towhee import AutoPipes, AutoConfig

config = AutoConfig.load_config('eqa-insert')
config.model = 'all-MiniLM-L6-v2'
config.source_type = 'url'  # the input below is a URL; the documented default is 'file'
config.host = '127.0.0.1'
config.port = '19530'
config.collection_name = collection_name  # the collection created in the previous step

p = AutoPipes.pipeline('eqa-insert', config=config)
res = p('https://raw.githubusercontent.com/towhee-io/towhee/main/README.md')
```

You can then run `collection.num_entities` to check how many entities have been inserted into Milvus as the knowledge base.

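Newly inserted rows may not show up in `num_entities` until they have been flushed, so it can help to flush before counting. The following is a minimal sketch, assuming `collection` is the pymilvus `Collection` created in the first step; `count_entities` is a hypothetical helper name, not part of Towhee or pymilvus.

```python
def count_entities(collection):
    """Flush pending inserts, then return the entity count.

    `collection` is expected to behave like a pymilvus `Collection`,
    i.e. to provide `flush()` and the `num_entities` property.
    """
    collection.flush()  # make pending inserts visible to the count
    return collection.num_entities
```

With a live connection this would be called as `count_entities(collection)`.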
<br />

## Configuration

**EnhancedQAInsertConfig**

- Configuration for [Text Loader](https://towhee.io/towhee/text-loader)

***chunk_size: int***
The size of each chunk, defaults to 300.

***source_type: str***
The type of the source, defaults to `'file'`. Set it to `'url'` to load your documentation from a URL.

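To illustrate what `chunk_size` controls, the sketch below splits a text into pieces of at most `chunk_size` characters. It is only an illustration under that assumption: the actual Text Loader operator may measure size differently or split on other boundaries, and `split_into_chunks` is a hypothetical helper, not a Towhee API.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 300) -> List[str]:
    """Split `text` into consecutive pieces of at most `chunk_size` characters."""
    if chunk_size <= 0:
        raise ValueError('chunk_size must be positive')
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


chunks = split_into_chunks('a' * 700)
print([len(c) for c in chunks])  # [300, 300, 100]
```

Each chunk becomes one row in the collection, so a smaller `chunk_size` yields more, shorter entries for the same document.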
- Configuration for Sentence Embedding

***model: str***
The model name in the sentence embedding pipeline, defaults to `'all-MiniLM-L6-v2'`.
Refer to the [model list](https://towhee.io/tasks/detail/operator?field_name=Natural-Language-Processing&task_name=Sentence-Embedding) to choose a model. Some of these models are from [HuggingFace](https://huggingface.co/) (open source), and some are from [OpenAI](https://openai.com/) (not open source; an API key is required).

***openai_api_key: str***
The OpenAI API key, defaults to `None`.
This key is required if the model is from OpenAI. You can check the model provider in the [model list](https://towhee.io/sentence-embedding/openai).

***customize_embedding_op: str***
The name of the custom embedding operator, defaults to `None`.

***normalize_vec: bool***
Whether to normalize the embedding vectors, defaults to `True`.

***device: int***
The ID of the device to use, defaults to `-1`, which means using the CPU.
If set to a value other than `-1`, the GPU with that ID is used.

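Normalization matters because of the index metric: for unit-length vectors, the inner product (the `IP` metric used in the Code Example) equals cosine similarity. The sketch below checks this in plain Python; `l2_normalize`, `inner_product`, and `cosine_similarity` are illustrative helpers, not part of Towhee.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale `vec` to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def inner_product(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: List[float], b: List[float]) -> float:
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


a, b = [3.0, 4.0], [4.0, 3.0]
ip = inner_product(l2_normalize(a), l2_normalize(b))
print(math.isclose(ip, cosine_similarity(a, b)))  # True
```

If `normalize_vec` is set to `False`, inner-product scores also depend on vector length, so they are no longer directly comparable to cosine similarity.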
- Configuration for [Milvus](https://towhee.io/ann-insert/milvus-client)

***host: str***
Host of the Milvus vector database, defaults to `'127.0.0.1'`.

***port: str***
Port of the Milvus vector database, defaults to `'19530'`.

***collection_name: str***
The collection name in the Milvus vector database. It is required when inserting data into Milvus.

***user: str***
The user name for a [Cloud user](https://zilliz.com/cloud), defaults to `None`.

***password: str***
The password for a [Cloud user](https://zilliz.com/cloud), defaults to `None`.

<br />

## Interface

Insert documentation into Milvus as a knowledge base.

**Parameters:**

***doc***: str

Path or URL of the document to be loaded.

**Returns:** MutationResult

The MutationResult returned after inserting the data into Milvus.

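As a rough sketch of how the result might be inspected, the helper below reads two attributes that a pymilvus `MutationResult` exposes, `insert_count` and `primary_keys`. It assumes you already hold such an object (called `mr` here); how it is extracted from the pipeline output is not shown, and `summarize_mutation` is a hypothetical name.

```python
def summarize_mutation(mr) -> dict:
    """Return the number of inserted rows and their primary keys.

    `mr` is assumed to behave like a pymilvus `MutationResult`.
    """
    return {
        'insert_count': mr.insert_count,
        'primary_keys': list(mr.primary_keys),
    }
```

Because the `id` field is created with `auto_id=True`, the primary keys reported here are the ones Milvus generated for the inserted chunks.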