Enhanced QA Insert

Description

Enhanced question answering is the process of building a knowledge base and generating answers with LLMs (large language models), which helps prevent hallucinations. It involves inserting data into the knowledge base and querying it with questions; eqa-insert is the pipeline used to insert document data into the knowledge base.


Code Example

  • Create Milvus Collection

Before running the pipeline, please create a Milvus collection first.

The dim is the dimensionality of the feature vector generated by the model configured in the eqa-insert pipeline; for example, 'all-MiniLM-L6-v2' produces 384-dimensional embeddings, hence dim = 384 below.

from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility

collection_name = 'chatbot'
dim = 384  # dimensionality of the embedding vectors produced by the configured model

# Connect to your Milvus instance (use the same host/port as in the eqa-insert config below).
connections.connect(host='127.0.0.1', port='19530')

fields = [
    FieldSchema(name='id', dtype=DataType.INT64, description='ids', is_primary=True, auto_id=True),
    FieldSchema(name='text_id', dtype=DataType.VARCHAR, description='text', max_length=500),
    FieldSchema(name='text', dtype=DataType.VARCHAR, description='text', max_length=1000),
    FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, description='embedding vectors', dim=dim)
]
schema = CollectionSchema(fields=fields, description='enhanced qa')
collection = Collection(name=collection_name, schema=schema)

# Build an inner-product IVF_FLAT index on the embedding field.
index_params = {
    'metric_type': "IP",
    'index_type': "IVF_FLAT",
    'params': {"nlist": 2048}
}
collection.create_index(field_name="embedding", index_params=index_params)
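
If the 'chatbot' collection already exists from an earlier run, you may want to drop it before the Collection(...) call above. A minimal sketch using the already-imported utility module (dropping the old data is an assumption; you may prefer to reuse the existing collection instead):

# Remove a pre-existing collection with the same name (assumption: starting from scratch).
if utility.has_collection(collection_name):
    utility.drop_collection(collection_name)
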
  • Create the eqa-insert pipeline and set the configuration

For more parameters, refer to the Configuration section below.

from towhee import AutoPipes, AutoConfig

# Load the default eqa-insert configuration and override the fields we need.
config = AutoConfig.load_config('eqa-insert')
config.model = 'all-MiniLM-L6-v2'
config.host = '127.0.0.1'
config.port = '19530'
config.collection_name = collection_name

# Build the pipeline and insert a document (here, the Towhee README) into the knowledge base.
p = AutoPipes.pipeline('eqa-insert', config=config)
res = p('https://raw.githubusercontent.com/towhee-io/towhee/main/README.md')

Then you can check collection.num_entities to see how many entities have been inserted into the Milvus knowledge base.
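
For example (flushing first is a precaution so that recently inserted data is counted; whether it is strictly needed depends on your Milvus version):

collection.flush()              # persist pending inserts
print(collection.num_entities)  # number of entities in the knowledge base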


Configuration

EnhancedQAInsertConfig

Configuration for Text Loader

chunk_size: int The size of each chunk, defaults to 300.

source_type: str The type of the source, defaults to 'file'. You can also set it to 'url' if the document is given as a URL.

Configuration for Sentence Embedding

model: str The model name in the sentence embedding pipeline, defaults to 'all-MiniLM-L6-v2'. You can refer to the Model(s) list above to set the model; some of these models are from Hugging Face (open source), and some are from OpenAI (not open source, an API key is required).

openai_api_key: str The OpenAI API key, defaults to None. This key is required if the model is from OpenAI; you can check the model provider in the Model(s) list above.

customize_embedding_op: str The name of a customized embedding operator, defaults to None.

normalize_vec: bool Whether to normalize the embedding vectors, defaults to True.

device: int The device ID, defaults to -1, which means running on the CPU. If set to a value other than -1, the specified GPU device will be used.

Configuration for Milvus

host: str Host of Milvus vector database, default is '127.0.0.1'.

port: str Port of Milvus vector database, default is '19530'.

collection_name: str The collection name for the Milvus vector database; it is required when inserting data into Milvus.

user: str The user name for a cloud-hosted Milvus instance, defaults to None.

password: str The password for a cloud-hosted Milvus instance, defaults to None.
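
As a sketch, the Milvus-related options for a cloud-hosted instance might be set as follows, assuming the configuration attributes map one-to-one to the fields above (the endpoint, user name, and password are placeholders):

from towhee import AutoPipes, AutoConfig

config = AutoConfig.load_config('eqa-insert')
config.host = 'your-milvus-endpoint'   # placeholder host
config.port = '19530'
config.collection_name = 'chatbot'
config.user = 'db_admin'               # placeholder cloud user name
config.password = '********'           # placeholder password

p = AutoPipes.pipeline('eqa-insert', config=config)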


Interface

Inserts a document into Milvus as a knowledge base.

Parameters:

doc: str

Path or URL of the document to be loaded.

Returns: MutationResult

The MutationResult returned after inserting data into Milvus.
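
For example, inserting a local file rather than a URL (the path below is hypothetical; source_type is left at its default 'file'):

res = p('./docs/my_document.md')  # hypothetical local path
print(res)                        # the MutationResult described above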
