Enhanced QA Insert

Description

Enhanced question answering builds a knowledge base and uses it to ground the answers generated by a large language model (LLM), which helps reduce hallucinations. It has two stages: inserting data into the knowledge base, and querying it with questions. The eqa-insert pipeline covers the first stage: it inserts document data into the knowledge base.

Code Example

Create Milvus collection

Before running the pipeline, create the Milvus collection.

The dim value is the dimensionality of the feature vectors produced by the model configured in the eqa-insert pipeline. The example below uses 384, which matches the 'all-MiniLM-L6-v2' model.

from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection

collection_name = 'chatbot'
dim = 384  # must match the output size of the embedding model

# Connect to the Milvus server.
connections.connect(host='127.0.0.1', port='19530')

# Schema: auto-generated primary key, text_id, text, and the embedding vector.
fields = [
    FieldSchema(name='id', dtype=DataType.INT64, description='ids', is_primary=True, auto_id=True),
    FieldSchema(name='text_id', dtype=DataType.VARCHAR, description='text', max_length=500),
    FieldSchema(name='text', dtype=DataType.VARCHAR, description='text', max_length=1000),
    FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, description='embedding vectors', dim=dim)
]
schema = CollectionSchema(fields=fields, description='enhanced qa')
collection = Collection(name=collection_name, schema=schema)

# Build an IVF_FLAT index on the embedding field, with inner product as the metric.
index_params = {
    'metric_type': 'IP',
    'index_type': 'IVF_FLAT',
    'params': {'nlist': 2048}
}
collection.create_index(field_name='embedding', index_params=index_params)
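Because dim has to agree with the model configured in the pipeline, it can help to look the value up rather than hard-code it. The following is a minimal sketch, not part of towhee or pymilvus: resolve_dim and KNOWN_DIMS are hypothetical names, and the table lists only two sentence-transformers models whose output sizes are published on their model cards. Whether a given model is supported by eqa-insert is for the Model(s) list to confirm.

```python
# Hypothetical lookup table: model name -> embedding dimensionality.
# Extend it with the models you actually use.
KNOWN_DIMS = {
    'all-MiniLM-L6-v2': 384,
    'all-mpnet-base-v2': 768,
}


def resolve_dim(model_name, known_dims=KNOWN_DIMS):
    """Return the vector size for model_name, or fail loudly if unknown."""
    try:
        return known_dims[model_name]
    except KeyError:
        raise ValueError(
            f'Unknown embedding size for {model_name!r}; '
            'check the model card and add it to the table.'
        ) from None


dim = resolve_dim('all-MiniLM-L6-v2')
print(dim)
```

Failing on an unknown model is deliberate: a collection created with the wrong dim would reject the vectors at insert time, which is harder to diagnose.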

Create pipeline and set the configuration

For more parameters, refer to the Configuration section.

from towhee import AutoPipes, AutoConfig

config = AutoConfig.load_config('eqa-insert')
config.embedding_model = 'all-MiniLM-L6-v2'
config.host = '127.0.0.1'
config.port = '19530'
config.collection_name = collection_name

p = AutoPipes.pipeline('eqa-insert', config=config)
res = p('https://github.com/towhee-io/towhee/blob/main/README.md')

After the pipeline has run, call collection.flush() and then read collection.num_entities to check how many entities are stored in Milvus as the knowledge base.
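The check can be wrapped in a small helper. This is a sketch that assumes a Milvus server is reachable and that the collection already exists; count_entities is a hypothetical name, and pymilvus is imported inside the function so that the definition itself loads without a server or the package installed.

```python
def count_entities(collection_name, host='127.0.0.1', port='19530'):
    """Flush pending inserts and return the number of stored entities."""
    # Imported lazily: only needed when the function is actually called.
    from pymilvus import connections, Collection

    connections.connect(host=host, port=port)
    collection = Collection(collection_name)  # opens the existing collection
    collection.flush()  # make pending inserts visible to the count
    return collection.num_entities
```

With the collection from the earlier example, count_entities('chatbot') would return the number of entities currently stored.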

Configuration

EnhancedQAInsertConfig

Configuration for Text Splitter:

type: str

The type of splitter, defaults to 'RecursiveCharacter'. Supported values are 'RecursiveCharacter', 'Markdown', 'PythonCode', 'Character', 'NLTK', 'Spacy', 'Tiktoken', and 'HuggingFace'.

chunk_size: int

The size of each chunk, defaults to 300.

splitter_kwargs: dict

The keyword arguments passed to the splitter, defaults to {}.
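To illustrate what chunk_size controls, here is a minimal fixed-size splitter. It is a simplification and not the splitter the pipeline uses: it assumes the size is counted in characters and cuts at exact offsets, whereas splitters such as 'RecursiveCharacter' prefer to break on separators. The name split_fixed is hypothetical.

```python
def split_fixed(text, chunk_size=300):
    """Cut text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError('chunk_size must be a positive integer')
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


chunks = split_fixed('a' * 650, chunk_size=300)
print([len(c) for c in chunks])  # [300, 300, 50]
```

Each chunk becomes one row in the collection, so a smaller chunk_size yields more rows and shorter stored text per row.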

Configuration for Sentence Embedding:

embedding_model: str

The model name for sentence embedding, defaults to 'all-MiniLM-L6-v2'. Refer to the Model(s) list for the supported models. Some are open-source models from HuggingFace, and others are OpenAI models, which are not open and require an API key.

openai_api_key: str

The OpenAI API key, defaults to None. The key is required if the model is from OpenAI; the Model(s) list shows the provider of each model.

embedding_device: int

The device number, defaults to -1, which means the CPU is used. Any other value selects the GPU device with that number.

Configuration for Milvus:

host: str

The host of the Milvus vector database, defaults to '127.0.0.1'.

port: str

The port of the Milvus vector database, defaults to '19530'.

collection_name: str

The collection name in the Milvus vector database. It is required when inserting data into Milvus.

user: str

The user name for a cloud user, defaults to None.

password: str

The password for a cloud user, defaults to None.
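One way to keep credentials out of source code is to fill these fields from environment variables. The sketch below is an assumption-laden example rather than a towhee feature: apply_milvus_settings and the MILVUS_* variable names are hypothetical, and a SimpleNamespace stands in for the config object so the snippet runs on its own. The attribute names and defaults are the ones documented above.

```python
import os
from types import SimpleNamespace


def apply_milvus_settings(config, env=None):
    """Set the Milvus connection fields on config from environment variables."""
    env = os.environ if env is None else env
    config.host = env.get('MILVUS_HOST', '127.0.0.1')
    config.port = env.get('MILVUS_PORT', '19530')  # port is a str in this config
    config.user = env.get('MILVUS_USER')            # None when unset
    config.password = env.get('MILVUS_PASSWORD')    # None when unset
    return config


# Stand-in for the object returned by AutoConfig.load_config('eqa-insert').
config = apply_milvus_settings(SimpleNamespace(), env={'MILVUS_HOST': 'example.invalid'})
print(config.host, config.port, config.user)
```

In real use, the object returned by AutoConfig.load_config('eqa-insert') would be passed in place of the SimpleNamespace.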

Interface

Inserts a document into Milvus as part of the knowledge base.

Parameters:

doc: str

Path or URL of the document to be loaded.

Returns: MutationResult

The MutationResult produced by the insert into Milvus.
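Since the pipeline takes one document per call, loading several documents means calling it in a loop. The following is a minimal sketch of such a loop; insert_documents is a hypothetical helper, and it treats the pipeline as any callable that accepts a path or URL, so a stand-in function is used here to keep the example runnable without Milvus.

```python
def insert_documents(pipeline, docs):
    """Call pipeline on each document and separate successes from failures."""
    results, failures = {}, {}
    for doc in docs:
        try:
            results[doc] = pipeline(doc)
        except Exception as exc:  # keep going so one bad document does not stop the batch
            failures[doc] = exc
    return results, failures


def fake_pipeline(doc):
    # Stand-in for the eqa-insert pipeline object; fails on an empty path.
    if not doc:
        raise ValueError('empty document path')
    return f'inserted {doc}'


results, failures = insert_documents(fake_pipeline, ['a.md', '', 'b.md'])
print(sorted(results), list(failures))
```

With the real pipeline, p from the code example would be passed as the first argument.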

More Resources

Enhancing RAG with Knowledge Graphs - Zilliz blog: Knowledge Graphs (KGs) store and link data based on their relationships. KG-enhanced RAG can significantly improve retrieval capabilities and answer quality.
Metrics-Driven Development of RAGs - Zilliz blog: Evaluating and improving Retrieval-Augmented Generation (RAG) systems is a nuanced but essential task in the realm of AI-driven information retrieval. By leveraging a metrics-driven approach, as demonstrated by Jithin James and Shahul Es, you can systematically refine your RAG systems to ensure they deliver accurate, relevant, and trustworthy information.
How to Evaluate RAG Applications - Zilliz blog: Effective Evaluation strategies for your RAG Application
Building an Intelligent QA System with NLP and Milvus - Zilliz blog: The Next-Gen QA Bot is here
Using Voyage AI's embedding models in Zilliz Cloud Pipelines - Zilliz blog: Assess the effectiveness of a RAG system implemented with various embedding models for code-related tasks.
How to Build Retrieval Augmented Generation (RAG) with Milvus Lite, Llama3 and LlamaIndex - Zilliz blog: Retrieval Augmented Generation (RAG) is a method for mitigating LLM hallucinations. Learn how to build a chatbot RAG with Milvus, Llama3, and LlamaIndex.
Safeguarding Data Integrity: On-Prem RAG Deployment: Register for a free webinar exploring how you can deploy RAG applications on-prem using open source tools such as LLMWare and Milvus.
