Enhanced QA Search

Description

Enhanced question answering builds a knowledge base and uses an LLM (large language model) to generate answers grounded in it, which helps prevent hallucinations. The workflow has two parts: inserting data into the knowledge base and querying it with questions. eqa-search handles the querying part.

Code Example

Create pipeline and set the configuration

For more parameters, refer to the Enhanced QA Search Config section.

from towhee import AutoPipes, AutoConfig

config = AutoConfig.load_config('eqa-search')
config.host = '127.0.0.1'
config.port = '19530'
config.collection_name = 'chatbot'
config.top_k = 5

# If using zilliz cloud
# config.user = [zilliz-cloud-username]
# config.password = [zilliz-cloud-password]

# OpenAI API key (replace the placeholder with your own key)
config.openai_api_key = 'your-openai-api-key'
# Embedding model
config.embedding_model = 'all-MiniLM-L6-v2'
# Embedding model device
config.embedding_device = -1

# Rerank the docs searched from knowledge base
config.rerank = True

# The llm model source, openai or dolly
config.llm_src = 'openai'
# The openai model name
config.openai_model = 'gpt-3.5-turbo'
# The dolly model name
# config.dolly_model = 'databricks/dolly-v2-12b'

p = AutoPipes.pipeline('eqa-search', config=config)
res = p('What is towhee?', [])
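To keep credentials out of source files, the secret fields can be filled from environment variables before the pipeline is created. The following is a minimal sketch, not part of the pipeline itself: the helper `apply_env_secrets` and the environment variable names are hypothetical, and a `SimpleNamespace` stands in for the config object so the snippet runs without Towhee installed.

```python
import os
from types import SimpleNamespace
from typing import Mapping

# Hypothetical environment variable names mapped to the config fields
# used in the example above (openai_api_key, user, password).
ENV_TO_FIELD = {
    "OPENAI_API_KEY": "openai_api_key",
    "MILVUS_USER": "user",
    "MILVUS_PASSWORD": "password",
}


def apply_env_secrets(config, env: Mapping[str, str] = os.environ):
    """Copy secrets from the environment onto the config, skipping unset ones."""
    for env_name, field in ENV_TO_FIELD.items():
        value = env.get(env_name)
        if value:
            setattr(config, field, value)
    return config


# Stand-in for the object returned by AutoConfig.load_config('eqa-search').
demo_config = SimpleNamespace(openai_api_key=None, user=None, password=None)
apply_env_secrets(demo_config, {"OPENAI_API_KEY": "sk-demo"})
```

With a real config, the same helper would be called as `apply_env_secrets(config)` before `AutoPipes.pipeline('eqa-search', config=config)`.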

Enhanced QA Search Config

Configuration for Sentence Embedding

embedding_model (str):

The model name used in the sentence embedding pipeline, defaults to 'all-MiniLM-L6-v2'. Refer to the Model(s) list of the sentence embedding pipeline for the supported models: some come from HuggingFace (open source) and some from OpenAI (not open source, API key required).

openai_api_key (str):

The OpenAI API key, defaults to None. The key is required if the model comes from OpenAI; check the model provider in the Model(s) list.

embedding_device (int):

The device ID, defaults to -1, which means using the CPU. If the value is not -1, the GPU device with that ID is used.
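The device convention can be pictured as a small mapping from the integer to a device name. This is only an illustrative sketch: `resolve_device` is a hypothetical helper, and the `cuda:N` naming follows the PyTorch convention, which is an assumption about how the embedding operator interprets the value.

```python
def resolve_device(embedding_device: int) -> str:
    """Map the embedding_device setting to a device name (assumed convention)."""
    if embedding_device < -1:
        raise ValueError("embedding_device must be -1 (CPU) or a GPU index >= 0")
    # -1 selects the CPU; any other value is treated as a GPU index.
    return "cpu" if embedding_device == -1 else f"cuda:{embedding_device}"


cpu_name = resolve_device(-1)
gpu_name = resolve_device(0)
```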

Configuration for Milvus

host (str):

Host of the Milvus vector database, defaults to '127.0.0.1'.

port (str):

Port of the Milvus vector database, defaults to '19530'.

top_k (int):

The number of nearest search results, defaults to 5.

collection_name (str):

The collection name in the Milvus vector database.

user (str):

The username for Zilliz Cloud, defaults to None.

password (str):

The password for Zilliz Cloud, defaults to None.

Configuration for Rerank

rerank (bool):

Whether to rerank the docs retrieved from the knowledge base, defaults to False. If set to True, the rerank operator is used.

rerank_model (str):

The name of the rerank model; set it according to the rerank operator.

threshold (Union[float, int]):

The score threshold, defaults to 0.6. If rerank is False, the threshold filters the Milvus search results; otherwise, the rerank operator applies it.
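Threshold filtering can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline's actual code: `filter_by_threshold` is a hypothetical helper, the (text, score) tuples are made-up sample data, and it assumes that a higher score means a better match.

```python
from typing import List, Tuple, Union


def filter_by_threshold(
    docs: List[Tuple[str, float]],
    threshold: Union[float, int] = 0.6,
) -> List[str]:
    """Keep the text of every doc whose score is at least the threshold."""
    # Assumption: higher score = more relevant. A distance-style metric
    # would need the comparison reversed.
    return [text for text, score in docs if score >= threshold]


# Made-up search results as (text, score) pairs.
candidates = [
    ("Towhee is a framework for building data pipelines.", 0.82),
    ("Milvus is a vector database.", 0.41),
]
kept = filter_by_threshold(candidates)
```

Raising the threshold passes fewer documents to the LLM as context, while lowering it admits more but potentially less relevant ones.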

Configuration for LLM

llm_src (str):

The LLM source, either openai or dolly, defaults to openai.

openai_model (str):

The OpenAI model name, defaults to gpt-3.5-turbo.

dolly_model (str):

The Dolly model name, defaults to databricks/dolly-v2-3b.

customize_llm (Any):

A user-defined LLM.

customize_prompt (Any):

A user-defined prompt.

ernie_api_key (str):

The API key for ERNIE Bot.

ernie_secret_key (str):

The secret key for ERNIE Bot.

Interface

Query a question against the Milvus knowledge base.

Parameters:

question (str): The question to query.
history (List[str]): The chat history to provide background information.

Returns:

Answer (str): The answer to the question.

More Resources

Search and Information Retrieval in the Era of Generative AI - Zilliz blog: Despite advances in LLMs like ChatGPT, search still matters. Combining GenAI with search and vector databases enhances search accuracy and experience.
Semantic Search with Milvus and OpenAI - Zilliz blog: In this guide, we'll explore semantic search capabilities through the integration of Milvus and OpenAI's Embedding API, using a book title search as an example use case.
Enhancing RAG with Knowledge Graphs - Zilliz blog: Knowledge Graphs (KGs) store and link data based on their relationships. KG-enhanced RAG can significantly improve retrieval capabilities and answer quality.
Compare Vector Databases, Vector Search Libraries and Plugins - Zilliz blog: Deep diving into better understanding vector databases and comparing them to vector search libraries and vector search plugins.
Metrics-Driven Development of RAGs - Zilliz blog: Evaluating and improving Retrieval-Augmented Generation (RAG) systems is a nuanced but essential task in the realm of AI-driven information retrieval. By leveraging a metrics-driven approach, as demonstrated by Jithin James and Shahul Es, you can systematically refine your RAG systems to ensure they deliver accurate, relevant, and trustworthy information.
What Is Semantic Search?: Semantic search is a search technique that uses natural language processing (NLP) and machine learning (ML) to understand the context and meaning behind a user's search query.
Similarity Metrics for Vector Search - Zilliz blog: Exploring five similarity metrics for vector search: L2 or Euclidean distance, cosine distance, inner product, and hamming distance.
