# OSSChat Insert

## Description

**[OSSChat](https://osschat.io/)** is an enhanced ChatGPT that uses documentation, issues, blog posts, and community Q&A as knowledge bases. It is built for every community and developer. The `osschat-insert` pipeline is used to insert data into those knowledge bases.
## Code Example

### **Create Milvus collection**

Before running the pipeline, please [create a Milvus collection](https://milvus.io/docs/v2.0.x/create_collection.md) first.

> The `dim` is the dimensionality of the feature vectors generated by the `embedding_model` configured in the `osschat-insert` pipeline.

```python
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility

collection_name = 'osschat'
dim = 384

connections.connect(host='127.0.0.1', port='19530')

fields = [
    FieldSchema(name='id', dtype=DataType.INT64, description='ids', is_primary=True, auto_id=True),
    FieldSchema(name='text_id', dtype=DataType.VARCHAR, description='text', max_length=500),
    FieldSchema(name='text', dtype=DataType.VARCHAR, description='text', max_length=1000),
    FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, description='embedding vectors', dim=dim)
]
schema = CollectionSchema(fields=fields, description='osschat')
collection = Collection(name=collection_name, schema=schema)

index_params = {
    'metric_type': "IP",
    'index_type': "IVF_FLAT",
    'params': {"nlist": 2048}
}
collection.create_index(field_name="embedding", index_params=index_params)
```

### **Create pipeline and set the configuration**

> If you set `config.es_enable` to `True`, you also need to start [Elasticsearch](https://www.elastic.co/elasticsearch/).
>
> For more parameters, refer to the Configuration section below.

```python
from towhee import AutoPipes, AutoConfig

config = AutoConfig.load_config('osschat-insert')
config.embedding_model = 'all-MiniLM-L6-v2'
config.milvus_host = '127.0.0.1'
config.milvus_port = '19530'
config.es_enable = True
config.es_connection_kwargs = {'hosts': ['https://localhost:9200'], 'basic_auth': ('elastic', 'password')}

p = AutoPipes.pipeline('osschat-insert', config=config)
res = p('https://github.com/towhee-io/towhee/blob/main/README.md', 'osschat')
```

Then you can run `collection.flush()` and `collection.num_entities` to check the number of entities in Milvus as a knowledge base, and run `es_client.search(index='osschat', body={"query":{"match_all":{}}})['hits']['total']['value']` to get the document count in Elasticsearch.
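Below is a minimal sketch of that verification step. It assumes the `collection` object from the first snippet is still available and builds an `es_client` with the same connection arguments passed to `config.es_connection_kwargs`; the client construction here is an illustration, not part of the pipeline.

```python
from elasticsearch import Elasticsearch

# Flush so the newly inserted entities are reflected in the count.
collection.flush()
print('Milvus entities:', collection.num_entities)

# Hypothetical client setup reusing the connection arguments from the config above.
es_client = Elasticsearch(hosts=['https://localhost:9200'], basic_auth=('elastic', 'password'))
total = es_client.search(index='osschat', body={"query": {"match_all": {}}})['hits']['total']['value']
print('Elasticsearch documents:', total)
```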
## Configuration

### **EnhancedQAInsertConfig**

#### **Configuration for [Text Splitter](https://towhee.io/towhee/text-splitter):**

***type***: str

The type of splitter, defaults to `'RecursiveCharacter'`. You can set this parameter to one of ['[RecursiveCharacter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html)', '[Markdown](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/markdown.html)', '[PythonCode](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/python.html)', '[Character](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/character_text_splitter.html#)', '[NLTK](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/nltk.html)', '[Spacy](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/spacy.html)', '[Tiktoken](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/tiktoken_splitter.html)', '[HuggingFace](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/huggingface_length_function.html)'].

***chunk_size***: int

The size of each chunk, defaults to 300.

***splitter_kwargs***: dict

The kwargs for the splitter, defaults to `{}`.

#### **Configuration for Sentence Embedding:**

***embedding_model***: str

The model name for sentence embedding, defaults to `'all-MiniLM-L6-v2'`. You can refer to the [Model(s) list](https://towhee.io/tasks/detail/operator?field_name=Natural-Language-Processing&task_name=Sentence-Embedding) to set the model. Some of these models are from [HuggingFace](https://huggingface.co/) (open source), and some are from [OpenAI](https://openai.com/) (not open source, API key required).

***openai_api_key***: str

The API key for OpenAI, defaults to `None`. This key is required if the model is from OpenAI; you can check the model provider in the [Model(s) list](https://towhee.io/sentence-embedding/openai).

***embedding_device***: int

The device number, defaults to `-1`, which means using the CPU. If the setting is not `-1`, the specified GPU device will be used.

***embedding_normalize***: bool

Whether to normalize the embedding vectors, defaults to `True`.

#### **Configuration for [Milvus](https://towhee.io/ann-insert/osschat-milvus):**

***milvus_uri***: str

URI of the Milvus vector database, defaults to `None`, in which case the host and port are used instead.

***milvus_host***: str

Host of the Milvus vector database, defaults to `'127.0.0.1'`.

***milvus_port***: str

Port of the Milvus vector database, defaults to `'19530'`.

***milvus_token***: str

Token of the [Zilliz Cloud](https://zilliz.com/cloud) user, defaults to `None`.

***milvus_user***: str

The user name for [Zilliz Cloud](https://zilliz.com/cloud), defaults to `None`.

***milvus_password***: str

The user password for [Zilliz Cloud](https://zilliz.com/cloud), defaults to `None`.

#### **Configuration for [Elasticsearch](https://towhee.io/elasticsearch/osschat-index):**

***es_enable***: bool

Whether to use Elasticsearch, defaults to `True`.

***es_connection_kwargs***: dict

The connection arguments used to connect to the Elasticsearch service.

***token_model***: str

The model used to count tokens, defaults to `'gpt-3.5-turbo'`.
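To illustrate the options above, here is a hypothetical alternative configuration sketch. It assumes a Zilliz Cloud endpoint and token, an OpenAI embedding model (the model name `'text-embedding-ada-002'` is an assumption; check the linked Model(s) list), a Markdown splitter, and Elasticsearch disabled.

```python
from towhee import AutoPipes, AutoConfig

config = AutoConfig.load_config('osschat-insert')

# Text splitter: split Markdown documents into roughly 500-character chunks.
config.type = 'Markdown'
config.chunk_size = 500

# Sentence embedding via OpenAI (model name is an assumption; requires an API key).
config.embedding_model = 'text-embedding-ada-002'
config.openai_api_key = 'sk-...'  # placeholder

# Milvus hosted on Zilliz Cloud: use a URI and token instead of host/port.
# Note: the Milvus collection's `dim` must match this model's embedding dimensionality.
config.milvus_uri = 'https://your-endpoint.zillizcloud.com'  # hypothetical endpoint
config.milvus_token = 'your-zilliz-cloud-token'              # placeholder

# Skip Elasticsearch for this run.
config.es_enable = False

p = AutoPipes.pipeline('osschat-insert', config=config)
```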
## Interface

Insert documentation into Milvus (and Elasticsearch) as a knowledge base.

**Parameters:**

***doc***: str

Path or URL of the document to be loaded.

***project_name***: str

The collection name for the Milvus vector database, which is also the index name for Elasticsearch.

**Returns:** dict

A dictionary that includes 'milvus_res' and 'es_res'.
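As an illustration of this interface, here is a hypothetical call that inserts a local document and reads the documented result keys; the file path is made up, and treating the result as a plain dictionary follows the return description above.

```python
# `p` is the osschat-insert pipeline created in the Code Example section.
res = p('/path/to/your/guide.md', 'osschat')  # hypothetical local document

# The documented return value is a dictionary with 'milvus_res' and 'es_res'.
print(res['milvus_res'])
print(res['es_res'])
```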
# More Resources

- [OpenAI's ChatGPT - Zilliz blog](https://zilliz.com/learn/ChatGPT-Vector-Database-Prompt-as-code): A guide to the new AI stack - ChatGPT, your vector database, and prompt as code.
- [Enhancing ChatGPT with Milvus: Powering AI with Long-Term Memory - Zilliz blog](https://zilliz.com/learn/enhancing-chatgpt-with-milvus): By integrating GPTCache and Milvus with ChatGPT, businesses can create a more robust and efficient AI-powered support system. This approach leverages the advanced capabilities of generative AI and introduces a form of long-term memory, allowing the AI to recall and reuse information effectively.
- [ChatGPT + Vector database + prompt-as-code - The CVP Stack - Zilliz blog](https://zilliz.com/blog/ChatGPT-VectorDB-Prompt-as-code): Extend the capability of ChatGPT with a vector database and prompts-as-code.