p = AutoPipes.pipeline('osschat-insert', config=config)
res = p('https://github.com/towhee-io/towhee/blob/main/README.md', 'osschat', 'osschat')
res = p('https://github.com/towhee-io/towhee/blob/main/README.md', 'osschat')
```
Then you can run `collection.flush()` and read `collection.num_entities` to check how many entities have been stored in Milvus as the knowledge base.
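The check described above can be wrapped in a small helper. This is a minimal sketch rather than part of the pipeline: `count_entities` is a hypothetical name, and the host, port, and collection name in the comment are assumptions based on the defaults listed in this document.

```python
def count_entities(collection) -> int:
    """Flush pending inserts, then return the number of stored entities.

    `collection` is expected to behave like a pymilvus `Collection`:
    it exposes a `flush()` method and a `num_entities` attribute.
    """
    collection.flush()  # flush first so recently inserted data is counted
    return collection.num_entities


# Typical use against a running Milvus instance (values are assumptions):
#   from pymilvus import connections, Collection
#   connections.connect(host='127.0.0.1', port='19530')
#   print(count_entities(Collection('osschat')))
```

The helper accepts any object with the same two members, which makes it easy to try without a live database.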
@ -81,6 +82,7 @@ And run `es_client.search(index='osschat', body={"query":{"match_all":{}}})['hit
The type of splitter, defaults to 'RecursiveCharacter'. Set this parameter to one of ['[RecursiveCharacter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html)', '[Markdown](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/markdown.html)', '[PythonCode](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/python.html)', '[Character](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/character_text_splitter.html#)', '[NLTK](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/nltk.html)', '[Spacy](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/spacy.html)', '[Tiktoken](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/tiktoken_splitter.html)', '[HuggingFace](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/huggingface_length_function.html)'].
***chunk_size***: int
The size of each chunk, defaults to 300.
***splitter_kwargs***: dict
@ -90,43 +92,62 @@ The kwargs for the splitter, defaults to {}.
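To give a feel for what `chunk_size` controls, here is a deliberately simplified sketch. It is not the 'RecursiveCharacter' splitter: it cuts at fixed offsets and ignores separators, and it assumes the size is measured in characters. `split_fixed` is a hypothetical helper used only for illustration.

```python
from typing import List


def split_fixed(text: str, chunk_size: int = 300) -> List[str]:
    """Cut `text` into consecutive pieces of at most `chunk_size` characters."""
    if chunk_size <= 0:
        raise ValueError('chunk_size must be a positive integer')
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


chunks = split_fixed('a' * 650)   # default chunk_size of 300
sizes = [len(c) for c in chunks]  # two full chunks and one remainder
```

The real splitters try to break on natural boundaries such as paragraphs or sentences, so their chunks are usually somewhat shorter than the configured size.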
#### **Configuration for Sentence Embedding:**
***embedding_model***: str
The model name for sentence embedding, defaults to `'all-MiniLM-L6-v2'`.
Refer to the [model list](https://towhee.io/tasks/detail/operator?field_name=Natural-Language-Processing&task_name=Sentence-Embedding) to choose a model. Some of these models are from [HuggingFace](https://huggingface.co/) (open source), and some are from [OpenAI](https://openai.com/) (not open source, API key required).
***openai_api_key***: str
The OpenAI API key, defaults to `None`.
This key is required if the model is from OpenAI. You can check the model provider in the [model list](https://towhee.io/sentence-embedding/openai).
***embedding_device***: int
The device number, defaults to `-1`, which means the CPU is used.
Any other value selects the GPU device with that number.
***embedding_normalize***: bool
Whether to normalize the embedding vectors, defaults to `True`.
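The two settings above can be illustrated with a short sketch. Both helpers are hypothetical and rest on assumptions: the `'cuda:N'` device string is a common convention rather than something this document specifies, and normalization is taken to mean scaling each vector to unit L2 length.

```python
import math
from typing import List


def device_name(embedding_device: int = -1) -> str:
    """Map the device number to a device string (assumed 'cuda:N' naming)."""
    return 'cpu' if embedding_device == -1 else f'cuda:{embedding_device}'


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale `vec` to unit L2 length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]


unit = l2_normalize([3.0, 4.0])  # length 5 before scaling, length 1 after
```

With unit-length vectors, inner product and cosine similarity give the same ranking, which is one common reason to normalize before inserting into a vector database.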
#### **Configuration for [Milvus](https://towhee.io/ann-insert/osschat-milvus):**
***milvus_host***: str
Host of Milvus vector database, default is `'127.0.0.1'`.
***milvus_port***: str
Port of Milvus vector database, default is `'19530'`.
***milvus_user***: str
The user name for [Cloud user](https://zilliz.com/cloud), defaults to `None`.
***milvus_password***: str
The user password for [Cloud user](https://zilliz.com/cloud), defaults to `None`.
#### **Configuration for [Elasticsearch](https://towhee.io/elasticsearch/osschat-index):**
***es_enable***: bool
Whether to use Elasticsearch, default is `True`.
***es_host***: str
Host of Elasticsearch, default is `'127.0.0.1'`.
***es_port***: str
Port of Elasticsearch, default is `'9200'`.
***es_user***: str
The user name for Elasticsearch, defaults to `None`.
***es_password***: str
The user password for Elasticsearch, defaults to `None`.
<br/>
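For a quick overview, the storage defaults listed above can be collected in one place. The `StoreConfig` dataclass and `milvus_connect_kwargs` helper below are hypothetical and are not the pipeline's own config object. The field names and default values are copied from the parameter list; leaving credentials out when they are `None` is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StoreConfig:
    """Illustrative mirror of the documented Milvus and Elasticsearch defaults."""
    milvus_host: str = '127.0.0.1'
    milvus_port: str = '19530'
    milvus_user: Optional[str] = None
    milvus_password: Optional[str] = None
    es_enable: bool = True
    es_host: str = '127.0.0.1'
    es_port: str = '9200'
    es_user: Optional[str] = None
    es_password: Optional[str] = None


def milvus_connect_kwargs(cfg: StoreConfig) -> Dict[str, str]:
    """Build connection arguments, adding credentials only when both are set."""
    kwargs = {'host': cfg.milvus_host, 'port': cfg.milvus_port}
    if cfg.milvus_user is not None and cfg.milvus_password is not None:
        kwargs['user'] = cfg.milvus_user
        kwargs['password'] = cfg.milvus_password
    return kwargs


local = milvus_connect_kwargs(StoreConfig())  # defaults for a local instance
```

Note that both ports are strings, matching the documented `str` type for `milvus_port` and `es_port`.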
@ -143,11 +164,9 @@ Insert documentation into Milvus as a knowledge base.
Path or url of the document to be loaded.
***milvus_collection***: str
The collection name for Milvus vector database, is required when inserting data into Milvus.
***project_name***: str
***es_index***: str
The index name of Elasticsearch.
The collection name for the Milvus vector database, which is also used as the index name in Elasticsearch.