diff --git a/README.md b/README.md index 9e6f249..7092586 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,72 @@ -# text-spliter +# Text Spliter + +*author: shiyu22* + +
+ + + +### Description + +**Text spliter** splits text into a list of chunks. + +> Refer to [Recursive Characters](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html) for details on how the text is split. + +
+ + + +### Code Example + +```Python +from towhee import pipe, ops, DataCollection + +p = ( + pipe.input('url') + .map('url', 'text', ops.text_loader()) + .flat_map('text', 'text', ops.text_spliter()) + .output('url', 'text') + ) + +res = p('https://github.com/towhee-io/towhee/blob/main/README.md') +DataCollection(res).show() +``` + +result + +
+ + + +### Factory Constructor + +Create the operator via the following factory method: + +***towhee.text_spliter(chunk_size=300)*** + +**Parameters:** + + ***chunk_size***: int + + The size of each chunk, defaults to 300. + +
+ + + +### Interface + +The operator splits the incoming text and returns the chunks. + +**Parameters:** + + ***data***: str + + The text data. + + + +**Return**: List[Document] + +A list of chunked documents. diff --git a/result.png b/result.png new file mode 100644 index 0000000..753b55d Binary files /dev/null and b/result.png differ
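
To illustrate what the `chunk_size` parameter documented in this README controls, here is a minimal pure-Python sketch of recursive character splitting, the general idea behind the "Recursive Characters" reference. It is not the operator's implementation: the function name `split_text`, the separator order, and the assumption that `chunk_size` is an upper bound measured in characters are all hypothetical choices made for this example, and it returns plain strings rather than `Document` objects.

```python
from typing import List, Sequence


def split_text(
    text: str,
    chunk_size: int = 300,
    separators: Sequence[str] = ("\n\n", "\n", " ", ""),
) -> List[str]:
    """Hypothetical helper: split text so no chunk exceeds chunk_size characters.

    Coarse separators are tried first; a piece that is still too long is
    split again with the next, finer separator.
    """
    if len(text) <= chunk_size:
        return [text] if text else []

    # No usable separator left: fall back to fixed-width slices.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # The piece still fits, so keep growing the current chunk.
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:
            # The piece alone is too long: recurse with finer separators.
            chunks.extend(split_text(piece, chunk_size, finer))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_text("aaa bbb ccc ddd", chunk_size=7):
        print(repr(chunk))
```

Trying paragraph and line breaks before spaces keeps related text together where possible, and the fixed-width fallback only applies to runs that contain no separator at all.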