# Text Splitter

*author: shiyu22*

<br />

### Description

**Text splitter** splits text into a list of chunks.

> Refer to [Recursive Characters](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html) for details on how the text is split.

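
The recursive character strategy referenced above can be sketched in plain Python. This is a minimal illustration, not the operator's actual implementation: the function name `recursive_split`, the separator list, and the greedy merging are assumptions made for this example, and the chunk overlap that LangChain's splitter supports is left out. The idea is to try the coarsest separator first and fall back to finer ones only for pieces that are still longer than `chunk_size`.

```python
from typing import List, Sequence


def recursive_split(text: str,
                    chunk_size: int = 300,
                    separators: Sequence[str] = ("\n\n", "\n", " ", "")) -> List[str]:
    """Illustrative splitter: every returned chunk has at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # No separator left to try: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging pieces while they fit
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaa bbb ccc", chunk_size=7))  # ['aaa bbb', 'ccc']
```

Splitting on paragraph breaks before line breaks and spaces keeps related text together where possible; the fixed-offset cut is only a last resort for text with no usable separator.
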
<br />

### Code Example

```Python
from towhee import pipe, ops, DataCollection

# Load the text behind `url`, split it into chunks, and emit one row per chunk.
p = (
    pipe.input('url')
        .map('url', 'text', ops.text_loader())
        .flat_map('text', 'text', ops.text_splitter())
        .output('url', 'text')
)

res = p('https://github.com/towhee-io/towhee/blob/main/README.md')
DataCollection(res).show()
```


<img src="./result.png" alt="result" height="200px"/>

<br />

### Factory Constructor

Create the operator via the following factory method:

***towhee.text_splitter(chunk_size=300)***

**Parameters:**

***chunk_size***: int

The size of each chunk; defaults to 300.

<br />

### Interface

The operator splits the incoming text and returns the resulting chunks.

**Parameters:**

***data***: str

The text data.

**Return**: List[Document]

A list of document chunks.

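
Because the code example wires the operator in with `flat_map`, each chunk in the returned list ends up as its own output row next to the original `url`. The plain-Python sketch below imitates that expansion. Both `expand_rows` and the line-based `toy_splitter` are hypothetical stand-ins written for this illustration; they are not Towhee APIs.

```python
from typing import Callable, Iterable, List, Tuple


def expand_rows(rows: Iterable[Tuple[str, str]],
                splitter: Callable[[str], List[str]]) -> List[Tuple[str, str]]:
    """Emit one (url, chunk) row per chunk, mimicking a flat_map step."""
    out: List[Tuple[str, str]] = []
    for url, text in rows:
        for chunk in splitter(text):
            out.append((url, chunk))
    return out


def toy_splitter(text: str) -> List[str]:
    """Hypothetical stand-in for the operator: one chunk per non-empty line."""
    return [line for line in text.splitlines() if line.strip()]


rows = [("doc-a", "first chunk\nsecond chunk"), ("doc-b", "only chunk")]
for url, chunk in expand_rows(rows, toy_splitter):
    print(url, "|", chunk)
```
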