# Text Splitter

*author: shiyu22*

<br />

### Description

**Text splitter** splits text into a list of chunks.

> Refer to [Recursive Characters](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html) for how the text is split.
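
To give a rough idea of what recursive character splitting does, here is a minimal sketch in plain Python. It is not the operator's actual code: the function name `recursive_split`, the separator list, and the character-based length check are assumptions made for illustration, and chunk overlap is left out entirely.

```python
from typing import List, Sequence


def recursive_split(text: str, chunk_size: int = 300,
                    separators: Sequence[str] = ("\n\n", "\n", " ")) -> List[str]:
    """Split text into pieces of at most chunk_size characters.

    Tries the coarsest separator first and recurses with finer ones
    only for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep merging pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: split it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa bbbb cccc\n\ndddd eeee"
print(recursive_split(sample, chunk_size=10))
# → ['aaaa bbbb', 'cccc', 'dddd eeee']
```

Paragraph breaks are tried first, so chunks tend to follow the structure of the text, and a hard cut happens only when no separator helps.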

<br />

### Code Example

```Python
from towhee import pipe, ops, DataCollection

p = (
    pipe.input('url')
        # Load the raw text from the URL.
        .map('url', 'text', ops.text_loader())
        # Split the text; flat_map emits one row per chunk.
        .flat_map('text', 'text', ops.text_splitter())
        .output('url', 'text')
)

res = p('https://github.com/towhee-io/towhee/blob/main/README.md')
DataCollection(res).show()
```

<img src="./result.png" alt="result" height="200px"/>

<br />

## Factory Constructor

Create the operator via the following factory method:

***towhee.text_splitter(chunk_size=300)***

**Parameters:**

***chunk_size***: int

The size of each chunk, defaults to 300.
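
As a rough feel for what `chunk_size` controls, the sketch below computes a lower bound on the number of chunks for a given text length. It assumes the size is counted in characters and that chunks do not overlap; neither is stated here. The helper `min_chunks` is hypothetical and not part of Towhee.

```python
import math


def min_chunks(text_length: int, chunk_size: int = 300) -> int:
    """Lower bound on the number of chunks for a text of text_length characters."""
    return math.ceil(text_length / chunk_size)


# Hypothetical 1000-character document at three chunk sizes.
for size in (100, 300, 500):
    print(size, min_chunks(1000, size))
```

A splitter that cuts at separators will usually return more chunks than this bound, since it rarely fills every chunk completely.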

<br />

### Interface

The operator splits the incoming text and returns the chunks.

**Parameters:**

***data***: str

The text data.

**Return**: List[Document]

A list of the chunked documents.
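
The call shape described above (a string in, a list of chunks out) can be illustrated with a small stand-in. Everything in this sketch is hypothetical: `FakeTextSplitter` is not the real operator, `Document` is a placeholder dataclass whose `page_content` field is an assumption, and the fixed-width cut only serves to produce some chunks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    """Hypothetical stand-in for the chunk type; not the operator's real class."""
    page_content: str


class FakeTextSplitter:
    """Illustrative stand-in with the documented call shape: str in, list out."""

    def __init__(self, chunk_size: int = 300):
        self.chunk_size = chunk_size

    def __call__(self, data: str) -> List[Document]:
        # Naive fixed-width cut, used here only to produce some chunks.
        step = self.chunk_size
        return [Document(data[i:i + step]) for i in range(0, len(data), step)]


splitter = FakeTextSplitter(chunk_size=5)
docs = splitter("hello world")
print([d.page_content for d in docs])
# → ['hello', ' worl', 'd']
```

Because the operator returns a list, it is attached with `flat_map` in a pipeline, so each chunk ends up in its own output row.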