# Text Loader

*author: shiyu22*

<br />

### Description

**Text Loader** loads documents and splits them into a list of text chunks. It supports loading local files (by file path) or web pages (by URL).

> Refer to LangChain's [Recursive Character Text Splitter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html) for details on how the text is split.
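
As an illustration only, the recursive splitting idea can be sketched in plain Python. This is a simplified stand-in, not LangChain's code or this operator's internals: it assumes LangChain's default separator order (blank line, newline, space, then single characters), measures chunk size in characters, and leaves out chunk overlap. The function name `recursive_split` is hypothetical.

```python
from typing import List, Optional


def recursive_split(text: str, chunk_size: int = 300,
                    separators: Optional[List[str]] = None) -> List[str]:
    """Simplified recursive character splitter (hypothetical, no overlap).

    Splits on the coarsest separator present in the text, merges the pieces
    back into chunks of at most chunk_size characters, and recursively
    re-splits any piece that is still too long using the finer separators.
    """
    if separators is None:
        # Assumed default order, matching LangChain's splitter defaults.
        separators = ["\n\n", "\n", " ", ""]
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        return [text]  # no finer separator left; keep the oversized piece

    # Use the first separator that occurs in the text ("" always matches).
    sep, rest = separators[-1], []
    for i, s in enumerate(separators):
        if s == "" or s in text:
            sep, rest = s, separators[i + 1:]
            break

    pieces = list(text) if sep == "" else [p for p in text.split(sep) if p]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Flush the pending chunk, then split the long piece more finely.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, rest))
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep merging
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "alpha beta\n\ngamma delta epsilon\n\n" + "x" * 25
print(recursive_split(text, chunk_size=12))
# → ['alpha beta', 'gamma delta', 'epsilon', 'xxxxxxxxxxxx', 'xxxxxxxxxxxx', 'x']
```

Coarse separators are tried first so that paragraphs and sentences stay intact whenever they fit; only pieces that are still too long fall through to finer separators.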

<br />

### Code Example

Write the pipeline in the simplified way:

```Python
from towhee import pipe, ops, DataCollection

# Take a URL, load and split the page, and emit one row per text chunk
# (flat_map expands the returned list of chunks into separate rows).
p = (
    pipe.input('url')
        .flat_map('url', 'text', ops.text_loader(source_type='url'))
        .output('url', 'text')
)

res = p('https://docs.towhee.io/Getting%20Started/create-pipeline/')
DataCollection(res).show()  # display the (url, text) rows as a table
```

<img src="./result.png" alt="result" height="180px"/>

<br />

### Factory Constructor

Create the operator via the following factory method:

***towhee.text_loader(chunk_size=300, source_type='file')***

<br />

### Interface

The operator loads the document, splits its text, and returns the chunks.

**Parameters:**

***chunk_size***: int

The size of each chunk. Defaults to 300.

***source_type***: str

The type of the source. Defaults to 'file' (a local file path); set it to 'url' to load the document from a web link.
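
The two `source_type` values can be pictured as a simple dispatch. The sketch below is hypothetical (the `load_text` name and its behaviour are assumptions, not the operator's code): a `'file'` source is read from disk, and a `'url'` source is fetched with the Python standard library. Unlike a real document loader, it returns the raw page content and does not strip HTML markup.

```python
import urllib.request
from pathlib import Path


def load_text(source: str, source_type: str = "file") -> str:
    """Hypothetical loader illustrating the two documented source types."""
    if source_type == "file":
        # Local document: `source` is a file path.
        return Path(source).read_text(encoding="utf-8")
    if source_type == "url":
        # Web document: `source` is a URL. The raw response body is returned;
        # a real loader would also extract the text from the HTML.
        with urllib.request.urlopen(source) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    raise ValueError(f"unsupported source_type: {source_type!r}")
```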

**Return**: List[Document]

A list of the document chunks.
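
`Document` here presumably refers to LangChain's document class, which holds the chunk text in `page_content` and extra information in a `metadata` dict. The sketch below uses a minimal stand-in dataclass rather than the real class to show the shape of the returned list; the `to_documents` helper and the `source` metadata key are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Minimal stand-in mirroring the two fields of LangChain's Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_documents(chunks: List[str], source: str) -> List[Document]:
    # Hypothetical helper: wrap each text chunk and record where it came from.
    return [Document(page_content=c, metadata={"source": source}) for c in chunks]


docs = to_documents(["first chunk", "second chunk"], source="notes.txt")
print(docs[0])
# → Document(page_content='first chunk', metadata={'source': 'notes.txt'})
```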

**Dependencies:** `langchain>=0.0.151`, `unstructured`