3 changed files with 73 additions and 1 deletions
@ -1,2 +1,73 @@ |
|||||
# text_loader |
|
||||
|
# Text Loader |
||||
|
|
||||
|
*author: shiyu22* |
||||
|
|
||||
|
<br /> |
||||
|
|
||||
|
|
||||
|
|
||||
|
### Description |
||||
|
|
||||
|
**Text Loader** loads documents and splits them into a list of text chunks. It supports local files (given a file path) and web pages (given a URL).
||||
|
|
||||
|
> Refer to [Recursive Characters](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html) for the operation of splitting text. |
||||
|
|
||||
|
<br /> |
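
The splitting idea referenced above can be sketched in plain Python. This is a simplified illustration of recursive character splitting, not LangChain's implementation: the function name `recursive_split` and the separator order are assumptions made for this example, and features such as chunk overlap are left out.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    chunk_size: int = 300,
    separators: Sequence[str] = ("\n\n", "\n", " ", ""),
) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Hypothetical helper: tries the coarsest separator first and only
    falls back to finer ones for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Keep merging neighbouring pieces while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks
```

Calling `recursive_split(text, chunk_size=300)` keeps paragraphs together where they fit and only breaks inside a paragraph, line, or word when a piece exceeds the limit.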
||||
|
|
||||
|
|
||||
|
|
||||
|
### Code Example |
||||
|
|
||||
|
Write the pipeline in the simplified style:
||||
|
|
||||
|
```Python
from towhee import pipe, ops, DataCollection

# Build a pipeline that loads a web page and emits one row per text chunk.
p = (
    pipe.input('url')
        # flat_map expands the list returned by text_loader into separate rows
        .flat_map('url', 'text', ops.text_loader(source_type='url'))
        .output('url', 'text')
)

res = p('https://docs.towhee.io/Getting%20Started/create-pipeline/')
DataCollection(res).show()
```
||||
|
|
||||
|
<img src="./result.png" alt="result" height="180px"/> |
||||
|
|
||||
|
<br /> |
||||
|
|
||||
|
|
||||
|
|
||||
|
## Factory Constructor |
||||
|
|
||||
|
Create the operator via the following factory method:
||||
|
|
||||
|
***towhee.text_loader(chunk_size=300, source_type='file')*** |
||||
|
|
||||
|
<br /> |
||||
|
|
||||
|
|
||||
|
|
||||
|
### Interface |
||||
|
|
||||
|
The operator loads the document, splits its text, and returns the resulting chunks.
||||
|
|
||||
|
**Parameters:** |
||||
|
|
||||
|
***chunk_size***: int |
||||
|
|
||||
|
The size of each chunk; defaults to 300.
||||
|
|
||||
|
***source_type***: str |
||||
|
|
||||
|
The type of the source; defaults to 'file'. Set it to 'url' to load a document from a URL.
||||
|
|
||||
|
|
||||
|
|
||||
|
**Return**: List[Document] |
||||
|
|
||||
|
A list of the chunked documents.
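
To make the two parameters concrete, here is a minimal standard-library sketch of how `source_type` and `chunk_size` could interact. It is not the operator's actual code: `load_text` and `load_and_chunk` are hypothetical names, the chunks are plain strings rather than `Document` objects, the URL branch returns the raw response body without any HTML-to-text extraction, and fixed-size windows stand in for the recursive splitter.

```python
from typing import List
from urllib.request import urlopen


def load_text(source: str, source_type: str = "file") -> str:
    """Read raw text from a local path or a URL (hypothetical helper)."""
    if source_type == "file":
        with open(source, "r", encoding="utf-8") as f:
            return f.read()
    if source_type == "url":
        # Assumption: the response body is decoded as UTF-8 text as-is.
        with urlopen(source) as resp:
            return resp.read().decode("utf-8", errors="replace")
    raise ValueError(f"unsupported source_type: {source_type!r}")


def load_and_chunk(
    source: str, chunk_size: int = 300, source_type: str = "file"
) -> List[str]:
    """Load the source, then cut it into windows of chunk_size characters."""
    text = load_text(source, source_type)
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

With the defaults, `load_and_chunk('README.md')` would treat the argument as a local path; passing `source_type='url'` switches the same call to fetching a web address.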
||||
|
|
||||
|
@ -1 +1,2 @@ |
langchain>=0.0.151
unstructured
After Width: | Height: | Size: 122 KiB |