
Update README

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
main
shiyu22 2 years ago
parent
commit
59684bfe8b
  1. README.md (73)
  2. requirements.txt (1)
  3. result.png (BIN)

README.md (73)

@ -1,2 +1,73 @@
# text_loader
# Text Loader
*author: shiyu22*
<br />
### Description
**Text Loader** is used to load the documents and split it to a list of text.
**Text Loader** loads files and splits them into lists of text chunks. It supports both local files (by file path) and web pages (by URL).
> Refer to [Recursive Characters](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html) for details on how the text is split.
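
The recursive character strategy referenced above tries coarse separators first (blank lines, then newlines, then spaces) and falls back to finer ones only when a piece is still longer than `chunk_size`. Below is a minimal pure-Python sketch of that idea. It is not LangChain's `RecursiveCharacterTextSplitter`: it has no chunk overlap or whitespace stripping, and the name `recursive_split` and its separator list are illustrative assumptions.

```python
from typing import List, Sequence

# Separators tried in order, from coarsest (paragraph breaks) to finest (single characters).
DEFAULT_SEPARATORS = ("\n\n", "\n", " ", "")


def recursive_split(text: str, chunk_size: int = 300,
                    separators: Sequence[str] = DEFAULT_SEPARATORS) -> List[str]:
    """Split text into chunks of at most chunk_size characters (simplified sketch)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        return [text] if text else []

    # Use the first separator present in the text ("" matches any text: split into characters).
    sep = next((s for s in separators if s in text), "")
    # Finer separators remain available for pieces that are still too long.
    finer = tuple(separators[separators.index(sep) + 1:]) if sep else ()
    pieces = list(text) if sep == "" else text.split(sep)

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Too long on its own: flush the buffer, then recurse with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        # Otherwise greedily merge pieces back together while they still fit.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

In this sketch, `recursive_split("First paragraph here.\n\nSecond paragraph is a bit longer than the first one.", chunk_size=30)` keeps the short first paragraph whole. It splits only the longer second paragraph, breaking on spaces.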
<br />
### Code Example
Writing the pipeline in the simplified way
```Python
from towhee import pipe, ops, DataCollection

p = (
    pipe.input('url')
        .flat_map('url', 'text', ops.text_loader(source_type='url'))
        .output('url', 'text')
)

res = p('https://docs.towhee.io/Getting%20Started/create-pipeline/')
DataCollection(res).show()
```
<img src="./result.png" alt="result" height="180px"/>
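
The pipeline uses `flat_map` because `text_loader` returns a list of chunks for a single URL. Conceptually, `flat_map` emits one output row per element of that list and repeats the input `url` on each row. The plain-Python model below only illustrates that flattening. `flat_map_rows` and `fake_loader` are hypothetical names, and this is not how Towhee implements `flat_map`.

```python
from typing import Callable, Iterable, List, Tuple


def flat_map_rows(urls: Iterable[str],
                  loader: Callable[[str], List[str]]) -> List[Tuple[str, str]]:
    """Model of a one-to-many map: every (url, chunk) pair becomes its own row."""
    rows: List[Tuple[str, str]] = []
    for url in urls:
        for text in loader(url):      # the loader returns several chunks per input
            rows.append((url, text))  # the input url is repeated on every row
    return rows


def fake_loader(url: str) -> List[str]:
    # Hypothetical stand-in for ops.text_loader: ignores the URL and returns fixed chunks.
    return ["chunk one", "chunk two", "chunk three"]


rows = flat_map_rows(["https://example.com/doc"], fake_loader)
for row in rows:
    print(row)
```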
<br />
### Factory Constructor
Create the operator via the following factory method
***towhee.text_loader(chunk_size=300, source_type='file')***
<br />
### Interface
The operator loads the document, splits the text, and returns the chunks.

**Parameters:**

***chunk_size***: int

The size of each chunk, defaults to 300.

***source_type***: str

The type of the source, defaults to 'file'. Set it to 'url' to load documentation from a URL.

**Return**: List[Document]

A list of the document chunks.
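
To make the parameter contract concrete, here is a minimal standard-library sketch of the described behavior. It loads a local file or a URL according to `source_type`, then cuts the text into pieces no longer than `chunk_size` characters. This is an assumption-laden stand-in, not the operator's code:

- `load_and_split` is a hypothetical name.
- `textwrap.wrap` replaces the LangChain splitter.
- Fetched pages keep their HTML markup.
- It returns plain strings rather than `Document` objects.

```python
import textwrap
import urllib.request
from pathlib import Path
from typing import List


def load_and_split(source: str, chunk_size: int = 300, source_type: str = "file") -> List[str]:
    """Hypothetical sketch: load `source` as a file or URL, then chunk the text."""
    if source_type == "file":
        text = Path(source).read_text(encoding="utf-8")
    elif source_type == "url":
        # Requires network access; the raw response body is used as-is (HTML is not stripped).
        with urllib.request.urlopen(source) as resp:
            text = resp.read().decode("utf-8", errors="replace")
    else:
        raise ValueError(f"unsupported source_type: {source_type!r}")
    # textwrap stands in for the recursive character splitter: it breaks on whitespace
    # and, with its default break_long_words=True, keeps every chunk <= chunk_size characters.
    return textwrap.wrap(text, width=chunk_size)
```

Because `textwrap` collapses newlines into spaces, this sketch loses paragraph structure. The recursive splitter referenced in the description is designed to preserve it where possible.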

requirements.txt (1)

@ -1 +1,2 @@
langchain>=0.0.151
unstructured

result.png (BIN)

Binary file not shown.

Size: 122 KiB
