Text Loader
author: shiyu22
Description
Text loader loads a document and splits it into a list of text chunks. It supports local files (given a file path) and web pages (given a URL).
Text is split with the Recursive Characters approach; refer to its documentation for details on how chunks are produced.
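The splitting idea can be illustrated with a minimal, stdlib-only sketch of recursive character splitting. It assumes chunk_size counts characters and uses a hypothetical separator order (paragraph breaks, line breaks, spaces, then single characters). It is not the operator's actual implementation. Pieces are packed greedily up to chunk_size, and any piece that is still too long is split again with the next, finer separator.

```python
from typing import List, Sequence

# Hypothetical separator order, coarse to fine: paragraphs, lines, words, characters.
DEFAULT_SEPARATORS = ("\n\n", "\n", " ", "")


def recursive_split(text: str, chunk_size: int = 300,
                    separators: Sequence[str] = DEFAULT_SEPARATORS) -> List[str]:
    """Split text into chunks of at most chunk_size characters (sketch)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if len(text) <= chunk_size:
        return [text] if text else []

    # Use the first (coarsest) separator present in the text; "" always matches.
    sep, remaining = separators[-1], ()
    for i, s in enumerate(separators):
        if s == "" or s in text:
            sep, remaining = s, separators[i + 1:]
            break

    pieces = list(text) if sep == "" else text.split(sep)

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Flush the pending chunk, then recurse with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, remaining))
            continue
        # Greedily merge pieces while the result still fits in chunk_size.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

For example, with chunk_size=30 a short first paragraph stays whole, while a longer second paragraph is broken at spaces into pieces that each fit the limit.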
Code Example
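from towhee import pipe, ops, DataCollection

# Build a pipeline that takes a URL, loads and splits the page, and emits one row per chunk.
p = (
    pipe.input('url')
        .flat_map('url', 'text', ops.text_loader(source_type='url'))
        .output('url', 'text')
)

res = p('https://docs.towhee.io/Getting%20Started/create-pipeline/')
# Display the resulting (url, text) rows as a table.
DataCollection(res).show()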
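The example uses flat_map rather than map because the operator returns a list of chunks for each input: Towhee emits one output row per element of that list, so the url is repeated next to each chunk. A plain-Python sketch of that one-to-many fan-out, with a hypothetical fake_loader standing in for ops.text_loader (this is not Towhee's implementation):

```python
from typing import Callable, Iterable, List, Tuple


def flat_map_rows(inputs: Iterable[str],
                  fn: Callable[[str], List[str]]) -> List[Tuple[str, str]]:
    """Emit one (input, element) row for every element that fn returns."""
    rows = []
    for url in inputs:
        for chunk in fn(url):
            rows.append((url, chunk))
    return rows


def fake_loader(url: str) -> List[str]:
    # Hypothetical stand-in for ops.text_loader: returns three fixed chunks.
    return [f"{url} chunk {i}" for i in range(3)]


rows = flat_map_rows(["https://example.com/doc"], fake_loader)
```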

Factory Constructor
Create the operator via the following factory method:
towhee.text_loader(chunk_size=300, source_type='file')
Parameters:
chunk_size: int
The size of each chunk, defaults to 300.
source_type: str
The type of the source, defaults to 'file'. Set it to 'url' to load a document from a web URL.
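To make the role of source_type concrete, here is a hypothetical sketch of how a loader might dispatch on it before any splitting happens. The function name load_text is an assumption, not part of the operator. It uses only the Python standard library and returns the raw response body for URLs, whereas a real loader would typically also extract readable text from HTML.

```python
from pathlib import Path
from urllib.request import urlopen


def load_text(data_src: str, source_type: str = "file") -> str:
    """Hypothetical loader: read raw text from a local path or a URL."""
    if source_type == "file":
        # Treat data_src as a local file path.
        return Path(data_src).read_text(encoding="utf-8")
    if source_type == "url":
        # Fetch the resource; fall back to UTF-8 if no charset is declared.
        with urlopen(data_src) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    raise ValueError(f"unsupported source_type: {source_type!r}")
```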
Interface
The operator loads the document, splits its text, and returns the chunks.
Parameters:
data_src: str
Path or url of the document to be loaded.
Return: List[Document]
A list of chunks from the loaded document.