Text Loader

author: shiyu22


Description

Text Loader loads a document and splits it into a list of text chunks. The source can be a local file (given by its path) or a web page (given by its URL).

For details on how the text is split, refer to Recursive Characters.
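As a rough illustration of the general idea behind recursive character splitting, here is a minimal pure-Python sketch. The function name `recursive_split` and the separator list are assumptions made for this example; this is not the operator's actual implementation, which may handle overlap and separators differently.

```python
from typing import List, Sequence


def recursive_split(text: str, chunk_size: int = 300,
                    separators: Sequence[str] = ("\n\n", "\n", " ", "")) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Hypothetical helper: tries the coarsest separator first and falls
    back to finer ones only for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []

    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        # Try to grow the current chunk with the next piece.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: split it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


chunks = recursive_split("aaaa bbbb cccc\n\ndddd eeee", chunk_size=10)
```

Paragraph breaks are tried before line breaks and spaces, so chunks tend to follow the natural structure of the text where the size limit allows.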


Code Example

from towhee import pipe, ops, DataCollection

# Load a web page and emit one output row per text chunk.
p = (
    pipe.input('url')
        .flat_map('url', 'text', ops.text_loader(source_type='url'))
        .output('url', 'text')
)

res = p('https://docs.towhee.io/Getting%20Started/create-pipeline/')
DataCollection(res).show()
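The pipeline above uses `flat_map`, so the list of chunks returned by the operator is expanded into one output row per chunk. The plain-Python sketch below mimics that fan-out without Towhee; `flat_map_rows` and `fake_loader` are hypothetical names used only for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def flat_map_rows(urls: Iterable[str],
                  loader: Callable[[str], List[str]]) -> Iterator[Tuple[str, str]]:
    """Yield one (url, text) row for every chunk the loader returns."""
    for url in urls:
        for chunk in loader(url):
            yield url, chunk


def fake_loader(url: str) -> List[str]:
    """Stand-in for the text loader: returns three dummy chunks."""
    return [f"chunk {i} of {url}" for i in range(3)]


rows = list(flat_map_rows(["https://example.com/doc"], fake_loader))
for url, text in rows:
    print(url, "->", text)
```

Each row keeps the original `url` next to a single chunk, which matches the `('url', 'text')` columns selected by `.output(...)`.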


Factory Constructor

Create the operator via the following factory method:

towhee.text_loader(chunk_size=300, source_type='file')

Parameters:

chunk_size: int

The size of each chunk. Defaults to 300.

source_type: str

The type of the source. Defaults to 'file'; set it to 'url' to load a document from a URL.
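To make the role of `source_type` concrete, the sketch below shows one way a loader could choose between reading a local file and fetching a URL. `read_source` is a hypothetical helper written with the standard library only; it is an assumption for illustration and does not show how the operator itself fetches data.

```python
from pathlib import Path
from urllib.request import urlopen


def read_source(data_src: str, source_type: str = "file") -> str:
    """Hypothetical helper: return the raw text behind a path or a URL."""
    if source_type == "file":
        # Treat data_src as a local file path.
        return Path(data_src).read_text(encoding="utf-8")
    if source_type == "url":
        # Treat data_src as a web link and decode the response body.
        with urlopen(data_src) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    raise ValueError(f"unsupported source_type: {source_type!r}")
```

Rejecting unknown values explicitly keeps a mistyped `source_type` from silently falling back to one of the two modes.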


Interface

The operator loads the document, splits its text, and returns the resulting chunks.

Parameters:

data_src: str

Path or URL of the document to load.

Return: List[Document]

A list of chunks of the loaded document.
