towhee

Text Loader

author: shiyu22


Description

Text Loader loads documents and splits them into a list of text chunks. It supports local files (given a file path) and web pages (given a URL).

For details on how the text is split, refer to Recursive Characters.
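To give a feel for what recursive character splitting does, here is a minimal stand-alone sketch in plain Python. It is not the operator's own code: the function name `recursive_split`, the separator order, and the choice to measure `chunk_size` in characters are assumptions made for illustration only.

```python
from typing import List, Sequence


def recursive_split(text: str, chunk_size: int = 300,
                    separators: Sequence[str] = ("\n\n", "\n", " ", "")) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Coarse separators (paragraphs) are tried first; pieces that are still
    too long are split again with the next, finer separator.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # The piece still fits, so keep growing the current chunk.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: recurse with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in recursive_split("alpha beta gamma delta", chunk_size=10):
        print(repr(chunk))
```

With a small `chunk_size`, words are grouped greedily until the limit is reached, and text without any separator falls back to fixed-width cuts.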


Code Example

Writing the pipeline in the simplified way

from towhee import pipe, ops, DataCollection

p = (
    pipe.input('url')
        .flat_map('url', 'text', ops.text_loader(source_type='url'))
        .output('url', 'text')
    )

res = p('https://docs.towhee.io/Getting%20Started/create-pipeline/')
DataCollection(res).show()


Factory Constructor

Create the operator via the following factory method:

towhee.text_loader(chunk_size=300, source_type='file')


Interface

The operator loads the document, splits its text, and returns the resulting chunks.

Parameters:

chunk_size: int

The size of each chunk; defaults to 300.

source_type: str

The type of the source; defaults to 'file'. Set it to 'url' to load a document from a URL.

Returns: List[Document]

A list of document chunks.
