Text Loader

author: shiyu22

Description

Text loader loads the contents of a text file. It accepts either a URL or a local file path; the supported file formats are .md and .txt.

Code Example

from towhee import pipe, ops, DataCollection

p = (
    pipe.input('url')
        .map('url', 'text', ops.text_loader())
        .output('url', 'text')
    )

res = p('https://github.com/towhee-io/towhee/blob/main/README.md')
DataCollection(res).show()

Factory Constructor

Create the operator via the following factory method:

towhee.text_loader()

Interface

The operator loads the document, then splits the incoming text and returns chunks.

Parameters:

data_src: str

Path or URL of the document to be loaded.

Return: str

String data with the text.
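
The interface above (one string in, one string out, where the input may be a URL or a path) can be illustrated with a small standard-library sketch. This is not the operator's actual implementation: `load_text`, `is_url`, and `SUPPORTED_SUFFIXES` are hypothetical names introduced here, and both the http/https check and the suffix check are assumptions based only on the description above.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen

# File formats named in the description (assumed to be the full list).
SUPPORTED_SUFFIXES = {'.md', '.txt'}


def is_url(data_src: str) -> bool:
    """Assumption of this sketch: only http(s) sources count as URLs."""
    return urlparse(data_src).scheme in ('http', 'https')


def load_text(data_src: str) -> str:
    """Hypothetical helper: return the text found at a URL or a local path."""
    if is_url(data_src):
        # Fetch the remote document and decode it as UTF-8.
        with urlopen(data_src) as resp:
            return resp.read().decode('utf-8')
    path = Path(data_src)
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f'unsupported file format: {path.suffix}')
    # Read the local file as UTF-8 text.
    return path.read_text(encoding='utf-8')
```

Calling `load_text('notes.md')` on an existing local file returns its contents as a single string, which mirrors the `data_src: str` to `str` signature documented for the operator.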

More Resources

Experiment with 5 Chunking Strategies via LangChain for LLM - Zilliz blog: Explore the complexities of text chunking in retrieval augmented generation applications and learn how different chunking strategies impact the same piece of data.
Massive Text Embedding Benchmark (MTEB): A standardized way to evaluate text embedding models across a range of tasks and languages, leading to better text embedding models for your app.
ChatGPT retrieval plugin with Zilliz and Milvus - Zilliz blog: Milvus and Zilliz are among the preferred vector databases for storing embeddings that can be accessed with the ChatGPT retrieval plugin.
Tutorial: Diving into Text Embedding Models | Zilliz Webinar: Register for a free webinar diving into text embedding models in a presentation and tutorial.
An LLM Powered Text to Image Prompt Generation with Milvus - Zilliz blog: An interesting LLM project powered by the Milvus vector database for generating more efficient text-to-image prompts.
A Beginner's Guide to Website Chunking and Embedding for Your RAG Applications - Zilliz blog: This post explains how to extract content from a website and use it as context for LLMs in a RAG application. However, before doing so, we need to understand website fundamentals.
Text as Data, From Anywhere to Anywhere - Zilliz blog: Whether you prefer a no-code or minimal-code approach, Airbyte and PyAirbyte offer robust solutions for integrating both structured and unstructured data. AJ Steers painted a good picture of the potential of these tools in revolutionizing data workflows.
From Text to Image: Fundamentals of CLIP - Zilliz blog: Search algorithms rely on semantic similarity to retrieve the most relevant results. With the CLIP model, the semantics of texts and images can be connected in a high-dimensional vector space. Read this simple introduction to see how CLIP can help you build a powerful text-to-image service.
