# Text Loader

*author: shiyu22*

<br />


### Description

**Text loader** is used to load text files. It supports loading data from a URL or a local file path (supported file formats: .md and .txt).
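The description implies a simple dispatch rule: treat web addresses as URLs, and treat anything else as a local path that must end in .md or .txt. The sketch below illustrates that rule only. `classify_source` and `SUPPORTED_EXTENSIONS` are hypothetical names, and counting only `http`/`https` schemes as URLs is an assumption, not something taken from the operator's source.

```Python
from pathlib import Path
from urllib.parse import urlparse

# Local file formats listed in the description (hypothetical constant).
SUPPORTED_EXTENSIONS = {".md", ".txt"}


def classify_source(data_src: str) -> str:
    """Return "url" for http(s) sources and "file" for supported local paths.

    Hypothetical helper for illustration; not part of the towhee API.
    Assumes only http/https schemes are treated as URLs.
    """
    if urlparse(data_src).scheme in ("http", "https"):
        return "url"
    # Compare extensions case-insensitively so README.MD is accepted too.
    suffix = Path(data_src).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file format {suffix!r}; expected .md or .txt")
    return "file"


print(classify_source("https://github.com/towhee-io/towhee/blob/main/README.md"))  # url
print(classify_source("docs/intro.md"))  # file
```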

<br />


### Code Example

```Python
from towhee import pipe, ops, DataCollection

p = (
    pipe.input('url')
        .map('url', 'text', ops.text_loader())
        .output('url', 'text')
    )

res = p('https://github.com/towhee-io/towhee/blob/main/README.md')
DataCollection(res).show()
```

<img src="./result.png" alt="result" height="80px"/>

<br />


### Factory Constructor

Create the operator via the following factory method:

***towhee.text_loader()***

<br />


### Interface

The operator loads the document from the given path or URL and returns its text.

**Parameters:**

***data_src***: str

Path or URL of the document to be loaded.


**Return**: str

The text content of the loaded document.
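To see the interface in isolation, here is a minimal plain-Python sketch of a function with the same contract (`data_src: str` in, `str` out). It is not the operator's implementation. `load_text` is a hypothetical name, and the sketch assumes UTF-8 content and only `http`/`https` URLs, which it fetches with the standard library's `urllib.request.urlopen`.

```Python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def load_text(data_src: str, encoding: str = "utf-8") -> str:
    """Load a document from a URL or a local path and return its text.

    Hypothetical stand-in mirroring the operator's interface; not towhee's code.
    """
    if urlparse(data_src).scheme in ("http", "https"):
        # Remote source: fetch the raw response bytes and decode them.
        with urlopen(data_src) as response:
            return response.read().decode(encoding)
    # Local source: read the whole file as a single string.
    return Path(data_src).read_text(encoding=encoding)


if __name__ == "__main__":
    import tempfile

    # Write a small Markdown file to a temporary directory, then load it back.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.md"
        path.write_text("# Hello\n", encoding="utf-8")
        print(repr(load_text(str(path))))  # '# Hello\n'
```

With this sketch, a `github.com/.../blob/...` address like the one in the code example returns GitHub's rendered HTML page; the raw Markdown is served from `raw.githubusercontent.com`.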


# More Resources

- [Experiment with 5 Chunking Strategies via LangChain for LLM - Zilliz blog](https://zilliz.com/blog/experimenting-with-different-chunking-strategies-via-langchain): Explore the complexities of text chunking in retrieval augmented generation applications and learn how different chunking strategies impact the same piece of data.
- [Massive Text Embedding Benchmark (MTEB)](https://zilliz.com/glossary/massive-text-embedding-benchmark-(mteb)): A standardized way to evaluate text embedding models across a range of tasks and languages, leading to better text embedding models for your app.
- [ChatGPT retrieval plugin with Zilliz and Milvus - Zilliz blog](https://zilliz.com/blog/chatgpt-retrieval-plugin-zilliz-milvus): Milvus and Zilliz are among the preferred vector databases for storing these embeddings, which can then be accessed through the ChatGPT retrieval plugin.
- [Tutorial: Diving into Text Embedding Models | Zilliz Webinar](https://zilliz.com/event/tutorial-text-embedding-models): Register for a free webinar diving into text embedding models in a presentation and tutorial.
- [An LLM Powered Text to Image Prompt Generation with Milvus - Zilliz blog](https://zilliz.com/blog/llm-powered-text-to-image-prompt-generation-with-milvus): An interesting LLM project powered by the Milvus vector database for generating more efficient text-to-image prompts.
- [A Beginner's Guide to Website Chunking and Embedding for Your RAG Applications - Zilliz blog](https://zilliz.com/learn/beginner-guide-to-website-chunking-and-embedding-for-your-genai-applications): This post explains how to extract content from a website and use it as context for LLMs in a RAG application. However, before doing so, we need to understand website fundamentals.
- [Text as Data, From Anywhere to Anywhere - Zilliz blog](https://zilliz.com/blog/text-as-data-from-anywhere-to-anywhere): Whether you prefer a no-code or minimal-code approach, Airbyte and PyAirbyte offer robust solutions for integrating both structured and unstructured data. AJ Steers painted a good picture of the potential of these tools in revolutionizing data workflows.
- [From Text to Image: Fundamentals of CLIP - Zilliz blog](https://zilliz.com/blog/fundamentals-of-clip): Search algorithms rely on semantic similarity to retrieve the most relevant results. With the CLIP model, the semantics of texts and images can be connected in a high-dimensional vector space. Read this simple introduction to see how CLIP can help you build a powerful text-to-image service.