Text Loader
author: shiyu22
Description
Text loader loads the contents of a text file. It supports loading data from a URL or a file path (supported file formats: .md and .txt).
Code Example
from towhee import pipe, ops, DataCollection
p = (
    pipe.input('url')
        .map('url', 'text', ops.text_loader())
        .output('url', 'text')
)
res = p('https://github.com/towhee-io/towhee/blob/main/README.md')
DataCollection(res).show()

Factory Constructor
Create the operator via the following factory method:
towhee.text_loader()
Interface
The operator loads the document, then splits the incoming text and returns chunks.
Parameters:
data_src: str
Path or URL of the document to be loaded.
Return: str
String data with the text.
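To make the interface concrete, here is a minimal standard-library sketch of a loader with the same shape: one `data_src` string in, one text string out. It is not the operator's actual implementation. The names `load_text`, `is_url`, and `SUPPORTED_SUFFIXES`, the UTF-8 default, and the choice to apply the .md/.txt check only to local paths are all assumptions made for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen

# Hypothetical names for illustration; not part of towhee.
SUPPORTED_SUFFIXES = {".md", ".txt"}


def is_url(data_src: str) -> bool:
    """Return True when data_src looks like an http(s) URL."""
    return urlparse(data_src).scheme in ("http", "https")


def load_text(data_src: str, encoding: str = "utf-8") -> str:
    """Load text from a URL or a local .md/.txt file and return it as one string."""
    if is_url(data_src):
        # Fetch the response body over the network and decode it.
        with urlopen(data_src) as response:
            return response.read().decode(encoding)

    # Otherwise treat data_src as a local file path.
    path = Path(data_src)
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported file format: {path.suffix!r}")
    return path.read_text(encoding=encoding)
```

In this sketch, `load_text('notes.md')` returns the file's contents unchanged, and a URL that points at an HTML page returns the page markup rather than rendered text.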
More Resources
- Experiment with 5 Chunking Strategies via LangChain for LLM - Zilliz blog: Explore the complexities of text chunking in retrieval augmented generation applications and learn how different chunking strategies impact the same piece of data.
- Massive Text Embedding Benchmark (MTEB): A standardized way to evaluate text embedding models across a range of tasks and languages, leading to better text embedding models for your app.
- ChatGPT retrieval plugin with Zilliz and Milvus - Zilliz blog: Milvus and Zilliz are among the preferred vector databases for storing these embeddings, which can be accessed with the ChatGPT retrieval plugin.
- Tutorial: Diving into Text Embedding Models | Zilliz Webinar: Register for a free webinar diving into text embedding models in a presentation and tutorial
- An LLM Powered Text to Image Prompt Generation with Milvus - Zilliz blog: An interesting LLM project powered by the Milvus vector database for generating more efficient text-to-image prompts.
- A Beginner's Guide to Website Chunking and Embedding for Your RAG Applications - Zilliz blog: This post explains how to extract content from a website and use it as context for LLMs in a RAG application. However, before doing so, we need to understand website fundamentals.
- Text as Data, From Anywhere to Anywhere - Zilliz blog: Whether you prefer a no-code or minimal-code approach, Airbyte and PyAirbyte offer robust solutions for integrating both structured and unstructured data. AJ Steers painted a good picture of the potential of these tools in revolutionizing data workflows.
- From Text to Image: Fundamentals of CLIP - Zilliz blog: Search algorithms rely on semantic similarity to retrieve the most relevant results. With the CLIP model, the semantics of texts and images can be connected in a high-dimensional vector space. Read this simple introduction to see how CLIP can help you build a powerful text-to-image service.