readthedocs

author: junjie.jiang

Description

Get the list of document URLs for a single Read the Docs project.

Code Example

Example


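from towhee import DataLoader, pipe, ops

# Embedding pipeline: load each page, split it into sentences, then embed and normalize.
p = (
    pipe.input('url')
    .map('url', 'text', ops.text_loader())  # load the page text from the URL
    .flat_map('text', 'sentence', ops.text_splitter())  # one row per sentence
    .map('sentence', 'embedding', ops.sentence_embedding.transformers(model_name='all-MiniLM-L6-v2'))
    .map('embedding', 'embedding', ops.towhee.np_normalize())  # normalize each embedding
    .output('embedding')
)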



# Run the pipeline on one URL at a time and print the resulting embeddings.
for data in DataLoader(ops.data_source.readthedocs('https://towhee.readthedocs.io/en/latest/', include='html', exclude='index.html')):
    print(p(data).to_list(kv_format=True))

# Batch mode: load 10 URLs per iteration and run the pipeline on the whole batch.
for data in DataLoader(ops.data_source.readthedocs('https://towhee.readthedocs.io/en/latest/', include='html', exclude='index.html'), batch_size=10):
    p.batch(data)

Parameters:

page_prefix: str

The root path of the pages. Crawled links are generally relative paths, so each complete URL is obtained by joining the root path and the relative path.
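
The joining of root path and relative path can be illustrated with the standard library's `urllib.parse.urljoin`. This is a minimal sketch of the idea, not the operator's own code; the helper name `to_absolute` is hypothetical.

```python
from urllib.parse import urljoin


def to_absolute(page_prefix: str, relative_link: str) -> str:
    """Join a root path and a relative link into a complete URL (hypothetical helper)."""
    # A trailing slash on the prefix matters: without it, urljoin would
    # replace the last path segment instead of appending to it.
    if not page_prefix.endswith('/'):
        page_prefix += '/'
    return urljoin(page_prefix, relative_link)


full_url = to_absolute('https://towhee.readthedocs.io/en/latest', 'index.html')
print(full_url)
```

Normalizing the trailing slash first is a design choice in this sketch, so that prefixes written with or without it resolve to the same URL.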

index_page: str

The main page, which contains links to all other pages. If None, page_prefix is used.

Example: https://towhee.readthedocs.io/en/latest/
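
As a rough illustration of how links might be gathered from such a main page, the sketch below parses a small HTML snippet with the standard library's `html.parser`. The class `LinkCollector`, the helpers `collect_links` and `resolve_index_page`, and the sample HTML are all hypothetical; the operator's actual crawling logic may differ.

```python
from html.parser import HTMLParser
from typing import List, Optional


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag (illustrative only)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)


def collect_links(index_html: str) -> List[str]:
    """Return all link targets found in the given HTML, in document order."""
    parser = LinkCollector()
    parser.feed(index_html)
    return parser.links


def resolve_index_page(page_prefix: str, index_page: Optional[str] = None) -> str:
    # Mirrors the documented default: fall back to page_prefix when index_page is None.
    return page_prefix if index_page is None else index_page


# Made-up HTML standing in for a downloaded main page.
sample_html = (
    '<ul>'
    '<li><a href="get_started.html">Start</a></li>'
    '<li><a href="api/index.html">API</a></li>'
    '</ul>'
)
print(collect_links(sample_html))
print(resolve_index_page('https://towhee.readthedocs.io/en/latest/'))
```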

include: Union[List[str], str]

Only URLs that match this condition are kept.

exclude: Union[List[str], str]

URLs that match this condition are filtered out.
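
The combined effect of include and exclude can be sketched as below. This assumes the conditions are regular expressions tested with `re.search`, and that a URL is kept when it matches any include pattern and no exclude pattern; both points are assumptions rather than documented behavior, and `filter_urls` is a hypothetical helper.

```python
import re
from typing import List, Optional, Union

Patterns = Optional[Union[List[str], str]]


def _as_list(patterns: Patterns) -> List[str]:
    """Normalize None, a single string, or a list of strings to a list."""
    if patterns is None:
        return []
    return [patterns] if isinstance(patterns, str) else list(patterns)


def filter_urls(urls: List[str], include: Patterns = None, exclude: Patterns = None) -> List[str]:
    """Keep URLs matching any include pattern and no exclude pattern (assumed semantics)."""
    include_list = _as_list(include)
    exclude_list = _as_list(exclude)
    kept = []
    for url in urls:
        # Assumption: an empty include list means "keep everything".
        if include_list and not any(re.search(p, url) for p in include_list):
            continue
        if any(re.search(p, url) for p in exclude_list):
            continue
        kept.append(url)
    return kept


urls = ['index.html', 'get_started.html', '_static/style.css']
print(filter_urls(urls, include='html', exclude='index.html'))
```

With the same arguments as the example above (`include='html'`, `exclude='index.html'`), this sketch keeps HTML pages and drops the index page itself.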

More Resources

RAG Without OpenAI: BentoML, OctoAI and Milvus - Zilliz blog: In this tutorial we will use BentoML to serve embeddings, OctoAI to get the LLM and Milvus as our vector database.
Building RAG with Llama3, Ollama, DSPy, and Milvus - Zilliz blog: In this article, we aim to guide readers through constructing an RAG system using four key technologies: Llama3, Ollama, DSPy, and Milvus. First, let's understand what they are.
An LLM Powered Text to Image Prompt Generation with Milvus - Zilliz blog: An interesting LLM project powered by the Milvus vector database for generating more efficient text-to-image prompts.
Vectorizing and Querying EPUB Content with the Unstructured and Milvus - Zilliz blog: In this post, we explore the vectorization and retrieval of EPUB data using Milvus and the Unstructured framework, offering developers actionable insights for enhancing LLM performance.
Vectorizing PDFs - Ingesting PDFs into Vector Databases with Milvus and Zilliz - Zilliz blog: You will learn how Zilliz Cloud Pipeline transforms PDF data into a format ready for LLMs to use in semantic search tasks. Finally, we will conduct data retrieval using vector search.
Training Text Embeddings with Jina AI - Zilliz blog: In a recent talk by Bo Wang, he discussed the creation of Jina text embeddings for modern vector search and RAG systems. He also shared methodologies for training embedding models that effectively encode extensive information, along with guidance o
