# readthedocs

*author: junjie.jiang*
## Description

To get the list of documents for a single Read the Docs project.
## Code Example

```python
from towhee import DataLoader, pipe, ops

p = (
    pipe.input('url')
        .map('url', 'text', ops.text_loader())
        .flat_map('text', 'sentence', ops.text_splitter())
        .map('sentence', 'embedding', ops.sentence_embedding.transformers(model_name='all-MiniLM-L6-v2'))
        .map('embedding', 'embedding', ops.towhee.np_normalize())
        .output('embedding')
)

for data in DataLoader(ops.data_source.readthedocs('https://towhee.readthedocs.io/en/latest/', include='html', exclude='index.html')):
    print(p(data).to_list(kv_format=True))

# batch
for data in DataLoader(ops.data_source.readthedocs('https://towhee.readthedocs.io/en/latest/', include='html', exclude='index.html'), batch_size=10):
    p.batch(data)
```

**Parameters:**

***page_prefix:*** *str*

The root path of the pages. The crawled links are generally relative paths, so the complete URL is obtained by joining this root path with each relative path.

***index_page:*** *str*

The main page containing links to all other pages. If None, `page_prefix` is used.
Example: https://towhee.readthedocs.io/en/latest/

***include:*** *Union[List[str], str]*

Only keep URLs that satisfy this condition.

***exclude:*** *Union[List[str], str]*

Filter out URLs that satisfy this condition.
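To make the `include`/`exclude` semantics concrete, here is a minimal sketch of how such URL filtering could work. The `filter_urls` helper below is hypothetical (it is not the operator's actual implementation, and the operator's matching rules may differ); it simply treats each condition as a substring pattern, keeps URLs matching any `include` pattern, and drops URLs matching any `exclude` pattern.

```python
from typing import List, Optional, Union

def filter_urls(urls: List[str],
                include: Optional[Union[List[str], str]] = None,
                exclude: Optional[Union[List[str], str]] = None) -> List[str]:
    """Hypothetical sketch of include/exclude URL filtering by substring match."""
    def as_list(cond):
        # Normalize None / str / list into a list of patterns.
        if cond is None:
            return []
        return [cond] if isinstance(cond, str) else list(cond)

    inc, exc = as_list(include), as_list(exclude)
    kept = []
    for url in urls:
        if inc and not any(p in url for p in inc):
            continue  # fails the include condition
        if any(p in url for p in exc):
            continue  # matches an exclude condition
        kept.append(url)
    return kept

urls = [
    'https://towhee.readthedocs.io/en/latest/index.html',
    'https://towhee.readthedocs.io/en/latest/user_guide.html',
    'https://towhee.readthedocs.io/en/latest/changelog.txt',
]
# With include='html' and exclude='index.html', only user_guide.html survives.
print(filter_urls(urls, include='html', exclude='index.html'))
```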