# Molecular Fingerprinting

*author: shiyu*
## Description

Molecular Fingerprinting encodes a molecule written in the Simplified Molecular Input Line Entry Specification (SMILES) as a fingerprint. The fingerprint can represent elements, atom pairs, functional groups, and similar features, and is often used for substructure and similarity searches in drug discovery. This operator uses [RDKit](https://www.rdkit.org/docs/index.html) to generate the molecular fingerprint.
## Code Example

An example that uses the Morgan algorithm to generate a fingerprint of the SMILES string 'Cc1ccc(cc1)S(=O)(=O)N'.

*Write a pipeline with explicit input/output name specifications:*

```python
from towhee.dc2 import pipe, ops, DataCollection

# Build a pipeline that maps the 'smiles' column to a 'fingerprint' column.
p = (
    pipe.input('smiles')
        .map('smiles', 'fingerprint', ops.molecular_fingerprinting.rdkit())
        .output('smiles', 'fingerprint')
)

# Run the pipeline on one SMILES string and display the result.
DataCollection(p('Cc1ccc(cc1)S(=O)(=O)N')).show()
```
## Factory Constructor

Create the operator via the following factory method:

***molecular_fingerprinting.rdkit(algorithm: str = 'morgan', size: int = 2048)***

**Parameters:**

***algorithm:*** *str*

The fingerprinting algorithm to use: one of 'morgan', 'daylight', 'ap', or 'maccs'. Defaults to 'morgan'. See RDKit's [list of available fingerprints](https://www.rdkit.org/docs/GettingStartedInPython.html#list-of-available-fingerprints).

***size:*** *int*

The bit vector size, used only by the 'morgan' and 'daylight' algorithms. Defaults to 2048.
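To make the two parameters more concrete, here is a minimal sketch of how an algorithm name and a bit vector size might translate into direct RDKit calls. It is not the operator's actual implementation: the helper `fingerprint_bits` is hypothetical, the Morgan radius of 2 is an assumed value, and mapping 'daylight' to RDKit's Daylight-like topological fingerprint is also an assumption.

```python
from rdkit import Chem
from rdkit.Chem import MACCSkeys, rdFingerprintGenerator


def fingerprint_bits(smiles: str, algorithm: str = 'morgan', size: int = 2048):
    """Hypothetical helper: return an RDKit bit vector for a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f'Could not parse SMILES: {smiles!r}')
    if algorithm == 'morgan':
        # radius=2 is an assumption; the operator's radius is not documented here.
        gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=size)
        return gen.GetFingerprint(mol)
    if algorithm == 'daylight':
        # Assumed mapping: RDKit's Daylight-like topological fingerprint.
        gen = rdFingerprintGenerator.GetRDKitFPGenerator(fpSize=size)
        return gen.GetFingerprint(mol)
    if algorithm == 'maccs':
        # MACCS keys have a fixed length, so `size` is ignored.
        return MACCSkeys.GenMACCSKeys(mol)
    raise ValueError(f'Unsupported algorithm in this sketch: {algorithm!r}')


smiles = 'Cc1ccc(cc1)S(=O)(=O)N'
for name in ('morgan', 'daylight', 'maccs'):
    fp = fingerprint_bits(smiles, name, size=1024)
    # Print the total number of bits and how many of them are set.
    print(name, fp.GetNumBits(), fp.GetNumOnBits())
```

In this sketch the MACCS branch ignores `size` because MACCS keys are a fixed set, which is consistent with the note that `size` applies only to the hashed 'morgan' and 'daylight' fingerprints.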
## Interface

A molecular fingerprinting operator takes a SMILES string as input. It uses the RDKit algorithm selected by name in the constructor to generate the fingerprint of that SMILES string.

**Parameters:**

***smiles:*** *str*

A Simplified Molecular Input Line Entry Specification (SMILES) string.

**Returns:** *bytes*

The molecular fingerprint.
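Because the operator returns the fingerprint as bytes, a similarity search needs a way to compare two such values. The sketch below computes Tanimoto similarity in plain Python. It assumes that the bytes are a packed bit vector with one bit per fingerprint position, which this document does not specify; if the operator uses a different serialization, decode it first. The function name `tanimoto` is illustrative.

```python
def tanimoto(fp_a: bytes, fp_b: bytes) -> float:
    """Tanimoto similarity of two equal-length packed bit vectors."""
    if len(fp_a) != len(fp_b):
        raise ValueError('Fingerprints must have the same length')
    # Interpret each byte string as one large integer so that bitwise
    # operators act on the whole fingerprint at once.
    a = int.from_bytes(fp_a, 'big')
    b = int.from_bytes(fp_b, 'big')
    union = bin(a | b).count('1')
    if union == 0:
        # Two empty fingerprints: define the similarity as 0.0.
        return 0.0
    return bin(a & b).count('1') / union


# Toy 8-bit fingerprints: 0b00001111 and 0b00000011 share 2 of 4 set bits.
print(tanimoto(b'\x0f', b'\x03'))  # 0.5
```

A score of 1.0 means the two fingerprints set exactly the same bits, and 0.0 means they share none.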