# Molecular Fingerprinting
*author: shiyu*
## Desription
Molecular Fingerprinting encodes a Simplified Molecular Input Line Entry Specification (SMILES) as a fingerprint. The fingerprint can represent elements, atom pairs, or functional groups, etc., and are often used for substructure searches and similarity searches in drug discovery.
This operator uses [RDKit](https://www.rdkit.org/docs/index.html) to generate the molecular fingerprint.
## Code Example
> Before running the following code, you need to install rdkit, refer to https://www.rdkit.org/docs/Install.html.
>
> ```
> # install rdkit with conda
> $ conda install -c conda-forge rdkit
> ```
An example that use the Morgan algorithm to generate a fingerprint of the molecular formula 'Cc1ccc(cc1)S(=O)(=O)N'.
*Write the pipeline in simplified style:*
```python
import towhee
towhee.dc(['Cc1ccc(cc1)S(=O)(=O)N']) \
.molecular_fingerprinting.rdkit() \
.show()
```
*Write a same pipeline with explicit inputs/outputs name specifications:*
```python
import towhee
towhee.dc['smiles'](['Cc1ccc(cc1)S(=O)(=O)N']) \
.molecular_fingerprinting.rdkit['smiles', 'fingerprint']() \
.show()
```
## Factory Constructor
Create the operator via the following factory method:
***molecular_fingerprinting.rdkit( algorithm: str = 'morgan', size: int = 2048)***
**Parameters:**
***algorithm:*** *str*
Which algorithm to use for fingerprinting, including 'morgan', 'daylight', 'ap', 'maccs', defaluts to 'morgan', and there is the [list of available fingerprints](https://www.rdkit.org/docs/GettingStartedInPython.html#list-of-available-fingerprints).
***size:*** *int*
The bit vector size just for morgan and daylight algorithm, defaults to 2048.
## Interface
An molecular fingerprinting operator takes a SMILES as input.
It uses the RDKit specified by algorithm name to generate a SMILES fingerprint.
**Parameters:**
***smiles:*** *str*
A Simplified Molecular Input Line Entry Specification (SMILES).
**Returns:** *bytes*
The molecular fingerprint.