# Molecular Fingerprinting
*author: shiyu*
<br />
## Description
Molecular Fingerprinting encodes a molecule, given as a Simplified Molecular Input Line Entry Specification (SMILES) string, as a fingerprint. A fingerprint can represent elements, atom pairs, functional groups, and other structural features, and fingerprints are commonly used for substructure and similarity searches in drug discovery.
This operator uses [RDKit](https://www.rdkit.org/docs/index.html) to generate the molecular fingerprint.
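The operator's internals are not shown in this README. As a rough sketch of the underlying RDKit calls, the snippet parses the example SMILES, builds a Morgan fingerprint, and compares it with a second molecule using Tanimoto similarity, the usual metric for fingerprint similarity search. The radius of 2, the 2048-bit length, the helper name `morgan_fp`, and the comparison molecule (benzenesulfonamide) are assumptions for illustration, not settings taken from the operator.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Assumption: radius 2 and 2048 bits; the operator's actual Morgan radius is not documented here.
morgan_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

def morgan_fp(smiles: str):
    """Hypothetical helper: parse a SMILES string and return its Morgan bit-vector fingerprint."""
    mol = Chem.MolFromSmiles(smiles)  # returns None if the SMILES cannot be parsed
    if mol is None:
        raise ValueError(f"invalid SMILES: {smiles!r}")
    return morgan_gen.GetFingerprint(mol)

query = morgan_fp("Cc1ccc(cc1)S(=O)(=O)N")      # p-toluenesulfonamide (the README's example)
candidate = morgan_fp("c1ccc(cc1)S(=O)(=O)N")   # benzenesulfonamide (illustrative comparison)

# Tanimoto similarity = on-bits shared by both / on-bits set in either, in the range [0, 1]
score = DataStructs.TanimotoSimilarity(query, candidate)
print(f"{query.GetNumOnBits()} of {query.GetNumBits()} bits set, similarity {score:.2f}")
```

Because the two molecules differ only by a methyl group, they share most of their on-bits, which is the property a similarity search relies on.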
<br />
## Code Example
An example that uses the Morgan algorithm to generate the fingerprint of the SMILES string 'Cc1ccc(cc1)S(=O)(=O)N'.
*Write the pipeline with explicit input/output name specifications:*
```python
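from towhee.dc2 import pipe, ops, DataCollection

# Read a SMILES string as 'smiles', fingerprint it with RDKit's Morgan algorithm
# (the factory default), and emit both the input and its fingerprint.
p = (
    pipe.input('smiles')
        .map('smiles', 'fingerprint', ops.molecular_fingerprinting.rdkit(algorithm='morgan'))
        .output('smiles', 'fingerprint')
)

DataCollection(p('Cc1ccc(cc1)S(=O)(=O)N')).show()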
```
<img src="./result2.png" height="75px"/>
<br />
## Factory Constructor
Create the operator via the following factory method:
***molecular_fingerprinting.rdkit( algorithm: str = 'morgan', size: int = 2048)***
**Parameters:**
***algorithm:*** *str*
The fingerprinting algorithm to use: 'morgan', 'daylight', 'ap' (atom pairs), or 'maccs'. Defaults to 'morgan'. See RDKit's [list of available fingerprints](https://www.rdkit.org/docs/GettingStartedInPython.html#list-of-available-fingerprints) for details on each.
***size:*** *int*
The length of the bit vector, used only by the 'morgan' and 'daylight' algorithms. Defaults to 2048.
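To show how `algorithm` and `size` interact, here is a minimal sketch of a dispatcher built on plain RDKit calls. The function name `fingerprint`, the Morgan radius of 2, and the specific RDKit functions chosen for each algorithm name are assumptions; the operator may call RDKit differently.

```python
from rdkit import Chem
from rdkit.Chem import MACCSkeys, rdFingerprintGenerator, rdMolDescriptors

def fingerprint(smiles: str, algorithm: str = "morgan", size: int = 2048):
    """Hypothetical dispatcher mirroring the factory's `algorithm` and `size` parameters."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"invalid SMILES: {smiles!r}")
    if algorithm == "morgan":
        # Circular (ECFP-like) fingerprint; radius 2 is an assumption
        return rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=size).GetFingerprint(mol)
    if algorithm == "daylight":
        # RDKit's Daylight-like topological (path-based) fingerprint
        return Chem.RDKFingerprint(mol, fpSize=size)
    if algorithm == "ap":
        # Hashed atom-pair fingerprint at RDKit's default length; `size` is not applied
        return rdMolDescriptors.GetHashedAtomPairFingerprintAsBitVect(mol)
    if algorithm == "maccs":
        # Fixed-length MACCS structural keys (167 bits); `size` is not applied
        return MACCSkeys.GenMACCSKeys(mol)
    raise ValueError(f"unsupported algorithm: {algorithm!r}")

fp = fingerprint("Cc1ccc(cc1)S(=O)(=O)N", algorithm="daylight", size=1024)
print(fp.GetNumBits(), fp.GetNumOnBits())
```

MACCS keys are a fixed dictionary of substructure patterns, so their length cannot be changed, which is consistent with `size` applying only to the hashed 'morgan' and 'daylight' fingerprints.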
<br />
## Interface
A molecular fingerprinting operator takes a SMILES string as input.
It uses the RDKit fingerprint algorithm selected by `algorithm` to generate the molecule's fingerprint.
**Parameters:**
***smiles:*** *str*
A Simplified Molecular Input Line Entry Specification (SMILES).
**Returns:** *bytes*
The molecular fingerprint.
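The README does not specify the byte layout of the returned fingerprint. Assuming it is a packed bit vector (one bit per fingerprint position, most significant bit first), the sketch below packs a '0'/'1' string, such as the output of RDKit's `ExplicitBitVect.ToBitString()`, into bytes and computes Tanimoto similarity directly on two byte strings. Both helper names are hypothetical and use only the Python standard library.

```python
def pack_bits(bit_string: str) -> bytes:
    """Hypothetical helper: pack a '0'/'1' string into bytes, most significant bit first."""
    if len(bit_string) == 0 or len(bit_string) % 8:
        raise ValueError("bit length must be a positive multiple of 8")
    return int(bit_string, 2).to_bytes(len(bit_string) // 8, "big")

def tanimoto_bytes(a: bytes, b: bytes) -> float:
    """Hypothetical helper: Tanimoto similarity of two equal-length packed bit vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    x, y = int.from_bytes(a, "big"), int.from_bytes(b, "big")
    union = bin(x | y).count("1")          # bits set in either fingerprint
    shared = bin(x & y).count("1")         # bits set in both fingerprints
    return shared / union if union else 1.0  # convention: two empty fingerprints are identical

fp1 = pack_bits("1100000000000001")
fp2 = pack_bits("1000000000000001")
print(fp1, fp2, tanimoto_bytes(fp1, fp2))
```

The multiple-of-8 check holds for the default 2048-bit 'morgan' and 'daylight' fingerprints, but a 167-bit MACCS fingerprint would need padding before packing this way.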
## More Resources
- [DNA Sequence Classification based on Milvus - Zilliz blog](https://zilliz.com/blog/dna-sequence-classification-based-on-milvus): Use Milvus, an open-source vector database, to recognize the gene families of DNA sequences using less space while achieving higher accuracy.
- [Accelerating New Drug Discovery - Zilliz blog](https://zilliz.com/blog/molecular-structure-similarity-with-milvus): How to run molecular structure similarity analysis in Milvus.
- [Introduction to Milvus Vector Database - Zilliz blog](https://zilliz.com/learn/introduction-to-milvus-vector-database): Zilliz tells the story of building the world's very first open-source vector database, Milvus: its beginnings, its pivotal moments, and its future.
- [The 2024 Playbook: Top Use Cases for Vector Search - Zilliz blog](https://zilliz.com/learn/top-use-cases-for-vector-search): An exploration of vector search technologies and their most popular use cases.
- [Emerging Trends in Vector Database Research and Development - Zilliz blog](https://zilliz.com/blog/emerging-trends-in-vector-database-research-and-development): This post discusses the development and anticipated future of vector databases from both technical and practical perspectives, focusing on cost-efficiency and business requirements.