Molecular Fingerprinting
author: shiyu
Description
Molecular Fingerprinting encodes a molecule, given as a Simplified Molecular Input Line Entry Specification (SMILES) string, as a fingerprint. A fingerprint can represent elements, atom pairs, functional groups, and similar features, and is often used for substructure and similarity searches in drug discovery.
This operator uses RDKit to generate the molecular fingerprint.
Code Example
The following example uses the Morgan algorithm to generate a fingerprint for the SMILES string 'Cc1ccc(cc1)S(=O)(=O)N'.
The pipeline names its inputs and outputs explicitly:
from towhee.dc2 import pipe, ops, DataCollection
p = (
    pipe.input('smiles')
        .map('smiles', 'fingerprint', ops.molecular_fingerprinting.rdkit())
        .output('smiles', 'fingerprint')
)
DataCollection(p('Cc1ccc(cc1)S(=O)(=O)N')).show()
Factory Constructor
Create the operator via the following factory method:
molecular_fingerprinting.rdkit(algorithm: str = 'morgan', size: int = 2048)
Parameters:
algorithm: str
The fingerprinting algorithm to use: 'morgan', 'daylight', 'ap', or 'maccs'. Defaults to 'morgan'. See RDKit's list of available fingerprints for details.
size: int
The bit-vector size. Applies only to the 'morgan' and 'daylight' algorithms. Defaults to 2048.
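To make the two parameters concrete, here is a rough sketch of how the four algorithm names could map onto RDKit calls. It is not the operator's actual code: the helper name `sketch_fingerprint`, the Morgan radius of 2, and the mapping of 'daylight' to RDKit's Daylight-like topological fingerprint are all assumptions made for illustration.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, MACCSkeys, rdMolDescriptors


def sketch_fingerprint(smiles: str, algorithm: str = 'morgan', size: int = 2048):
    """Hypothetical helper: return an RDKit bit vector for a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f'could not parse SMILES: {smiles!r}')

    if algorithm == 'morgan':
        # Radius 2 is an assumption; the README does not state a radius.
        return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=size)
    if algorithm == 'daylight':
        # Assumed mapping: RDKit's Daylight-like topological fingerprint.
        return Chem.RDKFingerprint(mol, fpSize=size)
    if algorithm == 'ap':
        # Atom pairs: `size` is not applied here, so RDKit's default length is used.
        return rdMolDescriptors.GetHashedAtomPairFingerprintAsBitVect(mol)
    if algorithm == 'maccs':
        # MACCS keys have a fixed length, so `size` does not apply.
        return MACCSkeys.GenMACCSKeys(mol)
    raise ValueError(f'unsupported algorithm: {algorithm!r}')


fp = sketch_fingerprint('Cc1ccc(cc1)S(=O)(=O)N', algorithm='morgan', size=2048)
print(fp.GetNumBits(), fp.GetNumOnBits())
```

In this sketch `size` only reaches the 'morgan' and 'daylight' branches, which mirrors the parameter description above.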
Interface
A molecular fingerprinting operator takes a SMILES string as input and uses the RDKit algorithm named by the algorithm parameter to generate the molecule's fingerprint.
Parameters:
smiles: str
A Simplified Molecular Input Line Entry Specification (SMILES).
Returns: bytes
The molecular fingerprint.
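Because the operator returns the fingerprint as bytes, a similarity search reduces to comparing bit patterns. The sketch below computes the Tanimoto coefficient between two byte strings using only the standard library. The function name `tanimoto` is hypothetical, the one-byte inputs are toy values rather than real fingerprints, and returning 0.0 for two all-zero fingerprints is a convention chosen here.

```python
def tanimoto(fp_a: bytes, fp_b: bytes) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits in either fingerprint."""
    if len(fp_a) != len(fp_b):
        raise ValueError('fingerprints must have the same length')
    a = int.from_bytes(fp_a, 'big')
    b = int.from_bytes(fp_b, 'big')
    union = bin(a | b).count('1')
    if union == 0:
        # Both fingerprints are empty; 0.0 is a convention chosen for this sketch.
        return 0.0
    return bin(a & b).count('1') / union


# Toy one-byte "fingerprints": 2 shared bits out of 4 set in either.
print(tanimoto(bytes([0b00001111]), bytes([0b00000011])))  # 0.5
```

A vector database such as Milvus would normally perform this comparison at scale; the function only illustrates the metric on the operator's output type.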
More Resources
- DNA Sequence Classification based on Milvus - Zilliz blog: Use Milvus, an open-source vector database, to recognize gene families of DNA sequences. Less space but higher accuracy.
- Accelerating New Drug Discovery - Zilliz blog: How to run molecular structure similarity analysis in Milvus
- Introduction to Milvus Vector Database - Zilliz blog: Zilliz tells the story about building the world's very first open-source vector database, Milvus. The start, the pivotal, and the future.
- The 2024 Playbook: Top Use Cases for Vector Search - Zilliz blog: An exploration of vector search technologies and their most popular use cases.
- Emerging Trends in Vector Database Research and Development - Zilliz blog: This post discusses the development and anticipated future of vector databases from both technical and practical perspectives, focusing on cost-efficiency and business requirements.