logo
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
Readme
Files and versions

1.8 KiB

Molecular Fingerprinting

author: shiyu


Desription

Molecular Fingerprinting encodes a Simplified Molecular Input Line Entry Specification (SMILES) as a fingerprint. The fingerprint can represent elements, atom pairs, or functional groups, etc., and are often used for substructure searches and similarity searches in drug discovery.

This operator uses RDKit to generate the molecular fingerprint.


Code Example

An example that use the Morgan algorithm to generate a fingerprint of the molecular formula 'Cc1ccc(cc1)S(=O)(=O)N'.

Write a same pipeline with explicit inputs/outputs name specifications:

from towhee.dc2 import pipe, ops, DataCollection

p = (
    pipe.input('smiles')
        .map('smiles', 'fingerprint', ops.molecular_fingerprinting.rdkit())
        .output('smiles', 'fingerprint')
)

DataCollection(p('Cc1ccc(cc1)S(=O)(=O)N')).show()


Factory Constructor

Create the operator via the following factory method:

molecular_fingerprinting.rdkit( algorithm: str = 'morgan', size: int = 2048)

Parameters:

algorithm: str

Which algorithm to use for fingerprinting, including 'morgan', 'daylight', 'ap', 'maccs', defaluts to 'morgan', and there is the list of available fingerprints.

size: int

The bit vector size just for morgan and daylight algorithm, defaults to 2048.


Interface

An molecular fingerprinting operator takes a SMILES as input. It uses the RDKit specified by algorithm name to generate a SMILES fingerprint.

Parameters:

smiles: str

A Simplified Molecular Input Line Entry Specification (SMILES).

Returns: bytes

The molecular fingerprint.

1.8 KiB

Molecular Fingerprinting

author: shiyu


Desription

Molecular Fingerprinting encodes a Simplified Molecular Input Line Entry Specification (SMILES) as a fingerprint. The fingerprint can represent elements, atom pairs, or functional groups, etc., and are often used for substructure searches and similarity searches in drug discovery.

This operator uses RDKit to generate the molecular fingerprint.


Code Example

An example that use the Morgan algorithm to generate a fingerprint of the molecular formula 'Cc1ccc(cc1)S(=O)(=O)N'.

Write a same pipeline with explicit inputs/outputs name specifications:

from towhee.dc2 import pipe, ops, DataCollection

p = (
    pipe.input('smiles')
        .map('smiles', 'fingerprint', ops.molecular_fingerprinting.rdkit())
        .output('smiles', 'fingerprint')
)

DataCollection(p('Cc1ccc(cc1)S(=O)(=O)N')).show()


Factory Constructor

Create the operator via the following factory method:

molecular_fingerprinting.rdkit( algorithm: str = 'morgan', size: int = 2048)

Parameters:

algorithm: str

Which algorithm to use for fingerprinting, including 'morgan', 'daylight', 'ap', 'maccs', defaluts to 'morgan', and there is the list of available fingerprints.

size: int

The bit vector size just for morgan and daylight algorithm, defaults to 2048.


Interface

An molecular fingerprinting operator takes a SMILES as input. It uses the RDKit specified by algorithm name to generate a SMILES fingerprint.

Parameters:

smiles: str

A Simplified Molecular Input Line Entry Specification (SMILES).

Returns: bytes

The molecular fingerprint.