skchem.descriptors package

Submodules

skchem.descriptors.atom module

## skchem.descriptors.atom

Module specifying atom-based descriptor generators.

class skchem.descriptors.atom.AtomFeaturizer(features='all', **kwargs)[source]

Bases: skchem.base.AtomTransformer, skchem.base.Featurizer

features
minor_axis
name
class skchem.descriptors.atom.DistanceTransformer(max_atoms=100, **kwargs)[source]

Bases: skchem.base.AtomTransformer, skchem.base.Featurizer

Base class implementing Distance Matrix transformers.

Concrete classes inheriting from this should implement _transform_mol.
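The subclassing contract described above can be sketched in plain Python. This is a minimal illustration only: `ToyDistanceTransformer` and `ToyChainDistance` are hypothetical stand-ins, not skchem classes, the "molecule" is just a string of atom symbols, and using `max_atoms` as a cap on matrix size is an assumption the reference does not confirm.

```python
from abc import ABC, abstractmethod


class ToyDistanceTransformer(ABC):
    """Hypothetical stand-in for a distance-matrix base class (not skchem code)."""

    def __init__(self, max_atoms=100):
        # Assumption: max_atoms caps the size of the matrix that is produced.
        self.max_atoms = max_atoms

    @abstractmethod
    def _transform_mol(self, mol):
        """Return a square distance matrix (list of lists) for one molecule."""

    def transform(self, mols):
        # Shared driver: delegate each molecule to the subclass hook.
        return [self._transform_mol(mol) for mol in mols]


class ToyChainDistance(ToyDistanceTransformer):
    """Treats a molecule as an unbranched chain of atoms (illustration only)."""

    def _transform_mol(self, mol):
        n = min(len(mol), self.max_atoms)
        # In a chain, the bond distance between atoms i and j is |i - j|.
        return [[abs(i - j) for j in range(n)] for i in range(n)]


matrices = ToyChainDistance(max_atoms=100).transform(["CCC", "CC"])
```

The base class owns the loop over molecules, so a concrete class only has to say how one molecule becomes one matrix.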

minor_axis
transform(mols)[source]
class skchem.descriptors.atom.GraphDistanceTransformer(max_atoms=100, **kwargs)[source]

Bases: skchem.descriptors.atom.DistanceTransformer

Transformer class for generating Graph distance matrices.
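As a rough illustration of what a graph distance matrix contains, the sketch below counts the bonds on the shortest path between every pair of atoms with a breadth-first search. The function name `graph_distance_matrix` and the bond-list input are assumptions made for this example; the reference does not say how skchem computes the matrix.

```python
from collections import deque


def graph_distance_matrix(n_atoms, bonds):
    """Shortest-path bond counts between all atom pairs; -1 if disconnected."""
    neighbours = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    matrix = []
    for start in range(n_atoms):
        dist = [-1] * n_atoms
        dist[start] = 0
        queue = deque([start])
        # Breadth-first search visits atoms in order of increasing bond count.
        while queue:
            atom = queue.popleft()
            for nxt in neighbours[atom]:
                if dist[nxt] == -1:
                    dist[nxt] = dist[atom] + 1
                    queue.append(nxt)
        matrix.append(dist)
    return matrix


# Propane (CCC): atoms 0-1-2 in a chain.
propane = graph_distance_matrix(3, [(0, 1), (1, 2)])
# Cyclopropane: closing the ring makes every pair directly bonded.
cyclopropane = graph_distance_matrix(3, [(0, 1), (1, 2), (2, 0)])
```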

name()[source]
class skchem.descriptors.atom.SpacialDistanceTransformer(max_atoms=100, **kwargs)[source]

Bases: skchem.descriptors.atom.DistanceTransformer

Transformer class for generating 3D distance matrices.
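A 3D distance matrix holds the Euclidean distance between every pair of atomic coordinates. The sketch below shows the calculation on hand-written coordinates; `spatial_distance_matrix` is a hypothetical helper, and how skchem obtains conformer coordinates is not covered by this reference.

```python
import math


def spatial_distance_matrix(coords):
    """Pairwise Euclidean distances between 3D points (list of lists)."""
    return [[math.dist(p, q) for q in coords] for p in coords]


# Three made-up atom positions, in arbitrary length units.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]
distances = spatial_distance_matrix(coords)
```

The result is symmetric with a zero diagonal, like the graph distance matrix, but its entries are real-valued lengths rather than bond counts.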

name()[source]
skchem.descriptors.atom.atomic_mass(a)[source]

Atomic mass of atom

skchem.descriptors.atom.atomic_number(a)[source]

Atomic number of atom

skchem.descriptors.atom.crippen_log_p_contrib(a)[source]

Hacky way of getting logP contribution.

skchem.descriptors.atom.crippen_molar_refractivity_contrib(a)[source]

Hacky way of getting molar refractivity contribution.

skchem.descriptors.atom.electronegativity(a)[source]
skchem.descriptors.atom.element(a)[source]

Return the element

skchem.descriptors.atom.explicit_valence(a)[source]

Explicit valence of atom

skchem.descriptors.atom.first_ionization(a)[source]
skchem.descriptors.atom.formal_charge(a)[source]

Formal charge of atom

skchem.descriptors.atom.gasteiger_charge(a, force_calc=False)[source]

Hacky way of getting Gasteiger charge.

skchem.descriptors.atom.group(a)[source]
skchem.descriptors.atom.implicit_valence(a)[source]

Implicit valence of atom

skchem.descriptors.atom.is_aromatic(a)[source]

Whether the atom is aromatic

skchem.descriptors.atom.is_element(a, symbol='C')[source]

Whether the atom is of a given element

skchem.descriptors.atom.is_h_acceptor(a)[source]

Is an H acceptor?

skchem.descriptors.atom.is_h_donor(a)[source]

Is an H donor?

skchem.descriptors.atom.is_hetero(a)[source]

Is a heteroatom?

skchem.descriptors.atom.is_hybridized(a, hybrid_type=rdkit.Chem.rdchem.HybridizationType.SP3)[source]

Hybridized as type hybrid_type, default SP3

skchem.descriptors.atom.is_in_ring(a)[source]

Whether the atom is in a ring

skchem.descriptors.atom.labute_asa_contrib(a)[source]

Hacky way of getting accessible surface area contribution.

skchem.descriptors.atom.num_explicit_hydrogens(a)[source]

Number of explicit hydrogens

skchem.descriptors.atom.num_hydrogens(a)[source]

Number of hydrogens

skchem.descriptors.atom.num_implicit_hydrogens(a)[source]

Number of implicit hydrogens

skchem.descriptors.atom.period(a)[source]
skchem.descriptors.atom.tpsa_contrib(a)[source]

Hacky way of getting total polar surface area contribution.

skchem.descriptors.atom.valence(a)[source]

Valence of atom

skchem.descriptors.chemaxon module

## skchem.descriptors.chemaxon

Module specifying ChemAxon-based descriptor generators.

class skchem.descriptors.chemaxon.ChemAxonAtomFeaturizer(features='optimal', **kwargs)[source]

Bases: skchem.descriptors.chemaxon.ChemAxonBaseFeaturizer, skchem.base.AtomTransformer, skchem.base.BatchTransformer

minor_axis
name
class skchem.descriptors.chemaxon.ChemAxonBaseFeaturizer(features='optimal', **kwargs)[source]

Bases: skchem.base.CLIWrapper, skchem.base.Featurizer

features
install_hint = ' Install ChemAxon from https://www.chemaxon.com. It requires a license, which can be freely obtained\nfor academics. '
monitor_progress(filename)[source]
validate_install()[source]
class skchem.descriptors.chemaxon.ChemAxonFeaturizer(features='optimal', **kwargs)[source]

Bases: skchem.descriptors.chemaxon.ChemAxonBaseFeaturizer, skchem.base.BatchTransformer, skchem.base.Transformer

columns
name
class skchem.descriptors.chemaxon.ChemAxonNMRPredictor(features='optimal', **kwargs)[source]

Bases: skchem.descriptors.chemaxon.ChemAxonBaseFeaturizer, skchem.base.BatchTransformer, skchem.base.AtomTransformer

features
minor_axis
monitor_progress(filename)[source]
name()[source]
transform(inp)[source]

skchem.descriptors.fingerprints module

## skchem.descriptors.fingerprints

Fingerprinting classes and associated functions are defined.

class skchem.descriptors.fingerprints.AtomPairFeaturizer(min_length=1, max_length=30, n_feats=2048, as_bits=False, use_chirality=False, **kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

Atom Pair Fingerprints, implemented by RDKit.

columns
name
class skchem.descriptors.fingerprints.ConnectivityInvariantsFeaturizer(include_ring_membership=True, **kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

Connectivity invariants fingerprints

columns
name
class skchem.descriptors.fingerprints.ErGFeaturizer(atom_types=0, fuzz_increment=0.3, min_path=1, max_path=15, **kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

Extended Reduced Graph Fingerprints.

Implemented in RDKit.

columns
name
class skchem.descriptors.fingerprints.FeatureInvariantsFeaturizer(**kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

Feature invariants fingerprints.

columns
name
class skchem.descriptors.fingerprints.MACCSFeaturizer(**kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

MACCS Keys Fingerprints

columns
name
class skchem.descriptors.fingerprints.MorganFeaturizer(radius=2, n_feats=2048, as_bits=True, use_features=False, use_bond_types=True, use_chirality=False, **kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

Morgan fingerprints, implemented by RDKit.

Notes

Currently, folded bits are by far the fastest implementation.
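To make "folding" concrete, the sketch below maps arbitrary integer identifiers onto a fixed number of columns by taking them modulo `n_feats`, and shows how bit and count vectors differ. This is a common folding scheme used here as an assumption; the reference does not state the exact hashing used by RDKit or skchem, and `fold` is a hypothetical helper.

```python
def fold(identifiers, n_feats=2048, as_bits=True):
    """Fold integer identifiers into a fixed-length vector of length n_feats."""
    vector = [0] * n_feats
    for identifier in identifiers:
        index = identifier % n_feats  # identifiers that collide share a column
        if as_bits:
            vector[index] = 1         # presence or absence only
        else:
            vector[index] += 1        # number of occurrences
    return vector


# 5, 2053 and 4101 all land in column 5 when folded to 2048 features.
ids = [5, 2053, 4101, 7]
bits = fold(ids, n_feats=2048, as_bits=True)
counts = fold(ids, n_feats=2048, as_bits=False)
```

A smaller `n_feats` gives a more compact vector at the cost of more collisions between unrelated identifiers.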

Examples

>>> import skchem
>>> import pandas as pd
>>> pd.options.display.max_rows = pd.options.display.max_columns = 5
>>> mf = skchem.descriptors.MorganFeaturizer()
>>> m = skchem.Mol.from_smiles('CCC')

Can transform an individual molecule to yield a Series:

>>> mf.transform(m)
morgan_fp_idx
0       0
1       0
       ..
2046    0
2047    0
Name: MorganFeaturizer, dtype: uint8

Can transform a list of molecules to yield a DataFrame:

>>> mf.transform([m])
morgan_fp_idx  0     1     ...   2046  2047
0                 0     0  ...      0     0

[1 rows x 2048 columns]

Change the number of features the fingerprint is folded down to using n_feats.

>>> mf.n_feats = 1024
>>> mf.transform(m)
morgan_fp_idx
0       0
1       0
       ..
1022    0
1023    0
Name: MorganFeaturizer, dtype: uint8

Count fingerprints with as_bits = False

>>> mf.as_bits = False
>>> res = mf.transform(m); res[res > 0]
morgan_fp_idx
33     2
80     1
294    2
320    1
Name: MorganFeaturizer, dtype: int64

Pseudo-gradient with grad shows which atoms contributed to which feature.

>>> mf.grad(m)[res > 0]
atom_idx  0  1  2
features
33        1  0  1
80        0  1  0
294       1  2  1
320       1  1  1
columns
grad(mol)[source]

Calculate the pseudo gradient with respect to the atoms.

The pseudo gradient is the number of times the atom set that particular bit.

Parameters: mol (skchem.Mol) – The molecule for which to calculate the pseudo gradient.
Returns: DataFrame of pseudo gradients, with columns corresponding to atoms, and rows corresponding to features of the fingerprint.
Return type: pandas.DataFrame
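The counting behind the pseudo gradient can be sketched without skchem. In the example below, each entry pairs a feature index with the atoms of the environment that set it. The environment list is hand-written so that it reproduces the propane table in the Examples above; it is an inference from that table, not output from the library, and `pseudo_gradient` is a hypothetical helper.

```python
from collections import defaultdict


def pseudo_gradient(environments, n_atoms):
    """Count, per feature, how many times each atom took part in setting it."""
    grad = defaultdict(lambda: [0] * n_atoms)
    for feature, atoms in environments:
        for atom in atoms:
            grad[feature][atom] += 1
    return dict(grad)


# Hand-written (feature index, atoms in environment) pairs for propane.
environments = [
    (33, [0]), (33, [2]),          # each terminal carbon on its own
    (80, [1]),                     # the central carbon on its own
    (294, [0, 1]), (294, [1, 2]),  # radius-1 environments around atoms 0 and 2
    (320, [0, 1, 2]),              # radius-1 environment around atom 1
]
grad = pseudo_gradient(environments, n_atoms=3)
```

Under this reading, atom 1 scores 2 for feature 294 because it sits inside both radius-1 environments centred on the terminal carbons.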
name
class skchem.descriptors.fingerprints.RDKFeaturizer(min_path=1, max_path=7, n_feats=2048, n_bits_per_hash=2, use_hs=True, target_density=0.0, min_size=128, branched_paths=True, use_bond_types=True, **kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

RDKit fingerprint

columns
name
class skchem.descriptors.fingerprints.TopologicalTorsionFeaturizer(target_size=4, n_feats=2048, as_bits=False, use_chirality=False, **kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

Topological Torsion fingerprints, implemented by RDKit.

columns
names

skchem.descriptors.moe module

## skchem.descriptors.moe

Module specifying MOE descriptors.

class skchem.descriptors.moe.MOEDescriptorCalculator[source]

Bases: object

transform(obj)[source]

skchem.descriptors.physicochemical module

## skchem.descriptors.physicochemical

Physicochemical descriptors and associated functions are defined.

class skchem.descriptors.physicochemical.PhysicochemicalFeaturizer(features='all', **kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

Physicochemical descriptor generator using RDKit descriptors.

columns
features
name

Module contents

## skchem.descriptors

A module concerned with calculating molecular descriptors.

class skchem.descriptors.PhysicochemicalFeaturizer(features='all', **kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

Physicochemical descriptor generator using RDKit descriptors.

columns
features
name
class skchem.descriptors.AtomFeaturizer(features='all', **kwargs)[source]

Bases: skchem.base.AtomTransformer, skchem.base.Featurizer

features
minor_axis
name
class skchem.descriptors.AtomPairFeaturizer(min_length=1, max_length=30, n_feats=2048, as_bits=False, use_chirality=False, **kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

Atom Pair Fingerprints, implemented by RDKit.

columns
name
class skchem.descriptors.MorganFeaturizer(radius=2, n_feats=2048, as_bits=True, use_features=False, use_bond_types=True, use_chirality=False, **kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

Morgan fingerprints, implemented by RDKit.

Notes

Currently, folded bits are by far the fastest implementation.

Examples

>>> import skchem
>>> import pandas as pd
>>> pd.options.display.max_rows = pd.options.display.max_columns = 5
>>> mf = skchem.descriptors.MorganFeaturizer()
>>> m = skchem.Mol.from_smiles('CCC')

Can transform an individual molecule to yield a Series:

>>> mf.transform(m)
morgan_fp_idx
0       0
1       0
       ..
2046    0
2047    0
Name: MorganFeaturizer, dtype: uint8

Can transform a list of molecules to yield a DataFrame:

>>> mf.transform([m])
morgan_fp_idx  0     1     ...   2046  2047
0                 0     0  ...      0     0

[1 rows x 2048 columns]

Change the number of features the fingerprint is folded down to using n_feats.

>>> mf.n_feats = 1024
>>> mf.transform(m)
morgan_fp_idx
0       0
1       0
       ..
1022    0
1023    0
Name: MorganFeaturizer, dtype: uint8

Count fingerprints with as_bits = False

>>> mf.as_bits = False
>>> res = mf.transform(m); res[res > 0]
morgan_fp_idx
33     2
80     1
294    2
320    1
Name: MorganFeaturizer, dtype: int64

Pseudo-gradient with grad shows which atoms contributed to which feature.

>>> mf.grad(m)[res > 0]
atom_idx  0  1  2
features
33        1  0  1
80        0  1  0
294       1  2  1
320       1  1  1
columns
grad(mol)[source]

Calculate the pseudo gradient with respect to the atoms.

The pseudo gradient is the number of times the atom set that particular bit.

Parameters: mol (skchem.Mol) – The molecule for which to calculate the pseudo gradient.
Returns: DataFrame of pseudo gradients, with columns corresponding to atoms, and rows corresponding to features of the fingerprint.
Return type: pandas.DataFrame
name
class skchem.descriptors.MACCSFeaturizer(**kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

MACCS Keys Fingerprints

columns
name
class skchem.descriptors.TopologicalTorsionFeaturizer(target_size=4, n_feats=2048, as_bits=False, use_chirality=False, **kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

Topological Torsion fingerprints, implemented by RDKit.

columns
names
class skchem.descriptors.RDKFeaturizer(min_path=1, max_path=7, n_feats=2048, n_bits_per_hash=2, use_hs=True, target_density=0.0, min_size=128, branched_paths=True, use_bond_types=True, **kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

RDKit fingerprint

columns
name
class skchem.descriptors.ErGFeaturizer(atom_types=0, fuzz_increment=0.3, min_path=1, max_path=15, **kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

Extended Reduced Graph Fingerprints.

Implemented in RDKit.

columns
name
class skchem.descriptors.ConnectivityInvariantsFeaturizer(include_ring_membership=True, **kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

Connectivity invariants fingerprints

columns
name
class skchem.descriptors.FeatureInvariantsFeaturizer(**kwargs)[source]

Bases: skchem.base.Transformer, skchem.base.Featurizer

Feature invariants fingerprints.

columns
name
class skchem.descriptors.ChemAxonNMRPredictor(features='optimal', **kwargs)[source]

Bases: skchem.descriptors.chemaxon.ChemAxonBaseFeaturizer, skchem.base.BatchTransformer, skchem.base.AtomTransformer

features
minor_axis
monitor_progress(filename)[source]
name()[source]
transform(inp)[source]
class skchem.descriptors.ChemAxonFeaturizer(features='optimal', **kwargs)[source]

Bases: skchem.descriptors.chemaxon.ChemAxonBaseFeaturizer, skchem.base.BatchTransformer, skchem.base.Transformer

columns
name
class skchem.descriptors.ChemAxonAtomFeaturizer(features='optimal', **kwargs)[source]

Bases: skchem.descriptors.chemaxon.ChemAxonBaseFeaturizer, skchem.base.AtomTransformer, skchem.base.BatchTransformer

minor_axis
name
class skchem.descriptors.GraphDistanceTransformer(max_atoms=100, **kwargs)[source]

Bases: skchem.descriptors.atom.DistanceTransformer

Transformer class for generating Graph distance matrices.

name()[source]
class skchem.descriptors.SpacialDistanceTransformer(max_atoms=100, **kwargs)[source]

Bases: skchem.descriptors.atom.DistanceTransformer

Transformer class for generating 3D distance matrices.

name()[source]