## skchem.descriptors.atom
Module specifying atom based descriptor generators.
skchem.descriptors.atom.AtomFeaturizer(features='all', **kwargs)
Bases: skchem.base.AtomTransformer, skchem.base.Featurizer
features
minor_axis
name
skchem.descriptors.atom.DistanceTransformer(max_atoms=100, **kwargs)
Bases: skchem.base.AtomTransformer, skchem.base.Featurizer
Base class implementing distance matrix transformers.
Concrete classes inheriting from this should implement _transform_mol.
minor_axis
skchem.descriptors.atom.GraphDistanceTransformer(max_atoms=100, **kwargs)
Bases: skchem.descriptors.atom.DistanceTransformer
Transformer class for generating Graph distance matrices.
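To illustrate what a graph distance matrix holds, here is a minimal pure-Python sketch, not the library's implementation. It counts bonds along the shortest path between each pair of atoms with a breadth-first search. The function name graph_distance_matrix, the bond-list input, and the zero-padding up to max_atoms are assumptions made for this example; the reference above gives only the max_atoms default, not how it is used.

```python
from collections import deque


def graph_distance_matrix(n_atoms, bonds, max_atoms=100):
    """Bond-count distances between atoms, zero-padded to max_atoms x max_atoms.

    Hypothetical helper for illustration; padding with zeros is an assumption.
    """
    if n_atoms > max_atoms:
        raise ValueError("molecule has more atoms than max_atoms")

    # Build an adjacency list from the (atom, atom) bond pairs.
    neighbours = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    matrix = [[0] * max_atoms for _ in range(max_atoms)]
    for start in range(n_atoms):
        # Breadth-first search yields the shortest path length from `start`.
        seen = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nxt in neighbours[atom]:
                if nxt not in seen:
                    seen[nxt] = seen[atom] + 1
                    queue.append(nxt)
        for end, dist in seen.items():
            matrix[start][end] = dist
    return matrix


# Propane ('CCC'): heavy atoms 0-1-2 joined by two bonds.
propane = graph_distance_matrix(3, [(0, 1), (1, 2)], max_atoms=4)
```

The two terminal carbons end up at distance 2, and the unused fourth row and column stay at zero.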
skchem.descriptors.atom.SpacialDistanceTransformer(max_atoms=100, **kwargs)
Bases: skchem.descriptors.atom.DistanceTransformer
Transformer class for generating 3D distance matrices.
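A 3D distance matrix holds the Euclidean distance between every pair of atomic coordinates. The sketch below shows the idea in plain Python and is not the library's code. The helper name spatial_distance_matrix, the coordinate-tuple input, and the zero-padding to max_atoms are assumptions for illustration, and the coordinates are made up rather than taken from a real conformer.

```python
import math


def spatial_distance_matrix(coords, max_atoms=100):
    """Pairwise Euclidean distances, zero-padded to max_atoms x max_atoms.

    Hypothetical helper for illustration; padding with zeros is an assumption.
    """
    n_atoms = len(coords)
    if n_atoms > max_atoms:
        raise ValueError("molecule has more atoms than max_atoms")

    matrix = [[0.0] * max_atoms for _ in range(max_atoms)]
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            # The matrix is symmetric, so each pair is computed once.
            dist = math.dist(coords[i], coords[j])
            matrix[i][j] = matrix[j][i] = dist
    return matrix


# Made-up coordinates forming a 3-4-5 right triangle.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
distances = spatial_distance_matrix(coords, max_atoms=3)
```

Unlike the graph distance, this depends on the conformation, so the molecule needs 3D coordinates before such a matrix can be computed.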
skchem.descriptors.atom.crippen_molar_refractivity_contrib(a)
Hacky way of getting molar refractivity contribution.
skchem.descriptors.atom.gasteiger_charge(a, force_calc=False)
Hacky way of getting Gasteiger charge.
skchem.descriptors.atom.is_hybridized(a, hybrid_type=rdkit.Chem.rdchem.HybridizationType.SP3)
Check whether the atom is hybridized as type hybrid_type (default: SP3).
skchem.descriptors.atom.labute_asa_contrib(a)
Hacky way of getting accessible surface area contribution.
## skchem.descriptors.chemaxon
Module specifying ChemAxon-based descriptor generators.
skchem.descriptors.chemaxon.ChemAxonAtomFeaturizer(features='optimal', **kwargs)
Bases: skchem.descriptors.chemaxon.ChemAxonBaseFeaturizer, skchem.base.AtomTransformer, skchem.base.BatchTransformer
minor_axis
name
skchem.descriptors.chemaxon.ChemAxonBaseFeaturizer(features='optimal', **kwargs)
Bases: skchem.base.CLIWrapper, skchem.base.Featurizer
features
install_hint = ' Install ChemAxon from https://www.chemaxon.com. It requires a license, which can be freely obtained\nfor academics. '
skchem.descriptors.chemaxon.ChemAxonFeaturizer(features='optimal', **kwargs)
Bases: skchem.descriptors.chemaxon.ChemAxonBaseFeaturizer, skchem.base.BatchTransformer, skchem.base.Transformer
columns
name
skchem.descriptors.chemaxon.ChemAxonNMRPredictor(features='optimal', **kwargs)
Bases: skchem.descriptors.chemaxon.ChemAxonBaseFeaturizer, skchem.base.BatchTransformer, skchem.base.AtomTransformer
features
minor_axis
## skchem.descriptors.fingerprints
Fingerprinting classes and associated functions are defined.
skchem.descriptors.fingerprints.AtomPairFeaturizer(min_length=1, max_length=30, n_feats=2048, as_bits=False, use_chirality=False, **kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
Atom Pair Fingerprints, implemented by RDKit.
columns
name
skchem.descriptors.fingerprints.ConnectivityInvariantsFeaturizer(include_ring_membership=True, **kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
Connectivity invariants fingerprints.
columns
name
skchem.descriptors.fingerprints.ErGFeaturizer(atom_types=0, fuzz_increment=0.3, min_path=1, max_path=15, **kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
Extended Reduced Graph Fingerprints. Implemented in RDKit.
columns
name
skchem.descriptors.fingerprints.FeatureInvariantsFeaturizer(**kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
Feature invariants fingerprints.
columns
name
skchem.descriptors.fingerprints.MACCSFeaturizer(**kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
MACCS Keys Fingerprints.
columns
name
skchem.descriptors.fingerprints.MorganFeaturizer(radius=2, n_feats=2048, as_bits=True, use_features=False, use_bond_types=True, use_chirality=False, **kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
Morgan fingerprints, implemented by RDKit.
Notes
Currently, folded bits are by far the fastest implementation.
Examples
>>> import skchem
>>> import pandas as pd
>>> pd.options.display.max_rows = pd.options.display.max_columns = 5
>>> mf = skchem.descriptors.MorganFeaturizer()
>>> m = skchem.Mol.from_smiles('CCC')
Can transform an individual molecule to yield a Series:
>>> mf.transform(m)
morgan_fp_idx
0 0
1 0
..
2046 0
2047 0
Name: MorganFeaturizer, dtype: uint8
Can transform a list of molecules to yield a DataFrame:
>>> mf.transform([m])
morgan_fp_idx 0 1 ... 2046 2047
0 0 0 ... 0 0
[1 rows x 2048 columns]
Change the number of features the fingerprint is folded down to using n_feats.
>>> mf.n_feats = 1024
>>> mf.transform(m)
morgan_fp_idx
0 0
1 0
..
1022 0
1023 0
Name: MorganFeaturizer, dtype: uint8
Generate count fingerprints by setting as_bits = False:
>>> mf.as_bits = False
>>> res = mf.transform(m); res[res > 0]
morgan_fp_idx
33 2
80 1
294 2
320 1
Name: MorganFeaturizer, dtype: int64
Pseudo-gradient with grad shows which atoms contributed to which feature.
>>> mf.grad(m)[res > 0]
atom_idx 0 1 2
features
33 1 0 1
80 0 1 0
294 1 2 1
320 1 1 1
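The n_feats and as_bits options above can be pictured with a small folding sketch. It assumes the common scheme in which each hashed substructure identifier is mapped to an index by taking it modulo n_feats; whether the underlying implementation folds in exactly this way is not stated in this reference. The fold helper and the identifier values are hypothetical and are not real Morgan hashes.

```python
def fold(identifiers, n_feats=2048, as_bits=True):
    """Fold hashed identifiers into a fixed-length bit or count vector.

    Hypothetical helper: modulo folding is assumed for illustration.
    """
    vector = [0] * n_feats
    for identifier in identifiers:
        idx = identifier % n_feats  # identifiers that collide share an index
        if as_bits:
            vector[idx] = 1          # bit vector: presence only
        else:
            vector[idx] += 1         # count vector: number of occurrences
    return vector


# Made-up identifiers chosen so that some collide after folding.
identifiers = [5, 2053, 1094, 70]

bits_2048 = fold(identifiers, n_feats=2048)
counts_2048 = fold(identifiers, n_feats=2048, as_bits=False)
counts_1024 = fold(identifiers, n_feats=1024, as_bits=False)

# Indices that are set, with their values, for each variant.
set_bits = {i: v for i, v in enumerate(bits_2048) if v}
set_counts_2048 = {i: v for i, v in enumerate(counts_2048) if v}
set_counts_1024 = {i: v for i, v in enumerate(counts_1024) if v}
```

With 2048 features, three distinct indices are set; folding the same identifiers down to 1024 merges two more of them, which is the trade-off a smaller n_feats makes.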
columns
grad(mol)
Calculate the pseudo gradient with respect to the atoms.
The pseudo gradient is the number of times the atom set that particular bit.
Parameters: | mol (skchem.Mol) – The molecule for which to calculate the pseudo gradient. |
---|---|
Returns: | DataFrame of pseudo gradients, with columns corresponding to atoms and rows corresponding to features of the fingerprint. |
Return type: | pandas.DataFrame |
name
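The pseudo gradient described for grad can be sketched as a counting exercise: for each feature, add one to an atom every time that atom takes part in a substructure that set the feature. The code below is an illustration, not the library's implementation. The pseudo_gradient helper is hypothetical, and the atom sets were written by hand so that the result matches the 'CCC' example output shown for MorganFeaturizer; they are an assumed reading of that table, not values extracted from RDKit.

```python
def pseudo_gradient(environments, n_atoms):
    """Count, per feature, how often each atom contributed to it.

    `environments` maps a feature index to the atom sets that set it.
    Hypothetical helper for illustration only.
    """
    grad = {}
    for feature, atom_sets in environments.items():
        row = [0] * n_atoms
        for atoms in atom_sets:
            for atom in atoms:
                row[atom] += 1
        grad[feature] = row
    return grad


# Hand-written atom sets for propane, assumed to mirror the example output.
environments = {
    33: [{0}, {2}],         # the two terminal carbons on their own
    80: [{1}],              # the central carbon on its own
    294: [{0, 1}, {1, 2}],  # each terminal carbon with its neighbour
    320: [{0, 1, 2}],       # the central carbon with both neighbours
}
grad = pseudo_gradient(environments, n_atoms=3)
```

Each row of grad corresponds to one feature and each position to one atom, the same orientation as the DataFrame that grad is documented to return.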
skchem.descriptors.fingerprints.RDKFeaturizer(min_path=1, max_path=7, n_feats=2048, n_bits_per_hash=2, use_hs=True, target_density=0.0, min_size=128, branched_paths=True, use_bond_types=True, **kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
RDKit fingerprint.
columns
name
skchem.descriptors.fingerprints.TopologicalTorsionFeaturizer(target_size=4, n_feats=2048, as_bits=False, use_chirality=False, **kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
Topological Torsion fingerprints, implemented by RDKit.
columns
names
## skchem.descriptors.physicochemical
Physicochemical descriptors and associated functions are defined.
skchem.descriptors.physicochemical.PhysicochemicalFeaturizer(features='all', **kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
Physicochemical descriptor generator using RDKit descriptors.
columns
features
name
## skchem.descriptors
A module concerned with calculating molecular descriptors.
skchem.descriptors.PhysicochemicalFeaturizer(features='all', **kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
Physicochemical descriptor generator using RDKit descriptors.
columns
features
name
skchem.descriptors.AtomFeaturizer(features='all', **kwargs)
Bases: skchem.base.AtomTransformer, skchem.base.Featurizer
features
minor_axis
name
skchem.descriptors.AtomPairFeaturizer(min_length=1, max_length=30, n_feats=2048, as_bits=False, use_chirality=False, **kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
Atom Pair Fingerprints, implemented by RDKit.
columns
name
skchem.descriptors.MorganFeaturizer(radius=2, n_feats=2048, as_bits=True, use_features=False, use_bond_types=True, use_chirality=False, **kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
Morgan fingerprints, implemented by RDKit.
Notes
Currently, folded bits are by far the fastest implementation.
Examples
>>> import skchem
>>> import pandas as pd
>>> pd.options.display.max_rows = pd.options.display.max_columns = 5
>>> mf = skchem.descriptors.MorganFeaturizer()
>>> m = skchem.Mol.from_smiles('CCC')
Can transform an individual molecule to yield a Series:
>>> mf.transform(m)
morgan_fp_idx
0 0
1 0
..
2046 0
2047 0
Name: MorganFeaturizer, dtype: uint8
Can transform a list of molecules to yield a DataFrame:
>>> mf.transform([m])
morgan_fp_idx 0 1 ... 2046 2047
0 0 0 ... 0 0
[1 rows x 2048 columns]
Change the number of features the fingerprint is folded down to using n_feats.
>>> mf.n_feats = 1024
>>> mf.transform(m)
morgan_fp_idx
0 0
1 0
..
1022 0
1023 0
Name: MorganFeaturizer, dtype: uint8
Generate count fingerprints by setting as_bits = False:
>>> mf.as_bits = False
>>> res = mf.transform(m); res[res > 0]
morgan_fp_idx
33 2
80 1
294 2
320 1
Name: MorganFeaturizer, dtype: int64
Pseudo-gradient with grad shows which atoms contributed to which feature.
>>> mf.grad(m)[res > 0]
atom_idx 0 1 2
features
33 1 0 1
80 0 1 0
294 1 2 1
320 1 1 1
columns
grad(mol)
Calculate the pseudo gradient with respect to the atoms.
The pseudo gradient is the number of times the atom set that particular bit.
Parameters: | mol (skchem.Mol) – The molecule for which to calculate the pseudo gradient. |
---|---|
Returns: | DataFrame of pseudo gradients, with columns corresponding to atoms and rows corresponding to features of the fingerprint. |
Return type: | pandas.DataFrame |
name
skchem.descriptors.MACCSFeaturizer(**kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
MACCS Keys Fingerprints.
columns
name
skchem.descriptors.TopologicalTorsionFeaturizer(target_size=4, n_feats=2048, as_bits=False, use_chirality=False, **kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
Topological Torsion fingerprints, implemented by RDKit.
columns
names
skchem.descriptors.RDKFeaturizer(min_path=1, max_path=7, n_feats=2048, n_bits_per_hash=2, use_hs=True, target_density=0.0, min_size=128, branched_paths=True, use_bond_types=True, **kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
RDKit fingerprint.
columns
name
skchem.descriptors.ErGFeaturizer(atom_types=0, fuzz_increment=0.3, min_path=1, max_path=15, **kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
Extended Reduced Graph Fingerprints. Implemented in RDKit.
columns
name
skchem.descriptors.ConnectivityInvariantsFeaturizer(include_ring_membership=True, **kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
Connectivity invariants fingerprints.
columns
name
skchem.descriptors.FeatureInvariantsFeaturizer(**kwargs)
Bases: skchem.base.Transformer, skchem.base.Featurizer
Feature invariants fingerprints.
columns
name
skchem.descriptors.ChemAxonNMRPredictor(features='optimal', **kwargs)
Bases: skchem.descriptors.chemaxon.ChemAxonBaseFeaturizer, skchem.base.BatchTransformer, skchem.base.AtomTransformer
features
minor_axis
skchem.descriptors.ChemAxonFeaturizer(features='optimal', **kwargs)
Bases: skchem.descriptors.chemaxon.ChemAxonBaseFeaturizer, skchem.base.BatchTransformer, skchem.base.Transformer
columns
name
skchem.descriptors.ChemAxonAtomFeaturizer(features='optimal', **kwargs)
Bases: skchem.descriptors.chemaxon.ChemAxonBaseFeaturizer, skchem.base.AtomTransformer, skchem.base.BatchTransformer
minor_axis
name
skchem.descriptors.GraphDistanceTransformer(max_atoms=100, **kwargs)
Bases: skchem.descriptors.atom.DistanceTransformer
Transformer class for generating Graph distance matrices.
skchem.descriptors.SpacialDistanceTransformer(max_atoms=100, **kwargs)
Bases: skchem.descriptors.atom.DistanceTransformer
Transformer class for generating 3D distance matrices.