scikit-chem is first and foremost a wrapper around rdkit that makes
it more Pythonic and more intuitive to users familiar with other
libraries in the Scientific Python stack. The package implements a core
Mol class, representing a molecule. It is a direct subclass of the
rdkit.Chem.Mol class:
In [1]:
import rdkit.Chem
import skchem
issubclass(skchem.Mol, rdkit.Chem.Mol)
Out[1]:
True
As such, it has all the methods that the rdkit.Chem.Mol class has,
for example:
In [2]:
hasattr(skchem.Mol, 'GetAromaticAtoms')
Out[2]:
True
Constructors are provided as classmethods on the skchem.Mol class,
in the same fashion as pandas objects are constructed. For example,
to make a pandas.DataFrame from a dictionary, you call:
In [3]:
import pandas as pd
df = pd.DataFrame.from_dict({'a': [10, 20], 'b': [20, 40]}); df
Out[3]:
| | a | b |
|---|---|---|
| 0 | 10 | 20 |
| 1 | 20 | 40 |
Analogously, to make a skchem.Mol from a SMILES string, you call:
In [4]:
mol = skchem.Mol.from_smiles('CC(=O)Cl'); mol
Out[4]:
<Mol name="None" formula="C2H3ClO" at 0x11dc8f490>
The available constructors are:
In [5]:
[method for method in skchem.Mol.__dict__ if method.startswith('from_')]
Out[5]:
['from_tplblock',
'from_molblock',
'from_molfile',
'from_binary',
'from_tplfile',
'from_mol2block',
'from_pdbfile',
'from_pdbblock',
'from_smiles',
'from_smarts',
'from_mol2file',
'from_inchi']
When a molecule fails to parse, a ValueError is raised:
In [6]:
skchem.Mol.from_smiles('NOTSMILES')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-99e03ef822e7> in <module>()
----> 1 skchem.Mol.from_smiles('NOTSMILES')
/Users/rich/projects/scikit-chem/skchem/core/mol.py in constructor(_, in_arg, name, *args, **kwargs)
419 m = getattr(rdkit.Chem, 'MolFrom' + constructor_name)(in_arg, *args, **kwargs)
420 if m is None:
--> 421 raise ValueError('Failed to parse molecule, {}'.format(in_arg))
422 m = Mol.from_super(m)
423 m.name = name
ValueError: Failed to parse molecule, NOTSMILES
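The traceback suggests that each from_* constructor looks up a matching MolFrom* function on rdkit.Chem and raises a ValueError when that function returns None. The following is a minimal sketch of that pattern only. FakeBackend, the toy Mol class and make_constructor are hypothetical stand-ins so the example runs without rdkit; they are not part of scikit-chem, and the real implementation may differ.

```python
class FakeBackend:
    """Hypothetical stand-in for rdkit.Chem: parsers return None on failure."""

    @staticmethod
    def MolFromSmiles(text):
        # Toy rule for illustration only: accept a handful of SMILES characters.
        if text and set(text) <= set('CNOl()=#'):
            return {'smiles': text}
        return None


class Mol:
    """Toy molecule class (assumption); not the real skchem.Mol."""

    def __init__(self, data, name=None):
        self.data = data
        self.name = name


def make_constructor(constructor_name):
    """Build a classmethod wrapping FakeBackend.MolFrom<constructor_name>."""
    def constructor(cls, in_arg, name=None, *args, **kwargs):
        parser = getattr(FakeBackend, 'MolFrom' + constructor_name)
        parsed = parser(in_arg, *args, **kwargs)
        if parsed is None:
            # Turn the backend's silent None into an explicit error.
            raise ValueError('Failed to parse molecule, {}'.format(in_arg))
        return cls(parsed, name=name)
    return classmethod(constructor)


# Attach one from_* classmethod per MolFrom* parser the backend exposes.
for attr in dir(FakeBackend):
    if attr.startswith('MolFrom'):
        suffix = attr[len('MolFrom'):]
        setattr(Mol, 'from_' + suffix.lower(), make_constructor(suffix))

mol = Mol.from_smiles('CC(=O)Cl', name='acetyl chloride')

try:
    Mol.from_smiles('NOTSMILES')
except ValueError as err:
    error_message = str(err)
```

Raising an exception instead of returning None means a failed parse cannot be silently passed on to later code, which is why callers processing many inputs would typically wrap the call in try/except as shown.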
Atoms and bonds are accessible as properties:
In [7]:
mol.atoms
Out[7]:
<AtomView values="['C', 'C', 'O', 'Cl']" at 0x11dc9ac88>
In [8]:
mol.bonds
Out[8]:
<BondView values="['C-C', 'C=O', 'C-Cl']" at 0x11dc9abe0>
These are iterable:
In [9]:
[a for a in mol.atoms]
Out[9]:
[<Atom element="C" at 0x11dcfe8a0>,
<Atom element="C" at 0x11dcfe9e0>,
<Atom element="O" at 0x11dcfed00>,
<Atom element="Cl" at 0x11dcfedf0>]
subscriptable:
In [10]:
mol.atoms[3]
Out[10]:
<Atom element="Cl" at 0x11dcfef30>
sliceable:
In [11]:
mol.atoms[:3]
Out[11]:
[<Atom element="C" at 0x11dcfebc0>,
<Atom element="C" at 0x11de690d0>,
<Atom element="O" at 0x11de693f0>]
indexable:
In [19]:
mol.atoms[[1, 3]]
Out[19]:
[<Atom element="C" at 0x11de74760>, <Atom element="Cl" at 0x11de7fe40>]
and maskable:
In [18]:
mol.atoms[[True, False, True, False]]
Out[18]:
[<Atom element="C" at 0x11de74ad0>, <Atom element="O" at 0x11de74f30>]
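The indexing behaviours shown above (integer, slice, list of indices, boolean mask) can all be routed through a single __getitem__. The sketch below illustrates one way to do that with a hypothetical ListView class holding plain strings; it is not the real AtomView, whose internals are not shown in this document.

```python
class ListView:
    """Hypothetical sequence view (assumption); not the real skchem AtomView."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Slicing returns a plain list of the selected items.
            return self._items[key]
        if isinstance(key, (list, tuple)):
            # Test for bools first: bool is a subclass of int in Python.
            if key and all(isinstance(k, bool) for k in key):
                if len(key) != len(self._items):
                    raise ValueError('mask length must match view length')
                return [item for item, keep in zip(self._items, key) if keep]
            # Otherwise treat the sequence as a list of integer positions.
            return [self._items[k] for k in key]
        # Fall back to ordinary integer subscripting.
        return self._items[key]


atoms = ListView(['C', 'C', 'O', 'Cl'])
single = atoms[3]
sliced = atoms[:3]
picked = atoms[[1, 3]]
masked = atoms[[True, False, True, False]]
```

The mask branch is checked before the index branch because True and False would otherwise be read as the positions 1 and 0.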
Properties on the rdkit objects are accessible through the props property:
In [11]:
mol.props['is_reactive'] = 'very!'
In [12]:
mol.atoms[1].props['kind'] = 'electrophilic'
mol.atoms[3].props['leaving group'] = 1
mol.bonds[2].props['bond strength'] = 'strong'
These use the rdkit property functionality internally:
In [13]:
mol.GetProp('is_reactive')
Out[13]:
'very!'
Note: RDKit properties can only store strs, ints and floats. Any other
type will be coerced to a string before storage.
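The coercion rule in the note can be illustrated with a small dict-like wrapper. This is a sketch under stated assumptions: PropsView here is a hypothetical class backed by an ordinary dict rather than by rdkit, and it only demonstrates the "keep str, int and float, stringify everything else" behaviour described above.

```python
from collections.abc import MutableMapping


class PropsView(MutableMapping):
    """Hypothetical dict-like wrapper (assumption); not the real skchem props."""

    _NATIVE_TYPES = (str, int, float)

    def __init__(self):
        self._store = {}

    def __setitem__(self, key, value):
        # Keep str, int and float as-is; coerce everything else to str.
        # Note: bool passes this check too, because bool subclasses int.
        if not isinstance(value, self._NATIVE_TYPES):
            value = str(value)
        self._store[key] = value

    def __getitem__(self, key):
        return self._store[key]

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


props = PropsView()
props['is_reactive'] = 'very!'
props['leaving group'] = 1
props['neighbours'] = [0, 2]   # a list is not storable, so it becomes a str
```

In practice this means that anything other than a string or number does not round-trip: a list written to a property comes back as its string representation.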
The properties of atoms and bonds are accessible molecule-wide:
In [14]:
mol.atoms.props
Out[14]:
<MolPropertyView values="{'leaving group': [nan, nan, nan, 1.0], 'kind': [None, 'electrophilic', None, None]}" at 0x11daf8390>
In [15]:
mol.bonds.props
Out[15]:
<MolPropertyView values="{'bond strength': [None, None, 'strong']}" at 0x11daf80f0>
These can be exported as pandas objects:
In [16]:
mol.atoms.props.to_frame()
Out[16]:
| atom_idx | kind | leaving group |
|---|---|---|
| 0 | None | NaN |
| 1 | electrophilic | NaN |
| 2 | None | NaN |
| 3 | None | 1.0 |
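The molecule-wide view above pivots per-atom properties into one column per property name, with a placeholder wherever an atom lacks that property. A rough sketch of such a pivot in plain Python follows. The props_to_columns function is hypothetical, and the choice of NaN for numeric columns and None otherwise is an assumption inferred from the output shown above, not scikit-chem's documented behaviour.

```python
import math


def props_to_columns(per_atom_props):
    """Pivot a list of per-atom dicts into {property name: values per atom}."""
    # Collect property names in order of first appearance.
    names = []
    for props in per_atom_props:
        for name in props:
            if name not in names:
                names.append(name)

    columns = {}
    for name in names:
        present = [p[name] for p in per_atom_props if name in p]
        numeric = all(isinstance(v, (int, float)) for v in present)
        # Assumption: numeric columns use NaN for gaps, other columns use None.
        missing = math.nan if numeric else None
        columns[name] = [
            (float(p[name]) if numeric else p[name]) if name in p else missing
            for p in per_atom_props
        ]
    return columns


# One dict per atom of CC(=O)Cl, mirroring the properties set earlier.
per_atom = [{}, {'kind': 'electrophilic'}, {}, {'leaving group': 1}]
columns = props_to_columns(per_atom)
```

A dict of equal-length lists like this one can be passed directly to pandas.DataFrame to obtain a table with one row per atom.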
Molecules are exported and/or serialized in much the same way as
they are constructed, again taking inspiration from pandas:
In [17]:
df.to_csv()
Out[17]:
',a,b\n0,10,20\n1,20,40\n'
In [18]:
mol.to_inchi_key()
Out[18]:
'WETWJCDKMRHUPV-UHFFFAOYSA-N'
The full list of available export formats is:
In [19]:
[method for method in skchem.Mol.__dict__ if method.startswith('to_')]
Out[19]:
['to_inchi',
'to_json',
'to_smiles',
'to_smarts',
'to_inchi_key',
'to_binary',
'to_dict',
'to_molblock',
'to_tplfile',
'to_formula',
'to_molfile',
'to_pdbblock',
'to_tplblock']