{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Molecules in **scikit-chem**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**scikit-chem** is first and foremost a wrapper around **rdkit** that makes it more *Pythonic* and more intuitive for users familiar with other libraries in the Scientific Python stack. The package implements a core `Mol` class that represents a molecule. It is a direct subclass of the `rdkit.Chem.Mol` class:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import rdkit.Chem\n",
"import skchem\n",
"issubclass(skchem.Mol, rdkit.Chem.Mol)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As such, it has all the methods that an `rdkit.Chem.Mol` has, for example:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hasattr(skchem.Mol, 'GetAromaticAtoms')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initializing new molecules"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Constructors are provided as classmethods on the `skchem.Mol` class, in the same way that **pandas** objects are constructed. For example, to make a `pandas.DataFrame` from a dictionary, you call:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
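{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>a</th>\n",
"      <th>b</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>10</td>\n",
"      <td>20</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>20</td>\n",
"      <td>40</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],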
"text/plain": [
" a b\n",
"0 10 20\n",
"1 20 40"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"df = pd.DataFrame.from_dict({'a': [10, 20], 'b': [20, 40]})\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Analogously, to make a `skchem.Mol` from a SMILES string, you call:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mol = skchem.Mol.from_smiles('CC(=O)Cl'); mol"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The available constructors are:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['from_tplblock',\n",
" 'from_molblock',\n",
" 'from_molfile',\n",
" 'from_binary',\n",
" 'from_tplfile',\n",
" 'from_mol2block',\n",
" 'from_pdbfile',\n",
" 'from_pdbblock',\n",
" 'from_smiles',\n",
" 'from_smarts',\n",
" 'from_mol2file',\n",
" 'from_inchi']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[method for method in skchem.Mol.__dict__ if method.startswith('from_')]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When a molecule fails to parse, a `ValueError` is raised:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "ValueError",
"evalue": "Failed to parse molecule, NOTSMILES",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mskchem\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mMol\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfrom_smiles\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'NOTSMILES'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m/Users/rich/projects/scikit-chem/skchem/core/mol.py\u001b[0m in \u001b[0;36mconstructor\u001b[0;34m(_, in_arg, name, *args, **kwargs)\u001b[0m\n\u001b[1;32m 419\u001b[0m \u001b[0mm\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrdkit\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mChem\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'MolFrom'\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mconstructor_name\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0min_arg\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 420\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mm\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 421\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Failed to parse molecule, {}'\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0min_arg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 422\u001b[0m \u001b[0mm\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mMol\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfrom_super\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mm\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 423\u001b[0m \u001b[0mm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mValueError\u001b[0m: Failed to parse molecule, NOTSMILES"
]
}
],
"source": [
"skchem.Mol.from_smiles('NOTSMILES')"
]
},
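{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because a parse failure surfaces as an ordinary `ValueError`, invalid inputs can be separated out with a plain `try`/`except`. The cell below is a minimal sketch that relies only on the `from_smiles` behaviour shown above; the list `candidates` and the names `parsed` and `failed` are illustrative and not part of **scikit-chem**:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import skchem\n",
"\n",
"# Illustrative inputs: one valid SMILES string and the unparseable string used above.\n",
"candidates = ['CC(=O)Cl', 'NOTSMILES']\n",
"\n",
"parsed, failed = [], []\n",
"for smiles in candidates:\n",
"    try:\n",
"        parsed.append(skchem.Mol.from_smiles(smiles))\n",
"    except ValueError:\n",
"        # Keep the offending string so it can be inspected later.\n",
"        failed.append(smiles)\n",
"\n",
"failed"
]
},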
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Molecule accessors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Atoms** and **bonds** are accessible as properties:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mol.atoms"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mol.bonds"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These are iterable:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[,\n",
" ,\n",
" ,\n",
" ]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[a for a in mol.atoms]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"subscriptable:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mol.atoms[3]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"sliceable:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[,\n",
" ,\n",
" ]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mol.atoms[:3]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"indexable:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[, ]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mol.atoms[[1, 3]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"and maskable:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[, ]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mol.atoms[[True, False, True, False]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Properties** on the rdkit objects are accessible through the `props` property:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mol.props['is_reactive'] = 'very!'"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"mol.atoms[1].props['kind'] = 'electrophilic'\n",
"mol.atoms[3].props['leaving group'] = 1\n",
"mol.bonds[2].props['bond strength'] = 'strong'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These are using the `rdkit` property functionality internally:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'very!'"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mol.GetProp('is_reactive')"
]
},
{
"cell_type": "raw",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
".. note::\n",
" RDKit properties can only store ``str`` s, ``int`` s and ``float`` s. Any other type will be coerced to a string before storage."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The properties of atoms and bonds are accessible molecule wide:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mol.atoms.props"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mol.bonds.props"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These can be exported as pandas objects:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
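{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>kind</th>\n",
"      <th>leaving group</th>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>atom_idx</th>\n",
"      <th></th>\n",
"      <th></th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>None</td>\n",
"      <td>NaN</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>electrophilic</td>\n",
"      <td>NaN</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>None</td>\n",
"      <td>NaN</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>None</td>\n",
"      <td>1.0</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],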
"text/plain": [
" kind leaving group\n",
"atom_idx \n",
"0 None NaN\n",
"1 electrophilic NaN\n",
"2 None NaN\n",
"3 None 1.0"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mol.atoms.props.to_frame()"
]
},
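{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once exported, the properties can be handled with ordinary pandas operations. As a small sketch, assuming `to_frame()` returns a `pandas.DataFrame` shaped like the output above, the following keeps only the atoms that carry a `kind` annotation (`atom_props` is an illustrative name):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"atom_props = mol.atoms.props.to_frame()\n",
"\n",
"# Atoms without the property appear as None/NaN, so notnull() selects the annotated ones.\n",
"atom_props[atom_props['kind'].notnull()]"
]
},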
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Export and Serialization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Molecules are exported and/or serialized in a very similar way in which they are constructed, again with an inspiration from `pandas`."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"',a,b\\n0,10,20\\n1,20,40\\n'"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.to_csv()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'WETWJCDKMRHUPV-UHFFFAOYSA-N'"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mol.to_inchi_key()"
]
},
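{
"cell_type": "markdown",
"metadata": {},
"source": [
"Exporters and constructors can be combined for a simple round trip. This is a hedged sketch: it assumes `to_smiles()` can be called without arguments, like `to_inchi_key()` above (its signature is not shown in this notebook), and `smiles` and `roundtrip` are illustrative names:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"smiles = mol.to_smiles()\n",
"roundtrip = skchem.Mol.from_smiles(smiles)\n",
"\n",
"# Compare InChIKeys rather than SMILES strings, because one molecule can be written as several different SMILES.\n",
"roundtrip.to_inchi_key() == mol.to_inchi_key()"
]
},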
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The total available formats are:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['to_inchi',\n",
" 'to_json',\n",
" 'to_smiles',\n",
" 'to_smarts',\n",
" 'to_inchi_key',\n",
" 'to_binary',\n",
" 'to_dict',\n",
" 'to_molblock',\n",
" 'to_tplfile',\n",
" 'to_formula',\n",
" 'to_molfile',\n",
" 'to_pdbblock',\n",
" 'to_tplblock']"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[method for method in skchem.Mol.__dict__ if method.startswith('to_')]"
]
}
],
"metadata": {
"celltoolbar": "Raw Cell Format",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
},
"widgets": {
"state": {},
"version": "1.1.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}