{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Molecules in **scikit-chem**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**scikit-chem** is first and foremost a wrapper around **rdkit** that makes it more *Pythonic* and more intuitive to users familiar with other libraries in the scientific Python stack. The package implements a core `Mol` class that represents a molecule. It is a direct subclass of the `rdkit.Chem.Mol` class:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import rdkit.Chem\n", "import skchem\n", "\n", "issubclass(skchem.Mol, rdkit.Chem.Mol)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As such, it has every method that `rdkit.Chem.Mol` has, for example:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hasattr(skchem.Mol, 'GetAromaticAtoms')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initializing new molecules" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Constructors are provided as classmethods on `skchem.Mol`, in the same fashion as **pandas** objects are constructed. For example, to make a `pandas.DataFrame` from a dictionary, you call:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>a</th>\n", "      <th>b</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>10</td>\n", "      <td>20</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>20</td>\n", "      <td>40</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n",
"</div>" ], "text/plain": [ "    a   b\n", "0  10  20\n", "1  20  40" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "df = pd.DataFrame.from_dict({'a': [10, 20], 'b': [20, 40]})\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Analogously, to make a `skchem.Mol` from a SMILES string, you call:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol = skchem.Mol.from_smiles('CC(=O)Cl')\n", "mol" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The available constructors are:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "['from_tplblock',\n", " 'from_molblock',\n", " 'from_molfile',\n", " 'from_binary',\n", " 'from_tplfile',\n", " 'from_mol2block',\n", " 'from_pdbfile',\n", " 'from_pdbblock',\n", " 'from_smiles',\n", " 'from_smarts',\n", " 'from_mol2file',\n", " 'from_inchi']" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[method for method in skchem.Mol.__dict__ if method.startswith('from_')]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When a molecule fails to parse, a `ValueError` is raised:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "ename": "ValueError", "evalue": "Failed to parse molecule, NOTSMILES", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m 
\u001b[0mskchem\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mMol\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfrom_smiles\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'NOTSMILES'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m/Users/rich/projects/scikit-chem/skchem/core/mol.py\u001b[0m in \u001b[0;36mconstructor\u001b[0;34m(_, in_arg, name, *args, **kwargs)\u001b[0m\n\u001b[1;32m 419\u001b[0m \u001b[0mm\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrdkit\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mChem\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'MolFrom'\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mconstructor_name\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0min_arg\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 420\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mm\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 421\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Failed to parse molecule, {}'\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0min_arg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 422\u001b[0m \u001b[0mm\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mMol\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfrom_super\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mm\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 423\u001b[0m \u001b[0mm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mValueError\u001b[0m: Failed to parse molecule, NOTSMILES" ] } ], "source": [ "skchem.Mol.from_smiles('NOTSMILES')" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "## Molecule accessors" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Atoms** and **bonds** are accessible as a property:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.atoms" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.bonds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These are iterable:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[,\n", " ,\n", " ,\n", " ]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[a for a in mol.atoms]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "subscriptable:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.atoms[3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "sliceable:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[,\n", " ,\n", " ]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.atoms[:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "indexable:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[, ]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.atoms[[1, 3]]" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "and maskable:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[, ]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.atoms[[True, False, True, False]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Properties** on the rdkit objects are accessible through the `props` property:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "mol.props['is_reactive'] = 'very!'" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [], "source": [ "mol.atoms[1].props['kind'] = 'electrophilic'\n", "mol.atoms[3].props['leaving group'] = 1\n", "mol.bonds[2].props['bond strength'] = 'strong'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These are using the `rdkit` property functionality internally:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'very!'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.GetProp('is_reactive')" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. note::\n", " RDKit properties can only store ``str`` s, ``int`` s and ``float`` s. Any other type will be coerced to a string before storage." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The properties of atoms and bonds are accessible molecule-wide:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.atoms.props" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.bonds.props" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These can be exported as pandas objects:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>kind</th>\n", "      <th>leaving group</th>\n", "    </tr>\n",
"    <tr>\n", "      <th>atom_idx</th>\n", "      <th></th>\n", "      <th></th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>None</td>\n", "      <td>NaN</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>electrophilic</td>\n", "      <td>NaN</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>None</td>\n", "      <td>NaN</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>None</td>\n", "      <td>1.0</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n",
"</div>" ], "text/plain": [ "                   kind  leaving group\n", "atom_idx                              \n", "0                  None            NaN\n", "1         electrophilic            NaN\n", "2                  None            NaN\n", "3                  None            1.0" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.atoms.props.to_frame()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Export and Serialization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Molecules are exported and serialized in much the same way as they are constructed, again taking inspiration from `pandas`: both libraries use `to_*` methods." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "',a,b\\n0,10,20\\n1,20,40\\n'" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.to_csv()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'WETWJCDKMRHUPV-UHFFFAOYSA-N'" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol.to_inchi_key()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The full list of available export methods is:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "['to_inchi',\n", " 'to_json',\n", " 'to_smiles',\n", " 'to_smarts',\n", " 'to_inchi_key',\n", " 'to_binary',\n", " 'to_dict',\n", " 'to_molblock',\n", " 'to_tplfile',\n", " 'to_formula',\n", " 'to_molfile',\n", " 'to_pdbblock',\n", " 'to_tplblock']" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[method for method in skchem.Mol.__dict__ if method.startswith('to_')]" ] } ], "metadata": { "celltoolbar": "Raw Cell Format", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": 
"python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" }, "widgets": { "state": {}, "version": "1.1.2" } }, "nbformat": 4, "nbformat_minor": 0 }