{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Input/Output"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pandas objects are the main data structures used for collections of molecules. **scikit-chem** provides convenience functions to load objects into `pandas.DataFrame`s from common file formats in cheminformatics.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reading files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The **scikit-chem** functionality is modelled after the `pandas` API.  To load an csv file using `pandas` you would call:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5.1</td>\n",
       "      <td>3.5</td>\n",
       "      <td>1.4</td>\n",
       "      <td>0.2</td>\n",
       "      <td>Iris-setosa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.9</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.4</td>\n",
       "      <td>0.2</td>\n",
       "      <td>Iris-setosa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4.7</td>\n",
       "      <td>3.2</td>\n",
       "      <td>1.3</td>\n",
       "      <td>0.2</td>\n",
       "      <td>Iris-setosa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4.6</td>\n",
       "      <td>3.1</td>\n",
       "      <td>1.5</td>\n",
       "      <td>0.2</td>\n",
       "      <td>Iris-setosa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5.0</td>\n",
       "      <td>3.6</td>\n",
       "      <td>1.4</td>\n",
       "      <td>0.2</td>\n",
       "      <td>Iris-setosa</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     0    1    2    3            4\n",
       "0  5.1  3.5  1.4  0.2  Iris-setosa\n",
       "1  4.9  3.0  1.4  0.2  Iris-setosa\n",
       "2  4.7  3.2  1.3  0.2  Iris-setosa\n",
       "3  4.6  3.1  1.5  0.2  Iris-setosa\n",
       "4  5.0  3.6  1.4  0.2  Iris-setosa"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('https://archive.org/download/scikit-chem_example_files/iris.csv', \n",
    "                 header=None); df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Analogously with **scikit-chem**:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "smi = skchem.read_smiles('https://archive.org/download/scikit-chem_example_files/example.smi')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Currently available:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['read_sdf', 'read_smiles']"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[method for method in skchem.io.__dict__ if method.startswith('read_')]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**scikit-chem** also adds convenience methods onto `pandas.DataFrame` objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>structure</th>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>&lt;Mol: CC&gt;</td>\n",
       "      <td>ethane</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>&lt;Mol: CCC&gt;</td>\n",
       "      <td>propane</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>&lt;Mol: c1ccccc1&gt;</td>\n",
       "      <td>benzene</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>&lt;Mol: CC(=O)[O-].[Na+]&gt;</td>\n",
       "      <td>sodium acetate</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>&lt;Mol: NC(CO)C(=O)O&gt;</td>\n",
       "      <td>serine</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 structure               1\n",
       "0                <Mol: CC>          ethane\n",
       "1               <Mol: CCC>         propane\n",
       "2          <Mol: c1ccccc1>         benzene\n",
       "3  <Mol: CC(=O)[O-].[Na+]>  sodium acetate\n",
       "4      <Mol: NC(CO)C(=O)O>          serine"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame.from_smiles('https://archive.org/download/scikit-chem_example_files/example.smi')"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. note:: \n",
    "\n",
    "    Currently, only ``read_smiles`` can read files over a network connection.  This functionality is planned to be added in future for all file types."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Writing files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, this is analogous to `pandas`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ",0,1,2,3,4\n",
      "0,5.1,3.5,1.4,0.2,Iris-setosa\n",
      "1,4.9,3.0,1.4,0.2,Iris-setosa\n",
      "2,4.7,3.2,1.3,0.2,Iris-setosa\n",
      "3,4.6,3.1,1.5,0.2,Iris-setosa\n",
      "4,5.0,3.6,1.4,0.2,Iris-setosa\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from io import StringIO\n",
    "sio = StringIO()\n",
    "df.to_csv(sio)\n",
    "sio.seek(0)\n",
    "print(sio.read())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n",
      "     RDKit          \n",
      "\n",
      "  2  1  0  0  0  0  0  0  0  0999 V2000\n",
      "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n",
      "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n",
      "  1  2  1  0\n",
      "M  END\n",
      ">  <1>  (1) \n",
      "ethane\n",
      "\n",
      "$$$$\n",
      "1\n",
      "     RDKit          \n",
      "\n",
      "  3  2  0  0  0  0  0  0  0  0999 V2000\n",
      "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n",
      "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n",
      "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n",
      "  1  2  1  0\n",
      "  2  3  1  0\n",
      "M  END\n",
      ">  <1>  (2) \n",
      "propane\n",
      "\n",
      "$$$$\n",
      "\n"
     ]
    }
   ],
   "source": [
    "sio = StringIO()\n",
    "smi.iloc[:2].to_sdf(sio) # don't write too many!\n",
    "sio.seek(0)\n",
    "print(sio.read())"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. todo::\n",
    "    \n",
    "    Document reading and writing to local files by filenames, to file-like objects, and from remote objects by URI"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Raw Cell Format",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  },
  "widgets": {
   "state": {},
   "version": "1.1.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}