skchem.data.datasets package

Submodules

skchem.data.datasets.base module

class skchem.data.datasets.base.Dataset(**kwargs)[source]

Bases: fuel.datasets.hdf5.H5PYDataset

Abstract base class providing an interface to the skchem data format.

classmethod available_sets()[source]
classmethod available_sources()[source]
classmethod download(output_directory=None, download_directory=None)[source]

Download the dataset and convert it.

Parameters:
  • output_directory (str) – The directory to save the data to. Defaults to the first directory in the fuel data path.
  • download_directory (str) – The directory to save the raw files to. Defaults to a temporary directory.
Returns:

The path of the downloaded and processed dataset.

Return type:

str
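The default-directory behaviour described above can be sketched as follows. This is an illustration only, not skchem's implementation: `resolve_directories` is a hypothetical helper, and reading the fuel data path from the `FUEL_DATA_PATH` environment variable (entries separated by `os.pathsep`) is an assumption about how Fuel is configured.

```python
import os
import tempfile


def resolve_directories(output_directory=None, download_directory=None):
    """Hypothetical helper mirroring the documented defaults of download()."""
    if output_directory is None:
        # Assumption: the fuel data path comes from FUEL_DATA_PATH and may
        # hold several directories separated by os.pathsep.
        data_path = os.environ.get("FUEL_DATA_PATH", "")
        candidates = [p for p in data_path.split(os.pathsep) if p]
        if not candidates:
            raise ValueError(
                "no output_directory given and FUEL_DATA_PATH is empty"
            )
        # Documented default: the first directory in the fuel data path.
        output_directory = candidates[0]
    if download_directory is None:
        # Documented default: raw files go to a temporary directory.
        download_directory = tempfile.mkdtemp()
    return output_directory, download_directory
```

Passing both arguments explicitly bypasses the defaults, which may be convenient when the raw files should be kept for inspection.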

classmethod load_data(sets=(), sources=())[source]

Load a set of sources.

Parameters:
  • sets (tuple[str]) – The sets to return data for.
  • sources (tuple[str]) – The sources to return data for.

Example

(X_train, y_train), (X_test, y_test) = Dataset.load_data(sets=('train', 'test'), sources=('X', 'y'))

classmethod load_set(set_name, sources=())[source]

Load the sources for a single set.

Parameters:
  • set_name (str) – The set name.
  • sources (tuple[str]) – The sources to return data for.
Returns:

The requested sources for the requested set.

Return type:

tuple[np.array]
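The relationship between `load_data` and `load_set` can be pictured with a small in-memory stand-in. This is a sketch under assumptions, not the skchem code: `ToyDataset` and its `_store` dictionary are hypothetical, plain lists replace the NumPy arrays, and treating `load_data` as one `load_set` call per set is inferred from the shape of the documented example.

```python
class ToyDataset:
    """Hypothetical stand-in; real datasets read from an HDF5 file."""

    # set name -> source name -> data (plain lists instead of arrays)
    _store = {
        "train": {"X": [[0.1], [0.2]], "y": [0, 1]},
        "test": {"X": [[0.3]], "y": [1]},
    }

    @classmethod
    def load_set(cls, set_name, sources=()):
        # One entry per requested source, in the order requested.
        return tuple(cls._store[set_name][source] for source in sources)

    @classmethod
    def load_data(cls, sets=(), sources=()):
        # One tuple per requested set, each holding the requested sources.
        return tuple(cls.load_set(set_name, sources) for set_name in sets)


# The nested tuples unpack in the same way as the documented example.
(X_train, y_train), (X_test, y_test) = ToyDataset.load_data(
    sets=("train", "test"), sources=("X", "y")
)
```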

classmethod read_frame(key, *args, **kwargs)[source]

Load a set of features from the dataset as a pandas object.

Parameters:
  • key (str) – The HDF5 key for the required data. Typically, this will be one of:
      • structure: for the raw molecules
      • smiles: for the SMILES strings
      • features/{feat_name}: for the features
      • targets/{targ_name}: for the targets
Returns:

The data as a pandas object.

Return type:

pd.Series or pd.DataFrame or pd.Panel
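The key layout listed above can be captured in a tiny helper. `frame_key` is a hypothetical function introduced only for illustration; it is not part of skchem.

```python
def frame_key(kind, name=None):
    """Build an HDF5 key following the layout documented for read_frame."""
    if kind in ("structure", "smiles"):
        # These keys are used as-is, without a sub-name.
        if name is not None:
            raise ValueError(f"{kind!r} takes no name")
        return kind
    if kind in ("features", "targets"):
        # These keys are namespaced: features/{feat_name}, targets/{targ_name}.
        if not name:
            raise ValueError(f"{kind!r} requires a name")
        return f"{kind}/{name}"
    raise ValueError(f"unknown kind: {kind!r}")
```

A key built this way would then be passed to `read_frame`, for example `read_frame(frame_key("features", "my_feat"))`, where `my_feat` is a placeholder; the feature and target names actually available depend on the dataset.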

skchem.data.datasets.bradley_open_mp module

class skchem.data.datasets.bradley_open_mp.BradleyOpenMP(**kwargs)[source]

Bases: skchem.data.datasets.base.Dataset

converter

alias of BradleyOpenMPConverter

downloader

alias of BradleyOpenMPDownloader

filename = 'bradley_open_mp.h5'
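Every concrete dataset in this package has the same shape: a `converter`, a `downloader` and a `filename` attached to a `Dataset` subclass. The sketch below shows how such attributes could drive a `download` classmethod. All classes here are hypothetical stand-ins, plain text stands in for HDF5, and the call order and the downloader/converter signatures are assumptions rather than the skchem implementation.

```python
import os
import tempfile


class ToyDownloader:
    """Hypothetical downloader: writes a raw file and returns its path."""

    @classmethod
    def download(cls, directory):
        raw_path = os.path.join(directory, "raw.txt")
        with open(raw_path, "w") as f:
            f.write("placeholder raw data\n")
        return raw_path


class ToyConverter:
    """Hypothetical converter: turns the raw file into the output file."""

    @classmethod
    def convert(cls, raw_path, output_path):
        with open(raw_path) as src, open(output_path, "w") as dst:
            dst.write(src.read().upper())


class ToyBase:
    """Hypothetical base class; subclasses fill in the three attributes."""

    converter = None
    downloader = None
    filename = None

    @classmethod
    def download(cls, output_directory, download_directory=None):
        if download_directory is None:
            download_directory = tempfile.mkdtemp()
        # Assumed order: fetch the raw files first, then convert them.
        raw_path = cls.downloader.download(download_directory)
        output_path = os.path.join(output_directory, cls.filename)
        cls.converter.convert(raw_path, output_path)
        return output_path


class ToyMeltingPoints(ToyBase):
    converter = ToyConverter
    downloader = ToyDownloader
    filename = "toy_melting_points.h5"
```

With this layout, adding a dataset amounts to declaring a subclass with three attributes, which matches how the classes in this package are documented.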

skchem.data.datasets.bursi_ames module

class skchem.data.datasets.bursi_ames.BursiAmes(**kwargs)[source]

Bases: skchem.data.datasets.base.Dataset

converter

alias of BursiAmesConverter

downloader

alias of BursiAmesDownloader

filename = 'bursi_ames.h5'

skchem.data.datasets.diversity_set module

class skchem.data.datasets.diversity_set.Diversity(**kwargs)[source]

Bases: skchem.data.datasets.base.Dataset

Example dataset, the NCI DTP Diversity Set III.

converter

alias of DiversityConverter

downloader

alias of DiversityDownloader

filename = 'diversity.h5'

skchem.data.datasets.muller_ames module

class skchem.data.datasets.muller_ames.MullerAmes(**kwargs)[source]

Bases: skchem.data.datasets.base.Dataset

converter

alias of MullerAmesConverter

downloader

alias of MullerAmesDownloader

filename = 'muller_ames.h5'

skchem.data.datasets.nmrshiftdb2 module

class skchem.data.datasets.nmrshiftdb2.NMRShiftDB2(**kwargs)[source]

Bases: skchem.data.datasets.base.Dataset

converter

alias of NMRShiftDB2Converter

downloader

alias of NMRShiftDB2Downloader

filename = 'nmrshiftdb2.h5'

skchem.data.datasets.physprop module

class skchem.data.datasets.physprop.PhysProp(**kwargs)[source]

Bases: skchem.data.datasets.base.Dataset

converter

alias of PhysPropConverter

downloader

alias of PhysPropDownloader

filename = 'physprop.h5'

skchem.data.datasets.tox21 module

class skchem.data.datasets.tox21.Tox21(**kwargs)[source]

Bases: skchem.data.datasets.base.Dataset

converter

alias of Tox21Converter

downloader

alias of Tox21Downloader

filename = 'tox21.h5'

Module contents

Module defining skchem datasets.

class skchem.data.datasets.Diversity(**kwargs)[source]

Bases: skchem.data.datasets.base.Dataset

Example dataset, the NCI DTP Diversity Set III.

converter

alias of DiversityConverter

downloader

alias of DiversityDownloader

filename = 'diversity.h5'

class skchem.data.datasets.BursiAmes(**kwargs)[source]

Bases: skchem.data.datasets.base.Dataset

converter

alias of BursiAmesConverter

downloader

alias of BursiAmesDownloader

filename = 'bursi_ames.h5'

class skchem.data.datasets.MullerAmes(**kwargs)[source]

Bases: skchem.data.datasets.base.Dataset

converter

alias of MullerAmesConverter

downloader

alias of MullerAmesDownloader

filename = 'muller_ames.h5'

class skchem.data.datasets.PhysProp(**kwargs)[source]

Bases: skchem.data.datasets.base.Dataset

converter

alias of PhysPropConverter

downloader

alias of PhysPropDownloader

filename = 'physprop.h5'

class skchem.data.datasets.BradleyOpenMP(**kwargs)[source]

Bases: skchem.data.datasets.base.Dataset

converter

alias of BradleyOpenMPConverter

downloader

alias of BradleyOpenMPDownloader

filename = 'bradley_open_mp.h5'

class skchem.data.datasets.NMRShiftDB2(**kwargs)[source]

Bases: skchem.data.datasets.base.Dataset

converter

alias of NMRShiftDB2Converter

downloader

alias of NMRShiftDB2Downloader

filename = 'nmrshiftdb2.h5'

class skchem.data.datasets.Tox21(**kwargs)[source]

Bases: skchem.data.datasets.base.Dataset

converter

alias of Tox21Converter

downloader

alias of Tox21Downloader

filename = 'tox21.h5'

class skchem.data.datasets.ChEMBL(**kwargs)[source]

Bases: skchem.data.datasets.base.Dataset

converter

alias of ChEMBLConverter

downloader

alias of ChEMBLDownloader

filename = 'chembl.h5'