skchem.io package

Submodules

skchem.io.sdf module

# skchem.io.sdf

Defining input and output operations for sdf files.

skchem.io.sdf.read_sdf(sdf, error_bad_mol=False, warn_bad_mol=True, nmols=None, skipmols=None, skipfooter=None, read_props=True, mol_props=False, *args, **kwargs)

Read an sdf file into a pd.DataFrame.

The function wraps the RDKit ForwardSDMolSupplier object.

Parameters:
  • sdf (str or file-like) – The location of data to load as a file path, or a file-like object.
  • error_bad_mol (bool) – Whether an error should be raised if a molecule fails to parse. Default is False.
  • warn_bad_mol (bool) – Whether a warning should be output if a molecule fails to parse. Default is True.
  • nmols (int) – The number of molecules to read. If None, read all molecules. Default is None.
  • skipmols (int) – The number of molecules to skip at the start. Default is None (no molecules are skipped).
  • skipfooter (int) – The number of molecules to skip at the end. Default is None (no molecules are skipped).
  • read_props (bool) – Whether to read the properties into the data frame. Default is True.
  • mol_props (bool) – Whether to keep properties in the molecule dictionary after they are extracted to the DataFrame. Default is False.
  • args, kwargs – Additional positional and keyword arguments passed to the RDKit ForwardSDMolSupplier.
Returns:

The loaded data frame, with Mols supplied in the structure field.

Return type:

pandas.DataFrame

See also

rdkit.Chem.ForwardSDMolSupplier, skchem.read_smiles
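The read_props and mol_props options concern the SD data fields stored after each molfile. The following pure-Python sketch illustrates where those values come from. It is not skchem's implementation, which delegates parsing to RDKit. It assumes "\n" line endings, splits SD text into records, and collects each record's "> <NAME>" fields into a dict. That dict is roughly what becomes one DataFrame row when read_props is True.

```python
def parse_sd_records(text):
    """Split SD-file text into (molblock, props) pairs.

    Each SD record is a molfile ending in "M  END", optionally followed by
    data fields ("> <NAME>", value lines, then a blank line), and is
    terminated by a "$$$$" line. Assumes "\n" line endings.
    """
    records = []
    for index, chunk in enumerate(text.split("$$$$")):
        if index > 0 and chunk.startswith("\n"):
            chunk = chunk[1:]  # newline left over from the previous "$$$$" line
        if not chunk.strip():
            continue  # nothing after the last record
        molblock, _, data = chunk.partition("M  END")
        props = {}
        lines = data.splitlines()
        i = 0
        while i < len(lines):
            line = lines[i]
            if line.startswith(">") and "<" in line and ">" in line[1:]:
                # The field name sits between the angle brackets: "> <NAME>"
                name = line[line.index("<") + 1:line.rindex(">")]
                values = []
                i += 1
                while i < len(lines) and lines[i].strip():
                    values.append(lines[i])
                    i += 1
                props[name] = "\n".join(values)
            i += 1
        records.append((molblock + "M  END", props))
    return records


sd_text = (
    "aspirin\n  sketch\n\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
    "> <MW>\n180.16\n\n"
    "$$$$\n"
)
records = parse_sd_records(sd_text)  # [(molblock, {"MW": "180.16"})]
```

All values come back as strings. Any type conversion into numeric DataFrame columns is a separate step.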

skchem.io.sdf.write_sdf(data, sdf, write_cols=True, index_as_name=True, mol_props=False, *args, **kwargs)

Write an sdf file from a dataframe.

Parameters:
  • data (pandas.Series or pandas.DataFrame) – Pandas data structure with a structure column containing compounds to serialize.
  • sdf (str or file-like) – A file path or file-like object specifying where to write the compound data.
  • write_cols (bool) – Whether columns should be written as properties. Default is True.
  • index_as_name (bool) – Whether to use the index as the molecule name in the header, rather than the molecule’s existing name. Default is True.
  • mol_props (bool) – Whether to write properties in the Mol dictionary in addition to fields in the frame. Default is False.
Warning:
This function changes the names of the compounds if index_as_name is True, and deletes all properties in the molecule dictionary if mol_props is False.
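The first line of each molfile holds the molecule name, which is why index_as_name rewrites compound names. SD data fields take the form "> <KEY>" followed by a value and a blank line, which is the layout that write_cols produces for columns. The hypothetical helper format_sd_record below, which is not part of skchem, sketches one record under those assumptions. The field order skchem actually emits is not documented here.

```python
def format_sd_record(molblock, name, props):
    """Build one SD record from a molblock, a name and a dict of data fields.

    The molblock's first line (the molfile header line holding the molecule
    name) is replaced by ``name``, mirroring index_as_name=True; each item in
    ``props`` becomes a "> <KEY>" data field, mirroring write_cols=True.
    """
    lines = molblock.splitlines()
    lines[0] = str(name)
    for key, value in props.items():
        lines.append(f"> <{key}>")
        lines.append(str(value))
        lines.append("")  # a blank line closes each data field
    lines.append("$$$$")  # record terminator
    return "\n".join(lines) + "\n"


molblock = "old name\n  sketch\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
record = format_sd_record(molblock, "cpd-001", {"MW": 46.07})
```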

skchem.io.smiles module

# skchem.io.smiles

Defining input and output operations for smiles files.

skchem.io.smiles.read_smiles(smiles_file, smiles_column=0, name_column=None, delimiter='\t', title_line=False, error_bad_mol=False, warn_bad_mol=True, drop_bad_mol=True, *args, **kwargs)

Read a smiles file into a pandas dataframe.

The function wraps the pandas read_csv function.

Parameters:
  • smiles_file (str or file-like) – Location of the data to load, given as a path string or a file-like object. URLs may also be used; see the pandas.read_csv documentation.
  • smiles_column (int) – The column index at which SMILES are provided. Default is 0.
  • name_column (int) – The column index at which compound names are provided, for use as the DataFrame index. If None, use the default index. Default is None.
  • delimiter (str) – The delimiter used. Default is '\t' (tab).
  • title_line (bool) – Whether the first line is a title line providing column titles. Default is False.
  • error_bad_mol (bool) – Whether an error should be raised when a molecule fails to parse. Default is False.
  • warn_bad_mol (bool) – Whether a warning should be raised when a molecule fails to parse. Default is True.
  • drop_bad_mol (bool) – If True, drop any row whose SMILES failed to parse; otherwise, its structure field is None. Default is True.
  • args, kwargs – Additional arguments passed to pandas.read_csv.
Returns:

The loaded data frame, with Mols supplied in the structure field.

Return type:

pandas.DataFrame

See also

pandas.read_csv, skchem.Mol.from_smiles, skchem.io.sdf
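Because read_smiles wraps pandas.read_csv, its parameters roughly correspond to the read_csv call below. This sketch assumes that delimiter is passed through unchanged, that title_line=False maps to header=None, and that name_column maps to index_col. skchem's exact internal call may differ. The step that parses SMILES into Mol objects is omitted here because it requires RDKit.

```python
import io

import pandas as pd

# A tab-delimited SMILES file with no title line: SMILES in column 0, names in column 1.
smiles_text = "CCO\tethanol\nc1ccccc1\tbenzene\nCC(=O)O\tacetic_acid\n"

df = pd.read_csv(
    io.StringIO(smiles_text),
    delimiter="\t",  # the read_smiles default delimiter
    header=None,     # title_line=False: no title row
    index_col=1,     # name_column=1: use compound names as the index
)
# Column 0 holds the raw SMILES strings; read_smiles would additionally parse
# them into Mol objects in a "structure" column.
```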

skchem.io.smiles.write_smiles(data, smiles_path)

Write a dataframe to a smiles file.

Parameters:
  • data (pd.Series or pd.DataFrame) – The dataframe to write.
  • smiles_path (str) – The path to write the dataframe to.

Module contents

skchem.io

Module defining input and output methods in scikit-chem.

skchem.io.read_sdf(sdf, error_bad_mol=False, warn_bad_mol=True, nmols=None, skipmols=None, skipfooter=None, read_props=True, mol_props=False, *args, **kwargs)

Read an sdf file into a pd.DataFrame.

The function wraps the RDKit ForwardSDMolSupplier object.

Parameters:
  • sdf (str or file-like) – The location of data to load as a file path, or a file-like object.
  • error_bad_mol (bool) – Whether an error should be raised if a molecule fails to parse. Default is False.
  • warn_bad_mol (bool) – Whether a warning should be output if a molecule fails to parse. Default is True.
  • nmols (int) – The number of molecules to read. If None, read all molecules. Default is None.
  • skipmols (int) – The number of molecules to skip at the start. Default is None (no molecules are skipped).
  • skipfooter (int) – The number of molecules to skip at the end. Default is None (no molecules are skipped).
  • read_props (bool) – Whether to read the properties into the data frame. Default is True.
  • mol_props (bool) – Whether to keep properties in the molecule dictionary after they are extracted to the DataFrame. Default is False.
  • args, kwargs – Additional positional and keyword arguments passed to the RDKit ForwardSDMolSupplier.
Returns:

The loaded data frame, with Mols supplied in the structure field.

Return type:

pandas.DataFrame

See also

rdkit.Chem.ForwardSDMolSupplier, skchem.read_smiles

skchem.io.write_sdf(data, sdf, write_cols=True, index_as_name=True, mol_props=False, *args, **kwargs)

Write an sdf file from a dataframe.

Parameters:
  • data (pandas.Series or pandas.DataFrame) – Pandas data structure with a structure column containing compounds to serialize.
  • sdf (str or file-like) – A file path or file-like object specifying where to write the compound data.
  • write_cols (bool) – Whether columns should be written as properties. Default is True.
  • index_as_name (bool) – Whether to use the index as the molecule name in the header, rather than the molecule’s existing name. Default is True.
  • mol_props (bool) – Whether to write properties in the Mol dictionary in addition to fields in the frame. Default is False.
Warning:
This function changes the names of the compounds if index_as_name is True, and deletes all properties in the molecule dictionary if mol_props is False.

skchem.io.read_smiles(smiles_file, smiles_column=0, name_column=None, delimiter='\t', title_line=False, error_bad_mol=False, warn_bad_mol=True, drop_bad_mol=True, *args, **kwargs)

Read a smiles file into a pandas dataframe.

The function wraps the pandas read_csv function.

Parameters:
  • smiles_file (str or file-like) – Location of the data to load, given as a path string or a file-like object. URLs may also be used; see the pandas.read_csv documentation.
  • smiles_column (int) – The column index at which SMILES are provided. Default is 0.
  • name_column (int) – The column index at which compound names are provided, for use as the DataFrame index. If None, use the default index. Default is None.
  • delimiter (str) – The delimiter used. Default is '\t' (tab).
  • title_line (bool) – Whether the first line is a title line providing column titles. Default is False.
  • error_bad_mol (bool) – Whether an error should be raised when a molecule fails to parse. Default is False.
  • warn_bad_mol (bool) – Whether a warning should be raised when a molecule fails to parse. Default is True.
  • drop_bad_mol (bool) – If True, drop any row whose SMILES failed to parse; otherwise, its structure field is None. Default is True.
  • args, kwargs – Additional arguments passed to pandas.read_csv.
Returns:

The loaded data frame, with Mols supplied in the structure field.

Return type:

pandas.DataFrame

See also

pandas.read_csv, skchem.Mol.from_smiles, skchem.io.sdf

skchem.io.write_smiles(data, smiles_path)

Write a dataframe to a smiles file.

Parameters:
  • data (pd.Series or pd.DataFrame) – The dataframe to write.
  • smiles_path (str) – The path to write the dataframe to.

skchem.io.read_config(conf)

Deserialize an object from a config dict.

Parameters: conf (dict) – The config dict to deserialize.
Returns: object

Note

A config differs from params in that it also specifies the class; the params dict is a sub-dict of the config.

skchem.io.write_config(obj)

Serialize an object to a config dict.
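The note on read_config distinguishes a config, which names the class, from params, which are only the constructor arguments. The round trip below sketches that distinction. The "class" and "params" keys and both helper functions are hypothetical, not skchem's actual config layout. The standard-library Fraction class stands in for a scikit-chem object.

```python
import importlib
from fractions import Fraction


def write_config_sketch(obj, params):
    """Hypothetical: record the object's fully qualified class name with its params."""
    cls = type(obj)
    return {"class": f"{cls.__module__}.{cls.__qualname__}", "params": dict(params)}


def read_config_sketch(conf):
    """Hypothetical: import the named class and construct it from the params sub-dict."""
    module_name, _, class_name = conf["class"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**conf["params"])


conf = write_config_sketch(Fraction(3, 4), {"numerator": 3, "denominator": 4})
restored = read_config_sketch(conf)  # Fraction(3, 4)
```

The params alone could not rebuild the object, because nothing in {"numerator": 3, "denominator": 4} says which class to construct. The extra class entry is what makes a config self-describing.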

skchem.io.read_yaml(conf)

Deserialize an object from a yaml file, filename or str.

Parameters: conf (str or file-like) – The yaml file, filename or string to deserialize.
Returns: object

skchem.io.write_yaml(obj, target=None)

Serialize a scikit-chem object to yaml.

skchem.io.read_json(conf)

Deserialize an object from a json file, filename or str.

Parameters: conf (str or file-like) – The json file, filename or string to deserialize.
Returns: object
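read_json (and likewise read_yaml) accepts a file, a filename, or a string. The hypothetical helper below, which is not skchem's code, shows one way to dispatch on those three cases using the standard json module. It assumes that anything that is neither file-like nor the path of an existing file is JSON text. The parsed dict would then be handed on to config deserialization, and the "class"/"params" keys in the sample are likewise hypothetical.

```python
import io
import json
import os


def load_json_source(source):
    """Parse JSON from a file-like object, a path to a file, or a JSON string."""
    if hasattr(source, "read"):  # an already-open file-like object
        return json.load(source)
    if os.path.isfile(source):  # a filename on disk
        with open(source) as handle:
            return json.load(handle)
    return json.loads(source)  # otherwise treat the argument as JSON text


conf_text = '{"class": "fractions.Fraction", "params": {"numerator": 3, "denominator": 4}}'
from_string = load_json_source(conf_text)
from_file_obj = load_json_source(io.StringIO(conf_text))
```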

skchem.io.write_json(obj, target=None)

Serialize a scikit-chem object as json.