scikit-chem builds on the scikit-learn Pipeline object to add support for filtering. It is initialized with a list of Transformer objects.
In [10]:
pipeline = skchem.pipeline.Pipeline([
    skchem.standardizers.ChemAxonStandardizer(keep_failed=True),
    skchem.forcefields.UFF(),
    skchem.filters.OrganicFilter(),
    skchem.descriptors.MorganFeaturizer()])
The pipeline applies each transformer in turn to the input, using the highest-priority method that each transformer implements, in the order transform_filter > filter > transform.
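The priority rule can be illustrated with a small stand-alone sketch. It is not scikit-chem's implementation: the step classes (StripAndDropEmpty, DropShort, Upper) and the helpers pick_method and run_pipeline are hypothetical names invented for this example, and plain strings stand in for molecules. It only shows the idea of looking up the first available method in a fixed priority order.

```python
# Hypothetical illustration of priority-based dispatch; not scikit-chem code.
PRIORITY = ("transform_filter", "filter", "transform")


class StripAndDropEmpty:
    """Implements both transform and transform_filter."""

    def transform(self, items):
        return [s.strip() for s in items]

    def transform_filter(self, items):
        # Transform, then drop anything that became empty.
        return [s for s in self.transform(items) if s]


class DropShort:
    """Implements filter only."""

    def filter(self, items):
        return [s for s in items if len(s) > 2]


class Upper:
    """Implements transform only."""

    def transform(self, items):
        return [s.upper() for s in items]


def pick_method(step):
    """Return the highest-priority method that the step implements."""
    for name in PRIORITY:
        method = getattr(step, name, None)
        if callable(method):
            return method
    raise TypeError(f"{type(step).__name__} implements none of {PRIORITY}")


def run_pipeline(steps, items):
    """Apply each step in turn, using its highest-priority method."""
    for step in steps:
        items = pick_method(step)(items)
    return items


steps = [StripAndDropEmpty(), DropShort(), Upper()]
result = run_pipeline(steps, [" cc ", "  ", "ccc", "o"])
print(result)
```

In this sketch the first step is called through transform_filter even though it also has transform, because transform_filter comes first in the priority order.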
For example, our pipeline can transform sodium acetate all the way to fingerprints:
In [11]:
mol = skchem.Mol.from_smiles('CC(=O)[O-].[Na+]')
In [4]:
pipeline.transform_filter(mol)
Out[4]:
morgan_fp_idx
0 0
1 0
2 0
3 0
4 0
..
2043 0
2044 0
2045 0
2046 0
2047 0
Name: MorganFeaturizer, dtype: uint8
It also works on collections of molecules:
In [12]:
mols = skchem.read_smiles('https://archive.org/download/scikit-chem_example_files/example.smi', name_column=1); mols
Out[12]:
batch
ethane <Mol: CC>
propane <Mol: CCC>
benzene <Mol: c1ccccc1>
sodium acetate <Mol: CC(=O)[O-].[Na+]>
serine <Mol: NC(CO)C(=O)O>
Name: structure, dtype: object
In [16]:
pipeline.transform_filter(mols)
ChemAxonStandardizer: 100% (5 of 5) |##########################################| Elapsed Time: 0:00:04 Time: 0:00:04
UFF: 100% (5 of 5) |###########################################################| Elapsed Time: 0:00:00 Time: 0:00:00
OrganicFilter: 100% (5 of 5) |#################################################| Elapsed Time: 0:00:00 Time: 0:00:00
MorganFeaturizer: 100% (5 of 5) |##############################################| Elapsed Time: 0:00:00 Time: 0:00:00
Out[16]:
morgan_fp_idx | 0 | 1 | 2 | 3 | 4 | ... | 2043 | 2044 | 2045 | 2046 | 2047 |
---|---|---|---|---|---|---|---|---|---|---|---|
batch | | | | | | | | | | | |
ethane | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 |
propane | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 |
benzene | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 |
sodium acetate | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 |
serine | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 |
5 rows × 2048 columns
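Because the display is truncated to the first and last few columns, most set bits are not visible in the output above. A minimal sketch of how such a result could be inspected with pandas follows. It uses a small made-up bit matrix as a stand-in for the real fingerprint DataFrame; the row labels, the 8-bit width, and the bit positions are invented for illustration and are not real fingerprint values.

```python
import numpy as np
import pandas as pd

# Stand-in for the pipeline output: rows are molecules, columns are bit indices.
# All values here are made up for illustration.
fps = pd.DataFrame(
    np.zeros((3, 8), dtype=np.uint8),
    index=pd.Index(["mol_a", "mol_b", "mol_c"], name="batch"),
    columns=pd.Index(range(8), name="morgan_fp_idx"),
)
fps.loc["mol_a", [1, 5]] = 1
fps.loc["mol_c", 6] = 1

# Number of set bits per molecule.
bits_set = fps.sum(axis=1)

# Indices of the set bits for each molecule.
on_bits = {
    name: row.index[row.values > 0].tolist()
    for name, row in fps.iterrows()
}

print(bits_set)
print(on_bits)
```

The same two expressions should apply unchanged to a real 2048-column result, assuming it is an ordinary pandas DataFrame as the output above suggests.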