scikit-chem expands on the scikit-learn Pipeline object to support filtering. It is initialized using a list of Transformer objects.
In [10]:
pipeline = skchem.pipeline.Pipeline([
    skchem.standardizers.ChemAxonStandardizer(keep_failed=True),
    skchem.forcefields.UFF(),
    skchem.filters.OrganicFilter(),
    skchem.descriptors.MorganFeaturizer()])
The pipeline applies each transformer in turn to the input objects, calling the highest-priority method that each transformer implements, in the order transform_filter > filter > transform.
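The priority rule above can be illustrated with a minimal, library-independent sketch. This is not the scikit-chem implementation: the classes Doubler, KeepPositive and ShiftThenDropNegative and the helpers apply_step and run_pipeline are hypothetical stand-ins, and plain lists of integers take the place of molecules. The sketch only assumes that a step may expose any of the three method names and that the first one found in priority order is called.

```python
# Hypothetical stand-ins for pipeline steps; none of these are skchem classes.

class Doubler:
    """Implements only `transform`."""
    def transform(self, items):
        return [x * 2 for x in items]


class KeepPositive:
    """Implements only `filter`."""
    def filter(self, items):
        return [x for x in items if x > 0]


class ShiftThenDropNegative:
    """Implements both `transform` and `transform_filter`."""
    def transform(self, items):
        return [x - 3 for x in items]

    def transform_filter(self, items):
        # Transform first, then drop results that fail the check.
        return [y for y in self.transform(items) if y >= 0]


# Method names in decreasing priority, as stated in the text.
PRIORITY = ("transform_filter", "filter", "transform")


def apply_step(step, items):
    """Call the highest-priority method that `step` implements."""
    for name in PRIORITY:
        method = getattr(step, name, None)
        if callable(method):
            return method(items)
    raise TypeError(f"{type(step).__name__} implements none of {PRIORITY}")


def run_pipeline(steps, items):
    """Feed the output of each step into the next one."""
    for step in steps:
        items = apply_step(step, items)
    return items


result = run_pipeline(
    [Doubler(), KeepPositive(), ShiftThenDropNegative()],
    [-1, 1, 2, 5],
)
print(result)  # [1, 7]
```

In this sketch the last step exposes both transform and transform_filter, and the dispatcher picks transform_filter because it comes first in the priority order, so negative results are dropped rather than passed through.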
For example, our pipeline can transform sodium acetate all the way to fingerprints:
In [11]:
mol = skchem.Mol.from_smiles('CC(=O)[O-].[Na+]')
In [4]:
pipeline.transform_filter(mol)
Out[4]:
morgan_fp_idx
0 0
1 0
2 0
3 0
4 0
..
2043 0
2044 0
2045 0
2046 0
2047 0
Name: MorganFeaturizer, dtype: uint8
It also works on collections of molecules:
In [8]:
mols = skchem.read_smiles('https://archive.org/download/scikit-chem_example_files/example.smi', name_column=1).squeeze(); mols
Out[8]:
1
ethane <Mol: CC>
propane <Mol: CCC>
benzene <Mol: c1ccccc1>
sodium acetate <Mol: CC(=O)[O-].[Na+]>
serine <Mol: NC(CO)C(=O)O>
Name: structure, dtype: object
In [9]:
pipeline.transform_filter(mols)
ChemAxonStandardizer: 100% (5 of 5) |##########################################| Elapsed Time: 0:00:02 Time: 0:00:02
UFF: 100% (5 of 5) |###########################################################| Elapsed Time: 0:00:00 Time: 0:00:00
OrganicFilter: 100% (5 of 5) |#################################################| Elapsed Time: 0:00:00 Time: 0:00:00
MorganFeaturizer: 100% (5 of 5) |##############################################| Elapsed Time: 0:00:00 Time: 0:00:00
Out[9]:
morgan_fp_idx | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 2038 | 2039 | 2040 | 2041 | 2042 | 2043 | 2044 | 2045 | 2046 | 2047 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | |||||||||||||||||||||
ethane | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
propane | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
benzene | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
sodium acetate | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
serine | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
5 rows × 2048 columns