# skchem.filters.base
Base classes for chemical filters.
skchem.filters.base.BaseFilter(agg='any', **kwargs)
Bases: skchem.base.BaseTransformer
The base Filter class.
agg (callable): The aggregate function used to combine the per-column results into a single value per molecule. String aliases for 'any', 'not any', 'all' and 'not all' are available.
columns (pd.Index): The column index to use.
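The string aliases for agg could resolve to callables along the following lines. This is a minimal sketch, not scikit-chem's code: the alias table and the helper resolve_agg are hypothetical, and only illustrate what each alias computes over one molecule's per-column results.

```python
from typing import Callable, Iterable, Union

# Type of an aggregate: reduce one molecule's per-column results to a bool.
AggFunc = Callable[[Iterable], bool]

# Hypothetical alias table (assumption, not the library's actual mapping).
AGG_ALIASES = {
    'any': any,
    'all': all,
    'not any': lambda values: not any(values),
    'not all': lambda values: not all(values),
}


def resolve_agg(agg: Union[str, AggFunc]) -> AggFunc:
    """Return agg unchanged if it is callable, else look up its string alias."""
    if callable(agg):
        return agg
    try:
        return AGG_ALIASES[agg]
    except KeyError:
        raise ValueError(f'unknown agg alias: {agg!r}') from None


# One molecule's results across three columns, e.g. three SMARTS patterns.
row = [False, True, False]
for alias in ('any', 'all', 'not any', 'not all'):
    print(alias, resolve_agg(alias)(row))
# any True
# all False
# not any False
# not all True
```

A 'not any' aggregate matches the PAINS behaviour documented below, where a molecule passes only if none of the patterns match.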
skchem.filters.base.Filter(func=None, agg='any', n_jobs=1, verbose=True)
Bases: skchem.filters.base.BaseFilter, skchem.base.Transformer
Filter base class.
Examples
>>> import skchem
Initialize the filter with a function:
>>> is_named = skchem.filters.Filter(lambda m: m.name is not None)
Filter results are returned by transform:
>>> ethane = skchem.Mol.from_smiles('CC', name='ethane')
>>> is_named.transform(ethane)
True
>>> anonymous = skchem.Mol.from_smiles('c1ccccc1')
>>> is_named.transform(anonymous)
False
Can take a series or dataframe:
>>> import pandas as pd
>>> mols = pd.Series({'anonymous': anonymous, 'ethane': ethane})
>>> is_named.transform(mols)
anonymous False
ethane True
Name: Filter, dtype: bool
Using filter drops the molecules that fail the test:
>>> is_named.filter(mols)
ethane <Mol: CC>
dtype: object
Passing neg=True retains only the molecules that fail:
>>> is_named.filter(mols, neg=True)
anonymous <Mol: c1ccccc1>
dtype: object
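The transform, filter and neg behaviour above can be mimicked with plain dictionaries of labelled items. This is a hypothetical, dependency-free sketch: FakeMol, transform and filter_items are invented names, and the real Filter works on skchem.Mol objects and returns pandas objects.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class FakeMol:
    """Hypothetical stand-in for skchem.Mol: a SMILES string and an optional name."""
    smiles: str
    name: Optional[str] = None


def transform(items: Dict[str, FakeMol], func: Callable[[FakeMol], bool]) -> Dict[str, bool]:
    """Apply the test to every item and keep the labels."""
    return {label: bool(func(mol)) for label, mol in items.items()}


def filter_items(items: Dict[str, FakeMol], func: Callable[[FakeMol], bool],
                 neg: bool = False) -> Dict[str, FakeMol]:
    """Keep items that pass the test; with neg=True keep only those that fail."""
    results = transform(items, func)
    # Keep an item when its result differs from neg: passing items normally, failing ones if neg.
    return {label: mol for label, mol in items.items() if results[label] != neg}


def is_named(mol: FakeMol) -> bool:
    return mol.name is not None


mols = {'anonymous': FakeMol('c1ccccc1'), 'ethane': FakeMol('CC', name='ethane')}
print(transform(mols, is_named))                     # {'anonymous': False, 'ethane': True}
print(list(filter_items(mols, is_named)))            # ['ethane']
print(list(filter_items(mols, is_named, neg=True)))  # ['anonymous']
```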
skchem.filters.base.TransformFilter(agg='any', **kwargs)
Bases: skchem.filters.base.BaseFilter
Transform Filter object.
Implements transform_filter, which applies a transform followed by a filter step, returning only the transformed values that are not False, None or np.nan.
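A rough, dependency-free sketch of the transform-then-filter step: apply a transform to each labelled item, then drop results that are False, None or NaN. The names transform_filter_sketch and is_failed are hypothetical, and plain dictionaries stand in for the pandas objects scikit-chem uses.

```python
import math
from typing import Any, Callable, Dict


def is_failed(value: Any) -> bool:
    """True for the values the filter step discards: False, None or NaN."""
    if value is None or value is False:
        return True
    # Only floats can be NaN; the guard keeps strings and ints away from isnan.
    return isinstance(value, float) and math.isnan(value)


def transform_filter_sketch(items: Dict[str, Any], func: Callable[[Any], Any]) -> Dict[str, Any]:
    """Transform every item, then keep only the non-failing transformed values."""
    transformed = {name: func(item) for name, item in items.items()}
    return {name: value for name, value in transformed.items() if not is_failed(value)}


# Hypothetical transform: the length of a SMILES string, or None when it is empty.
smiles = {'ethanol': 'CCO', 'empty': '', 'benzene': 'c1ccccc1'}
print(transform_filter_sketch(smiles, lambda s: len(s) if s else None))
# {'ethanol': 3, 'benzene': 8}
```

Note that 0 is kept: only the False singleton counts as a failure, not every falsy value.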
# skchem.filters.simple
Simple filters for compounds.
skchem.filters.simple.AtomNumberFilter(above=3, below=60, include_hydrogens=False, n_jobs=1, verbose=True)
Bases: skchem.filters.base.Filter
Filter molecules by whether their number of atoms falls in a defined interval.
above <= n_atoms < below
Examples
>>> import skchem
>>> data = [
... skchem.Mol.from_smiles('CC', name='ethane'),
... skchem.Mol.from_smiles('CCCC', name='butane'),
... skchem.Mol.from_smiles('NC(C)C(=O)O', name='alanine'),
... skchem.Mol.from_smiles('C12C=CC(C=C2)C=C1', name='barrelene')
... ]
>>> af = skchem.filters.AtomNumberFilter(above=3, below=7)
>>> af.transform(data)
ethane False
butane True
alanine True
barrelene False
Name: num_atoms_in_range, dtype: bool
>>> af.filter(data)
butane <Mol: CCCC>
alanine <Mol: CC(N)C(=O)O>
Name: structure, dtype: object
>>> af = skchem.filters.AtomNumberFilter(above=5, below=15, include_hydrogens=True)
>>> af.transform(data)
ethane True
butane True
alanine True
barrelene False
Name: num_atoms_in_range, dtype: bool
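The half-open interval above <= n_atoms < below can be checked by hand. In this sketch the atom counts of the four example molecules (heavy atoms, and totals including hydrogens) are written out from their formulas rather than computed by scikit-chem, and the helper names are hypothetical; it simply reproduces the two transform outputs above.

```python
from typing import Dict

# (heavy atoms, atoms including hydrogens) from the formulas
# ethane C2H6, butane C4H10, alanine C3H7NO2, barrelene C8H8.
ATOM_COUNTS = {
    'ethane': (2, 8),
    'butane': (4, 14),
    'alanine': (6, 13),
    'barrelene': (8, 16),
}


def in_range(n: int, above: int, below: int) -> bool:
    """Half-open interval test: inclusive lower bound, exclusive upper bound."""
    return above <= n < below


def atom_number_check(above: int, below: int, include_hydrogens: bool = False) -> Dict[str, bool]:
    column = 1 if include_hydrogens else 0
    return {name: in_range(counts[column], above, below)
            for name, counts in ATOM_COUNTS.items()}


print(atom_number_check(above=3, below=7))
# {'ethane': False, 'butane': True, 'alanine': True, 'barrelene': False}
print(atom_number_check(above=5, below=15, include_hydrogens=True))
# {'ethane': True, 'butane': True, 'alanine': True, 'barrelene': False}
```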
columns
skchem.filters.simple.ElementFilter(elements=None, as_bits=False, agg='any', n_jobs=1, verbose=True)
Bases: skchem.filters.base.Filter
Filter molecules by the elements they contain.
Examples
Basic usage on molecules:
>>> import skchem
>>> hal_f = skchem.filters.ElementFilter(['F', 'Cl', 'Br', 'I'])
Molecules containing any of the elements transform to True.
>>> m1 = skchem.Mol.from_smiles('ClC(Cl)Cl', name='chloroform')
>>> hal_f.transform(m1)
True
Molecules containing none of the elements transform to False.
>>> m2 = skchem.Mol.from_smiles('CC', name='ethane')
>>> hal_f.transform(m2)
False
The per-element atom counts can be seen by passing agg=False:
>>> hal_f.transform(m1, agg=False)
has_element
F 0
Cl 3
Br 0
I 0
Name: ElementFilter, dtype: int64
Can transform collections of molecules:
>>> ms = [m1, m2]
>>> hal_f.transform(ms)
chloroform True
ethane False
dtype: bool
>>> hal_f.transform(ms, agg=False)
has_element F Cl Br I
chloroform 0 3 0 0
ethane 0 0 0 0
Can also filter collections:
>>> hal_f.filter(ms)
chloroform <Mol: ClC(Cl)Cl>
Name: structure, dtype: object
>>> hal_f.filter(ms, neg=True)
ethane <Mol: CC>
Name: structure, dtype: object
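The agg=False breakdown is a count of atoms per requested element, and the default 'any' aggregation asks whether any of those counts is non-zero. A minimal sketch using collections.Counter; the heavy-atom symbol lists are written out by hand (hydrogens omitted), and element_counts is a hypothetical helper rather than ElementFilter's implementation.

```python
from collections import Counter
from typing import Dict, Iterable, List

HALOGENS = ['F', 'Cl', 'Br', 'I']


def element_counts(symbols: Iterable[str], elements: List[str]) -> Dict[str, int]:
    """Count how many atoms of each requested element appear."""
    counts = Counter(symbols)
    # Counter returns 0 for missing keys, so absent elements get a zero count.
    return {element: counts[element] for element in elements}


# Heavy-atom symbols for the two example molecules.
molecules = {
    'chloroform': ['Cl', 'C', 'Cl', 'Cl'],
    'ethane': ['C', 'C'],
}

for name, symbols in molecules.items():
    counts = element_counts(symbols, HALOGENS)
    print(name, counts, any(counts.values()))
# chloroform {'F': 0, 'Cl': 3, 'Br': 0, 'I': 0} True
# ethane {'F': 0, 'Cl': 0, 'Br': 0, 'I': 0} False
```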
columns
elements
skchem.filters.simple.MassFilter(above=3, below=900, n_jobs=1, verbose=True)
Bases: skchem.filters.base.Filter
Filter molecules by whether their molecular weight falls within a range.
above <= mass < below
Examples
>>> import skchem
>>> data = [
... skchem.Mol.from_smiles('CC', name='ethane'),
... skchem.Mol.from_smiles('CCCC', name='butane'),
... skchem.Mol.from_smiles('NC(C)C(=O)O', name='alanine'),
... skchem.Mol.from_smiles('C12C=CC(C=C2)C=C1', name='barrelene')
... ]
>>> mf = skchem.filters.MassFilter(above=31, below=100)
>>> mf.transform(data)
ethane False
butane True
alanine True
barrelene False
Name: mass_in_range, dtype: bool
>>> mf.filter(data)
butane <Mol: CCCC>
alanine <Mol: CC(N)C(=O)O>
Name: structure, dtype: object
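To see why ethane and barrelene fall outside the 31 to 100 window, the molecular weights can be summed from standard atomic weights. This sketch assumes MassFilter uses the average (not monoisotopic) molecular weight; the rounded atomic weights and the formulas are supplied here, so the last decimal may differ slightly from the values scikit-chem computes.

```python
from typing import Dict

# Rounded standard atomic weights in g/mol.
ATOMIC_WEIGHT = {'H': 1.008, 'C': 12.011, 'N': 14.007, 'O': 15.999}

# Molecular formulas of the example molecules as element -> count.
FORMULAS = {
    'ethane': {'C': 2, 'H': 6},
    'butane': {'C': 4, 'H': 10},
    'alanine': {'C': 3, 'H': 7, 'N': 1, 'O': 2},
    'barrelene': {'C': 8, 'H': 8},
}


def molecular_weight(formula: Dict[str, int]) -> float:
    """Sum of atomic weight times atom count over the formula."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in formula.items())


for name, formula in FORMULAS.items():
    mw = molecular_weight(formula)
    print(f'{name}: {mw:.2f} in range: {31 <= mw < 100}')
# ethane: 30.07 in range: False
# butane: 58.12 in range: True
# alanine: 89.09 in range: True
# barrelene: 104.15 in range: False
```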
columns
skchem.filters.simple.OrganicFilter(n_jobs=1, verbose=True)
Bases: skchem.filters.simple.ElementFilter
Whether a molecule is organic.
For the purpose of this filter, a molecule is organic if all of its atoms are of elements in the set H, B, C, N, O, F, P, S, Cl, Br, I.
Examples
Basic usage as a function on molecules:
>>> import skchem
>>> of = skchem.filters.OrganicFilter()
>>> benzene = skchem.Mol.from_smiles('c1ccccc1', name='benzene')
>>> of.transform(benzene)
True
>>> ferrocene = skchem.Mol.from_smiles('[cH-]1cccc1.[cH-]1cccc1.[Fe+2]',
... name='ferrocene')
>>> of.transform(ferrocene)
False
More useful on collections:
>>> sa = skchem.Mol.from_smiles('CC(=O)[O-].[Na+]', name='sodium acetate')
>>> norbornane = skchem.Mol.from_smiles('C12CCC(C2)CC1', name='norbornane')
>>> data = [benzene, ferrocene, norbornane, sa]
>>> of.transform(data)
benzene True
ferrocene False
norbornane True
sodium acetate False
dtype: bool
>>> of.filter(data)
benzene <Mol: c1ccccc1>
norbornane <Mol: C1CC2CCC1C2>
Name: structure, dtype: object
>>> of.filter(data, neg=True)
ferrocene <Mol: [Fe+2].c1cc[cH-]c1.c1cc[cH-]c1>
sodium acetate <Mol: CC(=O)[O-].[Na+]>
Name: structure, dtype: object
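The definition of organic above amounts to a subset test on a molecule's set of elements. A minimal sketch: the element sets of the example molecules are written out by hand, and this subset formulation is an illustration rather than OrganicFilter's actual ElementFilter-based implementation.

```python
from typing import Set

ORGANIC_ELEMENTS = frozenset({'H', 'B', 'C', 'N', 'O', 'F', 'P', 'S', 'Cl', 'Br', 'I'})


def is_organic(elements: Set[str]) -> bool:
    """Organic here means every element present is in the allowed set."""
    return set(elements) <= ORGANIC_ELEMENTS


examples = {
    'benzene': {'C', 'H'},
    'ferrocene': {'C', 'H', 'Fe'},
    'norbornane': {'C', 'H'},
    'sodium acetate': {'C', 'H', 'O', 'Na'},
}
for name, elements in examples.items():
    # The set difference lists the elements that make a molecule non-organic.
    print(name, is_organic(elements), sorted(elements - ORGANIC_ELEMENTS))
# benzene True []
# ferrocene False ['Fe']
# norbornane True []
# sodium acetate False ['Na']
```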
skchem.filters.simple.mass(mol, above=10, below=900)
Whether the molecular weight of a molecule falls within a range.
above <= mass < below
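| Parameters: | mol (skchem.Mol): the molecule to test; above: inclusive lower bound on the molecular weight; below: exclusive upper bound on the molecular weight |
|---|---|
| Returns: | Whether the molecular weight of the molecule falls within the interval above <= mass < below. |
| Return type: | bool |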
Examples
Basic usage as a function on molecules:
>>> import skchem
>>> m = skchem.Mol.from_smiles('c1ccccc1') # benzene has M_r = 78.
>>> skchem.filters.mass(m, above=70)
True
>>> skchem.filters.mass(m, above=80)
False
>>> skchem.filters.mass(m, below=80)
True
>>> skchem.filters.mass(m, below=70)
False
>>> skchem.filters.mass(m, above=70, below=80)
True
skchem.filters.simple.n_atoms(mol, above=2, below=75, include_hydrogens=False)
Whether the number of atoms in a molecule falls in a defined interval.
above <= n_atoms < below
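| Parameters: | mol (skchem.Mol): the molecule to test; above: inclusive lower bound on the number of atoms; below: exclusive upper bound on the number of atoms; include_hydrogens (bool): whether to count hydrogen atoms |
|---|---|
| Returns: | Whether the number of atoms in the molecule falls within the interval above <= n_atoms < below. |
| Return type: | bool |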
Examples
Basic usage as a function on molecules:
>>> import skchem
>>> m = skchem.Mol.from_smiles('c1ccccc1') # benzene has 6 atoms.
Lower threshold:
>>> skchem.filters.n_atoms(m, above=3)
True
>>> skchem.filters.n_atoms(m, above=8)
False
Upper threshold:
>>> skchem.filters.n_atoms(m, below=8)
True
>>> skchem.filters.n_atoms(m, below=3)
False
Bounds work like Python slices, with an inclusive lower bound and an exclusive upper bound:
>>> skchem.filters.n_atoms(m, above=6)
True
>>> skchem.filters.n_atoms(m, below=6)
False
Both can be used at once:
>>> skchem.filters.n_atoms(m, above=3, below=8)
True
Can include hydrogens:
>>> skchem.filters.n_atoms(m, above=3, below=8, include_hydrogens=True)
False
>>> skchem.filters.n_atoms(m, above=9, below=14, include_hydrogens=True)
True
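The slice analogy can be checked directly in Python: range(above, below) contains exactly the integers n with above <= n < below. In this sketch the benzene counts (6 heavy atoms, 12 with hydrogens) are supplied by hand and n_atoms_in_range is a hypothetical helper mirroring the documented defaults.

```python
def n_atoms_in_range(count: int, above: int = 2, below: int = 75) -> bool:
    """Half-open test, equivalent to membership in range(above, below)."""
    return above <= count < below


BENZENE_HEAVY, BENZENE_TOTAL = 6, 12

# The comparison and range membership agree, including at both boundaries.
for above, below in [(6, 75), (2, 6), (3, 8), (9, 14)]:
    assert n_atoms_in_range(BENZENE_HEAVY, above, below) == (BENZENE_HEAVY in range(above, below))

print(n_atoms_in_range(BENZENE_HEAVY, above=6))            # True: lower bound is inclusive
print(n_atoms_in_range(BENZENE_HEAVY, below=6))            # False: upper bound is exclusive
print(n_atoms_in_range(BENZENE_TOTAL, above=3, below=8))   # False
print(n_atoms_in_range(BENZENE_TOTAL, above=9, below=14))  # True
```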
# skchem.filters.smarts
Module defines SMARTS filters.
skchem.filters.smarts.PAINSFilter(n_jobs=1, verbose=True)
Bases: skchem.filters.smarts.SMARTSFilter
Whether a molecule passes the Pan Assay INterference (PAINS) filters.
These are supplied with RDKit, and were originally proposed by Baell et al.
_pains (pd.Series): A series of SMARTS template molecules.
References
[The original paper](http://dx.doi.org/10.1021/jm901137j)
Examples
Basic usage as a function on molecules:
>>> import skchem
>>> benzene = skchem.Mol.from_smiles('c1ccccc1', name='benzene')
>>> pf = skchem.filters.PAINSFilter()
>>> pf.transform(benzene)
True
>>> catechol = skchem.Mol.from_smiles('Oc1c(O)cccc1', name='catechol')
>>> pf.transform(catechol)
False
>>> res = pf.transform(catechol, agg=False)
>>> res[res]
names
catechol_A(92) True
Name: PAINSFilter, dtype: bool
More useful on collections of molecules:
>>> data = [benzene, catechol]
>>> pf.transform(data)
benzene True
catechol False
dtype: bool
>>> pf.filter(data)
benzene <Mol: c1ccccc1>
Name: structure, dtype: object
skchem.filters.smarts.SMARTSFilter(smarts, agg='any', merge_hs=True, n_jobs=1, verbose=True)
Bases: skchem.filters.base.Filter
Filter molecules based on SMARTS patterns.
Examples
>>> import skchem
>>> data = [
... skchem.Mol.from_smiles('CC', name='ethane'),
... skchem.Mol.from_smiles('c1ccccc1', name='benzene'),
... skchem.Mol.from_smiles('c1ccccc1-c2c(C=O)ccnc2', name='bg')
... ]
>>> f = skchem.filters.SMARTSFilter({'benzene': 'c1ccccc1',
... 'pyridine': 'c1ccccn1',
... 'acetyl': 'C=O'})
>>> f.transform(data, agg=False)
acetyl benzene pyridine
ethane False False False
benzene False True False
bg True True True
>>> f.transform(data)
ethane False
benzene True
bg True
dtype: bool
>>> f.filter(data)
benzene <Mol: c1ccccc1>
bg <Mol: O=Cc1ccncc1-c1ccccc1>
Name: structure, dtype: object
>>> f.agg = all
>>> f.filter(data)
bg <Mol: O=Cc1ccncc1-c1ccccc1>
Name: structure, dtype: object
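The two filter results above follow from aggregating each row of the agg=False table. This sketch copies the table's booleans by hand and applies any, then all, row by row; it performs no SMARTS matching itself, and the helper passing is hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Per-pattern matches copied from the agg=False table: (acetyl, benzene, pyridine).
MATCHES: Dict[str, Tuple[bool, bool, bool]] = {
    'ethane': (False, False, False),
    'benzene': (False, True, False),
    'bg': (True, True, True),
}


def passing(matches: Dict[str, Tuple[bool, ...]], agg: Callable[[Iterable], bool]) -> List[str]:
    """Names whose row of pattern matches aggregates to True."""
    return [name for name, row in matches.items() if agg(row)]


print(passing(MATCHES, any))  # ['benzene', 'bg']
print(passing(MATCHES, all))  # ['bg']
```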
columns
# skchem.filters.stereo
Stereo filters for scikit-chem.
skchem.filters.stereo.ChiralFilter(check_meso=True, n_jobs=1, verbose=True)
Bases: skchem.filters.base.Filter
Filter chiral compounds.
Examples
>>> import skchem
>>> cf = skchem.filters.ChiralFilter()
>>> ms = [
... skchem.Mol.from_smiles('F[C@@H](F)[C@H](F)F', name='achiral'),
... skchem.Mol.from_smiles('F[C@@H](Br)[C@H](Br)F', name='chiral'),
... skchem.Mol.from_smiles('F[C@H](Br)[C@H](Br)F', name='meso'),
... skchem.Mol.from_smiles('FC(Br)C(Br)F', name='racemic')
... ]
>>> cf.transform(ms)
achiral False
chiral True
meso False
racemic False
Name: is_chiral, dtype: bool
columns
is_meso(mol)
Determines whether the molecule is meso.
Meso compounds have chiral centres but also an internal mirror plane, so the molecule is superposable on its mirror image.
Examples
>>> import skchem
>>> cf = skchem.filters.ChiralFilter()
>>> meso = skchem.Mol.from_smiles('F[C@H](Br)[C@H](Br)F')
>>> cf.is_meso(meso)
True
>>> non_meso = skchem.Mol.from_smiles('F[C@H](Br)[C@@H](Br)F')
>>> cf.is_meso(non_meso)
False
# skchem.filters
Molecule filters for scikit-chem.
The package namespace exposes ChiralFilter, SMARTSFilter, PAINSFilter, ElementFilter, OrganicFilter, AtomNumberFilter, MassFilter and Filter, which are documented under their submodules above.