The Good, the Bad and the Ugly RDKit molecules
RDKit is a nice cheminformatics toolkit with Python bindings. Over the years, Wildcard Pharmaceutical Consulting has used it extensively in several different Python programming projects. However, RDKit strives to ensure that the molecules it creates make chemical sense. This can be a showstopper when working with large SD files from various sources. OpenBabel is less picky about molecules and can be used for visualizing and troubleshooting "broken" molecules.
But it is possible to load "unsanitizable" molecules into RDKit molecule objects and then visualize them, as the following Python prompt example shows. Here the molecule is created from a nonsensical SMILES string. It could just as well have come from a large SD file that needed automatic curation and standardization before a QSAR model was developed or the data was loaded into a chemical database.
Python 2.7.3 (default, Jun 22 2015, 19:33:41)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from rdkit import Chem
>>> from rdkit.Chem import Draw
>>> mol = Chem.MolFromSmiles("c1ccccc1(C)(C)")
[10:39:29] Can't kekulize mol
RDKit doesn't think pentavalent carbon is a sensible idea, so it can't kekulize the molecule. Neither do I. But users sometimes accidentally draw one methyl too many on an aromatic ring, so this does turn up in the wild. When it happens, MolFromSmiles returns None instead of a molecule, which is a complete showstopper for the Python script. However, we can ask RDKit NOT to sanitize the molecule.
>>> mol = Chem.MolFromSmiles("c1ccccc1(C)(C)", sanitize=False)
>>> Draw.MolToFile(mol, "BadMolecule.png", kekulize=False)
And there it is, our “Bad” molecule...
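In a batch curation script it is convenient to do both steps automatically. The sketch below first parses without sanitization. It then sanitizes a copy with Chem.SanitizeMol(..., catchErrors=True), which returns the first sanitization step that failed instead of raising an exception. The helper name load_smiles_leniently is hypothetical and not part of RDKit. The code assumes a reasonably recent RDKit under Python 3.

```python
from rdkit import Chem


def load_smiles_leniently(smiles):
    """Hypothetical helper: return (mol, problem) for a SMILES string.

    problem is None when full sanitization succeeds. Otherwise mol is the
    unsanitized molecule (still drawable) and problem is the SanitizeFlags
    value of the first failing step, or a message if parsing itself failed.
    """
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        # Syntax errors cannot be rescued by skipping sanitization.
        return None, "unparseable SMILES"
    # Sanitize a copy, because a failed sanitization may leave it half-modified.
    checked = Chem.Mol(mol)
    failed = Chem.SanitizeMol(checked, catchErrors=True)
    if failed == Chem.SanitizeFlags.SANITIZE_NONE:
        return checked, None
    return mol, failed


for smi in ["c1ccccc1C", "c1ccccc1(C)(C)"]:
    mol, problem = load_smiles_leniently(smi)
    print(smi, "OK" if problem is None else problem)
```

Molecules that come back with a problem can still be written out for manual inspection with Draw.MolToFile(mol, ..., kekulize=False), as in the prompt example.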
More advanced work will sometimes require the molecule to have an updated property cache.
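Here is a minimal sketch of that step, assuming a reasonably recent RDKit. Mol.UpdatePropertyCache(strict=False) computes implicit valences and hydrogen counts without raising an exception for atoms whose valence looks too high. After that call, per-atom queries such as GetTotalValence() can be used on a molecule parsed with sanitize=False. The helper name atom_report is hypothetical.

```python
from rdkit import Chem


def atom_report(mol):
    """Hypothetical helper: (index, symbol, degree, total valence) per atom."""
    # Non-strict update: valences are computed, but no exception is raised
    # for atoms that exceed their allowed valence.
    mol.UpdatePropertyCache(strict=False)
    return [
        (atom.GetIdx(), atom.GetSymbol(), atom.GetDegree(), atom.GetTotalValence())
        for atom in mol.GetAtoms()
    ]


# The "Bad" molecule from above, parsed without sanitization.
bad = Chem.MolFromSmiles("c1ccccc1(C)(C)", sanitize=False)
for row in atom_report(bad):
    print(row)
```

The overloaded ring carbon shows up at index 5 with four neighbours, which makes it easy to locate in a curation log.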