SMILES enumeration and vectorization for Keras

Esben Jannik Bjerrum / December 1, 2017 / Blog, Cheminformatics, Machine Learning, Neural Network, RDKit, SMILES enumeration

The SMILES enumeration code on GitHub has been revamped and wrapped in an object for easier use. It can work together with a SMILES iterator object that performs on-the-fly enumeration and vectorization when training SMILES-based Recurrent Neural Network (RNN) models of molecules in Keras. Doing the vectorization on demand saves the disk space that the vectorized objects would otherwise take up, so massive datasets can be used, at the potential cost of some training speed, depending on the GPU, CPU, dataset size and network size. The cost is largest for small datasets and small networks.
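To make the idea of on-demand vectorization concrete, here is a minimal sketch in plain Python. It is not the code from the GitHub repository: the names `build_charset`, `one_hot`, `batch_generator` and the `augment` hook are made up for illustration. The `augment` argument stands in for the enumeration step, which in practice would use a cheminformatics toolkit to write a randomized SMILES for the same molecule; here it defaults to returning the string unchanged. Nested lists are used to keep the example dependency-free, and would normally be converted to NumPy arrays before being passed to Keras.

```python
import random


def build_charset(smiles_list):
    """Map every character seen in the SMILES strings to a column index."""
    chars = sorted(set("".join(smiles_list)))
    return {ch: idx for idx, ch in enumerate(chars)}


def one_hot(smiles, char_to_idx, max_len):
    """One-hot encode a SMILES string, right-padded with all-zero rows."""
    if len(smiles) > max_len:
        raise ValueError("SMILES longer than max_len: %r" % smiles)
    matrix = [[0] * len(char_to_idx) for _ in range(max_len)]
    for pos, ch in enumerate(smiles):
        matrix[pos][char_to_idx[ch]] = 1
    return matrix


def batch_generator(smiles_list, targets, char_to_idx, max_len,
                    batch_size, augment=lambda s: s, seed=None):
    """Yield (x, y) batches forever, vectorizing each batch when requested.

    `augment` is a hypothetical hook for SMILES enumeration. Any real
    implementation must only emit characters present in `char_to_idx`
    and strings no longer than `max_len`.
    """
    rng = random.Random(seed)
    indices = list(range(len(smiles_list)))
    while True:
        rng.shuffle(indices)  # new sample order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            x = [one_hot(augment(smiles_list[i]), char_to_idx, max_len)
                 for i in batch]
            y = [targets[i] for i in batch]
            yield x, y


# Toy data: three molecules with made-up target values.
smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
targets = [0.1, 0.2, 0.3]
charset = build_charset(smiles)
gen = batch_generator(smiles, targets, charset, max_len=10,
                      batch_size=2, seed=0)
x_batch, y_batch = next(gen)
```

Only the raw SMILES strings are held in memory; each one-hot batch exists only for as long as the training step needs it, which is where both the disk-space saving and the extra per-batch CPU work come from.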
Cheers
Esben Jannik Bjerrum
 
