SMILES enumeration and vectorization for Keras

Esben Jannik Bjerrum / December 1, 2017 / Blog, Cheminformatics, Machine Learning, Neural Network, RDkit, SMILES enumeration

The SMILES enumeration code on GitHub has been revamped and revised into an object for easier use. It can work in conjunction with a SMILES iterator object that gives on-the-fly enumeration and vectorization for training SMILES-based recurrent neural network (RNN) models of molecules in Keras. Doing the vectorization on demand saves the disk space that pre-vectorized objects would take up, so massive datasets can be used, at the potential cost of some training speed, depending on the GPU, CPU, dataset size and network size. The cost is largest for small datasets and small networks.
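To give a feel for what "on-the-fly vectorization" means, here is a minimal sketch in plain Python. It is not the code from the GitHub repository: the names `build_charset`, `one_hot` and `batch_generator` are made up for this illustration, tokens are assumed to be single characters, and the `augment` argument is a placeholder for whatever function returns an alternative SMILES string for the same molecule (for example an RDKit-based randomization).

```python
import random


def build_charset(smiles_list):
    """Collect every character seen in the data, sorted for a stable index."""
    return sorted(set("".join(smiles_list)))


def one_hot(smiles, charset, max_len):
    """Encode one SMILES string as a max_len x len(charset) matrix of 0/1.

    Positions past the end of the string stay all-zero (padding);
    strings longer than max_len are truncated.
    """
    index = {ch: i for i, ch in enumerate(charset)}
    matrix = [[0] * len(charset) for _ in range(max_len)]
    for pos, ch in enumerate(smiles[:max_len]):
        matrix[pos][index[ch]] = 1
    return matrix


def batch_generator(smiles_list, targets, charset, max_len,
                    batch_size, augment=None, seed=0):
    """Yield (x, y) batches forever, vectorizing each batch when requested.

    `augment` is a hypothetical hook: a callable that takes a SMILES string
    and returns another SMILES string for the same molecule. It must only
    produce characters present in `charset`, so build the charset from
    enumerated samples if you use it.
    """
    rng = random.Random(seed)
    order = list(range(len(smiles_list)))
    while True:
        rng.shuffle(order)  # new sample order for every pass over the data
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            strings = [smiles_list[i] for i in idx]
            if augment is not None:
                strings = [augment(s) for s in strings]
            x = [one_hot(s, charset, max_len) for s in strings]
            y = [targets[i] for i in idx]
            yield x, y
```

Only the SMILES strings are kept in memory; the one-hot matrices exist for one batch at a time, which is where both the disk saving and the extra CPU work per batch come from. For Keras the nested lists would be converted to NumPy arrays before being handed to the model.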
Cheers
Esben Jannik Bjerrum
 
