SMILES enumeration and vectorization for Keras
The SMILES enumeration code on GitHub has been revamped and revised into an object for easier use. It can work in conjunction with a SMILES iterator object that gives on-the-fly enumeration and vectorization for training SMILES-based Recurrent Neural Network (RNN) models of molecules in Keras. Doing the vectorization on demand means the vectorized data never has to be stored, so massive datasets can be used, at the potential cost of some training speed depending on the GPU, CPU, dataset size and network size. The cost is largest for small datasets and small networks.
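To make the idea of on-demand vectorization concrete, here is a minimal sketch in plain Python. It is not the code from the GitHub repository: the names build_charset, one_hot and batch_generator are hypothetical, and the enumeration step is passed in as a placeholder function (enumerate_fn) that by default returns the SMILES unchanged. In practice that function would return a randomized SMILES for the same molecule, for example generated with a cheminformatics toolkit.

```python
import random


def build_charset(smiles_list):
    """Collect every character seen in the SMILES strings as the vocabulary."""
    return sorted(set("".join(smiles_list)))


def one_hot(smiles, charset, pad):
    """Encode one SMILES string as a pad x len(charset) matrix of 0/1 values.

    Strings longer than `pad` are truncated; shorter ones leave all-zero rows.
    A character missing from `charset` raises KeyError, so the charset should
    be built from strings that cover the enumerated forms as well.
    """
    index = {ch: i for i, ch in enumerate(charset)}
    matrix = [[0] * len(charset) for _ in range(pad)]
    for pos, ch in enumerate(smiles[:pad]):
        matrix[pos][index[ch]] = 1
    return matrix


def batch_generator(smiles_list, charset, pad, batch_size,
                    enumerate_fn=lambda s: s, seed=0):
    """Yield vectorized batches forever, encoding each batch only when asked.

    `enumerate_fn` is a placeholder (assumption) for a function that returns
    another valid SMILES of the same molecule; the default is the identity.
    """
    rng = random.Random(seed)
    while True:
        # Sample with replacement so any batch size works.
        batch = rng.choices(smiles_list, k=batch_size)
        yield [one_hot(enumerate_fn(s), charset, pad) for s in batch]


smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
charset = build_charset(smiles)
batches = batch_generator(smiles, charset, pad=10, batch_size=2)
first_batch = next(batches)  # nested lists with shape (2, 10, len(charset))
```

Only one batch is held in memory at a time, which is where the disk-space saving comes from. The price is that the encoding work is repeated for every batch, which is why the overhead is most noticeable when the network is small and each training step is cheap. Before being passed to Keras, each batch would typically be converted to an array.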
Esben Jannik Bjerrum