Machine Learning optimization of Smina cross docking accuracy
In the two previous blog posts, Ligand docking with Smina and Never use re-docking for …, it was demonstrated how easy it is to dock a small ligand using Smina, and how deceptively accurate a docking program can appear when using re-docking rather than cross-docking.
It is, however, possible to re-tune the docking function for a specific purpose using machine learning approaches. This is the subject of my recent publication in Computational Biology and Chemistry: http://dx.doi.org/10.1016/j.compbiolchem.2016.04.005
I am quite enthusiastic about the approach. The fact that an alternative parameter set could be automatically derived from a small training set that matched the performance of the default parameters is not an insignificant accomplishment
– Anonymous Reviewer
The article demonstrates how the docking function can be re-tuned using a surprisingly small training set. In this approach the docking program itself is treated as a kind of “black box”. A training set of 11 cross-docking receptor pairs is fed to the docking program together with custom weights for the docking and scoring function. After docking, each cross-docked ligand pose is compared to the native pose via the RMSD, and a loss function is calculated from the result. In machine learning, a loss function measures how far a solution is from the desired outcome, so lower is better. If good weights are chosen, the ligands will be docked with a low RMSD between the docked and the native pose, resulting in a low loss. The loss function is then iteratively minimized with a particle swarm optimization algorithm. Particle swarm optimization is a global optimization algorithm that mimics the behavior of schools of fish and flocks of birds; in this case it reaches a good solution in a few hundred steps.
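To make the optimization loop a bit more concrete, here is a minimal particle swarm sketch in Python. It is not the code from the publication: the function names, the swarm parameters and especially toy_loss are my own illustrative assumptions. In the real setup the loss would run the docking program on the training pairs and score the resulting RMSDs; here a simple bowl-shaped function stands in for that black box so the sketch runs instantly.

```python
import random


def toy_loss(weights):
    """Hypothetical stand-in for the docking black box.

    A real loss would dock the training pairs with `weights` and
    summarize the RMSDs. This bowl has its minimum at (0.3, -0.5).
    """
    target = (0.3, -0.5)
    return sum((w - t) ** 2 for w, t in zip(weights, target))


def particle_swarm(loss, bounds, n_particles=20, n_steps=100,
                   inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `loss` inside the box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)

    # Random start positions inside the bounds, zero start velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers its own best position; the swarm shares one.
    best_pos = [p[:] for p in pos]
    best_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]

    for _ in range(n_steps):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull towards the personal best and the swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (gbest_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move, then clamp the particle back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = loss(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val


best_weights, best_loss = particle_swarm(
    toy_loss, bounds=[(-1.0, 1.0), (-1.0, 1.0)])
print(best_weights, best_loss)
```

The optimizer only ever calls loss(weights), which is what makes the black-box treatment possible: swapping toy_loss for a function that runs the docking and returns a number requires no change to the swarm itself.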
This resulted in new weights for the Smina program that can be used instead of the default ones. Here’s how to redo the cross-docking experiment from last time using the new weights. First create a new text file with the terms and weights and name it CrossDock.score
-0.0460161 gauss(o=0,_w=0.5,_c=8)
-0.000384274 gauss(o=3,_w=2,_c=8)
-0.00812176 hydrophobic(g=0.5,_b=1.5,_c=8)
-0.431416 non_dir_h_bond(g=-0.7,_b=0,_c=8)
0.366584 repulsion(o=0,_c=8)
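If you want to try many weight sets, writing the file by hand gets tedious. Here is a small Python sketch that generates it, assuming the file holds one "weight term" pair per line. The term strings and weights are copied from the file above; the helper names are my own.

```python
# Weights and term strings copied from the CrossDock.score file.
CROSSDOCK_TERMS = [
    (-0.0460161, "gauss(o=0,_w=0.5,_c=8)"),
    (-0.000384274, "gauss(o=3,_w=2,_c=8)"),
    (-0.00812176, "hydrophobic(g=0.5,_b=1.5,_c=8)"),
    (-0.431416, "non_dir_h_bond(g=-0.7,_b=0,_c=8)"),
    (0.366584, "repulsion(o=0,_c=8)"),
]


def format_score_file(terms):
    """Return the file content: one 'weight term' pair per line."""
    return "".join(f"{weight} {term}\n" for weight, term in terms)


def write_score_file(path, terms):
    """Write the weights to `path` (hypothetical helper)."""
    with open(path, "w") as handle:
        handle.write(format_score_file(terms))


if __name__ == "__main__":
    write_score_file("CrossDock.score", CROSSDOCK_TERMS)
```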
Then dock as in the previous blog post, but using the --custom_scoring switch of Smina.
smina.static --custom_scoring CrossDock.score -r 1G32-receptor.pdbqt -l 1OYT-FSN.pdbqt --autobox_ligand 1OYT-FSN.pdbqt --autobox_add 8 --exhaustiveness 16 -o FSN-Crossdock-newWeights.pdbqt
pymol FSN-Crossdock-newWeights.pdbqt 1OYT-FSN.pdbqt 1G32-receptor.pdbqt 1OYT-receptor.pdbqt
This gives better results than the previous cross-docking, although not quite as perfect as the first test using re-docking. The optimized weights are lower for repulsion but higher for the first Gaussian and the hydrogen bond terms. This enables the docking program to dock ligands even in the presence of minor steric clashes. I haven’t tested it, but the parameters should be compatible with AutoDock Vina, although the way the weights are set there is different.
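To put a number on "better" instead of judging by eye in PyMOL, you can compute the RMSD between the docked and the native pose. Below is a minimal Python sketch. It assumes that both poses are already in the same coordinate frame, that the atoms are listed in the same order, and that you have extracted the coordinates yourself. It does no alignment and no symmetry correction, and the example coordinates are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists.

    Each argument is a list of (x, y, z) tuples with atoms in the
    same order. No alignment or symmetry handling is attempted.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty lists of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Made-up coordinates: every atom is shifted by 1 along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(native, docked))  # 1.0
```

Running this on the poses from the default and the re-tuned weights gives two numbers you can compare directly.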
The scoring should not be used to estimate the affinity of the ligand for the receptor. Here the default scoring function is much better than these weights, which are only optimized for cross docking accuracy.
1OYT and 1G32 are a receptor pair that was used in the test set in the publication. Be sure to test whether the default docking function or these weights give the best results for your target of interest. Now you know how to do it 😉 Let me know your experiences.
Esben Jannik Bjerrum
P.S. Drop me a note if you are interested in contributing to the hunt for even better docking functions.