Blog

The Blog is a collection of News, Blog posts, technical tips and tricks and other writings from my sphere of interest.
- Esben Jannik Bjerrum

Apr

How to solve problems with coordinate bonds in Rdkit

Chemical FlasksWhen I have been working with chemical databases and import of molecules I have encountered numerous problems with the way chemical structures are drawn. Most often the problem arises as creative users in the past have had a problem with registering a compound the way they wanted it. Sometimes the used software did not support a correct way, so the users found a workaround, which can give rise to errors with other software packages. A common problem arises with organometallic compound and coordinate bonds. Read on if you want to learn more about how to handle these kind of compounds with Rdkit.

Chemical Bond Types and File Formats

Chemical bonds come in a variety of versions. A regular covalent bond is formed when two atoms combine orbitals with only one electron each. This gives a molecular orbital with two electrons. However, if the electronegativity differences are large enough, a ionic bond is formed where the most electronegative atom attracts both electrons. Both these situations are handled by Rdkit without problems.
For the coordinate or dative bond, which this blog post is about, one atom donates both electrons into a covalent bond forming with another atom possessing an empty orbital. These differences in bond types can give rise to problems with software and formats that only supports the other bond types. The widely used .mol and .sdf formats didn’t contain dative or coordinate bonds in the V2000 format, but from around 2011 the hydrogen bond and dative bond was added to the specification for the V3000 format. You can read more about the formats and specification at https://en.wikipedia.org/wiki/Chemical_table_file and http://download.accelrys.com/freeware/ctfile-formats/ctfile-formats.zip (Sorry, the last file is behind a Login wall).

I recently had a need to work with database registration of chemical complexes, where it could be nice to specify the coordinate bonds. In the existing .sdf files they were specified as regular covalent bonds, but this led to problems with the internal chemistry model and electron accounting in both RDKit and Chemdraw. MarvinSketch supports a coordinate bond type and can read and write correctly formed v3000 Molfiles, ChemDraw version 15 had some issues with the V3000 format that I have informed Perkin-Elmer about. I don’t know if they have fixed them.

The bond type are defined in the BONDS section of the v3000 molfile in the second column of the bond line (M V30 ). Below is shown a snippet where the bond type 9 is the dative or coordinate bond type.

 M V30 BEGIN BOND
 M V30 1 1 1 2
 M V30 2 1 2 3
 M V30 3 1 1 4
 M V30 4 9 4 5
 M V30 5 9 3 5
 M V30 6 1 3 6

As an example I have chosen DiSodiumCopper-EDTA, a beautiful blue compound. Theres a video on YouTube showing how copper can make different coordination compounds resulting in color changes. https://www.youtube.com/watch?v=deNWxchzDRg. EDTA is a chelating agent capable of forming multipe coordination bonds, there is much more information about it on Wikipedia: https://en.wikipedia.org/wiki/Ethylenediaminetetraacetic_acid

RDKit and Dative/Coordination Bonds

Internally RDKit has support for a dative bond type, but after experimenting a bit, it was clear that v3000 formatted mol files with dative bonds could not be loaded into RDkit. If the bond types where changed to dative in an RDKit python session, they could also not be saved into .mol format when forcing the v3000 format.

Luckily, RDKit is open source (Thanks Greg and others :-), so it was possible to dive into the code and see the import and export functions. The C++ function responsible for loading mol files in v3000 formats was found in the ParseV3000BondBlock function in the Code/GraphMol/FileParsers/MolFileParser.cpp file. There I found a switch case handling the bond types and it was easy to add two extra cases for bond type 9 and 10.

      case 9: 
        bond = new Bond(Bond::DATIVE); 
        break; 
      case 10: 
        bond = new Bond(Bond::HYDROGEN); 
        break; 

For saving the dative bonds in V3000 formats, the bond types are handled in the GetV3000BondCode function in the /Code/GraphMol/FileParsers/MolFileWriter.cpp file.

      case Bond::DATIVE: 
        res = 9; 
        break; 
      case Bond::HYDROGEN: 
        res = 10; 
        break;

The complete patch file is shown in the end of this blog post.

Test drive of import, handling and export of coordinate bonds

Let’s see what happens when we try to work with molecules with Rdkit in a Python session đŸ™‚

from rdkit import Chem 
from rdkit.Chem import Draw 
cuEDTA = Chem.MolFromMolFile('DisodiumCopperEDTA_v3000.mol') 
Draw.ShowMol(cuEDTA) 
CuEDTA rendered by RDkit

CuEDTA rendered by RDkit

Nice, Coordinate bonds are shown with dotted lines.

Chem.MolToMolFile(cuEDTA,'Testsave.mol',forceV3000=True)

This also gives the right bond type in the Testsave molfile (M V30 4 9 4 5 ). Good. This illustrates how easy it sometimes is to work with the RDKit codebase. I’ve just started to work with the CPP code of RDKit and I have not done extensive testing of the patch yet. However, All the ctest tests included with the RDKit source completes without error, so it seem no to have broken anything.

Cheers,
Esben

The patch file to enable v3000 support in RDkit:

 diff --git a/Code/GraphMol/FileParsers/MolFileParser.cpp b/Code/GraphMol/FileParsers/MolFileParser.cpp 
index 1ca663b..3b4f9e2 100644 
--- a/Code/GraphMol/FileParsers/MolFileParser.cpp 
+++ b/Code/GraphMol/FileParsers/MolFileParser.cpp 
@@ -1963,6 +1963,12 @@ void ParseV3000BondBlock(std::istream *inStream, unsigned int &line, 
         bond = new Bond(Bond::AROMATIC); 
         bond->setIsAromatic(true); 
         break; 
+      case 9: 
+        bond = new Bond(Bond::DATIVE); 
+        break; 
+      case 10: 
+        bond = new Bond(Bond::HYDROGEN); 
+        break; 
       case 0: 
         bond = new Bond(Bond::UNSPECIFIED); 
         BOOST_LOG(rdWarningLog) 
diff --git a/Code/GraphMol/FileParsers/MolFileWriter.cpp b/Code/GraphMol/FileParsers/MolFileWriter.cpp 
index 793c09f..957ffac 100644 
--- a/Code/GraphMol/FileParsers/MolFileWriter.cpp 
+++ b/Code/GraphMol/FileParsers/MolFileWriter.cpp 
@@ -920,6 +920,12 @@ int GetV3000BondCode(const Bond *bond) { 
       case Bond::AROMATIC: 
         res = 4; 
         break; 
+      case Bond::DATIVE: 
+        res = 9; 
+        break; 
+      case Bond::HYDROGEN: 
+        res = 10; 
+        break; 
       default: 
         res = 0; 
         break;

Leave a Reply

Your email address will not be published. Required fields are marked *