Drug Discovery Data Science

What is Drug Discovery Data Science?

The term data science was coined in the sixties by our fellow Dane Peter Naur. Its meaning has changed somewhat over time. Today it is often understood as the field where domain knowledge, statistics and machine learning come together with IT systems and programming to put the available data to use for business purposes.

Drug Discovery Data Science, Domain Knowledge, Machine Learning and Programming

At its core it is a data-driven field, where the data itself is analyzed to find trends and patterns that can inform future business decisions or drive automated systems. The business insight is sometimes neglected, but without a proper understanding of the business domain and its needs, the programming and statistics may end up solving the wrong problems. The word itself is a bit “buzzy”, and it has even been hailed as the fourth paradigm of science, the others being empirical, theoretical and simulation-based science.

Drug Discovery Data and Domain Knowledge

Wildcard Pharmaceutical Consulting specializes in the aspects of data science that deal with chemical structures and biological data. Our special focus areas are drug discovery and life sciences.

QSAR and QSPR models (Quantitative Structure-Activity/Property Relationships) are decades-old, data-driven approaches to modelling properties of interest for chemical structures, and can be seen as an active part of drug discovery data science. Cheminformatics (or chemoinformatics) is the field concerned with handling chemical data, such as storing, searching, retrieving and visualizing chemical structures. Wildcard Pharmaceutical Consulting has years of experience storing and retrieving chemical and biological data from databases and other IT systems. So if you need to work with chemical structures or biological data and are looking for solutions that let your scientists work more efficiently, we encourage you to give us a non-committal and confidential call to discuss your needs.
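Similarity searching is one of the basic cheminformatics operations mentioned above. The following is a minimal, standard-library-only sketch, assuming each molecule has already been converted to a binary fingerprint (in practice by a toolkit such as RDKit) and stored as a set of "on" bit indices. The compound IDs, fingerprints and the 0.5 threshold are hypothetical illustration values, not real data.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two bit sets: |A & B| / |A | B|."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention: two empty fingerprints score 0
    return len(a & b) / union


def similarity_search(
    query: Set[int], library: Dict[str, Set[int]], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (compound_id, similarity) pairs at or above `threshold`, best first."""
    scored = [(cid, tanimoto(query, fp)) for cid, fp in library.items()]
    hits = [(cid, sim) for cid, sim in scored if sim >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical compound IDs and toy fingerprints, not real molecules.
    library = {
        "CMPD-001": {1, 4, 7, 9, 12},
        "CMPD-002": {1, 4, 7, 10},
        "CMPD-003": {2, 3, 5},
    }
    for cid, sim in similarity_search({1, 4, 7, 9}, library):
        print(f"{cid}\t{sim:.2f}")  # prints CMPD-001 0.80, then CMPD-002 0.60
```

The Tanimoto coefficient is used here because it is the customary similarity measure for binary fingerprints. In a production system the search would typically run inside a database cartridge or a dedicated toolkit rather than a Python loop.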

The Statistics and Molecular Machine Learning

Machine learning is an important part of the data science field. The focus seems to have shifted towards predictive modelling for business support. Earlier efforts focused on storing, retrieving and visualizing data in dashboards that enabled data exploration and supported business decisions taken by analysts. These efforts built the data infrastructure that supports the next step, where data-driven models directly support business solutions and decisions in an automated way. When it comes to building machine learning models of chemical structures and biological data, we have experience handling molecules and preparing the data for machine learning. Read more on the Molecular Machine Learning page.

Over the years, Wildcard Pharmaceutical Consulting has used many kinds of machine learning models, ranging from simple, well-known statistical models to modern deep learning models and technologies. Clustering and other unsupervised learning methods can often give an overview of multidimensional datasets. Uncertainty quantification and management can be used as part of quality assurance and early warning systems.
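As an illustration of unsupervised learning on chemical data, here is a sketch of Taylor-Butina clustering, a common way to group compounds by fingerprint similarity. It assumes fingerprints stored as sets of "on" bit indices, uses distance = 1 - Tanimoto and a hypothetical cutoff of 0.4, and runs on toy data. RDKit ships its own implementation in `rdkit.ML.Cluster.Butina`, which would normally be preferred on real datasets.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two bit sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def butina_cluster(fps: Dict[str, Set[int]], cutoff: float = 0.4) -> List[List[str]]:
    """Taylor-Butina clustering with distance = 1 - Tanimoto.

    Two compounds are neighbours if their distance is <= cutoff.
    Each returned cluster lists its centroid first.
    """
    ids = list(fps)
    neighbours = {
        i: {j for j in ids if j != i and 1.0 - tanimoto(fps[i], fps[j]) <= cutoff}
        for i in ids
    }
    # Compounds with the most neighbours are considered as centroids first
    # (sorted() is stable, so ties keep their input order).
    order = sorted(ids, key=lambda i: len(neighbours[i]), reverse=True)
    assigned: Set[str] = set()
    clusters: List[List[str]] = []
    for centroid in order:
        if centroid in assigned:
            continue
        # The cluster is the centroid plus its not-yet-assigned neighbours.
        members = [centroid] + sorted(n for n in neighbours[centroid] if n not in assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


if __name__ == "__main__":
    # Toy fingerprints for illustration only.
    fps = {
        "A": {1, 2, 3, 4}, "B": {1, 2, 3, 4, 5}, "C": {1, 2, 3, 4, 6},
        "D": {10, 11, 12, 13}, "E": {10, 11, 12, 13, 14}, "F": {20, 21},
    }
    print(butina_cluster(fps))  # [['A', 'B', 'C'], ['D', 'E'], ['F']]
```

Because the compounds with the most close neighbours become centroids first, each cluster is represented by a typical member, which makes the result easy to browse when getting an overview of a compound collection.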

We believe that the data, not personal preferences or the latest fashion, should be used to select the model. The model choice depends on the amount of data, its signal-to-noise properties, and the relationship between the data and the decision the model should support.

We have the expertise to analyze the data and build test models to gauge project feasibility. This can be done before expensive data science projects are decided upon, minimizing the risk of failed projects.
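A feasibility check can be as simple as asking whether even a basic model beats a trivial baseline under cross-validation. The sketch below is an illustration, not a fixed procedure: it fits a one-descriptor least-squares line and compares its cross-validated RMSE with that of always predicting the training-set mean. The synthetic data, the round-robin fold assignment and k = 5 are assumptions chosen for the example.

```python
import math
import random
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x with a single descriptor x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # constant descriptor: fall back to the mean
    return mean_y - b * mean_x, b


def rmse(pred: List[float], obs: List[float]) -> float:
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def cross_validate(xs: Sequence[float], ys: Sequence[float], k: int = 5) -> Tuple[float, float]:
    """Return (model RMSE, mean-baseline RMSE) over held-out predictions from k folds.

    Folds are assigned round-robin (index % k), an assumption for simplicity.
    """
    model_pred: List[float] = []
    base_pred: List[float] = []
    observed: List[float] = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        mean_y = sum(ys[i] for i in train) / len(train)
        for i in test:
            model_pred.append(a + b * xs[i])
            base_pred.append(mean_y)
            observed.append(ys[i])
    return rmse(model_pred, observed), rmse(base_pred, observed)


if __name__ == "__main__":
    # Synthetic data: one descriptor with a linear signal plus Gaussian noise.
    rng = random.Random(0)
    xs = [rng.uniform(0, 10) for _ in range(50)]
    ys = [1.0 + 0.5 * x + rng.gauss(0, 0.5) for x in xs]
    model, baseline = cross_validate(xs, ys)
    print(f"model RMSE {model:.2f} vs baseline RMSE {baseline:.2f}")
```

If the model does not clearly beat the baseline, the signal-to-noise ratio of the data may be too low for the intended decision, which is worth knowing before a larger project is started.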

If you have some data, but are unsure if it can be used for a drug discovery data science project, we encourage you to give us a call.


Deep Learning for Drug Discovery

Deep learning is a subfield of machine learning where hand-crafted feature engineering and descriptor calculations are skipped and the data is used in a more “raw” form. With deep learning, the training algorithms themselves extract and develop the features needed for the machine learning task at hand. As the amount of data and computing power increases, these methods are expected to increasingly outperform traditional feature engineering and extraction. Wildcard Pharmaceutical Consulting specializes in applying deep learning to chemical structures, biological sequences and spectroscopic data. There are more details on our dedicated page, as this field is of great current interest.
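To make the idea of “raw” input concrete, the following sketch turns SMILES strings into one-hot encoded token matrices, the kind of input a sequence model (for example a recurrent or convolutional network built in TensorFlow) could learn its own features from. The tokenizer regex is a simplified assumption that handles bracket atoms, Cl, Br and two-digit ring closures; it is not a full SMILES grammar, and the network itself is not shown.

```python
import re
from typing import Dict, List

# Simplified SMILES tokenizer (an assumption, not a full SMILES grammar):
# bracket atoms such as [nH] or [O-], the two-letter halogens Br and Cl,
# two-digit ring closures such as %10, and otherwise single characters.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles: str) -> List[str]:
    return TOKEN_RE.findall(smiles)


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Map every token seen in the data set to a column index (sorted for reproducibility)."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def one_hot(smiles: str, vocab: Dict[str, int], max_len: int) -> List[List[int]]:
    """Encode SMILES as a max_len x len(vocab) 0/1 matrix; unused rows stay all-zero padding."""
    tokens = tokenize(smiles)
    if len(tokens) > max_len:
        raise ValueError(f"{smiles!r} has {len(tokens)} tokens, more than max_len={max_len}")
    matrix = [[0] * len(vocab) for _ in range(max_len)]
    for row, tok in enumerate(tokens):
        matrix[row][vocab[tok]] = 1  # raises KeyError for tokens outside the vocabulary
    return matrix


if __name__ == "__main__":
    data = ["CCO", "CC(=O)Cl", "c1ccccc1Br"]
    vocab = build_vocab(data)
    for smi in data:
        print(smi, tokenize(smi))
    x = one_hot("CCO", vocab, max_len=12)
    print(len(x), "rows x", len(x[0]), "columns")
```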

The Programming

There is no data science without programming and IT expertise to bind it all together. We have decades of experience programming in life science and drug discovery settings. Our programming language of choice is Python, a high-level language that enables rapid prototyping. Calculation speed is often not a big issue, as the “number crunching” is left to dedicated libraries written in other programming languages. We are fond of scikit-learn for general-purpose machine learning, with the NumPy package for fast matrix operations. For deep learning applications our first choice is TensorFlow, combined with RDKit to link the molecules with the models, similar to projects like DeepChem.

For data storage and retrieval we have experience with traditional SQL databases such as Oracle and PostgreSQL, with specialized cartridges to handle the chemical information. There are also more modern approaches, such as the Hadoop Distributed File System (HDFS) and data lake solutions. In Python, the excellent pandas data frames enable efficient data manipulation and preparation. We are at home using Linux workstations and Docker images for all our platform needs.

A Data Science Evolution, not Revolution

We adapt readily to our customers' current platforms and informatics systems. Risks can often be mitigated by choosing an evolutionary development process rather than a revolutionary one. On multiple occasions we have learned new programming languages or IT systems using internal resources, or procured the competences needed for successful project completion through our network. So if your data is stored in an old system on an old platform, we are open to a discussion. If you have a legacy application written in a programming language not mentioned here, we may still be able to help.

Ready, Steady … Drug Discovery Data Science

If you work with data in drug discovery, we encourage you to contact us for a chat about your data science needs and how we can help you get the most out of your data. It's not magic or rocket science, merely Drug Discovery Data Science.